Fin Eng
S. Dharmaraja
Aparna Mehra
Reshma Khemchandani
Alpha Science International Ltd.
Oxford, U.K.
Financial Mathematics
An Introduction
516 pgs. | 48 figs. | 23 tbls.
Suresh Chandra
S. Dharmaraja
Aparna Mehra
Department of Mathematics
Indian Institute of Technology
Hauz Khas, New Delhi
Reshma Khemchandani
RBS Global Banking and Markets
DLF Building, 7, Cybercity
DLF Phase 3, Sector 25 A
Gurgaon, Haryana
Copyright © 2013
ALPHA SCIENCE INTERNATIONAL LTD.
7200 The Quorum, Oxford Business Park North
Garsington Road, Oxford OX4 2JZ, U.K.
www.alphasci.com
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior written permission of the publisher.
ISBN 978-1-84265-654-9
E-ISBN 978-1-78332-002-8
Printed in India
1
Financial Mathematics: An Overview
1.1 Introduction
The term financial mathematics refers to the application of mathematics to the
study of problems arising in the area of finance. Here the word mathematics is
used in a wider sense so as to include subjects like probability and statistics,
stochastic processes, optimization, econometrics, numerical analysis and partial
differential equations. The scope of the word finance could also be very broad, but
here it should be understood in terms of investment of money for the purpose of
receiving more money (hopefully!) at some time in the future. Thus financial
mathematics could also be termed investment science or investment theory.
Though investment as an art form has always been there, it developed as a
science/theory due to the pioneering work of Fischer Black, Myron Scholes, Robert
Merton, Harry Markowitz, William Sharpe and John Lintner amongst many others.
In fact, Merton and Scholes received the Nobel Prize in Economics (in 1997) for
their work on the pricing of options carried out during the late 60's and early
70's. Black could not share this prize as he died of illness before 1997. Earlier,
Markowitz and Sharpe (Markowitz's Ph.D. student) were awarded the Nobel Prize
(in 1990) for their work on portfolio optimization carried out during the late 50's
and early 60's. They shared the Nobel Prize with another famous economist,
Merton Miller.
The above extraordinary theoretical developments gave great impetus to research
in finance. This, together with the globalization of investment activities and
the tremendous power of computing technology, has made financial mathematics an
intellectually very fertile area. There has been significant growth in theory as
well as in applications in the areas of derivative pricing, portfolio management,
interest rate modeling, credit risk methodologies and algorithmic trading.
The aim of this chapter is to introduce basic notions and assumptions of a
simple financial market model so as to facilitate the presentation of some simple
Thus when we refer to the return of an asset we really mean the rate of return of
that asset.
Let an investor hold x shares of stock and y units of bond at t = 0. The pair
(x, y) is called a portfolio and it is denoted by P : (x, y). The value of the portfolio
P : (x, y) at time t = 0 is given by
VP (0) = x S(0) + y B(0).
Similarly the value of the portfolio at time instant t = 1 is given by
VP (1) = x S(1) + y B(1).
The quantity
\[ r_P = \frac{V_P(1) - V_P(0)}{V_P(0)} \]
is called the rate of return of the portfolio P : (x, y). We note that $r_P$ is a
random variable because $V_P(1)$ is so.
The units of B(0), B(1), S(0) and S(1) are units of money and here they are
always taken in Rupees. Thus B(0) = 100 etc. will always mean B(0) = Rs 100 etc.
We shall follow this convention throughout the book.
Since the portfolio return $r_P$ is a random variable we can talk of its expected
value and variance. The quantity $E(r_P)$ is called the expected return of the
portfolio P : (x, y) and the standard deviation $\sigma_P$ of $r_P$ is called its
risk. Obviously if we are given a choice between two portfolios with the same
expected return, we should choose the one for which the risk is least. In a similar
manner, if the risk levels of two portfolios are the same we should choose the one
for which the expected return is maximum.
This is a typical problem of portfolio optimization. We shall have the opportunity
to discuss a general n-asset problem of portfolio optimization in the context of
the mean-variance theory of Markowitz in later chapters of the book. The concept
of risk itself has been a topic of research in the recent past. Is standard
deviation or variance the correct measure of risk? Are there other, better
definitions of risk? All these questions need to be answered and we plan to discuss
them in greater detail in a later chapter.
Example 1.2.1 Let B(0) = 100, B(1) = 110, S(0) = 80 and
\[ S(1) = \begin{cases} 100, & \text{with probability } 0.8 \\ 60, & \text{with probability } 0.2. \end{cases} \]
What is the expected return on the stock? Consider the portfolio P : (x = 50, y =
60). Determine the expected return and the risk for the given portfolio.
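Solution The return on the bond is obviously 10%. But for the stock
\[ r_S = \begin{cases} \dfrac{100 - 80}{80}, & \text{with probability } 0.8 \\[2ex] \dfrac{60 - 80}{80}, & \text{with probability } 0.2, \end{cases} \]
i.e.
\[ r_S = \begin{cases} 0.25, & \text{with probability } 0.8 \\ -0.25, & \text{with probability } 0.2. \end{cases} \]
Therefore $E(r_S) = (0.25)(0.8) + (-0.25)(0.2) = 0.15$.
Also $V_P(0) = (50 \times 80) + (60 \times 100) = 10,000$ and
\[ V_P(1) = \begin{cases} (50 \times 100) + (60 \times 110), & \text{with probability } 0.8 \\ (50 \times 60) + (60 \times 110), & \text{with probability } 0.2 \end{cases}
 = \begin{cases} 11,600, & \text{with probability } 0.8 \\ 9,600, & \text{with probability } 0.2. \end{cases} \]
Therefore
\[ r_P = \frac{V_P(1) - V_P(0)}{V_P(0)} = \begin{cases} 0.16, & \text{with probability } 0.8 \\ -0.04, & \text{with probability } 0.2, \end{cases} \]
so that $E(r_P) = (0.16)(0.8) + (-0.04)(0.2) = 0.12$.
We have defined the standard deviation of the portfolio return as its risk.
Therefore the risk of the given portfolio is
\[ \sigma_P = \sqrt{E\left[\left(r_P - E(r_P)\right)^2\right]} = \sqrt{(0.16 - 0.12)^2 \times (0.8) + (-0.04 - 0.12)^2 \times (0.2)} = 0.08. \]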
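The computation in Example 1.2.1 can be checked numerically. The following is a minimal sketch, not part of the original text; the function name `portfolio_return_stats` and its argument layout are our own assumptions, chosen only for illustration.

```python
from math import sqrt

def portfolio_return_stats(x, y, s0, b0, b1, s1_outcomes):
    """Return (expected return, risk) of the portfolio P : (x, y).

    s1_outcomes is a list of (S(1) value, probability) pairs.
    """
    v0 = x * s0 + y * b0  # V_P(0)
    # Rate of return r_P in each state, paired with the state's probability.
    returns = [((x * s1 + y * b1 - v0) / v0, p) for s1, p in s1_outcomes]
    mean = sum(r * p for r, p in returns)
    variance = sum((r - mean) ** 2 * p for r, p in returns)
    return mean, sqrt(variance)

# Data of Example 1.2.1
expected_return, risk = portfolio_return_stats(
    50, 60, 80, 100, 110, [(100, 0.8), (60, 0.2)]
)
print(round(expected_return, 4), round(risk, 4))
```

With the data of the example this should reproduce, up to rounding, the expected return 0.12 and the risk 0.08 obtained above.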
Assumption 4 (Solvency)
The wealth of the investor should never be negative, i.e. VP (t) ≥ 0 at all times t.
For the single period case it simply means that VP (0) ≥ 0 and VP (1) ≥ 0.
A portfolio (x, y) satisfying the solvency property is termed as an admissible
portfolio.
Short selling is generally considered to be extremely risky because in short
selling there is a possibility of unlimited losses. Let the borrowed asset be sold
for an amount X0 at t = 0 and later, at t = 1, be purchased for an amount X1
so as to close the short position. If X1 is less than X0 then a profit of (X0 − X1) is
made. In other words, short selling is profitable if the asset price goes down. But
if the asset value increases then the loss is (X1 − X0), which can be arbitrarily
large since X1 can increase arbitrarily. Because of this, not all financial
institutions allow short selling. In some institutions it may be allowed only on
some assets, and that too with certain restrictions.
Let us consider an example in this regard. Let S(0) = Rs 10 and let an investor
decide to short 100 shares of this stock at t = 0. For this he/she borrows 100
shares from a broker and sells these in the stock market to receive an amount
of Rs 1000. Suppose that at t = 1 the stock price S(1) becomes Rs 9, so the
investor buys back 100 shares from the market for Rs 900 and gives these shares
to the broker to close the short position. This has been a favorable outcome for
the investor as the stock price has gone down. In fact the investor has made a
profit of Rs 100 due to short selling. If the price had gone up, the investor would
obviously have made a loss.
Now imagine another investor who has not gone for short selling but rather has
purchased 100 shares in the beginning at t = 0. Imagine that this investor has to
sell these shares at t = 1 (the problem being a single period only). This will entail
a loss of Rs 100 to this investor. The rate of return on the asset for this investor
will be
\[ r = \frac{900 - 1000}{1000} = -0.10, \]
i.e. −10%.
This example clearly illustrates that short selling allows an investor to convert
a negative rate of return into a profit. This is because the original investment
of the investor is also negative. It gives a profit of (original investment) × (rate
of return), i.e. (−1000)(−0.10) = 100.
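The rule profit = (original investment) × (rate of return) can be sketched in a few lines. This is only an illustration under the assumptions of the example (no transaction costs, no borrowing fee for the shares); the function name `short_sale_profit` is hypothetical.

```python
def short_sale_profit(shares, price_open, price_close):
    """Profit from shorting `shares` at price_open and buying back at price_close."""
    # The original investment is negative: cash is received at t = 0.
    initial_investment = -shares * price_open
    rate_of_return = (price_close - price_open) / price_open
    return initial_investment * rate_of_return

print(short_sale_profit(100, 10, 9))   # price falls: the short seller gains
print(short_sale_profit(100, 10, 12))  # price rises: the short seller loses
```

The second call illustrates the risk discussed above: the loss grows without bound as the closing price increases.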
where VP,1 (ωi ) refers to the value of the random variable VP (1) when the state of
economy is ωi (i = 1, 2, . . . , m). In a similar manner
To see that it should be true, suppose that $V_P(0) > V_Q(0)$. Then at t = 0
an investor can sell the costlier portfolio (Portfolio P) and purchase the cheaper
portfolio (Portfolio Q), pocketing the difference. At t = 1, no matter what the
state of the economy is, the investor receives the common final value of the
portfolios. Thus the investor loses nothing at the end and has made a profit at
t = 0. But this violates the principle of no arbitrage. Hence $V_P(0)$ cannot be
more than $V_Q(0)$. In a similar manner we can show that $V_Q(0)$ cannot be more
than $V_P(0)$. Hence $V_P(0) = V_Q(0)$.
The law of one price has proved to be a very effective tool in pricing derivative
securities, which we shall discuss in detail in later chapters.
Apart from the above main assumptions it is also assumed that there are no
commissions or transaction costs and that the lending rate is equal to the
borrowing rate.
Forward Contracts
A forward contract is a derivative security whose underlying is a stock and it has
the following basic characteristics.
(i) It is an agreement to buy or sell the stock at a specified future time, called
delivery date, for a fixed price F, called the forward price which is agreed at
t = 0. Obviously for the single period model, delivery date has to be t = 1.
(ii) An investor who agrees to buy the stock is said to enter into a long forward
contract or take a long forward position.
(iii) An investor who agrees to sell the stock is said to enter into a short forward
contract or take a short forward position.
(iv) No money is paid at t = 0 when a forward contract is exchanged.
(v) A forward contract guarantees that the stock will be bought (for long position)
or sold (for short position) for forward price F at the delivery date.
In general, the party holding the long forward contract will benefit if S(1) > F
and suffer a loss if S(1) < F. Therefore the pay-off of a long forward contract is
S(1) − F, which could be zero, positive or negative. For a short forward position,
the pay-off is obviously F − S(1).
The question which we have to answer is as follows: What should be the forward
price F? Though a formal theoretical formula will be derived in Chapter 2, here
we explain the logic through an example.
Example 1.4.1 Let B(0) = 100, B(1) = 110, S(0) = 50. Determine the forward
price F of a forward contract on the given stock.
Solution Here the delivery date is one period away (from t = 0 to t = 1), which we
take to be one year. As B(0) = 100 and B(1) = 110, the nominal rate of interest is
10%. Since S(0) = 50, the forward price F should at least be the amount which the
holder of the forward contract would have got by depositing Rs 50 in the bank for
one year at a rate of interest of 10%. Thus F should be at least Rs 55.
We shall now argue that F should be exactly Rs 55. If possible let F > 55. Then
at t = 0, we construct the portfolio P : (x = 1, y = −1/2, z = −1), where z denotes
the units of forward contracts in the portfolio. The interpretation of the portfolio
P so constructed is that at time t = 0 we borrow Rs 50 (y = −1/2), buy the stock
for S(0) = 50 (x = 1) and enter into a short forward contract (z = −1) with
forward price F and delivery date t = 1. This gives
\[ V_P(0) = x\,S(0) + y\,B(0) = (1 \times 50) + \left(-\tfrac{1}{2} \times 100\right) = 0. \]
Here it may be noted that in $V_P(0)$ there is no term corresponding to the short
forward contract (z = −1) because in a forward contract there is no exchange of
money at the time when the contract is initiated, i.e. t = 0.
Now at t = 1, we close the short forward position by selling the asset for Rs F
and close the risk free position by paying (1/2) B(1) = (1/2) × 110 = Rs 55.
Therefore at t = 1, $V_P(1) = F − 55$, which is strictly positive as we have assumed
that F > 55. Since the portfolio costs nothing at t = 0, this clearly violates the
no arbitrage principle, so F cannot exceed 55. Together with F ≥ 55 this gives
F = 55.
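The argument of Example 1.4.1 can be sketched numerically. The code below assumes, as a generalization of the example and ahead of the formal derivation promised for Chapter 2, that the single period forward price is S(0) B(1)/B(0); the function names are our own and purely illustrative.

```python
def forward_price(s0, b0, b1):
    """Assumed single period no-arbitrage forward price: S(0) * B(1) / B(0)."""
    return s0 * b1 / b0

def cash_and_carry_profit(forward, s0, b0, b1):
    """Value at t = 1 of the zero-cost portfolio (x = 1, y = -S(0)/B(0), z = -1)."""
    y = -s0 / b0              # bond units sold short to finance one share
    v0 = 1 * s0 + y * b0      # initial value; the forward itself costs nothing
    assert abs(v0) < 1e-9     # the portfolio is set up at zero cost
    return forward + y * b1   # deliver the share for F, then repay the loan

print(forward_price(50, 100, 110))              # data of Example 1.4.1
print(cash_and_carry_profit(58, 50, 100, 110))  # a quoted F above 55 leaves a sure profit
```

Any quoted forward price above Rs 55 gives a strictly positive value at t = 1 from a zero initial outlay, which is exactly the arbitrage excluded in the solution.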
Call and Put Options
Similar to forward contracts, call and put options in their basic form are also
derivative securities whose underlying is a stock. An option has the following
characteristics.
(i) The owner of a call option gets the right to buy stock at a future date (called
maturity or expiration date) at a predetermined price (called strike or exercise
price). In a similar manner, the owner of a put option gets the right to sell
stock at maturity at a predetermined strike price.
(ii) The owner of the option (whether call or put) has the right without any
obligation. Thus at maturity the owner may not exercise the option if he/she
so wishes.
(iii) The person who has bought the option is called its holder and the person
who has sold the option is called its writer.
(iv) As explained in (ii) above, the holder of the option has the right but no
obligation whereas the writer of the option does have a potential obligation.
Thus in case of a call option, the writer is duty bound to sell the stock should
the holder choose to exercise the call option. Similarly in case of a put option,
the writer is duty bound to purchase the stock should the option be exercised
by the holder.
What we have described above are basic characteristics of a European call/European
put option. Here the option can be exercised at maturity only and not any time
before that. Thus for a single period model, maturity has to be t = 1. In contrast,
if the option can be exercised at any time before or at maturity, then it is called
an American option. Thus for a single period model, an American option can be
exercised at t = 1 or at t = 0. Unless we mention it specifically, for us an option
shall always mean a European option.
Since an option confers on its holder a right without any obligation, the holder
needs to pay some amount at t = 0 to have this right. This amount, which is paid
at t = 0 when the option is bought, is called the premium or price of the option.
Thus we need to determine the prices of European calls, American calls, European
puts, American puts, etc.
Remark 1.4.1 At first sight, a call option closely resembles a long forward
contract because both involve buying stock at a future date for a price fixed in
advance. But the holder of a long forward contract is committed to buying the asset
for the fixed price whereas the holder of a call option has the right but no
obligation to do so. For this reason there is no premium in the case of a long
forward contract but there is a premium in the case of an option.
Remark 1.4.2 In the case of derivatives, e.g. forward contracts, options etc., two
parties get together and set a rule by which one of the two parties will
receive a payment from the other depending upon the value of some financial
variables. Also, the profit of one party is the loss of the other party. This is a
typical situation of a two person zero sum game. Therefore one can visualize many
derivative pricing and other finance related problems as game theoretic problems
and use the rich theory available there.
Example 1.4.2 Let B(0) = 100, B(1) = 110, S(0) = 100 and
\[ S(1) = \begin{cases} 120, & \text{with probability } 0.8 \\ 80, & \text{with probability } 0.2. \end{cases} \]
Consider a European call option C on the given stock for which the strike price is
K = 100 and the expiration date is 1 year. Determine C(0), i.e. the call price of
this option.
Solution Here S(1), the stock price at t = 1, takes two values. Denoting these by
u S(0) and d S(0), we obtain u = 1.2 and d = 0.8. Such a stock dynamics is called
a single period binomial model. Here
\[ S(1) = \begin{cases} u\,S(0), & \text{with probability } p \\ d\,S(0), & \text{with probability } (1 - p). \end{cases} \]
To price the call we first compute its pay-off C(1) at t = 1 and then construct a
portfolio P : (x, y) of stock and bond such that $V_P(1)$ equals C(1). Then the law
of one price tells us that the required price of the call has to be the value of
the portfolio at t = 0, i.e. $C(0) = V_P(0)$.
To compute the pay-off C(1) we argue that the holder will exercise the call
option only when the stock price S(1) is more than K and receive the amount
S(1) − K. If S(1) ≤ K, then it is not beneficial for the holder and therefore he/she
will use his/her right not to exercise the option. Thus
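\[ C(1) = \begin{cases} S(1) - K, & \text{if } S(1) > K \\ 0, & \text{if } S(1) \le K, \end{cases} \]
i.e.
\[ C(1) = \max\left(S(1) - K,\, 0\right) = (S(1) - K)^{+}. \]
Here for $x \in \mathbb{R}$, $x^{+}$ denotes max(x, 0).
For our problem
\[ C(1) = \begin{cases} 20, & \text{with probability } 0.8 \\ 0, & \text{with probability } 0.2. \end{cases} \]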
Next we need to determine x and y such that for the portfolio P : (x, y),
$V_P(1) = C(1)$. This gives
\[ x\,S(1) + y\,B(1) = C(1), \]
i.e.
\[ 120x + 110y = 20, \qquad 80x + 110y = 0. \]
Solving the above two equations we get x = 1/2 and y = −4/11. Hence by the law of
one price we obtain
\[ C(0) = x\,S(0) + y\,B(0) = \tfrac{1}{2}(100) - \tfrac{4}{11}(100) \approx 13.64. \]
Thus the buyer (holder) of the call option has to pay a premium of Rs 13.64
to the seller (writer) of the option.
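The replication step of Example 1.4.2 amounts to solving two linear equations in x and y. A minimal sketch follows; the helper `replicate` and the variable names are our own assumptions and not notation from the text.

```python
def replicate(payoff_up, payoff_down, s_up, s_down, b1):
    """Solve x*S(1) + y*B(1) = payoff in both states for (x, y)."""
    # Subtracting the two equations eliminates y.
    x = (payoff_up - payoff_down) / (s_up - s_down)
    # Substitute x back into the up-state equation.
    y = (payoff_up - x * s_up) / b1
    return x, y

# Data of Example 1.4.2
s0, b0, b1, strike = 100, 100, 110, 100
s_up, s_down = 120, 80
call_up = max(s_up - strike, 0)      # pay-off (S(1) - K)^+ in the up state
call_down = max(s_down - strike, 0)  # pay-off in the down state

x, y = replicate(call_up, call_down, s_up, s_down, b1)
call_price = x * s0 + y * b0         # law of one price: C(0) = V_P(0)
print(x, round(y, 4), round(call_price, 2))
```

Note that no probabilities appear anywhere in the computation; only the two possible values of S(1) are used.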
Here it must be noted that the above arguments are not valid if the inequality
d < (1 + r) < u is not true. This is because if this inequality is not true, then
the no arbitrage principle does not hold, and hence the law of one price is not
valid.
Remark 1.4.3 At the time of writing the option, S(1) is unknown. Therefore the
first problem of option pricing is to find how much the holder should pay the writer
for an asset (stock) worth $(S(1) - K)^{+}$ at time t = 1. From the writer's point
of view it is imperative to use the premium C(0) to construct a portfolio (x, y)
that generates an amount $(S(1) - K)^{+}$ at t = 1. This is the problem of hedging
the option and the portfolio (x, y) is called the replicating portfolio because in
terms of pay-off at t = 1 it replicates the given option. Thus while pricing a
derivative, two problems are being solved simultaneously, namely the pricing
problem for the holder and the hedging problem for the writer. Here again a game
theoretic interpretation could be given so as to interpret these two problems as a
primal-dual pair.
Remark 1.4.4 On first reading, an option may look superfluous in a market because
we can replicate it in terms of the two basic securities, stock and bond. This is
certainly true for a single period model. But for multi period and continuous time
models, this is not that simple because it involves re-balancing of the replicating
portfolio at every time instant. Therefore options and other derivatives are treated
as separate and independent financial instruments and not as something which can
be expressed in terms of stock and bond.
Example 1.4.3 Let B(0), S(0), B(1), S(1), K and T be the same as in Example 1.4.2.
Let the option P be a put option. Determine P(0).
Solution It is simple to argue that for the case of a put option the pay-off at time
t = 1 is $\max(K - S(1),\, 0) = (K - S(1))^{+}$.
Thus
\[ P(1) = \begin{cases} 0, & \text{with probability } 0.80 \\ 20, & \text{with probability } 0.20. \end{cases} \]
This gives
\[ 120x + 110y = 0, \qquad 80x + 110y = 20, \]
i.e. x = −1/2 and y = 6/11. Therefore,
\[ P(0) = x\,S(0) + y\,B(0) = -\tfrac{1}{2}(100) + \tfrac{6}{11}(100) \approx 4.55. \]
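The fractions in Example 1.4.3 can be verified in exact arithmetic. The sketch below uses Python's `fractions` module and assumes integer inputs; the function name `put_price` is hypothetical.

```python
from fractions import Fraction

def put_price(s0, b0, b1, s_up, s_down, strike):
    """Price a one-period European put by replication, in exact arithmetic.

    All inputs are assumed to be integers so that Fraction stays exact.
    """
    pay_up = max(strike - s_up, 0)      # (K - S(1))^+ in the up state
    pay_down = max(strike - s_down, 0)  # (K - S(1))^+ in the down state
    x = Fraction(pay_up - pay_down, s_up - s_down)  # shares of stock
    y = (pay_up - x * s_up) / b1                    # units of bond
    return x, y, x * s0 + y * b0

x, y, p0 = put_price(100, 100, 110, 120, 80, 100)
print(x, y, p0, float(p0))
```

The exact value of the put price is 50/11; the decimal figure quoted in the solution is this fraction rounded to two places.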
Remark 1.4.5 In the above examples we have seen that while determining C(0) and
P(0) we never used the actual probabilities p = 0.8 and (1 − p) = 0.2. It is
probably not possible to explain the reason at this stage because it has to be
explained in terms of the risk neutral probability measure (RNPM), which will be
discussed in Chapter 3 and also in other chapters on option pricing.
In the context of stock, we call the owner of the stock a stockholder. The profit
that the company distributes to the stockholders is called the dividend. The
dividend in general is not known in advance because it depends upon the company's
profit and its policy. The difference between the selling price and the initial
price is called the capital gain or loss.
$\left(1 + \frac{r}{m}\right)^{tm}$ is called the growth factor.
Example 1.5.1 How long will it take for a sum of Rs 800 attracting simple
interest to become Rs 830 if the rate of interest is 9% per year? Also compute the
return on this investment.
Solution From the formula for the simple interest we have
\[ 830 = 800\,(1 + 0.09\,t), \]
which gives t = 0.417 year, which is approximately 152 days. Further, the return
on this investment will be
\[ \frac{V(t) - V(0)}{V(0)} = \frac{830 - 800}{800} = 0.0375, \]
i.e. 3.75%.
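The arithmetic of Example 1.5.1 can be sketched as follows. We assume a 365 day year for the conversion to days, and the function name `simple_interest_time` is our own.

```python
def simple_interest_time(principal, target, rate):
    """Years t solving target = principal * (1 + rate * t) under simple interest."""
    return (target / principal - 1) / rate

t = simple_interest_time(800, 830, 0.09)
days = t * 365                     # assuming a 365 day year
investment_return = (830 - 800) / 800
print(round(t, 3), round(days), investment_return)
```

Note that the return of 3.75% depends only on the initial and final values, not on the time taken to reach the final value.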
Example 1.5.2 Which will deliver a higher future value after one year, a deposit
of Rs 1,000 attracting interest at 15% compounded daily, or at 15.5% compounded
semi-annually?
Solution At 15% compounded daily the deposit will grow to
\[ V(t) = 1000\left(1 + \frac{0.15}{365}\right)^{1 \times 365} \approx \text{Rs } 1161.80 \]
after one year. If the interest is compounded semi-annually at 15.5%, the value
after one year will be
\[ V(t) = 1000\left(1 + \frac{0.155}{2}\right)^{1 \times 2} \approx \text{Rs } 1161.01. \]
So interest at 15% compounded daily will give the higher future value.
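The comparison in Example 1.5.2 is easy to reproduce. The sketch below simply evaluates the periodic compounding formula for both offers; the function name `future_value` is our own choice.

```python
def future_value(principal, rate, periods_per_year, years):
    """Future value under periodic compounding: P * (1 + r/m)**(t*m)."""
    return principal * (1 + rate / periods_per_year) ** (periods_per_year * years)

daily = future_value(1000, 0.15, 365, 1)       # 15% compounded daily
semi_annual = future_value(1000, 0.155, 2, 1)  # 15.5% compounded semi-annually
print(round(daily, 2), round(semi_annual, 2))
print("daily is better" if daily > semi_annual else "semi-annual is better")
```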
We next introduce the concept of continuous compounding. Let us recall the
formula
\[ V(t) = P\left(1 + \frac{r}{m}\right)^{tm}, \]
which can be written as
\[ V(t) = P\left[\left(1 + \frac{r}{m}\right)^{m/r}\right]^{rt}. \]
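As the compounding frequency m grows, the bracketed term above tends to e, so the periodic value approaches $P e^{rt}$. The following sketch illustrates this standard limit numerically; the principal of 1000 and the rate of 15% are arbitrary illustrative choices, not data from the text.

```python
from math import exp

def periodic_value(principal, rate, periods_per_year, years):
    """Value under compounding m times a year: P * (1 + r/m)**(t*m)."""
    return principal * (1 + rate / periods_per_year) ** (periods_per_year * years)

def continuous_value(principal, rate, years):
    """Limiting value as m grows without bound: P * e**(r*t)."""
    return principal * exp(rate * years)

for m in (1, 2, 12, 365, 100_000):
    print(m, round(periodic_value(1000, 0.15, m, 1), 4))
print("continuous", round(continuous_value(1000, 0.15, 1), 4))
```

The printed values increase with m and stay below the continuous compounding value.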
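Example 1.5.3 It is given that the future value of Rs 950 subject to continuous
compounding will be Rs 1,000 after half a year. Find the interest rate.
Solution The rate r satisfies
\[ 950\,e^{0.5\,r} = 1000, \]
which gives r = 2 ln(1000/950) ≈ 0.1026, i.e. about 10.26%.
For continuous compounding at rate r, the effective rate $r_e$ satisfies
\[ e^{r} = 1 + r_e. \]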
Solution We have
\[ \left(1 + \frac{0.10}{2}\right)^{2} = 1 + r_e, \]
which gives $r_e$ = 10.25%.
Example 1.5.7 Is daily compounding at 15% preferable to semi-annual
compounding at 15.5%?
Solution Let the corresponding effective rates be $r_e$ and $r_e'$. Then
\[ 1 + r_e = \left(1 + \frac{0.15}{365}\right)^{365} \approx 1.1618, \]
and
\[ 1 + r_e' = \left(1 + \frac{0.155}{2}\right)^{2} \approx 1.1610. \]
Thus $r_e \approx 16.18\%$ and $r_e' \approx 16.10\%$. Therefore daily compounding
at 15% is preferable.
We can also express the future value in terms of the effective rate $r_e$ because
of its relationship with the growth rate. In fact we can show that
\[ V(t) = P\,(1 + r_e)^{t} \]
for all t ≥ 0. This formula holds both for periodic compounding and for
continuous compounding, but it cannot hold for the case of simple interest. This is
because for the simple interest case, V(t) is a linear function of t, whereas for
periodic/continuous compounding it is an exponential function of t.
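The effective rate and the formula $V(t) = P(1 + r_e)^t$ can be checked with a short sketch. The function `effective_rate` and the convention that `None` stands for continuous compounding are our own assumptions; the principal of 1000 and the horizon of 2.5 years are arbitrary.

```python
from math import exp, isclose

def effective_rate(rate, periods_per_year=None):
    """Effective annual rate r_e; periods_per_year=None means continuous compounding."""
    if periods_per_year is None:
        return exp(rate) - 1
    return (1 + rate / periods_per_year) ** periods_per_year - 1

# Effective rates of Example 1.5.7
print(round(effective_rate(0.15, 365), 4), round(effective_rate(0.155, 2), 4))

# V(t) = P (1 + r_e)^t agrees with P (1 + r/m)^(t m), here for t = 2.5 and m = 2
principal, rate, m, t = 1000, 0.155, 2, 2.5
via_effective = principal * (1 + effective_rate(rate, m)) ** t
via_periodic = principal * (1 + rate / m) ** (t * m)
print(isclose(via_effective, via_periodic))
```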
Streams of Payments
An annuity is a sequence of finitely many payments of a fixed amount due at
equal time intervals. Let payments of an amount C be made once a year for
n years, the first one being due a year hence. If we assume that annual compounding
applies, then the present value of such a stream of payments is given by
\[ \frac{C}{1 + r} + \frac{C}{(1 + r)^2} + \frac{C}{(1 + r)^3} + \cdots + \frac{C}{(1 + r)^n}. \]
If we denote
\[ PA(r, n) = \frac{1}{1 + r} + \frac{1}{(1 + r)^2} + \cdots + \frac{1}{(1 + r)^n}, \]
then the present value of the given stream of payments is C × PA(r, n). Here
the number PA(r, n) is called the present value factor for an annuity. Obviously
\[ PA(r, n) = \frac{1 - (1 + r)^{-n}}{r}. \]
We define a perpetuity as an infinite stream of payments of a fixed amount C
occurring at the end of each year. Therefore the present value of the perpetuity
can be computed as
\[ \lim_{n \to \infty} PA(r, n) \times C = \frac{C}{r}. \]
But
\[ PA(0.15, 5) = \frac{1 - (1 + 0.15)^{-5}}{0.15}. \]
This gives C ≈ Rs 298.32.
Example 1.5.9 Let the interest rate be 18% and suppose a person can afford to pay
Rs 10,000 at the end of each year for the next 10 years. Determine the amount which
the person can borrow.
Solution The person can borrow the amount PA(0.18, 10) × 10000 ≈ Rs 44,941.
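The present value factor PA(r, n) can be evaluated either from its closed form or by summing the discounted payments directly. The sketch below does both and checks Example 1.5.9; the function names are our own, and the horizon of 2000 years used to approximate a perpetuity is an arbitrary choice.

```python
def annuity_factor(rate, n):
    """Closed form PA(r, n) = (1 - (1 + r)**(-n)) / r."""
    return (1 - (1 + rate) ** -n) / rate

def annuity_factor_by_sum(rate, n):
    """PA(r, n) as the explicit sum of discount factors 1/(1 + r)**k, k = 1..n."""
    return sum(1 / (1 + rate) ** k for k in range(1, n + 1))

# Example 1.5.9: yearly payments of Rs 10,000 for 10 years at 18%
print(round(annuity_factor(0.18, 10) * 10000))

# The closed form and the explicit sum agree
print(abs(annuity_factor(0.15, 5) - annuity_factor_by_sum(0.15, 5)) < 1e-12)

# For large n the factor approaches the perpetuity value 1/r
print(annuity_factor(0.10, 2000))
```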
Money Market
By the money market we mean default free securities. Thus money market
consists of risk free bonds like treasury bills and notes and other financial securities
which promise the holder a sequence of guaranteed future payments. Risk free here
means default free i.e. these payments will be delivered with certainty. However in
practice the risk, even for risk free securities, cannot be completely avoided since
the market prices of such securities may fluctuate unpredictably and there may
be a default in payments. But in our discussion here we shall assume a
default free, and hence risk free, situation.
As mentioned earlier, the simplest case of a bond is a zero coupon bond. Here the
issuing institution (a government, a bank, or a company) promises to exchange
the bond for a certain amount of money F, called the face value, on a given day T,
called the maturity date. A typical zero coupon bond has a face value of Rs 100
(or some other round figure) and a maturity date of one year (or a multiple of a
year). If an investor buys a bond, he/she becomes its holder and the effect is that
he/she is lending money to the bond writer (a government, a bank, or a company).
Here it must be noted that the investor need not always be an individual; it could
also be a financial institution.
Given the interest rate, the present value of such a bond can be computed
easily. Thus if the face value of a bond is F = Rs 100 and the maturity date is
T = 1 year, then for 12% compounded annually we have
$V(0) = F(1 + r)^{-1} \approx$ Rs 89.29.
For simplicity, we shall assume that the face value F of the given bond is
Rs 1. Such bonds are called unit bonds. Typically a bond can be sold any time
prior to maturity at the prevailing market price. Let B(t, T) denote the price of
the given unit bond at time t when the maturity is T. Thus B(T, T) = 1 and
$B(t, T) = e^{-r(T - t)}$. Here we have assumed that r remains constant throughout
the period up to maturity.
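The two pricing formulas above can be sketched as follows. The function names are our own; the rate of 8% used for the unit bond is an arbitrary illustrative value, and the second function assumes continuous compounding at a constant rate r, as in the formula for B(t, T).

```python
from math import exp

def zero_coupon_value(face, rate, years=1):
    """Present value F * (1 + r)**(-T) of a zero coupon bond, annual compounding."""
    return face * (1 + rate) ** -years

def unit_bond_price(t, maturity, rate):
    """Unit bond price B(t, T) = exp(-r (T - t)) for a constant rate r."""
    return exp(-rate * (maturity - t))

print(round(zero_coupon_value(100, 0.12), 2))  # face value Rs 100, T = 1 year, 12%
print(unit_bond_price(1, 1, 0.08))             # at maturity the unit bond is worth 1
print(unit_bond_price(0, 1, 0.08))             # price today of Rs 1 due in one year
```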
But in reality the bonds are freely traded in the market and their prices are
determined by market forces. Therefore the market interest rate (depending upon
the market price of the given bond) may be different from the implied interest
rate $r = \frac{F}{V(0)} - 1$ given by the formula $V(0) = F(1 + r)^{-1}$. In fact
the implied interest rate should not be constant as it should depend on the trading
time t as well as on the maturity time T. Not only this, but the bond prices
B(t, T) are also determined by (random) market forces. Therefore the implied
interest rate should be taken as a random process r(t, T). We shall have occasion
to discuss this aspect in a later chapter.
As our bond is a unit bond we can treat B(0, T) as the discount factor and
$B(0, T)^{-1}$ as the growth factor. This observation allows us to compute the time
value of money without resorting to the corresponding interest rates.
Coupon Bonds
In zero coupon bonds, there is just a single payment at maturity. But in coupon
bonds, there is a sequence of payments. These payments consist of the face value
due at maturity, and coupons paid regularly, say annually, semi-annually or quar-
terly. The last coupon becomes due at maturity only. If the interest rate r is
assumed to be constant throughout, then it is simple to compute the price of a
coupon bond by discounting all the future payments. We explain this process with
the help of the next example.
Example 1.5.10 Find the price of a bond with face value Rs 100 and Rs 5 annual
coupons that matures in 4 years, if the continuous compounding rate is 8%.
Solution The price of the given bond is obtained as
\[ V(0) = 5e^{-0.08} + 5e^{-0.16} + 5e^{-0.24} + 105e^{-0.32} \approx \text{Rs } 89.06. \]
\[ A(t) = \frac{A(0)}{B(0, T)}\,B(t, T) = A(0)\,e^{rt}. \]
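The discounting of coupon payments asked for in Example 1.5.10 can be sketched in code. We assume annual coupons paid at the end of years 1 to T, with the face value paid together with the last coupon, and continuous compounding at a constant rate; the function name `coupon_bond_price` is hypothetical.

```python
from math import exp

def coupon_bond_price(face, coupon, years, rate):
    """Price of a bond with annual coupons under continuous compounding.

    Each coupon due at the end of year k is discounted by exp(-rate * k);
    the face value is discounted from maturity.
    """
    coupons = sum(coupon * exp(-rate * k) for k in range(1, years + 1))
    return coupons + face * exp(-rate * years)

# Data of Example 1.5.10: face value Rs 100, Rs 5 coupons, 4 years, 8%
print(round(coupon_bond_price(100, 5, 4, 0.08), 2))
```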
The investment in a bond has a finite time horizon, say T. To extend the position
in the money market beyond T, the investor can invest the amount received at
maturity into a newly issued bond at time T, maturing at T′ (T′ > T). We
explain this by the following example.
1.7 Exercises
Exercise 1.1 Let B(0) = Rs 100, B(1) = Rs 110 and S(0) = Rs 80. Also let
\[ S(1) = \begin{cases} 100, & \text{with probability } p = 0.80 \\ 60, & \text{with probability } 1 - p = 0.20. \end{cases} \]
Design a portfolio with initial wealth of Rs 100,000, split fifty-fifty between stock
and bond. Compute the expected return and the risk of the portfolio so constructed.
Exercise 1.2 Let B(0), B(1), S(0) and S(1) be as specified in Exercise 1.1. Also let
C and P respectively be a European call and a European put with K = Rs 100 and
T = 1 year.
(i) Determine C(0) and P(0).
(ii) Find the final wealth of an investment with initial capital of Rs 900 being
invested equally in the given stock, the given call and the given put.
Exercise 1.3 In the data of Exercise 1.2, let there be a minor change. Instead of
B(1) = Rs 110, let B(1) = Rs 120. Find the replicating portfolio (x, y) for the call C.
Are we justified in taking C(0) = x S(0) + y B(0)? Give appropriate mathematical
reason for your answer. Will your answer change if instead of call C we take the
put P ?
Exercise 1.4 Spot an arbitrage opportunity (if it exists) in the following situation.
Suppose that a dealer A offers to buy British pounds a year from now at a
rate of Rs 79 a pound, while dealer B would sell British pounds immediately at a
rate of Rs 80 a pound. Assume that rupees can be borrowed at an annual rate of
4% and British pounds can be invested in a bank at 6% annual interest.
Exercise 1.5 Let B(0) = Rs 100, B(1) = Rs 112, S(0) = Rs 34 and T = 1. Find
the forward price F. Also find an arbitrage opportunity if F is taken to be Rs 38.60.
Exercise 1.6 Assume that a person can afford to pay Rs 10,000 at the end of
each year. How much can the person borrow if the interest rate is 12% and he/she
wishes to clear the loan in 5 years?
Exercise 1.7 What will be the difference between the value after 1 year of
Rs 10,000 deposited at 10% compounded monthly and compounded continuously?
How frequently should the periodic compounding be done for the difference to be
less than Rs 1?
Exercise 1.8 Consider a loan of Rs 10,000 to be paid back in 10 equal instalments
at yearly intervals. The instalments include both the interest payable each year
calculated at 15% of the current outstanding balance and repayment of a fraction
of the loan. Determine the amount of each instalment. What is the amount of
interest included in each instalment and what is the outstanding balance of the
loan after each instalment is paid?
Exercise 1.9 Find the return on a 75-day investment in zero-coupon bonds if
B(0, 1) = Rs 0.89.
Exercise 1.10 After how many days will a bond purchased for B(0, 1) = Rs 0.92
produce a 5% return?
2
Forward and Futures Contracts
2.1 Introduction
A forward contract is probably the simplest of all derivative securities. It has a
simple pricing mechanism and has wide applications, particularly in commodity
and foreign exchange markets. We already initiated some discussion on forward
contracts in Chapter 1, where we also explained the pricing methodology
through a simple example.
Though futures contracts are very much in the spirit of forward contracts, they
have specially been designed to standardize these contracts so as to eliminate the
risk of default by the party suffering the loss. For this, a process called marking to
market is conceptualized which requires an individual to open a margin account
which is managed by an organized clearing house/exchange.
This chapter continues our earlier discussion on forward contracts and explains
various concepts related to the working of futures contracts. We also describe
very briefly another derivative security called a swap, which is again very popular
in commodity and foreign exchange markets.
when the agreement for the forward contract is made. Some typical examples of
forward contract could be a farmer wishing to fix the sale price of his/her crops in
advance, an importer arranging to buy foreign currency at a fixed rate in future, a
fund manager wishing to sell stock for a fixed price in future or a country wishing
to import commodities (like wheat, sugar, oil etc.) from another country at a fixed
rate at some specified future date. In these situations, forward contracts become
very handy because they provide an opportunity to hedge against the unknown
future price of the underlying risky asset.
Let t = 0 denote the time when the two parties enter into the specific forward
contract agreement and t = T be the delivery date. Let the agreed forward price be
F(0, T). Here the two arguments in F denote the time t = 0 and t = T respectively.
In case the context is clear and there is no ambiguity, we write F(0, T) simply as
F.
Let us now analyze the two scenarios, namely F(0, T) < S(T) and F(0, T) > S(T).
Let us first consider the case F(0, T) < S(T). In this case the party having the long
forward contract will benefit because it can buy the asset for F(0, T) and sell the
same for the market price S(T), making a profit of S(T) − F(0, T). But the other
party which has taken a short forward position will suffer loss of S(T) − F(0, T)
because it will have to sell below the market price. Fig. 2.1 clearly exhibits these
two positions.
[Fig. 2.1: Pay-off at delivery plotted against the price S(T). One panel shows the long forward position, with pay-off S(T) − F(0, T); the other shows the short forward position, with pay-off F(0, T) − S(T).]
If F(0, T) > S(T) then these arguments will simply get reversed. In this case the
party taking a short forward position will benefit and make a profit of F(0, T) − S(T)
whereas the other party (which has taken a long forward position) will make a
loss of F(0, T) − S(T), and an analogue of Fig. 2.1 can be drawn for this case as well.
Therefore in either case the pay-off of a long forward contract at delivery is
S(T) − F(0, T) and that of a short forward contract is F(0, T) − S(T). Sometimes
the contract may be initiated at time t < T rather than t = 0. In this scenario the
pay-offs of a long forward position and a short forward position are S(T) − F(t, T)
and F(t, T) − S(T) respectively where F(t, T) denotes the forward price.
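The two pay-offs described above can be tabulated numerically. The following is a minimal Python sketch; the function names and the sample prices are illustrative assumptions and do not come from the text.

```python
def long_forward_payoff(spot_at_delivery, forward_price):
    """Pay-off S(T) - F(t, T) of a long forward position at delivery."""
    return spot_at_delivery - forward_price


def short_forward_payoff(spot_at_delivery, forward_price):
    """Pay-off F(t, T) - S(T) of a short forward position at delivery."""
    return forward_price - spot_at_delivery


if __name__ == "__main__":
    forward_price = 100.0  # hypothetical agreed forward price F(0, T)
    for spot in (80.0, 100.0, 120.0):  # hypothetical delivery prices S(T)
        long_p = long_forward_payoff(spot, forward_price)
        short_p = short_forward_payoff(spot, forward_price)
        # The two pay-offs always cancel: one party's gain is the other's loss.
        print(f"S(T)={spot:6.1f}  long={long_p:+7.1f}  short={short_p:+7.1f}")
```

Whatever the delivery price, the long and short pay-offs add up to zero, which is the numerical counterpart of the two scenarios analyzed in the text.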
Now at t = T, we sell the asset for F(0, T) and pay the amount (S(0)/d(0, T)) to
clear the loan with interest and thereby close the short position. This gives the
terminal value
F(0, T) − (S(0)/d(0, T)) ,
which is strictly positive under the stated assumption. Thus there is a strictly
positive amount of risk free profit with zero net investment. This is contrary to
the no arbitrage principle.
Next suppose that F(0, T) < (S(0)/d(0, T)). In this case, at t = 0, we construct
the portfolio Q as follows
(i) short sell one unit of the underlying asset (thus if the underlying asset is stock,
then we borrow one share of stock and sell the same in the market with all
conditions and procedure as specified in short selling),
(ii) invest the proceeds (i.e. the amount S(0) received by selling one unit of the
asset) at the risk free rate up to t = T,
(iii) enter into a long forward contract with delivery as t = T and forward price
F(0, T).
This gives
VQ (0) = S(0) − S(0) = 0.
Now at t = T, we
(i) cash the risk free investment with interest, i.e. collect cash (S(0)/d(0, T)) ,
(ii) buy the underlying asset for F(0, T) using the long forward position,
(iii) close the short position on the underlying asset by returning the borrowed
unit of asset to the owner.
This gives
VQ (T) = (S(0)/d(0, T)) − F(0, T) ,
which is strictly positive. This again allows a strictly positive risk free profit
with zero net investment contradicting the no arbitrage principle. Thus F(0, T) =
S(0)/d(0, T).
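The no arbitrage argument above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names, the quoted forward price of 19500 and the use of a continuously compounded discount factor are illustrative choices, not part of the theorem.

```python
import math


def no_arbitrage_forward(spot, discount_factor):
    """Forward price F(0, T) = S(0) / d(0, T)."""
    return spot / discount_factor


def arbitrage_profit(spot, discount_factor, quoted_forward):
    """Risk-free profit at t = T from a mispriced forward, per unit of asset.

    If the quote is too high: borrow S(0), buy the asset, go short forward.
    If the quote is too low: short sell the asset, invest S(0), go long forward.
    Either way the profit is the gap between the quote and S(0)/d(0, T).
    """
    fair = no_arbitrage_forward(spot, discount_factor)
    return abs(quoted_forward - fair)


if __name__ == "__main__":
    r, T = 0.08, 0.75               # hypothetical rate and maturity
    d = math.exp(-r * T)            # discount factor, continuous compounding
    print(round(no_arbitrage_forward(18000.0, d), 2))
    print(round(arbitrage_profit(18000.0, d, 19500.0), 2))
```

The profit vanishes only when the quoted price coincides with S(0)/d(0, T), which is exactly the conclusion of the proof.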
Remark 2.3.1 For the case of constant interest rate r being compounded
continuously, we have d(0, T) = e^{−rT} and hence
\[ F(0, T) = S(0)\, e^{rT}. \tag{2.2} \]
Remark 2.3.2 If the contract is initiated at some intermediate time t, 0 < t < T,
then d(t, T) = e^{−r(T−t)} and
\[ F(t, T) = S(t)\, e^{r(T-t)}, \tag{2.3} \]
where r is the same as in Remark 2.3.1.
Remark 2.3.3 In terms of zero coupon bond prices, the forward price formula
(2.1) becomes
\[ F(0, T) = \frac{S(0)}{B(0, T)}, \tag{2.4} \]
which is more convenient to use as it does not require the assumption that the
interest rate r is constant.
It is a matter of common knowledge that holding physical assets like gold,
sugar, oil, wheat etc. entails inventory carrying costs, such as rental for storage
and insurance fees. These costs affect the theoretical forward price formula
given at (2.1). The standard forward price formula (2.1) can be generalized
in several ways to take into consideration the costs of carry and also to include
dividends. We have the following result in this regard.
Theorem 2.3.2 (Forward Price Formula with Carrying Costs) Let an asset
carry a holding cost of c(i) per unit in period i (i = 0, 1, 2, . . . , n − 1). Also let
the price of this asset at t = 0 be S(0) and short selling be allowed. Then
\[ F(0, T) = \frac{S(0)}{d(0, T)} + \sum_{i=0}^{n-1} \frac{c(i)}{d(i, n)}, \tag{2.5} \]
where the delivery date is t = T and between t = 0 and t = T there are n periods which
have appropriately been identified as per the given context.
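Formula (2.5) is easy to evaluate once the discount factors are specified. The sketch below assumes, purely for illustration, a constant per-period interest rate so that d(i, n) = (1 + rate)^{−(n−i)}; the function name and the sample figures (spot 60, cost 0.1 per period, rate 0.75% per period, five periods) are likewise assumptions.

```python
def forward_with_carrying_costs(spot, costs, period_rate):
    """Forward price per (2.5) with a constant per-period rate.

    costs[i] is the holding cost c(i) paid at the start of period i,
    for i = 0, ..., n-1. With a constant rate the discount factor is
    d(i, n) = (1 + period_rate) ** -(n - i).
    """
    n = len(costs)
    growth = 1.0 + period_rate
    carried_spot = spot * growth ** n                     # S(0) / d(0, n)
    carried_costs = sum(c * growth ** (n - i) for i, c in enumerate(costs))
    return carried_spot + carried_costs


if __name__ == "__main__":
    # Hypothetical data: five periods, the same cost in every period.
    print(forward_with_carrying_costs(60.0, [0.1] * 5, 0.0075))
```

Each cost is carried forward from the start of its own period to the delivery date, so earlier costs accumulate more interest than later ones.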
Next we consider the case when the underlying asset (e.g. stock) is dividend
paying.
Theorem 2.3.3 (Forward Price Formula with Dividend) Let an asset be
stored at zero cost and also sold short. Let the price of this asset at t = 0 be S(0)
and a dividend of Rs div be paid at time τ, 0 < τ < T. Then
\[ F(0, T) = \frac{S(0) - (\mathrm{div})\, d(0, \tau)}{d(0, T)}. \tag{2.6} \]
For the case of constant interest rate r being compounded continuously, the above
formula becomes
\[ F(0, T) = \left[ S(0) - (\mathrm{div})\, e^{-r\tau} \right] e^{rT}. \tag{2.7} \]
Sometimes the asset (stock) may pay dividends continuously at a rate of rdiv >
0. This is called continuous dividend yield. If the dividends are reinvested in the
stock, then an investment in one share held at t = 0 will become e^{(rdiv)T} shares at
t = T. This is very similar to continuous compounding and therefore e^{−(rdiv)T} shares
at t = 0 will give one share at t = T. This observation leads to Theorem 2.3.4.
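The two dividend cases can be compared numerically. The sketch below implements formula (2.7) for a single cash dividend and, as an assumption, the standard continuous-yield expression S(0) e^{(r − rdiv)T} that follows from the share-count observation above. The function names and sample numbers are hypothetical.

```python
import math


def forward_with_dividend(spot, dividend, tau, r, T):
    """Formula (2.7): one cash dividend paid at time tau, 0 < tau < T."""
    return (spot - dividend * math.exp(-r * tau)) * math.exp(r * T)


def forward_with_dividend_yield(spot, r_div, r, T):
    """Continuous-yield case: start with exp(-r_div * T) shares at t = 0.

    Reinvested dividends grow this holding to one share at t = T, so the
    cost to carry is S(0) * exp(-r_div * T), financed at rate r until T.
    """
    return spot * math.exp(-r_div * T) * math.exp(r * T)


if __name__ == "__main__":
    # Hypothetical data: Rs 45 stock, Rs 2 dividend after half a year.
    print(round(forward_with_dividend(45.0, 2.0, 0.5, 0.06, 1.0), 2))
    # Hypothetical data: 2% continuous dividend yield.
    print(round(forward_with_dividend_yield(45.0, 0.02, 0.06, 1.0), 2))
```

In both cases the dividend lowers the forward price relative to S(0) e^{rT}, because the holder of the stock receives income that the holder of the forward contract does not.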
Theorems 2.3.2, 2.3.3 and 2.3.4 are not proved here as their proofs can be
constructed analogously to the proof of Theorem 2.3.1. Interested readers may refer to
Luenberger [85] and Capinski and Zastawniak [25] in this regard.
Forward contracts in foreign currency market are very common and we wish
to derive a formula for determining the price of the same. To be specific, let us
consider the two currencies as British pound and US dollars with the latter as the
underlying. The readers may imagine a British importer of US goods requiring
US dollars after t = T. So this British importer may think of taking a forward
contract on US dollars with delivery as t = T to hedge against the fluctuating
exchange rate of British pound versus US dollars.
Let at time t, the buying and selling exchange rate be: 1 British pound = P(t)
US dollars. Also let the risk free interest rates for investments in British pounds
and US dollars be rGBP and rUSD respectively. Then the forward price is given by
\[ F(0, T) = P(0)\, e^{(r_{USD} - r_{GBP})\, T}. \tag{2.9} \]
The above formula gives the agreed exchange rate at t = T, i.e. at t = T, the
British importer will be able to buy F(0, T) US dollars for one British pound.
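A numerical sketch of this forward exchange rate follows. It assumes that formula (2.9) has the standard interest rate parity form F(0, T) = P(0) e^{(rUSD − rGBP)T} with continuously compounded rates; the function names and the sample rates are hypothetical. The two helper functions mirror two ways of ending up with US dollars at t = T.

```python
import math


def currency_forward(spot_rate, r_usd, r_gbp, T):
    """Forward exchange rate in US dollars per British pound.

    Assumes the interest-rate-parity form P(0) * exp((r_USD - r_GBP) * T),
    with both rates continuously compounded.
    """
    return spot_rate * math.exp((r_usd - r_gbp) * T)


def strategy_a(spot_rate, r_usd, T):
    """US dollars at T from investing P(0) dollars at r_USD."""
    return spot_rate * math.exp(r_usd * T)


def strategy_b(forward_rate, r_gbp, T):
    """US dollars at T from holding one pound at r_GBP and selling the
    grown amount exp(r_GBP * T) pounds forward at forward_rate."""
    return math.exp(r_gbp * T) * forward_rate


if __name__ == "__main__":
    p0, r_usd, r_gbp, T = 1.60, 0.05, 0.03, 0.5   # hypothetical market data
    fwd = currency_forward(p0, r_usd, r_gbp, T)
    # Both strategies start with P(0) dollars and must end with the same amount.
    print(strategy_a(p0, r_usd, T), strategy_b(fwd, r_gbp, T))
```

Since both strategies cost P(0) US dollars at t = 0 and carry no risk, equal terminal values are what the no arbitrage principle requires.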
To prove formula (2.9), let us consider two strategies as described below.
Strategy A
Invest P(0) US dollars at the rate of rUSD until t = T.
Strategy B
Buy one British pound for P(0) US dollars, invest it until t = T at the rate of
rGBP , and take a short position in exp(rGBP · T) British pound forward contract
with delivery as t = T and forward price as F(0, T). This gives
Solution Here S(0) = 18,000, r = 8% per year and T = 9 months = 3/4 year.
Hence by the forward price formula
\[ F = S(0)\, e^{rT} = (18000)\, e^{0.08 \times 3/4} = (18000)\, e^{0.06} = \text{Rs } 19113.06. \]
Example 2.3.2 Find the forward price of a non-dividend paying stock traded to-
day at Rs 100, with the continuously compounded interest rate of 8% per year, for
a contract expiring seven months from today.
Solution We have S(0) = 100, r = 8% per year and T = 7/12 year. This gives
\[ F(0, 7/12) = S(0)\, e^{rT} = (100)\, e^{0.08 \times 7/12} \approx \text{Rs } 104.78. \]
\[ F(0, 5) = (1.0075)^{5}\,(60) + \sum_{i=1}^{5} (1.0075)^{i}\,(0.1) = \text{Rs } 62.79. \]
Example 2.3.4 Let the price of a stock on 1st April 2010 be 10% lower than it
was on 1st January 2010. Let the risk free rate be constant at r = 6%. Find the
percentage drop of the forward price on 1st April 2010 as compared to the one on
1st January 2010 for a forward contract with delivery on 1st October 2010.
Solution It is convenient to take 1st January 2010 as t = 0. Then 1st October
2010 is 9 months later, i.e. 3/4 year. Thus T = 3/4. Also t = τ is 1st April 2010, i.e.
τ = 3/12 = 1/4. Therefore using Theorem 2.3.1 we get
Here it may be noted that on 1st April 2010 the price of the stock is 10% lower
than that on 1st January 2010, i.e. (0.9)S(0). Now
progress, the value of the forward contract initiated at t = 0 will be changing. Let
at t = τ, its value be f (τ). Is there a relationship connecting f (τ), F(0, T) and
F(τ, T)? The following theorem answers this question.
Theorem 2.4.1 (Value of a Forward Contract) Let f (τ), F(0, T) and F(τ, T)
be as explained above. Then
\[ f(\tau) = [F(\tau, T) - F(0, T)]\, d(\tau, T), \]
where d(τ, T) is the risk free discount factor over the period t = τ to t = T.
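Proof. If possible let f (τ) < [F(τ, T) − F(0, T)] d(τ, T). In this case we construct
a portfolio P at t = τ as follows
(i) borrow the amount f (τ) risk free from t = τ to t = T,
(ii) use this amount to buy, for f (τ), the long forward contract which was initiated
at t = 0,
(iii) enter into a short forward contract with delivery time t = T and forward price
F(τ, T).
Then
VP (τ) = 0.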
Next at time t = T, we
(i) close the forward contracts by collecting (or paying, depending upon the sign of
pay-offs) the amounts S(T) − F(0, T) for the long forward position and −S(T) +
F(τ, T) for the short forward position,
(ii) pay back the loan amount with interest, i.e. amount f (τ)/d(τ, T) .
Therefore the value of the portfolio at t = T is
VP (T) = F(τ, T) − F(0, T) − f (τ)/d(τ, T) ,
which is strictly positive and risk free. This violates the no arbitrage principle.
Let us now consider the second case, namely f (τ) > [F(τ, T) − F(0, T)] d(τ, T).
In this case our strategy is to construct a portfolio Q at t = τ as follows
(i) sell the forward contract which was initiated at t = 0 for the amount f (τ),
(ii) invest this amount f (τ) risk free from t = τ to t = T,
(iii) enter into a long forward contract with delivery time t = T and forward price
as F(τ, T).
Then VQ (τ) = 0 and VQ (T) is given by
\[
\begin{aligned}
V_Q(T) &= \frac{f(\tau)}{d(\tau, T)} + (S(T) - F(\tau, T)) + (F(0, T) - S(T)) \\
       &= \frac{f(\tau)}{d(\tau, T)} + F(0, T) - F(\tau, T),
\end{aligned}
\]
which is strictly positive and risk free. This is not possible due to no arbitrage
principle.
Example 2.4.1 Let at the beginning of the year, a stock be sold for Rs 45 and risk
free interest rate be 6%. Consider a forward contract on this stock with delivery
date as one year. Find its forward price. Also find its value after 9 months if it is
given that the stock price at that time turns out to be Rs 49.
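Solution From (2.2), the initial forward price F(0, 1) is given by
\[ F(0, 1) = S(0)\, e^{0.06} = 45\, e^{0.06} = \text{Rs } 47.78. \]
Also by (2.3)
\[ F(9/12, 1) = S(9/12)\, e^{(0.06)(1 - 9/12)} = 49\, e^{0.015} = \text{Rs } 49.74. \]
Hence by Theorem 2.4.1, the value of the forward contract after 9 months is
\[ f(9/12) = [F(9/12, 1) - F(0, 1)]\, e^{-(0.06)(1 - 9/12)} = (49.74 - 47.78)\, e^{-0.015} = \text{Rs } 1.93. \]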
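The computation in Example 2.4.1 can be reproduced with a short Python sketch, using the data exactly as stated in the example (a non-dividend paying stock, continuous compounding at 6%). The function names are illustrative assumptions.

```python
import math


def forward_price(spot, r, time_to_delivery):
    """F(t, T) = S(t) * exp(r * (T - t)) for a non-dividend-paying asset."""
    return spot * math.exp(r * time_to_delivery)


def forward_value(f_new, f_old, r, time_to_delivery):
    """Theorem 2.4.1: f(tau) = [F(tau, T) - F(0, T)] * d(tau, T)."""
    return (f_new - f_old) * math.exp(-r * time_to_delivery)


if __name__ == "__main__":
    r = 0.06
    f_old = forward_price(45.0, r, 1.0)        # F(0, 1)
    f_new = forward_price(49.0, r, 0.25)       # F(9/12, 1)
    print(round(f_old, 2), round(f_new, 2))
    print(round(forward_value(f_new, f_old, r, 0.25), 2))
```

The value is positive here because the new forward price exceeds the one locked in at t = 0, so the holder of the old long contract buys below the current forward level.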
(v) The futures prices satisfy the condition that f (T, T) = S(T). This condition has
to hold because the futures price for immediate delivery of goods has to be the
market price or spot price.
(vi) It costs nothing to close, open or alter a futures position at any time step
between t = 0 and t = T. This condition can be met if at each time step
n = 0, 1, 2 . . . with nt ≤ T, the value of futures position is zero. For n ≥ 1, this
value is computed after marking to market.
But what is the physical meaning of futures price f (n, T)? Suppose a forward
contract initiated at time t = 0 has forward price F(0, T), where time t = T is the
delivery date. Let the forward price for a new contract initiated at time t = 1 with
delivery date time t = T be F(1, T). Now the clearing house comes into picture.
On the second day it revises all earlier contracts to the new delivery price F(1, T),
and accordingly an investor who holds a long forward contract initiated at t = 0
receives or pays the difference of the two prices, depending upon whether the change
in price reflects a gain or a loss. So if F(1, T) > F(0, T), then an investor holding a long
forward contract receives F(1, T) − F(0, T) from the clearing house because at
delivery he/she has to pay F(1, T) rather than F(0, T). Continuing in this manner
and assuming that the investor stays until maturity, the profit/loss pay-off of a
long futures position will be [F(1, T) − F(0, T)] + [F(2, T) − F(1, T)] + . . . + [F(T, T) −
F(T − 1, T)], which equals S(T) − F(0, T) because F(T, T) = S(T).
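The telescoping sum above is easy to verify numerically. The following minimal Python sketch uses a hypothetical list of daily prices; the function name is an assumption made for illustration.

```python
def daily_settlements(futures_prices):
    """Marking-to-market cash flows for one long futures contract.

    futures_prices[k] is the price on day k; the last entry is the price
    at delivery, which equals the spot price S(T).
    """
    return [
        today - yesterday
        for yesterday, today in zip(futures_prices, futures_prices[1:])
    ]


if __name__ == "__main__":
    prices = [140, 138, 130, 140, 150]   # hypothetical daily prices
    flows = daily_settlements(prices)
    print(flows)                          # [-2, -8, 10, 10]
    # The sum telescopes to (last price) - (first price).
    print(sum(flows), prices[-1] - prices[0])
```

The intermediate prices cancel in the sum, so only the first and last prices determine the total pay-off, although the timing of the individual cash flows differs from that of a forward contract.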
The above discussion demonstrates that the pay-off of a futures contract is
the pay-off of the corresponding forward contract, except that the pay-off is paid
throughout the life of the contract rather than at maturity. However there is
no requirement that the investor of a futures contract has to stay till maturity.
He/she can come out of the contract any time by taking the opposite position in
the contract with the same maturity. Thus f (n, T), n = 0, 1, 2, . . . with nt ≤ T are
essentially the forward prices as perceived by the market.
Here it may be noted that f (n, T) are not known and are taken as random
variables dictated by the market. This is because interest rate r is rarely constant
and, in general, it is stochastic in nature.
This process of adjusting the contract is called marking to market. Here an
individual is required to open a margin account with the clearing house. This
account must have a specified amount of cash for each futures contract. In practice
it is about 5 to 10% of the value of futures contract. The margin account is
compulsory for all contract holders whether they have long or short position.
If the price of the futures contract increased that day, then the parties having long
position receive an amount which equals (change in price) × (the contract quantity)
which is deposited in their account. The short parties lose the same amount and
therefore this amount is deducted from their account. This process is termed as
marking of accounts to the market which is carried out at the end of each trading
day. This guarantees that both parties of the contract cover their obligations.
Thus each margin account value fluctuates from day to day according to change
in futures price. At the delivery date, delivery is made at the futures contract price
at that time which may be different from the futures price when the contract was
first initiated.
Margin accounts serve a dual role. They serve as accounts to collect or pay
out daily profits and also guarantee that contract holders do not default on their
obligations. If the value of a margin account drops below a pre-defined margin level
(in practice about 75% of the initial margin requirement), a margin call is issued to
the contract holder demanding additional margin. Otherwise futures position will
be closed by taking an equal and opposite position. Also any excess amount above
the initial margin can be withdrawn by the investor. This margin account is totally
managed by the futures clearing house. There are other rules/procedures/practices
for managing the margin account but these details are not presented here.
Example 2.5.1 Suppose that the initial margin is set at 10% and the mainte-
nance margin at 5% of the futures price. Suppose for n = 0, 1, 2, 3, 4, the futures
prices are 140, 138, 130, 140 and 150 respectively. Show the working of marking to
market and margin account in a tabular form.
Solution In the table given below, the two columns termed Margin 1 and
Margin 2 refer respectively to the deposits at the beginning and at the end of the day.
price increases and Rs 9 is withdrawn leaving a 10% margin. On day 4, the futures
price again goes up, which allows the investor to further withdraw Rs 9. At the
end of the day, the investor decides to close the position collecting the balance of the
deposit. The total of all payments is Rs 10 which is the same as the increase in
futures price from day 0 to day 4.
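The working of the margin account in Example 2.5.1 can be simulated as follows. This is a sketch under assumed rules, which are a simplification of real exchange practice: the opening deposit is 10% of the futures price, a balance below the 5% maintenance level is topped up to the 10% level, and any excess over the 10% level is withdrawn. The function name and the row layout are illustrative. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def margin_account(prices, initial=Fraction(10, 100), maintenance=Fraction(5, 100)):
    """Simulate the margin account of one long futures contract.

    Assumed rules (a simplification): the opening deposit is `initial`
    times the futures price; each day the price change is credited; if the
    balance falls below `maintenance` times the price it is topped up to
    the initial level, and any excess over the initial level is withdrawn.
    Returns rows (day, price, margin_1, payment, margin_2), where a
    negative payment is a deposit made by the investor.
    """
    rows = []
    balance = Fraction(0)
    for day, price in enumerate(prices):
        price = Fraction(price)
        if day == 0:
            margin_1 = Fraction(0)
        else:
            margin_1 = balance + price - Fraction(prices[day - 1])
        required = initial * price
        if day == 0 or margin_1 < maintenance * price or margin_1 > required:
            margin_2 = required
        else:
            margin_2 = margin_1
        payment = margin_1 - margin_2   # > 0 withdrawal, < 0 deposit
        rows.append((day, price, margin_1, payment, margin_2))
        balance = margin_2
    return rows


if __name__ == "__main__":
    rows = margin_account([140, 138, 130, 140, 150])
    for day, price, m1, pay, m2 in rows:
        print(day, float(price), float(m1), float(pay), float(m2))
    closing = rows[-1][4]   # balance collected when the position is closed
    total = sum(r[3] for r in rows) + closing
    print("total of all payments:", float(total))
```

Under these assumed rules the withdrawals on days 3 and 4 are Rs 9 each and the total of all payments, including the closing balance, is Rs 10, in agreement with the discussion in the solution.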
for an asset which is non dividend paying. For a dividend paying stock, the above
formula (2.11) could be modified in an obvious manner.
Earlier we have noted that the futures prices are random and dictated by the
market perception. What is important to note here is that if the market futures
prices depart significantly from the values given by formula (2.11), it indicates
that the market does not believe that the interest rate is constant. The difference
between the two values signifies the market’s perception of future interest rate
changes.
Theorem 2.6.1 Let the interest rate r be constant and compounded continuously.
Then
f (0, T) = F(0, T).
Proof. For the sake of simplicity, let us assume that in the entire life of futures
contract (i.e. from t = 0 to t = T), the marking to market is carried out at two
time steps only. Let these be denoted by t = t1 and t = t2 with 0 < t1 < t2 < T. The
general case of n time steps can be dealt with by making suitable modifications
in the arguments.
We now consider two strategies, namely strategy A and strategy B, as follows
Strategy A
At t = 0: we take a long forward position with delivery at time T and forward
price F(0, T). Also we invest the amount e^{−rT} F(0, T) risk free until time T.
At t = T: we close the risk free investment and collect the amount F(0, T). We then
use this amount to purchase one share of the asset, and sell the same at market
price S(T).
Therefore at time T our final wealth will be F(0, T) + S(T) − F(0, T), i.e. S(T).
Strategy B
At t = 0: we initiate e^{−r(T−t1)} units of long futures position. This does not involve
any cost. Also we invest the amount e^{−rT} f (0, T) risk free until time T.
At t = t1: we receive (or pay) the amount e^{−r(T−t1)} [ f (t1, T) − f (0, T)] as a result
of marking to market. We invest (or borrow, depending upon the sign) the amount
e^{−r(T−t1)} [ f (t1, T) − f (0, T)] risk free until time T. We further increase our long
futures position to e^{−r(T−t2)} units, which does not involve any cost.
At t = t2: we cash (or pay) e^{−r(T−t2)} [ f (t2, T) − f (t1, T)] as a result of marking
to market. We invest (or borrow, depending upon the sign) the amount
e^{−r(T−t2)} [ f (t2, T) − f (t1, T)] risk free until time T. We increase the long futures
position to 1, which does not involve any cost.
At t = T: we collect the risk free investments which are f (0, T) + [ f (t1, T) − f (0, T)] +
[ f (t2, T) − f (t1, T)], i.e. f (t2, T). We close the futures position, receiving (or paying)
the amount S(T) − f (t2, T).
Therefore at t = T, the final wealth of strategy B is f (t2, T) + S(T) − f (t2, T) = S(T).
Since at t = T, the final wealth of both strategies is the same, under the principle
of no arbitrage, the initial wealth needed to initiate Strategy A and Strategy B
should be the same. Thus
e^{−rT} f (0, T) = e^{−rT} F(0, T),
i.e.
f (0, T) = F(0, T).
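The bookkeeping of Strategy B can be checked numerically for any path of futures prices. The sketch below is an illustration under the theorem's assumption of a constant, continuously compounded rate; the function name, the marking times and the randomly drawn prices are all hypothetical.

```python
import math
import random


def strategy_b_wealth(f_path, times, T, r, spot_at_T):
    """Final wealth of Strategy B for marking times t_1 < ... < t_k < T.

    f_path[0] is f(0, T) and f_path[j] is the futures price f(t_j, T).
    Between t_{j-1} and t_j the position is exp(-r * (T - t_j)) contracts,
    and every settlement is invested (or borrowed) risk free until T.
    """
    wealth = f_path[0]                       # exp(-rT) f(0, T) grown to time T
    for j in range(1, len(f_path)):
        units = math.exp(-r * (T - times[j - 1]))
        settlement = units * (f_path[j] - f_path[j - 1])
        wealth += settlement * math.exp(r * (T - times[j - 1]))
    # One full contract is held over the last interval and settles at S(T).
    wealth += spot_at_T - f_path[-1]
    return wealth


if __name__ == "__main__":
    random.seed(0)
    r, T = 0.06, 0.25
    times = [0.05, 0.15]                                   # hypothetical t_1, t_2
    f_path = [random.uniform(90, 110) for _ in range(3)]   # arbitrary prices
    spot_at_T = random.uniform(90, 110)
    # The two printed numbers agree up to floating-point rounding.
    print(strategy_b_wealth(f_path, times, T, r, spot_at_T), spot_at_T)
```

The position sizes are chosen so that each settlement, once carried forward to time T, contributes exactly one price increment. This is why the final wealth equals S(T) regardless of the path.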
Example 2.6.1 Let the interest rate r be 6% which is compounded continuously.
Let S(0) = Rs 80, and S(1) be the asset price after one day. If the marking to
market of a futures contract initiated on day 0 (i.e. first day of the year) with
delivery in 3 months is zero, then find S(1).
Solution We are given that T = 3 months = 1/4 year, one day = 1/365 year,
r = 6% and S(0) = Rs 80. Hence by formula (2.11)
\[ f\left(\tfrac{1}{365}, \tfrac{1}{4}\right) - f\left(0, \tfrac{1}{4}\right) = S\left(\tfrac{1}{365}\right) e^{0.06\left((1/4) - (1/365)\right)} - S(0)\, e^{0.06 \times 1/4}. \]
From the given condition, on day 1 the marking to market is zero, i.e.
\[ f\left(\tfrac{1}{365}, \tfrac{1}{4}\right) - f\left(0, \tfrac{1}{4}\right) = 0. \]
This gives
\[ S\left(\tfrac{1}{365}\right) = 80\, e^{0.06/365}, \]
which is the same as investing Rs 80 for one day at the risk free rate of 6%.
2.7 Swaps
There are many investment situations where we desire to transform one cash flow
stream into another by appropriate market activity. For example, a company which
has sold fixed-coupon bonds and is paying fixed interest may wish to switch into
paying the floating rate instead. This can be realized by writing a floating coupon
bond and paying a fixed coupon bond with the same present value. This may be
achieved by a financial instrument (contract/derivative) called a swap. Here one party
swaps a series of fixed-level payments for a series of variable-level payments. We
can visualize a swap as a series of forward contracts, and hence price the same by
using the concept of forward pricing.
We shall have occasion to discuss interest rate swaps in a later chapter and
therefore, here, we shall introduce a commodity swap only. For this consider the
following example.
Let us consider an electric power company that has to purchase oil every month
for its power generation facility. If it purchases the oil from the spot market, the
company will experience randomly fluctuating cash flows caused by fluctuating
spot prices. Therefore the company may desire to swap this payment scheme for
the one that is constant. But for this, the company needs to find a counter party
willing to swap. Here the swap counter party agrees to pay to the power company
the amount ((spot price of oil) × (a fixed number of barrels)), and in return the
power company pays to the counter party a fixed price per barrel for the same
number of barrels over the life of the swap. In this way, the variable cash flow
stream is transformed to a fixed cash flow stream which is depicted in Fig 2.2.
[Fig. 2.2: Commodity swap. The swap buyer (power company) makes fixed payments to the swap seller (counter party) and receives variable payments in return.]
\[ V = \sum_{i=1}^{M} d(0, i)\, (F(i) - X)\, N. \]
Hence the value of the swap can be determined by the series of forward prices.
We choose X to make the value V as zero, so that the swap represents an equal
exchange.
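Setting V = 0 in the expression above gives X as a discount-weighted average of the forward prices. The following Python sketch illustrates this; the function names and the sample discount factors and forward prices are hypothetical.

```python
def swap_value(discount_factors, forward_prices, fixed_price, units):
    """V = sum_i d(0, i) * (F(i) - X) * N over the payment dates i = 1..M."""
    return sum(
        d * (f - fixed_price) * units
        for d, f in zip(discount_factors, forward_prices)
    )


def fair_fixed_price(discount_factors, forward_prices):
    """The X that makes V = 0: a discount-weighted average of forward prices."""
    weighted = sum(d * f for d, f in zip(discount_factors, forward_prices))
    return weighted / sum(discount_factors)


if __name__ == "__main__":
    d = [0.99, 0.98, 0.97]          # hypothetical discount factors d(0, i)
    F = [70.0, 72.0, 75.0]          # hypothetical forward prices F(i)
    X = fair_fixed_price(d, F)
    print(X, swap_value(d, F, X, 1000))   # value is zero up to rounding
```

Because X is a weighted average, it always lies between the smallest and the largest forward price in the series.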
2.10 Exercises
Exercise 2.1 A share of INFOSYS stock can be purchased at Rs 2,500 today or
at Rs 2,850 six months from now. Which of these prices is the spot price, and
which is the forward price?
Exercise 2.2 An investor enters a futures contract on SBI stock at Rs 534 today.
Tomorrow the futures price is Rs 535. How much goes into or out of the investor's
margin account? What will be the answer if tomorrow's price is equal to Rs 532?
Exercise 2.3 Let B(0) = 100, B(1) = 112, S(0) = 34 and T = 1. Can the forward
price F of the stock be Rs 38.60? Justify your answer mathematically.
Exercise 2.4 Explain the difference between entering into a long forward contract
with the forward price of Rs 50 and buying a call option with strike price of Rs 50.
You may assume that the other parameters, namely S(0), S(1), r and T remain
same.
Exercise 2.5 At the beginning of April one year, the silver forward prices (in Rs/kg)
were as follows
April 60,650
July 61,664
Sept 62,348
Dec 63,384
Assume that contracts settle at the end of the given month. The carrying cost of
silver is Rs 2,000 per kg per year, paid at the beginning of each month. Estimate
the interest rate at that time.
Exercise 2.6 Suppose that interest rate r is constant. Given S(0), find the price
S(1) of the stock after one day such that the marking to market of futures with
delivery in 3 months is zero on that day.
Exercise 2.7 The current price of gold is Rs 25,000 per 10 gm. The storage cost
is Rs 200 per gm per year, payable quarterly in advance. Assuming a constant
interest rate of 9% compounded quarterly, find the theoretical forward price of
gold for delivery in 9 months.
Exercise 2.8 The April 14, 2012 edition of the Economic Times gives the fol-
lowing listing for the price of a USD
• today: Rs 46.50
• 90 days forward: Rs 47.60
In other words, one can purchase 1 USD today at the price of Rs 46.50. In addition,
one can sign a contract to purchase 1 USD in 90 days at a price to be paid on
delivery of Rs 47.60. Let rINR and rUSD be the nominal yearly interest rates,
compounded continuously. Obtain the value of (rINR − rUSD ).
Exercise 2.9 Suppose that the value of a stock exchange index (say SENSEX) is
16,500 and the futures price for delivery in 9 months is 17,100 index points. If the
interest rate r is 8%, find the dividend yield rdiv .
Exercise 2.10 The difference between the spot and futures prices is called the basis
b(t, T). Show that as t → T, b(t, T) → 0. Derive an explicit formula for b(t, T) in
a market with constant interest rate r.
3
Basic Theory of Option Pricing-I
3.1 Introduction
We have already seen some examples of European options in Chapter 1. Though
these examples were specific to single step discrete time scenario, they were general
enough to guide the basic principle of pricing methodology. There the strategy had
been to replicate the given option in terms of stock and bond so that at the end of
expiry, the pay-off of the option matches with the value of the replicating portfolio.
Then, by no arbitrage principle, the initial value of the portfolio became the price
of the given option.
The aim of this chapter is to formalize the above pricing methodology and
introduce single and multi-period binomial lattice models for pricing of European
and American call/put options. This discussion is continued in Chapter 4 as well,
where first the Cox-Ross-Rubinstein (C.R.R.) model is presented, and then the cel-
ebrated Black-Scholes formula of option pricing is introduced. The Black-Scholes
formula will be re-visited in Chapter 10 after readers have acquired the necessary
background in stochastic calculus.
Here the term ‘underlying’ has a general meaning. It could be stock, com-
modity, foreign currency, stock index or even interest rate. However, unless it is
otherwise stated, we shall always mean stock option, i.e. the underlying asset is
being taken as stock. Thus a European (stock) call option is a derivative security
whose underlying security is stock. It gives the holder the right to buy stock under
specified terms as prescribed in Definition 3.2.1. There is absolutely no obligation
for the holder to buy stock, but he/she has got the right to buy if he/she so wishes.
A European put option is exactly similar to a European call option, except that
the word ‘buy’ changes to ‘sell’. Therefore a European put option is a contract
giving the holder the right to sell the underlying asset for the strike price K at the
exercise time T.
Along with the European call/put options another term, namely American
call/put options, is also very commonly used. The basic distinction between a
European option and an American option is that an American option allows exer-
cise at any time before and including the expiry; whereas a European option can
be exercised only at the expiry. Here we may note that the words European and
American refer to two different conventions of exercise rather than having any
geographical significance. Thus the words European and American have become
two standards in option market, referring to two different structures, no matter
where they are issued.
The above discussion suggests that an option can be described by describing
its four basic features. These are as listed below.
(i) The description of the underlying asset.
(ii) The nature of the option - whether a call or a put; a European or an American.
(iii) The strike or exercise price.
(iv) The exercise or expiry date.
Next we try to understand the meaning of the term option pricing. By definition,
the holder of an option gets the right to buy or sell the asset (depending upon
the nature of the option) but has no obligation, so some amount has to be paid
at the time of contract to get this right. This amount is termed as the premium
or price of the option, and the problem of option pricing revolves around the
methodologies to find this price in a fair way.
An option has two sides or parties. The party that grants the option is said
to write the option and the party that obtains the option is said to purchase it.
The party that purchases the option becomes its holder, and it has no risk of
loss other than the original premium paid because it has the right to exercise the
option and has no obligation attached to it. Whereas the party that writes the
option has major risk. This is because if the option, say a call, is exercised then
the writer has to arrange for the asset. If the writer does not already own the
asset then it might have to be acquired from the market at a price higher than
the agreed strike price. Similarly in the case of a put option, the writer may have
to accept the asset at a much lower price (namely the strike price) than what
is prevailing in the market. The problem of option pricing is to be fair to both
the parties and determine the fair price of the option under consideration. There
is another terminology used for the writer and the purchaser of the option. The
buyer (purchaser) of the option is said to have taken the long position and the
seller (writer) is said to have taken the short position. Thus we have terms like
long call , short call , long put and short put etc.
The next important concept to understand is the pay-off or the value of the
option at expiry. To fix our ideas, let us consider the case of a European call option
with strike price K and expiry as T. Let S(T) denote the price of the underlying
asset (stock) at the expiry. Then, the holder will not exercise the option at the
expiry if S(T) ≤ K but will certainly exercise the option if S(T) > K. Therefore the
pay-off to the holder is zero for S(T) ≤ K and S(T) − K for S(T) > K. If we now
introduce the notation x+ as
\[ x^{+} = \begin{cases} x, & x > 0 \\ 0, & \text{otherwise,} \end{cases} \]
then the pay-off of the given call option can be written as (S(T) − K)+ .
Definition 3.2.2 (Pay-off of a European Call Option) Let C be a European
call option with specifications as prescribed in Definition 3.2.1. Then
C(T, K, S) = (S(T) − K)+ = Max (S(T) − K, 0) ,
is called the value or the pay-off of the call option C. Here S(T) denotes the price
of the underlying at the exercise time T.
It is simple to define the pay-off or the value of a European put option P.
Obviously the holder of a European put option will exercise the option only when
K > S(T) and get the profit as K − S(T). Therefore the pay-off or the value of the
given European put option is defined as
P(T, K, S) = (K − S(T))+ = Max(K − S(T), 0) .
The left hand side diagram of Fig. 3.1 depicts the pay-off of a European call option
whereas the pay-off of a European put option is depicted in the right hand side
diagram of Fig. 3.1.
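[Fig. 3.1: Pay-off C(T) of a European call option (left) and pay-off P(T) of a European put option (right), each plotted against S(T) with the strike price K marked.]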
We say that a European call option is in the money, at the money or out of
money depending upon S(T) > K, S(T) = K or S(T) < K. A European put option is
in the money, at the money or out of money depending upon S(T) < K, S(T) = K
or S(T) > K.
We also define the gain of an option. The gain of an option buyer is the pay-off
modified by the premium C(0) or P(0) paid for the option. Thus for the call
option C, the gain is (S(T) − K)+ − C(0) e^{rT}. Similarly the gain of the buyer of the
put option P is (K − S(T))+ − P(0) e^{rT}.
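The pay-off, moneyness and gain defined above translate directly into code. The following is a minimal Python sketch; the function names and the sample figures are illustrative assumptions.

```python
import math


def call_payoff(spot_at_expiry, strike):
    """(S(T) - K)+ for a European call."""
    return max(spot_at_expiry - strike, 0.0)


def put_payoff(spot_at_expiry, strike):
    """(K - S(T))+ for a European put."""
    return max(strike - spot_at_expiry, 0.0)


def buyer_gain(payoff, premium, r, T):
    """Pay-off less the premium carried forward to expiry at rate r."""
    return payoff - premium * math.exp(r * T)


def call_moneyness(spot_at_expiry, strike):
    """Classify a call as in, at or out of the money."""
    if spot_at_expiry > strike:
        return "in the money"
    if spot_at_expiry == strike:
        return "at the money"
    return "out of the money"


if __name__ == "__main__":
    K, premium, r, T = 100.0, 5.0, 0.06, 0.5   # hypothetical contract data
    for s in (90.0, 100.0, 120.0):             # hypothetical values of S(T)
        gain = buyer_gain(call_payoff(s, K), premium, r, T)
        print(s, call_moneyness(s, K), call_payoff(s, K), round(gain, 2))
```

Note that an option can be in the money and still produce a negative gain, since the pay-off must first recover the premium together with the interest on it.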
We shall discuss the pay-off corresponding to an American option at a later
place. Also sometimes to make the context specific, we shall write CE or CA (re-
spectively PE or PA ) to identify whether the option is European or American.
However, if only C or P is used, it will always mean a European option.
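Though there are many results connecting $C_E$, $P_E$ and $C_A$, $P_A$, the following lemma, called put-call parity, is interesting and useful.
Lemma 3.2.1. (Put-Call Parity) Let $C_E$ and $P_E$ be respectively the prices of a European call and a European put defined over the same non-dividend paying stock with price S, and having the same strike price K and the same expiry date T. Then $C_E(0) - P_E(0) = S(0) - Ke^{-rT}$.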
Though a formal proof of the above lemma could be given, we present here only an intuitive argument. Suppose we construct a portfolio by writing and selling one put and buying one call option, both with the same strike price K and expiry date T. Now if S(T) ≥ K, then the call will pay S(T) − K and the put will be worthless. If S(T) < K, then the call will be worthless and the writer of the put will need to pay K − S(T). In either case, the value of the portfolio will be S(T) − K at expiry. But S(T) − K is also the pay-off of a long forward contract with forward price K and delivery time T. Therefore, by the no arbitrage principle, the current value of the constructed portfolio of options, $C_E(0) - P_E(0)$, should equal that of the forward contract, which is $S(0) - Ke^{-rT}$. Thus $C_E(0) - P_E(0) = S(0) - Ke^{-rT}$. Fig. 3.2 depicts this argument.
[Fig. 3.2: Call pay-off − Put pay-off = pay-off S(T) − K of the portfolio; each panel is plotted against S(T) with the strike price K marked.]
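The pay-off identity behind the intuitive argument can be checked numerically. The sketch below is illustrative only: the function names and the sample values of S(T) are assumptions, not taken from the text. It confirms that a long call combined with a written put pays S(T) − K whether the stock finishes above, at, or below the strike.

```python
def call_payoff(s_T, K):
    """Pay-off of a long European call at expiry."""
    return max(s_T - K, 0.0)


def put_payoff(s_T, K):
    """Pay-off of a long European put at expiry."""
    return max(K - s_T, 0.0)


def long_call_short_put(s_T, K):
    """Pay-off at expiry of buying one call and writing one put, same K and T."""
    return call_payoff(s_T, K) - put_payoff(s_T, K)


# Hypothetical terminal prices below, at and above the strike K = 100.
K = 100.0
sample_prices = [60.0, 100.0, 135.5]
portfolio_payoffs = [long_call_short_put(s, K) for s in sample_prices]
forward_payoffs = [s - K for s in sample_prices]
```

Both lists coincide, which is exactly the statement that the option portfolio and the forward contract have the same value at expiry.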
It is also quite obvious that CE (0) ≤ CA (0) and PE (0) ≤ PA (0). This is because
an American option gives us more freedom to exercise the right and therefore we
have to pay more premium in comparison to a European option. Here of course,
it is being assumed that both American and European options have the same
parameters, T, K and S.
The following lemma states that, for a non-dividend paying stock, the American call option will never be exercised prior to expiry. This being so, it should be equivalent to the European call option, i.e. $C_A(0) = C_E(0)$.
Lemma 3.2.2. Let CE and CA be respectively the prices of a European call and
an American call defined over the same stock with price S. Let both these calls
have the same strike price and the same expiry date. Further, let the stock be
non-dividend paying. Then CA (0) = CE (0).
Again, rather than giving a formal proof, let us give an intuitive argument for the above lemma. We have already seen that $C_A(0) \geq C_E(0)$ because, in comparison to a European call option, an American call option gives more rights with regard to the time of exercise. Also, by the put-call parity, $C_E(0) \geq S(0) - Ke^{-rT}$, as $P_E(0) \geq 0$. Therefore $C_A(0) \geq C_E(0) \geq S(0) - Ke^{-rT}$. But r > 0, so $Ke^{-rT} < K$, and hence this last inequality gives $C_A(0) > S(0) - K$.
But $C_A(0) > S(0) - K$ implies that the price of the given American call option is greater than the pay-off from exercising it. Therefore, at t = 0, the option should be sold rather than exercised. Here the choice of t = 0 as the reference time is totally arbitrary. Hence, taking any t < T instead of t = 0, the above arguments show that the American call option will not be exercised at time t, i.e. the American call option will not be exercised prior to expiry, and then it is equivalent to the European call option. It is important to note here that the assumption of the stock being non-dividend paying is crucial for the above assertion. The situation would be different for a dividend paying stock. Moreover, this result does not hold for an American put, i.e. $P_A(0)$ is in general different from $P_E(0)$ even when the stock is non-dividend paying (see Example 3.8.1).
Though the above argument is convincing that, for a non-dividend paying stock, $C_A(0) = C_E(0)$, we also give an independent formal proof of this equality. This proof does not make use of the result that, for a non-dividend paying stock, an American call option will never be exercised before expiry.
Lemma 3.2.3. Let the stock be non-dividend paying. Then for the same K, T and r
we have CA (0) = CE (0).
Proof. We know that $C_A(0) \geq C_E(0)$. Therefore we assume that $C_A(0) > C_E(0)$ and then arrive at a contradiction.
At time t = 0, let the investor write and sell an American call so as to get the amount $C_A(0)$. Further, let a European call be bought for the amount $C_E(0)$ and the balance $C_A(0) - C_E(0)$ be invested risk free at the interest rate r.
If the American call is exercised at time t < T, then the investor borrows a share of stock and sells it for K to the holder so as to settle his/her obligation as writer of the American call option. Further, the investor invests this amount K risk free up to time T. Now at time t = T, the investor exercises his/her European call option to buy a share of stock for K and closes the short position on stock. This will result in an arbitrage profit of $(C_A(0) - C_E(0))e^{rT} + Ke^{r(T-t)} - K > 0$. If the American option is not exercised at all, then the investor will end up with the European call and an arbitrage profit of $(C_A(0) - C_E(0))e^{rT} > 0$. Either case contradicts the no arbitrage principle. Therefore $C_A(0) = C_E(0)$.
Lemma 3.2.4. (Put-Call Parity Estimate for American Options) Let the stock be non-dividend paying. Then for the same K, T and S, we have
$$S(0) - Ke^{-rT} \geq C_A(0) - P_A(0) \geq S(0) - K.$$
Proof. We shall first prove that $S(0) - Ke^{-rT} \geq C_A(0) - P_A(0)$. For this, we assume that $C_A(0) - P_A(0) - S(0) + Ke^{-rT} > 0$ and arrive at a contradiction. At t = 0, we write and sell an American call, buy an American put, buy a share of stock, and finance the transactions in the money market.
If the holder of the American call chooses to exercise it at time t ≤ T, then we shall receive K for the share of stock and settle the money market position, ending up with the put and a positive amount
$$K + (C_A(0) - P_A(0) - S(0))e^{rt} = (Ke^{-rt} + C_A(0) - P_A(0) - S(0))e^{rt} \geq (Ke^{-rT} + C_A(0) - P_A(0) - S(0))e^{rt} > 0.$$
This argument presumes that $C_A(0) - P_A(0) - S(0) > 0$. In case $C_A(0) - P_A(0) - S(0) < 0$, the amount $S(0) + P_A(0) - C_A(0)$ is borrowed from the bank at t = 0. Then at time t, we get K because of the sold American call. But under our assumption $K > (S(0) + P_A(0) - C_A(0))e^{rt}$, and therefore we can close the position in the money market and still have a positive amount. This violates the no arbitrage principle.
Next suppose that $C_A(0) - P_A(0) - S(0) + K < 0$. In this case, at t = 0, we write and sell a put, buy a call, sell short one share of stock and finance the transactions in the money market. If the American put is exercised at t ≤ T, then we can withdraw K from the money market to buy a share of stock and close the short sale. We shall be left with the call option and a positive amount
$$(-C_A(0) + P_A(0) + S(0))e^{rt} - K > Ke^{rt} - K \geq 0.$$
If the put is not exercised at all, then we can buy a share of stock for K by exercising the call at time T and close the short position on stock. On closing the money market position, we shall also end up with a positive amount. This again contradicts the no arbitrage principle.
and
$$C_E(K_2) - P_E(K_2) = S(0) - K_2 e^{-rT}.$$
If we now subtract the above two equations, we get
Lemma 3.3.3. Let K1 < K2 and 0 ≤ α ≤ 1. Then CE (K) and PE (K) satisfy
(i) $C_E(\alpha K_1 + (1 - \alpha)K_2) \leq \alpha C_E(K_1) + (1 - \alpha)C_E(K_2)$
(ii) $P_E(\alpha K_1 + (1 - \alpha)K_2) \leq \alpha P_E(K_1) + (1 - \alpha)P_E(K_2)$.
This means that $C_E(K)$ and $P_E(K)$ are convex functions of K.
Proof. If possible, let $C_E(T_1) > C_E(T_2)$. We write and sell one option expiring at time $T_1$ and buy one with the same strike price but expiring at $T_2$, investing the balance without risk. If the written option is exercised at $T_1$, we can exercise the other option immediately to cover our liability. The balance $C_E(T_1) - C_E(T_2) > 0$ invested without risk will be our arbitrage profit.
The inequality (ii) can be proved on similar lines.
Lemma 3.3.5. Let 0 < x1 < x2 . Further let at time t = 0, S = x1 S(0) and
Ŝ = x2 S(0). Then
(i) CE (S) ≤ CE (Ŝ)
(ii) PE (S) ≥ PE (Ŝ).
Proof. We shall prove the first inequality. If possible, let $C_E(S) > C_E(\hat{S})$. We can write and sell a call on a portfolio with $x_1$ shares and buy a call on a portfolio with $x_2$ shares, both having the same strike price K and exercise time T. Also we invest the amount $C_E(S) - C_E(\hat{S})$ risk-free. As $x_1 < x_2$, we have $(x_1 S(T) - K)^+ \leq (x_2 S(T) - K)^+$. If the sold option is exercised at time T, we can exercise the other option to cover our liability. The balance $C_E(S) - C_E(\hat{S}) > 0$ invested risk-free will be our arbitrage profit.
and
On subtraction, we get
$$[C_E(\hat{S}) - C_E(S)] + [P_E(S) - P_E(\hat{S})] = \hat{S} - S,$$
which gives inequalities (i) and (ii) because both bracketed terms on the left hand side are non-negative.
Results similar to the above lemmas also hold for American options. For details, we refer to Capinski and Zastawniak [25].
Pay-off Curves of Options Combinations
Bull Spread
Consider a scenario in which the investor expects the stock price to rise and wants to speculate on that. Obviously the investor should buy a call option, say $C_{K_1}$, with strike price $K_1$ which is close to the current stock price. Here we have used the symbol $C_{K_1}$ rather than $C(K_1)$ for the sake of simplicity. The premium may be reduced by selling a call option, say $C_{K_2}$, with strike price $K_2 > K_1$. This strategy should bring good returns provided the stock price increases are moderate. The pay-off curve of the combination $C_{K_1} - C_{K_2}$ is called the bull spread, which is depicted in Fig. 3.3.
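[Fig. 3.3: Bull spread. Pay-off of $C_{K_1}$ plus pay-off of $-C_{K_2}$ equals the pay-off of $C_{K_1} - C_{K_2}$; each panel is plotted against S(T) with $K_1$ and $K_2$ marked.]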
Bear Spread
The pay-off curve of the combination $C_{K_1} - C_{K_2}$ with $K_1 > K_2$ gives rise to a bear spread. This is usually employed by an investor who is expecting a moderate decline in the stock price. The bear spread is depicted in Fig. 3.4.
[Fig. 3.4: Bear spread. Pay-off of $C_{K_1}$ plus pay-off of $-C_{K_2}$ equals the pay-off of $C_{K_1} - C_{K_2}$; each panel is plotted against S(T) with $K_2$ and $K_1$ marked.]
Butterfly Spread
Let $K_1 < K_2 < K_3$. Let $C_{K_1}$, $C_{K_2}$ and $C_{K_3}$ be call options with strike prices $K_1$, $K_2$ and $K_3$ respectively. Let these calls be defined over the same stock and have common expiry T. Then the pay-off curve of the combination $C_{K_1} - 2C_{K_2} + C_{K_3}$ gives rise to a butterfly spread. This option combination is used by an investor who feels that the stock price will generally remain unaltered, i.e. it will not change significantly. The butterfly spread is depicted in Fig. 3.5.
[Fig. 3.5: Butterfly spread. Pay-off of $C_{K_1}$ (long position), plus pay-off of $-2C_{K_2}$, plus pay-off of $C_{K_3}$, equals the pay-off of $C_{K_1} - 2C_{K_2} + C_{K_3}$; each panel is plotted against S(T) with $K_1$, $K_2$ and $K_3$ marked.]
The above discussion suggests that by forming the combinations of options,
any pay-off function can be approximated by a sequence of straight line segments.
In other words, any continuous pay-off function can be made close to pay-off of
an appropriate option combination.
Now similar to pay-off curve, we can also sketch gain curve of a given option
combination. The readers can sketch the gain curves for the above examples in an
obvious manner by utilizing the definition of gain of an option.
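The three combinations discussed above are all sums of call pay-offs, so their pay-off curves can be tabulated with a few lines of code. The following Python sketch uses illustrative function names and strike values that are not from the text.

```python
def call_payoff(s_T, K):
    """Pay-off (S(T) - K)^+ of a European call at expiry."""
    return max(s_T - K, 0.0)


def call_spread(s_T, K1, K2):
    """Pay-off of the combination C_{K1} - C_{K2}.

    With K1 < K2 this is the bull spread; with K1 > K2 it is the bear spread.
    """
    return call_payoff(s_T, K1) - call_payoff(s_T, K2)


def butterfly_spread(s_T, K1, K2, K3):
    """Pay-off of the combination C_{K1} - 2 C_{K2} + C_{K3}, with K1 < K2 < K3."""
    return (call_payoff(s_T, K1)
            - 2.0 * call_payoff(s_T, K2)
            + call_payoff(s_T, K3))


# Hypothetical strikes: the bull spread is capped at K2 - K1 = 20.
bull_curve = [call_spread(s, 100.0, 120.0) for s in (90.0, 110.0, 150.0)]
```

When $K_2$ is the midpoint of $K_1$ and $K_3$, the butterfly pay-off is zero outside the interval $[K_1, K_3]$ and peaks at $S(T) = K_2$, which matches an investor's view that the price will stay close to $K_2$.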
Example 3.4.1 Let the sale and purchase of options over the same stock and
having the same expiry be given by the expression
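[Figure: pay-off curves of the component options (panels labelled $-P_{100}$, $P_{120}$ and $C_{150}$) and of their combination, plotted against S(T) with the values 100, 120, 150 and 180 marked.]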
Let C be the given European call option on the stock with strike price K and expiry at the end of the period (i.e. t = 1). Our aim is to find C(0). Since at t = 1 the stock price could be either u S(0) (with probability p) or d S(0) (with probability (1 − p)), we have
$$C(1) = \begin{cases} C_u = \text{Max}(u S(0) - K, 0), & \text{with probability } p \\ C_d = \text{Max}(d S(0) - K, 0), & \text{with probability } (1 - p). \end{cases}$$
Though the value of the risk free asset (say bond) is deterministic, we can take it as a (degenerate) derivative of the stock; degenerate in the sense that the same value R is assigned at the end of each arc. Thus we have the following single period lattices for the stock, the bond and the call:
Stock: S(0) → u S(0) (with probability p) or d S(0) (with probability 1 − p)
Bond: 1 → R (with probability p) or R (with probability 1 − p)
Call: C(0) → $C_u$ = Max(u S(0) − K, 0) (with probability p) or $C_d$ = Max(d S(0) − K, 0) (with probability 1 − p)
Before we actually derive the pricing formula for the given scenario, let us try to justify the assumption u > R > d. For this we have the following lemma.
Lemma 3.5.1. If u > R > d does not hold, then the “no arbitrage principle” is violated.
Proof. We first consider the case R ≥ u > d. Now construct the portfolio P: $x = -\frac{1}{S(0)}$, $y = \frac{1}{B(0)}$, where B(0) denotes the price of the bond at t = 0. Let $V_P(0)$ and $V_P(1)$ respectively denote the value of the portfolio P at t = 0 and t = 1. Then
$$V_P(0) = x S(0) + y B(0) = -\frac{1}{S(0)} S(0) + \frac{1}{B(0)} B(0) = 0,$$
and
$$V_P(1) = x S(1) + y B(1) = \begin{cases} -\frac{1}{S(0)}\, u S(0) + \frac{1}{B(0)}\, R B(0), & \text{with probability } p \\ -\frac{1}{S(0)}\, d S(0) + \frac{1}{B(0)}\, R B(0), & \text{with probability } (1 - p) \end{cases} = \begin{cases} R - u, & \text{with probability } p \\ R - d, & \text{with probability } (1 - p). \end{cases}$$
But under the assumption R ≥ u > d, (R − u) ≥ 0 and (R − d) > 0. As 0 < p < 1, $V_P(1) \geq 0$ with probability 1 and it takes a positive value with positive probability even though $V_P(0) = 0$. This clearly violates the “no arbitrage principle”.
We next consider the case u > d ≥ R, and construct the portfolio Q: $x = \frac{1}{S(0)}$, $y = -\frac{1}{B(0)}$, and note that $V_Q(0) = 0$. Further $V_Q(1) \geq 0$ with probability 1 and it takes the positive value (u − R) with positive probability, which again violates the “no arbitrage principle”. Therefore, to avoid the arbitrage opportunity, we need to assume that u > R > d.
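The two zero-cost portfolios used in the proof can be evaluated numerically. The sketch below is illustrative: the function name and the sample parameters are assumptions, and it presumes u > d throughout. It returns the value at t = 0 and the two possible values at t = 1, or `None` when u > R > d holds and neither portfolio from the proof applies.

```python
def arbitrage_portfolio_values(S0, B0, u, d, R):
    """Values (V(0), V(1) after up-move, V(1) after down-move) of the
    zero-cost portfolio from the proof, or None if u > R > d."""
    if R >= u > d:
        x, y = -1.0 / S0, 1.0 / B0   # portfolio P: short stock, long bond
    elif u > d >= R:
        x, y = 1.0 / S0, -1.0 / B0   # portfolio Q: long stock, short bond
    else:
        return None                  # u > R > d: no such arbitrage here
    v0 = x * S0 + y * B0
    v_up = x * u * S0 + y * R * B0
    v_down = x * d * S0 + y * R * B0
    return v0, v_up, v_down


# Hypothetical data with R >= u > d: zero cost, pay-offs R - u and R - d.
case_p = arbitrage_portfolio_values(100.0, 100.0, 1.05, 0.9, 1.1)
# Hypothetical data with u > d >= R: zero cost, pay-offs u - R and d - R.
case_q = arbitrage_portfolio_values(100.0, 100.0, 1.3, 1.15, 1.1)
```

In both violating cases the portfolio costs nothing at t = 0 and is never negative at t = 1, which is the arbitrage described in the proof.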
Now to find the price C(0) of the call, we need to replicate it in terms of stock and bond so that at t = 1, the value of this replicating portfolio equals the pay-off of the call at t = 1. Let the replicating portfolio be RP : (x = a, y = b), where a is the number of shares of stock and b is the number of units of bond. Then we need to find a and b such that $V_{RP}(1) = C(1)$. But
$$V_{RP}(1) = a S(1) + b B(1) = \begin{cases} a\, u S(0) + b\, R B(0), & \text{with probability } p \\ a\, d S(0) + b\, R B(0), & \text{with probability } (1 - p), \end{cases}$$
and
$$C(1) = \begin{cases} C_u, & \text{with probability } p \\ C_d, & \text{with probability } (1 - p). \end{cases}$$
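The step from the two replication conditions to the pricing formula can be sketched as follows. This is a worked derivation under the definitions above, assuming only that $V_{RP}(0) = a S(0) + b B(0)$ and that the no arbitrage principle forces $C(0) = V_{RP}(0)$; the resulting expression has the same form as the risk neutral probability and the call price used in Example 3.5.1.

```latex
\begin{aligned}
a\,u\,S(0) + b\,R\,B(0) &= C_u, \qquad a\,d\,S(0) + b\,R\,B(0) = C_d \\
\Longrightarrow\quad a &= \frac{C_u - C_d}{(u-d)\,S(0)}, \qquad
b = \frac{u\,C_d - d\,C_u}{(u-d)\,R\,B(0)} \\
C(0) = a\,S(0) + b\,B(0)
 &= \frac{1}{R}\left[\frac{R-d}{u-d}\,C_u + \frac{u-R}{u-d}\,C_d\right]
  = \frac{1}{R}\left[\hat{p}\,C_u + (1-\hat{p})\,C_d\right],
 \qquad \hat{p} = \frac{R-d}{u-d}.
\end{aligned}
```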
In (3.7) we should note that $0 < \hat{p} < 1$ because u > R > d. Hence $\hat{p}$ can be considered a probability. This probability $\hat{p}$ is called the risk neutral probability. Before we discuss its interpretation and nomenclature, we observe from (3.7) that $C(0) = \frac{1}{R} E_{\hat{p}}(C(1))$, where $E_{\hat{p}}$ denotes the expectation under the risk neutral probability $\hat{p}$. The formula
$$C(0) = \frac{1}{R} E_{\hat{p}}(C(1)) \qquad (3.8)$$
is very general and is valid for any derivative security with “appropriate” modifications.
In the above derivation, there is no role of B(0). In fact we may take B(0) = 1,
giving B(1) = R. Therefore we can think of bond as cash, as Rs 1 gives Rs (1 + r) =
Rs R after one time period. In this light, formula (3.8) tells that to find the price
C(0) of the call, we should first take the expectation of the pay-off at expiry with
respect to the risk neutral probability, and then discount the same according to
the risk free rate.
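The two routes to C(0), replication and discounted risk neutral expectation, can be compared numerically. The Python sketch below is illustrative only: the function names are assumptions, the bond is normalised to B(0) = 1 by default, and the sample data (S(0) = 100, K = 100, u = 1.2, d = 0.8, R = 1.1) are chosen for illustration.

```python
def risk_neutral_probability(u, d, R):
    """p_hat = (R - d) / (u - d); requires u > R > d."""
    if not (u > R > d):
        raise ValueError("no arbitrage requires u > R > d")
    return (R - d) / (u - d)


def replicating_portfolio(S0, K, u, d, R, B0=1.0):
    """Shares a and bond units b with a*S(1) + b*B(1) = C(1) in both states."""
    Cu = max(u * S0 - K, 0.0)
    Cd = max(d * S0 - K, 0.0)
    a = (Cu - Cd) / ((u - d) * S0)
    b = (u * Cd - d * Cu) / ((u - d) * R * B0)
    return a, b


def call_price_risk_neutral(S0, K, u, d, R):
    """C(0) = (1/R) * E_{p_hat}[C(1)]."""
    p_hat = risk_neutral_probability(u, d, R)
    Cu = max(u * S0 - K, 0.0)
    Cd = max(d * S0 - K, 0.0)
    return (p_hat * Cu + (1.0 - p_hat) * Cd) / R


a, b = replicating_portfolio(100.0, 100.0, 1.2, 0.8, 1.1)
price_by_replication = a * 100.0 + b * 1.0   # a*S(0) + b*B(0) with B(0) = 1
price_by_expectation = call_price_risk_neutral(100.0, 100.0, 1.2, 0.8, 1.1)
```

Neither function takes the actual probability p as an argument, which reflects the fact that p does not enter the price.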
Here we should note that the risk neutral probability $\hat{p}$ is different from p. The actual probability p describes the (stochastic) price movement but the risk neutral probability $\hat{p}$ has a totally different interpretation. In fact the actual probability p enters nowhere in the derivation of C(0). However it gives a motivation to introduce the risk neutral probability. This is because one would invest in stock only if the expected growth rate of the stock is higher than the rate in the money market (bank/bond/cash), i.e.
$$S(0) < \frac{1}{R}\left[p\,(u S(0)) + (1 - p)\,(d S(0))\right]. \qquad (3.9)$$
But
$$\hat{p}\,(u S(0)) + (1 - \hat{p})\,(d S(0)) = \frac{R - d}{u - d}\,(u S(0)) + \frac{u - R}{u - d}\,(d S(0)),$$
which on simplification gives
$$\frac{1}{R}\left[\hat{p}\,(u S(0)) + (1 - \hat{p})\,(d S(0))\right] = S(0). \qquad (3.10)$$
Equation (3.10) tells us that under the risk neutral probability $\hat{p}$, the mean rate of growth of the stock equals the rate of growth in the money market account, which is risk free. If this is the case, then the investor must be neutral about the risk, and hence the name risk neutral probability. The formula (3.7) is therefore referred to as the risk neutral pricing formula for the European call option.
The role of risk neutral probability is very important because it provides fair
opportunity to the writer of the option to hedge his/her risk and thereby provides
a fair price of the call. The hedging problem in the present context is to find an
equivalent replicating portfolio, i.e. a portfolio of stock and bond at t = 0 such
that the value of this portfolio at t = 1 matches with the pay-off of the call. This
process of replicating the option is called the hedging problem, and the replicating
portfolio is called the hedge of the given option.
Remark 3.5.1 Though the actual probabilities p and (1 − p) have not entered
in the pricing formula (3.7), they have played an important role indirectly. Thus
the investor will like to buy a call option provided he/she feels that there is high
probability of the stock price going up at t = 1. Also the agreed strike price K
depends to some extent on these probabilities. This is because if there is a feeling
that at t = 1, the stock price will go up with high value of p then he/she may agree
for a higher value of K.
The above procedure of finding the call price is valid provided we guarantee that the risk neutral probability measure (RNPM) always exists and is unique. The following result tells us that under the no arbitrage principle an RNPM always exists. Further, if in addition the market is complete, then the RNPM is unique.
Lemma 3.5.2. A risk neutral probability measure exists if and only if the no arbitrage principle holds. Further, if the market is complete, then the RNPM is unique.
We shall partly prove this Lemma in Section 3.7 by making use of the concept
of duality in linear programming. Specifically we shall prove that RNPM exists
if and only if, no arbitrage principle holds. The discussion about the uniqueness
of RNPM will be postponed till the concept of completeness of the market is
introduced.
Example 3.5.1 Find the price of a European call option with the given data as B(0) = 100, B(1) = 110, S(0) = 100, K = 100, T = 1 and
$$S(1) = \begin{cases} 120, & \text{with probability } 0.6 \\ 80, & \text{with probability } 0.4. \end{cases}$$
Will the price change if the probabilities p and (1 − p) are taken as 0.3 and 0.7 respectively?
Solution As B(1) = 110, B(0) = 100, we get r = 10%, i.e. R = 1 + r = 1.1. Further S(0) = 100, u S(0) = 120 and d S(0) = 80 yield u = 1.2 and d = 0.8. Here the risk neutral probability $\hat{p}$ is $\frac{3}{4}$, as shown below.
$$\hat{p} = \frac{R - d}{u - d} = \frac{1.1 - 0.8}{1.2 - 0.8} = \frac{3}{4}, \quad \text{i.e.} \quad 1 - \hat{p} = 1 - \frac{3}{4} = \frac{1}{4}.$$
Also
$$C_u = \text{Max}(u S(0) - K, 0) = \text{Max}(120 - 100, 0) = 20$$
$$C_d = \text{Max}(d S(0) - K, 0) = \text{Max}(80 - 100, 0) = 0.$$
Therefore
$$C(0) = \frac{1}{R}\left[\hat{p}\, C_u + (1 - \hat{p})\, C_d\right] = \frac{1}{1.1}\left[\frac{3}{4}(20) + \frac{1}{4}(0)\right] = 13.64.$$
We exhibit the above details in the form of the following lattices:
S(t): 100 → 120 (with probability p) or 80 (with probability 1 − p)
C(t): C(0) (= ?) → $C_u$ (= 20) (with $\hat{p} = \frac{3}{4}$) or $C_d$ (= 0) (with $1 - \hat{p} = \frac{1}{4}$)
This gives C(0) = Rs 13.64. As the call price does not depend on p, it will not change if 0.6 and 0.4 are changed to 0.3 and 0.7 respectively.
To find the price of a European put option we can follow exactly the same derivation as for the European call. We need to use the pay-off of the put at expiry instead of the pay-off of the call to get the following pricing formula
$$P(0) = \frac{1}{R}\left[\hat{p}\, P_u + (1 - \hat{p})\, P_d\right], \qquad (3.11)$$
where $P_u$ = Max (K − u S(0), 0) and $P_d$ = Max (K − d S(0), 0).
Example 3.5.2 For the data given in Example 3.5.1, find the price of the corre-
sponding European put option.
Solution We have already obtained $\hat{p} = \frac{3}{4}$, $(1 - \hat{p}) = \frac{1}{4}$ and R = 1.1. Further $P_u$ = Max (K − u S(0), 0) = 0 and $P_d$ = Max (K − d S(0), 0) = 20. Therefore
$$P(0) = \frac{1}{R}\left[\hat{p}\, P_u + (1 - \hat{p})\, P_d\right] = \frac{1}{1.1}\left[\frac{3}{4} \times 0 + \frac{1}{4} \times 20\right] = 4.55.$$
P(t): P(0) → $P_u$ (= 0) (with $\hat{p}$) or $P_d$ (= 20) (with $1 - \hat{p}$)
At this stage we can also verify the put-call parity. For the given data we have already obtained C(0) = Rs 13.64 and P(0) = Rs 4.55. Hence the value of the expression C(0) − P(0) + d(0, 1)K comes out to be $13.64 - 4.55 + (1.1)^{-1} \times 100 = 100$, which is the same as the value S(0). Here $d(0, 1) = \frac{1}{R}$ is the discount factor between t = 0 and t = 1.
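The parity check above can be repeated without rounding. The following Python sketch is illustrative (the function names are assumptions, not from the text); it prices the call and the put with the same single period formula and evaluates C(0) − P(0) + K/R.

```python
def one_period_price(payoff_up, payoff_down, u, d, R):
    """Discounted risk neutral expectation of a pay-off at t = 1."""
    p_hat = (R - d) / (u - d)
    return (p_hat * payoff_up + (1.0 - p_hat) * payoff_down) / R


def parity_check(S0, K, u, d, R):
    """Return (C(0), P(0), C(0) - P(0) + K/R) in the single period model."""
    call = one_period_price(max(u * S0 - K, 0.0), max(d * S0 - K, 0.0), u, d, R)
    put = one_period_price(max(K - u * S0, 0.0), max(K - d * S0, 0.0), u, d, R)
    return call, put, call - put + K / R


call0, put0, parity_lhs = parity_check(100.0, 100.0, 1.2, 0.8, 1.1)
```

With the data of the examples, the unrounded prices are 150/11 and 50/11, and the third value equals S(0) = 100 up to floating point error.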
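The two period binomial lattice for the stock price S(t) is
t = 0: S(0)
t = 1: u S(0), d S(0)
t = 2: $u^2 S(0)$, u d S(0), $d^2 S(0)$
where each node branches into an up-move (multiplication by u) and a down-move (multiplication by d).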
Along the lines of the single period case, the lattice for the call price C(t) should be
t = 0: C(0) (= ?)
t = 1: $C_u$, $C_d$
t = 2: $C_{uu}$, $C_{ud}$, $C_{dd}$
Here $C_{uu}$, $C_{ud}$ and $C_{dd}$ are the pay-offs of the call at the expiry (i.e. at t = 2). As per our definition, it is obvious that
$$C_{uu} = \text{Max}(u^2 S(0) - K, 0), \quad C_{ud} = \text{Max}(u d S(0) - K, 0), \quad C_{dd} = \text{Max}(d^2 S(0) - K, 0).$$
But what are the values of $C_u$ and $C_d$? For this, let us look again at the binomial lattice for the price of the stock. We can visualize this bigger lattice as a combination of three single period binomial lattices of the following type:
S(0) → u S(0) or d S(0)
u S(0) → $u^2 S(0)$ or u d S(0)
d S(0) → u d S(0) or $d^2 S(0)$
Let the above lattices be referred to as L(1), L(2) and L(3) respectively. Treating lattice L(3) as a single period binomial lattice and using formula (3.7), we obtain
$$C_d = \frac{1}{R}\left[\hat{p}\, C_{ud} + (1 - \hat{p})\, C_{dd}\right], \qquad (3.12)$$
where, as before, $\hat{p}$ is the risk neutral probability given by $\hat{p} = \frac{R - d}{u - d}$. Similarly, using lattice L(2) we obtain
$$C_u = \frac{1}{R}\left[\hat{p}\, C_{uu} + (1 - \hat{p})\, C_{ud}\right]. \qquad (3.13)$$
Finally, using lattice L(1) we obtain
$$C_0 = \frac{1}{R}\left[\hat{p}\, C_u + (1 - \hat{p})\, C_d\right]. \qquad (3.14)$$
Here the same risk neutral probability p̂ has been used for all three lattices L(1),
L(2) and L(3). The reason being that for these single period lattices the up-tick
probability p, the down-tick probability 1 − p and the factors u and d do not
change - they remain same for t = 0 as well as for t = 1. Now substituting for Cd
and Cu from (3.12) and (3.13), equation (3.14) gives
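$$C_0 = \frac{1}{R^2}\left[\hat{p}^2\, C_{uu} + 2\,\hat{p}\,(1 - \hat{p})\, C_{ud} + (1 - \hat{p})^2\, C_{dd}\right]. \qquad (3.15)$$
Formula (3.15) can also be written as
$$C_0 = \frac{1}{R^2}\left[\sum_{j=0}^{2} \frac{2!}{j!\,(2 - j)!}\, \hat{p}^{\,j} (1 - \hat{p})^{2 - j} \left(u^j d^{2 - j} S(0) - K\right)^+\right], \qquad (3.16)$$
where $\left(u^j d^{2-j} S(0) - K\right)^+ = \text{Max}\left(u^j d^{2-j} S(0) - K, 0\right)$.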
Remark 3.6.1 Here, unlike the single period case, Cu and Cd are NOT the pay-
offs - Max(u S(0) − K, 0) and Max(d S(0) − K, 0) at t = 1, because the call is being
exercised at t = 2 only.
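The backward recursion (3.12) to (3.14) and the closed form (3.15) can be compared in code. The Python sketch below is illustrative: the function name is an assumption, and the sample data S(0) = 100, K = 100, u = 1.2, d = 0.8, R = 1.1 are chosen for illustration.

```python
def two_period_call(S0, K, u, d, R):
    """Return (C_u, C_d, C(0) by backward induction, C(0) by formula (3.15))."""
    p_hat = (R - d) / (u - d)
    q_hat = 1.0 - p_hat
    # Pay-offs at expiry t = 2.
    Cuu = max(u * u * S0 - K, 0.0)
    Cud = max(u * d * S0 - K, 0.0)
    Cdd = max(d * d * S0 - K, 0.0)
    # One-step discounting on the lattices L(2), L(3) and then L(1).
    Cu = (p_hat * Cuu + q_hat * Cud) / R
    Cd = (p_hat * Cud + q_hat * Cdd) / R
    c0_backward = (p_hat * Cu + q_hat * Cd) / R
    # Closed form (3.15).
    c0_closed = (p_hat ** 2 * Cuu + 2.0 * p_hat * q_hat * Cud + q_hat ** 2 * Cdd) / R ** 2
    return Cu, Cd, c0_backward, c0_closed


Cu, Cd, c0_backward, c0_closed = two_period_call(100.0, 100.0, 1.2, 0.8, 1.1)
```

For these data $C_u$ = 30, whereas Max(u S(0) − K, 0) = 20, which illustrates the point of Remark 3.6.1 that $C_u$ is not the pay-off from exercising at t = 1.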
In deriving formula (3.15), we have not made use of the concept of replicating
portfolio directly. This has been possible because we have visualized the given 2-
period binomial lattice as a combination of three 1-period binomial lattices which
have already been studied. But it will be interesting to derive the same formula
by employing the replicating portfolio strategy as well.
Though we are not giving the complete details here, we give enough of a hint to get the complete solution. In order to have replication at maturity t = 2, we start replicating backwards from the end of the lattice. For this, at t = 1, we consider the upper node of the lattice L(2) and determine the scalars $a_1$ and $b_1$ such that
$$a_1 u^2 S(0) + b_1 R = C_{uu},$$
and
$$a_1 (u d S(0)) + b_1 R = C_{ud}.$$
This system can be solved to get
$$a_1 = \frac{C_{uu} - C_{ud}}{u S(0)(u - d)},$$
and
$$b_1 = \frac{C_{uu} - a_1 (u^2 S(0))}{R}.$$
Then $C_u = a_1 (u S(0)) + b_1$ becomes the value of the replicating portfolio in the upper node at t = 1. We can similarly find the value of $C_d$. Then we need to replicate these two values in the first period. Thus we need to find $a_0$ and $b_0$ such that
$$a_0 (u S(0)) + b_0 R = C_u,$$
and
$$a_0 (d S(0)) + b_0 R = C_d.$$
The above system can be solved as before to get the values of $a_0$ and $b_0$. Now a little manipulation of the expression $C(0) = a_0 S(0) + b_0$ will give the formula (3.15).
Remark 3.6.2 It is to be noted that in this model, the number of shares in the replicating portfolio is always equal to the ratio of ∆C and ∆S, where ∆C is the change in the future values of the call and ∆S is the change in the future values of the stock. This ratio is called the delta of the call, and it is different at different nodes of the lattice. Delta is one of the Greeks to be studied later for a derivative security defined over a given underlying. Greeks have been used extensively in derivative pricing for hedging purposes.
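The deltas at the three nodes of a two period lattice can be computed as the ratio ∆C/∆S described in the remark. The Python sketch below is illustrative: the function names and the dictionary keys are assumptions, and the cash amount `cash_0` held at t = 0 is assumed to grow to `cash_0 * R` at t = 1, so that the portfolio value at t = 0 is `delta_0 * S0 + cash_0`.

```python
def delta(value_up, value_down, stock_up, stock_down):
    """Ratio of the change in option value to the change in stock value."""
    return (value_up - value_down) / (stock_up - stock_down)


def two_period_hedge(S0, K, u, d, R):
    """Deltas at each node and the cost of the replicating portfolio at t = 0."""
    p_hat = (R - d) / (u - d)
    Suu, Sud, Sdd = u * u * S0, u * d * S0, d * d * S0
    Cuu, Cud, Cdd = (max(s - K, 0.0) for s in (Suu, Sud, Sdd))
    Cu = (p_hat * Cuu + (1.0 - p_hat) * Cud) / R
    Cd = (p_hat * Cud + (1.0 - p_hat) * Cdd) / R
    a_u = delta(Cuu, Cud, Suu, Sud)        # shares held at the upper node, t = 1
    a_d = delta(Cud, Cdd, Sud, Sdd)        # shares held at the lower node, t = 1
    a_0 = delta(Cu, Cd, u * S0, d * S0)    # shares held at t = 0
    b_0 = (Cu - a_0 * u * S0) / R          # cash at t = 0 (negative means borrowing)
    return {"delta_0": a_0, "delta_u": a_u, "delta_d": a_d,
            "cash_0": b_0, "price": a_0 * S0 + b_0}


hedge = two_period_hedge(100.0, 100.0, 1.2, 0.8, 1.1)
```

With these illustrative data the delta is 0.75 at t = 0, rises to 44/48 after an up-move and drops to 0 after a down-move, so the hedge has to be rebalanced at t = 1.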
Example 3.6.1 Find the price of a European call option with the given data as
S(0) = 100, K = 100, u = 1.2, d = 0.8, r = 10% per year and time to expiry T = 2
years.
Solution We have the following lattices for the stock price and the call price:
S(t): t = 0: 100; t = 1: 120, 80; t = 2: 144, 96, 64
C(t): t = 0: C(0) (= ?); t = 1: $C_u$, $C_d$; t = 2: $C_{uu}$, $C_{ud}$, $C_{dd}$
From the given data R = 1 + r = 1.1 and $\hat{p} = \frac{R - d}{u - d} = \frac{1.1 - 0.8}{1.2 - 0.8} = \frac{3}{4}$. Also
$$C_{uu} = \text{Max}(u^2 S(0) - K, 0) = \text{Max}(144 - 100, 0) = 44$$
$$C_{ud} = \text{Max}(u d S(0) - K, 0) = \text{Max}(96 - 100, 0) = 0$$
$$C_{dd} = \text{Max}(d^2 S(0) - K, 0) = \text{Max}(64 - 100, 0) = 0.$$
Therefore
$$C_u = \frac{1}{R}\left[\hat{p}\, C_{uu} + (1 - \hat{p})\, C_{ud}\right] = \frac{1}{1.1}\left[\frac{3}{4} \times 44 + \frac{1}{4} \times 0\right] = 30$$
$$C_d = \frac{1}{1.1}\left[\frac{3}{4} \times 0 + \frac{1}{4} \times 0\right] = 0,$$
which gives
$$C(0) = \frac{1}{R}\left[\hat{p}\, C_u + (1 - \hat{p})\, C_d\right] = \frac{1}{1.1}\left[\frac{3}{4} \times 30 + \frac{1}{4} \times 0\right] = 20.45.$$
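Solution We have already obtained $\hat{p} = \frac{3}{4}$, $(1 - \hat{p}) = \frac{1}{4}$ and R = 1.1. Further
$$P_{uu} = \text{Max}(K - u^2 S(0), 0) = \text{Max}(-44, 0) = 0$$
$$P_{ud} = \text{Max}(K - u d S(0), 0) = \text{Max}(4, 0) = 4$$
$$P_{dd} = \text{Max}(K - d^2 S(0), 0) = \text{Max}(36, 0) = 36.$$
Therefore
$$P_u = \frac{1}{1.1}\left[\frac{3}{4} \times 0 + \frac{1}{4} \times 4\right] = \frac{1}{1.1}$$
$$P_d = \frac{1}{1.1}\left[\frac{3}{4} \times 4 + \frac{1}{4} \times 36\right] = \frac{12}{1.1}.$$
This gives
$$P(0) = \frac{1}{R}\left[\hat{p}\, P_u + (1 - \hat{p})\, P_d\right] = \frac{1}{1.1}\left[\frac{3}{4} \times \frac{1}{1.1} + \frac{1}{4} \times \frac{12}{1.1}\right] = 3.10.$$
The same value is obtained directly from the pay-offs at t = 2:
$$P(0) = \frac{1}{1.1^2}\left[\left(\frac{3}{4}\right)^2 \times 0 + 2 \times \frac{3}{4} \times \frac{1}{4} \times 4 + \left(\frac{1}{4}\right)^2 \times 36\right] = 3.10.$$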
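We can also use put-call parity to find P(0), once C(0) is known. Specifically
$$C(0) - P(0) + d(0, 2)K = S(0),$$
gives
$$P(0) = C(0) + d(0, 2)K - S(0).$$
In general we can take $d(0, 2) = (1 + r)^{-2}$, $(1 + 2r)^{-1}$ or $e^{-2r}$ depending upon the nature of the compounding of the interest rate r, but it has to be the same for the entire calculation. In the present lattice, where one rupee grows to R = 1 + r in each period, $d(0, 2) = R^{-2}$.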
The development of the pricing methodology for the case of multi period bino-
mial lattice model is similar to the one discussed for the two period lattice model.
The single period risk free discounting is carried out at every node of the lattice
as has been done for the two period case. Obviously we need to start from the
final period (t = N) and then proceed backward till we reach the initial time
period (t = 0). For the case of European call option, this process will result in the
following formula
$$C(0) = \frac{1}{R^N}\left[\sum_{j=0}^{N} \frac{N!}{j!\,(N - j)!}\, \hat{p}^{\,j} (1 - \hat{p})^{N - j} \left(u^j d^{N - j} S(0) - K\right)^+\right]. \qquad (3.17)$$
the same underlying (stock), they all will be random variables taking m possible
values.
As a convention, we take k = 0 for bond, k = 1 for the underlying (stock)
and (k = 2, . . . , n) for other (derivative) securities. Let r be the risk free interest
rate for the period t = 0 to t = 1. It is convenient to assume that $S_0^{(0)} = 1$ and $S_1^{(0)}(\omega_j) = R = (1 + r)$, (j = 1, 2, . . . , m).
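Definition 3.7.1 (Risk Neutral Probability Measure) A risk neutral probability measure (RNPM) is a vector $\hat{p} = (\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_m)^T$ such that
(i) $\hat{p}_j > 0$ (j = 1, 2, . . . , m),
(ii) $\sum_{j=1}^{m} \hat{p}_j = 1$,
(iii) $S_0^{(k)} = \frac{1}{R} \sum_{j=1}^{m} \hat{p}_j\, S_1^{(k)}(\omega_j)$ (k = 0, 1, . . . , n). (3.18)
If we denote by $E_{\hat{p}}\left[S_1^{(k)}\right]$ the expected value of $S_1^{(k)}$ with respect to the RNPM $\hat{p}$, then (3.18) can be written as
$$S_0^{(k)} = \frac{1}{R}\, E_{\hat{p}}\left[S_1^{(k)}\right]. \qquad (3.19)$$
Here $S_1^{(k)}$ denotes the value of the kth security at t = 1 and $S_0^{(k)}$ is the price of the same security at t = 0.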
Regarding the existence and uniqueness of RNPM we have the following two
main theorems
Theorem 3.7.1 (First Fundamental Theorem of Asset Pricing) A risk
neutral probability measure p̂ exists if and only if no arbitrage principle holds.
Theorem 3.7.2 (Second Fundamental Theorem of Asset Pricing) The
RNPM is unique if and only if the market is complete.
Thus for an arbitrage free market, there is unique RNPM p̂ if and only if the
market is complete. We have not yet discussed the meaning of market completeness
but that we postpone for the time being.
We shall now give a linear programming based proof of Theorem 3.7.1. Let us
recall the following primal-dual pair of linear programming problem
(LP) Min cT x
subject to
Ax ≥ b
x ≥ 0,
and
(LD) Max bT y
subject to
AT y ≤ c
y ≥ 0.
It is well known in duality theory that if (LP) and (LD) both are feasible then both
have optimal solutions. The below given theorem, called strict complementarity
theorem, gives some additional information as well.
Theorem 3.7.3 (Goldman-Tucker Theorem) Let (LP) and (LD) both be fea-
sible. Then they both have optimal solutions x∗ (for(LP)) and y∗ (for(LD)) satis-
fying
x∗ + (c − AT y∗ ) > 0. (3.20)
The condition (3.20) is called the strict complementarity condition. Here it may
be noted that Theorem 3.7.3 does not tell that (3.20) holds for every pair (x, y)
of optimal solution of (LP) and (LD). But rather it guarantees the existence of a
pair (x∗ , y∗ ) of optimal solution of (LP) and (LD) for which (3.20) holds. We can
refer to Goldman and Tucker [52] for the proof of Theorem 3.7.3.
We now proceed to prove Theorem 3.7.1. For this let us consider a portfolio
P : (x0 , x1 , . . . , xn ). Then the value of this portfolio at t = 0 is
$$V_P(0) = \sum_{k=0}^{n} x_k\, S_0^{(k)}.$$
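i.e.
$$\text{Min} \sum_{k=0}^{n} x_k\, S_0^{(k)}$$
subject to
$$\sum_{k=0}^{n} x_k\, S_1^{(k)}(\omega_j) \geq 0 \quad (j = 1, 2, \ldots, m). \qquad (3.21)$$
Its dual is
$$\text{Max} \sum_{j=1}^{m} 0 \cdot p_j$$
subject to
$$\sum_{j=1}^{m} S_1^{(k)}(\omega_j)\, p_j = S_0^{(k)} \quad (k = 0, 1, 2, \ldots, n), \qquad p_j \geq 0 \quad (j = 1, 2, \ldots, m). \qquad (3.22)$$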
We now write the primal-dual pair (3.21) - (3.22) in the form of the pair (LP)-
(LD) and then apply the Goldman-Tucker theorem (Theorem 3.7.3) to this pair.
This implies that there exists x optimal to (3.21) and p optimal to (3.22) such
that
$$-\sum_{k=0}^{n} S_1^{(k)}(\omega_j)\, x_k \leq 0 \quad (j = 1, 2, \ldots, m), \qquad (3.23)$$
$$\sum_{j=1}^{m} S_1^{(k)}(\omega_j)\, p_j = S_0^{(k)} \quad (k = 0, 1, 2, \ldots, n), \qquad (3.24)$$
$$p_j \geq 0 \quad (j = 1, 2, \ldots, m), \qquad (3.25)$$
$$\sum_{j=1}^{m} 0 \cdot p_j = \sum_{k=0}^{n} S_0^{(k)}\, x_k = 0, \qquad (3.26)$$
and
$$p_j + \left(0 - \left(-\sum_{k=0}^{n} S_1^{(k)}(\omega_j)\, x_k\right)\right) > 0 \quad (j = 1, 2, \ldots, m). \qquad (3.27)$$
But from (3.26), $V_P(0) = 0$ for the portfolio ($x_k$, (k = 0, 1, 2, . . . , n)). Hence by the no arbitrage principle, $V_P(1)$ should be zero with probability 1, i.e. $\sum_{k=0}^{n} S_1^{(k)}(\omega_j)\, x_k = 0$ (j = 1, 2, . . . , m). Therefore (3.27) gives $p_j > 0$ (j = 1, 2, . . . , m).
Now we recall that k = 0 refers to the bond. Taking $S_0^{(0)} = 1$ we get $S_1^{(0)}(\omega_j) = 1 + r = R$ (j = 1, 2, . . . , m). Then (3.24) gives
$$\sum_{j=1}^{m} R\, p_j = 1,$$
i.e.
$$\sum_{j=1}^{m} p_j = \frac{1}{R}. \qquad (3.28)$$
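But then, defining $\hat{p}_j = R\, p_j$ (j = 1, 2, . . . , m), this shows that $\hat{p} = (\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_j, \ldots, \hat{p}_m)$ is a risk neutral probability measure, since $\hat{p}_j > 0$, $\sum_{j=1}^{m} \hat{p}_j = 1$, and (3.24) becomes
$$S_0^{(k)} = \frac{1}{R}\, E_{\hat{p}}\left[S_1^{(k)}\right] \quad (k = 0, 1, 2, \ldots, n), \qquad (3.30)$$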
where Ep̂ denotes the expectation under RNPM p̂. This proves that if no arbitrage
principle holds then RNPM exists. The converse can be proved on similar lines.
Remark 3.7.1 The formula (3.30) is very general. Apart from the fact that it
holds for the stock and bond, it should hold for other securities (k = 2, 3, . . . , n) as
well, be it European call, put, forward contract etc. What we simply need to do is
to find the expectation of the pay-off of the given security at expiry under RNPM
and discount the same for t = 0.
Remark 3.7.2 For the case of single period binomial lattice model the linear pro-
gramming problem to find RNPM is
$$C(0) = \frac{1}{R}\left[\hat{p}_1\, C_u + \hat{p}_2\, C_d\right],$$
and
$$P(0) = \frac{1}{R}\left[\hat{p}_1\, P_u + \hat{p}_2\, P_d\right].$$
Example 3.7.1 Consider a forward contract with the given data as S(0) =
100, u = 1.2, d = 0.8, T = 1 year and r = 10% per year. Determine the for-
ward price F.
Solution We know that F = R S(0) = (1.1) × 100 = Rs 110. But here we have to determine F by utilizing the formula (3.30). For this we note that the pay-off of the forward contract is (S − F) if S > F and −(F − S) if S ≤ F, i.e. S − F in both cases. Also from the given data $\hat{p}_1 = \frac{3}{4}$, $\hat{p}_2 = \frac{1}{4}$. Therefore formula (3.30) gives
$$F(0) = \frac{1}{1.1}\left[\frac{3}{4}(120 - F) + \frac{1}{4}(80 - F)\right]. \qquad (3.31)$$
But in the case of a forward contract F(0) = 0, and then (3.31) gives F = 110.
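The calculation in this example can be mirrored in a few lines. The Python sketch below is illustrative (the function names are assumptions): the first function evaluates the right hand side of (3.31) for a trial forward price F, and the second solves F(0) = 0, which reduces to taking the risk neutral expectation of S(1).

```python
def forward_value_at_zero(F, S0, u, d, R):
    """Discounted risk neutral expectation of the forward pay-off S(1) - F."""
    p_hat = (R - d) / (u - d)
    return (p_hat * (u * S0 - F) + (1.0 - p_hat) * (d * S0 - F)) / R


def forward_price(S0, u, d, R):
    """The F for which the forward contract has zero value at t = 0."""
    p_hat = (R - d) / (u - d)
    return p_hat * u * S0 + (1.0 - p_hat) * d * S0


F = forward_price(100.0, 1.2, 0.8, 1.1)
```

By (3.10) the risk neutral expectation of S(1) equals R S(0), so the result agrees with F = R S(0) whatever the values of u and d, as long as u > R > d.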
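Let us again look at formula (3.29) and concentrate on the case k = 1. We recall that k = 1 refers to the stock and therefore if we write $S_0^{(1)} = S(0)$ and $S_1^{(1)} = S(1)$ then
$$S(0) = \frac{1}{R}\, E_{\hat{p}}(S(1)) = E_{\hat{p}}\left(\frac{S(1)}{R}\right) = E_{\hat{p}}\left(\frac{S(1)}{B(1)}\right) = E_{\hat{p}}(\tilde{S}(1)). \qquad (3.32)$$
$$\tilde{S}(0) = E_{\hat{p}}(\tilde{S}(1)) \qquad (3.33)$$
$$\tilde{S}^{(k)}(l) = E_{\hat{p}}\left(\tilde{S}^{(k)}(l + 1) \mid \tilde{S}^{(k)}(l)\right) \quad (k = 0, 1, 2, \ldots, n).$$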
Therefore the problem of asset pricing gets translated into the problem of finding a unique RNPM $\hat{p}$, or to be precise a $\hat{p}$-martingale, on the set of scenarios Ω. Moreover, $E_{\hat{p}}$ is called the risk neutral or martingale expectation with respect to $\hat{p}$. These concepts will probably become much clearer once we are familiar with the basics of stochastic processes and stochastic calculus.
Since there are no goods in the model other than money, it is convenient to pick one of the securities as reference and normalize the others with respect to it. The security so chosen for normalization purposes is called the numeraire. In our context, the usual choice of numeraire is the bond price, and that is exactly what we have done in defining $\tilde{S}(0) = S(0)/B(0)$, $\tilde{S}(1) = S(1)/B(1)$ etc.
Example 3.7.2 Consider the data: S(0) = 100, u = 1.2, d = 0.9, r = 10% per
year and T = 2 years. Determine RNPM p̂ and show that discounted stock prices
form a p̂-martingale.
Solution We have $\hat{p} = (R - d)/(u - d) = 2/3$ and $(1 - \hat{p}) = 1/3$. We have the dynamics of the stock price as

    t        t = 0     t = 1     t = 2
    S(t)                         144
                       120
             100                 108
                       90
                                 81

We can check the following
\[ \frac{1}{1.1}\left[\frac{2}{3}(144) + \frac{1}{3}(108)\right] = 120 \]
\[ \frac{1}{1.1}\left[\frac{2}{3}(108) + \frac{1}{3}(81)\right] = 90 \]
\[ \frac{1}{1.1}\left[\frac{2}{3}(120) + \frac{1}{3}(90)\right] = 100. \]
These give
\[ E_{\hat{p}}\left(\frac{S(2)}{(1.1)^2} \,\Big|\, \frac{S(1)}{1.1} = \frac{120}{1.1}\right) = \frac{120}{1.1} \]
\[ E_{\hat{p}}\left(\frac{S(2)}{(1.1)^2} \,\Big|\, \frac{S(1)}{1.1} = \frac{90}{1.1}\right) = \frac{90}{1.1} \]
etc. Thus
\[ E_{\hat{p}}\left(\tilde{S}(2) \mid \tilde{S}(1)\right) = \tilde{S}(1). \]
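The martingale check of Example 3.7.2 can also be carried out mechanically. The sketch below uses exact rational arithmetic so that the equalities hold without rounding; the helper name is hypothetical.

```python
from fractions import Fraction

# Data of Example 3.7.2: S(0) = 100, u = 1.2, d = 0.9, R = 1.1.
S0 = Fraction(100)
u, d, R = Fraction(6, 5), Fraction(9, 10), Fraction(11, 10)
p = (R - d) / (u - d)  # risk neutral probability of an up move


def discounted_expectation(s):
    """(1/R) * E_p[S(t+1) | S(t) = s] for one step of the lattice."""
    return (p * u * s + (1 - p) * d * s) / R


# Nodes at t = 0 and t = 1; each must equal the discounted expected next price.
nodes = [S0, u * S0, d * S0]
is_martingale = all(discounted_expectation(s) == s for s in nodes)
print(p, is_martingale)
```

Dividing both sides of each one-step identity by the bond price at the current node gives the martingale property of the discounted prices.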
The theorem given below is very general and is valid for any European derivative security.
Theorem 3.7.4 Let D be a European derivative security whose pay-off in the N-period binomial model is f(S(N)). Then
\[ D(0) = \frac{1}{R^N} E_{\hat{p}}\left[f(S(N))\right]. \tag{3.35} \]
The above theorem essentially tells us that the price of a European derivative security D with pay-off f(S(N)) in the N-period binomial model is the expectation of the discounted pay-off under the risk neutral probability measure.
We know that for a European call, $f(S(N)) = (S(N) - K)^+$; for a European put, $f(S(N)) = (K - S(N))^+$; and for a forward contract, f(S(N)) = (S(N) − F). Therefore the same formula (3.35) can be used to price each of these derivative securities.
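Formula (3.35) translates directly into code once the expectation is written as a sum over the number of up moves. The following is a minimal sketch, assuming constant u, d and R over the N periods; the function name and the strike K = 100 are hypothetical choices made for illustration.

```python
from math import comb


def european_price(payoff, S0, u, d, R, N):
    """Discounted risk neutral expectation of payoff(S(N)) in an N-period binomial model."""
    p = (R - d) / (u - d)
    expectation = 0.0
    for j in range(N + 1):  # j counts the up moves among the N periods
        probability = comb(N, j) * p**j * (1 - p)**(N - j)
        expectation += probability * payoff(S0 * u**j * d**(N - j))
    return expectation / R**N


# Market data of Example 3.7.2; the strike (and delivery price) K = 100 is hypothetical.
S0, u, d, R, N, K = 100.0, 1.2, 0.9, 1.1, 2, 100.0
call = european_price(lambda s: max(s - K, 0.0), S0, u, d, R, N)
put = european_price(lambda s: max(K - s, 0.0), S0, u, d, R, N)
forward = european_price(lambda s: s - K, S0, u, d, R, N)
print(call, put, forward)
```

Since the pay-offs satisfy (S − K)^+ − (K − S)^+ = S − K, the same routine reproduces put-call parity: the call price minus the put price equals the value of the forward position.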
Example 3.7.3 Use Theorem 3.7.1 to justify the pricing formula (3.15) for the
European call option.
Solution Here N = 2 and the three possible values for f(S(2)) are
\[ C_{uu} = \operatorname{Max}\left(u^2 S(0) - K, 0\right) \]
\[ C_{ud} = \operatorname{Max}\left(u d S(0) - K, 0\right) \]
\[ C_{dd} = \operatorname{Max}\left(d^2 S(0) - K, 0\right). \]
Also $\hat{p} = \dfrac{R - d}{u - d}$. Therefore if we define $\hat{p}_1 = (\hat{p})^2$, $\hat{p}_2 = 2\hat{p}(1 - \hat{p})$ and $\hat{p}_3 = (1 - \hat{p})^2$ then
(i) $\hat{p}_i > 0$ (i = 1, 2, 3)
(ii) $\sum_{i=1}^{3} \hat{p}_i = 1$ and
(iii) $E_{\hat{p}}(S(2)) = R^2 S(0)$.
The third assertion can be verified as follows
\[
\begin{aligned}
E_{\hat{p}}(S(2)) &= \hat{p}_1 u^2 S(0) + \hat{p}_2 (u d S(0)) + \hat{p}_3 d^2 S(0) \\
&= S(0)\left[u^2 (\hat{p})^2 + 2\hat{p}(1 - \hat{p}) u d + (1 - \hat{p})^2 d^2\right] \\
&= S(0)\left(u \hat{p} + (1 - \hat{p}) d\right)^2 \\
&= S(0)\left(\frac{u(R - d) + d(u - R)}{u - d}\right)^2 \\
&= S(0)\left(\frac{R(u - d)}{u - d}\right)^2 = R^2 S(0),
\end{aligned}
\]
i.e.
\[ S(0) = \frac{1}{R^2} E_{\hat{p}}(S(2)). \]
Therefore $\hat{p}_i$ (i = 1, 2, 3) as defined above is in fact RNPM. Hence
\[ C(0) = \frac{1}{R^2} E_{\hat{p}} f(S(2)) = \frac{1}{R^2}\left[(\hat{p})^2 C_{uu} + 2\hat{p}(1 - \hat{p}) C_{ud} + (1 - \hat{p})^2 C_{dd}\right], \]
as obtained by formula (3.15).
Remark 3.7.3 Apparently we have employed two distinct approaches to price a given derivative. These are the replicating portfolio approach and the RNPM approach. Various illustrative examples presented above and also the derivation of the single period binomial lattice model suggest that these two approaches are related. In fact for a complete market, the two approaches are equivalent. This is because the replicating portfolio approach uses the law of one price, which is essentially a consequence of the no arbitrage principle, and under no arbitrage the unique RNPM exists, which can be used to price any contingent claim. For the existence of RNPM, the market has to be arbitrage free. Therefore if the no arbitrage principle does not hold then we cannot price the given derivative even if the corresponding unique replicating portfolio exists. The example given below illustrates the point. We shall further discuss this aspect in Section 3.10.
Example 3.7.4 Let B(0) = 100, B(1) = 120, S(0) = 100 and S(1) take values 120 and 80 with probabilities 0.8 and 0.2 respectively. Let C be a European call with K = 100 and T = 1 year. Find the replicating portfolio (x, y) for the call C. Are we justified in taking C(0) = x S(0) + y B(0)? Determine RNPM if it exists.
Solution The call pays C(1) = 20 when S(1) = 120 and C(1) = 0 when S(1) = 80. Therefore if (x, y) is the replicating portfolio, then we have x S(1) + y B(1) = C(1). This gives the following two equations
\[ 120x + 120y = 20 \]
\[ 80x + 120y = 0, \]
having solution (x = 1/2, y = −1/3). Here we cannot take C(0) = (1/2)S(0) − (1/3)B(0) because this utilizes the law of one price, which is valid only under the no arbitrage principle. But as u = 1.2 and R = 1 + r = 1.2 we do not have the required condition u > R > d for the no arbitrage principle to hold.
To determine the RNPM we need to solve the system
\[ \hat{p}_1 + \hat{p}_2 = 1 \]
\[ \frac{1}{1.2}\left(120\hat{p}_1 + 80\hat{p}_2\right) = 100 \]
\[ \hat{p}_1 > 0, \quad \hat{p}_2 > 0. \]
The above system does not have a solution because the first two equations give $\hat{p}_1 = 1$, $\hat{p}_2 = 0$. Therefore the RNPM does not exist. This is again because u = 1.2 = R, which violates the condition u > R > d, and therefore the no arbitrage principle does not hold.
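The two linear systems of Example 3.7.4 are small enough to solve exactly. The sketch below uses Cramer's rule with rational arithmetic; the helper `solve_2x2` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 exactly by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("singular system")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Replicating portfolio: x*S(1) + y*B(1) = C(1) in the up and down states.
x, y = solve_2x2(120, 120, 20, 80, 120, 0)

# RNPM candidate: p1 + p2 = 1 and 120*p1 + 80*p2 = R*S(0) = 1.2 * 100 = 120.
p1, p2 = solve_2x2(1, 1, 1, 120, 80, 120)
rnpm_exists = p1 > 0 and p2 > 0

print(x, y, p1, p2, rnpm_exists)
```

The replicating portfolio is perfectly well defined, yet the candidate probabilities fail the strict positivity requirement, which is the numerical face of u = R.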
Here we may note that for the two period binomial lattice model, f(S(2)) will have three possible values, depending upon the three possible values of S(2). If D denotes an American put then these values are denoted by $P^A_{uu}$, $P^A_{ud}$ and $P^A_{dd}$. For the case of an American call these values are denoted by $C^A_{uu}$, $C^A_{ud}$ and $C^A_{dd}$. Since D is an American derivative and the binomial lattice is of two periods, the holder can exercise his/her option at t = 0, t = 1 or t = 2.
At t = 1, the option holder has a choice. He/She can exercise immediately or wait until t = 2. If he/she exercises his/her option at t = 1, the pay-off is f(S(1)), otherwise it is f(S(2)). But is it worth waiting until t = 2 to exercise the option? This question can be answered by computing the value of waiting. For this we may treat f(S(2)) as a one step European option to be priced at t = 1. Let, as before, $\hat{p} = \dfrac{R - d}{u - d}$ denote the one step RNPM. Then the value of the given derivative at t = 1 is given by $\dfrac{1}{R} E_{\hat{p}}(f(S(2)))$, where
\[ \frac{1}{R} E_{\hat{p}}(f(S(2))) = \frac{1}{R}\left(\hat{p}\, f(u S(1)) + (1 - \hat{p})\, f(d S(1))\right). \tag{3.36} \]
Therefore the option holder has to choose the higher of the two values f(S(1)) and $\dfrac{1}{R} E_{\hat{p}} f(S(2))$ as given at (3.36). Thus at t = 1, the given American option is worth the price $D^A(1)$, where
\[ D^A(1) = \operatorname{Max}\left\{f(S(1)),\ \frac{1}{R}\left(\hat{p}\, f(u S(1)) + (1 - \hat{p})\, f(d S(1))\right)\right\}. \]
In a similar manner we can argue to get
\[ D^A(0) = \operatorname{Max}\left\{f(S(0)),\ \frac{1}{R}\left(\hat{p}\, f_1(u S(0)) + (1 - \hat{p})\, f_1(d S(0))\right)\right\}, \tag{3.37} \]
where
\[ f_1(x) = \operatorname{Max}\left\{f(x),\ \frac{1}{R}\left(\hat{p}\, f(u x) + (1 - \hat{p})\, f(d x)\right)\right\}. \tag{3.38} \]
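The recursion behind (3.37) and (3.38) extends to any number of periods by working backward through the lattice. The sketch below is one possible implementation, not the only one; the function name is hypothetical, and the numbers S(0) = K = 80, u = 1.1, d = 0.95, R = 1.05 are those appearing in the worked steps of the American put example that follows.

```python
def american_price(payoff, S0, u, d, R, N):
    """Price an American derivative with exercise value payoff(S) on an N-period lattice."""
    p = (R - d) / (u - d)
    # Values at expiry, indexed by the number j of up moves.
    values = [payoff(S0 * u**j * d**(N - j)) for j in range(N + 1)]
    # Step backward: at each node take the larger of exercising now and waiting.
    for n in range(N - 1, -1, -1):
        values = [
            max(
                payoff(S0 * u**j * d**(n - j)),
                (p * values[j + 1] + (1 - p) * values[j]) / R,
            )
            for j in range(n + 1)
        ]
    return values[0]


K = 80.0
put_value = american_price(lambda s: max(K - s, 0.0), S0=80.0, u=1.1, d=0.95, R=1.05, N=2)
print(round(put_value, 2))
```

For N = 2 the inner maximum at n = 1 is exactly f_1 of (3.38) and the final maximum at n = 0 is (3.37).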
Remark 3.8.1 As we have assumed that the stock is non-dividend paying, an
American call is the same as a European call and CA (0) = CE (0). Therefore the
formula (3.37) is essentially for the case of an American put, though it can be
applied for an American call as well if one so wishes. For a dividend paying stock,
an American call will behave differently from a European call.
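Step 3. Compute
\[ P^b_u = \frac{1}{R}\left(\hat{p} P^f_{uu} + (1 - \hat{p}) P^f_{ud}\right) = \frac{1}{1.05}\left[\left(\frac{2}{3} \times 0\right) + \left(\frac{1}{3} \times 0\right)\right] = 0 \]
\[ P^b_d = \frac{1}{R}\left(\hat{p} P^f_{ud} + (1 - \hat{p}) P^f_{dd}\right) = \frac{1}{1.05}\left[\left(\frac{2}{3} \times 0\right) + \left(\frac{1}{3} \times 7.80\right)\right] = 2.476. \]
Step 4. Evaluate
\[ P^f_u = \operatorname{Max}(K - u S(0), 0) = \operatorname{Max}(80 - 88, 0) = 0 \]
\[ P^f_d = \operatorname{Max}(K - d S(0), 0) = \operatorname{Max}(80 - 76, 0) = 4. \]
Step 5. Evaluate
\[ P^A(1) = \begin{cases} \operatorname{Max}(P^f_u, P^b_u) = \operatorname{Max}(0, 0) = 0 = P^{Max}_u \\ \operatorname{Max}(P^f_d, P^b_d) = \operatorname{Max}(4, 2.476) = 4 = P^{Max}_d. \end{cases} \]
Step 6. Compute
\[
\begin{aligned}
P^A(0) &= \operatorname{Max}\left\{K - S(0),\ \frac{1}{R}\left(\hat{p} P^{Max}_u + (1 - \hat{p}) P^{Max}_d\right)\right\} \\
&= \operatorname{Max}\left\{(80 - 80),\ \frac{1}{1.05}\left[\left(\frac{2}{3} \times 0\right) + \left(\frac{1}{3} \times 4\right)\right]\right\} \\
&= 1.27.
\end{aligned}
\]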
The above calculations are depicted in the following tables, which are now self-explanatory.

    n          0        1        2
    P^A(n)                       P^f_uu = 0
                        (?)
               (?)               P^f_ud = 0
                        (?)
                                 P^f_dd = 7.80

    n          0        1                       2
    P^A(n)                                      0.0
                        P^Max_u = [0, 0]
               (?)                              0.0
                        P^Max_d = [4, 2.47]
                                                7.80

    n          0                  1                       2
    P^A(n)                                                0.0
                                  P^Max_u = [0, 0]
               [80 − 80, 1.27]                            0.0
                                  P^Max_d = [4, 2.47]
                                                          7.80

Here the notation [a, b] lists the immediate exercise value a and the value b of holding the option for one more period; the value at the node is Max(a, b). Therefore $P^A(0) = 1.27$, and the given American put should be exercised early in the down state at t = 1. This gives a pay-off of Rs 4.00, which is more than the value of Rs 2.47 obtained by holding the option to expiry. We may also compute $P^E(0)$ and get $P^E(0) = 0.79$. Clearly $P^E(0) < P^A(0)$. Hence the prices of the American put and the European put are different even for a non-dividend paying stock.
Example 3.8.2 Find the price of an American call and a European call for the
data as S(0) = 120, K = 120, u = 1.2, d = 0.9, r = 10% per year and time to
expiry T = 2 years.
Solution As the stock is non-dividend paying, $C^A(0) = C^E(0)$. Computing $C^E(0)$ by formula (3.15) or utilizing the tables given below we get $C^E(0) = 22.92$. Here $\hat{p} = 2/3$.

    n          0        1        2
    S(n)                         172.80
                        144
               120               129.60
                        108
                                 97.20

    n          0        1        2
    C^E(n)                       52.80
                        34.91
               22.92             9.60
                        5.82
                                 0.0

    n          0        1        2
    C^A(n)                       52.80
                        (?)
               (?)               9.60
                        (?)
                                 0.0

    n          0        1              2
    C^A(n)                             52.80
                        [24, 34.9]
               (?)                     9.60
                        [0, 5.8]
                                       0.00

\[
\begin{aligned}
C^A(0) &= \operatorname{Max}\left\{(120 - 120),\ \frac{1}{1.1}\left[\frac{2}{3}(34.9) + \frac{1}{3}(5.80)\right]\right\} \\
&= \operatorname{Max}(0, 22.92) \\
&= 22.92.
\end{aligned}
\]
binomial model. For this we divide the time interval [0, T] into N time periods.
Assuming that the dividend date τ occurs in the kth period, we form a lattice
in the usual way, but subtract the amount Q from the nodes at the kth period.
But then the nodes at time period k do not recombine and therefore we get a
binomial tree model rather than a binomial lattice model. Except for this change
and some obvious modifications, the procedure for finding PE (0), CE (0), PA (0) and
CA (0) remains the same as discussed in Section 3.7 and Section 3.8. We illustrate the
procedure in the examples given below.
Example 3.9.1 Compute the price of the European put with the given data as
S(0) = 12, K = 14, u = 1.1, d = 0.95, r = 2% per year and time to expiry in two
years. Assume that a dividend of Rs 2.00 is paid at time t = 1.
Solution The ex-dividend prices are as given in the following table

    n           0        1        2
    S(n)                          12.32
    (ex-div)             11.20
                                  10.64
                12.00
                                  10.34
                         9.40
                                  8.93

Then following the methodology discussed in Section 3.8 we get $\hat{p} = 7/15$ and

    n           0        1        2
    P^E                           1.68
                         2.53
                                  3.36
                (?)
                                  3.66
                         4.33
                                  5.07

Here
\[ 2.53 = \frac{1}{1.02}\left[\frac{7}{15}(1.68) + \frac{8}{15}(3.36)\right] \]
\[ 4.33 = \frac{1}{1.02}\left[\frac{7}{15}(3.66) + \frac{8}{15}(5.07)\right]. \]
Therefore
\[ P^E(0) = \frac{1}{1.02}\left[\frac{7}{15}(2.53) + \frac{8}{15}(4.33)\right] = 3.42. \]
Example 3.9.2 For the data given in Example 3.9.1, evaluate PA (0).
Solution Continuing with the data of Example 3.9.1, we have

    n           0        1                 2
    P^A                                    1.68
                         [2.80, 2.53]
                                           3.36
                (?)
                                           3.66
                         [4.60, 4.33]
                                           5.07

Therefore
\[
\begin{aligned}
P^A(0) &= \operatorname{Max}\left\{K - S(0),\ \frac{1}{1.02}\left[\frac{7}{15}(2.80) + \frac{8}{15}(4.60)\right]\right\} \\
&= \frac{1}{1.02}\left[\frac{7}{15}(2.80) + \frac{8}{15}(4.60)\right] \\
&= 3.69.
\end{aligned}
\]
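Because the dividend breaks the recombination of the lattice, each node at t = 1 has to be treated with its own pair of successors. The sketch below does this for the two period puts of Examples 3.9.1 and 3.9.2; the function name is hypothetical, and the sketch assumes, as in the examples, that the dividend is paid at t = 1 and that early exercise at t = 1 is evaluated at the ex-dividend price.

```python
def put_with_dividend(S0, K, u, d, R, dividend, american):
    """Two-period put on a stock paying `dividend` at t = 1 (non-recombining tree)."""
    p = (R - d) / (u - d)
    values_t1 = []
    for move in (u, d):
        s1 = move * S0 - dividend  # ex-dividend price at t = 1
        # Value of waiting: discounted expected pay-off over the two successors of s1.
        waiting = (p * max(K - u * s1, 0.0) + (1 - p) * max(K - d * s1, 0.0)) / R
        exercise = max(K - s1, 0.0)
        values_t1.append(max(exercise, waiting) if american else waiting)
    waiting0 = (p * values_t1[0] + (1 - p) * values_t1[1]) / R
    return max(K - S0, 0.0, waiting0) if american else waiting0


data = dict(S0=12.0, K=14.0, u=1.1, d=0.95, R=1.02, dividend=2.0)
european_put = put_with_dividend(american=False, **data)
american_put = put_with_dividend(american=True, **data)
print(round(european_put, 2), round(american_put, 2))
```

The code carries full precision through the tree, so its results can differ from hand calculations with rounded intermediate values in the third decimal place.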
Example 3.9.3 Find the prices $C^E(0)$ and $C^A(0)$ for the data S(0) = 120, K = 120, u = 1.2, d = 0.90, r = 10% per year and time to expiry T = 2 periods. What are the values of $C^E(0)$ and $C^A(0)$ if it is additionally given that a dividend of Rs 14 is paid at t = 2?
Solution If there is no dividend on the stock then $C^A(0) = C^E(0)$. The value of $C^E(0)$ can be easily determined as Rs 22.92. Here $\hat{p} = 2/3$.
We next consider the case when there is a dividend of Rs 14.00 at time t = 2. This gives the following table for the stock price at different periods.

    n           0        1        2
    S(n)                          (1.2)^2 (120) − 14 = 158.80
    (ex-div)             144
                120               (1.2)(0.9)(120) − 14 = 115.60
                         108
                                  (0.9)^2 (120) − 14 = 83.20

    n           0        1        2
    C^A                           38.80
                         (?)
                (?)               0.0
                         (?)
                                  0.0

    n           0        1              2
    C^A                                 38.80
                         [24, 23.5]
                (?)                     0.0
                         [0, 0]
                                        0.0

Therefore
\[ C^A(0) = \frac{1}{1.1}\left[\frac{2}{3}(24) + \frac{1}{3}(0)\right] = 14.54. \]
To compute $C^E(0)$ we follow the usual methodology to get the following table

    n           0        1        2
    C^E                           38.80
                         23.5
                (?)               0.0
                         0.0
                                  0.0

This gives
\[ C^E(0) = \frac{1}{1.1}\left[\frac{2}{3}(23.5) + \frac{1}{3}(0)\right] = 14.24. \]
As expected, here $C^E(0)$ and $C^A(0)$ are not equal because the stock is dividend paying.
\[ x u S(0) + y B(1) = C_u \]
\[ x S(0) + y B(1) = C_n \]
\[ x d S(0) + y B(1) = C_d. \]
We see that the above linear system only permits a solution for a select subset of contingent claims (e.g. European calls here) with pay-offs within a 2-dimensional subspace. Some readers may like to verify, by eliminating x and y, that the linear system under consideration is consistent only when
\[ (1 - d)\, C_u - (u - d)\, C_n + (u - 1)\, C_d = 0. \]
Thus not all contingent claims can be priced by the trinomial model. This means that the trinomial model is not complete.
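In fact if we try to determine the unique RNPM for the trinomial model we shall again face a problem. Here we need to find $\hat{p}_1 > 0$, $\hat{p}_2 > 0$ and $\hat{p}_3 > 0$ such that
\[ \hat{p}_1 + \hat{p}_2 + \hat{p}_3 = 1 \]
and
\[ \frac{1}{R}\left(\hat{p}_1 u S(0) + \hat{p}_2 S(0) + \hat{p}_3 d S(0)\right) = S(0). \]
Obviously this system is consistent but it cannot have a unique solution, as there are only two equations in three unknowns. This again shows that the trinomial model is not complete.
To tackle such a situation we enlarge the set of basic securities. For example, we may take the bond, the stock and (say) an option whose price C(0) is known. Now we wish to obtain $\hat{p}_1, \hat{p}_2, \hat{p}_3$ ($\hat{p}_i > 0$, i = 1, 2, 3) such that
\[ \hat{p}_1 + \hat{p}_2 + \hat{p}_3 = 1 \]
\[ \frac{1}{R}\left(\hat{p}_1 u S(0) + \hat{p}_2 S(0) + \hat{p}_3 d S(0)\right) = S(0) \]
\[ \frac{1}{R}\left(\hat{p}_1 C_u + \hat{p}_2 C_n + \hat{p}_3 C_d\right) = C(0). \]
The above system has a unique solution giving the unique RNPM. This unique RNPM can now be used to price other contingent claims. In this way we have artificially made the trinomial model a complete market model.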
Example 3.10.1 Let S(0) = 100 and S(1) takes three values 120, 100 and 90 with
probabilities 0.4, 0.2 and 0.4 respectively. The price of the European call on this
stock with the strike price 105 is 5 and the price of the European call with strike
price 95 is 10, with both option expiring at t = 1. Replicate the risk-free security
that pays Rs 1 at t = 1 regardless of what happens.
Solution Let x denote the number of shares of stock, $z_1$ the number of calls with strike price 105 and $z_2$ the number of calls with strike price 95. In the three states S(1) = 120, 100 and 90, the call with strike price 105 pays 15, 0 and 0, and the call with strike price 95 pays 25, 5 and 0 respectively. Therefore in order to replicate the risk-free security we need to solve
\[ 120x + 15 z_1 + 25 z_2 = 1 \]
\[ 100x + 5 z_2 = 1 \]
\[ 90x = 1. \]
The solution of the above system is x = 1/90, $z_1 = 2/135$ and $z_2 = -1/45$. Further the cost of this portfolio is (1/90)100 + (2/135)5 − (1/45)10, i.e. 0.963.
Therefore the price of the risk-free security is 0.963. Since its relative return is (1/0.963) − 1 = 3.85%, the risk-free rate in this model is 3.85%.
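The replication in Example 3.10.1 amounts to solving a 3 × 3 linear system whose rows are the three states and whose columns are the pay-offs of the stock and the two calls. A minimal sketch using exact Gauss-Jordan elimination is given below; `solve_linear` is a hypothetical helper written for this illustration and assumes the system is non-singular.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed non-singular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


stock = [120, 100, 90]                       # S(1) in the three states
call_105 = [max(s - 105, 0) for s in stock]  # pay-off of the call struck at 105
call_95 = [max(s - 95, 0) for s in stock]    # pay-off of the call struck at 95

# Each row: pay-off of (stock, call 105, call 95) in one state; target is Rs 1 in every state.
A = [[stock[i], call_105[i], call_95[i]] for i in range(3)]
x, z1, z2 = solve_linear(A, [1, 1, 1])

cost = x * 100 + z1 * 5 + z2 * 10   # prices: S(0) = 100, C_105(0) = 5, C_95(0) = 10
risk_free_rate = 1 / cost - 1
print(x, z1, z2, float(cost), float(risk_free_rate))
```

The exact cost is 26/27, which rounds to the 0.963 quoted in the solution.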
3.12 Exercises
Exercise 3.1 Let S(T) be the price of a given security (stock) at time T. All of the
following options have exercise time T and (unless stated otherwise) strike price
K. Give the pay-off at time T that is earned by an investor who
(i) owns one call and one put option.
(ii) owns two calls and has sold short one share of stock.
(iii) owns one share of stock and has sold one call.
(iv) owns one call having strike price K1 and has sold one put having strike price
K2 .
Exercise 3.2 Consider a family of call options on a non-dividend paying stock,
each option being identical except for its strike price. The value of the call with
strike price K is denoted by C(K). Let $K_1 < K_2 < K_3$. Show that
\[ C(K_2) \le \frac{K_3 - K_2}{K_3 - K_1}\, C(K_1) + \frac{K_2 - K_1}{K_3 - K_1}\, C(K_3). \]
Exercise 3.3 In a binomial lattice model, let possible values of S(2) be 32, 28 and
x. Find the value of x.
Exercise 3.4 Draw the pay-off curves of the following option portfolios
(i) $-P_{80} + P_{100} + 2C_{130} - C_{150}$.
(ii) $-P_{100} + P_{120} + C_{150}$.
Exercise 3.5 Let B(0) = 100, B(1) = 110, S(0) = 100 and S(1) take values
120 and 80 with probabilities 0.8 and 0.2 respectively. Also let CE and PE re-
spectively be a European call and a European put with K = 100 and T = 2.
(i) Verify put-call parity.
(ii) What can you say if the options are American options CA and PA ?
(iii) What difficulty (if any) will be faced if instead of B(1) = 110 we have B(1) =
120? Give mathematical justification to your answer.
Exercise 3.6 Let r = 0.2. Find the risk neutral conditional expectation of S(3)
given that S(2) = Rs 110.
Exercise 3.7 A certain stock is selling for Rs 50. The feeling is that for each
month, for the next two months, the stock price will rise by 10% or fall by 10%.
Assuming that the risk free rate is 1%, calculate the price of the European call
with the strike price of Rs 48.
Exercise 3.8 Prove the following
(i) $S(0) - Ke^{-rt} - D(0) \le C^E(0)$
(ii) $Ke^{-rt} + D(0) - S(0) \le P^E(0)$,
where D(0) is the present value of the dividend paid by the stock. The rest of
terminologies stand for usual meanings.
Exercise 3.9 Consider the data S(0) = 60, K = 62, u = 1.1, d = 0.95, r =
0.03 and T = 3. Find CE (0), PE (0), CA (0) and PA (0). Identify the time instants
when CA and PA will be exercised.
Exercise 3.10 Let S(0) = 120, u = 1.2, d = 0.9 and r = 1%. Consider a call option
with strike price K = 120 and T = 2. Find the option price and the replicating
strategy.
Exercise 3.11 For the single period trinomial model, let S(0) = 100 and S(1)
take three values 120, 100 and 90 with probabilities 0.4, 0.2 and 0.4 respectively.
Let r = 3.85%. Further let the price of the European call on this stock with the
strike price 105 be 5. Find the price of the European call with the strike price 95.
4
Basic Theory of Option Pricing-II
4.1 Introduction
The world of options underwent a revolutionary change in 1973 when Fisher Black and Myron Scholes [15] and Robert C. Merton [94] published their seminal papers on the theory of option pricing. The basic idea behind their studies is that the price of an option is determined implicitly by the price of the underlying stock and certain other parameters whose values, except one of them, are easily observable. The Black-Scholes option pricing model constructed a continuously hedging strategy to protect the writer's short position in the option. Also, Merton designed a self-financing and dynamically replicating hedged position containing the options, the underlying risky stock and a riskless asset. Both approaches led to a partial differential equation, now famously known as the Black-Scholes partial differential equation or simply the Black-Scholes equation, with appropriate boundary conditions defined according to the contractual specifications of the option. These models will be presented and discussed later in Chapter 10. The price of a particular derivative security or option is obtained by solving the Black-Scholes equation subject to boundary conditions. In recognition of their pioneering contributions, Scholes and Merton were awarded the 1997 Nobel Prize in Economics. Black's contribution was also acknowledged by the Swedish Academy, though he was ineligible for the prize as he had died in 1995.
Despite its richness and simplicity, the Black-Scholes model has one major limitation. It cannot be used to accurately price options with American-style exercise, as it values an option on the basis of its pay-off at expiration only. It does not consider the steps along the way where there could be the possibility of early exercise of an option. To this end came the alternative proposal of using the binomial option pricing model. Both the Black-Scholes model and the binomial model are based on the same theoretical foundations and assumptions, but there are also some important differences between the two. The mathematical tools Black, Scholes and Merton employed are quite advanced, like solving a partial differential equation, and this fact does not make the underlying economics any clearer. In contrast to the Black-Scholes model, the binomial option pricing model is simple and can easily be understood with undergraduate knowledge. Part of this has already been experienced in the previous chapter, wherein the readers were given a glimpse of multi-period binomial lattice models for option pricing through simple two period examples.
As already exhibited in the previous chapter, the guiding principle of the bi-
nomial model is to break down the time to expiration into a large number of time
intervals or steps. Thereafter, at each step, it is assumed that the stock price will
move up or down by certain amounts. This produces a binomial lattice of un-
derlying stock prices. Subsequently the option prices at each node of the lattice
are calculated, working backward from expiration to the present. For almost all
practical purposes, the discrete model of option pricing is preferred.
In this chapter, we aim to present the binomial lattice model proposed by John
C. Cox, Stephen A. Ross and Mark Rubinstein [33] in 1979 for option pricing.
This model is now famously called the CRR model for option pricing. Quoting
Cox, Ross and Rubinstein, “Our formulation, by its very construction, leads to
an alternative numerical procedure which is both simpler, and for many purposes,
computationally more efficient”. In fact the Black-Scholes formula turns out to be
a particular limiting case of the discrete binomial CRR model. In Section 4.2, we
describe the construction of the CRR model. In Section 4.3, we present the case
when the CRR model can be matched with the multi-period binomial model. This
matching finally leads to the Black-Scholes formula in the limiting case. Section
4.4 presents an extension of the Black-Scholes formula for options on dividend
paying underlying stocks. Section 4.5 briefly describes the parameters, called the
Greeks, which are vital tools in risk management.
\[ S(T) = S(0) E_1 E_2 \cdots E_n = S(0)\, e^{H} \quad \text{(say)}, \]
where $e^{H} = E_1 E_2 \cdots E_n$. Thus,
\[ \ln(S(T)) = \ln(S(0)) + H; \qquad H = \sum_{k=1}^{n} \ln(E_k). \]
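[Figure: three-period binomial lattice of the stock price. From every node the price moves up by the factor u with probability p or down by the factor d with probability 1 − p. The nodes are S(0); uS(0), dS(0); uuS(0), udS(0), ddS(0); uuuS(0), uudS(0), uddS(0), dddS(0).]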
The parameter µ is called the drift and the parameter σ is called the volatility. It may appear a little strange as to how and why we suddenly bring in two new parameters µ and σ². As of now we urge the reader to take them at face value, as we shall be providing their physical interpretation later in this chapter. Now for each k = 1, . . . , n, let us introduce another random variable
\[ X_k = \frac{\ln(E_k) - E(\ln(E_k))}{\sqrt{Var(\ln(E_k))}}. \]
Then, by using (4.1), (4.2) and the definition of $E_k$, we can easily work out that
\[ X_k = \begin{cases} \dfrac{1 - p}{\sqrt{p(1 - p)}}, & \text{with probability } p \\[2ex] \dfrac{-p}{\sqrt{p(1 - p)}}, & \text{with probability } 1 - p. \end{cases} \]
Since $\ln(E_k) = \mu \Delta t + \sigma \sqrt{\Delta t}\, X_k$ by (4.3), we get
\[ H = \sum_{k=1}^{n} \ln(E_k) = \sum_{k=1}^{n} \left(\mu \Delta t + \sigma \sqrt{\Delta t}\, X_k\right) = \mu T + \sigma \sqrt{\Delta t}\, Y, \]
where
\[ Y = \sum_{k=1}^{n} X_k, \tag{4.4} \]
is a simple random walk. Though the notion of simple random walk will be explained in detail in Chapter 7, here we just wish to get familiar with it for clarity.
Definition 4.2.1 (Random Walk) A random walk is a stochastic process $\{S_n, n = 0, 1, \ldots\}$, with $S_0 = 0$, defined by
\[ S_n = \sum_{k=1}^{n} X_k, \]
where $\{X_k\}$ are independent and identically distributed random variables. The random walk is simple if for each k = 1, . . . , n, $X_k$ takes value from {a, b} only, a and b are real constants, with $P(X_k = a) = p$ and $P(X_k = b) = 1 - p$, p ∈ [0, 1].
It is important to observe that the constants a, b, p are independent of k. Generally
a and b are taken as 1 and −1 respectively. A simple random walk can be visualized
as a game involving two persons, which consists of a sequence of independent
identically distributed moves, and the sum Sn represents the score of the first
person, say, after n moves, with the assumption that the score of the second
person is −Sn .
In view of the above discussion we have the following lemma.
Lemma 4.2.1 For a CRR model with probability of up tick u equal to p and probability of down tick d equal to 1 − p, life time T and time increment ∆t = T/n, the stock price is given by
\[ S(T) = S(0) \exp\left(\mu T + \sigma \sqrt{\Delta t}\, Y\right), \]
where µ is the drift and σ is the volatility described by (4.3) and Y is a simple random walk given by (4.4).
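As a quick sanity check on the standardized ticks, the sketch below verifies that X_k has mean 0 and variance 1 and then draws one terminal price from the representation in Lemma 4.2.1. All numerical values (p = 0.6, µ = 0.1, σ = 0.2, n = 250) are hypothetical and chosen only for illustration, as are the function names.

```python
import math
import random


def standardized_tick(p, up):
    """Value of X_k: (1 - p)/sqrt(p(1 - p)) on an up tick, -p/sqrt(p(1 - p)) on a down tick."""
    s = math.sqrt(p * (1 - p))
    return (1 - p) / s if up else -p / s


def crr_terminal_price(S0, mu, sigma, T, n, p, rng):
    """Sample S(T) = S(0) * exp(mu*T + sigma*sqrt(dt)*Y) with Y = X_1 + ... + X_n."""
    dt = T / n
    Y = sum(standardized_tick(p, rng.random() < p) for _ in range(n))
    return S0 * math.exp(mu * T + sigma * math.sqrt(dt) * Y)


p = 0.6  # hypothetical up-tick probability
x_up, x_down = standardized_tick(p, True), standardized_tick(p, False)
mean_X = p * x_up + (1 - p) * x_down
var_X = p * x_up**2 + (1 - p) * x_down**2 - mean_X**2
print(mean_X, var_X)  # approximately 0 and 1

rng = random.Random(0)  # fixed seed, used only to make the run reproducible
sample = crr_terminal_price(S0=100.0, mu=0.1, sigma=0.2, T=1.0, n=250, p=p, rng=rng)
print(sample)
```

Because each X_k is standardized, the drift µT and the scale σ√∆t carry all the information about the mean and the spread of ln S(T).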
as it requires other conceptual knowledge which we have not yet acquired. (Please
refer to the third point in the notes, Section 4.6).
Also, there are many sophisticated models in financial econometrics for computing historical volatility, like ARCH (autoregressive conditional heteroskedasticity), GARCH (generalized ARCH), RiskMetrics EWMA (exponentially weighted moving average), threshold autoregressive models, and so on. Again we are not in a position to elaborate on them. However, for the sake of understanding, we briefly explain a very basic model for computing the historical volatility. First calculate the natural log of the ratio of the stock price on the current day t to that on the previous day t − 1, that is, $R_t = \ln(S_t/S_{t-1})$. Then find the average of all these for the given period n (for example, n = 20 days), say, $\bar{R} = \left(\sum_{t=1}^{n} R_t\right)/n$. The historical volatility is the standard deviation of $R_t$ (t = 1, . . . , n) from the mean $\bar{R}$. If a stock price follows a Brownian motion or a geometric Brownian motion then the above result is multiplied by the square root of the average number of trading days in a year to quote the volatility on an annual basis. The reason for the last step lies in the fact that the volatility increases with the square root of the unit of time. We agree that one may find this discussion a bit incomplete; the details have been avoided just to skip the complications involved with volatility. However, we encourage the readers to search the web and find sufficient material/examples on this concept or refer to an excellent text [138].
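The basic recipe described above can be sketched as follows. The function name, the default of 252 trading days per year and the use of the sample standard deviation (divisor n − 1) are assumptions made for this illustration; the price series is hypothetical.

```python
import math


def historical_volatility(prices, trading_days_per_year=252):
    """Annualized standard deviation of daily log returns R_t = ln(S_t / S_{t-1})."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(log_returns)
    mean = sum(log_returns) / n
    # Sample variance (divisor n - 1); an assumption, the text does not fix the divisor.
    variance = sum((r - mean) ** 2 for r in log_returns) / (n - 1)
    return math.sqrt(variance) * math.sqrt(trading_days_per_year)


# Hypothetical daily closing prices, for illustration only.
closing_prices = [100.0, 101.5, 100.8, 102.3, 101.9, 103.0]
print(f"annualized volatility: {historical_volatility(closing_prices):.4f}")
```

With n = 20 daily returns one would pass 21 consecutive closing prices to the function.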
We shall next attempt to approximate the CRR model by a multi-period bi-
nomial model. It is important to point out here that not every CRR model is a
multi-period binomial model.
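in which $d = 1/u$. Now, (4.1) and (4.2), on using (4.3), can be re-expressed as follows
\[ (2p - 1)U = \mu \Delta t \]
\[ 4p(1 - p)U^2 = \sigma^2 \Delta t. \tag{4.5} \]
Squaring the first relation and adding it to the second gives $U^2 = \mu^2 (\Delta t)^2 + \sigma^2 \Delta t$, that is,
\[ U = \sqrt{\mu^2 (\Delta t)^2 + \sigma^2 \Delta t}, \quad \text{and} \quad D = -U. \]
Furthermore, (4.5) yields
\[ p = \frac{1}{2} + \frac{\mu \Delta t}{2\sqrt{\mu^2 (\Delta t)^2 + \sigma^2 \Delta t}}. \]
Let us take n sufficiently large (equivalently, the time steps between successive trading instances approach zero) so that $(\Delta t)^2$ terms can be neglected. We get
\[ U = \ln(u) = \sigma \sqrt{\Delta t} \quad \text{and} \quad D = -U = -\sigma \sqrt{\Delta t}. \]
Consequently, we have
\[ u = e^{\sigma \sqrt{\Delta t}}, \quad d = e^{-\sigma \sqrt{\Delta t}} \quad \text{and} \quad p = \frac{1}{2} + \frac{\mu}{2\sigma} \sqrt{\Delta t}. \tag{4.6} \]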
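To see how little is lost by dropping the (∆t)² terms, one can compare the exact solution of (4.5) with the approximation (4.6). The sketch below does so for hypothetical values µ = 0.1, σ = 0.2 and a daily step ∆t = 1/252; the function names are illustrative.

```python
import math


def crr_exact(mu, sigma, dt):
    """Exact solution of (4.5): U = sqrt(mu^2 dt^2 + sigma^2 dt), p = 1/2 + mu*dt/(2U)."""
    U = math.sqrt(mu**2 * dt**2 + sigma**2 * dt)
    return math.exp(U), math.exp(-U), 0.5 + mu * dt / (2 * U)


def crr_approx(mu, sigma, dt):
    """Approximation (4.6), obtained by neglecting the (dt)^2 terms."""
    U = sigma * math.sqrt(dt)
    return math.exp(U), math.exp(-U), 0.5 + mu * math.sqrt(dt) / (2 * sigma)


mu, sigma, dt = 0.1, 0.2, 1 / 252  # hypothetical annual drift and volatility, daily step
u_e, d_e, p_e = crr_exact(mu, sigma, dt)
u_a, d_a, p_a = crr_approx(mu, sigma, dt)
print(u_e - u_a, d_e - d_a, p_e - p_a)

# The exact parameters satisfy both moment conditions in (4.5).
U = math.log(u_e)
print((2 * p_e - 1) * U - mu * dt, 4 * p_e * (1 - p_e) * U**2 - sigma**2 * dt)
```

For small time steps the two parameter sets agree to several decimal places, which is why (4.6) is the form used in the examples.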
We take a small break here for some examples to illustrate the choice of the
parameters.
Example 4.3.1 A non-dividend paying stock is currently selling at Rs 100 with
annual volatility 20%. Assume that the continuously compounded risk-free interest
rate is 5%. Using a two-period CRR binomial option pricing model, find the price
of one European call option on this stock with a strike price of Rs 80 and time to
expiration 4 years.
Solution We are given S(0) = Rs 100, K = Rs 80, T = 4, ∆t = 2, r = 0.05, σ = 0.2. Applying the CRR model, the up-factor and the down-factor are respectively given by
\[ u = e^{\sigma \sqrt{\Delta t}} = 1.3269, \qquad d = \frac{1}{u} = 0.7536. \]
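Now, $S_{uu} = u^2 S(0) = 176.0664$, $S_{ud} = u d S(0) = 100$ and $S_{dd} = d^2 S(0) = 56.79$.
Thus,
\[ C_{uu} = \operatorname{Max}\{S_{uu} - K, 0\} = 96.0664, \]
\[ C_{ud} = \operatorname{Max}\{S_{ud} - K, 0\} = 20.00, \]
\[ C_{dd} = \operatorname{Max}\{S_{dd} - K, 0\} = 0. \]
Since the annual rate of risk-free interest is continuously compounded (so instead of 1 + r or R we shall use $e^{r \Delta t}$), the risk neutral probability is
\[ \hat{p} = \frac{e^{r \Delta t} - d}{u - d} = 0.6132. \]
Hence for a two period binomial model, discounting over both periods,
\[
\begin{aligned}
C(0) &= e^{-2r \Delta t}\left((\hat{p})^2 C_{uu} + 2\hat{p}(1 - \hat{p}) C_{ud} + (1 - \hat{p})^2 C_{dd}\right) \\
&= e^{-0.2}\left((0.3760)(96.0664) + (0.4744)(20.00) + 0\right) \\
&= \text{Rs } 37.34.
\end{aligned}
\]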
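The steps of Example 4.3.1 can be retraced with the short sketch below. It follows the two period formula (3.15) with R replaced by e^{r∆t}, so the pay-off at T = 2∆t is discounted over both periods, that is, by e^{−2r∆t}. The function name and the returned dictionary are illustrative choices.

```python
import math


def crr_two_period_call(S0, K, r, sigma, T):
    """Two-period CRR price of a European call with continuous compounding."""
    dt = T / 2
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    growth = math.exp(r * dt)          # plays the role of R over one period
    p = (growth - d) / (u - d)         # risk neutral probability of an up move
    c_uu = max(u * u * S0 - K, 0.0)
    c_ud = max(u * d * S0 - K, 0.0)
    c_dd = max(d * d * S0 - K, 0.0)
    expected = p * p * c_uu + 2 * p * (1 - p) * c_ud + (1 - p) ** 2 * c_dd
    return {"u": u, "d": d, "p": p, "price": expected / growth**2}


result = crr_two_period_call(S0=100.0, K=80.0, r=0.05, sigma=0.2, T=4.0)
print({name: round(value, 4) for name, value in result.items()})
```

Dividing by growth squared is the continuous-compounding counterpart of the factor 1/R² in (3.15).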
For computational purposes, we take the annualized risk-free interest rate and volatility. Although we take 365 days for counting a year, the exchanges across the world remain closed and no trading takes place on Saturdays and Sundays. Moreover, there are some other holidays observed by exchanges. Consequently the number of trading days is smaller. In practice options are priced taking into account the number of trading days. The following example throws light on this aspect.
Example 4.3.2 A non-dividend paying stock is selling at Rs 1,500 on March 1, 2010, with annual volatility of 22%. Assume that the continuously compounded risk-free interest rate is 3%. Compute the price of a European call option written on this stock struck at Rs 1,470 expiring in April 2010 using a single-period CRR binomial model.
Solution It is given that S(0) = Rs 1,500, K = Rs 1,470, r = 0.03, σ = 0.22. The option is written on March 1 and it expires on the business day immediately preceding the last business day of the contract month, which is April 29, 2010. The total number of trading days is 44. The number of trading days in 2010 is 252. Thus, T = 44/252 = 0.1746. For a single-period model, ∆t = T = 0.1746. Then, $u = e^{\sigma \sqrt{\Delta t}} = 1.0963$, $d = u^{-1} = 0.9122$. The risk neutral probability measure is $\hat{p} = \dfrac{e^{r \Delta t} - d}{u - d} = 0.506$. Thus,
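Then,
\[ S(n \Delta t) = S(T) = S(0)\, u^{\sum_{k=1}^{n} Y_k}\, d^{\,n - \sum_{k=1}^{n} Y_k} = d^{\,n} S(0) \left(\frac{u}{d}\right)^{\sum_{k=1}^{n} Y_k}, \]
giving
\[ \frac{S(T)}{S(0)} = d^{\,T/\Delta t} \left(\frac{u}{d}\right)^{\sum_{k=1}^{T/\Delta t} Y_k}. \]
Observe that $Y = \sum_{k=1}^{T/\Delta t} Y_k$ is a simple random walk with $E(Y) = p(T/\Delta t)$ and $Var(Y) = p(1 - p)(T/\Delta t)$.
Using the value of ln(d) from (4.6), we have
\[ \ln \frac{S(T)}{S(0)} = \frac{-T\sigma}{\sqrt{\Delta t}} + 2\sigma \sqrt{\Delta t} \sum_{k=1}^{T/\Delta t} Y_k. \]
Hence,
\[
\begin{aligned}
E\left(\ln \frac{S(T)}{S(0)}\right) &= \frac{-T\sigma}{\sqrt{\Delta t}} + 2\sigma \sqrt{\Delta t} \sum_{k=1}^{T/\Delta t} E(Y_k) \\
&= \frac{-T\sigma}{\sqrt{\Delta t}} + 2\sigma \sqrt{\Delta t}\; p\, \frac{T}{\Delta t} \\
&= \mu T,
\end{aligned}
\]
where the last equality follows on using the value of p from (4.6). Also,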
\[
\begin{aligned}
Var\left(\ln \frac{S(T)}{S(0)}\right) &= 4\sigma^2 \Delta t \sum_{k=1}^{T/\Delta t} Var(Y_k) \\
&= 4\sigma^2 T p(1 - p) \\
&\to \sigma^2 T, \quad \text{as } p \to \frac{1}{2} \text{ when } n \to \infty.
\end{aligned}
\]
Furthermore, by application of the central limit theorem, we can assume that the sum Y of the $Y_k$ follows a normal distribution when time steps approach zero. Summarizing the above discussion we can conclude that
\[ \ln \frac{S(T)}{S(0)} \sim N(\mu T, \sigma^2 T). \]
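Moving ahead, observe that
\[ u = e^{\sigma \sqrt{T/n}} \approx 1 + \sigma \sqrt{\frac{T}{n}} + \frac{\sigma^2 T}{2n} \tag{4.7} \]
\[ d = e^{-\sigma \sqrt{T/n}} \approx 1 - \sigma \sqrt{\frac{T}{n}} + \frac{\sigma^2 T}{2n}. \tag{4.8} \]
So, the risk neutral probability measure (RNPM) is given by
\[
\begin{aligned}
\hat{p} &= \frac{R - d}{u - d} \\
&= \frac{1 + \dfrac{rT}{n} - d}{u - d} \\
&\approx \frac{\dfrac{rT}{n} + \sigma \sqrt{\dfrac{T}{n}} - \dfrac{\sigma^2 T}{2n}}{2\sigma \sqrt{\dfrac{T}{n}}}.
\end{aligned}
\]
Thus,
\[ \hat{p} = \frac{1}{2} + \frac{2r - \sigma^2}{4\sigma} \sqrt{\frac{T}{n}}. \tag{4.9} \]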
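The quality of the approximation (4.9) can be checked numerically by comparing it with the exact value of (R − d)/(u − d), where R = 1 + rT/n as above. The parameter values r = 0.05, σ = 0.2, T = 1 and the function names are hypothetical choices for this sketch.

```python
import math


def p_hat_exact(r, sigma, T, n):
    """(R - d)/(u - d) with R = 1 + rT/n, u = exp(sigma*sqrt(T/n)) and d = 1/u."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    return (1 + r * dt - d) / (u - d)


def p_hat_approx(r, sigma, T, n):
    """Approximation (4.9): 1/2 + (2r - sigma^2)/(4 sigma) * sqrt(T/n)."""
    return 0.5 + (2 * r - sigma**2) / (4 * sigma) * math.sqrt(T / n)


r, sigma, T = 0.05, 0.2, 1.0  # hypothetical parameter values
for n in (4, 100, 10_000):
    exact, approx = p_hat_exact(r, sigma, T, n), p_hat_approx(r, sigma, T, n)
    print(n, exact, approx, abs(exact - approx))
```

Both values tend to 1/2 as n grows, and the gap between them shrinks faster than the √(T/n) correction itself.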
We have seen that the European call or put options are simple to price. We
therefore concentrate on pricing the European call option using the above de-
scribed scheme. As established in previous chapter, the European call option price
for an n-period binomial lattice model is described as follows.
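\[
\begin{aligned}
C(0) &= \left(1 + \frac{rT}{n}\right)^{-n} E_{\hat{p}}\left((S(T) - K)^+\right) \\
&= \left(1 + \frac{rT}{n}\right)^{-n} E_{\hat{p}}\left((S(0) u^{Y} d^{\,n - Y} - K)^+\right) \\
&= \left(1 + \frac{rT}{n}\right)^{-n} E_{\hat{p}}\left(\left(S(0) \left(\frac{u}{d}\right)^{Y} d^{\,n} - K\right)^+\right).
\end{aligned}
\]
It follows from (4.7) and (4.8), that
\[ C(0) = \left(1 + \frac{rT}{n}\right)^{-n} E_{\hat{p}}\left((S(0) e^{w} - K)^+\right), \quad \text{where } w = 2\sigma \sqrt{\frac{T}{n}}\, Y - \sigma \sqrt{nT}. \tag{4.10} \]
\[
\begin{aligned}
Var(w) &= 4\sigma^2 \frac{T}{n} Var(Y) \\
&= 4p(1 - p)\sigma^2 T \\
&\to \sigma^2 T,
\end{aligned}
\]
where the last relation follows in the limiting case when n → ∞ (or ∆t → 0), and p → 1/2 from (4.6). Moreover it is important to note that
\[
\begin{aligned}
E_{\hat{p}}(w) &= 2\sigma \sqrt{\frac{T}{n}}\, E_{\hat{p}}(Y) - \sigma \sqrt{nT} \\
&= 2\sigma \sqrt{\frac{T}{n}}\, n\hat{p} - \sigma \sqrt{nT} \\
&= 2\sigma \sqrt{nT} \left(\frac{1}{2} + \frac{2r - \sigma^2}{4\sigma} \sqrt{\frac{T}{n}}\right) - \sigma \sqrt{nT} \quad \text{(using (4.9))} \\
&= \left(r - \frac{\sigma^2}{2}\right) T
\end{aligned}
\]
and
\[ Var_{\hat{p}}(w) = 4\sigma^2 \hat{p}(1 - \hat{p}) T \to \sigma^2 T, \]
(using the limiting case when n → ∞ and $\hat{p} \to 1/2$ in (4.9)).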
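In the limiting case n → ∞, we have $\left(1 + \frac{rT}{n}\right)^{-n} \to e^{-rT}$ and, by the central limit theorem, w is normally distributed under p̂ with mean $\left(r - \frac{\sigma^2}{2}\right)T$ and variance $\sigma^2 T$. Therefore (4.10) becomes
\[ C(0) = e^{-rT} \int_{-\infty}^{\infty} (S(0) e^{w} - K)^+ \, \frac{1}{\sigma \sqrt{2\pi T}} \exp\left(-\frac{1}{2} \left(\frac{w - \left(r - \frac{\sigma^2}{2}\right)T}{\sigma \sqrt{T}}\right)^2\right) dw. \]
Now $S(0) e^{w} > K$ implies $w > \ln\left(\dfrac{K}{S(0)}\right) = w_1$. Therefore,
\[ C(0) = e^{-rT} \int_{w > w_1} (S(0) e^{w} - K) \, \frac{1}{\sigma \sqrt{2\pi T}} \exp\left(-\frac{1}{2} \left(\frac{w - \left(r - \frac{\sigma^2}{2}\right)T}{\sigma \sqrt{T}}\right)^2\right) dw. \]
To evaluate this integral, substitute
\[ y = \frac{w - \left(r - \frac{\sigma^2}{2}\right)T}{\sigma \sqrt{T}}. \]
Then,
\[ w = y\sigma \sqrt{T} + \left(r - \frac{\sigma^2}{2}\right)T, \quad \text{so} \quad dw = \sigma \sqrt{T}\, dy. \]
Moreover, $w > w_1$ gives $y > y_1$, where
\[ y_1 = \frac{1}{\sigma \sqrt{T}} \left[\ln\left(\frac{K}{S(0)}\right) - \left(r - \frac{\sigma^2}{2}\right)T\right]. \]
Subsequently,
\[
\begin{aligned}
C(0) &= \frac{e^{-rT}}{\sqrt{2\pi}} \int_{y > y_1} \left(S(0) e^{y\sigma \sqrt{T} + \left(r - \frac{\sigma^2}{2}\right)T} - K\right) e^{-\frac{y^2}{2}}\, dy \\
&= \frac{e^{-rT}}{\sqrt{2\pi}} \int_{y > y_1} S(0) e^{y\sigma \sqrt{T} + \left(r - \frac{\sigma^2}{2}\right)T} e^{-\frac{y^2}{2}}\, dy - \frac{1}{\sqrt{2\pi}} K e^{-rT} \int_{y_1}^{\infty} e^{-\frac{y^2}{2}}\, dy
\end{aligned}
\]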
Obviously, if we enter into the European put option on the day it is offered, then
the put price P(0) can be obtained by taking t = 0 in the above formula. We
encourage the readers to derive the Black-Scholes formula for pricing a European
put option using arguments analogous to those used above in the derivation of C(0).
So far, we have discussed the cases of European call and European put options. It
is well known that, for a non-dividend paying stock, the value of an American call
option is the same as that of a European call option with the same strike price and
time to expiration. Hence (4.13) can be used to compute the price of an American
call option on a non-dividend paying underlying stock. Notably, the Black-Scholes
formula for a European put option does not price an American put option, because
it may pay to exercise the American put early. An American put price has to be
approximated using the binomial method explained in the previous chapter, wherein
we can build a sufficiently large binomial lattice by taking sufficiently small time
steps for a more accurate approximation.
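A minimal sketch of this binomial approximation is given below. It assumes the CRR parameters u = e^{σ√Δt}, d = 1/u and the risk-neutral probability p = (e^{rΔt} − d)/(u − d) with continuous compounding; the function name and the sample data are hypothetical.

```python
from math import exp, sqrt

def american_put_crr(S0, K, r, sigma, T, n):
    """American put value on an n-step CRR lattice (backward induction)."""
    dt = T / n
    u = exp(sigma * sqrt(dt))
    d = 1.0 / u
    disc = exp(-r * dt)
    p = (exp(r * dt) - d) / (u - d)   # assumed risk-neutral probability
    # Payoffs at expiry; node j has had j up-moves and n - j down-moves.
    values = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):
        for j in range(step + 1):
            # values[j] and values[j + 1] still hold the next time layer here.
            hold = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            exercise = K - S0 * u**j * d**(step - j)
            values[j] = max(hold, exercise)   # early exercise if it pays
    return values[0]

for n in (50, 200, 500):
    print(n, american_put_crr(100.0, 100.0, 0.05, 0.2, 1.0, n))
```

The values stabilise as n increases and stay above the corresponding European put price, which reflects the value of early exercise.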
price by the present value of the dividend. To understand this, let us first
explore how the option is affected by dividends on the underlying stock.
Dividends are a way by which the companies distribute part of their profits
to their investors. The question of who should be paid dividends becomes com-
plex as the composition of shareholders changes each day. To settle this question,
companies designate a date known as the record date. Dividends are paid to the
list of shareholders who hold stock on the record date. In order to allow time for
settling stock purchases, stock exchanges set a date generally two business days
prior to the record date known as the ex-dividend date. Someone who purchases
the stock on or after the ex-dividend date is not eligible to receive dividends.
Now suppose we own a call option on a dividend paying stock, but we don’t own
any of the actual underlying stock. Then the bad news is that we are not entitled
to get any dividends because dividends are paid only to the actual shareholders.
We can only receive the dividend if we own the stock before the ex-dividend date.
All we own is the right to become a shareholder, and we become one only when, for
instance, we exercise the call option. But all is not lost. The good news is
that not getting the dividends while holding only the option is compensated in the
price of the option. This is due to the fact that on the ex-dividend date the underlying
stock price falls by roughly the amount of the dividend. This drop makes the call
option cheaper and the put option slightly more expensive. In other words, with
dividends, the premium we have to pay for a call option is reduced while the put
option premium increases.
The Black-Scholes option pricing formula has to be adjusted when the underlying
stock pays dividends. For adjusting the price of European options, for known
discrete dividends, we merely have to subtract the present value of the dividend
from the current price of the underlying asset in calculating the Black-Scholes
value. That is, to take into account the drop in stock price on the ex-dividend date,
instead of $S(0)$ we work with $S(0) - div\,e^{-r t_{div}}$ in the Black-Scholes formula, where $div$
denotes the dividend amount per share of stock, and $t_{div}$ denotes the dividend
time. If dividends $div_1, \ldots, div_p$ are paid at discrete times $t_{div_1}, \ldots, t_{div_p}$, then
take $S(0) - \sum_{j=1}^{p} div_j\,e^{-r t_{div_j}}$ instead of $S(0)$ in the Black-Scholes formulas (4.13) and
(4.12). A word of caution: be careful to see that $S(0) > \sum_{j=1}^{p} div_j\,e^{-r t_{div_j}}$, for we have
to take the logarithm of the adjusted price while computing $d_1$ in the Black-Scholes formula.
The following examples help us to appreciate the discussed scenario.
Example 4.4.1 A stock currently trades for Rs 100 per share. The annual continuously
compounded risk-free interest rate is 5% and the annual price volatility
relevant for the Black-Scholes formula is 30%. Call options are written on this
stock with a strike price of Rs 80 and time to expiration of 5 years. The stock will
pay a dividend of Rs 20 in 2 years and another dividend of Rs 30 in 3 years. Use
the Black-Scholes formula to find the price of one such call option.
Solution Here, S(0) = Rs 100, K = Rs 80, T = 5, r = 0.05, σ = 0.3, and the dividends
are Rs 20 at the end of 2 years and Rs 30 at the end of 3 years. We first compute the
adjusted stock price by decreasing the current stock price by the present value of the
dividends. The adjusted stock price is denoted by $S_a$.
\[
S_a = S(0) - \sum_{j} div_j\,e^{-r t_{div_j}} = 100 - 20e^{-(0.05)(2)} - 30e^{-(0.05)(3)} = 56.082.
\]
Now we use the formula
\[
d_1 = \frac{\ln(S_a/K) + (r + 0.5\sigma^{2})T}{\sigma\sqrt{T}}
= \frac{\ln(56.08/80) + (0.05 + 0.5(0.3)^{2})5}{0.3\sqrt{5}} = 0.1786,
\]
\[
d_2 = d_1 - \sigma\sqrt{T} = -0.4922.
\]
Φ(d1) = Φ(0.18) = 0.5714, Φ(d2) = 1 − Φ(−d2) = 1 − 0.6879 = 0.3121.
\[
C(0) = S_a\Phi(d_1) - Ke^{-rT}\Phi(d_2).
\]
Thus,
C(0) = (56.082)(0.5714) − (62.304)(0.3121) = Rs 12.60.
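The computation in Example 4.4.1 can be reproduced with a few lines of code. The sketch below uses our own function names and evaluates Φ exactly through the error function rather than from a two-decimal table, so the result (about Rs 12.62) differs slightly from the table-based value above.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_discrete_dividends(S0, K, r, sigma, T, dividends=()):
    """European call with known dividends; `dividends` holds (amount, time) pairs."""
    # Adjusted price: current price minus the present value of all dividends.
    Sa = S0 - sum(div * exp(-r * t) for div, t in dividends)
    if Sa <= 0.0:
        raise ValueError("present value of dividends must be below S(0)")
    d1 = (log(Sa / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return Sa * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Data of Example 4.4.1: Rs 20 paid at t = 2 and Rs 30 paid at t = 3.
price = bs_call_discrete_dividends(100.0, 80.0, 0.05, 0.3, 5.0, [(20.0, 2.0), (30.0, 3.0)])
print(round(price, 2))
```

Passing an empty dividend list gives the ordinary Black-Scholes call price, which makes it easy to see how each dividend lowers the call value.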
Remark 4.4.1 Suppose in the above example the underlying stock pays no dividend
in the first two years but pays a single dividend of Rs 30 in 3 years.
Then,
Sa = 74.1787, d1 = 0.5955, d2 = −0.0753.
Φ(d1) = 0.7257, Φ(d2) = 1 − Φ(−d2) = 0.4721 and C(0) = Rs 24.42.
Now assume the stock pays no dividend in the entire lifespan of 5 years, that is,
div = 0. Then
\[
d_1 = \frac{\ln(100/80) + (0.05 + 0.5(0.3)^{2})5}{0.3\sqrt{5}} = 1.0407, \quad d_2 = 0.3699,
\]
Φ(d1) = 0.8508, Φ(d2) = 0.6443,
and hence C(0) = Rs 44.94.
Thus, the dividend paid by the underlying stock is reflected in a reduction of the
price of the European call option.
Remark 4.4.2 Working with the same data as in Example 4.4.1, we assume that
a European put has been offered on the stock. Although we can directly compute
the put price using the Black-Scholes formula for the European put option, we use the
put-call parity with dividends to do so.
(i) In case the stock pays dividends exactly in the same manner as described in
Example 4.4.1, then
\[
P(0) = C(0) - \Big(S(0) - \sum_{j} div_j\,e^{-r t_{div_j}} - Ke^{-rT}\Big) = 12.60 - (56.082 - 62.304) = \text{Rs } 18.82.
\]
(ii) Consider the other case when the stock pays no dividends in the first 2 years
and pays a single dividend of Rs 30 in 3 years. Then, with C(0) = Rs 24.42,
\[
P(0) = C(0) - (S(0) - div\,e^{-r t_{div}} - Ke^{-rT}) = 24.42 - (74.179 - 62.304) \approx \text{Rs } 12.54.
\]
(iii) If the stock pays no dividend in 5 years then $P(0) = C(0) - (S(0) - Ke^{-rT})$ = Rs 7.24.
This clearly illustrates that the European put option becomes more costly on a
dividend paying underlying stock.
It is worth noting that if the underlying stock pays a continuous dividend at a
rate $\delta_{div}$, then the price of the European call option on such a stock is computed
as follows.
\[
C(0) = S(0)e^{-\delta_{div}T}\Phi(d_1) - Ke^{-rT}\Phi(d_2),
\]
\[
d_1 = \frac{\ln\left(\dfrac{S(0)e^{-\delta_{div}T}}{K}\right) + \left(r + \dfrac{\sigma^{2}}{2}\right)T}{\sigma\sqrt{T}},
\qquad d_2 = d_1 - \sigma\sqrt{T}.
\]
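The call formula with a continuous dividend rate translates directly into code. This is a sketch with hypothetical names and data; `delta_div` stands for the dividend rate δdiv.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_continuous_dividend(S0, K, r, sigma, T, delta_div):
    """European call on a stock paying a continuous dividend at rate delta_div."""
    S_adj = S0 * exp(-delta_div * T)   # dividend-adjusted stock price
    d1 = (log(S_adj / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S_adj * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Hypothetical data: the same option with and without a 2% dividend rate.
print(bs_call_continuous_dividend(100.0, 100.0, 0.05, 0.2, 1.0, 0.02))
print(bs_call_continuous_dividend(100.0, 100.0, 0.05, 0.2, 1.0, 0.0))
```

Setting `delta_div = 0` recovers the price for a non-dividend paying stock, and a positive rate lowers the call price.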
Similarly the price of a European put option is given by
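\[
\frac{\partial d_1}{\partial t} = \frac{1}{2\sigma\sqrt{T-t}}\left(\frac{\ln(S/K)}{T-t} - r - \frac{\sigma^{2}}{2}\right). \qquad (4.17)
\]
\[
\frac{\partial d_1}{\partial S} = \frac{1}{S\sigma\sqrt{T-t}}. \qquad (4.18)
\]
\[
\frac{\partial d_1}{\partial r} = \frac{\sqrt{T-t}}{\sigma}. \qquad (4.19)
\]
\[
\frac{\partial d_1}{\partial \sigma} = \sqrt{T-t} - \frac{d_1}{\sigma}. \qquad (4.20)
\]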
We encourage the readers to compute the above expressions.
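One way to gain confidence in (4.17) to (4.20) is to compare them with central finite differences of d1. The sketch below assumes the usual definition d1 = (ln(S/K) + (r + σ²/2)(T − t))/(σ√(T − t)); all names and the sample data are ours.

```python
from math import log, sqrt

def d1(S, K, r, sigma, T, t):
    tau = T - t
    return (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))

def d1_partials(S, K, r, sigma, T, t):
    """Analytic partial derivatives of d1 as in (4.17)-(4.20)."""
    tau = T - t
    return {
        "t": (log(S / K) / tau - r - 0.5 * sigma**2) / (2.0 * sigma * sqrt(tau)),
        "S": 1.0 / (S * sigma * sqrt(tau)),
        "r": sqrt(tau) / sigma,
        "sigma": sqrt(tau) - d1(S, K, r, sigma, T, t) / sigma,
    }

def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

S, K, r, sigma, T, t = 100.0, 95.0, 0.04, 0.3, 0.5, 0.1   # hypothetical data
analytic = d1_partials(S, K, r, sigma, T, t)
numeric = {
    "t": central_difference(lambda x: d1(S, K, r, sigma, T, x), t),
    "S": central_difference(lambda x: d1(x, K, r, sigma, T, t), S),
    "r": central_difference(lambda x: d1(S, K, x, sigma, T, t), r),
    "sigma": central_difference(lambda x: d1(S, K, r, x, T, t), sigma),
}
for name in analytic:
    print(name, analytic[name], numeric[name])
```

Each analytic value should match its finite-difference counterpart to several decimal places.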
\[
\Theta = \frac{\partial C}{\partial t} = S\Phi'(d_1)\frac{\partial d_1}{\partial t} - rKe^{-r(T-t)}\Phi(d_2) - Ke^{-r(T-t)}\Phi'(d_2)\left(\frac{\partial d_1}{\partial t} + \frac{\sigma}{2\sqrt{T-t}}\right).
\]
We can plug in the values of the partial derivatives from (4.16) and (4.17) to
get the final expression for Θ. Moreover, using the put-call parity, we can easily
compute the Θ of the non-dividend paying European put option.
Perhaps the most significant Greek is the delta, denoted by ∆. For the European
call option, it is defined to be the partial derivative of the call price C with respect
to the price of the stock (the underlying security). From (4.14), and noting
that ∂d2/∂S = ∂d1/∂S, we have
\[
\Delta = \frac{\partial C}{\partial S} = \Phi(d_1) + \left(S\Phi'(d_1) - Ke^{-r(T-t)}\Phi'(d_2)\right)\frac{\partial d_1}{\partial S}. \qquad (4.21)
\]
We now simplify the second term in the expression (4.21). Using (4.16) and (4.18), we obtain
\[
\left(S\Phi'(d_1) - Ke^{-r(T-t)}\Phi'(d_2)\right)\frac{\partial d_1}{\partial S}
= \frac{1}{S\sigma\sqrt{2\pi(T-t)}}\left(Se^{-d_1^{2}/2} - Ke^{-r(T-t) - d_2^{2}/2}\right)
\]
\[
= \frac{1}{S\sigma\sqrt{2\pi(T-t)}}\left(Se^{-d_1^{2}/2} - Ke^{-r(T-t) - \frac{(d_1 - \sigma\sqrt{T-t})^{2}}{2}}\right)
\]
\[
= \frac{e^{-d_1^{2}/2}}{S\sigma\sqrt{2\pi(T-t)}}\left(S - Ke^{-(r + \sigma^{2}/2)(T-t) + \sigma d_1\sqrt{T-t}}\right)
\]
\[
= \frac{e^{-d_1^{2}/2}}{S\sigma\sqrt{2\pi(T-t)}}\left(S - Ke^{\ln(S/K)}\right)
\]
\[
= 0.
\]
Therefore, from (4.21) we get ∆ = Φ(d1 ) for the non-dividend paying European
call. Notice that ∆ > 0, and it is less than 1. Using the put-call parity we can easily
work out that for a non-dividend paying European put option ∆ = Φ(d1 ) − 1.
Example 4.5.1 The current price of a stock is Rs 100 and its volatility is 30%
per year. The risk-free interest rate is 4% per year. A portfolio is constructed
consisting of one 6-month European call option with a strike price of Rs 80 and
the cash obtained from shorting ∆ shares of the stock. The portfolio value is non-
random. What is ∆?
Solution Since the portfolio value C − ∆S is non-random, it must be insensitive to
changes in S, that is, ∂(C − ∆S)/∂S = 0. Thus
\[
\frac{\partial C}{\partial S} = \Delta = \Phi(d_1),
\]
where
\[
d_1 = \frac{\ln(100/80) + (0.04 + (0.3)^{2}/2)(0.5 - 0)}{0.3\sqrt{0.5 - 0}} = 1.2522.
\]
Consequently, ∆ = 0.894752.
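The delta in Example 4.5.1 can be evaluated directly. In the sketch below the function names are ours, and the put delta uses the relation ∆ = Φ(d1) − 1 stated earlier for the non-dividend paying European put.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_delta(S, K, r, sigma, tau):
    """Delta of a European call on a non-dividend paying stock; tau = T - t."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return norm_cdf(d1)

def put_delta(S, K, r, sigma, tau):
    """Delta of the corresponding European put, from the put-call parity."""
    return call_delta(S, K, r, sigma, tau) - 1.0

# Data of Example 4.5.1.
print(call_delta(100.0, 80.0, 0.04, 0.3, 0.5))
print(put_delta(100.0, 80.0, 0.04, 0.3, 0.5))
```

The call delta agrees with the value in the example up to the rounding of d1.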
Before we proceed in describing the other Greeks, we would like to get more
comfortable as to what roles these Greeks play in option theory. We present a
situation below in this context.
A company has sold a 3-month European call option on 1,000 shares of a non-dividend
paying stock at a strike price of Rs 100. The share is currently trading at
Rs 95 per unit, the risk-free interest rate is 5% per annum, and the market
volatility is estimated at 20% per annum. Using (4.14) and (4.15) we can easily
compute the value of the 1,000 call options, C(0) = Rs 2269.30. The company is the
writer of the calls, so it is exposed to a risk on the call options which is the difference
between the value of the call premium (for 1,000 calls) invested at the risk-free
rate of interest for 3 months and the value of the calls at the exercise time. Thus
the risk profile of the company is $C(0)e^{rT} - N\max\{S(T) - K, 0\}$ = Rs $(2297.84 - 1000\max\{S(\tfrac{1}{4}) - 100, 0\})$.
A natural question is: how does the company hedge
this risk? If it decides not to take any action (called the naked position) then
the company stands to lose if the stock price rises from the present Rs 95. For instance,
if $S(\tfrac{1}{4})$ = Rs 105, then the company loses Rs 2,702.16 (the amount may appear
to be meagre but the numeric data in the example can be changed to make this
figure large). On the other hand, if the company decides to buy 1,000 shares
of the underlying stock at t = 0 (called the covered position), then it stands
to lose if the stock price decreases. For instance, if $S(\tfrac{1}{4})$ = Rs 90, the company
loses Rs 2,702.16 (the fall of Rs 5,000 in the value of the shares less Rs 2,297.84
from the premium). Both positions expose the company to risk! One can easily think
of designing some other strategy, like waiting and buying 1,000 shares as soon as the
price of the stock reaches Rs 100 and selling 1,000 shares as soon as the price of the
stock declines below Rs 100. But such simple hedging strategies do not always work well.
What we are looking for is a more concrete and workable strategy to hedge the
risk positions in options created due to changes in the stock price. It is here that
the delta, ∆, becomes a handy tool. In the immediate aforementioned situation,
the company has S(0) = Rs 95, C(0) = Rs 2,269.30, and ∆ = Φ(d1) = Φ(−0.338) =
0.3685. By linear approximation from the classical calculus, we have dC = ∆ dS.
So, if the stock price goes up by Rs 1 (i.e. it becomes Rs 96) then the call price
increases approximately by Rs 0.3685, or Rs 368.50 for 1,000 call options. Now,
since the company is the writer of 1,000 calls, it must own 368.50 shares of stock
so that a change of one unit in the stock price is offset by the change in the short
call.
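The writer's position can be explored numerically. The sketch below is illustrative only: the function names are ours, the future value of the premium (Rs 2297.84) is taken from the text, and the covered position ignores the cost of financing the share purchase, as the figures in the text do. The delta is computed with exact Φ, so the number of hedge shares may differ slightly from the rounded figure quoted above.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_delta(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return norm_cdf(d1)

def naked_pnl(S_T, K, n_calls, premium_fv):
    """Writer's profit at expiry when no hedge is held."""
    return premium_fv - n_calls * max(S_T - K, 0.0)

def covered_pnl(S_T, S0, K, n_calls, premium_fv):
    """Writer's profit at expiry when n_calls shares were bought at S0."""
    return naked_pnl(S_T, K, n_calls, premium_fv) + n_calls * (S_T - S0)

premium_fv = 2297.84                                   # C(0) e^{rT}, from the text
print(naked_pnl(105.0, 100.0, 1000, premium_fv))       # stock rises to Rs 105
print(covered_pnl(90.0, 95.0, 100.0, 1000, premium_fv))  # stock falls to Rs 90
delta = call_delta(95.0, 100.0, 0.05, 0.2, 0.25)
print(1000 * delta)                                    # shares needed for a delta hedge
```

Both unhedged choices lose the same amount under the two scenarios, whereas the delta hedge holds only a fraction of the 1,000 shares and is adjusted as the stock price moves.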
Generally speaking, a writer (short position) of any derivative security can take
a hedge position against risk in the security whose price depends on the underlying
stock. An important point is that the writer should frequently re-balance the
hedge position by constantly monitoring the underlying stock price movements.
A portfolio embedding the derivative security can be constructed such that it
becomes delta-neutral, where by a delta-neutral portfolio we mean a portfolio
having delta equal to zero. Traders usually ensure that their portfolios are
delta-neutral at least once a day. A trading strategy that dynamically maintains a
delta-neutral portfolio is called delta hedging; it is based on the simple
trading rule of “buy high, sell low”.
Continuing with our discussion on Greeks, another important Greek is the
gamma, denoted by Γ. The gamma for the non-dividend paying European call
option is the second order partial derivative of C with respect to S, thus $\Gamma = \dfrac{\partial^{2} C}{\partial S^{2}}$.
Obviously,
\[
\Gamma = \frac{\partial^{2} C}{\partial S^{2}} = \Phi'(d_1)\frac{\partial d_1}{\partial S} = \frac{e^{-d_1^{2}/2}}{S\sigma\sqrt{2\pi(T-t)}}.
\]
Note that since Γ > 0 for the European call option, C is a convex function
of S, keeping the other parameters constant. Again, using the put-call parity one can
work out the gamma of the European put option. We leave this for the readers to
complete.
Just like delta hedging, there is another hedging strategy called
delta-gamma hedging. A delta-gamma hedge is a delta hedge that also maintains zero
portfolio gamma. The additional gamma neutrality is maintained by constructing a
portfolio comprising a short position in the derivative security, a long position
in the underlying stock, and a long position in another hedging call such that the
delta and the gamma of the portfolio become zero. Note that the delta hedge is
based on the first-order approximation; when the change in S is not small enough,
the second-order approximation (or gamma) helps. For this reason, a delta-gamma
neutral portfolio generally offers better protection against stock price changes
than a simple delta neutral portfolio.
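Also, we shall see later in Chapter 10 that the Black-Scholes PDE is
\[
\frac{\partial f}{\partial t} + \frac{1}{2}\sigma^{2}(S(t))^{2}\frac{\partial^{2} f}{\partial S^{2}} + rS(t)\frac{\partial f}{\partial S} = rf.
\]
Since $\dfrac{\partial f}{\partial t} = \Theta$, $\dfrac{\partial f}{\partial S} = \Delta$, and $\dfrac{\partial^{2} f}{\partial S^{2}} = \Gamma$, the Black-Scholes PDE can be thought
of as
\[
\Theta + rS(t)\Delta + \frac{1}{2}\sigma^{2}S^{2}\Gamma = rf. \qquad (4.22)
\]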
Here, f in (4.22) can be price of any derivative security which is governed by
the Black-Scholes formulas (4.14) and (4.15). Also, (4.22) provides a relationship
between the three Greeks Θ, ∆ and Γ. The readers can skip this discussion for
the time being and can come back to it after going through Chapter 10. It will
provide better understanding then. But we have included this text here for the
sake of completeness.
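Relation (4.22) can be verified numerically for the European call. The sketch below uses the simplified form Θ = −SΦ'(d1)σ/(2√(T − t)) − rKe^{−r(T−t)}Φ(d2), which follows from the expression for Θ above once the identity SΦ'(d1) = Ke^{−r(T−t)}Φ'(d2) is used; the function name and the sample data are ours.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_price_and_greeks(S, K, r, sigma, tau):
    """Price, delta, gamma and theta of a European call; tau = T - t."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    pdf = exp(-0.5 * d1 * d1) / sqrt(2.0 * pi)      # Phi'(d1)
    price = S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)
    delta = norm_cdf(d1)
    gamma = pdf / (S * sigma * sqrt(tau))
    theta = -S * pdf * sigma / (2.0 * sqrt(tau)) - r * K * exp(-r * tau) * norm_cdf(d2)
    return price, delta, gamma, theta

S, K, r, sigma, tau = 95.0, 100.0, 0.05, 0.2, 0.25   # hypothetical data
price, delta, gamma, theta = call_price_and_greeks(S, K, r, sigma, tau)
# Left-hand side minus right-hand side of (4.22); should vanish up to rounding.
residual = theta + r * S * delta + 0.5 * sigma**2 * S**2 * gamma - r * price
print(residual)
```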
The fourth Greek that we would like to get familiar with is called the vega,
which is the partial derivative with respect to the volatility σ. It is denoted by V. For
the European call option it is $\dfrac{\partial C}{\partial \sigma}$. From (4.14), we have
\[
V = \frac{\partial C}{\partial \sigma} = S\Phi'(d_1)\frac{\partial d_1}{\partial \sigma} - Ke^{-r(T-t)}\Phi'(d_2)\left(\frac{\partial d_1}{\partial \sigma} - \sqrt{T-t}\right).
\]
Using (4.20), we get
\[
V = \frac{\sqrt{T-t}}{\sqrt{2\pi}}\,Se^{-d_1^{2}/2} + \left(Ke^{-r(T-t)}\Phi'(d_2) - S\Phi'(d_1)\right)\frac{d_1}{\sigma}
= \frac{\sqrt{T-t}}{\sqrt{2\pi}}\,Se^{-d_1^{2}/2}.
\]
The latter equation follows on account of $Ke^{-r(T-t)}\Phi'(d_2) - S\Phi'(d_1) = 0$. For the detailed
working of this point, we urge the readers to look back at the analogous
step in the computation of ∆ = Φ(d1) for the European call option. It is immediate
from the put-call parity that the vega of the European put option is identical to
that of the European call option.
Example 4.5.2 Consider a 3-month European put option on a stock whose cur-
rent value is Rs 100 and whose volatility is 30% per annum. The option has a
strike price of Rs 95 and the risk-free interest rate is 3.25% per annum. Find the
vega of the option. If the volatility of the stock increases to 31%, approximate the
change in the value of the put.
Solution We provide hints for the solution; the gaps can be filled by the readers.
First compute d1 using (4.15), thereafter compute $V = \dfrac{\sqrt{T-t}}{\sqrt{2\pi}}\,Se^{-d_1^{2}/2}$ for the put
option. Use the linear approximation dP = V dσ to find the approximate change in
the value of the put.
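The hints for Example 4.5.2 can be carried out as follows. This is a sketch with our own function names; it also reprices the put at σ = 31% so that the linear approximation can be compared with the actual change.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d1_d2(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)

def vega(S, K, r, sigma, tau):
    """Vega of a European call or put on a non-dividend paying stock."""
    d1, _ = d1_d2(S, K, r, sigma, tau)
    return sqrt(tau) / sqrt(2.0 * pi) * S * exp(-0.5 * d1 * d1)

def bs_put(S, K, r, sigma, tau):
    d1, d2 = d1_d2(S, K, r, sigma, tau)
    return K * exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)

# Data of Example 4.5.2.
S, K, r, sigma, tau = 100.0, 95.0, 0.0325, 0.30, 0.25
v = vega(S, K, r, sigma, tau)
approx_change = v * 0.01                                   # dP = V d(sigma)
actual_change = bs_put(S, K, r, 0.31, tau) - bs_put(S, K, r, sigma, tau)
print(v, approx_change, actual_change)
```

The approximate and actual changes should be close, since a one percentage point move in σ is small.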
If the volatility changes then the delta-gamma hedge may not work well. An
enhancement is the delta-gamma-vega hedge, which also keeps the vega of the
portfolio at zero. This is accomplished by bringing one more security into the portfolio.
The final Greek to be introduced herein is the rho. It is denoted by ρ and is the
partial derivative with respect to the risk-free interest rate r. For the European
call option it is described as follows.
\[
\rho = \frac{\partial C}{\partial r} = S\Phi'(d_1)\frac{\partial d_1}{\partial r} + K(T-t)e^{-r(T-t)}\Phi(d_2) - Ke^{-r(T-t)}\Phi'(d_2)\frac{\partial d_2}{\partial r}
= K(T-t)e^{-r(T-t)}\Phi(d_2).
\]
Again the last equation follows by an argument similar to the one given above, along
with (4.19) and the fact that $\dfrac{\partial d_2}{\partial r} = \dfrac{\partial d_1}{\partial r}$. We encourage you to find the ρ of the
European put.
Example 4.5.3 Consider a 3-month European put option on a stock whose cur-
rent value is Rs 100 and whose volatility is 30% per annum. The option has a
strike price of Rs 90 and the risk-free interest rate is 3.25% per annum. Find the
rho of the option. If the interest rate increases to 4%, then approximate the change
in the value of the put option.
Solution Given that K = Rs 90, S(0) = Rs 100, r = 0.0325, σ = 0.3, T = 3/12 = 0.25,
t = 0. We first compute d1 as follows
\[
d_1 = \frac{\ln(S/K) + (r + \sigma^{2}/2)(T-t)}{\sigma\sqrt{T-t}} = 0.83157.
\]
The rho of the European put is given by
\[
\rho = -K(T-t)e^{-r(T-t)}\Phi(\sigma\sqrt{T-t} - d_1) = (-22.31793)\Phi(-0.68157) = -5.52939.
\]
Using the linear approximation from the classical calculus, the change in value of
the put option is given by
\[
dP = \rho\,dr = (-5.52939)(0.04 - 0.0325) \approx -0.0415.
\]
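The rho of the put in Example 4.5.3 and the resulting first-order change in the put price can be computed as sketched below. The function names are ours; the put is also repriced at r = 4% to show how close the linear approximation is.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d1_d2(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)

def put_rho(S, K, r, sigma, tau):
    """rho = -K tau e^{-r tau} Phi(sigma sqrt(tau) - d1) for the European put."""
    _, d2 = d1_d2(S, K, r, sigma, tau)
    return -K * tau * exp(-r * tau) * norm_cdf(-d2)

def bs_put(S, K, r, sigma, tau):
    d1, d2 = d1_d2(S, K, r, sigma, tau)
    return K * exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)

# Data of Example 4.5.3.
S, K, r, sigma, tau = 100.0, 90.0, 0.0325, 0.3, 0.25
rho = put_rho(S, K, r, sigma, tau)
approx_change = rho * (0.04 - r)                       # dP = rho dr
actual_change = bs_put(S, K, 0.04, sigma, tau) - bs_put(S, K, r, sigma, tau)
print(rho, approx_change, actual_change)
```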
can take sufficiently small time steps in the CRR binomial model. Some nu-
merical methods for solving partial differential equations can also be applied
for computing American option prices. For more details on these tools, we refer
to [57, 70].
• Another interesting area where the Black-Scholes theory has been extended
is that of foreign currency options. The potential of the market for foreign currency
options has been fully tapped in the United States of America, Europe and
some other parts of the world. These markets are used to hedge foreign currency
risks. For example, with a currency option, one can insure against the adverse
effects of changes in foreign exchange rates. A modified Black-Scholes formula
has been developed by Garman and Kohlhagen [48]. The formula involves
both the foreign and domestic interest rates. The shortcomings of the proposed
model have also been rectified in some subsequent studies. One can take a look
at the research article [48] for further insight into this concept.
• The volatility σ is the only parameter in the Black-Scholes formula which is
not explicitly observable in the market. Although we stated that we can use
the historical volatility, it has been observed that the theoretical value of the
option computed from the Black-Scholes formula fails to match the actual
option price quoted in financial markets for that option. There could be more
than one cause for this difference. However, the non-availability of the actual market
value of σ is a key reason for it. This forces us to have a deeper look at this
issue. We then ask a reverse question: knowing the option price, say C(0), along
with the other parameters S(0), K, T, r in the Black-Scholes formula, can we compute
σ for that stock? The σ so obtained is called the implied volatility, and should be
distinguished from the historical volatility. Notably, there is no direct formula to
obtain σ from the Black-Scholes formula as we do not have an explicit formula
for inverting the Φ function. In other words, the Black-Scholes formula cannot
be inverted to get an explicit expression for σ in terms of C(0), S(0), K, r, T. Again,
numerical techniques such as the Newton-Raphson method for computing a root
of an equation can come to our rescue.
• One of the critical limitations of the CRR binomial model is that the volatility is
assumed to be constant on all the nodes of the binomial lattice. To acknowledge
the critical role of implied volatility and to model option prices consistent
with the market, many new approaches have been proposed. Rubinstein [115]
proposed the concept of the implied binomial tree (IBT in short), which has
been extended by Derman, Kani and Chriss [38]. The principal idea in these
studies is to compute the local volatility at each node of the tree. This is done by
using the Arrow-Debreu prices. Consequently, unlike in the CRR model, the
risk neutral probability also changes at each node. For further details on
IBT, please refer to [31].
• In recent years, innumerable contributions to lattice approaches have been
published. We would like to share some of them with our readers. Jarrow and
Rudd [68] constructed a binomial model where the first two moments, mean
and variance, of the discrete and continuous models coincide. Boyle [20]
constructed a trinomial lattice. Tian [136] proposed binomial and trinomial models
where the model parameters are derived as unique solutions to some equation
systems derived from the first three moments. Leisen and Reimer [82] changed
the formulas used to determine the constant up and down factors, and proposed a
new binomial model which converges to the Black-Scholes formula with a higher
order of convergence than the previously quoted methods. The main limitation
of almost all lattice models is their relatively slow speed. They do not offer
a practical solution for the calculation of thousands of prices in a few seconds.
Rapid calculations are the need of today’s market. In fact the issue of the rate of
convergence of different lattice methods is crucial and has attracted attention
of late.
• The Black-Scholes no-arbitrage argument fails to capture the correct picture
of a financial market involving proportional transaction costs. In such a market,
there is no portfolio that can replicate the European call option. Thus the
argument of a replicating portfolio, as explained in the previous chapter, is no
longer applicable. This makes it harder to relax the assumption of “no
transaction cost” in binomial models, and consequently in the Black-Scholes
theory. Some alternative approaches have been suggested in the literature for option
pricing with transaction costs. These studies are beyond the scope of the present
discussion. Interested readers can refer to the works of Boyle and Vorst [20],
Leland [81], Sonar et al. [125] and Perrakis and Lefoll [108].
• While discussing the matching of the CRR model with a multi-period binomial
model in Section 4.3, it has been assumed that ud = 1, i.e. $S(0)^{2} = (uS(0))(dS(0))$.
In other words, S(0) is the geometric mean of uS(0) and dS(0).
This opens the possibility of using the arithmetic mean and the harmonic mean of
uS(0) and dS(0) as well. We may refer to Chawala [26] for further details in
this regard.
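The implied volatility computation mentioned in the notes above can be sketched with the Newton-Raphson method, using the vega as the derivative of the call price with respect to σ. The starting value, tolerance and function names below are our own assumptions, and the sample data are borrowed from Exercise 4.1 purely for illustration.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S0, K, r, sigma, T):
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def bs_vega(S0, K, r, sigma, T):
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S0 * sqrt(T) * exp(-0.5 * d1 * d1) / sqrt(2.0 * pi)

def implied_volatility(price, S0, K, r, T, sigma0=0.2, tol=1e-10, max_iter=100):
    """Solve bs_call(sigma) = price for sigma by Newton-Raphson iteration."""
    sigma = sigma0
    for _ in range(max_iter):
        diff = bs_call(S0, K, r, sigma, T) - price
        if abs(diff) < tol:
            return sigma
        v = bs_vega(S0, K, r, sigma, T)
        if v < 1e-12:                            # derivative too flat to continue
            break
        sigma = max(sigma - diff / v, 1e-6)      # keep the iterate positive
    raise ValueError("Newton-Raphson iteration did not converge")

# Round trip: price a call at sigma = 30%, then recover sigma from the price.
quoted = bs_call(51.0, 50.0, 0.08, 0.30, 0.25)
print(implied_volatility(quoted, 51.0, 50.0, 0.08, 0.25))
```

In practice the quoted market price takes the place of `quoted`; the iteration may fail for prices that violate the no-arbitrage bounds, in which case no implied volatility exists.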
4.7 Exercises
Exercise 4.1 Consider the following data: S(0) = Rs 51, K = Rs 50, σ =
30%, r = 8%. Assuming the Black-Scholes framework, and that the stock pays
no dividend, compute the 3-month European call price and the 3-month European put
price using the Black-Scholes formula. Also compute the put price using the put-
call parity. Are the two values the same?
Exercise 4.2 The price of a stock is Rs 260. A 6-month European call option on
the stock with strike price Rs 256 is priced using Black-Scholes formula. It is given
that the continuously compounded risk-free rate is 4%; the stock pays no dividend;
the volatility of the stock is 25%. Determine the price of the call option.
Exercise 4.3 You own 100 shares of a stock whose current price is Rs 42. You
would like to hedge your downside exposure by buying 6-month European put option
with a strike price of Rs 40. It is given that the continuously compounded risk-free
rate is 5%; the stock pays no dividends; the stock volatility is 22%. Assuming the
Black-Scholes framework determine the cost of the put option.
Exercise 4.4 Consider the purchase of 100 units of a 3-month Rs 25-strike European
call option. It is given that the stock is currently selling for Rs 20; the continuously
compounded risk-free interest rate is 5%; the stock's volatility is 24% per annum. If
the stock pays dividends continuously at the rate of 3% per annum, determine the
price of the block of 100 call options, assuming the Black-Scholes framework.
Exercise 4.5 For European call and put options on a stock having the same expiry
and strike price, it is given that the stock price is Rs 85; the strike price is Rs 90;
the continuously compounded risk free rate is 4%; the continuously compounded
dividend rate on the stock is 2%. If the call option has premium Rs 9.91 and a put
option has premium Rs 12.63, then determine the time to expiry for the options.
Exercise 4.6 Consider a 1-year European call option on a non-dividend paying
stock which is currently priced at Rs 40. The call strike price is Rs 45 and the
continuously compounded risk-free rate is 5%. It has been observed that if the stock
price increases by Rs 0.50, the price of the option increases by Rs 0.25. Assuming
the price of the stock follows the Black-Scholes framework, determine the implied
volatility of the stock.
Exercise 4.7 Consider a stock which is currently trading at Rs 100.
(i) Assuming the stock pays no dividend, compute the Black-Scholes European call
price for T years to maturity with a strike price Rs 120, stock price volatility
30%, continuous compounding return 8%.
(ii) What happens to the option price as T → ∞?
(iii) Suppose the same stock pays a dividend of 0.1%. Repeat (i).
5.1 Introduction
The readers have already come across certain terms like portfolio, portfolio return,
and portfolio risk in Chapter 1. There, while giving example of a simple portfolio
optimization problem, it was remarked that we need to minimize the portfolio risk
for given aspiration level of portfolio returns and thereby do a trade-off between
the two. This has precisely been the approach of the celebrated mean-variance
theory of Markowitz [90] for the single period portfolio optimization problem. The
aim of this chapter is to continue this discussion in greater detail for a general
n-asset problem and discuss other related results, in particular the capital asset
pricing model (CAPM).
The risk can either be zero, implying that the asset is risk-free, or positive,
implying the asset is risky. If the asset is risk-free then the future value of the
asset is known with certainty otherwise the future value of the risky asset is
uncertain. Any financial asset can thus be classified either as risk-free asset (like a
fixed deposit) or as a risky asset (like share of a stock or share of a mutual fund).
Unfortunately there is no unique measure of risk of an asset. In practice we
most often measure risk of an asset in terms of variance of returns on that asset,
but that may not be the best thing to do. However, unless otherwise stated, the
variance will always be used as a risk measure. But we shall have a much deeper
discussion on various risk measures in the next chapter.
In order to obtain the mathematical model of a typical portfolio optimization
problem we make certain assumptions. These are as follows
(i) The prices of all assets at any time are strictly positive.
(ii) The return r of an asset is a random variable.
(iii) An investor can own a fraction of an asset. This assumption is known as
divisibility. Also, we assume that this fraction can be positive or negative, i.e.
short selling is allowed. We shall mention specifically if short selling is not
allowed.
(iv) An asset can be bought or sold on demand in any quantity at the market
price. This assumption is known as liquidity.
(v) No arbitrage principle holds.
(vi) Unless stated otherwise, there are no commissions/transaction costs.
The quantity $r_{\Theta}(T) = \dfrac{V_{\Theta}(T) - V_{\Theta}(0)}{V_{\Theta}(0)}$ is referred to as the return of the portfolio Θ.
Definition 5.3.2 (Asset Weights) The weight $w_i$ of the asset $a_i$ is the proportion
of the value of the asset in the portfolio at t = 0, i.e.
\[
w_i = \frac{x_i V_i(0)}{\sum_{j=1}^{n} x_j V_j(0)} \quad (i = 1, \ldots, n).
\]
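The two definitions can be illustrated with a small computation. In the sketch below, the function names and the two-asset data (including a short position in the second asset) are hypothetical.

```python
def asset_weights(units, prices0):
    """Weights w_i = x_i V_i(0) / sum_j x_j V_j(0)."""
    values = [x * v for x, v in zip(units, prices0)]
    total = sum(values)
    if total == 0:
        raise ValueError("the initial portfolio value must be non-zero")
    return [value / total for value in values]

def portfolio_return(units, prices0, pricesT):
    """Return (V(T) - V(0)) / V(0) of the portfolio holding `units`."""
    v0 = sum(x * v for x, v in zip(units, prices0))
    vT = sum(x * v for x, v in zip(units, pricesT))
    return (vT - v0) / v0

units = [10, -5]            # long 10 units of asset 1, short 5 units of asset 2
prices0 = [20.0, 10.0]      # hypothetical prices at t = 0
pricesT = [22.0, 9.0]       # hypothetical prices at t = T
w = asset_weights(units, prices0)
asset_returns = [(pT - p0) / p0 for p0, pT in zip(prices0, pricesT)]
print(w, sum(w))
print(portfolio_return(units, prices0, pricesT))
print(sum(wi * ri for wi, ri in zip(w, asset_returns)))
```

The weights add up to one, a short position gives a negative weight, and the portfolio return coincides with the weighted sum of the individual asset returns.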
Here the covariance term $\sigma_{ij}$ equals $\rho_{ij}\sigma_i\sigma_j$, where $\mathrm{Var}(r_i) = \sigma_i^{2}$, $\mathrm{Var}(r_j) = \sigma_j^{2}$ and
$\rho_{ij}$ is the correlation coefficient between $r_i$ and $r_j$.
In practice, the mean of the portfolio return is simply referred to as the return of
the portfolio. Also the variance, or rather the standard deviation, of the portfolio
return is referred to as the risk of the portfolio. Therefore, given a portfolio A :
(w1, w2, . . . , wn), we can compute its mean µA and standard deviation σA and
therefore get the point A : (σA, µA) in the (σ, µ)-plane. Thus irrespective of the number
of assets, a portfolio can always be identified as a point in the (σ, µ)-plane. This
representation is called the (σ, µ)-diagram or (σ, µ)-graph (see Fig. 5.1) and is very
convenient for further discussion.
The portfolio optimization problem refers to the problem of determining weights
wi (i = 1, 2, . . . , n), such that the return of the portfolio is maximum and the risk of
the portfolio is minimum. Thus we aim to solve the following optimization problem
\[
\text{Min } \sum_{i,j=1}^{n} w_i w_j \sigma_{ij}
\]
and
\[
\text{Max } \sum_{i=1}^{n} w_i \mu_i
\]
subject to
\[
w_1 + w_2 + \ldots + w_n = 1.
\]
The above problem is a bi-criteria optimization problem, which has been studied
extensively in the mathematical programming literature. Since, in general, the two
objective functions may not attain their optimum at the same point, it is not
possible to define the meaning of optimization in the usual way. Therefore we consider
either (i) to minimize risk for a given level of return or (ii) to maximize return for
a given level of risk. From the algorithmic point of view the first approach is more
convenient because it results in a linearly constrained quadratic programming
problem. We shall first discuss the two asset case and then move to the multi-asset
case.
[Fig. 5.1: portfolios A (σA, µA) and B (σB, µB) shown as points in the (σ, µ)-plane.]
\[
\mu = E(w_1 r_1 + w_2 r_2) = w_1\mu_1 + w_2\mu_2, \qquad (5.1)
\]
\[
\sigma^{2} = \mathrm{Var}(w_1 r_1 + w_2 r_2) = w_1^{2}\sigma_1^{2} + w_2^{2}\sigma_2^{2} + 2\rho w_1 w_2 \sigma_1\sigma_2. \qquad (5.2)
\]
Fig. 5.2. (σ̃, µ)-graph for ρ = ±1 (two panels, ρ = 1 and ρ = −1).
Fig. 5.3. (σ, µ)-graph for ρ = ±1 (two panels, ρ = 1 and ρ = −1).
Thus, to minimize the risk σ, we must choose the weight s such that $\dfrac{d\sigma^{2}}{ds} = 0$.
Thereby (5.6) yields $s_{min}$ (the value of s for which σ² is minimum) as
\[
s_{min} = \frac{\sigma_1}{\sigma_1 - \sigma_2} < 0.
\]
Hence $(1 - s_{min}) = \dfrac{\sigma_2}{\sigma_2 - \sigma_1} > 0$. Let $\mu_{min}$ and $\sigma^{2}_{min}$ respectively denote the
expected return and variance of the portfolio with $(w_1 = 1 - s_{min},\ w_2 = s_{min})$;
then
\[
\mu_{min} = \frac{\sigma_1\mu_2 - \sigma_2\mu_1}{\sigma_1 - \sigma_2} \quad \text{and} \quad \sigma^{2}_{min} = 0.
\]
Since $s_{min} < 0$ (i.e. $w_2 < 0$), an investor can eliminate risk in the portfolio by
taking a short position in asset $a_2$.
(b) When ρ = −1 and σ1 ≤ σ2 . In this case we have
Thus, we have
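\[
s_{min} = \frac{\sigma_1}{\sigma_1 + \sigma_2} > 0,
\qquad
1 - s_{min} = \frac{\sigma_2}{\sigma_1 + \sigma_2} > 0,
\]
\[
\mu_{min} = \frac{\sigma_1\mu_2 + \sigma_2\mu_1}{\sigma_1 + \sigma_2},
\qquad
\sigma^{2}_{min} = 0.
\]
Since $s_{min} > 0$ and $1 - s_{min} > 0$, the investor can eliminate the risk in the
portfolio without resorting to short selling.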
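Case (ii) We now consider the second case when −1 < ρ < 1. Recalling relations
(5.3) and (5.5), we have
\[
\mu = (1 - s)\mu_1 + s\mu_2,
\]
and
\[
\sigma^{2} = (1 - s)^{2}\sigma_1^{2} + s^{2}\sigma_2^{2} + 2\rho s(1 - s)\sigma_1\sigma_2,
\]
so that
\[
\frac{d\sigma^{2}}{ds} = 0 \ \Rightarrow\ s = \frac{\sigma_1^{2} - \rho\sigma_1\sigma_2}{\sigma_1^{2} + \sigma_2^{2} - 2\rho\sigma_1\sigma_2}.
\]
Also
\[
\frac{d^{2}\sigma^{2}}{ds^{2}} = 2\left((\sigma_1 - \rho\sigma_2)^{2} + \sigma_2^{2}(1 - \rho^{2})\right) > 0.
\]
Consequently $s_{min} = \dfrac{\sigma_1^{2} - \rho\sigma_1\sigma_2}{\sigma_1^{2} + \sigma_2^{2} - 2\rho\sigma_1\sigma_2}$, and the minimum value of σ² is given by
\[
\sigma^{2}_{min} = \frac{\sigma_1^{2}\sigma_2^{2}(1 - \rho^{2})}{\sigma_1^{2} + \sigma_2^{2} - 2\rho\sigma_1\sigma_2}.
\]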
\[
\sigma^{2}_{min} = 0 \iff \rho = -1.
\]
(ii) $\rho = \dfrac{\sigma_1}{\sigma_2} \iff s_{min} = 0 \iff \sigma^{2}_{min} = \sigma_1^{2}$.
(iii) The condition $\dfrac{\sigma_1}{\sigma_2} < \rho \le 1$ is equivalent to $s_{min} < 0$. In this case the investor
has to take a short position in asset $a_2$ in order to minimize the portfolio risk.
Further, in this case
\[
\sigma^{2}_{min} = 0 \iff \rho = 1.
\]
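The minimum variance weight and the resulting portfolio can be computed for any ρ with the formulas above. The sketch below uses our own function name, with w1 = 1 − s and w2 = s; the sample figures (σ1 = 16, σ2 = 20, µ1 = 12, µ2 = 16, all in %) are chosen only for illustration.

```python
def min_variance_two_assets(mu1, mu2, sigma1, sigma2, rho):
    """Return (s_min, mu_min, var_min) for weights w1 = 1 - s, w2 = s."""
    denom = sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2
    if denom == 0.0:
        # Happens only when rho = 1 and sigma1 = sigma2: every s gives the same risk.
        raise ValueError("minimum variance weight is not unique")
    s = (sigma1**2 - rho * sigma1 * sigma2) / denom
    mu = (1.0 - s) * mu1 + s * mu2
    var = ((1.0 - s) ** 2 * sigma1**2 + s**2 * sigma2**2
           + 2.0 * rho * s * (1.0 - s) * sigma1 * sigma2)
    return s, mu, var

for rho in (-1.0, 0.0, 0.8, 1.0):
    print(rho, min_variance_two_assets(12.0, 16.0, 16.0, 20.0, rho))
```

For these figures σ1/σ2 = 0.8, so the sign of the printed s changes from positive to negative as ρ passes through 0.8, in line with the three cases above.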
Fig. 5.5. Feasible region for two asset problem (triangle APB in the (σ, µ)-plane; curves shown for ρ = −1, −1 < ρ < 1 and ρ = 1).
The risk-return relation of two assets for various values of ρ provides us with
a triangle APB. The points A and B signify undiversified portfolios. Since −1 ≤
ρ ≤ 1, the triangle APB specifies the limit of diversification. The risk-return relation for
all values of ρ except ±1 lies within this triangle. Here the bold portions of the
graphs represent the case 0 ≤ s ≤ 1. We verify the two-asset portfolio theory by
considering the following example.
Let A and B be two assets with expected returns 12% and 16% and standard
deviations 16% and 20%, respectively. Let xA and xB denote the number of units of
asset A and asset B, respectively, in a portfolio. Then we have the following table
of the portfolio return µ and the portfolio risk σ (both in %) for various values of ρ.

xA    xB    µ       σ (ρ = 1)   σ (ρ = 0.5)   σ (ρ = 0)   σ (ρ = −0.5)   σ (ρ = −1)
100   0     12.00   16.00       16.00         16.00       16.00          16.00
90    10    12.40   16.40       15.50         14.54       13.51          12.40
80    20    12.80   16.80       15.20         13.41       11.34          8.80
70    30    13.20   17.20       15.12         12.71       9.71           5.20
60    40    13.60   17.60       15.26         12.50       8.91           1.60
50    50    14.00   18.00       15.62         12.81       9.17           2.00
40    60    14.40   18.40       16.18         13.60       10.40          5.60
30    70    14.80   18.80       16.92         14.80       12.32          9.20
20    80    15.20   19.20       17.82         16.32       14.66          12.80
10    90    15.60   19.60       18.85         18.07       17.26          16.40
0     100   16.00   20.00       20.00         20.00       20.00          20.00
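The entries of the table follow from (5.1) and (5.2). The sketch below regenerates them, assuming that xA and xB are read as percentage weights of the two assets; the function name is ours.

```python
from math import sqrt

def portfolio_point(w_b, rho, mu_a=12.0, mu_b=16.0, sigma_a=16.0, sigma_b=20.0):
    """(mu, sigma) in % of the portfolio with weight w_b in asset B and 1 - w_b in asset A."""
    w_a = 1.0 - w_b
    mu = w_a * mu_a + w_b * mu_b
    var = (w_a**2 * sigma_a**2 + w_b**2 * sigma_b**2
           + 2.0 * rho * w_a * w_b * sigma_a * sigma_b)
    return mu, sqrt(max(var, 0.0))   # guard against tiny negative rounding errors

rhos = (1.0, 0.5, 0.0, -0.5, -1.0)
for x_b in range(0, 101, 10):
    w_b = x_b / 100.0
    mu = portfolio_point(w_b, 0.0)[0]
    sigmas = [portfolio_point(w_b, rho)[1] for rho in rhos]
    print(100 - x_b, x_b, f"{mu:.2f}", " ".join(f"{s:.2f}" for s in sigmas))
```

Each printed row lists xA, xB, µ and then σ for ρ = 1, 0.5, 0, −0.5 and −1, in the same order as the columns of the table.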
find the image of any straight line in the weight hyperplane eT w = 1 under the
mapping f . For this we note that the parametric equation of any line in the weight
hyperplane is of the form
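\[ l(\xi) = (s_1\xi + b_1, \ldots, s_n\xi + b_n)^T = \xi s + b, \qquad -\infty < \xi < \infty, \]
where $s = (s_1, \ldots, s_n)^T$ and $b = (b_1, \ldots, b_n)^T$. Let w be any point on this line. Then
\[ \mu = m^T w = m^T(\xi s + b) = \xi(m^T s) + (m^T b). \]
Let $\alpha = (m^T s)^{-1}$, $\beta = -(m^T b)(m^T s)^{-1}$. Then, $\xi = \alpha\mu + \beta$. Moreover,
\[ \begin{aligned}
\sigma^2 &= w^T C w \\
&= (\xi s + b)^T C (\xi s + b) \\
&= (s^T C s)\xi^2 + (s^T C b + b^T C s)\xi + b^T C b \\
&\equiv \gamma\xi^2 + \delta\xi + \eta.
\end{aligned} \]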
As ξ varies from −∞ to ∞, the ordered pair (σ², µ) traces out a parabola given
by (5.9), which lies in the (σ², µ)-plane with axis parallel to the σ²-axis and
opens to the right.
We are actually interested in the (σ, µ)-graph. Taking the square root of σ², the
resulting curve is
\[ \sigma = \sqrt{\gamma(\alpha\mu + \beta)^2 + \delta(\alpha\mu + \beta) + \eta}. \tag{5.10} \]
This curve is called a Markowitz curve. Thus, each line in the weight hyperplane
is mapped onto a Markowitz curve. This phenomenon is depicted in Fig. 5.6.
Remark 5.5.1 Here it is important to note that the Markowitz curve (5.10) is
not a parabola. In fact the main difference between the parabola (5.9) and the
Markowitz curve (5.10) in the (σ, µ)-graph is that a tangent can be drawn to the
parabola (5.9) from any point on the µ-axis, whereas the Markowitz curve behaves
almost as a straight line as µ → ∞, so that it is not possible to draw a tangent to
the Markowitz curve as µ → ∞. This difference may not sound significant right
now but it plays a vital role when the portfolio also contains a risk-free asset.
We shall address this type of portfolio in the next section. For the
current discussion we have assumed that all the assets in the portfolio are risky.
[Fig. 5.6. Markowitz curves. Labels: weight lines in the weight hyperplane w1 + w2 + w3 = 1 (axes w1, w2, w3) and the corresponding Markowitz curves in the (σ, µ)-plane.]
Remark 5.5.2 As we cover the weight hyperplane by taking weight lines, we trace
a family of Markowitz curves in the (σ, µ)-plane. It is not difficult to get convinced
that this region in the (σ, µ)-plane is going to be a solid region and its shape will
be like a bullet, which is appropriately called the Markowitz bullet.
There is another much simpler way to get convinced that the feasible set in the
(σ, µ) plane should be of the type as described above. Suppose we have three assets
a1 , a2 and a3 . Let these be represented by three points A : (σa1 , µa1 ), B : (σa2 , µa2 )
and C : (σa3 , µa3 ) in the (σ, µ)-plane. As already seen in the case of two asset
portfolio diagram, any two assets when combined to form portfolios give rise to a
curved line or a straight line between them. Whether the line will be straight or
curved depends upon the correlation coefficient between them. The three (curved)
lines between the possible three pairs are shown in the left hand side figure of Fig.
5.7. Now if we take any asset D formed by the combination of B and C, then
again combination of assets A and D will give us a (curved) line connecting A
and D and the process continues. Eventually we get the feasible region which is
a solid two dimensional region in the (σ, µ)-plane. This region is depicted in the
right hand side figure of Fig. 5.7. Further the feasible region is convex to the left,
i.e. given any two points in the region, the straight line connecting them does not
cross the left boundary.
The Minimum Variance Set, The Minimum Variance Point and the Efficient Frontier 143
In the feasible region as depicted in Fig. 5.7, the outer region corresponds to
the case when shorting is allowed and the inner region corresponds to the case
when shorting is not allowed.
[Fig. 5.8. Minimum variance set. Axes: σ (horizontal), µ (vertical); labels P1, P0, Pmin, P2 and return levels µ1, µ0, µ2.]
\[ w = \frac{C^{-1}e}{e^T C^{-1} e}. \]
Proof. We desire to solve the following optimization problem
\[ \text{Min}\ \ \sigma^2 = w^T C w \quad \text{subject to} \quad e^T w = 1. \tag{5.11} \]
Note that λ is unrestricted in sign because the constraint in the risk minimization
problem is an equation, $e^T w = 1$. Now, differentiating (5.12) with respect to
w, we obtain
\[ 2w^T C - \lambda e^T = 0 \;\Longrightarrow\; w = \frac{\lambda}{2}\, C^{-1} e. \]
Using (5.11), we get
\[ \frac{\lambda}{2}\, e^T C^{-1} e = 1 \;\Longrightarrow\; \frac{\lambda}{2} = \frac{1}{e^T C^{-1} e}. \]
Thus the requisite result follows.
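As a quick illustration of the formula w = C⁻¹e / (eᵀC⁻¹e), the sketch below solves Cv = e with exact rational arithmetic and normalizes v. The helper `solve` and the 3×3 covariance matrix are assumptions made for this illustration; any symmetric positive definite matrix could be used instead.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def min_variance_weights(C):
    """Weights of the minimum variance portfolio: C^{-1} e / (e^T C^{-1} e)."""
    v = solve(C, [1] * len(C))      # v = C^{-1} e
    total = sum(v)                  # e^T C^{-1} e
    return [vi / total for vi in v]


# Illustrative covariance matrix (an assumption for this sketch).
C = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
w = min_variance_weights(C)
variance = sum(w[i] * C[i][j] * w[j] for i in range(3) for j in range(3))
print("weights:", w, "variance:", variance)
```

The resulting variance equals 1 / (eᵀC⁻¹e), which is a convenient consistency check on any implementation.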
[Figure: efficient frontier, minimum variance point and minimum variance set in the (σ, µ)-plane, with marked levels $\mu_0^{(U)}$, $\mu_0^{(L)}$ and $\sigma_0$.]
he/she desires to achieve. Therefore the investor’s problem is to decide the right
investment strategy to obtain the return µ with the minimum risk. We look at
this scenario in the result to follow.
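Theorem 5.6.2 For a given expected return µ, the portfolio with minimum risk
has weights given by
\[ w = \frac{\det\begin{pmatrix} \mu & m^T C^{-1} e \\ 1 & e^T C^{-1} e \end{pmatrix} C^{-1} m + \det\begin{pmatrix} m^T C^{-1} m & \mu \\ e^T C^{-1} m & 1 \end{pmatrix} C^{-1} e}{\det\begin{pmatrix} m^T C^{-1} m & m^T C^{-1} e \\ e^T C^{-1} m & e^T C^{-1} e \end{pmatrix}}. \tag{5.13} \]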
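Proof. We wish to solve the following quadratic programming problem
\[ \begin{aligned}
\text{Min}\quad & \tfrac{1}{2}\sigma^2 = \tfrac{1}{2} w^T C w \\
\text{subject to}\quad & m^T w = \mu \\
& e^T w = 1.
\end{aligned} \tag{5.14} \]
i.e.
\[ w = C^{-1}(\alpha m + \beta e). \tag{5.15} \]
Substituting the value of w in (5.14), we get
\[ \alpha = \frac{\det\begin{pmatrix} \mu & m^T C^{-1} e \\ 1 & e^T C^{-1} e \end{pmatrix}}{\det\begin{pmatrix} m^T C^{-1} m & m^T C^{-1} e \\ e^T C^{-1} m & e^T C^{-1} e \end{pmatrix}}, \qquad
\beta = \frac{\det\begin{pmatrix} m^T C^{-1} m & \mu \\ e^T C^{-1} m & 1 \end{pmatrix}}{\det\begin{pmatrix} m^T C^{-1} m & m^T C^{-1} e \\ e^T C^{-1} m & e^T C^{-1} e \end{pmatrix}}. \]
Substituting these values in the expression (5.15) for w, we get the required
expression (5.13).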
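Formula (5.13) translates almost line by line into code: compute C⁻¹m and C⁻¹e, form the three scalars mᵀC⁻¹m, mᵀC⁻¹e and eᵀC⁻¹e, and apply Cramer's rule for α and β. The sketch below does this with exact fractions. The helper names, the covariance matrix and the return vector m = (0.4, 0.8, 0.8) are assumptions chosen for illustration only.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_risk_weights(C, m, mu):
    """Minimum variance weights for target return mu, following (5.13)."""
    e = [1] * len(C)
    cinv_m, cinv_e = solve(C, m), solve(C, e)
    a = dot(m, cinv_m)    # m^T C^{-1} m
    b = dot(m, cinv_e)    # m^T C^{-1} e (equals e^T C^{-1} m for symmetric C)
    c = dot(e, cinv_e)    # e^T C^{-1} e
    det = a * c - b * b   # denominator determinant; nonzero if m is not a multiple of e
    alpha = (mu * c - b) / det
    beta = (a - mu * b) / det
    return [alpha * x + beta * y for x, y in zip(cinv_m, cinv_e)]


# Illustrative data (assumptions for this sketch).
C = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
m = [Fraction(2, 5), Fraction(4, 5), Fraction(4, 5)]
w = min_risk_weights(C, m, Fraction(1, 3))
print("weights:", w, " return:", dot(m, w), " sum:", sum(w))
```

Both constraints of (5.14) can be verified on the output: the weights sum to 1 and mᵀw reproduces the target return.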
Now to generate the entire efficient frontier we need to solve problems of type
(5.14) for all values of µ ∈ R. This is almost impossible. But then an extremely
interesting observation is made here. Recall from (5.14) and (5.15) that, for a
given value of return µ, the points of minimum variance must satisfy the following
system of (n + 2) linear equations in (n + 2) unknowns w ∈ Rn , α ∈ R, β ∈ R
wT C − αmT − β eT = 0
mT w = µ
eT w = 1. (5.16)
Suppose we solve the system (5.16) for two distinct values of expected return µ,
say $\bar\mu^{(1)}$ and $\bar\mu^{(2)}$. Let the two solutions be
$\big((w^{(1)})^T, \alpha^{(1)}, \beta^{(1)}\big)^T$, with $(w^{(1)})^T = (w^{(1)}_1, \ldots, w^{(1)}_n)$, and
$\big((w^{(2)})^T, \alpha^{(2)}, \beta^{(2)}\big)^T$, with $(w^{(2)})^T = (w^{(2)}_1, \ldots, w^{(2)}_n)$, respectively. Then it is simple to verify that
the combination portfolio, $\lambda(w^{(1)}, \alpha^{(1)}, \beta^{(1)})^T + (1 - \lambda)(w^{(2)}, \alpha^{(2)}, \beta^{(2)})^T$, λ ∈ R, is
also a solution of the system (5.16) corresponding to the return $\lambda\bar\mu^{(1)} + (1 - \lambda)\bar\mu^{(2)}$.
Therefore, in order to solve (5.16) for every value of µ, one is only required to solve
it for two distinct values of µ and then form the combination of the two solutions.
Thus, the knowledge of two distinct portfolios yielding the minimum variances is
sufficient to generate the entire minimum variance set. This result is significant
from investor’s point of view. Also, it demonstrates a very good application of
Karush-Kuhn-Tucker optimality conditions. The result is known as the two fund
theorem.
Theorem 5.6.3 (Two Fund Theorem) Two efficient portfolios can be established
so that any other efficient portfolio can be duplicated, in terms of mean and
variance, as a linear combination of these two assets. In other words, an
investor seeking an efficient portfolio needs to invest only in a combination of
these two assets.
The most convenient way to get two solutions of (5.16) is to assign two distinct
pairs of values to α and β, and then work out the solutions. The most convenient choices
are α = 1, β = 0 and α = 0, β = 1. The above discussion is illustrated through
the following example.
Example 5.6.1 Consider three risky assets with the variance-covariance matrix
and expected returns as follows.
Find two portfolios yielding the minimum variance. Also, determine the expected
returns from these two portfolios. Using the two fund theorem, construct the
portfolio giving the return of 33.4% with minimum risk.
Solution Taking α = 0, β = 1 in (5.16), we need to solve $\sum_{j=1}^{3} \sigma_{ij} v_j^{(1)} = 1$ (i =
1, 2, 3), resulting in the following system of linear equations
\[ \begin{aligned}
2v_1^{(1)} + v_2^{(1)} &= 1 \\
v_1^{(1)} + 2v_2^{(1)} + v_3^{(1)} &= 1 \\
v_2^{(1)} + 2v_3^{(1)} &= 1.
\end{aligned} \]
\[ \begin{aligned}
2v_1^{(2)} + v_2^{(2)} &= 0.4 \\
v_1^{(2)} + 2v_2^{(2)} + v_3^{(2)} &= 0.8 \\
v_2^{(2)} + 2v_3^{(2)} &= 0.8.
\end{aligned} \]
Next, we consider the case when the investor desires a return of µ = 0.334 at
minimum risk. It is simple to check that for λ = 3, $\lambda\bar\mu^{(1)} + (1 - \lambda)\bar\mu^{(2)} = 0.334$.
Thus, by the two fund theorem the requisite portfolio is given by $w = \lambda w^{(1)} + (1 -
\lambda)w^{(2)} = (7/6, -2/3, 1/2)$. Observe that the second asset has a short position in
this portfolio. The variance corresponding to this portfolio is
\[ w^T C w = \begin{pmatrix} 7/6 & -2/3 & 1/2 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 7/6 \\ -2/3 \\ 1/2 \end{pmatrix} = 17/9. \]
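The two fund construction can be replayed in a few lines of Python. The sketch below solves Cv = e and Cv = m, normalizes both solutions so that the weights sum to 1, and then forms the combination with λ = 3. The covariance matrix is the one displayed above; the return vector m = (0.4, 0.8, 0.8) is an assumption, chosen because it reproduces the weights (7/6, −2/3, 1/2) quoted in the solution. The helper names are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


C = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
m = [Fraction(2, 5), Fraction(4, 5), Fraction(4, 5)]   # assumed returns

w1 = normalize(solve(C, [1, 1, 1]))   # alpha = 0, beta = 1
w2 = normalize(solve(C, m))           # alpha = 1, beta = 0
mu1, mu2 = dot(m, w1), dot(m, w2)

lam = 3
w = [lam * a + (1 - lam) * b for a, b in zip(w1, w2)]
ret = dot(m, w)
variance = sum(w[i] * C[i][j] * w[j] for i in range(3) for j in range(3))
print("w1 =", w1, "mu1 =", mu1)
print("w2 =", w2, "mu2 =", mu2)
print("w  =", w, "return =", float(ret), "variance =", variance)
```

For a general target return µ, the combination parameter follows from λµ̄⁽¹⁾ + (1 − λ)µ̄⁽²⁾ = µ, that is, λ = (µ − µ̄⁽²⁾) / (µ̄⁽¹⁾ − µ̄⁽²⁾).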
Also, the expected return and the variance associated with this portfolio are
respectively given by
\[ \mu = \sum_{i=1}^{n} w_i\mu_i + w_{rf}\mu_{rf} = \mu_{risky} + w_{rf}\mu_{rf}, \]
and
\[ \sigma^2 = \mathrm{Var}\left(\sum_{i=1}^{n} w_i r_i + w_{rf} r_{rf}\right) = \mathrm{Var}\left(\sum_{i=1}^{n} w_i r_i\right) = \sigma^2_{risky}. \]
If we remove the risk-free asset from the portfolio and readjust the weights of
the risky assets so that their sum remains 1, the resultant portfolio so obtained
is referred to as the derived risky portfolio. We use $\mu_{der}$ and $\sigma^2_{der}$ to denote the
derived risky portfolio expected return and risk, respectively. Then,
\[ \begin{aligned}
\mu &= \sum_{i=1}^{n} w_i\mu_i + w_{rf}\mu_{rf} \\
&= w_{risky}\left(\sum_{i=1}^{n} \frac{w_i}{w_{risky}}\,\mu_i\right) + w_{rf}\mu_{rf} \\
&= w_{risky}\,\mu_{der} + w_{rf}\mu_{rf} \\
&= w_{risky}\,\mu_{der} + (1 - w_{risky})\mu_{rf} \\
&= w_{risky}(\mu_{der} - \mu_{rf}) + \mu_{rf}.
\end{aligned} \tag{5.18} \]
Also,
\[ \begin{aligned}
\sigma^2 &= \mathrm{Var}\left(\sum_{i=1}^{n} w_i r_i\right) \\
&= w_{risky}^2\, \mathrm{Var}\left(\sum_{i=1}^{n} \frac{w_i}{w_{risky}}\, r_i\right) \\
&= w_{risky}^2\, \sigma_{der}^2,
\end{aligned} \tag{5.19} \]
which gives $w_{risky} = \sigma/\sigma_{der}$. From (5.18) and (5.19) we get
\[ \mu = \mu_{rf} + \frac{\mu_{der} - \mu_{rf}}{\sigma_{der}}\,\sigma, \tag{5.20} \]
which is an equation of the line joining (0, µrf) and (σder, µder) in the (σ, µ)-graph.
Now, for a given risk σ, if we choose various weight combinations of risk-free
asset and risky assets satisfying (5.17), we generate different lines represented by
(5.20) in (σ, µ)-graph. Obviously, among all such lines, the line that produces the
point with highest expected return for a given risk is tangent to the upper portion
of the Markowitz bullet. This is illustrated in Fig. 5.10.
Definition 5.7.1 (Capital Market Line) Among all the lines (5.20) for various
weight combinations of risk-free asset and risky assets, the line giving the highest
return for a given risk is called the capital market line.
Definition 5.7.2 (Market Portfolio) The point on the Markowitz bullet where
the capital market line is tangential is said to represent the market portfolio.
Theoretically, the market portfolio must contain all risky assets, for if some
asset is not in it then it will wither and die. Since the market portfolio contains
all risky assets, it is a completely diversified portfolio with no unsystematic risk.
[Fig. 5.10. Axes: σ (horizontal), µ (vertical); labels: Capital Market Line, Market Portfolio, (σM, µM), (σder, µder), µrf.]
The basic idea of the capital asset pricing model (CAPM) is that an investor
can improve the risk-expected return balance by investing partially in a portfolio
of risky assets and partially in a risk-free asset. All investors will end up with
portfolios along the capital market line as all efficient portfolios lie along this line
while any other combination of risk-free asset and risky assets, except those which
are efficient, lies below the capital market line. It is thus important to observe that
all investors will hold combinations of only two assets, viz. the market portfolio M
and a risk-free asset. This fund scenario is summarized in the following theorem.
Theorem 5.7.1 (One Fund Theorem) There exists a single portfolio, namely
the market portfolio M, of risky assets such that any efficient portfolio can be
constructed as a linear combination of the market portfolio M and the risk-free
asset.
Unlike with the two fund theorem where any two efficient portfolios are suffi-
cient, in this case, the tangent portfolio is a specific portfolio.
Theorem 5.7.2 For any expected risk-free return µrf, the weight vector wM of the
market portfolio is given by
\[ w_M = \frac{C^{-1}(m - \mu_{rf}\, e)}{e^T C^{-1}(m - \mu_{rf}\, e)}. \]
Proof. From Fig. 5.10, we observe that for any point (σ, µ) in the Markowitz
bullet, the slope of the line joining (0, µrf) and (σ, µ) is
\[ s = \frac{\mu - \mu_{rf}}{\sigma} = \frac{\sum_{i=1}^{n} \mu_i w_i - \mu_{rf}}{\left(\sum_{i,j=1}^{n} c_{ij} w_i w_j\right)^{1/2}}. \]
For the line joining (0, µrf) to (σ, µ) to be a tangent line to the Markowitz bullet,
we need to solve the following optimization problem
\[ \text{Max}\ \ \frac{m^T w - \mu_{rf}}{(w^T C w)^{1/2}} \quad \text{subject to} \quad e^T w = 1. \tag{5.21} \]
\[ L(w, \lambda) = \frac{m^T w - \mu_{rf}}{(w^T C w)^{1/2}} + \lambda(1 - e^T w). \]
Now, solving (5.21) is the same as maximizing L(w, λ). So, $\nabla_w L(w, \lambda) = 0$, giving
\[ \frac{1}{w^T C w}\left[(w^T C w)^{1/2}\, m - (m^T w - \mu_{rf})(w^T C w)^{-1/2}\, C w\right] = \lambda e. \]
The above expression can be rewritten as
\[ \sigma m - (\mu - \mu_{rf})\frac{C w}{\sigma} = \lambda\sigma^2 e. \]
Multiplying by σ, we obtain
\[ \sigma^2 m - (\mu - \mu_{rf})\, C w = \lambda\sigma^3 e. \]
Remark 5.7.1 Suppose the market portfolio (σM, µM) is known. Then, from
(5.20), the equation of the capital market line is given by
\[ \mu = \mu_{rf} + \frac{\mu_M - \mu_{rf}}{\sigma_M}\,\sigma. \]
If the investor is willing to take a positive risk σ, he/she can earn an additional
return $\frac{\mu_M - \mu_{rf}}{\sigma_M}\,\sigma$ over and above the risk-free return µrf to compensate the risk
taken by him/her. Therefore sometimes the quantity $\frac{\mu_M - \mu_{rf}}{\sigma_M}$ is called the price
of risk.
Example 5.7.1 Suppose a portfolio comprises one risk-free asset with return
0.5, and three mutually independent risky assets with expected returns 1, 2, 3 and
variances 1, 1, 1, respectively. Determine the equation of the capital market line.
Solution The given information gives $m^T = (\mu_1, \mu_2, \mu_3) = (1, 2, 3)$, $\mu_{rf} = 0.5$, $C =
[\sigma_{ij}] = I_{3\times 3}$, $e^T = (1, 1, 1)$. Therefore, the weight vector of the market portfolio is
given by
\[ w_M = \frac{C^{-1}(m - \mu_{rf}\, e)}{e^T C^{-1}(m - \mu_{rf}\, e)} = \begin{pmatrix} 1/9 \\ 1/3 \\ 5/9 \end{pmatrix}. \]
Consequently, the expected return and standard deviation of the market portfolio are
\[ \mu_M = m^T w_M = \frac{22}{9}, \qquad \sigma_M = \left((w_M)^T C\, w_M\right)^{1/2} = \frac{\sqrt{35}}{9}. \]
Thus, the equation of the capital market line is
\[ \mu = \mu_{rf} + \frac{\mu_M - \mu_{rf}}{\sigma_M}\,\sigma = \frac{1}{2} + \frac{\sqrt{35}}{2}\,\sigma. \]
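The numbers in Example 5.7.1 can be reproduced with a short script that applies the market portfolio formula and then reads off the slope of the capital market line. This is only a sketch; the helper names `solve`, `dot` and `market_portfolio` are hypothetical, and exact fractions are used so that 1/9, 1/3 and 5/9 appear without rounding.

```python
import math
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def market_portfolio(C, m, mu_rf):
    """w_M = C^{-1}(m - mu_rf e) / (e^T C^{-1}(m - mu_rf e))."""
    v = solve(C, [mi - mu_rf for mi in m])
    total = sum(v)   # assumed nonzero
    return [vi / total for vi in v]


C = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
m = [1, 2, 3]
mu_rf = Fraction(1, 2)

w_M = market_portfolio(C, m, mu_rf)
mu_M = dot(m, w_M)
var_M = sum(w_M[i] * C[i][j] * w_M[j] for i in range(3) for j in range(3))
sigma_M = math.sqrt(var_M)
slope = float(mu_M - mu_rf) / sigma_M   # price of risk
print("w_M =", w_M, " mu_M =", mu_M, " sigma_M =", sigma_M)
print(f"capital market line: mu = {float(mu_rf)} + {slope:.6f} * sigma")
```

The printed slope can be compared with √35/2, the coefficient of σ in the example.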
In practice there are certain assets listed on the stock exchange which are called index
stocks. These limited assets are significant ones that can capture the pulse of the
whole market. The most regularly quoted market indices are broad-base indices
comprising the stocks of large companies listed on a nation’s largest stock
exchanges, such as the American Dow Jones Industrial Average and S&P 500
Index, the British FTSE 100, the French CAC 40 and the Japanese Nikkei 225. The
Bombay Stock Exchange is the largest in India, with over 6000 stocks listed, and it
accounts for over two thirds of the total trading volume in the country. The index
stocks finally help us to compute the market portfolio (σM, µM). The knowledge
of the market portfolio yields the equation of the capital market line, see Remark
5.7.1. Now suppose an investor P is willing to take risk σP. Then for this risk, the
expected return µP is maximum if the point (σP, µP) lies on the capital market
line. Thus,
\[ \mu_P = \mu_{rf} + \frac{\mu_M - \mu_{rf}}{\sigma_M}\,\sigma_P. \]
If we let $w_P = \sigma_P/\sigma_M$ then
\[ \mu_P = w_P\,\mu_M + (1 - w_P)\,\mu_{rf}. \]
Remark 5.7.2 The above relation suggests that if an investor is willing to take
a risk σP, then he/she should invest the proportion $w_P = \sigma_P/\sigma_M$ of the investment in the index
fund and the proportion $(1 - w_P)$ in the risk-free investment schemes.
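The allocation rule of Remark 5.7.2 is a one-line computation. The sketch below uses hypothetical market data (µM = 12%, σM = 15%, µrf = 8%) and a hypothetical risk tolerance σP = 10%; the function name is likewise an assumption.

```python
def cml_allocation(sigma_p, mu_m, sigma_m, mu_rf):
    """Return (w_P, mu_P): share held in the index fund and the expected return."""
    w_p = sigma_p / sigma_m
    mu_p = w_p * mu_m + (1 - w_p) * mu_rf
    return w_p, mu_p


w_p, mu_p = cml_allocation(sigma_p=0.10, mu_m=0.12, sigma_m=0.15, mu_rf=0.08)
print(f"index fund share: {w_p:.4f}, risk-free share: {1 - w_p:.4f}")
print(f"expected return: {mu_p:.4%}")
```

A value w_P greater than 1 would correspond to borrowing at the risk-free rate in order to invest more than the available capital in the index fund.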
We now aim to examine how an individual asset behaves with respect to the
market portfolio. For this, we attempt to build a relationship between the expected
return along with the risk of an individual asset with the market portfolio. This
gives the CAPM formula (5.24).
Theorem 5.7.3 Suppose the market portfolio is (σM, µM). The expected return of
an asset ai is given by
\[ \mu_i = \mu_{rf} + \beta_i(\mu_M - \mu_{rf}), \quad \text{where } \beta_i = \frac{\mathrm{Cov}(r_i, r_M)}{\sigma_M^2}. \tag{5.24} \]
Proof. Suppose an investor portfolio comprises asset ai with weight w and the
market portfolio M with weight 1 − w. Then the expected return and risk of the
investor portfolio are respectively given by
\[ \begin{aligned}
\mu &= w\mu_i + (1 - w)\mu_M \\
\sigma^2 &= w^2\sigma_i^2 + (1 - w)^2\sigma_M^2 + 2\rho w(1 - w)\sigma_i\sigma_M
\end{aligned} \tag{5.25} \]
where ρ is the coefficient of correlation between the returns of asset ai and the
market portfolio M.
As w varies, these values trace out a curve in the (σ, µ)-graph. It can be observed
from Fig. 5.11 that as w passes through zero, the capital market line becomes
tangent to the curve at M. This tangency condition can be translated into the
condition that the slope of the curve is equal to the slope of the capital market
line at M (corresponding to w = 0).
[Fig. 5.11. Market portfolio. Axes: σ (horizontal), µ (vertical); labels: Capital Market Line, (σM, µM), (σak, µak), µrf.]
Now the slope of the curve at M is given by
\[ \left.\frac{d\mu}{d\sigma}\right|_{w=0} = \left.\frac{d\mu}{dw}\,\frac{dw}{d\sigma}\right|_{w=0} = (\mu_i - \mu_M)\left.\frac{dw}{d\sigma}\right|_{w=0}. \]
The above discussion suggests that, although σ is an appropriate measure of risk
for a portfolio, the proper measure of risk for an individual asset is its beta.
Thus there is a paradigm shift in understanding the risk of an asset.
Example 5.7.2 Let the risk-free rate µrf be 8% and let the market have µM = 12%
and σM = 15%. Let an asset a be given which has covariance of 0.045 with the
market. Determine the expected rate of return of the given asset.
Solution From the given data we have $\beta_a = \dfrac{0.045}{(0.15)^2} = 2$. Then CAPM gives
\[ \mu_a = \mu_{rf} + \beta_a(\mu_M - \mu_{rf}) = 0.08 + 2(0.12 - 0.08) = 0.16, \]
i.e. an expected rate of return of 16%.
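The computation in Example 5.7.2 amounts to evaluating the CAPM formula (5.24). A minimal sketch, with a hypothetical function name, is given below.

```python
def capm_expected_return(mu_rf, mu_m, sigma_m, cov_with_market):
    """Return (beta, expected return) from the CAPM formula (5.24)."""
    beta = cov_with_market / sigma_m ** 2
    return beta, mu_rf + beta * (mu_m - mu_rf)


# Data of Example 5.7.2: mu_rf = 8%, mu_M = 12%, sigma_M = 15%, covariance 0.045.
beta_a, mu_a = capm_expected_return(0.08, 0.12, 0.15, 0.045)
print(f"beta = {beta_a:.4f}, expected return = {mu_a:.4%}")
```

Note that the covariance is divided by the variance σM² = 0.0225 of the market, not by its standard deviation.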
\[ \mu = \mu_{rf} + \beta(\mu_M - \mu_{rf}), \quad \text{where } \beta = \frac{\mathrm{Cov}(r, r_M)}{\sigma_M^2}, \]
that describes the expected return for all assets in the market is called the security
market line.
The security market line highlights the essence of CAPM formula. It says that
under the equilibrium conditions assumed by CAPM, all portfolio investments lie
along the security market line in the beta-return space. It emphasizes that the
risk of an asset is a function of its covariance with the market, or equivalently a
function of its beta. The security market line is depicted in bold in Fig. 5.12.
[Fig. 5.12. Security market line. Axes: β (horizontal), µ (vertical); marked values µrf and β = 1.]
\[ \frac{\bar{Q} - P}{P} = \mu_{rf} + \beta(\mu_M - \mu_{rf}) \]
i.e.
\[ P = \frac{\bar{Q}}{1 + \mu_{rf} + \beta(\mu_M - \mu_{rf})}. \tag{5.27} \]
Solution: The value of a share after one year will be (10 × 1.07) + (90 × 1.15) = 114.20.
Thus $\bar{Q} = 114.2$. Therefore
\[ P = \frac{114.20}{(1.07) + (0.90)(0.15 - 0.07)} = \text{Rs } 100. \]
This shows that the price of the share, namely Rs 100, represents Rs 100 of assets
in the fund, and therefore CAPM tells that the price is right.
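The pricing formula (5.27) is easy to evaluate in code. The sketch below plugs in the numbers used in the solution above (expected payoff 114.20, µrf = 0.07, µM = 0.15, β = 0.90); the function name is a hypothetical choice.

```python
def capm_price(expected_payoff, mu_rf, mu_m, beta):
    """Price P = Q_bar / (1 + mu_rf + beta * (mu_M - mu_rf)), as in (5.27)."""
    return expected_payoff / (1 + mu_rf + beta * (mu_m - mu_rf))


expected_payoff = 10 * 1.07 + 90 * 1.15   # value of one share after a year
price = capm_price(expected_payoff, mu_rf=0.07, mu_m=0.15, beta=0.90)
print(f"expected payoff = {expected_payoff:.2f}, CAPM price = {price:.2f}")
```

The discount rate in the denominator is the risk-adjusted rate µrf + β(µM − µrf), which here equals 14.2%.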
The CAPM as a Factor Model
The CAPM can be derived as a special case of a single factor model. Let us assume
that the asset return ri and market return rM (taken as a factor) are related as
follows
\[ r_i - \mu_{rf} = \alpha_i + \beta_i(r_M - \mu_{rf}) + \epsilon_i. \tag{5.28} \]
Here µrf is the risk-free interest rate and $E(\epsilon_i) = 0$. Also $\epsilon_i$ is uncorrelated with
the market return rM and also with the other $\epsilon_j$’s. Further αi and βi are the usual
coefficients appearing in a single factor model.
Taking expectation in (5.28) gives
\[ \mu_i - \mu_{rf} = \alpha_i + \beta_i(\mu_M - \mu_{rf}). \tag{5.29} \]
Here we note that (5.29) is identical with CAPM except that in CAPM, αi = 0.
If we further take the covariance of both sides of (5.28) with rM we get
\[ \sigma_{iM} = \beta_i\,\sigma_M^2. \]
Hence,
\[ \beta_i = \frac{\sigma_{iM}}{\sigma_M^2}, \]
which is the same expression as used in CAPM. The equation (5.29) represents
a line between the quantities (µM − µrf) and (µi − µrf). This line is called the
characteristic equation or characteristic line.
The characteristic line in a sense is more general than CAPM because here
αi need not be zero. In fact αi can have a very nice economic interpretation. A
stock with non zero αi can be regarded as mispriced. If αi > 0 then in view of
CAPM, the asset is performing better than it should. Similarly if αi < 0, then it
is performing worse than it should.
Though we have tried to explain CAPM as a single factor model, we must note
that the two are not equivalent. In CAPM we assume that the market is efficient,
but in a single factor model we have taken arbitrary covariances σiM and
made no assumption on market efficiency.
The CAPM and βi can be understood from a different angle if we take the
following model
\[ r_i = r_{rf} + \beta_i(r_M - r_{rf}) + \epsilon_i, \]
where $\epsilon_i$ is a random variable. Taking expectation in the above equation and
using CAPM we get $E(\epsilon_i) = 0$. Further, taking the covariance with rM in the above
equation we get $\mathrm{Cov}(\epsilon_i, r_M) = 0$. Therefore we have
\[ \sigma_i^2 = \beta_i^2\sigma_M^2 + \mathrm{Var}(\epsilon_i). \]
This equation tells that $\sigma_i^2$ is the sum of two expressions. The first expression $\beta_i^2\sigma_M^2$
is called the systematic risk. This is the risk associated with the market as a whole.
There is no chance of reducing this risk by diversification because all assets with
nonzero beta have this risk. The second expression $\mathrm{Var}(\epsilon_i)$ is uncorrelated with
the market and therefore can be reduced by diversification. The quantity $\mathrm{Var}(\epsilon_i)$
is called the unsystematic risk of the asset. Therefore the systematic risk measured by
β becomes more important because it directly combines with the systematic risk
of other assets.
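The split of an asset's variance into a systematic part β²σM² and an unsystematic remainder can be illustrated with a few lines of Python. The inputs below (β = 1.2, σM = 15%, σi = 25%) are hypothetical, as is the function name.

```python
def risk_decomposition(beta, sigma_m, sigma_i):
    """Split sigma_i^2 into (systematic, unsystematic) parts."""
    systematic = beta ** 2 * sigma_m ** 2
    unsystematic = sigma_i ** 2 - systematic
    if unsystematic < 0:
        # The inputs are inconsistent with the one-factor decomposition.
        raise ValueError("beta^2 * sigma_M^2 exceeds the total variance")
    return systematic, unsystematic


systematic, unsystematic = risk_decomposition(beta=1.2, sigma_m=0.15, sigma_i=0.25)
total = systematic + unsystematic
print(f"systematic share of variance:   {systematic / total:.1%}")
print(f"unsystematic share of variance: {unsystematic / total:.1%}")
```

Only the second share can be reduced by diversification; the first is tied to the asset's beta.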
5.9 Exercises
Exercise 5.1 Suppose there are three financial market scenarios Ω = {w1 , w2 , w3 }
with different probabilities of occurrence. Consider the following table showing the
returns on two different stocks in these three scenarios
Exercise 5.4 Let a portfolio be designed with investment of 50% in stock 1 and
the remaining 50% in stock 2. Further let short sale be allowed in stock 1, all
the other data being the same as in Exercise 5.2. Does the conclusion of Exercise
5.3 hold?
Exercise 5.5 Suppose the portfolios are constructed using three securities a1, a2, a3
with expected returns µ1 = 20%, µ2 = 13%, µ3 = 4%, standard deviations of
returns σ1 = 25%, σ2 = 28%, σ3 = 20%, and correlations between returns
ρ12 = 0.3, ρ13 = 0.15 and ρ23 = 0.4. Among all the attainable portfolios, find
the one with minimum variance. What are the weights of the three securities in
this portfolio? Also compute the expected return and standard deviation of this
portfolio.
Exercise 5.6 Among all attainable portfolios with expected return 20% con-
structed using the data provided in Exercise 5.5, find the portfolio with minimum
variance. Compute the weights of individual assets in this portfolio.
Exercise 5.7 Consider the following data
            µ      σ
asset 1    10%    5%
asset 2     8%    2%
For each correlation coefficient ρ = −1, −0.5, 0, 0.5, 1, what is the combination
of the two assets that yields the minimum standard deviation and what is the
minimum value of the standard deviation?
Exercise 5.8 Compute the minimum risk portfolio for the following rate of return
(%) data
           Jan   Feb   Mar   Apr   May   June
asset 1     12    10     5     7    15     12
asset 2      7    12    10    10    12     15
Also compute the expected return for the optimal portfolio.
Exercise 5.9 Consider three risky assets with the variance-covariance matrix and
expected returns (all data in %) as follows.
variance-covariance matrix (C)    return (M)
      10    4    0                     5
       4   12    6                     6
       0    6   10                     1
Find two efficient portfolios. Also construct the portfolio giving the return of 2.8%
with minimum risk. Will this portfolio be also efficient?
(Hint: use the two fund theorem.)
Exercise 5.10 Suppose an investor is interested in constructing a portfolio with
one risk-free asset a1, and three risky assets a2, a3 and a4. Let the expected
returns of a1, a2, a3 and a4 be 6%, 10%, 12% and 18% respectively. Let the
variance-covariance matrix C of the three risky assets be
\[ C = \begin{pmatrix} 4 & 20 & 40 \\ 20 & 10 & 70 \\ 40 & 70 & 14 \end{pmatrix}. \]
Determine all efficient portfolios for the investor.
Exercise 5.11 Consider the data of two risky assets a1, a2 with µ1 = 12.5%, µ2 =
10.5%, σ1 = 14.9%, σ2 = 14%, ρ = 0.33.
(a) Is it advisable to diversify the investment? If so, then what composition of the
assets will minimize the risk?
(b) What is the minimum value of the risk?
(c) If the risk-free rate of return is 5%, then derive the equation of the capital
market line.
Exercise 5.12 Given the following information about the one risk-free asset and
three risky assets, find the expected return and standard deviation of the market
portfolio. Also determine the equation of the capital market line.
Exercise 5.13 Assume that the following assets are correctly priced according to
the security market line. Derive the security market line.
µ1 = 6%, β1 = 0.5; µ2 = 12%, β2 = 1.5.
µ1 = 9.5%, β1 = 0.8; µ2 = 13.5%, β2 = 1.3.
Exercise 5.15 Let the expected rate of return on the market portfolio be 23% and
that of the risk free asset be 7%. Also let the standard deviation of the market
portfolio be 32% and let us assume that the market is efficient.
(a) What is the equation of capital market line?
(b) If Rs 300 is invested in the risk free asset, and Rs 700 in the market portfolio
then what is the expected return at the end of the year?
(c) If an investor has Rs 1000 to invest and he/she desires a return of 39%, then
what should be his/her portfolio?
6
Portfolio Optimization-II
6.1 Introduction
We have presented Markowitz’s mean variance model for portfolio optimization
in the last chapter. But contrary to its theoretical reputation, this model in its
original form has not found much favor with the practitioners to construct large
scale portfolios. There are several theoretical and practical reasons for not using
this model extensively in practice - particularly when the number of assets in the
portfolio is large. This chapter aims to understand these reasons and then present
some other models which have been developed to improve Markowitz’s model both
theoretically and computationally.
Joro and Na [71], but has not found much favor because the resulting optimization
problem is not easy to handle. An alternative and popular approach is to
introduce certain new measures of risk which carry information about the possible
portfolio losses implied by the tail of the return distribution, even in the case
when the distribution is not symmetric. This takes care of those situations where
the return distribution is heavy-tailed. These risk measures are called downside
or safety-first risk measures, which aim to maximize the probability that the
portfolio loss is below a certain acceptable level, commonly referred to as the benchmark
or the disaster level. Thus these risk measures are quantile based risk measures
and are different from standard deviation or other moment based risk measures.
Some of the most popular quantile based risk measures are value at risk (VaR)
and conditional value at risk (CVaR). Since downside risk measures of individual
securities cannot be easily aggregated into portfolio downside risk measures (we
need the entire joint distribution of security returns), their application in practice
requires computationally intensive non-parametric estimation, simulation and
optimization techniques.
There is another major problem associated with the classical Markowitz’s
model. This model gives us an optimal portfolio assuming that we have perfect
information about µi ’s and σij ’s for the assets that we are considering. Therefore
an important practical issue is the estimation of the µi ’s and σij ’s. A reasonable
approach for estimating these data is to use time series of past returns rit which
represents the return of ith asset from time (t − 1) to t, where t = 1, 2, . . . , T. How-
ever, it has been observed that small changes in the time series rit lead to changes
in the µi ’s and σi j ’s that often lead to significant changes in the optimal portfolio.
This is a fundamental weakness of the Markowitz model, no matter how cleverly
µi ’s and σij ’s are computed. This is because the optimal portfolio construction is
very sensitive to small changes in the data. Only one small change in one µi may
produce a totally different portfolio. In fact recent research (Chopra and Ziemba
[30]) has revealed that errors in the estimation of means µi can be more damaging
than errors in other parameters. This has motivated researchers to employ robust
optimization techniques in Markowitz’s model, e.g. Ben-Tal [9], and Tütüncü and
Koenig [139]. A much simpler approach is to consider portfolio optimization under
a minimax rule (Cai et al. [23]) and provide some flexibility by allowing µi to lie
in some interval ai ≤ µi ≤ bi (Deng et al. [36]).
The mean-variance model of Markowitz, in general, results in a dense quadratic
programming problem. If the number of assets in the portfolio is large then it
becomes very difficult to obtain an optimal solution of such large-scale dense
quadratic programming problem on a real time basis. This has motivated re-
Mean Absolute Deviation Based Portfolio Optimization: A L1 -Risk Model 169
We now define the L1-risk measure or the mean absolute deviation of the portfolio
(x1, x2, . . . , xn).
Definition 6.3.1 (L1-Risk Measure of a Portfolio) Let (x1, x2, . . . , xn) be the
given portfolio. Then its L1-risk measure or mean absolute deviation is defined as
\[ w_{L_1}(x_1, x_2, \ldots, x_n) = E\left[\,\left|\sum_{i=1}^{n} r_i x_i - E\left(\sum_{i=1}^{n} r_i x_i\right)\right|\,\right]. \]
In terms of the L1-risk measure $w_{L_1}(x_1, x_2, \ldots, x_n)$, the L1-risk model of the portfolio
optimization problem is formulated as
\[ \begin{aligned}
\text{Min}\quad & w_{L_1}(x_1, x_2, \ldots, x_n) = E\left[\,\left|\sum_{i=1}^{n} (r_i - \mu_i)x_i\right|\,\right] \\
\text{subject to}\quad & \sum_{i=1}^{n} \mu_i x_i \ge \alpha M_0 \\
& \sum_{i=1}^{n} x_i = M_0 \\
& 0 \le x_i \le u_i \quad (i = 1, 2, \ldots, n),
\end{aligned} \tag{6.1} \]
\[ E\left(\left|\sum_{i=1}^{n} (r_i - \mu_i)x_i\right|\right) = \frac{1}{T}\sum_{t=1}^{T}\left|\sum_{i=1}^{n} (r_{it} - \mu_i)x_i\right|. \tag{6.3} \]
In (6.3) it may be noted that, due to the absolute value function, the expression
on the right hand side becomes a nonlinear and non-smooth function of
(x1, x2, . . . , xn).
Using (6.2) and (6.3), problem (6.1) can be reformulated as
\[ \begin{aligned}
\text{Min}\quad & \frac{1}{T}\sum_{t=1}^{T}\left|\sum_{i=1}^{n} (r_{it} - \mu_i)x_i\right| \\
\text{subject to}\quad & \sum_{i=1}^{n} \mu_i x_i \ge \alpha M_0 \\
& \sum_{i=1}^{n} x_i = M_0 \\
& 0 \le x_i \le u_i \quad (i = 1, 2, \ldots, n).
\end{aligned} \tag{6.4} \]
If we now denote $\left|\sum_{i=1}^{n} (r_{it} - \mu_i)x_i\right|$ by $y_t$ and employ the definition of the absolute
value function then we get
\[ y_t = \left|\sum_{i=1}^{n} (r_{it} - \mu_i)x_i\right| = \text{Max}\left(\sum_{i=1}^{n} (r_{it} - \mu_i)x_i,\; -\sum_{i=1}^{n} (r_{it} - \mu_i)x_i\right). \tag{6.5} \]
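\[ \begin{aligned}
\text{Min}\quad & \frac{1}{T}\sum_{t=1}^{T} y_t \\
\text{subject to}\quad & \sum_{i=1}^{n} (r_{it} - \mu_i)x_i \le y_t \\
& -\sum_{i=1}^{n} (r_{it} - \mu_i)x_i \le y_t \\
& \sum_{i=1}^{n} \mu_i x_i \ge \alpha M_0 \\
& \sum_{i=1}^{n} x_i = M_0 \\
& 0 \le x_i \le u_i \quad (i = 1, 2, \ldots, n),
\end{aligned} \tag{6.6} \]
where the first two constraints in (6.6) follow from (6.5) and the definition of
maximum.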
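Denoting $(r_{it} - \mu_i)$ by $c_{it}$ (i = 1, 2, . . . , n; t = 1, 2, . . . , T), we can rewrite (6.6) as
\[ \begin{aligned}
\text{Min}\quad & \frac{1}{T}\sum_{t=1}^{T} y_t \\
\text{subject to}\quad & y_t - \sum_{i=1}^{n} c_{it} x_i \ge 0 \quad (t = 1, 2, \ldots, T) \\
& y_t + \sum_{i=1}^{n} c_{it} x_i \ge 0 \quad (t = 1, 2, \ldots, T) \\
& \sum_{i=1}^{n} \mu_i x_i \ge \alpha M_0 \\
& \sum_{i=1}^{n} x_i = M_0 \\
& 0 \le x_i \le u_i \quad (i = 1, 2, \ldots, n).
\end{aligned} \tag{6.7} \]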
Remark 6.3.1 The formulation (6.4) is essentially nonlinear and non-smooth.
But since this nonlinearity and non-smoothness occur due to the presence of the
absolute value function only, they can be handled in a reasonably simple manner
as explained above. The resulting linear programming problem (6.7) can be solved
efficiently even when n is large.
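To make the structure of (6.7) concrete, the sketch below assembles its data in the generic form (cost vector, inequality rows A_ub z ≤ b_ub, equality rows A_eq z = b_eq, bounds) that many LP solvers accept, for example SciPy's `linprog`. No solver is called here. The variable ordering z = (y1, ..., yT, x1, ..., xn), the use of sample means for µi, the function names and the small return table are all assumptions made for illustration.

```python
def mad_lp_data(returns, alpha, M0, upper):
    """Build LP data for (6.7); returns[t][i] is r_it, variables are (y, x)."""
    T, n = len(returns), len(returns[0])
    mu = [sum(returns[t][i] for t in range(T)) / T for i in range(n)]  # assumed estimator
    c = [[returns[t][i] - mu[i] for i in range(n)] for t in range(T)]  # c_it
    cost = [1.0 / T] * T + [0.0] * n
    A_ub, b_ub = [], []
    for t in range(T):
        minus_y = [0.0] * T
        minus_y[t] = -1.0
        A_ub.append(minus_y + c[t])                 #  sum_i c_it x_i - y_t <= 0
        b_ub.append(0.0)
        A_ub.append(minus_y + [-v for v in c[t]])   # -sum_i c_it x_i - y_t <= 0
        b_ub.append(0.0)
    A_ub.append([0.0] * T + [-v for v in mu])       # -sum_i mu_i x_i <= -alpha M0
    b_ub.append(-alpha * M0)
    A_eq, b_eq = [[0.0] * T + [1.0] * n], [M0]      # sum_i x_i = M0
    bounds = [(0.0, None)] * T + [(0.0, u) for u in upper]
    return cost, A_ub, b_ub, A_eq, b_eq, bounds, mu


def mad_risk(returns, mu, x):
    """Objective of (6.4): average absolute deviation of the portfolio return."""
    T = len(returns)
    return sum(abs(sum((r[i] - mu[i]) * x[i] for i in range(len(x))))
               for r in returns) / T


# Made-up sample: T = 4 periods, n = 2 assets.
returns = [[0.02, 0.01], [-0.01, 0.03], [0.04, 0.00], [0.01, 0.02]]
cost, A_ub, b_ub, A_eq, b_eq, bounds, mu = mad_lp_data(
    returns, alpha=0.01, M0=1.0, upper=[1.0, 1.0])

# Any feasible x together with y_t = |sum_i c_it x_i| is feasible for (6.7),
# and the LP objective then coincides with the L1-risk of x.
x = [0.5, 0.5]
y = [abs(sum((r[i] - mu[i]) * x[i] for i in range(2))) for r in returns]
z = y + x
lp_objective = sum(ci * zi for ci, zi in zip(cost, z))
print("LP objective:", lp_objective, " L1-risk:", mad_risk(returns, mu, x))
```

The LP has T + n variables and 2T + 1 inequality rows, so its size grows linearly in both the number of periods and the number of assets.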
One obvious question at this stage is to enquire if there is any relationship
between the L1 and L2-risk models. The following theorem answers this question.
Theorem 6.3.1 Let (r1, r2, . . . , rn) be multivariate normally distributed. Then for
a given portfolio x = (x1, x2, . . . , xn)
\[ w_{L_1}(x) = \sqrt{\frac{2}{\pi}}\;\sigma(x), \]
where the standard deviation σ(x) is given by
\[ \sigma(x_1, x_2, \ldots, x_n) = \sqrt{E\left[\left\{\sum_{i=1}^{n} r_i x_i - E\left(\sum_{i=1}^{n} r_i x_i\right)\right\}^2\right]}. \]
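Therefore
\[ \begin{aligned}
w_{L_1}(x) &= \frac{1}{\sqrt{2\pi}\,\sigma(x)} \int_{-\infty}^{\infty} |u| \exp\left(-\frac{u^2}{2\sigma^2(x)}\right) du \\
&= \frac{2}{\sqrt{2\pi}\,\sigma(x)} \int_{0}^{\infty} u \exp\left(-\frac{u^2}{2\sigma^2(x)}\right) du,
\end{aligned} \]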
selection strategy will be the same as the one given by the mean-absolute deviation
selection strategy. Even in the case when the normality assumption does not hold,
it has been shown through certain case studies that minimizing the L1-risk
produces portfolios which are comparable to those of Markowitz’s mean-variance model, which
minimizes the L2-risk, i.e. standard deviation. However, as one expects, the variance of the
mean-absolute deviation portfolio is always at least as large as that of the corresponding
mean-variance portfolio. But in actual applications, this difference is small.
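The constant √(2/π) in Theorem 6.3.1 can be checked numerically by evaluating the integral for E|U|, with U normally distributed with mean 0 and standard deviation σ. The sketch below uses a simple midpoint rule; the truncation of the integral at 12σ, the number of steps and the function name are assumptions of this illustration.

```python
import math


def mean_abs_normal(sigma, half_width=12.0, steps=100_000):
    """Approximate E|U| for U ~ N(0, sigma^2) by the midpoint rule on [0, 12 sigma]."""
    h = half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += u * math.exp(-u * u / (2 * sigma * sigma))
    # E|U| = 2 / (sqrt(2 pi) sigma) * integral_0^inf u exp(-u^2 / (2 sigma^2)) du
    return 2.0 / (math.sqrt(2 * math.pi) * sigma) * total * h


for sigma in (0.5, 1.0, 2.0):
    approx = mean_abs_normal(sigma)
    exact = math.sqrt(2 / math.pi) * sigma
    print(f"sigma={sigma}: numerical={approx:.8f}, sqrt(2/pi)*sigma={exact:.8f}")
```

Since the L1-risk is a fixed multiple of σ(x) under normality, minimizing one is equivalent to minimizing the other.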
In fact Konno and Yamazaki [80] applied both L1 and L2-risk models to the Tokyo
Stock Market by using historical data of 224 stocks in the NIKKEI 225 index. They
generated efficient frontiers and observed that the difference between the standard
deviations of the optimal portfolios generated by the L2 and L1-risk models is at most 10%
for whatever value of α. Of course the two frontiers will coincide if the ri’s are
multivariate normally distributed. Thus this difference can be largely attributed to the
non-normality of the data. Therefore, irrespective of the distribution scenario, the L1-risk
model provides a good alternative to Markowitz’s L2-risk model.
In terms of the $L_\infty$-risk measure $w_{L_\infty}(x_1, x_2, \ldots, x_n)$, the $L_\infty$-risk model of the
portfolio optimization problem, denoted by $(POL_\infty)$, is formulated as
\[
\underset{x \in F}{\text{Min}} \left( \underset{1 \le i \le n}{\text{Max}}\, (q_i x_i),\; -\sum_{i=1}^{n} \mu_i x_i \right). \tag{6.9}
\]
The problem (6.9) is a bi-criteria optimization problem as there are two ob-
jectives involved in the optimization. These two objectives are the portfolio risk
and the portfolio return. This scenario is again same as for the L1 and L2 -risk
models studied earlier. There we had minimized the portfolio risk for a given level
of portfolio return. Following the same strategy, we get the following problem
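\[
\begin{aligned}
\underset{x \in F}{\text{Min}} \quad & \underset{1 \le i \le n}{\text{Max}}\, (q_i x_i) \\
\text{subject to} \quad & \sum_{i=1}^{n} \mu_i x_i \ge \alpha M_0 .
\end{aligned}
\tag{6.10}
\]
Introducing an auxiliary variable $y$ that bounds every $q_i x_i$ from above, (6.10) can be written equivalently as
\[
\begin{aligned}
\underset{x \in F}{\text{Min}} \quad & y \\
\text{subject to} \quad & \sum_{i=1}^{n} \mu_i x_i \ge \alpha M_0 \\
& q_i x_i \le y \quad (i = 1, 2, \ldots, n).
\end{aligned}
\tag{6.11}
\]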
For each $\alpha$, (6.11) is a linear programming problem and hence can be solved
efficiently. However, to generate the entire efficient frontier we need to solve (6.11)
for every $\alpha$, which is not possible in practice. In the case of Markowitz's $L_2$-risk
model, the famous two fund theorem comes to our rescue: it allows us to generate
the entire efficient frontier from the knowledge of only two efficient points. In the
case of the $L_\infty$-risk model, we do not have any analogue of the two fund theorem, so we
look for some other suitable option which does not require the simplex algorithm to
solve (6.11) for every $\alpha$, but rather gives a solution in closed form for every level
of return. This is the approach which we shall follow now, and we describe the
details below.
The solution of the bi-criteria optimization problem (6.9) is to be understood in
terms of an efficient point which in this context is termed as an efficient portfolio.
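Definition 6.4.2 (Efficient Portfolio) A feasible portfolio $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n) \in F$
is said to be an efficient portfolio if there does not exist any other portfolio
$x = (x_1, x_2, \ldots, x_n) \in F$ such that
\[
\text{(i)} \quad \underset{1 \le i \le n}{\text{Max}}\, (q_i x_i) \le \underset{1 \le i \le n}{\text{Max}}\, (q_i \bar{x}_i),
\]
and
\[
\text{(ii)} \quad \sum_{i=1}^{n} \mu_i x_i \ge \sum_{i=1}^{n} \mu_i \bar{x}_i ,
\]
with at least one of the two inequalities holding strictly.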
Minimax Rule Based Portfolio Optimization: An L∞ -Risk Model 177
The collection of all efficient portfolios is called the efficient frontier. As ex-
plained in the context of Markowitz’s model, an investor is always interested in
determining the efficient frontier so as to select that portfolio which gives maxi-
mum return for the chosen level of risk. This principle remains valid for L∞ -risk
model as well.
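The problem $(POL_\infty)$ for the $L_\infty$-risk model can be rewritten as
\[
\begin{aligned}
\text{Min} \quad & \left( y ,\; -\sum_{i=1}^{n} \mu_i x_i \right) \\
\text{subject to} \quad & q_i x_i \le y \quad (i = 1, 2, \ldots, n) \\
& x \in F.
\end{aligned}
\tag{6.12}
\]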
We now have the following lemma connecting problems (POL∞ ) and PO(λ).
Lemma 6.4.1. The pair (x̄, ȳ) is an efficient solution of (POL∞ ) if and only if
there exists 0 < λ < 1 such that (x̄, ȳ) is an optimal solution of PO(λ).
Here $\lambda$ can be regarded as the investor's risk tolerance parameter, which governs
the trade-off between the risk and the return.
In view of Lemma 6.4.1, finding the efficient frontier of $(POL_\infty)$ is equivalent to
finding the solution of PO($\lambda$) for all $0 < \lambda < 1$. The discussion given below illustrates
that, given an arbitrary $\bar{\lambda} \in (0, 1)$, the solution $(\bar{x}, \bar{y})$ of PO($\bar{\lambda}$) can be obtained
explicitly in closed form and there is no need to apply any simplex-like numerical
optimization algorithm. This greatly reduces the computational burden
in determining the efficient frontier.
Analytical Solution of PO(λ)
Without any loss of generality we can assume that (i) $\mu_1 \le \mu_2 \le \ldots \le \mu_n$ and (ii)
there do not exist two assets $a_i$ and $a_j$, $i \ne j$, such that $\mu_i = \mu_j$ and $q_i = q_j$. There
is obviously no problem with the ordering of asset returns $\mu_i$ and therefore the
first assumption can always be met. The second assumption is also valid because
if $\mu_i = \mu_j$ and $q_i = q_j$ for two assets $i, j$ $(i \ne j)$, then we may treat them as a single
aggregated asset. We first consider the case when all assets are risky assets.
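Theorem 6.4.1 Let all assets $a_1, a_2, \ldots, a_n$ be risky assets. Then for any $0 < \lambda < 1$,
an optimal solution $(x^*, y^*)$ of the problem PO($\lambda$) is given by
\[
x_i^* = \begin{cases} y^*/q_i, & i \in T^*(\lambda) \\ 0, & \text{otherwise,} \end{cases}
\]
where
\[
y^* = M_0 \left( \sum_{l \in T^*(\lambda)} (1/q_l) \right)^{-1},
\]
and the set $T^*(\lambda)$ is determined as follows.
(a) If there exists an integer $k \in [0, n-2]$ such that
\[
\begin{aligned}
\frac{\mu_n - \mu_{n-1}}{q_n} &< \frac{\lambda}{1-\lambda} \\
\frac{\mu_n - \mu_{n-2}}{q_n} + \frac{\mu_{n-1} - \mu_{n-2}}{q_{n-1}} &< \frac{\lambda}{1-\lambda} \\
&\;\;\vdots \\
\frac{\mu_n - \mu_{n-k}}{q_n} + \frac{\mu_{n-1} - \mu_{n-k}}{q_{n-1}} + \ldots + \frac{\mu_{n-k+1} - \mu_{n-k}}{q_{n-k+1}} &< \frac{\lambda}{1-\lambda},
\end{aligned}
\]
and
\[
\frac{\mu_n - \mu_{n-k-1}}{q_n} + \frac{\mu_{n-1} - \mu_{n-k-1}}{q_{n-1}} + \ldots + \frac{\mu_{n-k} - \mu_{n-k-1}}{q_{n-k}} \ge \frac{\lambda}{1-\lambda},
\]
then
\[
T^*(\lambda) = \{n, n-1, \ldots, n-k\}. \tag{6.14}
\]
(b) Otherwise
\[
T^*(\lambda) = \{n, n-1, \ldots, 1\}. \tag{6.15}
\]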
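Proof. We apply the Karush-Kuhn-Tucker (KKT) conditions to PO($\lambda$). For this,
we first introduce the Lagrangian of PO($\lambda$) as
\[
L(x, y, \beta, \delta, \gamma) = \lambda y + (1-\lambda)\left(-\sum_{i=1}^{n} \mu_i x_i\right) + \sum_{i=1}^{n} \beta_i (q_i x_i - y)
+ \delta\left(\sum_{i=1}^{n} x_i - M_0\right) - \sum_{i=1}^{n} \gamma_i x_i ,
\]
for which the KKT conditions are
\begin{align}
\frac{\partial L}{\partial y} = \lambda - \sum_{i=1}^{n} \beta_i &= 0 \tag{6.16} \\
\frac{\partial L}{\partial x_i} = -(1-\lambda)\mu_i + \beta_i q_i + \delta - \gamma_i &= 0 \quad (i = 1, \ldots, n) \tag{6.17} \\
\sum_{i=1}^{n} x_i &= M_0 \tag{6.18} \\
(q_i x_i - y)\beta_i &= 0 \quad (i = 1, \ldots, n) \tag{6.19} \\
\gamma_i x_i &= 0 \quad (i = 1, \ldots, n) \tag{6.20} \\
\beta_i &\ge 0 \quad (i = 1, \ldots, n) \tag{6.21} \\
\gamma_i &\ge 0 \quad (i = 1, \ldots, n). \tag{6.22}
\end{align}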
Our aim now is to obtain a solution of the above system. Define $T^*(\lambda) = \{i : \beta_i > 0\}$.
This means that for (6.19) to be satisfied we must have
\[
q_i x_i - y = 0 \quad \text{for } i \in T^*(\lambda),
\]
i.e.
\[
x_i = \frac{y}{q_i}, \quad \text{for } i \in T^*(\lambda). \tag{6.23}
\]
We let
i.e.
\[
y = M_0 \left( \sum_{l \in T^*(\lambda)} \frac{1}{q_l} \right)^{-1}. \tag{6.26}
\]
Therefore, we have
\[
x_i = \begin{cases}
\dfrac{M_0}{q_i} \left( \displaystyle\sum_{l \in T^*(\lambda)} \dfrac{1}{q_l} \right)^{-1}, & i \in T^*(\lambda) \\[2ex]
0, & \text{otherwise.}
\end{cases}
\tag{6.27}
\]
Therefore we get
\[
\delta = \left( \sum_{l \in T^*(\lambda)} \frac{1}{q_l} \right)^{-1} \left( (1-\lambda) \sum_{l \in T^*(\lambda)} \frac{\mu_l}{q_l} - \lambda \right). \tag{6.29}
\]
Also from (6.17) and noting that for $i \notin T^*(\lambda)$, $\beta_i = 0$, we have for $i \notin T^*(\lambda)$,
which by (6.14), is nonnegative. The above details show that the KKT conditions
(6.21) and (6.22) are satisfied. In the case when there does not exist any integer
k ∈ [0, n − 2] such that inequalities in (a) are satisfied, we need to show that the
solution given by (6.26) and (6.27) with T∗ (λ) = {n, n − 1, . . . , 2, 1}, will satisfy all
KKT conditions. To do this, we introduce a dummy asset a0 with µ0 = −L and
q0 = L, where L is a sufficiently large positive number. Now following in the same
manner as above, we can show that all KKT conditions are satisfied.
Thus the above analysis shows that the KKT conditions (6.16) to (6.22) are satisfied
if we select $T^*(\lambda)$ by the selection procedure (6.14) to (6.15) and the solutions $y$
and $x_i$ are given by (6.26) and (6.27). Now since PO($\lambda$) is a convex programming
problem, the KKT conditions are necessary and sufficient for optimality. So the
solution given by (6.12) to (6.15) is an optimal solution. This proves the theorem.
We next consider the case when one risk-free asset is included in the portfolio.
To be specific, let the asset $a_1$ be risk-free. It is natural to assume that the risk-free
asset has the lowest return. This is because including assets having lower returns
than that of the risk-free asset in the portfolio cannot make it optimal.
Now for the risk-free asset $a_1$, we have $q_1 = 0$. Hence Theorem 6.4.1 cannot
be directly applied in this scenario. However, we can take $q_1 = \epsilon$ where $\epsilon > 0$ is
a small number. Now we can apply Theorem 6.4.1 and get the desired result by
taking $\epsilon \to 0^+$.
Taking $q_1 = \epsilon\ (> 0)$, when we apply Theorem 6.4.1 to determine the set
$T^*(\lambda)$, there are two possibilities. These are (i) $1 \notin T^*(\lambda)$ and (ii) $1 \in T^*(\lambda)$.
When $1 \notin T^*(\lambda)$, the situation is exactly the same as discussed earlier and so the
optimal solution for PO($\lambda$) as given in Theorem 6.4.1 remains unchanged.
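When $1 \in T^*(\lambda)$, then by Theorem 6.4.1, the optimal solution of PO($\lambda$) is given
by
\[
x_i^* = \begin{cases} \dfrac{y^*}{q_i}, & i \in T^*(\lambda) \\[1.5ex] 0, & i \notin T^*(\lambda), \end{cases}
\tag{6.32}
\]
where
\[
y^* = M_0 \left( \frac{1}{\epsilon} + \sum_{l \ne 1} \frac{1}{q_l} \right)^{-1}. \tag{6.33}
\]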
In (6.32) and (6.33), if we now take the limit as $\epsilon \to 0^+$, we obtain $x_i^* = 0$ for all
$i > 1$, $x_1^* = M_0$ and $y^* = 0$.
An obvious question now is to state a condition under which the two cases,
namely $1 \notin T^*(\lambda)$ and $1 \in T^*(\lambda)$, can be verified easily. In this context, we state
the following condition
\[
\frac{\mu_n - \mu_1}{q_n} + \frac{\mu_{n-1} - \mu_1}{q_{n-1}} + \ldots + \frac{\mu_2 - \mu_1}{q_2} < \frac{\lambda}{1-\lambda}. \tag{6.34}
\]
Here $\mu_1$ is the risk-free return. For the risk-free asset $a_1$, $q_1 = 0$ but it does not
enter in (6.34).
From the statement of Theorem 6.4.1 and the related condition for the determination
of $T^*(\lambda)$ for $k = (n-2)$, it is clear that if the condition (6.34) is not satisfied
then $1 \notin T^*(\lambda)$. Otherwise, if condition (6.34) is satisfied, then $1 \in T^*(\lambda)$. Therefore
the above discussion leads to the following theorem.
Theorem 6.4.2 Let $\lambda \in (0, 1)$ be given. If the condition (6.34) is not satisfied then
$T^*(\lambda)$ and the associated $(x^*, y^*)$ should be determined as per Theorem 6.4.1. Otherwise,
if (6.34) is satisfied, then all wealth $M_0$ should be invested in the risk-free asset,
i.e. $x_1^* = M_0$, $x_i^* = 0$, $i \ne 1$, and $y^* = 0$.
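The test in Theorem 6.4.2 is a single sum and is easy to evaluate. A minimal Python sketch, assuming the assets are ordered so that index 0 holds the risk-free asset; the function name `risk_free_allocation` and the numerical data are hypothetical and only illustrate condition (6.34).

```python
def risk_free_allocation(mu, q, lam, m0):
    """Apply condition (6.34); mu[0] is the risk-free return, q[0] is not used.

    Returns (x, y) with all wealth in the risk-free asset if (6.34) holds,
    and None if the allocation has to be found from Theorem 6.4.1 instead.
    """
    lhs = sum((mu[l] - mu[0]) / q[l] for l in range(1, len(mu)))
    if lhs < lam / (1.0 - lam):
        return [m0] + [0.0] * (len(mu) - 1), 0.0
    return None


# Hypothetical data: one risk-free asset followed by three risky assets.
mu = [3.0, 6.0, 7.0, 9.0]
q = [0.0, 7.0, 400.0, 8.0]
cautious = risk_free_allocation(mu, q, lam=0.6, m0=60_000.0)
bolder = risk_free_allocation(mu, q, lam=0.35, m0=60_000.0)
print(cautious, bolder)
```

With these numbers the left-hand side of (6.34) is about 1.19, so the condition holds for $\lambda = 0.6$ (threshold 1.5) and fails for $\lambda = 0.35$ (threshold about 0.54).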
Certain Observations about the Solution of Problem PO(λ)
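(i) Consider the solution where
\[
\frac{\mu_n - \mu_{n-1}}{q_n} < \frac{\lambda}{1-\lambda},
\]
and
\[
\frac{\mu_n - \mu_{n-2}}{q_n} + \frac{\mu_{n-1} - \mu_{n-2}}{q_{n-1}} \ge \frac{\lambda}{1-\lambda}.
\]
Then Theorem 6.4.1 tells us that the optimal portfolio should consist of assets
$a_n$ and $a_{n-1}$ only. Further, the actual amounts of investments for these assets
should be
\[
x_n^* = \frac{M_0}{q_n} \left( \frac{1}{q_n} + \frac{1}{q_{n-1}} \right)^{-1} = M_0 \left( 1 + \frac{q_n}{q_{n-1}} \right)^{-1},
\]
and
\[
x_{n-1}^* = \frac{M_0}{q_{n-1}} \left( \frac{1}{q_{n-1}} + \frac{1}{q_n} \right)^{-1} = M_0 \left( 1 + \frac{q_{n-1}}{q_n} \right)^{-1}.
\]
Therefore, if $q_n$ is much larger than $q_{n-1}$, then it is possible that $x_n^*$ is nearly
zero while $x_{n-1}^*$ is nearly equal to $M_0$.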
The above discussion suggests that the process of constructing an optimal
portfolio is a two phase decision process. In the first phase, the assets are
selected according to their rates of return, thereby giving the set of investable
assets. In the second phase, the actual amounts allocated to the investable
assets (those assets which have been selected in Phase-1) are determined based
on their risk levels. Here if the risk of any investable asset is very high, then
its wealth allocation is very small and therefore it may be neglected in the
optimal portfolio. Thus in Phase-1, an asset may be eliminated if its return is
very low, while in Phase-2, it may be eliminated if its risk is very high.
(ii) For the optimal portfolio $x^*$ as given by Theorem 6.4.1, we have
\[
x_i^* q_i = \begin{cases} y^*, & i \in T^*(\lambda) \\ 0, & \text{otherwise.} \end{cases}
\tag{6.35}
\]
Since $q_i x_i^*$ is the risk associated with the $i$th asset when the amount $x_i^*$ is
invested in it, (6.35) tells us that for the assets selected for investment, we invest
amounts such that they all carry the same risk $y^*$. Thus the optimal
solution of PO($\lambda$) gives an investment plan in which we invest a small amount
in assets having high risk, and invest a large amount in assets having low
risk. This strategy will not increase the maximum risk but will certainly increase
the overall expected return.
(iii) The amounts $x_i^*$ $(i \in T^*(\lambda))$ as given in Theorem 6.4.1 do not depend on $\mu_i$
once the set $T^*(\lambda)$ has been selected. Thus the information about the expected
returns $\mu_i$ is used only to determine the set of investable assets. After that, only the risks
of the investable assets are used to determine the actual allocation of wealth
among the investable assets.
(iv) In Theorem 6.4.1, the inequalities are used to define the ranking rule for
selecting the investable assets. Because of the presence of these inequalities, this
model has some robustness and is not very sensitive to errors in
the parameters $\mu_i$ $(i = 1, 2, \ldots, n)$. In fact Cai et al. [23] gave an explicit result
for the allowable perturbation $\delta_i$ in $\mu_i$ so that the optimal portfolio as obtained by
Theorem 6.4.1 remains unchanged.
We now present an illustrative example to verify the above points.
Example 6.4.1 Consider a six asset $(a_1, a_2, a_3, a_4, a_5, a_6)$ portfolio optimization
problem PO($\lambda$) with the following data
\[
\begin{aligned}
\frac{\mu_6 - \mu_5}{q_6} &= \frac{9 - 7}{8} = 0.25 < 0.5384 \\
\frac{\mu_6 - \mu_4}{q_6} + \frac{\mu_5 - \mu_4}{q_5} &= \frac{9 - 6}{8} + \frac{7 - 6}{400} = 0.3775 < 0.5384 \\
\frac{\mu_6 - \mu_3}{q_6} + \frac{\mu_5 - \mu_3}{q_5} + \frac{\mu_4 - \mu_3}{q_4} &= \frac{9 - 5}{8} + \frac{7 - 5}{400} + \frac{6 - 5}{7} = 0.647 > 0.5384.
\end{aligned}
\]
Hence (n − k) = 4, which gives k = 6 − 4 = 2. This gives
T∗ (0.35) = {n, n − 1, . . . , n − k} = {6, 5, 4},
and therefore the investable assets are a4 , a5 and a6 .
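Next we determine the allocation in these investable assets using the formulae
given in Theorem 6.4.1. We have
\[
\begin{aligned}
x_4^* &= \frac{60{,}000}{7} \left( \frac{1}{7} + \frac{1}{400} + \frac{1}{8} \right)^{-1} = 31704.09 \\
x_5^* &= \frac{60{,}000}{400} \left( \frac{1}{7} + \frac{1}{400} + \frac{1}{8} \right)^{-1} = 554.82 \\
x_6^* &= \frac{60{,}000}{8} \left( \frac{1}{7} + \frac{1}{400} + \frac{1}{8} \right)^{-1} = 27741.08,
\end{aligned}
\]
and
\[
y^* = 60{,}000 \left( \frac{1}{7} + \frac{1}{400} + \frac{1}{8} \right)^{-1} = 221928.66.
\]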
Therefore the optimal portfolio consists of the allocation $x_1^* = 0$, $x_2^* = 0$, $x_3^* = 0$,
$x_4^* = 31704.09$, $x_5^* = 554.82$ and $x_6^* = 27741.08$ from the given wealth
$M_0 = 60{,}000$. Further the minimum value of the $L_\infty$-risk measure for the optimal
portfolio $x^*$, i.e. $w_{L_\infty}(x^*)$, is $y^* = 221928.66$. In fact $y^* = q_4 x_4^* = q_5 x_5^* = q_6 x_6^* = 221928.66$
and therefore all investable assets with the chosen optimal allocation
have the same risk $y^*$. Further the maximum value of the portfolio expected return
is
\[
\sum_{i=1}^{n} \mu_i x_i^* = \mu_4 x_4^* + \mu_5 x_5^* + \mu_6 x_6^* = 443778.0.
\]
Here we see that the returns of the assets $a_1$, $a_2$ and $a_3$ are small in comparison to
$a_4$, $a_5$ and $a_6$ and they are not selected for the investment. This is precisely
Phase-1 of the investable assets selection procedure. Further, as the risk of the asset
$a_5$ is very large, it is allocated a very small amount of investment. This
is Phase-2 of the investable assets selection procedure.
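The two-phase procedure of Theorem 6.4.1 can be sketched in a few lines of Python and run on the figures of Example 6.4.1. This is an illustrative sketch under stated assumptions: only the values that can be read off the arithmetic above are used ($\mu_3, \ldots, \mu_6$, $q_4, q_5, q_6$, $\lambda = 0.35$ and $M_0 = 60{,}000$), the assets $a_1$ and $a_2$ are left out, and `q` for $a_3$ is a placeholder that never enters the computation because $a_3$ is rejected in Phase-1. The function name `minimax_portfolio` is hypothetical.

```python
def minimax_portfolio(mu, q, lam, m0):
    """Closed-form solution of PO(lambda) for risky assets (Theorem 6.4.1).

    mu must be sorted in nondecreasing order; q[i] > 0 for all i.
    Returns the allocation x, the common risk level y and the selected
    (0-based) positions.
    """
    n = len(mu)
    threshold = lam / (1.0 - lam)
    selected = [n - 1]                      # the highest-return asset is always held
    # Phase-1: add assets in decreasing order of return until the test fails.
    for j in range(n - 2, -1, -1):
        lhs = sum((mu[l] - mu[j]) / q[l] for l in selected)
        if lhs >= threshold:
            break
        selected.append(j)
    # Phase-2: equalise the risk q_i * x_i = y across the selected assets.
    y = m0 / sum(1.0 / q[l] for l in selected)
    x = [y / q[i] if i in selected else 0.0 for i in range(n)]
    return x, y, sorted(selected)


labels = ["a3", "a4", "a5", "a6"]
mu = [5.0, 6.0, 7.0, 9.0]
q = [1.0, 7.0, 400.0, 8.0]                  # q for a3 is a placeholder (unused)
x, y, chosen = minimax_portfolio(mu, q, lam=0.35, m0=60_000.0)
investable = [labels[i] for i in chosen]
expected_return = sum(m * xi for m, xi in zip(mu, x))
print(investable, [round(v, 2) for v in x], round(y, 2), round(expected_return, 1))
```

The loop may stop at the first failing test because the left-hand sides of the inequalities in the theorem are nondecreasing as lower-return assets are examined.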
Some More Observations on L∞ -Risk Model
(i) Looking at the definition of the $L_\infty$-risk measure $w_{L_\infty}(x)$ or the formula for the
optimal allocation $x^*$, one gets the impression that none of these depend on
the covariances between the assets. Further it seems that only the risks of the
individual assets, rather than the risk of the entire portfolio, are taken care of
in this development. But this is not true, as explained below.
Let us recall that the total portfolio risk is given by
\[
E\left( \left| \sum_{i=1}^{n} r_i x_i - \sum_{i=1}^{n} \mu_i x_i \right| \right).
\]
Let $\xi$ be a given positive number. We are interested in making
\[
P\left( \left| \sum_{i=1}^{n} r_i x_i - \sum_{i=1}^{n} \mu_i x_i \right| \ge \xi \right)
\]
as small as possible because this ensures that the
deviation of the actual total return from the expected total return is as small
as possible. But by the Markov inequality we have
\[
\begin{aligned}
P\left( \left| \sum_{i=1}^{n} r_i x_i - \sum_{i=1}^{n} \mu_i x_i \right| \ge \xi \right)
&\le \frac{1}{\xi}\, E\left( \left| \sum_{i=1}^{n} r_i x_i - \sum_{i=1}^{n} \mu_i x_i \right| \right) \\
&\le \frac{1}{\xi} \sum_{i=1}^{n} E\left( | r_i - \mu_i |\, x_i \right) \\
&\le \frac{n}{\xi}\, w_{L_\infty}(x).
\end{aligned}
\tag{6.36}
\]
The inequality (6.36) illustrates that the total portfolio risk is small if $w_{L_\infty}(x)$
is kept small. But in the expression for the total portfolio risk, the covariances
among various assets are certainly involved. This explains that in the $L_\infty$-risk
model, the covariances are not ignored and minimizing $w_{L_\infty}(x)$ indirectly
attempts to minimize the total portfolio risk.
(ii) In analogy with Markowitz’s mean-variance model, it is natural to ask certain
questions with regard to the efficient frontier and the status of CAPM type
model for L∞ -risk model. Cai et al. [23] developed an explicit procedure for
tracing the efficient frontier for L∞ -risk model but did not present any CAPM
type result.
Value-at-Risk of an Asset 187
(iii) For Markowitz’s model we can generate efficient frontier easily provided short
selling is allowed (we refer to the two fund theorem in this regard). For L∞ -risk
model discussed here, generation of efficient frontier becomes easier provided
short selling is not allowed (we refer to Theorem 6.4.1 in this regard).
(iv) The $L_\infty$-risk model of Cai et al. [23] can be made more robust if more flexibility is
allowed with regard to the parameters $\mu_i$ $(i = 1, 2, \ldots, n)$. Deng et al. [36]
presented a minimax type model for portfolio optimization where the $\mu_i$'s are allowed
to lie in an interval, $a_i \le \mu_i \le b_i$ $(i = 1, 2, \ldots, n)$. They used the celebrated
minimax theorem along with the results of Cai et al. [23] to solve the resulting
optimization problem.
So far we have considered certain variations of Markowitz's mean-variance
model where the portfolio risk is taken to be different from the standard deviation (e.g.
$L_1$-risk or $L_\infty$-risk) but is still essentially based on moments of the portfolio return.
In the coming sections, we shall discuss a few variations of Markowitz's model
where the portfolio risk is quantile based. In particular, we discuss VaR and CVaR
based portfolio optimization problems.
and VaR represents the predicted maximum loss with specified probability (0.95
in our example) over a certain period of time which is T in our case.
Thus VaR(1−α) (X) for an asset is the value z such that the probability that the
maximum loss X is at most z, is at least (1 − α).
The use of VaR involves two chosen parameters. These are the confidence level
$(1 - \alpha)$ and the holding period $T$ of the asset. The choice of $\alpha$, and hence of $(1 - \alpha)$,
depends on the purpose for which our risk measure is utilized. In practice $\alpha$ is
typically taken as 10%, 5% or 1%, so that the typical confidence levels are
90%, 95% and 99%. The usual holding periods are one day or one month, but it
can be even one quarter or more. Given the confidence level $(1 - \alpha)$ and horizon
$T$, VaR is a bound such that the loss over the horizon is less than this bound
with probability equal to the confidence coefficient. For example, if the horizon is one
week, the confidence level is 99% (so $\alpha = 0.01$) and VaR is Rs 50,000, then there
is only a 1% chance of the loss exceeding Rs 50,000 over the next week.
From (6.39) we note that $\mathrm{VaR}_{(1-\alpha)}(X) = F_X^{-1}(1-\alpha)$ and therefore for a continuous
loss distribution $\mathrm{VaR}_{(1-\alpha)}(X)$ is simply the loss such that
and our aim is to find VaR for $(1 - \alpha) = 0.95$, i.e. for $\alpha = 0.05$. By definition, loss
equals $(100e^{0.08} - S(1))$ and
i.e.
\[
P\left( \ln \frac{S(1)}{S(0)} \ge \ln \frac{(100e^{0.08} - \mathrm{VaR})}{100} \right) = 0.95
\]
i.e.
\[
P\left[ \frac{\ln \dfrac{S(1)}{S(0)} - 0.12}{0.30} \ge \frac{\ln \dfrac{(100e^{0.08} - \mathrm{VaR})}{100} - 0.12}{0.30} \right] = 0.95. \tag{6.42}
\]
The above shortcomings of VaR have motivated researchers to look for other
quantile based risk measures, and CVaR (conditional value at risk) is one such risk
measure.
Before we proceed with the discussion of CVaR, we remark that in spite of the
difficulties outlined above, VaR is still very popular in the market. Therefore we
need to discuss some non-parametric and parametric methods for its estimation.
For small sample sizes, VaR can be best estimated by a parametric technique. We
discuss only the case of the normal distribution because it is extremely simple.
Since, in general, the historical data or returns is given, we have
Therefore
\[
\mathrm{VaR}_{(1-\alpha)}(X) = -S(\mu + \Phi^{-1}(\alpha)\sigma),
\]
which can be estimated by $-S(\bar{X} + \Phi^{-1}(\alpha)\hat{\sigma})$. Here $\bar{X}$ and $\hat{\sigma}$ are the mean and
standard deviation of the sample of returns and $S$ is the size of the initial investment.
Let us assume that from the given historical data, $\bar{X} = -3.107 \times 10^{-4}$ and
$\hat{\sigma} = 0.0151$. Also $\Phi^{-1}(\alpha) = -1.645$ for $\alpha = 0.05$. Then the estimate of the
required VaR is $-(20{,}000)\left((-3.107 \times 10^{-4}) + (-1.645)(0.0151)\right) \approx$ Rs 503.
In this example, the estimate of the expected return is negative. This may be
because the particular years used include a prolonged bear market. However, we
certainly do not expect average future returns to be negative, otherwise we would
not have invested Rs 20,000 in the market.
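The parametric estimate can be evaluated directly. A minimal Python sketch, assuming normally distributed returns and using the standard-library `statistics.NormalDist` for $\Phi^{-1}$; the function name `normal_var` is hypothetical, and the inputs are the sample mean, sample standard deviation and investment size quoted in the example above.

```python
from statistics import NormalDist


def normal_var(investment, mean_return, sd_return, alpha):
    """Parametric VaR estimate -S * (mean + Phi^{-1}(alpha) * sd)."""
    z_alpha = NormalDist().inv_cdf(alpha)   # about -1.645 for alpha = 0.05
    return -investment * (mean_return + z_alpha * sd_return)


var_estimate = normal_var(20_000, -3.107e-4, 0.0151, 0.05)
print(round(var_estimate, 2))
```

The estimate scales linearly with the investment size and with $\hat{\sigma}$, so a small change in the estimated standard deviation moves the VaR estimate noticeably.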
Since in actual practice, normality assumption is not going to hold, we need to
apply historical simulation and Monte Carlo strategies to generate a large number
of possible scenarios. Also we need to understand VaR for a portfolio of assets
which could include options along with certain number of stocks. The estimation
of VaR in this situation also requires Monte Carlo simulation. We shall discuss
Monte Carlo simulation in Chapter 14.
To define CVaR we first define the occurrence of a tail event. We say that a tail
event occurs if the loss exceeds the VaR. Then CVaR is the conditional expectation
of loss given that the tail event occurs. We now proceed to define CVaR and discuss
its minimization mathematically. To be specific, we define CVaR in context of
portfolio optimization.
We consider a portfolio of assets with random returns. Let $f(w, r)$ denote the
loss function when we choose the investment $w$ from a set of feasible portfolios and
$r$ is the realization of random returns. In the context of our portfolio optimization
problem, $f(w, r) = -r^T w$ where $r = (r_1, r_2, \ldots, r_n)^T$ is the vector of random returns
and $w = (w_1, w_2, \ldots, w_n)^T$ with $e^T w = 1$, $e$ being the vector $(1, 1, \ldots, 1)^T$. Here in
the expression of $f(w, r)$, the minus sign has been taken to express the return $r^T w$ as a loss
function. We assume that the return vector $r$ has a probability density function
$p(r)$, e.g., the random returns may have a multivariate normal distribution.
For a fixed weight vector $w$, we define
\[
\psi(w, q) = \int_{f(w,r) \le q} p(r)\, dr, \tag{6.43}
\]
which represents the cumulative distribution function of the loss associated with
the weight vector $w$. In fact $\psi(w, q) = P(\text{loss} \le q)$. Therefore for a given confidence
level $(1 - \alpha)$, the $\mathrm{VaR}_{(1-\alpha)}(w)$ associated with the portfolio is given by
where $J = \{1, 2, \ldots, m\}$, $J_1 = \{j \in J : f(w, r^{(j)}) \ge \mathrm{VaR}(w)\}$ and the vector $r^{(j)}$ is the
$j$th realization of the return vector $r$ with probability $p_j$.
Conditional Value-at-Risk 195
Remark 6.6.2 In a certain sense, (6.44) and (6.45) tell us that $\mathrm{CVaR}_{(1-\alpha)}(w)$ is the
average of the outcomes greater than $\mathrm{VaR}_{(1-\alpha)}(w)$. This is certainly true for
continuous distribution functions; but for general distributions this is not exactly true.
There are certain subtle aspects which need to be explained. We refer to
Sriboonchitta et al. [128] in this regard.
Example 6.6.1 Let the loss function $f(w, r)$ be given by $f(w, r) = -r$ where $r = 75 - j$
$(j = 0, 1, 2, \ldots, 99)$, each with probability 1%. Evaluate VaR($w$) and CVaR($w$) at
95% confidence level.
Solution The loss function $f(w, r) = -r$, where $r = 75 - j$ $(j = 0, 1, 2, \ldots, 99)$,
takes 100 values given by $(-75, -74, -73, \ldots, 0, 1, \ldots, 19, 20, 21, 22, 23$ and $24)$ with
equal probability $p = 0.01$. Therefore VaR($w$) = 20 for $1 - \alpha = 0.95$.
We next evaluate $\mathrm{CVaR}_{(1-\alpha)}(w)$ for $(1 - \alpha) = 0.95$ by using formula (6.45). This
gives
\[
\mathrm{CVaR}(w) = \frac{1}{0.05}\,(20 + 21 + 22 + 23 + 24)(0.01) = 22.
\]
Thus at 95% confidence level VaR is 20 and CVaR is 22.
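The numbers of Example 6.6.1 can be reproduced with a short Python sketch. It assumes the convention implied by the solution above: VaR is the loss level at which the probability accumulated from the worst scenario downwards first reaches $\alpha$, and CVaR is the probability-weighted sum of the losses at or above VaR divided by $\alpha$. Exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding at the 5% boundary; the function name `discrete_var_cvar` is hypothetical.

```python
from fractions import Fraction


def discrete_var_cvar(losses, probs, alpha):
    """VaR and CVaR of a finite loss distribution (tail convention as above)."""
    tail_prob = Fraction(0)
    var = None
    # Walk from the worst loss downwards until the tail holds probability alpha.
    for loss, p in sorted(zip(losses, probs), reverse=True):
        tail_prob += p
        var = loss
        if tail_prob >= alpha:
            break
    # Probability-weighted tail losses, divided by alpha.
    cvar = sum(p * loss for loss, p in zip(losses, probs) if loss >= var) / alpha
    return var, cvar


losses = [-(75 - j) for j in range(100)]     # losses -75, -74, ..., 24
probs = [Fraction(1, 100)] * 100
var_95, cvar_95 = discrete_var_cvar(losses, probs, Fraction(5, 100))
print(var_95, cvar_95)
```

When the tail probability at VaR exceeds $\alpha$, dividing by $\alpha$ no longer yields a plain average of tail losses, which is the subtlety for general distributions mentioned in Remark 6.6.2.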
In the above example for the same confidence level CVaR is more than VaR.
We have the following result in this regard.
Lemma 6.6.1. For the given confidence level (1−α), CVaR(1−α) (w) ≥ VaR(1−α) (w).
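Proof. We have
\[
\begin{aligned}
\mathrm{CVaR}_{(1-\alpha)}(w) &= \frac{1}{\alpha} \int_{f(w,r) \ge \mathrm{VaR}_{(1-\alpha)}(w)} f(w, r)\, p(r)\, dr \\
&\ge \frac{1}{\alpha} \int_{f(w,r) \ge \mathrm{VaR}_{(1-\alpha)}(w)} \mathrm{VaR}_{(1-\alpha)}(w)\, p(r)\, dr \\
&= \frac{\mathrm{VaR}_{(1-\alpha)}(w)}{\alpha} \int_{f(w,r) \ge \mathrm{VaR}_{(1-\alpha)}(w)} p(r)\, dr \\
&= \mathrm{VaR}_{(1-\alpha)}(w).
\end{aligned}
\]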
Remark 6.6.3 As CVaR of a portfolio is always more than or equal to VaR for
same (1 − α), portfolios with small CVaR will also have small VaR. But this does
not mean that the minimization of CVaR is equivalent to the minimization of VaR.
Minimization of CVaR
We now discuss Rockafellar and Uryasev's [111] procedure to minimize CVaR.
Since the definition of CVaR involves the VaR function explicitly, it is not very
convenient to optimize CVaR directly. Therefore we introduce the auxiliary function
\[
F_{(1-\alpha)}(w, q) = q + \frac{1}{\alpha} \int_{f(w,r) \ge q} (f(w, r) - q)\, p(r)\, dr. \tag{6.46}
\]
If we denote $a^+ = \text{Max}(a, 0)$, $a \in R$, then from (6.46) we get
\[
F_{(1-\alpha)}(w, q) = q + \frac{1}{\alpha} \int (f(w, r) - q)^+ p(r)\, dr. \tag{6.47}
\]
We refer to Rockafellar and Uryasev [111] for the results given below.
Lemma 6.6.2. The auxiliary function F(1−α) (w, q) is a convex function of q.
Lemma 6.6.3. VaR(1−α) (w) is a minimizer of F(1−α) (w, q) over q.
Lemma 6.6.4. The minimum value of F(1−α) (w, q) over q is CVaR(1−α) (w).
In view of the above lemmas, we have for a given $w$,
\[
\mathrm{CVaR}_{(1-\alpha)}(w) = \underset{q}{\text{Min}}\, \left( F_{(1-\alpha)}(w, q) \right) = F_{(1-\alpha)}(w, \mathrm{VaR}_{(1-\alpha)}(w)). \tag{6.48}
\]
The left equality of (6.48) tells us that we can minimize CVaR directly without
computing VaR first. Since for portfolios the loss function $f(w, r) = -r^T w$ is a
linear and hence also convex function of $w$, the auxiliary function $F_{(1-\alpha)}(w, q)$ is a
convex function of $w$. Therefore the problem
is equivalent to
subject to
$e^T w = 1$, (6.50)
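\[
\tilde{F}_{(1-\alpha)}(w, q) = q + \frac{1}{\alpha\, n_s} \sum_{s=1}^{n_s} (f(w, r^{(s)}) - q)^+ . \tag{6.51}
\]
Using (6.51) to approximate $F_{(1-\alpha)}(w, q)$, we get an approximation to the problem
(6.50) as
\[
\begin{aligned}
\underset{w, q}{\text{Min}} \quad & \tilde{F}_{(1-\alpha)}(w, q) \\
\text{subject to} \quad & e^T w = 1.
\end{aligned}
\tag{6.52}
\]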
Now writing $(f(w, r^{(s)}) - q)^+$ as $z_s$ and using the definition of $a^+$, we obtain
\[
\begin{aligned}
\underset{w, q, z_s}{\text{Min}} \quad & q + \frac{1}{\alpha\, n_s} \sum_{s=1}^{n_s} z_s \\
\text{subject to} \quad & z_s \ge f(w, r^{(s)}) - q, \quad s = 1, 2, \ldots, n_s \\
& e^T w = 1 \\
& z_s \ge 0, \quad s = 1, 2, \ldots, n_s .
\end{aligned}
\tag{6.53}
\]
In the context of portfolio optimization, $f(w, r^{(s)}) = -w^T r^{(s)}$ is linear and so the
problem (6.53) becomes a linear programming problem.
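For a fixed $w$ the sample-based auxiliary function is piecewise linear and convex in $q$, with kinks only at the scenario losses, so its minimum over $q$ can be found by evaluating it at those losses. The Python sketch below does this for the 100 equally likely losses $-75, \ldots, 24$ of Example 6.6.1 at $\alpha = 0.05$. It is an illustration under these assumptions, not a substitute for the linear program in $(w, q, z_s)$; the helper name `auxiliary_f` is hypothetical.

```python
from fractions import Fraction


def auxiliary_f(losses, q, alpha):
    """Sample auxiliary function: q + sum((f_s - q)^+) / (alpha * n_s)."""
    n_s = len(losses)
    excess = sum(max(loss - q, 0) for loss in losses)
    return q + Fraction(excess) / (alpha * n_s)


losses = [j - 75 for j in range(100)]        # scenario losses for a fixed w
alpha = Fraction(5, 100)

# The minimum over q is attained at one of the scenario losses.
values = {q: auxiliary_f(losses, q, alpha) for q in losses}
min_value = min(values.values())
minimizers = sorted(q for q, v in values.items() if v == min_value)
print(min_value, minimizers)
```

The minimum value equals the CVaR of 22 found in Example 6.6.1, and the set of minimizers contains the VaR of 20, in line with Lemmas 6.6.3 and 6.6.4.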
Most often we try to optimize a suitable performance measure (e.g, expected
return) while making sure that certain risk measures do not exceed a threshold
value. It could be variance or absolute deviation as has been discussed earlier.
When the risk measure is CVaR, the resulting optimization problem is
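\[
\begin{aligned}
\text{Max} \quad & \mu^T w \\
\text{subject to} \quad & \mathrm{CVaR}_{(1-\alpha)}(w) \le u_\alpha \\
& e^T w = 1,
\end{aligned}
\tag{6.54}
\]
whose scenario-based approximation, obtained as in (6.53), is
\[
\begin{aligned}
\underset{w, z, q}{\text{Max}} \quad & \mu^T w \\
\text{subject to} \quad & u_\alpha \ge q + \frac{1}{\alpha\, n_s} \sum_{s=1}^{n_s} z_s \\
& z_s \ge f(w, r^{(s)}) - q \quad (s = 1, 2, \ldots, n_s) \\
& e^T w = 1 \\
& z_s \ge 0 \quad (s = 1, 2, \ldots, n_s).
\end{aligned}
\]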
In (6.54), we can have more than one CVaR constraint for different levels α. Also
similar to Markowitz’s model, we can have a trade off between return and CVaR.
We can refer to Mansini et al. [89] for more details in this regard.
provided that the expectations exist. Thus for making their optimal choices under
uncertainty, decision makers try to maximize their expected utilities. This
principle is called the von Neumann-Morgenstern expected utility maximization principle.
Is this principle always applicable? Does there always exist a utility function as
described above for the random variables in the class X? Apart from various
mathematical conditions for its existence, it has been argued that the existence of a
utility function is restricted to “rational” people. This latter part has been a topic of
debate among economists and psychologists. For this as well as other aspects of
the von Neumann-Morgenstern theory we refer to Fishburn [47].
Utilities and Risk Attitudes
We all know that different people react differently in risky situations. They
make decisions based upon their own attitudes towards risk. Thus, in making an
investment choice between risky and risk-free assets, each person will act differently
depending upon his/her risk attitude. Roughly there are three main risk attitudes
of an investor. Let the random variable $X$ be a risky prospect, and $u$ be the utility
function of the decision maker. We say that the investor is
(i) risk neutral if, when facing two risky prospects with the same expected value,
he/she feels indifferent between them. In terms of the utility function $u$, this means
that $u(E(X)) = E(u(X))$. This will happen for example if $u$ is linear, say $u(x) = x$.
(ii) risk averse if, when facing two risky prospects with the same expected value,
he/she prefers the less risky one. In terms of the utility function $u$, this means
that $u(E(X)) > E(u(X))$. This will happen for example if $u$ is concave, say
$u(x) = \log x$ or $u(x) = -x^2$.
(iii) risk seeking if, when facing two risky prospects with the same expected value,
he/she prefers the more risky one. In terms of the utility function $u$, this means that
$u(E(X)) < E(u(X))$. This will happen for example if $u$ is convex, say $u(x) = x^2$.
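The three attitudes can be illustrated by comparing $u(E(X))$ with $E(u(X))$ for one particular prospect. The Python sketch below does this for the three example utilities named above; the 50-50 prospect paying 50 or 150 and the function name `risk_attitude` are hypothetical, and a comparison for a single prospect only illustrates the definition, which refers to all risky prospects.

```python
import math


def risk_attitude(u, outcomes, probs, tol=1e-12):
    """Compare u(E(X)) with E(u(X)) for one discrete prospect X."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    utility_of_mean = u(mean)
    expected_utility = sum(p * u(x) for x, p in zip(outcomes, probs))
    if math.isclose(utility_of_mean, expected_utility, rel_tol=tol, abs_tol=tol):
        return "risk neutral"
    return "risk averse" if utility_of_mean > expected_utility else "risk seeking"


outcomes, probs = [50.0, 150.0], [0.5, 0.5]   # hypothetical prospect, mean 100
attitudes = {
    "linear": risk_attitude(lambda x: x, outcomes, probs),
    "log": risk_attitude(math.log, outcomes, probs),
    "square": risk_attitude(lambda x: x * x, outcomes, probs),
}
print(attitudes)
```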
It is obvious that given a utility function u(x), any function of the form v(x) =
au(x) + b with a > 0 is a utility function equivalent to u(x). This is because
equivalent utility functions give identical rankings in terms of the principle of
expected utility maximization.
To explain the meaning of the risk averse scenario, let us consider an investor who
has two alternatives for future wealth. The first alternative is based on the outcome
of the toss of a coin. If the coin turns up ‘head’ the investor gets Rs $x$, otherwise
he/she gets Rs $y$. The second alternative is a sure event of getting Rs $\frac{x+y}{2}$ with
certainty. Let the investor's utility function be given by a strictly concave function
$u(x)$. Then the expected utility of the first alternative is $\frac{u(x) + u(y)}{2}$, whereas the
expected utility of the second alternative is $u\left(\frac{x+y}{2}\right)$. Since $u$ is strictly concave
we have
\[
u\left(\frac{x+y}{2}\right) > \frac{u(x) + u(y)}{2}.
\]
Hence by the principle of expected utility maximization we shall prefer the sure
wealth of Rs $\frac{(x+y)}{2}$ to a 50-50 chance of $x$ or $y$. Here we must note that both
alternatives have the same expected value of Rs $\frac{(x+y)}{2}$, but the one without risk
(certainty of getting) is preferred. This explains the risk averse attitude of the
investor.
The following result is useful in this regard.
Theorem 6.7.2 An investor is risk averse if and only if his/her utility function
is concave.
Remark 6.7.1 Although it has not been mentioned in the statement of above
theorem, u is non-decreasing by the definition of utility function. We refer to
Sriboonchitta et al. [128] for the proof of the above theorem. The proof essentially
uses Jensen’s inequality and the defining condition u(E(X)) > E(u(X)) for a risk
averse investor.
Let us take another example to understand the risk attitudes. For an investor
let there be three possible scenarios: a 5% chance of losing Rs 20,000, a 10%
chance of losing Rs 10,000, and a sure chance of losing Rs 1,000. All three scenarios
have the same expected loss of Rs 1,000, so a risk neutral
investor will not find any of these situations worse than the other. For a risk averse
investor, the first situation is worse than the second one, and the second one is
worse than the last one. This is because risk averse investors dislike uncertainty
about the size of losses, so they do not prefer even a small possibility of a large
amount of loss.
Certainty Equivalent
The certainty equivalent of a random wealth $X$ is defined to be the amount of a
certain (i.e. risk-free) wealth that has a utility level equal to the expected utility
of $X$. Thus if $C$ is the certainty equivalent of a random wealth $X$ then by definition
$u(C) = E(u(X))$.
The certainty equivalent of a random variable is the same for all equivalent utilities
and is measured in units of wealth.
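For an increasing utility the equation $u(C) = E(u(X))$ can be solved numerically by bisection. The Python sketch below does so for a hypothetical 50-50 prospect paying 100 or 400 under $u(x) = \log x$, for which the certainty equivalent is $\sqrt{100 \cdot 400} = 200$, and repeats the computation for the equivalent utility $v(x) = 3u(x) + 5$. The function name `certainty_equivalent` and the data are illustrative assumptions.

```python
import math


def certainty_equivalent(u, outcomes, probs, lo, hi, tol=1e-9):
    """Solve u(C) = E(u(X)) by bisection for an increasing utility u on [lo, hi]."""
    target = sum(p * u(x) for x, p in zip(outcomes, probs))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


outcomes, probs = [100.0, 400.0], [0.5, 0.5]  # hypothetical prospect, mean 250
c_log = certainty_equivalent(math.log, outcomes, probs, 100.0, 400.0)
c_affine = certainty_equivalent(lambda x: 3.0 * math.log(x) + 5.0,
                                outcomes, probs, 100.0, 400.0)
print(c_log, c_affine)
```

Both utilities give the same certainty equivalent, which lies below the expected wealth of 250, as one expects for a risk averse investor.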
wealth variable $y$. Thus $E(u(\hat{y})) \ge E(u(y))$ for all feasible $y$. Let $E(\hat{y}) = M$ and
$S = \{y : y \text{ is feasible and } E(y) = M\}$. We shall now show that $\mathrm{Var}(\hat{y}) \le \mathrm{Var}(y)$ for
all $y \in S$. If possible, let there exist $y \in S$ such that $\mathrm{Var}(y) < \mathrm{Var}(\hat{y})$. Then as
$y \in S$ it is feasible and $E(y) = M$. Further, for $b > 0$,
\[
aM - \frac{1}{2} bM^2 - \frac{1}{2} b\, \mathrm{Var}(\hat{y}) < aM - \frac{1}{2} bM^2 - \frac{1}{2} b\, \mathrm{Var}(y). \tag{6.57}
\]
But then (6.57) contradicts that $\hat{y}$ is optimal. Therefore $\mathrm{Var}(\hat{y}) \le \mathrm{Var}(y)$ for
all $y \in S$, i.e. $\hat{y}$ has the minimum variance with respect to all feasible $y$'s with
$E(y) = M$. Hence $\hat{y}$ corresponds to a mean-variance efficient point. Different
mean-variance efficient points are obtained by providing different values for the
parameters $a$ and $b$. The readers may identify $y = r^T w$, $E(y) = \mu^T w$ and $\mathrm{Var}(y) = w^T C w$
with $C$ as the variance-covariance matrix.
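The two sides of (6.57) are expected utilities written in mean-variance form. Assuming the quadratic utility $u(y) = ay - \frac{1}{2}by^2$ that (6.57) implies, the identity $E(u(y)) = aE(y) - \frac{1}{2}b(E(y))^2 - \frac{1}{2}b\,\mathrm{Var}(y)$ can be checked exactly on small discrete distributions. The Python sketch below does this; the wealth distributions, the parameter values and the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def expected_quadratic_utility(outcomes, probs, a, b):
    """E(u(y)) computed directly for u(y) = a*y - (1/2)*b*y^2."""
    return sum(p * (a * y - HALF * b * y * y) for y, p in zip(outcomes, probs))


def mean_variance_form(outcomes, probs, a, b):
    """a*M - (1/2)*b*M^2 - (1/2)*b*Var(y), with M = E(y)."""
    m = sum(p * y for y, p in zip(outcomes, probs))
    var = sum(p * (y - m) ** 2 for y, p in zip(outcomes, probs))
    return a * m - HALF * b * m * m - HALF * b * var


a, b = Fraction(1), Fraction(1, 1000)          # hypothetical utility parameters
probs = [HALF, HALF]
low_var = [90, 110]                            # mean 100, variance 100
high_var = [80, 120]                           # mean 100, variance 400

eu_low = expected_quadratic_utility(low_var, probs, a, b)
eu_high = expected_quadratic_utility(high_var, probs, a, b)
print(eu_low, eu_high)
```

With equal means, the distribution with the smaller variance has the larger expected utility, which is the step used in the contradiction argument above.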
Now we have a very natural question. What happens if we take a general utility
function u? How different are the mean-variance efficient portfolios and portfolios
that maximize the expected utility i.e. those which are the solution of the problem
(6.55) for a general $u$? In this context we have the following theorem.
Theorem 6.8.1 Let $X$ and $Y$ be two random variables, normally distributed with
means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively. Then the following are equivalent:
(i) $E(u(X)) \ge E(u(Y))$ for any $u : R \to R$ which is non-decreasing and concave.
(ii) $\mu_1 \ge \mu_2$ and $\sigma_1^2 \le \sigma_2^2$.
Proof. See details in Sriboonchitta et al. [128].
Remark 6.8.1 In view of the above theorem, if the returns are normally dis-
tributed and the utility function is concave, then the portfolio that maximizes the
expected utility is on Markowitz’s efficient frontier. Exactly which portfolio on the
efficient frontier will be obtained, will depend on the choice of utility function.
VaR and CVaR). Here we attempt to understand risk measures in a broader
mathematical framework and introduce the concept of coherent risk measures (Artzner
[5]). An excellent tutorial on this topic is due to Rockafellar [112].
Basically, we wish to answer the following question: How can we assess the risk
of a financial position mathematically? For this we note that a financial position
is captured by a random variable X, e.g. loss. Information about the loss is in its
cumulative distribution function F. In the financial context, where the risk is
the risk of losing money, we wish to define the numerical risk of X as some
appropriate number ρ(X) and call it a risk measure. It seems natural to choose
ρ(·) in such a manner that it captures our risk perception and is also suitable
for financial management purposes.
Domain and Range of Risk Measures
As stated above, we wish to assign a numerical value ρ(X) to each random variable
X to describe its risk. Here X stands for the loss of an investment portfolio, the
loss of a financial position, or the capital that an insurance company needs to hold
to avoid insolvency.
It is very clear that the range of ρ has to be R. But what about its domain?
Let U denote the real vector space of all possible real valued random variables.
Mathematically we do not need the entire vector space U as the domain of ρ;
rather, we need a particular subset of U, namely a convex cone 𝒳.
Definition 6.9.1 (Cone) A subset 𝒳 of U is called a cone if for X ∈ 𝒳 and
λ > 0, we have λX ∈ 𝒳.
Definition 6.9.2 (Convex Cone) A subset 𝒳 of U is called a convex cone if it
is a cone and also a convex set. Thus 𝒳 is a convex cone if for X, Y ∈ 𝒳 and λ > 0,
we have X + Y ∈ 𝒳 and λX ∈ 𝒳.
It is economically meaningful to take the domain of risk measures as a convex
cone of the vector space U. This is because we know that diversification should
reduce risk. But for this we need to evaluate our risk measure on an investment
portfolio of type $X = \sum_{i=1}^{k} \lambda_i Y_i$, where $\lambda_i > 0$,
$\sum_{i=1}^{k} \lambda_i = 1$ and $Y_i$ denotes the rate of return of the i-th
asset in the portfolio. Thus if the $Y_i$ are in a domain 𝒳 of U, we also
require that $\sum_{i=1}^{k} \lambda_i Y_i$ ($\lambda_i > 0$, $\sum_{i=1}^{k} \lambda_i = 1$)
should also be in 𝒳; i.e. 𝒳 is a convex cone.
Desirable properties of Risk Measures
Let 𝒳 denote the convex cone of loss random variables X. Then, as per Artzner
[5], we consider the following desirable properties for a risk measure ρ : 𝒳 → R.
Risk Modeling and Financial Risk Measures 205
(X ≤ Y) ⇒ (ρ(X) ≤ ρ(Y)).
(ii) Value-at-Risk For a loss random variable X with the cumulative distribution
function F, the value-at-risk of X with confidence level (1 − α), 0 < α < 1 is
defined as
Here the (1 − α)-quantile F⁻¹(1 − α) is the position such that P(X ≤ F⁻¹(1 − α)) ≥ 1 − α.
Thus, P(X > F⁻¹(1 − α)) ≤ α.
It can now be shown that VaR is again not a coherent risk measure because
it is not subadditive. We have already given such an example, but below we give
another example which is probably more general.
Example 6.9.1 Let X, Y be two independent and identically distributed Bernoulli
random variables with parameter p, i.e. P(X = 0) = 1 − p and P(X = 1) = p.
Let (1 − p)² < 1 − α ≤ 1 − p. Show that VaR is not subadditive in this situation.
Solution As 1 − α ≤ 1 − p = P(X = 0), we have
$F_X^{-1}(1-\alpha) = 0 = F_Y^{-1}(1-\alpha)$. But P(X + Y = 0) = (1 − p)² < 1 − α,
so $F_{X+Y}^{-1}(1-\alpha) > 0$. Therefore, for this choice of
α, the subadditive law does not hold.
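The failure of subadditivity can also be checked numerically. The following sketch assumes that VaR at confidence level 1 − α is the quantile F⁻¹(1 − α) = min{x : F(x) ≥ 1 − α}; the values p = 0.5 and 1 − α = 0.4 are illustrative choices satisfying (1 − p)² < 1 − α ≤ 1 − p, and the function name quantile is hypothetical.

```python
def quantile(pmf, level):
    """Smallest x with F(x) >= level for a discrete distribution {value: prob}."""
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= level:
            return x
    return max(pmf)


p = 0.5        # Bernoulli parameter (illustrative)
level = 0.4    # 1 - alpha, with (1 - p)^2 < 1 - alpha <= 1 - p

pmf_x = {0: 1 - p, 1: p}                                     # X (and Y)
pmf_sum = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}   # X + Y

var_x = quantile(pmf_x, level)
var_sum = quantile(pmf_sum, level)
print(var_x, var_sum)
```

Here the quantile of X + Y is strictly larger than the sum of the quantiles of X and Y, so the subadditive law fails for this choice of parameters.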
Let us recall the definition of SSD. We say that X ≽₂ Y if
\[ \int_{-\infty}^{t} (F_Y - F_X)(u)\,du \ge 0, \quad \forall\, t \in R. \]
The order 2 is called risk seeking stochastic dominance (RSSD), also called
stop-loss order, which is very popular in actuarial science.
Definition 6.9.6 (Distortion Function) A function g : [0, 1] → [0, 1] is
called a distortion function if (i) it is nondecreasing and (ii) g(0)=0, g(1)=1.
\[
\begin{array}{ll}
\text{Max} & E(R(w)) \\
\text{subject to} & R(w) \succeq_2 Y, \\
 & e^T w = 1,
\end{array}
\tag{6.60}
\]
where R(w) = rᵀw.
The problem (6.60) aims to determine those portfolios whose return dominates
the reference return under the second order stochastic dominance rule. This problem
has a continuum of constraints because of the presence of the constraint R(w) ≽₂ Y,
and as such is difficult to handle. Nevertheless, the application of stochastic
dominance in the portfolio selection problem has been an important recent contribution in
the literature. Some notable contributions in this direction are due to Dentcheva
and Ruszczynski [37], Ruszczynski and Vanderbei [116], and Fábián, Mitra and
Roman [44].
in recent years. In fact, one of the main issues of risk management is the
aggregation of individual risks. Copulas have been used to aggregate risk in [42] and
[18] and in many more references cited therein. They are often used to identify
market risk, credit risk and operational risk. Though in this book we have used
the correlation coefficient to compute the interdependence of returns of two or more
assets in a portfolio, it is equally important to inform readers about the
relatively complex but highly useful notion of a copula. It is useful because the
correlation parameter works well with normal distributions, while distributions in
financial markets are mostly skewed, and copulas are very handy for dealing with the
skewness. For this reason, they have been applied in option pricing, portfolio
value-at-risk, and interest rate derivatives, to name a few. There are various
types of copulas, such as the Frank copula, normal (Gaussian) copula, Student copula,
singular copula, and many more. For more details on how copulas are used in
risk management, one can refer to [17, 18, 29] and numerous excellent articles
on the web. We encourage you to explore this concept and its applications in
finance.
6.13 Exercises
Exercise 6.1 Consider the data of Exercise 1.1 of Chapter 1. Let there be
Rs 50,000 to be invested in these three assets. Formulate and solve the
resulting portfolio optimization problem if
(i) L1 risk measure is used.
(ii) L2 risk measure is used.
(iii) L∞ risk measure is used.
Exercise 6.2 Suppose that the portfolios are constructed using three securities
a1 , a2 , a3 with expected returns µ1 = 20%, µ2 = 15% and µ3 = 4%. Further, let
expected absolute deviations be q1 = 50, q2 = 400 and q3 = 10 respectively. Let
there be M0 = Rs 50, 000 to be invested. Then for the risk tolerance parameter
λ = 0.5, obtain the minimax rule based optimal portfolio.
Exercise 6.3 Solve the above problem if q3 = 0 and there is no change in the
other data.
Exercise 6.4 An investor has utility function u(x) = x^{1/4} for his/her salary.
He/she has a new job offer which pays Rs 80,000 with a bonus. The bonus will be
Rs 0, Rs 10,000, Rs 20,000, Rs 30,000, Rs 40,000, Rs 50,000, or Rs 60,000
each with equal probability. Find the certainty equivalent of this job offer.
Exercise 6.8 Let X be the class of profit or loss random variables with finite
variances. Verify that
and
P(Y = 8) = 1 − P(Y = 18) = 4/5.
7.1 Introduction
In our earlier chapters, we have been making frequent use of terms like random
variables and random vectors. In fact, any first course on probability theory dis-
cusses these concepts in detail. The aim of this chapter is to go beyond the notion
of a random vector, and introduce a collection of random variables parameterized
by a parameter, say time. Such a collection of random variables is essentially a
stochastic process.
Stochastic processes occur naturally in real life applications. For instance,
consider the (opening) exchange rate of Indian rupees (INR) per US dollar (USD) on
every day between August 18, 2011 and February 2, 2012, except weekends. The actual
exchange rate is shown in Fig. 7.1. As the exchange rate on any given day is
random, we can interpret this figure as a realization xn of the random variable Xn ,
where n is the day 1, 2, . . .. In order to make a guess of the exchange rate on a
future date, it is reasonable to look at the whole evolution of Xn between August 18,
2011 and February 2, 2012. Therefore, there is a need to develop a theory which
provides almost continuous information about the process considered. A
mathematical representation for describing such a phenomenon leads to the notion of
a stochastic process.
Stochastic processes play a vital role in financial mathematics, because the
proper understanding of asset dynamics is crucial for making any meaningful
financial decision. The derivation of Black-Scholes formula for option pricing il-
lustrates this point very well. The derivative pricing and interest rate modeling
can be studied in greater depth using the concept of stochastic process. In this
chapter, we aim to present a brief introduction of stochastic processes and some
of the related concepts keeping finance in view.
Fig. 7.1. Exchange Rate: Indian Rupees (INR) / US Dollar (USD) from August 18, 2011 to February
2, 2012 (horizontal axis: days)
Then P(.) is called a probability measure. The triplet (Ω, F , P) is called a proba-
bility space.
In most random experiments, it is either not feasible or difficult
to obtain the probability of an event using the definition. However, it is possible
to calculate the probability of an event and related probability distributions by
mapping the set of all possible outcomes into the real line. This leads to the definition
of a random variable.
Definition 7.2.3 (Random Variable) Let (Ω, F , P) be a given probability space.
A function X : Ω → R is called a random variable if for any x in R, X⁻¹{(−∞, x]}
belongs to F . Here
X⁻¹{(−∞, x]} = {w ∈ Ω : X(w) ∈ (−∞, x]} .
Remark 7.2.1 For a given probability space (Ω, F , P), when F is the largest σ-
field on Ω, any real-valued function defined on Ω will be a random variable.
Remark 7.2.2 Let Ω = R and U (1) = {(a, b] : −∞ < a < b < ∞}. Then the σ-field
generated by U (1) is called the Borel σ-field, and its elements are called Borel sets.
Thus Borel σ-field is the σ-field generated by the family of semi-open intervals of
type (a, b], −∞ < a < b < ∞.
Remark 7.2.3 A function f from a measurable space (Ω, F ) into (R, B) is said
to be a measurable function if for each Borel set B ∈ B, the set {w ∈ Ω : f (w) ∈
B} ∈ F . Here B is the Borel σ-field of subsets of R. From the above definition, a
random variable X is a measurable function from (Ω, F ) to (R, B).
Remark 7.2.4 As X : Ω → R, a statement of type a ≤ X ≤ b is to be understood
as {w ∈ Ω : a ≤ X(w) ≤ b}. In that case P(a ≤ X ≤ b) = P{w ∈ Ω : a ≤ X(w) ≤ b}.
In general for a Borel set B, P(X ∈ B) = P{w ∈ Ω : X(w) ∈ B}. Here it is assumed
that (Ω, F , P) is the underlying probability space.
The function which records the probabilities associated with the random vari-
able is defined as below.
Definition 7.2.4 (Cumulative Distribution Function of a Random Vari-
able) The cumulative distribution function (CDF) of a random variable X is given
as
FX (x) = P(X ≤ x) = P(w ∈ Ω : X(w) ≤ x), − ∞ < x < ∞ .
For a < b, we have
P(a < X ≤ b) = P(w ∈ Ω : a < X(w) ≤ b) = FX (b) − FX (a) .
Definitions and Simple Stochastic Processes 219
Example 7.2.3 Let Ω = {a, b, c, d} and F = {∅, {a}, {b, c, d}, Ω}. Let (Ω, F , P) be
the given probability space where P({a}) = 1/4 and P({b, c, d}) = 3/4. Define
\[ X(w) = \begin{cases} 0, & w = a \\ 1, & w = b \\ 2, & w = c \\ 3, & w = d, \end{cases} \]
and
\[ Y(w) = \begin{cases} 0, & w = a \\ 1, & w \in \{b, c, d\}. \end{cases} \]
Check whether X and Y are random variables. If so, find the corresponding CDFs.
Solution As X⁻¹{(−∞, 1]} = {a, b} ∉ F , the function X is not a random variable.
Hence the CDF of X is not defined.
Next, for Y we have
\[ Y^{-1}\{(-\infty, y]\} = \begin{cases} \emptyset, & -\infty < y < 0 \\ \{a\}, & 0 \le y < 1 \\ \Omega, & 1 \le y < \infty . \end{cases} \]
Since, for every y ∈ R, Y⁻¹{(−∞, y]} ∈ F , Y is a random variable. The CDF of Y
is given by
\[ F_Y(y) = \begin{cases} 0, & -\infty < y < 0 \\ \frac{1}{4}, & 0 \le y < 1 \\ 1, & 1 \le y < \infty . \end{cases} \]
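The measurability check in this example is mechanical on a finite sample space, so it can be sketched in a few lines. The names OMEGA, SIGMA_FIELD, preimage, is_random_variable and cdf are illustrative choices, not notation from the text; the data are those of Example 7.2.3.

```python
OMEGA = frozenset("abcd")
SIGMA_FIELD = {frozenset(), frozenset("a"), frozenset("bcd"), OMEGA}
PROB = {frozenset(): 0.0, frozenset("a"): 0.25, frozenset("bcd"): 0.75, OMEGA: 1.0}


def preimage(f, x):
    """Return the set {w in Omega : f(w) <= x}."""
    return frozenset(w for w in OMEGA if f[w] <= x)


def is_random_variable(f):
    # On a finite space the preimage only changes at the values taken by f,
    # so it is enough to test x at those values (below the minimum it is empty).
    return all(preimage(f, x) in SIGMA_FIELD for x in set(f.values()))


def cdf(f, x):
    """F(x) = P(f <= x); only meaningful when f is a random variable."""
    return PROB[preimage(f, x)]


X = {"a": 0, "b": 1, "c": 2, "d": 3}
Y = {"a": 0, "b": 1, "c": 1, "d": 1}
print(is_random_variable(X), is_random_variable(Y))
print(cdf(Y, -1), cdf(Y, 0.5), cdf(Y, 2))
```

The check fails for X because the set {a, b} is not in the σ-field, and it succeeds for Y, whose CDF takes the values 0, 1/4 and 1 on the three intervals of the solution.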
To understand uncertainty in a static system, we employ standard probability
theory. But to handle uncertainty in a dynamic system, e.g. the price of a share of
stock, we need to go beyond this and capture its evolution over time. This can be studied
via a collection of random variables, namely a stochastic process with time as index
set. Now, we define the stochastic process formally.
Definition 7.2.5 (Stochastic Process) Let (Ω, F , P) be a given probability
space. A collection of random variables {X(t), t ∈ T} defined on the probability
space (Ω, F , P) is called a stochastic process.
A stochastic process is also called a random process or a chance process.
Remark 7.2.5 Given a probability space (Ω, F , P), the stochastic process {X(t), t ∈
T} can be identified as a real-valued function X : T × Ω → R of two independent
variables t ∈ T, w ∈ Ω such that X−1 (t) {(−∞, x]} belongs to F for every x ∈ R and
for every t ∈ T. Here
In other words, if the future prediction depends only on the current state of the
stochastic process and does not depend on the past information, then it has the
Markov property.
Definition 7.2.14 (Markov Process) A given stochastic process {X(t), t ∈ T}
is said to be a Markov process if it satisfies Markov property.
A Markov process is a stochastic process with the property that, given the value
of X(s), the values of X(t), t > s, do not depend on the values of X(u), u < s, i.e.
the probability of any particular future behavior of the process, when its present
state is known exactly, is not altered by additional knowledge concerning its past
behavior. For instance, the binomial process, the Poisson process and Brownian motion
are examples of Markov processes, whereas a general Gaussian process need not be a
Markov process. All these processes will be discussed later in this chapter.
Stochastic processes can be classified according to different criteria. One of
them is the distribution of finite dimensional random variables. Based on this,
we list a few stochastic processes which are relevant to the stochastic study of financial
mathematics.
Definition 7.2.15 (Bernoulli Process) A discrete time discrete space stochas-
tic process {Xn , n = 1, 2, . . .} is called a Bernoulli process if for each n, Xn is a
Bernoulli random variable with parameter p, 0 < p < 1.
Here we may note that the Bernoulli process is a strict sense as well as a wide sense
stationary process.
Definition 7.2.16 (Binomial Process) For n = 1, 2, . . ., let Sn = X1 + X2 +
. . . + Xn , where the Xi 's are mutually independent Bernoulli distributed random variables
with parameter p, 0 < p < 1. Then {Sn , n = 1, 2, . . .} is called a binomial process.
Here we may note that {Sn , n = 1, 2, . . .} is a discrete time discrete space stochastic
process. Also each Sn is a binomially distributed random variable.
\[ S_n = S_{n-1} + X_n . \]
Hence,
\[
\begin{aligned}
P(S_n = x / S_{n-1} = x_{n-1}, \ldots, S_1 = x_1)
&= \frac{P(S_n = x, S_{n-1} = x_{n-1}, \ldots, S_1 = x_1)}{P(S_{n-1} = x_{n-1}, \ldots, S_1 = x_1)} \\
&= \frac{P(S_n = x / S_{n-1} = x_{n-1})\, P(S_{n-1} = x_{n-1} / S_{n-2} = x_{n-2}) \cdots P(S_2 = x_2 / S_1 = x_1)\, P(S_1 = x_1)}{P(S_{n-1} = x_{n-1} / S_{n-2} = x_{n-2}) \cdots P(S_2 = x_2 / S_1 = x_1)\, P(S_1 = x_1)} \\
&= \frac{P(X_n = x - x_{n-1})\, P(X_{n-1} = x_{n-1} - x_{n-2}) \cdots P(X_2 = x_2 - x_1)\, P(S_1 = x_1)}{P(X_{n-1} = x_{n-1} - x_{n-2}) \cdots P(X_2 = x_2 - x_1)\, P(S_1 = x_1)} \\
&= P(X_n = x - x_{n-1}) \\
&= P(S_n = x / S_{n-1} = x_{n-1}) .
\end{aligned}
\]
where
and
\[ Cov(X(t_i), X(t_j)) = E[(X(t_i) - E(X(t_i)))(X(t_j) - E(X(t_j)))] . \]
(ii) A Gaussian process which is weakly stationary is also strict sense stationary.
Further, in any Gaussian process, its finite dimensional random variables are
independent if and only if those random variables are uncorrelated, i.e. the
corresponding covariance matrix Σ is a diagonal matrix.
Definition 7.2.18 (Symmetric Random Walk) Consider a random exper-
iment of tossing a fair coin infinitely many times. Let the successive outcomes
be denoted as w = (w1 , w2 , w3 , . . .) e.g. w = (w1 , w2 , w3 , . . .) = (H, T, T, . . .) or
(T, H, T, . . .) etc. We now define, for j = 1, 2, . . .
\[ X_j = \begin{cases} 1, & \text{if } w_j = H \\ -1, & \text{if } w_j = T, \end{cases} \]
and
\[ P(X_j = 1) = P(w_j = H) = 0.5 ; \quad P(X_j = -1) = P(w_j = T) = 0.5 . \]
Set S0 = 0. Let
\[ S_k = \sum_{j=1}^{k} X_j, \quad (k = 1, 2, \ldots) . \]
[Figure: sample path of Sk plotted against k, for 0 ≤ k ≤ 100.]
(ii) We choose an arbitrary positive integer n and then choose non-negative inte-
gers 0 = k0 < k1 < . . . < kn . Then
\[ S_{k_{i+1}} - S_{k_i} = \sum_{j=k_i+1}^{k_{i+1}} X_j . \]
\[ S_{k_2} - S_{k_1} = \sum_{j=k_1+1}^{k_2} X_j . \]
Since the X j are i.i.d. random variables taking the values ±1 with probability 0.5
each, Sk2 − Sk1 has the same distribution as Sk2−k1 − S0 . Hence the stochastic
process {Sk , k = 0, 1, . . .}
has the stationary increment property.
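The stationary increment property can be verified exactly for small indices by enumerating all equally likely coin-toss sequences. This is only a sketch: the function name increment_distribution and the indices k1 = 2, k2 = 5 are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def increment_distribution(k1, k2):
    """Exact distribution of S_k2 - S_k1 for the symmetric random walk."""
    counts = Counter()
    for steps in product((-1, 1), repeat=k2):   # all 2**k2 equally likely paths
        # steps[j - 1] plays the role of X_j, so this sums X_{k1+1}, ..., X_{k2}
        counts[sum(steps[k1:k2])] += 1
    total = 2 ** k2
    return {value: count / total for value, count in counts.items()}


d_shifted = increment_distribution(2, 5)   # S_5 - S_2
d_origin = increment_distribution(0, 3)    # S_3 - S_0
print(sorted(d_shifted.items()))
print(sorted(d_origin.items()))
```

Both dictionaries assign the same probabilities to the values −3, −1, 1 and 3, in agreement with the claim that the increment over (k1, k2] is distributed like the walk after k2 − k1 steps.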
(iv) We have for k = 1, 2, . . .
Sk = Sk−1 + Xk .
Now,
\[ P(S_k \le x / S_{k-1} = x_{k-1}, \ldots, S_1 = x_1) = \frac{P(S_k \le x, S_{k-1} = x_{k-1}, \ldots, S_1 = x_1)}{P(S_{k-1} = x_{k-1}, \ldots, S_1 = x_1)} \]
[Figure: sample path of N(t) plotted against t, for 0 ≤ t ≤ 10.]
Example 7.2.5 Consider a financial risk model that may occur in an insurance
company. Let N(t) denote the number of claims received by time t, t ≥ 0. Thus,
N(t) counts the number of claims received in (0, t]. Suppose the process {N(t), t ≥ 0}
is a Poisson process with claim arrival rate 9 per month of 30 days. In a randomly
chosen month of 30 days,
(i) what is the probability that exactly 4 claims were received in the first
15 days?
(ii) given that exactly 4 claims were received in the first 15 days, what is the
probability that all the four claims were received in the last 7 days out of these
15 days?
Solution We have
\[ P(N(t) = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}, \quad (k = 0, 1, \ldots), \]
where λ = 9/30 per day.
(i) The required probability is
\[ P(N(15) = 4) = \frac{e^{-9/2}\, 9^4}{384} . \]
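The two probabilities asked for in Example 7.2.5 can be evaluated numerically. The sketch below uses only the Poisson p.m.f. together with the independence and stationarity of increments; the treatment of part (ii), via "no claims in the first 8 days and 4 claims in the remaining 7 days", is an approach assumed here, and poisson_pmf is a hypothetical helper name.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


rate = 9 / 30                         # claims per day

# (i) exactly 4 claims in the first 15 days
p_i = poisson_pmf(4, rate * 15)

# (ii) given N(15) = 4, all 4 claims fall in the last 7 of these 15 days:
# no claims in the first 8 days and 4 claims in the next 7 days.
p_ii = poisson_pmf(0, rate * 8) * poisson_pmf(4, rate * 7) / p_i

print(p_i, p_ii)
```

Under these assumptions the second probability reduces to (7/15)⁴, since the exponential factors cancel in the ratio.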
Brownian Motion and its Properties 229
[Figure: sample path of W(t) plotted against t, for 0 ≤ t ≤ 10.]
We know that W(t0 + ∆t) − W(t0 ) has normal distribution with mean zero and
variance ∆t, i.e. N(0, ∆t). Hence,
\[ \lim_{\Delta t \to 0} Var\left( \frac{W(t_0 + \Delta t) - W(t_0)}{\Delta t} \right) = \lim_{\Delta t \to 0} \frac{1}{(\Delta t)^2} \times \Delta t = \lim_{\Delta t \to 0} \frac{1}{\Delta t} . \]
Thus the above limit does not exist. Hence, we conclude that Brownian motion
is not differentiable at a fixed point. Nowhere differentiability
of Brownian motion requires a more careful argument than non-differentiability
at a fixed point. The precise mathematical proof is not presented in this book, but
the above argument makes the nowhere differentiability of
Brownian motion plausible (refer to Karatzas and Shreve [75] for a proof).
Remark 7.3.2 The Wiener process is not wide sense stationary. This is because
for s < t, the covariance function Cov (W(t), W(s)) is not a function of (t − s). In
fact
\[
\begin{aligned}
Cov(W(t), W(s)) &= E[(W(t) - E(W(t)))(W(s) - E(W(s)))] \\
&= E[W(t) W(s)] \\
&= E[(W(t) - W(s) + W(s)) W(s)] \\
&= E[W(t) - W(s)]\, E[W(s)] + E[(W(s))^2] \\
&= (0 \times 0) + s = s .
\end{aligned}
\]
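A quick simulation illustrates that Cov(W(t), W(s)) equals s for s < t. The sketch assumes only that W(s) is N(0, s) and that W(t) − W(s) is an independent N(0, t − s) increment; the times s = 1, t = 2, the sample size and the seed are arbitrary illustrative choices.

```python
import math
import random


def sample_covariance(s, t, n_paths=100_000, seed=0):
    """Monte Carlo estimate of Cov(W(s), W(t)) for s < t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w_s = rng.gauss(0.0, math.sqrt(s))             # W(s) ~ N(0, s)
        w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))   # add an independent increment
        total += w_s * w_t                             # both means are zero
    return total / n_paths


estimate = sample_covariance(1.0, 2.0)
print(estimate)   # should be close to s = 1, not to a function of t - s alone
```

Repeating the estimate with s = 2, t = 3 (the same gap t − s) would give a value close to 2 instead, which is the reason the process is not wide sense stationary.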
Proof. From Definition 7.2.17, a Wiener process is also a Gaussian process. Since
{W(t), t ≥ 0} is a Markov process as well as a Gaussian process, we have
Let us now consider the joint distribution of (W(t1 ), W(t2 )). We know that, W(t1 )
and (W(t2 ) − W(t1 )) are independent. Also, W(t1 ) is N(0, t1 ) and W(t2 ) − W(t1 ) is
N(0, t2 − t1 ). Therefore, the joint probability density function of (W(t1 ), W(t2 )) is
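where
\[ p(x, t) = \frac{1}{\sqrt{2\pi t}} \exp\left( -\frac{x^2}{2t} \right), \quad t > 0; \ -\infty < x < \infty . \]
Thus,
\[ f(x_1, x_2) = \frac{1}{2\pi \sqrt{t_1 (t_2 - t_1)}} \exp\left\{ -\frac{1}{2} \left[ \frac{x_1^2}{t_1} + \frac{(x_2 - x_1)^2}{t_2 - t_1} \right] \right\} . \]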
Remark 7.3.4 The above Lemma can be generalized in the sense that for 0 <
t1 < t2 < · · · < tn , X = (W(t1 ), W(t2 ), . . . , W(tn )) is jointly normally distributed
with the joint probability density function given by
\[ f_X(x) = \frac{1}{(2\pi)^{n/2} (\det(\Sigma))^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\} , \]
where
and
[Figure: sample path of X(t) plotted against t, for 0 ≤ t ≤ 10.]
We note that, BM has independent increments. Hence given X(t), the future X(t +
h) only depends on the future increment of the BM. Thus future is independent
of the past and therefore the Markov property is satisfied. Hence, {X(t), t ≥ 0} is a
Markov process.
Because a geometric Brownian motion is nonnegative, it provides a more
realistic model of stock prices. Also, under the GBM model the ratios of stock
prices over time intervals of equal length have the same (lognormal) distribution.
Therefore, the percentage change in
price, as opposed to the absolute change in price, is modeled by a GBM. Fig. 7.6
shows a sample path of geometric Brownian motion.
[Fig. 7.6: sample path of geometric Brownian motion, X(t) plotted against t for 0 ≤ t ≤ 1.]
How does geometric Brownian motion relate to stock prices? One possibility is
to think of modeling the rate of return of the stock price as a Brownian motion.
Suppose that the stock price S(t) at time t is given by
\[ S(t) = S(0)\, e^{H(t)} , \]
where S(0) is the initial price and H(t) = µt + σW(t) is a Brownian motion with
drift. In this case, H(t) represents a continuously compounded rate of return of
the stock price over the period of time [0, t]. Here H(t), the logarithmic
growth of the stock price, satisfies
\[ H(t) = \ln \frac{S(t)}{S(0)} . \]
This gives
ln(S(t)) = ln(S(0)) + H(t) .
Therefore, ln(S(t)) has a normal distribution with mean µt + ln(S(0)) and variance
σ²t. As we have seen, if a random variable X has the property that ln X has a normal
distribution, then the random variable X is said to have a lognormal distribution.
Accordingly, S(t)/S(0) is a lognormally distributed random variable.
Example 7.4.1 Suppose that the stock price S(t) at time t is given by S(t) =
S(0) e^{H(t)} , where S(0) is the initial price and H(t) = µt + σW(t) is a Brownian
motion with drift µ and volatility σ. Prove that
(i)
\[ E(S(t)) = S(0) \exp\left\{ \left( \mu + \frac{\sigma^2}{2} \right) t \right\} . \]
(ii)
\[ Var(S(t)) = S(0)^2 \exp\left\{ 2 \left( \mu + \frac{\sigma^2}{2} \right) t \right\} \left( \exp\{ \sigma^2 t \} - 1 \right) . \]
(ii)
\[ Var(S(t)) = S(0)^2\, e^{2 (\mu + \sigma^2/2) t} \left( e^{\sigma^2 t} - 1 \right) . \]
Here we observe that the expected stock price depends not only on the drift µ of
H(t) but also on the volatility σ. Further, the expected price grows
like a fixed-income security with continuously compounded interest rate µ + σ²/2. In a
real scenario, the actual fixed-income interest rate r is much lower than this rate,
which is why one invests in stocks. But the stock has variability due to the randomness of the
underlying Brownian motion, and hence a risk is involved here.
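The closed-form mean and variance of S(t) = S(0) e^{µt + σW(t)} can be cross-checked by integrating numerically against the standard normal density. The sketch below is an illustration only: the parameter values (S(0) = 40, µ = 0.12, σ = 0.24, t = 4) and the helper name lognormal_moment are assumptions, and the trapezoidal rule on [−12, 12] is just one convenient quadrature.

```python
import math


def lognormal_moment(power, s0, mu, sigma, t, n=20_000, z_max=12.0):
    """E[S(t)**power] for S(t) = s0*exp(mu*t + sigma*sqrt(t)*Z), Z ~ N(0, 1),
    computed with the trapezoidal rule in z on [-z_max, z_max]."""
    h = 2 * z_max / n
    total = 0.0
    for i in range(n + 1):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        value = (s0 * math.exp(mu * t + sigma * math.sqrt(t) * z)) ** power
        total += weight * value * density
    return total * h


s0, mu, sigma, t = 40.0, 0.12, 0.24, 4.0   # illustrative values
mean_num = lognormal_moment(1, s0, mu, sigma, t)
var_num = lognormal_moment(2, s0, mu, sigma, t) - mean_num ** 2

mean_formula = s0 * math.exp((mu + sigma ** 2 / 2) * t)
var_formula = (s0 ** 2 * math.exp(2 * (mu + sigma ** 2 / 2) * t)
               * (math.exp(sigma ** 2 * t) - 1))
print(mean_num, mean_formula)
print(var_num, var_formula)
```

The numerical and closed-form values agree to many digits, which supports the formulas E(S(t)) = S(0) e^{(µ + σ²/2)t} and Var(S(t)) = S(0)² e^{2(µ + σ²/2)t}(e^{σ²t} − 1).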
Example 7.4.2 Suppose that the stock price {S(t), t ≥ 0} follows geometric Brownian
motion with drift µ = 0.12 per year and volatility σ = 0.24 per annum. Assume
that the current price of the stock is S(0) = Rs 40. What is the probability that
a European call option with four years to expiry and a strike price
K = Rs 42 will be exercised?
Solution We have
\[ P(S(4) > 42) = P\left( \frac{S(4)}{40} > \frac{42}{40} \right) = P\left( \ln \frac{S(4)}{40} > \ln \frac{42}{40} \right) . \]
Since ln(S(4)/40) follows the normal distribution with mean 0.48 and variance (0.48)²,
we get
\[
\begin{aligned}
P\left( \ln \frac{S(4)}{40} > \ln \frac{42}{40} \right)
&= P\left( \frac{\ln \frac{S(4)}{40} - 0.48}{0.48} > \frac{\ln \frac{42}{40} - 0.48}{0.48} \right) \\
&= 1 - \Phi\left( \frac{\ln \frac{42}{40} - 0.48}{0.48} \right) \\
&= 1 - \Phi(-0.8984) \\
&= \Phi(0.8984) \approx 0.8155 ,
\end{aligned}
\]
where
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy . \]
Definition 7.4.4 (Ornstein-Uhlenbeck Process) Let {W(t), t ≥ 0} be a
Wiener process. Define
where a and b are strictly positive real numbers and X(0) is independent of W(t).
Then, we say {X(t), t ≥ 0} is an Ornstein-Uhlenbeck process.
We note that, {X(t), t ≥ 0} is a Markov process. It is also a Gaussian process if
X(0) = x(0) is fixed or X(0) is Gaussian.
In the “real” world, we observe that asset price processes have jumps or spikes
and risk-managers have to take them into account. We need processes that can
describe the observed reality of financial markets in a more accurate way than
models based on Brownian motion. Levy processes provide us with the appropriate
framework to model both in the “real” and in the “risk-neutral” world.
Definition 7.4.5 (Levy Process) A stochastic process {X(t), t ≥ 0} is said to
be a Levy process if it satisfies the following properties
(i) X(0) = 0,
(ii) for all n and for 0 ≤ t0 < t1 < t2 < . . . < tn , increments X(ti ) − X(ti−1 ), i =
1, 2, . . . , n, are independent and stationary,
(iii) for a > 0, P(| X(t) − X(s) |> a) → 0 when t → s.
Remark 7.4.2 Let b be a constant and X(t) = bt. Then {X(t), t ≥ 0} is a Levy
process.
Remark 7.4.3 A Wiener process {W(t), t ≥ 0} defined in Definition 7.3.1, is a
Levy process in R that has continuous paths and has the Gaussian distribution
with mean zero and variance ∆t for its increments W(t + ∆t) − W(t). The most
general continuous Levy process in R has the form X(t) = bt + cW(t), t ≥ 0, where
b and c are real constants.
Remark 7.4.4 A Poisson process {N(t), t ≥ 0} with parameter λ defined in Def-
inition 7.2.19 is a Levy process that is a counting process having the Poisson
distribution with mean λ∆t for its increments N(t + ∆t) − N(t).
The above three processes namely deterministic process X(t) = bt, Wiener
process and Poisson process are Levy processes. It turns out that all Levy processes
can be built up out of these building blocks. We may refer to Applebaum [4] for
further reading in this regard.
7.6 Exercises
Exercise 7.1 Consider the binomial model for trading in stock, t = 1, 2, where at
each time the stock can go up by the factor u or down by the factor d. The sample
space Ω = {(u, u), (u, d), (d, u), (d, d)}. Create one non-trivial σ-field and the largest
σ-field on Ω.
Exercise 7.2 Construct an example of σ-fields F₁ and F₂ such that F₁ ∪
F₂ is not a σ-field.
Exercise 7.3 Prove that there does not exist a σ-field which contains exactly 6
elements.
Exercise 7.4 Consider Ω = {1, 2, 3, 4}. Let σ-algebra F = {∅, {1}, {2, 3, 4}, Ω}.
Construct a random variable on the measurable space (Ω, F ).
Exercise 7.5 Consider a random experiment of tossing an unbiased coin three
times. Let Ω denote the set of all possible outcomes. Let the random variable X be
the number of heads observed. Find the cumulative distribution function F_X(x).
Exercise 7.6 Let Xn , for n even, take values +1 and −1 each with probability 0.5,
and for n odd, take values √a and −1/√a with probability 1/(a + 1) and a/(a + 1)
respectively (a > 0, a ≠ 1). Further, let the Xn 's be independent. Show that the
stochastic process {Xn , n ≥ 1}
is wide sense stationary but not strict sense stationary.
Exercise 7.7 Let {X(t), t ≥ 0} be a stochastic process with independent increments
and X(0) = 0. Show that Cov (X(s), X(t)) = Var (X(Min(s, t))), for any s, t (t, s > 0) .
Exercise 7.8 Let X and Y be i.i.d. random variables each having uniform distri-
bution on the interval (−π, π). Let Z(t) = cos(tX + Y), t ≥ 0. Is {Z(t), t ≥ 0} wide
sense stationary process?
Exercise 7.9 Consider an urn containing 100 red balls and 100 black balls. Balls
are drawn one by one without replacement. Let Xn be the number of red balls
remaining in the urn after the nth ball is drawn. Is {Xn , n = 0, 1, . . .} a Markov
process?
Exercise 7.10 Let Yn = a0 Xn + a1 Xn−1 ; n = 1, 2, . . . where a0 , a1 are constants and
Xn , (n = 0, 1, . . .) are i.i.d random variables with mean 0 and variance σ2 . Is
{Yn , n = 1, 2, . . .} a Markov process?
Exercise 7.11 Let {N(t), t ≥ 0} be a Poisson process with parameter λ. Suppose
N(t) denotes the number of events that occur in the interval [0, t]. Prove that the
inter-arrival times of successive events are independent and exponentially distributed
with parameter λ.
Exercise 7.12 Let {W(t), t ≥ 0} be the Brownian motion. Prove that, E((W(t) −
W(s))4 ) = 3(t − s)2 .
Exercise 7.13 Let {W(t), t ≥ 0} be the Brownian motion. Prove that
(W(t1 ), W(t2 ), . . . , W(tn )) is jointly normally distributed, with the CDF for
0 < t1 < t2 < · · · < tn given by
\[
P(W(t_1) \le a_1, W(t_2) \le a_2, \ldots, W(t_n) \le a_n) = \frac{1}{\sqrt{(2\pi)^n\, t_1 (t_2 - t_1) \cdots (t_n - t_{n-1})}} \times
\int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} \exp\left\{ -\frac{1}{2} \left[ \frac{x_1^2}{t_1} + \frac{(x_2 - x_1)^2}{t_2 - t_1} + \cdots + \frac{(x_n - x_{n-1})^2}{t_n - t_{n-1}} \right] \right\} dx_1 \cdots dx_n .
\]
Exercise 7.14 Let X be a normally distributed random variable with mean µ and
variance σ². Let u be a fixed number in R and define the convex function φ(x) = e^{ux}
for all x ∈ R. Prove that
(i) $E(\varphi(X)) = e^{u\mu + \frac{1}{2} u^2 \sigma^2}$.
(ii) Jensen's inequality E(φ(X)) ≥ φ(E(X)) holds.
Exercise 7.15 Let {W(t), t ≥ 0} be a Wiener process. Find the conditional distri-
bution of W(t) given that W(s) = c (where c is a constant) when s < t.
Exercise 7.16 Let {W(t), t ≥ 0} be a Wiener process. Find the conditional distri-
bution of W(s/2) given that W(s) = x.
Exercise 7.17 Show that for any T > 0, V(t) = W(t + T) − W(T) is a Wiener
process if W(t) is a Wiener process.
Exercise 7.18 Let {W(t), t ≥ 0} be a Brownian motion. Prove that {tW(1/t), t ≥ 0}
where tW(1/t) is taken to be zero when t = 0, is a Brownian motion.
Exercise 7.19 Let {W(t), t ≥ 0} be a Wiener process. Prove that distribution of
{W(t), 0 ≤ t ≤ 1} and {W(1) − W(1 − t), 0 ≤ t ≤ 1} are the same.
Exercise 7.20 Consider the process
\[ Y(t) = \begin{cases} W(t), & t < 1 \\ W(t) + Z, & t \ge 1 \end{cases} \]
where W(t) is a Wiener process and Z ∼ N(0, 1) is independent of W(t). Show
that {Y(t), t ≥ 0} is not a Levy process.
8
Filtration and Martingale
8.1 Introduction
Conditional expectation is an extremely important concept in probability theory.
Traditionally in a typical probability course, the well known gambler’s ruin prob-
lem is presented as a motivational example. But now the concept of conditional
expectation has found much favor in financial mathematics because it provides
the basis for two of the most important concepts namely, the filtration and the
martingale.
In gambler’s ruin problem, a gambler starts with an amount of Rs N. Then an
unbiased coin is flipped, landing head with probability 0.5 and landing tail with
probability 0.5. If the coin lands head, he/she gains Rs 1, otherwise he/she loses
Rs 1. The game continues until he/she loses all his/her amount of Rs N. Let Xn
be the fortune at the nth game, and Sn be his/her capital after n games. An inter-
esting question here is to know his/her fortune, on an average, on the next game
given his/her current fortune. To answer this question, we need the conditional
expectation of the random variables {Xn , n = 0, 1, . . .} given the information up to
r, r < n.
We next present another example which is more relevant to us. Let X(t) denote
the share price of a particular stock at time t. Assume that, we have the informa-
tion about the share price up to time s > 0. We may like to know the expected or
average share price of the stock at a future time, say, s + 5 given that X(s) = 10
(say). To make such statements precise and also to answer them, we need the
conditional expectation of random variables {X(t), t ≥ 0} given the information up
to time s, s < t.
The above two examples certainly make a case for the study of conditional
expectation. But for a better understanding and deeper study of various other
We say that the expectation E(X) exists provided E(X) < ∞. Obviously not all
random variables have expectation. For example, if the discrete random variable
X takes the values 2ⁿ, n = 1, 2, . . ., with probability mass function (p.m.f.)
P(X = 2ⁿ) = 2⁻ⁿ, (n = 1, 2, . . .) ,
then E(X) does not exist. In a similar manner, if the continuous random variable
X has the p.d.f.
\[ f(x) = \frac{1}{\pi (1 + x^2)}, \quad -\infty < x < \infty , \]
then E(X) does not exist.
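Both claims can be checked directly from the definitions of expectation for discrete and continuous random variables. The short verification below is a sketch added for illustration; for the continuous case it tests absolute integrability, which is the usual requirement for E(X) to exist.

```latex
\[
E(X) = \sum_{n=1}^{\infty} 2^{n}\, P(X = 2^{n})
     = \sum_{n=1}^{\infty} 2^{n} \cdot 2^{-n}
     = \sum_{n=1}^{\infty} 1 = \infty ,
\]
\[
\int_{-\infty}^{\infty} \frac{|x|}{\pi (1 + x^{2})}\, dx
     = \frac{2}{\pi} \int_{0}^{\infty} \frac{x}{1 + x^{2}}\, dx
     = \frac{1}{\pi} \lim_{b \to \infty} \ln (1 + b^{2}) = \infty .
\]
```

In the first case every term of the series equals 1, and in the second the integrand decays only like 1/x, so neither the sum nor the integral converges.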
Now, we define the conditional expectation and present some of its properties.
Definition 8.2.1 (Conditional Expectation of a Random Variable) For
a discrete random variable X and a discrete random variable Y, the conditional
expectation of X given the event Y = y is defined as
\[ E(X/Y = y) = \sum_{x} x\, P(X = x / Y = y) . \]
Remark 8.2.1 Looking at the expression of E(X/Y = y), it is simple to note that
it is a function of y. If we write this function of y as f (y) = E(X/Y = y), then
we write E(X/Y) for f (Y). This is a discrete random variable. It is called the
conditional expectation of X given Y and is denoted by E(X/Y).
Remark 8.2.2 Definition 8.2.1 of conditional expectation can be extended in an
obvious manner even when X is continuous random variable. We note that Y is
still a discrete random variable. Therefore if we define
\[ 1_B(w) = \begin{cases} 1, & w \in B \\ 0, & w \notin B, \end{cases} \]
denotes the indicator function of the set B.
Example 8.2.1 Let X denote the outcome of tossing of an unbiased die. Let Y
be a random variable which takes the value +1 if the outcome is an even number
and it takes the value −1 if the outcome is an odd number.
(i) Find E(X/Y = 1) and E(X/Y = −1).
(ii) Let B = {1, 2, 3}. Find E(X/B).
Solution Here Ω = {1, 2, 3, 4, 5, 6} and P(X = i) = 1/6 for i = 1, 2, 3, 4, 5, 6.
(i) We have A1 = {w ∈ Ω : Y(w) = 1} = {2, 4, 6} and A2 = {w ∈ Ω : Y(w) = −1} =
{1, 3, 5}. Hence
\[ E(X/Y = 1) = E(X/A_1) = \frac{E(X 1_{A_1})}{P(A_1)}. \]
But
\[ E(X 1_{A_1}) = 2 \times \frac{1}{6} + 4 \times \frac{1}{6} + 6 \times \frac{1}{6}, \]
and
\[ P(A_1) = P(\{2, 4, 6\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6}. \]
Hence,
\[ E(X/Y = 1) = \frac{E(X 1_{A_1})}{P(A_1)} = 4. \]
Similarly,
\[ E(X/Y = -1) = \frac{E(X 1_{A_2})}{P(A_2)} = 3. \]
(ii) We have
\[ E(X/B) = \frac{E(X 1_B)}{P(B)} = 2. \]
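The three values obtained in Example 8.2.1 can be checked mechanically from the formula E(X/A) = E(X 1_A)/P(A). The following is a minimal Python sketch; the helper name cond_expectation is an illustrative choice and not part of the text.

```python
from fractions import Fraction

# Sample space of an unbiased die; each outcome has probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def cond_expectation(x, event):
    """E(X/A) = E(X 1_A) / P(A) for an event A with P(A) > 0."""
    p_event = sum(prob[w] for w in event)
    return sum(x(w) * prob[w] for w in event) / p_event

X = lambda w: w                          # outcome of the toss
Y = lambda w: 1 if w % 2 == 0 else -1    # +1 for even, -1 for odd

A1 = [w for w in omega if Y(w) == 1]     # {2, 4, 6}
A2 = [w for w in omega if Y(w) == -1]    # {1, 3, 5}
B = [1, 2, 3]

e_even = cond_expectation(X, A1)
e_odd = cond_expectation(X, A2)
e_B = cond_expectation(X, B)
print(e_even, e_odd, e_B)
```

Exact fractions are used so that the results can be compared with the values 4, 3 and 2 without rounding error.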
Properties of Conditional Expectation
Using the above definition of conditional expectation we present the following
properties for E(X/Y). We do not prove these results here and refer the reader to
Mikosch [97] for the proofs.
and
A2 = {w ∈ Ω : X(w) = 1.5} = {(d, u), (d, d)} .
Hence to determine the required σ-field we need to consider the family U =
{A1 , A2 } of subsets of Ω. Thus
Then the σ-field generated by the random variable X is defined as the σ-field
generated by the family U. In this case,
σ(X) = σ(U)
= {∅, {(u, u), (u, d)}, {(d, u), (d, d)}, Ω} .
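On a finite Ω, the σ-field generated by a family U can be obtained by repeatedly adding complements and unions until nothing new appears. The sketch below illustrates this; the function name generated_sigma_field and the string labels "uu", "ud", "du", "dd" for the outcomes are assumptions made for the illustration.

```python
from itertools import combinations

def generated_sigma_field(omega, family):
    """Smallest sigma-field on a finite omega containing every set in family."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(a) for a in family}
    while True:
        # Close the current collection under complements and pairwise unions.
        new = {omega - a for a in sets} | {a | b for a, b in combinations(sets, 2)}
        if new <= sets:
            return sets
        sets |= new

omega = {"uu", "ud", "du", "dd"}
A1 = {"uu", "ud"}
A2 = {"du", "dd"}

sigma_X = generated_sigma_field(omega, [A1, A2])
# Adding all singletons to the family yields every subset of omega.
sigma_all = generated_sigma_field(omega, [A1, A2, {"uu"}, {"ud"}, {"du"}, {"dd"}])
print(len(sigma_X), len(sigma_all))
```

For a finite sample space, closure under complements and finite unions is enough, since countable unions reduce to finite ones.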
Definition 8.3.4 (σ-Field Generated by a Random Vector)
Let Y = (Y1 , . . . , Yn ) be an n-dimensional random vector. Then the σ-field gener-
ated by the random vector Y = (Y1 , Y2 , . . . , Yn ) is the smallest σ-field containing
all n-dimensional Borel sets, i.e. sets of the form
and
\[ X_2(w) = \begin{cases} 0.25, & w = (u, u) \\ 0.5, & w = (u, d) \\ 0.75, & w = (d, u) \\ 1.00, & w = (d, d). \end{cases} \]
Find the σ-field generated by the random vector X = (X1 , X2 ).
Solution We have
and
A2 = {w ∈ Ω : X1 (w) = 1.5} = {(d, u), (d, d)} .
Similarly, we have
and
B4 = {w ∈ Ω : X2 (w) = 1.00} = {(d, d)} .
Hence σ-field generated by X = (X1 , X2 ) is the σ-field generated by the family U
where
U = {A1 , A2 , B1 , B2 , B3 , B4 } .
Therefore the desired σ-field is
for any n-dimensional Borel set C, and for any choice of ti ∈ [0, t], i ≥ 1 is called
the σ-field generated by the Brownian motion W.
Remark 8.3.4 For a random variable, a random vector or a stochastic process Y
on Ω, the σ-field σ(Y) generated by Y contains all the essential information about
the structure of Y as a function of w ∈ Ω. It consists of all subsets {w : Y(w) ∈ C}
for all suitable sets C. In general, C has to be any n-dimensional Borel set (n ≥ 1),
n being equal to 1 for the case of a random variable. In this situation we agree to
the terminology that Y contains information represented by σ(Y) or Y carries the
information σ(Y).
8.4 Filtration
Filtration is important in financial mathematics because it allows us to model the
flow of information. Of course, the information increases as time goes by. Before
we give the formal definition of a filtration, we consider the following situation.
Let us consider the random experiment of tossing a coin three times. Here the
set of all possible outcomes is
Now, after the coin is tossed three times, every subset of Ω is resolved. Hence
F3 is the family of all 256 subsets of Ω, that is, F3 is the total σ-field. Since Fi
is the set of events that have been decided by the end of the ith toss, we have
more information at the end of the (i + 1)th toss than at the end of the ith toss.
Taking motivation from the above example we give the definition of filtration
in discrete time.
Definition 8.4.1 (Filtration in a Discrete Time) Let Ω be the set of all
possible outcomes of a random experiment and F0 = {∅, Ω}. Then a filtration in
discrete time is an increasing sequence F0 ⊂ F1 ⊂ . . . of σ-fields, one per time
instant.
The σ-field Fn may be thought of as the events of which the occurrence is de-
termined at or before time n, the “known events” at time n. The interpretation
of F0 = {∅, Ω} is that, at the beginning, one has no information. Further the
interpretation of F0 ⊂ F1 ⊂ · · · is that the state of information increases over
time.
Example 8.4.1 Consider Ω = {a, b, c, d}. Construct σ-fields Fi , (i = 0, 1, 2), such
that F0 ⊂ F1 ⊂ F2 .
Solution Obviously F0 = {∅, Ω}. Let F1 = {∅, {a, b}, {c, d}, Ω} and F2 = {∅, {a},
{b}, {c}, {d}, {a, b}, {c, d}, {a, d}, {a, c}, {b, c}, {b, d}, {a, b, c}, {c, d, a}, {d, a, b}, {b, c, d}, Ω}.
Then, F0 ⊂ F1 ⊂ F2 .
Definition 8.4.2 (Filtration in a Continuous Time) Let Ω be the set of all
possible outcomes of a random experiment. Let T be a fixed positive number and
assume that for each t ∈ [0, T], there is a σ-field Ft . Assume further that, if s ≤ t,
then every set in Fs is also in Ft . Then, the collection of σ-fields {Ft , 0 ≤ t ≤ T}
is called a filtration in continuous time.
Thus a collection of σ-fields {Ft , t ≥ 0} is called a filtration in continuous time if
Fs ⊂ Ft for all 0 ≤ s ≤ t.
Remark 8.4.1 Filtration is used to model the flow of information over time. As
an example, we think of Xt as the price of some asset at time t and Ft as the
information obtained by watching all the prices in the market up to time t.
Definition 8.4.3 (Natural Filtration) The natural filtration of a discrete time
stochastic process {X0 , X1 , . . .} is defined by the filtration {Fn , n = 0, 1, . . .} where
Fn is the σ-field generated by the random vector (X0 , X1 , . . . , Xn ).
Thus Fn contains all events that depend on the first (n + 1) elements of the
stochastic process. It gives the “history” of the process up to time n. A convenient
notation to describe a σ-field corresponding to observing a random vector X is
σ(X). Thus σ(X), the σ-field generated by X, consists of all events that can be
expressed in terms of X: events of the type {X ∈ C}, C being an (n + 1)-dimensional
Borel set. In this notation, the natural filtration of a discrete time stochastic process
{X0, X1, . . .} can be written as Fn = σ(X0, X1, . . . , Xn).
The natural filtration of a continuous time stochastic process {X(t), t ≥ 0} is
defined by the filtration {Ft , t ≥ 0} where Ft = σ(X(s), s ≤ t).
Definition 8.4.4 (Adapted Process) We say that a discrete time stochastic
process {X0, X1, . . .} is adapted to a given filtration {Fn, n = 0, 1, . . .} if the σ-field
generated by Xn is a subset of Fn, that is, σ(Xn) ⊂ Fn, for every n. In a similar
manner, a continuous time stochastic process {X(t), t ≥ 0} is said to be adapted to
a given filtration {Ft, t ≥ 0} if σ(X(t)) ⊂ Ft for all t ≥ 0.
Thus the events connected to an adapted process up to time n are known at time
n. For instance, suppose Sn is the price of a stock at the end of the nth day. Then the
price process {Sn, n = 0, 1, 2, . . .} is adapted to the natural filtration {Fn, n = 0, 1, 2, . . .}
where Fn is the history up to the end of the nth day, and F0 = {∅, Ω}, Ω = [0, ∞). In
a similar manner, the continuous time stochastic process {S(t), t ≥ 0} is adapted to
{Ft, t ≥ 0} where Ft is the history up to time t.
Remark 8.4.2 The natural filtration corresponding to a process is the smallest
filtration to which it is adapted. If the process {Y0, Y1, . . .} is adapted to the natural
filtration of a stochastic process {X0, X1, . . .} then for each n the variable Yn is a
function of (X0, X1, . . . , Xn), that is, of the sample path of the process X up to time n.
Example 8.4.2 Consider Example 8.4.1. Define
\[ X_0(w) = 1, \quad w \in \{a, b, c, d\}, \]
\[ X_1(w) = \begin{cases} 1, & w \in \{a, b\} \\ -1, & w \in \{c, d\}, \end{cases} \]
and
\[ X_2(w) = \begin{cases} 1, & w = a \\ 2, & w = b \\ 3, & w = c \\ 4, & w = d. \end{cases} \]
Verify that {Xi , i = 0, 1, 2} is an adapted process to Fi , (i = 0, 1, 2).
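One way to carry out the verification asked for in Example 8.4.2 is to compute σ(Xi) from the level sets of Xi and test the inclusion σ(Xi) ⊂ Fi of Definition 8.4.4. The following Python sketch does this on the four-point sample space; the helper sigma_of is a name introduced here for illustration only.

```python
from itertools import combinations

omega = ["a", "b", "c", "d"]

def sigma_of(omega, x):
    """sigma-field generated by x on a finite omega: all unions of level sets of x."""
    levels = {}
    for w in omega:
        levels.setdefault(x(w), set()).add(w)
    sets = {frozenset()}
    for atom in levels.values():
        sets |= {s | frozenset(atom) for s in sets}
    return sets

# The filtration of Example 8.4.1.
F0 = {frozenset(), frozenset(omega)}
F1 = F0 | {frozenset("ab"), frozenset("cd")}
F2 = {frozenset(c) for r in range(5) for c in combinations(omega, r)}

X0 = lambda w: 1
X1 = lambda w: 1 if w in "ab" else -1
X2 = lambda w: {"a": 1, "b": 2, "c": 3, "d": 4}[w]

adapted = [sigma_of(omega, X) <= F for X, F in [(X0, F0), (X1, F1), (X2, F2)]]
x2_known_at_time_1 = sigma_of(omega, X2) <= F1
print(adapted, x2_known_at_time_1)
```

The last check shows, for contrast, that X2 is not measurable with respect to F1: its value is not yet determined by the information available at time 1.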
From Definition 8.5.1, we conclude that when G is the σ-field generated by Y, then
we write E(X/G) for the random variable E(X/Y). Thus E(X/G) is the expected
value of X given the information G. This result is valid for the continuous random
variable also.
Example 8.5.1 Consider a binomial model with t = 1, 2. Let St be the stock price
at time t. Let Ω = {(u, u), (u, d), (d, u), (d, d)}. Let the σ-field F be the power set of Ω
and P(w) = 1/4 for all w ∈ Ω. Let G = {∅, {(u, u), (u, d)}, {(d, u), (d, d)}, Ω}. Define a
discrete random variable X as
\[ X(w) = \begin{cases} 0.25, & w = (u, u) \\ 0.5, & w = (u, d) \\ 0.75, & w = (d, u) \\ 1.00, & w = (d, d). \end{cases} \]
Find E(X/G).
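Solution Define
\[ Z(w) = \begin{cases} \frac{3}{8}, & w \in \{(u, u), (u, d)\} \\ \frac{7}{8}, & w \in \{(d, u), (d, d)\}. \end{cases} \]
Then we have
\[ Z^{-1}\{(-\infty, z]\} = \begin{cases} \emptyset, & -\infty < z < \frac{3}{8} \\ \{(u, u), (u, d)\}, & \frac{3}{8} \le z < \frac{7}{8} \\ \Omega, & \frac{7}{8} \le z < \infty. \end{cases} \]
Hence, Z is G-measurable.
For A1 = {(u, u), (u, d)}, we have
\[ E(X 1_{A_1}) = 0.25 \times \frac{1}{4} + 0.50 \times \frac{1}{4} = \frac{3}{16}, \]
and for A2 = {(d, u), (d, d)}, we have
\[ E(X 1_{A_2}) = 0.75 \times \frac{1}{4} + 1.00 \times \frac{1}{4} = \frac{7}{16}. \]
Now, for A1 = {(u, u), (u, d)}, we have
\[ E(Z 1_{A_1}) = \frac{3}{8} \times \left(\frac{1}{4} + \frac{1}{4}\right) = \frac{3}{16}, \]
and for A2 = {(d, u), (d, d)}, we have
\[ E(Z 1_{A_2}) = \frac{7}{8} \times \left(\frac{1}{4} + \frac{1}{4}\right) = \frac{7}{16}. \]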
By Definition 8.5.1, the random variable Z is the conditional expectation of X
given the σ-field G.
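When G is generated by a finite partition, as in Example 8.5.1, E(X/G) is simply the probability-weighted average of X over each block of the partition. The following Python sketch reproduces the random variable Z under that assumption; the function name cond_exp_given_partition and the outcome labels are illustrative.

```python
from fractions import Fraction

omega = ["uu", "ud", "du", "dd"]
P = {w: Fraction(1, 4) for w in omega}
X = {"uu": Fraction(1, 4), "ud": Fraction(1, 2),
     "du": Fraction(3, 4), "dd": Fraction(1)}
partition = [["uu", "ud"], ["du", "dd"]]   # the blocks A1 and A2 generating G

def cond_exp_given_partition(X, P, partition):
    """E(X/G) for G generated by a finite partition with positive block probabilities."""
    Z = {}
    for block in partition:
        p_block = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / p_block
        for w in block:
            Z[w] = average      # Z is constant on each block, hence G-measurable
    return Z

Z = cond_exp_given_partition(X, P, partition)
# Defining property: E(Z 1_A) = E(X 1_A) on each generating block A.
matches = all(sum(Z[w] * P[w] for w in block) == sum(X[w] * P[w] for w in block)
              for block in partition)
print(Z, matches)
```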
Result 8.5.1 Let X and Y be two integrable random variables and a and b be two
real numbers, and G be the sub-σ-field of F . Then,
(i) E((aX + bY)/G) = aE(X/G) + bE(Y/G),
(ii) If X and G are independent, then E(X/G) = E(X). It means that we do not
gain any information about X, if we know G and vice versa.
(iii) If σ-field σ(X) ⊂ G, then E(X/G) = X. This means that the information con-
tained in G provides us with the whole information about the random variable
X. Hence, X can be treated as a constant.
(iv) If X is G-measurable, then E(XY/G) = XE(Y/G). Given G, we can deal with
X as if it is a constant, hence we can pull X(w) out of the updated expectation
and write it in front of E(Y/G).
(v) If G1 and G2 are two sub σ-fields of F with G1 ⊆ G2 , then E(E(X/G1 )/G2 ) =
E(E(X/G2 )/G1 ) = E(X/G1 ).
For the proofs of the above results we refer the reader to Mikosch [97].
Example 8.5.2 Suppose G = {∅, Ω}. Find E(X/G).
Solution The given σ-field {∅, Ω} is the trivial σ-field containing no information.
The only random variables which are measurable with respect to the trivial σ-field
are constants. Hence, E(X/G) = E(X) = c, where c is a constant.
Example 8.5.3 Consider a discrete random variable X which takes the values $-\sqrt{b}$
and $\sqrt{b}$ with probabilities 1/3 and 2/3 respectively. Find $E(X/\sigma(X^2))$. Here $\sigma(X^2)$ is
the σ-field generated by the random variable $X^2$.
Solution We have
\[ P(X = \sqrt{b}/X^2 = b) = \frac{2}{3} \quad \text{and} \quad P(X = -\sqrt{b}/X^2 = b) = \frac{1}{3}. \]
Hence,
\[ E(X/\sigma(X^2)) = \frac{2\sqrt{b}}{3} - \frac{\sqrt{b}}{3} = \frac{\sqrt{b}}{3}. \]
8.6 Martingales
The theory of martingales plays a very important and useful role in the study of
financial mathematics. A formal definition is given below.
Definition 8.6.1 (Discrete Time Martingale) Let (Ω, F , P) be a probability
space. Let {Xn , n = 0, 1, . . .} be a stochastic process and {Fn , n = 0, 1, . . .} be the
filtration. The stochastic process {Xn , n = 0, 1, . . .} is said to be a martingale cor-
responding to the filtration {Fn , n = 0, 1, . . .} if it satisfies the following conditions
(i) For every n, E(Xn ) exists.
(ii) Each Xn is Fn -measurable.
(iii) For every n, E(Xn+1 /Fn ) = Xn .
Remark 8.6.1 The definition of martingale depends on the collection of σ-fields
Fn . For clarity, one can say that (Xn , Fn ) is a martingale.
Remark 8.6.2 From the definition of martingale and using the properties of con-
ditional expectation, we observe that if {Xn } is a martingale then E(Xn+1 ) = E(Xn )
for every n. This implies that E(Xn ) = c, a constant. Therefore if, for some n > 0,
E(Xn ) < ∞ and the increments Xn+1 − Xn of the martingale {Xn } are bounded, then
E(Xn ) = E(X0 ).
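The first claim of Remark 8.6.2 can be written out in one line. This is a sketch that assumes only condition (iii) of Definition 8.6.1 and the tower property of Result 8.5.1 (v), applied with the trivial σ-field {∅, Ω} as the smaller σ-field (so that the outer conditional expectation is an ordinary expectation, as in Example 8.5.2):

```latex
\[
E(X_{n+1}) = E\bigl(E(X_{n+1}/F_n)\bigr) = E(X_n), \qquad n = 0, 1, \ldots,
\]
so that, by induction on $n$, $E(X_n) = E(X_0)$ for every $n$.
```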
Result 8.6.1 We can generate martingale sequences by the following procedure.
Given any increasing family of σ-fields {Fn }, and any integrable random variable
X on (Ω, F , P), we take Xn = E(X/Fn ) and it is easy to check that {(Xn , Fn )} is a
martingale sequence. Of course, every finite martingale sequence is generated this
way for we can always take X to be Xn , the last one.
Result 8.6.2 In equation (8.6.1), if ‘=’ is replaced by ‘≥’ then {Xn, n = 0, 1, 2, . . .}
is called a submartingale, while if it is replaced by ‘≤’ then {Xn, n = 0, 1, 2, . . .} is
called a supermartingale. Obviously, {Xn} is a supermartingale if and only if {−Xn}
is a submartingale.
Result 8.6.3 Let {Xn, n = 0, 1, . . .}, {Yn, n = 0, 1, . . .} be discrete time stochastic
processes. We say {Xn} is a martingale with respect to {Yn} if E(|Xn|) < ∞ and
E(Xn+1/Y0, Y1, . . . , Yn) = Xn. We may think of Y0, Y1, . . . , Yn as the information or
history up to stage (n+1). It may include more information than just X0, X1, . . . , Xn.
By calculating the conditional expectation of Xn+1 given the information
about Xn (or Yn) up to time n, we are making a forecast for the random variable.
The martingale relation implies that the “best” forecast for the next value of the
random variable is its current value.
Result 8.6.4 Martingale theory provides a classification scheme for the time se-
ries. If a time series exhibits no discernible trend then it has a martingale like
behavior. On the other hand, if the trend is an increasing (decreasing) one, then
the time series behaves like submartingale (supermartingale).
Example 8.6.1 Consider the gambler’s ruin problem. A gambler starts with Rs
N. Then a coin is flipped, landing heads with probability p and landing tails with
probability 1 − p. If the coin lands heads, he/she gains one rupee, otherwise he/she
loses a rupee. The game continues until he/she loses all his/her N rupees. Let Xn
be the fortune at the nth game, and Sn be his/her capital after n games. What will
be his/her fortune, on an average, on the next game given his/her current
fortune?
Solution For this we suppose that Sn is adapted to the filtration Fn. Then we can
interpret a martingale as a fair game. Thinking of (Sn − Sr) as the net winnings of the
game per unit stake in the time frame (r, n], the best prediction of the net winnings
given the information at the time r < n has value
E((Sn − Sr)/Fr) = E(Sn/Fr) − Sr .
If Sn is a martingale then E(Sn/Fr) − Sr = 0. This means that the best prediction of
the future net winnings per unit stake in the interval (r, n] is zero. This is exactly
what we expect of a fair game. It says that the gambler’s expected capital after one
more game, played with the knowledge of the entire past and present, is exactly
equal to his/her current capital.
Example 8.6.2 Let X1, X2, . . . be a sequence of i.i.d. random variables each taking
the two values +1 and −1 with equal probabilities. Let us define S0 = 0 and
\[ S_n = \sum_{j=1}^{n} X_j, \quad n = 1, 2, \ldots \]
This discrete time stochastic process {Sn, n = 0, 1, . . .} is a
symmetric random walk. Prove that {Sn, n = 0, 1, . . .} is a martingale with respect
to {Xn, n = 1, 2, . . .}.
Solution We have E(|Sn |) ≤ E(|X1 |) + E(|X2 |) + · · · + E(|Xn |) < ∞ . Also
E(Sn+1 /X1 , X2 , . . . , Xn ) = E ((Sn + Xn+1 )/X1 , X2 , . . . , Xn )
= E(Sn /X1 , X2 , . . . , Xn ) + E(Xn+1 /X1 , X2 , . . . , Xn )
= Sn + E(Xn+1 ) (using the independence of X1 , X2 , . . . , Xn , Xn+1 )
= Sn + 0
= Sn .
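The identity just derived can also be checked by brute force for a small n: list all equally likely paths of length n + 1, group them by their first n steps, and average S_{n+1} within each group. The sketch below does this in Python; the function name conditional_means and the choice n = 4 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def conditional_means(n):
    """E(S_{n+1} / X_1, ..., X_n) for the symmetric random walk,
    computed by averaging over all equally likely paths of length n + 1."""
    totals, counts = {}, {}
    for path in product([1, -1], repeat=n + 1):
        history = path[:n]                      # the observed steps X_1, ..., X_n
        totals[history] = totals.get(history, 0) + sum(path)
        counts[history] = counts.get(history, 0) + 1
    return {h: Fraction(totals[h], counts[h]) for h in totals}

means = conditional_means(4)
# Martingale property: the conditional mean equals S_n = x_1 + ... + x_n.
is_martingale_step = all(m == sum(h) for h, m in means.items())
print(is_martingale_step)
```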
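Now,
\[ \begin{aligned} E(S_{n+1}^2/F_n) &= E\left((S_{n+1} - S_n + S_n)^2/F_n\right) \\ &= E\left((S_{n+1} - S_n)^2/F_n\right) + 2E\left[S_n(S_{n+1} - S_n)/F_n\right] + E(S_n^2/F_n) \\ &= E(X_{n+1}^2/F_n) + 2E(X_{n+1} S_n/F_n) + E(S_n^2/F_n). \end{aligned} \]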
\[ \begin{aligned} X_{n+1} &= \left(\sum_{k=1}^{n+1} Y_k\right)^2 - (n + 1)\sigma^2 \\ &= \left(\left(\sum_{k=1}^{n} Y_k\right) + Y_{n+1}\right)^2 - n\sigma^2 - \sigma^2 \\ &= X_n + 2Y_{n+1}\left(\sum_{k=1}^{n} Y_k\right) + Y_{n+1}^2 - \sigma^2. \end{aligned} \]
\[ \begin{aligned} E[X_{n+1}/Y_0, Y_1, \ldots, Y_n] &= E\left[\left(X_n + 2Y_{n+1}\left(\sum_{k=1}^{n} Y_k\right) + Y_{n+1}^2 - \sigma^2\right)/(Y_0, Y_1, \ldots, Y_n)\right] \\ &= E[X_n/(Y_0, Y_1, \ldots, Y_n)] + 2E\left[Y_{n+1}\sum_{k=1}^{n} Y_k/(Y_0, Y_1, \ldots, Y_n)\right] \\ &\quad + E\left[Y_{n+1}^2/(Y_0, Y_1, \ldots, Y_n)\right] - \sigma^2 \\ &= X_n + 2\sum_{k=1}^{n} Y_k \, E(Y_{n+1}) + E(Y_{n+1}^2) - \sigma^2 \\ &= X_n. \end{aligned} \]
Since the game is double or nothing, his/her fortune at the end of nth toss is given
by
Yn = X1 X2 · · · Xn (n = 1, 2, . . .) .
Hence,
E(Sn+1 /Fn ) = p uSn + (1 − p) dSn = Sn [p u + (1 − p) d] .
We consider the variable ln(Sn) and observe that
\[ E\left(\ln\left(\frac{S_n}{S_{n-1}}\right)/S_{n-1}, S_{n-2}, \ldots, S_0\right) = p \ln(u) + (1 - p) \ln(d). \]
Therefore,
Using equation (8.3), and noting that the history of Sn−1 , Sn−2 , . . . , S0 yields the
history of Rn−1 , Rn−2 , . . . , R0 and vice-versa, we get
\[ \begin{aligned} E(R_n/R_{n-1}, R_{n-2}, \ldots, R_0) &= \ln(S_{n-1}) - (n - 1)\left[p \ln(u) + (1 - p) \ln(d)\right] \\ &= R_{n-1}. \end{aligned} \]
The discounted process is a martingale only if the right hand side of the above
equation is equal to $e^{-nr} S_n$. That is,
or
\[ e^r = p u + (1 - p) d. \]
Thus, the discounted process is a martingale only if
\[ p = \frac{e^r - d}{u - d}. \]
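The formula for p is easy to evaluate and to check numerically: with this p, the one-step discounted expected price equals the current price. The Python sketch below uses hypothetical values u = 1.2, d = 0.9, r = 0.05 and a hypothetical current price of 100; none of these numbers come from the text.

```python
import math

def risk_neutral_probability(u, d, r):
    """p = (e^r - d) / (u - d); a genuine probability requires d < e^r < u."""
    if not d < math.exp(r) < u:
        raise ValueError("need d < e^r < u for 0 < p < 1")
    return (math.exp(r) - d) / (u - d)

u, d, r = 1.2, 0.9, 0.05          # illustrative parameters
p = risk_neutral_probability(u, d, r)

# One-step check: e^{-r} [p u S_n + (1 - p) d S_n] should equal S_n.
s_n = 100.0
discounted_next = math.exp(-r) * (p * u * s_n + (1 - p) * d * s_n)
print(p, discounted_next)
```

The guard on d < e^r < u reflects the fact that the formula yields a value in (0, 1) only in that case.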
Remark 8.6.3 It may be noted that in Chapter 3 the RNPM for the binomial
lattice model has been derived via replicating portfolio arguments. But the approach
presented here in Example 8.6.6 is based on the fact that the discounted stock process
under RNPM forms a martingale. This approach is more general and is applicable
in a large variety of derivative pricing problems.
P(Y = u) = p, P(Y = d) = q .
Also we may note that Yk is the factor by which the stock price goes up or down
at time k. Since the random variable Yk is independent of Fk , we have
\[ E_Q\left[(1 + r)^{-(k+1)} S_{k+1}/F_k\right] = (1 + r)^{-k} S_k (1 + r)^{-1} E_Q\left[(S_{k+1}/S_k)/F_k\right]. \]
But
\[ E_Q\left[(S_{k+1}/S_k)/F_k\right] = E_Q\left[(S_{k+1}/S_k)\right] = p u + q d = (1 + r). \]
Therefore on substitution, we conclude
\[ E_Q\left[(1 + r)^{-(k+1)} S_{k+1}/F_k\right] = (1 + r)^{-k} S_k. \]
This proves that the process $\{(1 + r)^{-k} S_k, k = 1, 2, \ldots\}$ is a martingale.
Wealth Process
Let ∆k be the number of shares of a stock held between time k and k + 1. We
assume that ∆k is Fk-measurable and that X0 is the amount of money we start
with at time t = 0. If we have ∆k shares between time k and k + 1, then at time
k + 1 those shares will be worth ∆k Sk+1, where Sk+1 is the share price at time k + 1.
The amount of cash we hold between time k and k + 1 is Xk minus the amount
held in stock, that is Xk − ∆k Sk. Hence, the worth of this amount at time k + 1 is
(1 + r)[Xk − ∆k Sk]. Therefore, the amount of money we have at time k + 1 is
\[ X_{k+1} = \Delta_k S_{k+1} + (1 + r)\left[X_k - \Delta_k S_k\right]. \]
When r = 0, this reduces to
\[ X_{k+1} - X_k = \Delta_k (S_{k+1} - S_k). \]
Thus,
\[ X_{k+1} = X_0 + \sum_{i=0}^{k} \Delta_i (S_{i+1} - S_i). \]
The stochastic process {Xk , k = 0, 1, . . .} is called the wealth process.
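We shall now show that under the risk neutral probability measure Q the
discounted wealth process is a martingale. When r = 0, we have
\[ \begin{aligned} E_Q\left[(X_{k+1} - X_k)/F_k\right] &= E_Q\left[\Delta_k (S_{k+1} - S_k)/F_k\right] \\ &= \Delta_k E_Q\left[(S_{k+1} - S_k)/F_k\right] \quad (\Delta_k \text{ is } F_k\text{-measurable}) \\ &= 0 \quad (S_k \text{ is a martingale}). \end{aligned} \]
For r > 0, the recursion $X_{k+1} = \Delta_k S_{k+1} + (1 + r)[X_k - \Delta_k S_k]$ gives
$(1 + r)^{-(k+1)} X_{k+1} - (1 + r)^{-k} X_k = \Delta_k \left[(1 + r)^{-(k+1)} S_{k+1} - (1 + r)^{-k} S_k\right]$, and hence
\[ \begin{aligned} E_Q\left[\left((1 + r)^{-(k+1)} X_{k+1} - (1 + r)^{-k} X_k\right)/F_k\right] &= E_Q\left[\Delta_k \left((1 + r)^{-(k+1)} S_{k+1} - (1 + r)^{-k} S_k\right)/F_k\right] \\ &= \Delta_k E_Q\left[\left((1 + r)^{-(k+1)} S_{k+1} - (1 + r)^{-k} S_k\right)/F_k\right] \\ &= 0 \quad ((1 + r)^{-k} S_k \text{ is a martingale under } Q). \end{aligned} \]
Hence, the discounted wealth process $\{(1 + r)^{-k} X_k, k = 1, 2, \ldots\}$ is a martingale.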
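A consequence of the martingale property is that the Q-expected discounted wealth at any time n equals the initial wealth X0, whatever adapted trading rule is used. The following Python sketch checks this by enumerating all paths of a three-period binomial model. The parameter values, the initial amounts and the trading rule delta are hypothetical choices made only for the illustration; the risk neutral probability is taken as p = (1 + r − d)/(u − d), which is what p u + q d = 1 + r requires.

```python
from fractions import Fraction
from itertools import product

u, d, r = Fraction(6, 5), Fraction(9, 10), Fraction(1, 20)   # illustrative values
p = (1 + r - d) / (u - d)     # risk neutral probability of an up move
q = 1 - p
S0, X0, n = Fraction(100), Fraction(50), 3

def delta(k, s_k):
    """Hypothetical trading rule; it uses only the price known at time k."""
    return Fraction(1) if s_k >= S0 else Fraction(2)

expected_discounted_wealth = Fraction(0)
for path in product("ud", repeat=n):
    s, x, prob = S0, X0, Fraction(1)
    for k, move in enumerate(path):
        shares = delta(k, s)
        s_next = s * (u if move == "u" else d)
        # Wealth recursion: X_{k+1} = D_k S_{k+1} + (1 + r)(X_k - D_k S_k).
        x = shares * s_next + (1 + r) * (x - shares * s)
        s = s_next
        prob *= p if move == "u" else q
    expected_discounted_wealth += prob * x / (1 + r) ** n

print(expected_discounted_wealth)
```

Exact rational arithmetic is used so that equality with X0 can be tested without a tolerance.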
Since W(t) − W(s) has normal distribution with mean zero and variance (t − s), we
have
\[ E\left(e^{W(t) - W(s)}\right) = e^{\frac{t - s}{2}}. \]
Hence,
\[ E\left(e^{W(t)}/F_s\right) = e^{W(s)} e^{\frac{t - s}{2}}. \]
This gives, for 0 ≤ s < t,
\[ E\left(e^{W(t) - \frac{t}{2}}/F_s\right) = e^{-\frac{t}{2}} E\left(e^{W(t)}/F_s\right) = e^{W(s) - \frac{s}{2}}. \]
It follows that $\exp\left(W(t) - \frac{t}{2}\right)$ is a martingale.
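Since a martingale has constant expectation, E[exp(W(t) − t/2)] should equal its value at t = 0, namely 1, for every t. A rough Monte Carlo check in Python is sketched below; the sample size, the seed and the function name are arbitrary choices, and the estimate is only approximate.

```python
import math
import random

def mean_exponential_martingale(t, n_samples, seed=0):
    """Monte Carlo estimate of E[exp(W(t) - t/2)], using W(t) ~ N(0, t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_t = rng.gauss(0.0, math.sqrt(t))
        total += math.exp(w_t - t / 2)
    return total / n_samples

estimate = mean_exponential_martingale(t=1.0, n_samples=200_000)
print(estimate)   # should be close to 1, up to sampling error
```

This checks only the constancy of the mean, which is a necessary consequence of the martingale property, not the full conditional statement.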
The cornerstone of martingale theory is Doob’s optional sampling theorem.
This states, roughly, that “stopping” a martingale at a random time τ does not
alter the expected “payoff”, provided the decision about when to stop is based
solely on information available up to τ. Such random times are called stopping
times. Stopping times are also called Markov times or optional times.
If τ is optional and c > 0 is a positive constant, then τ + c is a stopping time.
Before stating Doob’s optional sampling theorem we give below the definitions.
Definition 8.6.3 (Stopping Time) A stopping time relative to a filtration
{Fn, n ≥ 0} is a non-negative integer-valued random variable τ such that for each
n the event {τ = n} ∈ Fn. In the continuous case, a random variable γ : Ω → [a, b] is
called a stopping time with respect to a filtration {Ft, a ≤ t ≤ b} if {w : γ(w) ≤ t} ∈ Ft
for all t ∈ [a, b].
From the definition, we can think of γ as the time to stop playing a game. The
decision to stop playing the game before or at time t should be determined by the
information provided by Ft, which gives the condition for γ to be a stopping time.
Definition 8.6.4 (Hitting Time) Let {X(t), t ≥ 0} be a stochastic process. Let
A be a Borel set in $R^n$. Define
\[ \tau = \inf\{t > 0 : X(t) \in A\}. \]
Then τ is called a hitting time of A for the stochastic process {X(t), t ≥ 0}.
In other words,
\[ T = \inf\{n \ge 0 : S_n = 2\}. \]
Further
\[ \{T \le n\} = \bigcup_{i=0}^{n} \{S_i = 2\} \in F_n \]
\[ \tau = \inf\{t > 0 : W(t) = a\}. \]
Since the sample paths of Brownian motion are continuous, it is easy to see that
τ is a stopping time. That is,
which depends only on {W(s), 0 ≤ s ≤ t}. Also, note that, if τ < ∞, then W(τ) = a
by continuity of sample paths. Since τ is a stopping time, W(t + τ) − W(τ) is a
Brownian motion.
Example 8.6.13 Suppose that two players A and B start respectively with Rs a
and Rs b, and they bet against each other one rupee at a time by tossing a fair
coin. What is the probability that player B will be broke and A ends up with all
the money?
Solution Let Xn be player A's fortune at the nth game, and Sn be his/her capital
after n games. We have
\[ S_n = a + \sum_{i=1}^{n} X_i. \]
Let
\[ \tau_0 = \inf\{n : S_n = 0\}, \qquad \tau_{a+b} = \inf\{n : S_n = a + b\}, \]
and $T = \tau_0 \wedge \tau_{a+b} = \min(\tau_0, \tau_{a+b})$. Therefore, T is the first time that player A either makes b
extra rupees or is ruined, i.e.
\[ T = \inf\{n : S_n = 0 \text{ or } S_n = b + a\}. \]
Hence, T is the stopping time with respect to the martingale {Sn , n = 0, 1, . . .}. By
using optional stopping theorem, we have
E(ST ) = S0 = a .
But
E(ST ) = 0 × P(T = τ0 ) + (a + b) × P(T = τa+b ) .
Hence,
\[ P(T = \tau_{a+b}) = \frac{a}{a + b}. \]
This is exactly the probability that player B goes broke and player A wins all.
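The value a/(a + b) can be compared with a direct simulation of the betting game. The Python sketch below is only illustrative: the stakes a = 3, b = 7, the number of simulated games and the seed are arbitrary assumptions, and the estimate carries sampling error.

```python
import random

def prob_A_wins(a, b, n_games, seed=0):
    """Fraction of simulated fair games in which A's capital reaches a + b before 0."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        s = a                                # S_0 = a
        while 0 < s < a + b:                 # play until the stopping time T
            s += rng.choice((1, -1))         # fair coin: win or lose one rupee
        wins += (s == a + b)
    return wins / n_games

estimate = prob_A_wins(a=3, b=7, n_games=20_000)
print(estimate, 3 / (3 + 7))
```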
8.8 Exercises
Exercise 8.1 Let {Xn, n ≥ 1} be a sequence of independent and identically dis-
tributed random variables each having uniform distribution with values −2, −1, 0,
1, 2, 3. Let {N(t), t ≥ 0} be a Poisson process with parameter 2. Find the mean
and variance of $\sum_{i=1}^{N(t)} X_i$.
Exercise 8.2 Consider two i.i.d. random variables X and Y each having uniform
distribution on the interval between 0 and 1. Define Z = X + Y. Prove that
$E(X/Z) = \frac{Z}{2}$.
Exercise 8.3 Consider the successive rolling of an unbiased die. Let X and Y
denote the number of rolls necessary to obtain a two and a three respectively.
Obtain (i) E(X/Y = 2), (ii) E(X/Y = 4).
Exercise 8.4 Let (Ω, F , P) be a probability space and let X be an integrable ran-
dom variable and let G ⊂ F be a σ-field. Prove that, the conditional expectation
E(X/G) exists. (Hint: Use Radon-Nikodym theorem)
Exercise 8.5 Consider Ω = {a, b, c, d}. Construct 4 distinct σ-fields F1, F2, F3, F4
such that F1 ⊂ F2 ⊂ F3 ⊂ F4.
Exercise 8.6 Construct an example of σ-fields F1 and F2 such that F1 ⊄ F2 and
F2 ⊂ F1.
Exercise 8.7 Consider a binomial model with t = 1, 2, 3. Let St be the stock price
at time t. Let Ω = {(u, u, u), (u, u, d), (u, d, u), (u, d, d), (d, u, d), (d, u, u), (d, d, u), (d, d, d)}.
Let the σ-field F be the power set of Ω and P(w) = 1/8 for all w ∈ Ω. Let
G = {∅, {(u, u, u), (u, u, d), (u, d, u), (u, d, d)}, {(d, u, d), (d, u, u), (d, d, u), (d, d, d)}, Ω}.
Define a discrete random variable X as
X(wi) = i, i = 1, 2, . . . , 8 .
Find E(X/G).
Xn = E(X/Y0 , Y1 , . . . , Yn ), (n = 0, 1, . . .) .
9.1 Introduction
This chapter attempts to present certain introductory topics of stochastic calculus,
also called Ito’s calculus. But do we really need the apparatus of Ito’s calculus
in finance? The answer is YES. This is because to make any meaningful financial
decision, we need to understand the dynamics of the asset under consideration.
For example, to price a stock option, we need to understand the dynamics of
the underlying stock. Traditionally calculus in general and differential equations
in particular have played a very vital role in the mathematical modeling of any
dynamical system. We expect that these topics will again come to our rescue to
model the asset dynamics. But since the asset dynamics are mostly stochastic, we
need a different type of calculus to take care of functions which are nowhere
differentiable and are not of bounded variation (do not worry! we shall be defining
it shortly). Ito’s calculus serves our goal very well in this scenario.
When in the 19th century the German mathematician Weierstrass constructed
a real-valued function which is continuous everywhere, but differentiable nowhere,
(see Section 9.10) this was considered as nothing else but a mathematical curios-
ity. Interestingly, this “curiosity” is at the core of mathematical finance. High
frequency data show that prices of exchange rates, interest rates, and liquid assets
are practically continuous, but are of unbounded variation in every given time in-
terval. In particular, they are nowhere differentiable. Therefore classical calculus is
required to be extended to functions of unbounded variation, a task overlooked by
mathematicians for long. This gap was bridged by the development of stochastic
calculus, which can be considered as the theory of differentiation and integration
of stochastic processes.
Although the subject of stochastic calculus is very broad and complex, we keep
our goal very modest. We present a very brief and introductory discussion on
Here we are using the symbol $V_g(T)$ rather than $V_g[0, T]$ since the left end point 0
is fixed throughout the discussion while the right end point T can vary.
Definition 9.2.2 (Function of Bounded Variation) Let 0 < t ≤ T. Then g
is said to be of finite variation if $V_g(t) < \infty$, for all t. Further if for 0 < t ≤ T,
$V_g(t) < K$, a constant independent of t, then g is said to be of bounded variation
on [0, T].
Definition 9.2.3 (p-Variation of a Real-Valued Function) Let 0 < t ≤ T.
Then the p-variation, p > 1, of the function g in the interval [0, t] is defined as
\[ V_g^{(p)}(t) = \sup_{\pi \in \Pi} \left( \sum_{i=0}^{n-1} |g(t_{i+1}) - g(t_i)|^p \right). \tag{9.1} \]
Remark 9.2.1 (Goffman [51]) In case g is a continuous function then $V_g(T)$ can
alternatively be expressed as
\[ V_g(T) = \lim_{\|\pi\| \to 0} \left( \sum_{i=0}^{n-1} |g(t_{i+1}) - g(t_i)| \right), \]
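\[ \begin{aligned} V_g^{(2)}(T) &= \lim_{\|\pi\| \to 0} \sum_{i=0}^{n-1} \left|g(t_{i+1}) - g(t_i)\right|^2 \\ &= \lim_{\|\pi\| \to 0} \sum_{i=0}^{n-1} \left|\frac{g(t_{i+1}) - g(t_i)}{t_{i+1} - t_i}\right|^2 (t_{i+1} - t_i)^2. \end{aligned} \]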
But
\[ \lim_{\|\pi\| \to 0} \|\pi\| \int_0^1 \left|g'(t)\right|^2 dt = 0. \]
Therefore
\[ V_g(1) = 1; \quad V_g^{(2)}(1) = 0. \]
Now by applying Result 9.2.1, we have
\[ V_g^{(p)}(1) = \begin{cases} 1, & p = 1 \\ 0, & p > 1. \end{cases} \]
where
\[ Q_\pi = \sum_{i=0}^{n-1} (W(t_{i+1}) - W(t_i))^2. \tag{9.3} \]
But now we face certain difficulties and these need to be addressed. Clearly, Qπ is
a function of the sample points w ∈ Ω. Hence, the quadratic variation calculated
for the Brownian motion for each partition is itself a random variable. Here we
note that the limit is to be taken over all partitions of [0, T], with ‖π‖ → 0 as
n → ∞. Since, for each partition π, Qπ is a random variable, we need to specify
the sense in which we are finding the limiting distribution of Qπ for large n. In
other words we need to specify the proper mode of convergence for these random
variables. We shall use convergence in the mean square sense (convergence in L2)
as defined below.
Definition 9.3.1 Let {Xn , n ≥ 1} and X be random variables defined on a common
probability space (Ω, F , P). We say that Xn converges to X in mean square sense
(in L2 sense) if
\[ \lim_{n \to \infty} E(|X_n - X|^2) = 0. \]
In the case of Brownian motion, we will show that {Qπ} converges to T in mean
square sense, i.e.
\[ \lim_{\|\pi\| \to 0} E(|Q_\pi - T|^2) = 0. \tag{9.4} \]
When the above result holds good, we say that the quadratic variation accumu-
lated by the Brownian motion over the interval [0, T] is T almost surely and is
denoted as [W, W](T) = T.
Theorem 9.3.1 Let Qπ be defined as in (9.3). Then
(i) E(Qπ) = T,
(ii) Var(Qπ) ≤ 2‖π‖T.
Proof.
(i) We have
\[ E(Q_\pi) = \sum_{i=0}^{n-1} E\left(W(t_{i+1}) - W(t_i)\right)^2. \tag{9.5} \]
Since, for fixed i, W(ti+1) − W(ti) has normal distribution with mean zero and
variance (ti+1 − ti), equation (9.5) gives
\[ E(Q_\pi) = \sum_{i=0}^{n-1} (t_{i+1} - t_i) = T. \]
But
\[ Var\left((W(t_{i+1}) - W(t_i))^2\right) = E(W(t_{i+1}) - W(t_i))^4 - 2E(W(t_{i+1}) - W(t_i))^2 (t_{i+1} - t_i) + (t_{i+1} - t_i)^2. \tag{9.7} \]
Since the fourth order moment of the normal distribution with mean zero and
variance $(t_{i+1} - t_i)$ is $3(t_{i+1} - t_i)^2$ (see Exercise 7.12 in Chapter 7), we get from
(9.7)
\[ Var(Q_\pi) = \sum_{i=0}^{n-1} 2(t_{i+1} - t_i)^2 \le 2\|\pi\| \sum_{i=0}^{n-1} (t_{i+1} - t_i) = 2\|\pi\| T. \tag{9.9} \]
Remark 9.3.1 The above theorem tells us that for the Brownian motion {W(t), t ≥
0}, [W, W](T) = T for all T ≥ 0, almost surely. This is because Var(Qπ) =
E(Qπ − E(Qπ))² = E(Qπ − T)², which from (9.9) gives
Therefore
\[ [W, W](T) = \lim_{\|\pi\| \to 0} Q_\pi = T, \]
Hence we write [W, W](T) = T almost surely. Here the terminology almost surely
means that there can be some paths of the Brownian motion for which the assertion
[W, W](T) = T is not true. But the set of all such paths has zero probability.
Though we write [W, W](T) = T, we must realise that it is to be understood in the
sense as described above.
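The convergence of Qπ to T can be observed on a simulated path. The Python sketch below draws the increments W(ti+1) − W(ti) of a Brownian motion on a uniform partition of [0, T] and returns both the sum of squared increments and, for contrast, the sum of absolute increments. The uniform partition, the number of steps and the seed are assumptions made for the illustration.

```python
import math
import random

def path_variations(T, n, seed=0):
    """Return (sum of squared increments, sum of absolute increments)
    for one simulated Brownian path on a uniform partition with n steps."""
    rng = random.Random(seed)
    dt = T / n
    increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    q_pi = sum(dw ** 2 for dw in increments)       # Q_pi as in (9.3)
    first = sum(abs(dw) for dw in increments)      # first variation along the partition
    return q_pi, first

q_pi, first_variation = path_variations(T=1.0, n=10_000)
print(q_pi, first_variation)
```

By Theorem 9.3.1 the variance of Qπ here is at most 2‖π‖T = 2 × 10⁻⁴, so Qπ should lie close to T = 1, while the sum of absolute increments is of order √n and keeps growing as the partition is refined.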
Remark 9.3.2 In view of Theorem 9.2.1, we have
\[ V_W^{(p)}(T) = \begin{cases} \infty, & p = 1 \\ T, & p = 2 \\ 0, & p > 2. \end{cases} \]
Also for 0 < T1 < T2 , [W, W](T2 ) − [W, W](T1 ) = T2 − T1 , the Brownian mo-
tion accumulates (T2 − T1 ) units of quadratic variation over the interval [T1 , T2 ].
Since this is true for every interval, we infer that the Brownian motion accumu-
lates quadratic variation at rate one per unit time. This last statement we write
informally as
dW(t) dW(t) = dt , (9.11)
and remember that the dt in (9.11) is in fact 1.dt.
The mathematical justification of the formula (9.11) is essentially equation
(9.10). At this stage we would also like to know some formula for dW(t) dt and
dt dt. To determine these, we need to compute the cross variation of W(t) and t
and also the quadratic variation of t itself. We have the following theorem in
this regard.
Theorem 9.3.2 Let {W(t), t ≥ 0} be the given Brownian motion and π = {0 =
t0 < t1 < . . . < tn = T} be a partition of [0, T]. Then
(i)
\[ \lim_{\|\pi\| \to 0} \left( \sum_{i=0}^{n-1} (W(t_{i+1}) - W(t_i)) (t_{i+1} - t_i) \right) = 0, \]
(ii)
\[ \lim_{\|\pi\| \to 0} \left( \sum_{i=0}^{n-1} (t_{i+1} - t_i)^2 \right) = 0. \]
Proof.
(i) We observe that
Therefore
\[ \left| \sum_{i=0}^{n-1} (W(t_{i+1}) - W(t_i)) (t_{i+1} - t_i) \right| \le \max_{0 \le k \le n-1} |W(t_{k+1}) - W(t_k)| \cdot T. \tag{9.12} \]
Riemann Integral
Let π = {0 = t0 < t1 < . . . < tn = T} be an arbitrary partition of the interval [0, T].
For each subinterval [ti, ti+1], we set
\[ M_i = \max_{t_i \le t \le t_{i+1}} g(t) \quad \text{and} \quad m_i = \min_{t_i \le t \le t_{i+1}} g(t). \]
Then the upper Riemann sum is defined as
\[ S_\pi^+ = \sum_{i=0}^{n-1} M_i (t_{i+1} - t_i), \]
\[ \|\pi\| = \max_{0 \le k \le n-1} (t_{k+1} - t_k). \]
When the number n of partition points goes to infinity and the length of the
longest subinterval $t_{k+1} - t_k$ goes to zero (i.e. ‖π‖ → 0), the upper Riemann sum
$S_\pi^+$ and the lower Riemann sum $S_\pi^-$ converge to the same limit, which we call
$\int_0^T g(t)\,dt$. Equivalently the Riemann integral can also be defined as
\[ S = \int_0^T g(t)\,dt = \lim_{\|\pi\| \to 0} \sum_{i=0}^{n-1} g(t_i^*)(t_{i+1} - t_i), \tag{9.14} \]
Riemann-Stieltjes Integral
Suppose that g and f are real-valued functions defined on [0, T]. We assume
that on [0, T], g is continuous and f is monotonically nondecreasing. We aim to
define the integral
\[ S = \int_0^T g(t)\,df(t). \tag{9.15} \]
n−1
S+π = Mi ( f (ti+1 ) − f (ti )) ,
i=0
and
n−1
S−π = mi ( f (ti+1 ) − f (ti )) .
i=0
When π → 0, the upper Riemann-Stieltjes sum S+π and the lower Riemann-
:T
Stieltjes sum S−π converge to the same limit, which we call 0 g(t)d f (t). Equiva-
lently the Riemann-Stieltjes integral can also be defined as
, T
n−1
S= g(t) d f (t) = lim g(t∗i ) f (ti+1 ) − f (ti ) , (9.16)
0 π→0
i=0
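The limit in (9.16) can be explored numerically. The following is a minimal sketch, assuming the illustrative choices g(t) = t and f(t) = t^2 on [0, 1] (these are not taken from the text), for which the Riemann-Stieltjes integral equals 2/3. The function name `riemann_stieltjes_sum` is hypothetical.

```python
def riemann_stieltjes_sum(g, f, T, n, tag="left"):
    """Riemann-Stieltjes sum of g with respect to f on a uniform partition of [0, T]."""
    total = 0.0
    for i in range(n):
        t_lo = T * i / n
        t_hi = T * (i + 1) / n
        # t_star plays the role of t_i^* in (9.16): left or right end point
        t_star = t_lo if tag == "left" else t_hi
        total += g(t_star) * (f(t_hi) - f(t_lo))
    return total


# Illustrative (assumed) choice: g(t) = t, f(t) = t^2, so the integral is 2/3.
left = riemann_stieltjes_sum(lambda t: t, lambda t: t * t, 1.0, 10_000, "left")
right = riemann_stieltjes_sum(lambda t: t, lambda t: t * t, 1.0, 10_000, "right")
print(left, right)
```

Because f is monotonically nondecreasing and g is increasing here, the left-point sum is a lower sum and the right-point sum is an upper sum, and both approach the same value as the partition is refined.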
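\[ E(X) = \int_{-\infty}^{\infty} t\,dF(t) , \]
\[ F(t) = \begin{cases} 0, & t < 0 \\ p + (1 - p)(1 - e^{-\lambda t}), & 0 \le t < T \\ 1, & t \ge T , \end{cases} \]
\[ \begin{aligned}
E(X) &= \int_{-\infty}^{\infty} t\,dF(t) \\
&= \int_{-\infty}^{0} t\,dF(t) + 0 \times P(X = 0) + \int_{0}^{T} t\,dF(t) + T \times P(X = T) + \int_{T}^{\infty} t\,dF(t) \\
&= 0 + 0 + \int_{0}^{T} t \lambda (1 - p) e^{-\lambda t}\,dt + T(1 - p)e^{-\lambda T} + 0 \\
&= \frac{(1 - p)}{\lambda} \int_{0}^{\lambda T} y e^{-y}\,dy + T(1 - p)e^{-\lambda T} \\
&= \frac{(1 - p)}{\lambda} \bigl(1 - \lambda T e^{-\lambda T} - e^{-\lambda T}\bigr) + T(1 - p)e^{-\lambda T} \\
&= \frac{1}{\lambda} (1 - p) \bigl(1 - e^{-\lambda T}\bigr) .
\end{aligned} \]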
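As a sanity check on the computation above, the following sketch compares the closed form with a direct numerical evaluation of the continuous part plus the atom at T. The parameter values p = 0.3, λ = 2, T = 1.5 and the function names are assumptions chosen only for illustration.

```python
import math


def expected_value_closed_form(p, lam, T):
    # Final expression of the derivation: (1 - p)(1 - e^{-lam T}) / lam
    return (1.0 - p) * (1.0 - math.exp(-lam * T)) / lam


def expected_value_numeric(p, lam, T, n=20_000):
    h = T / n
    # Continuous part on (0, T): density (1 - p) * lam * exp(-lam t), midpoint rule
    cont = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        cont += t * (1.0 - p) * lam * math.exp(-lam * t) * h
    # Atoms: mass p at 0 contributes nothing; mass (1 - p) e^{-lam T} sits at T
    atom_at_T = T * (1.0 - p) * math.exp(-lam * T)
    return cont + atom_at_T


closed = expected_value_closed_form(0.3, 2.0, 1.5)
numeric = expected_value_numeric(0.3, 2.0, 1.5)
print(closed, numeric)
```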
Here it must be noted that at present equation (9.17) is purely informal because
we have not yet attached any meaning to the ‘limiting process’ of the sum of
random variables involved in the R.H.S of this equation. In defining stochastic
integral we use the mean square convergence as in Definition 9.3.1. Once we agree
to this mode of convergence, then (9.17) is well defined. We therefore take (9.17)
for 0 ≤ t1 ≤ t.
(v) The process I(t) has continuous sample paths.
(vi) For each t, I(t) is $\mathcal{F}_t$-measurable.
(vii) $[I, I](t) = \int_0^t X^2(s)\,ds$.
(viii) The process $I(t) = \int_0^t X(s)\,dW(s)$, t ∈ [0, T], is a martingale with respect to
the natural Brownian filtration $\mathcal{F}_t$ (0 ≤ t ≤ T).
This last property is very important in applications. It essentially follows
because, in the definition of the Ito integral, we have taken $t_i^*$ as the left end
point of the subinterval $[t_i, t_{i+1}]$. This choice of $t_i^*$ is crucial for making I(t) a martingale.
We do not prove the above results here and refer to Karatzas and Shreve [75]
for the proofs.
Remark 9.5.2 The Ito integral does not have the monotonicity property. Thus
X(t) ≤ Y(t) does not necessarily mean $\int_0^T X(t)\,dW(t) \le \int_0^T Y(t)\,dW(t)$. We may
take X(t) = 0 and Y(t) = 1. Then $\int_0^T 1\,dW(t) = W(T)$ and $\int_0^T 0\,dW(t) = 0$. But
W(T) is smaller than 0 with probability 1/2.
We now give some examples to illustrate the points discussed above.
Example 9.5.1 Evaluate the Ito integral
\[ \int_0^T W(s)\,dW(s) . \]
But, for each i, $W(t_i)$ and $W(t_{i+1}) - W(t_i)$ are independent random variables with
normal distributions. Hence the right hand side is a sum of products of such
independent random variables, and the integral is the limit of these sums. Now, we have
\[ \begin{aligned}
Q_\pi &= \sum_{i=0}^{n-1} (W(t_{i+1}) - W(t_i))^2 \\
&= \sum_{i=0}^{n-1} \bigl[ W^2(t_{i+1}) - W^2(t_i) - 2W(t_i)(W(t_{i+1}) - W(t_i)) \bigr] \\
&= W^2(T) - W^2(0) - 2 \sum_{i=0}^{n-1} W(t_i)(W(t_{i+1}) - W(t_i)) ,
\end{aligned} \]
i.e.
\[ \sum_{i=0}^{n-1} W(t_i)\,(W(t_{i+1}) - W(t_i)) = \frac{1}{2} \bigl[ W^2(T) - W^2(0) - Q_\pi \bigr] . \tag{9.19} \]
Now taking the limit as $\|\pi\| \to 0$, and using W(0) = 0 together with the fact that $Q_\pi$
converges to T in mean square (with $E(Q_\pi) = T$), we get
\[ \int_0^T W(s)\,dW(s) = \frac{W^2(T) - T}{2} . \]
We see that, unlike the Riemann-Stieltjes integral, we have an extra term −T/2,
which arises on account of the nonzero quadratic variation of the Brownian motion.
Remark 9.5.3 In Example 9.5.1, if we form the sum $\sum_{i=0}^{n-1} W(t_i^*)\,(W(t_{i+1}) - W(t_i))$
with $t_i^* = \frac{t_i + t_{i+1}}{2}$, and take its limit as $\|\pi\| \to 0$, then we obtain this limit as
$0.5\,W^2(T)$. But can we say that the Ito integral $\int_0^T W(s)\,dW(s)$ equals $0.5\,W^2(T)$?
This is not correct because we desire the Ito integral to be a martingale, and $0.5\,W^2(T)$
is not a martingale. However, when we take $t_i^* = t_i$, the left end point of the interval
$[t_i, t_{i+1}]$, then the limit is $(W^2(T) - T)/2$, which is a martingale. The requirement
that I(t) should be a martingale tells us to choose $t_i^*$ as $t_i$.
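The effect of the evaluation point can be seen on a single simulated path. The sketch below is only an illustration under assumed values (T = 1, n = 20000 subintervals, a fixed seed); the function name is hypothetical. It simulates the Brownian motion at the end points and midpoints of each subinterval and forms the left-point and midpoint sums.

```python
import math
import random


def left_and_midpoint_sums(T=1.0, n=20_000, seed=0):
    rng = random.Random(seed)
    sd = math.sqrt(T / (2 * n))  # standard deviation of W over half a subinterval
    w_left = 0.0                 # W(t_i), starting from W(0) = 0
    left_sum = 0.0
    mid_sum = 0.0
    for _ in range(n):
        w_mid = w_left + rng.gauss(0.0, sd)    # W at the midpoint of [t_i, t_{i+1}]
        w_right = w_mid + rng.gauss(0.0, sd)   # W(t_{i+1})
        incr = w_right - w_left
        left_sum += w_left * incr              # t_i^* = t_i
        mid_sum += w_mid * incr                # t_i^* = (t_i + t_{i+1}) / 2
        w_left = w_right
    return w_left, left_sum, mid_sum


w_T, left_sum, mid_sum = left_and_midpoint_sums()
print(left_sum, (w_T**2 - 1.0) / 2)   # left-point sum vs (W^2(T) - T)/2
print(mid_sum, w_T**2 / 2)            # midpoint sum vs W^2(T)/2
```

For a fine partition the two sums differ by approximately T/2, in line with the two limits quoted in the remark.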
Remark 9.5.4 There is a more compelling practical reason to choose $t_i^*$ as $t_i$ (the
left end point of $[t_i, t_{i+1}]$). Let t > 0 and π = {0 = t0 < t1 < . . . < tn = t}. We can think
of $t_0, t_1, \ldots, t_{n-1}$ as the trading dates in the asset and $\Delta(t_0), \Delta(t_1), \ldots, \Delta(t_{n-1})$ as the
position (number of shares) taken in the asset at each trading date and held to the
next trading date. If I(t) denotes the gain from trading up to time t, then
\[ I(t) = \begin{cases}
\Delta(t_0)\,[W(t) - W(0)] = \Delta(0) W(t), & 0 \le t \le t_1 \\
\Delta(t_0) W(t_1) + \Delta(t_1)\,[W(t) - W(t_1)], & t_1 \le t \le t_2 \\
\quad \vdots \\
\sum_{i=0}^{k-1} \Delta(t_i)\,[W(t_{i+1}) - W(t_i)] + \Delta(t_k)\,[W(t) - W(t_k)], & t_k \le t \le t_{k+1} \\
\quad \vdots
\end{cases} \]
Obviously the process I(t) defined above becomes our Ito integral $\int_0^t \Delta(s)\,dW(s)$ of
the simple process Δ(t).
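The piecewise expression for I(t) at a trading date is just a sum of position times price change. The following sketch uses made-up prices and positions (assumptions for illustration only) and a hypothetical helper `trading_gain`.

```python
def trading_gain(positions, prices):
    """Gain of a simple strategy.

    positions[i] is the position held over [t_i, t_{i+1}], decided at t_i;
    prices[i] is the asset price at t_i, so len(prices) == len(positions) + 1.
    """
    return sum(d * (prices[i + 1] - prices[i]) for i, d in enumerate(positions))


# Illustrative (assumed) data: prices observed at t_0, ..., t_3
prices = [0.0, 1.0, 0.5, 2.0]
positions = [1.0, 2.0, 3.0]
gain = trading_gain(positions, prices)
print(gain)  # 1*(1.0) + 2*(-0.5) + 3*(1.5)
```

Each position multiplies the price change over the interval that follows the decision date, which is exactly the left end point choice in the Ito sum.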
Remark 9.5.5 In the Ito integral $\int_0^t X(s)\,dW(s)$, we can have a financial
interpretation of integrand and integrator. The integrand represents a position in an
asset and the integrator represents the price of that asset. Since we need to decide
the position at the beginning of each interval, we have to take $t_i^*$ as $t_i$, rather than an
arbitrary point in $[t_i, t_{i+1}]$. Thus there are theoretical as well as practical reasons
for choosing $t_i^*$ as the left end point of the interval $[t_i, t_{i+1}]$.
Solution Note that W(1) is not adapted to the filtration σ{W(s), 0 < s ≤ t}, 0 ≤
t ≤ 1, because it depends on future events. Hence, this Ito integral does not exist.
This example shows that the assumption that the integrand is adapted to the filtration
{Ft , t ≥ 0} is needed for the existence of the Ito integral.
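Example 9.5.3 Let X(t) and Y(t) be suitable processes so that $I_1 = \int_0^T X(t)\,dW(t)$
and $I_2 = \int_0^T Y(t)\,dW(t)$ exist. Show that $E(I_1 I_2) = \int_0^T E(X(t)Y(t))\,dt$.
Solution We know that $I_1 I_2 = \frac{1}{2}\bigl[(I_1 + I_2)^2 - I_1^2 - I_2^2\bigr]$. Since
$I_1 + I_2 = \int_0^T (X(t) + Y(t))\,dW(t)$, we apply the isometry property to each of the
three squares and get
$E(I_1 I_2) = \frac{1}{2}\int_0^T E\bigl[(X(t) + Y(t))^2 - X^2(t) - Y^2(t)\bigr]\,dt = \int_0^T E(X(t)Y(t))\,dt$.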
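The identity of Example 9.5.3 can be checked by simulation. The sketch below is a rough Monte Carlo illustration under assumed choices X(t) = 1 and Y(t) = t on [0, 1] (not from the text), for which the right hand side equals 1/2. The number of steps and paths and the function name are likewise assumptions.

```python
import math
import random


def mc_product_of_ito_integrals(T=1.0, steps=50, paths=20_000, seed=1):
    rng = random.Random(seed)
    dt = T / steps
    sd = math.sqrt(dt)
    acc = 0.0
    for _ in range(paths):
        i1 = 0.0
        i2 = 0.0
        for k in range(steps):
            dw = rng.gauss(0.0, sd)   # Brownian increment over [t_k, t_{k+1}]
            i1 += 1.0 * dw            # X(t) = 1, evaluated at the left end point
            i2 += (k * dt) * dw       # Y(t) = t, evaluated at the left end point
        acc += i1 * i2
    return acc / paths


estimate = mc_product_of_ito_integrals()
print(estimate)  # should be near the integral of t over [0, 1], i.e. 0.5
```

The estimate carries both sampling error and a small discretization bias from the left-point sums, so only approximate agreement should be expected.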
But do we have similar result for f (W(t))? The answer is provided by the Ito-
Doeblin formula.
Here the first integral in (9.21) is an Ito integral whereas the second integral in
(9.21) is a Riemann integral.
Remark 9.6.1 Borrowing ideas from the classical Taylor’s series we can infor-
mally write
Taking motivation from the fact that $\int_0^t x\,dx = t^2/2$, we choose $f(x) = x^2/2$. This
gives $f'(x) = x$ and $f''(x) = 1$. Hence the above Ito-Doeblin formula gives
\[ \frac{W^2(T)}{2} - 0 = \int_0^T W(t)\,dW(t) + \frac{1}{2} \int_0^T dt . \]
Therefore
\[ \int_0^T W(t)\,dW(t) = \frac{W^2(T) - T}{2} . \tag{9.22} \]
Remark 9.6.2 Here we may note that the integral $\int_0^T W(t)\,dW(t)$ cannot be
defined path by path in the Riemann-Stieltjes sense, because the sample paths of Brownian
motion are of unbounded variation on each time interval. In the formula
\[ \int_0^T W(t)\,dW(t) = \frac{W^2(T) - T}{2} , \quad T \ge 0 , \]
there is an additional term of $-\frac{T}{2}$. This is because the local increment of the
Wiener process over an interval of length Δt is of the size of its standard
deviation $\sqrt{\Delta t}$. However, for a smooth continuously differentiable function f(t), the
second term in (9.22) is zero, as it should be.
Example 9.6.1 Show that
\[ e^{W(t)} = 1 + \int_0^t e^{W(s)}\,dW(s) + \frac{1}{2} \int_0^t e^{W(s)}\,ds . \]
Let f (t, x) have continuous partial derivatives of at least second order and
{W(t), t ≥ 0} be the given Wiener process. Then
\[ d f (t, W(t)) = f_t(t, W(t))\,dt + f_x(t, W(t))\,dW(t) + \frac{1}{2} f_{xx}(t, W(t))\,dt , \]
or equivalently
\[ f (t, W(t)) - f (0, W(0)) = \int_0^t \Bigl[ f_t(u, W(u)) + \frac{1}{2} f_{xx}(u, W(u)) \Bigr] du + \int_0^t f_x(u, W(u))\,dW(u) . \]
This formula can again be justified by considering the classical Taylor’s expansion
for a function of two variables. In particular we may take
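\[ \begin{aligned}
f (t + \Delta t, W(t + \Delta t)) - f (t, W(t)) ={}& \frac{\partial f (t, W(t))}{\partial t}\,dt + \frac{\partial f (t, W(t))}{\partial x}\,dW(t) \\
&+ \frac{1}{2} \Bigl[ \frac{\partial^2 f (t, W(t))}{\partial t^2}\,(dt)^2 + 2\,\frac{\partial^2 f (t, W(t))}{\partial t\,\partial x}\,dt\,dW(t) \\
&\qquad + \frac{\partial^2 f (t, W(t))}{\partial x^2}\,dW(t)\,dW(t) \Bigr] + \ldots
\end{aligned} \]
But since we have dW(t) dW(t) = dt, dW(t) dt = 0 and dt dt = 0, therefore,
\[ f (t + \Delta t, W(t + \Delta t)) - f (t, W(t)) = \Bigl[ \frac{\partial f (t, W(t))}{\partial t} + \frac{1}{2} \frac{\partial^2 f (t, W(t))}{\partial x^2} \Bigr] dt + \frac{\partial f (t, W(t))}{\partial x}\,dW(t), \]
where
\[ \frac{\partial f (t, W(t))}{\partial x} = \frac{\partial f (t, x)}{\partial x}\Big|_{x=W(t)} , \quad
\frac{\partial f (t, W(t))}{\partial t} = \frac{\partial f (t, x)}{\partial t}\Big|_{x=W(t)} , \quad
\frac{\partial^2 f (t, W(t))}{\partial x^2} = \frac{\partial^2 f (t, x)}{\partial x^2}\Big|_{x=W(t)} . \]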
Example 9.6.2 Using the second version of the Ito-Doeblin formula, find $\int_0^T W(t)\,dW(t)$.
Choose $f (t, x) = \frac{x^2}{2}$. Then $f_x(t, x) = x$, $f_t(t, x) = 0$ and $f_{xx}(t, x) = 1$. Substituting,
we get
\[ \frac{W^2(T)}{2} - 0 = \int_0^T \Bigl[ 0 + \frac{1}{2} \Bigr] du + \int_0^T W(u)\,dW(u) . \]
Hence,
\[ \int_0^T W(t)\,dW(t) = \frac{W^2(T)}{2} - \frac{T}{2} . \]
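Example 9.6.3 Find the stochastic differential dX(t) of $X(t) = e^{W(t) - \frac{t}{2}}$.
Solution We take $f (t, x) = e^{x - \frac{t}{2}}$ and then use the second version of the Ito-Doeblin
formula. Since $f_t = -\frac{1}{2} f$, $f_x = f$ and $f_{xx} = f$, this gives
\[ dX(t) = \Bigl( -\frac{1}{2} X(t) + \frac{1}{2} X(t) \Bigr) dt + X(t)\,dW(t) = X(t)\,dW(t) . \]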
\[ \frac{dx(t)}{dt} = f (t, x(t)), \quad t \in [0, T], \quad x(0) = x_0 , \tag{9.23} \]
where f : [0, T] × R → R is a continuous function. A continuously differentiable
function x : [0, T] → R is called a solution of the given IVP if x(0) = x0 and x(t)
satisfies the ordinary differential equation (ODE) (9.23) for all t ∈ [0, T]. We note
that the IVP (9.23) is equivalent to the integral equation given by
\[ x(t) = x_0 + \int_0^t f (s, x(s))\,ds . \tag{9.24} \]
Then it is well known that for such a function f the IVP (9.23)(or the correspond-
ing integral equation (9.24)) has a unique solution. Further this unique solution
x(t) can be obtained by applying the standard Picard’s method
\[ x_{n+1}(t) = x_0 + \int_0^t f (s, x_n(s))\,ds, \quad (n = 0, 1, \ldots) , \]
giving
\[ x(t) = \lim_{n \to \infty} x_n(t) . \]
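Picard's method can be sketched numerically. The following is an illustration only, assuming the test problem f(t, x) = x with x(0) = 1 on [0, 1] (whose solution is e^t), a uniform grid, and the trapezoidal rule for the integral in each iteration; the function name and these choices are not from the text.

```python
import math


def picard_iterates(f, x0, T, n_grid=1000, n_iter=15):
    """Apply n_iter Picard iterations on a uniform grid of [0, T]."""
    h = T / n_grid
    ts = [i * h for i in range(n_grid + 1)]
    x = [x0] * (n_grid + 1)              # starting guess x_0(t) = x0
    for _ in range(n_iter):
        vals = [f(t, xi) for t, xi in zip(ts, x)]
        new_x = [x0]
        integral = 0.0
        for i in range(n_grid):
            # trapezoidal approximation of the integral of f(s, x_n(s)) over [t_i, t_{i+1}]
            integral += 0.5 * h * (vals[i] + vals[i + 1])
            new_x.append(x0 + integral)
        x = new_x                        # x_{n+1}(t) = x0 + integral from 0 to t
    return ts, x


ts, x = picard_iterates(lambda t, x: x, 1.0, 1.0)
print(x[-1], math.e)
```

For this test problem the exact Picard iterates are the Taylor partial sums of e^t, so the value at t = 1 approaches e as the number of iterations grows.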
Suppose we now introduce randomness in the above IVP (9.23). This randomness
could be introduced either in x(0) or in f . If x(0) = x0 is not fixed, but is rather a
random variable, then for each w ∈ Ω, the IVP (9.23) can be solved. The solution
in this case will depend on w. Thus the solution x(t) will be a stochastic process
{X(t, w), t ∈ [0, T], w ∈ Ω}. Similarly, if the function f depends on w ∈ Ω, then
again the solution {X(t, w), t ∈ [0, T], w ∈ Ω} is a stochastic process. Such differential
equations are known as random differential equations. Here we observe that,
in both these cases of introducing randomness, we solve an IVP for each w ∈ Ω.
We now discuss the stochastic differential equations. These are different from
random differential equations. Here we introduce uncertainties in the equation
(9.23) by introducing an additive term, which is the ‘derivative’ of Brownian
motion W(t). Symbolically we write
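\[ \frac{dX(t)}{dt} = b(t, X(t)) + \sigma(t, X(t))\,\frac{dW(t)}{dt} , \quad (0 \le t \le T) , \]
where b : [0, T] × R → R and σ : [0, T] × R → R are two given functions. The
above equation can also be symbolically written as
\[ dX(t) = b(t, X(t))\,dt + \sigma(t, X(t))\,dW(t) . \tag{9.25} \]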
A naive interpretation of (9.25) tells us that the change dX(t) = X(t + dt) − X(t)
is caused by a change dt of time, with factor b(t, X(t)) in combination with a
change dW(t) = W(t + dt) − W(t) of Brownian motion with factor σ(t, X(t)). Now
borrowing analogy from classical ODE we also symbolically express (9.25) in terms
of integral equation
\[ X(t) = X(0) + \int_0^t b(s, X(s))\,ds + \int_0^t \sigma(s, X(s))\,dW(s) , \quad (0 < t \le T) . \tag{9.26} \]
Here the initial condition X(0) and the coefficient functions b(t, x) and σ(t, x) are
given. The differential equation (9.25) is termed as the stochastic differential equa-
tion and (9.26) is termed as stochastic integral equation. We shall now discuss the
types of possible solution of SDE (9.25) or its equivalent integral equation version
(9.26).
We must note that so far the stochastic differential equation (9.25) and the stochastic
integral equation (9.26) are only symbolic. They will make sense only after we
define them in a proper mathematical way. In this context we would like to emphasize
that mathematically only the stochastic integral equation has a meaning, because
we have defined a stochastic integral in the strict mathematical sense. The stochastic
differential equation shall always remain a notational convenience whose interpretation
shall always be in terms of the corresponding stochastic integral equation.
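The integral form (9.26) also suggests how a sample path can be approximated on a grid. The sketch below uses the Euler-Maruyama scheme, a standard discretization that is not developed in this chapter; the function name, the coefficient choices and the step count are assumptions for illustration.

```python
import math
import random


def euler_maruyama(b, sigma, x0, T, n, rng):
    """Approximate one path of dX = b(t, X) dt + sigma(t, X) dW on [0, T] with n steps."""
    dt = T / n
    x = x0
    path = [x0]
    for i in range(n):
        t = i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment W(t + dt) - W(t)
        # coefficients are evaluated at the left end point, as in the Ito integral
        x = x + b(t, x) * dt + sigma(t, x) * dw
        path.append(x)
    return path


# Illustrative (assumed) coefficients: b(t, x) = 0.05 x and sigma(t, x) = 0.2 x
path = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x,
                      1.0, 1.0, 250, random.Random(42))
print(path[-1])
```

Setting sigma to zero reduces the scheme to the classical Euler method for the ODE (9.23), which gives a simple consistency check.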
There are two types of solution concepts for stochastic differential equations.
1. Strong solution
A strong solution to the SDE (9.25) is a stochastic process {X(t); t ∈ [0, T]} which
satisfies the following
(i) {X(t), t ∈ [0, T]} is adapted to the Brownian motion, i.e. at time t it is a function
of W(s), s ≤ t.
(ii) The integrals in (9.26) are well defined and {X(t); t ∈ [0, T]} satisfies the same.
(iii) {X(t); t ∈ [0, T]} is a function of the underlying Brownian sample path and of
the coefficients b(t, x) and σ(t, x).
Thus a strong solution is an explicit function f such that X(t) = f (t, W(s) : s ≤ t).
A strong solution to (9.26) is based on the path of the underlying Brownian
motion. The solution {X(t); t ∈ [0, T]} is said to be the unique strong solution if, given
any other solution {Y(t), t ∈ [0, T]}, P(X(t) = Y(t)) = 1 for all t ∈ [0, T].
2. Weak solution
For a weak solution, the path behaviour is not essential. Hence we are only
interested in the distribution of X. Thus weak solutions are sufficient to determine
the expectation, variance and covariance functions of the process. In this case we
do not have to know the sample paths of X.
A strong or weak solution X of the given SDE is called a diffusion. We may
note that Brownian motion is also a diffusion process because in (9.26) we can
take b(t, x) = 0 and σ(t, x) = 1.
We now have the following existence theorem.
Theorem 9.7.1 (Existence Theorem) Let $E(X^2(0)) < \infty$ and X(0) be independent
of {W(t), t ≥ 0}. For all t ∈ [0, T] and x, y ∈ R, let b(t, x) and σ(t, x) be
continuous and satisfy the Lipschitz condition with respect to the second variable, i.e.
Then the SDE (9.26) has a unique strong solution {X(t), 0 ≤ t ≤ T}.
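\[ f_t = 0; \quad f_x = \frac{1}{2\sqrt{x}}; \quad f_{xx} = -\frac{1}{4x^{3/2}} . \]
\[ \begin{aligned}
d\bigl(\sqrt{S(t)}\bigr) &= f_t\,dt + f_x\,dS(t) + \frac{1}{2} f_{xx}\,dS(t)\,dS(t) \\
&= 0 + \frac{1}{2\sqrt{S(t)}}\,dS(t) - \frac{1}{8\,(S(t))^{3/2}}\,dS(t)\,dS(t) \\
&= \frac{1}{2\sqrt{S(t)}}\,[a S(t)\,dt + b S(t)\,dW(t)] - \frac{1}{8\,(S(t))^{3/2}}\,b^2 (S(t))^2\,dt \\
&= \Bigl( \frac{a}{2} - \frac{b^2}{8} \Bigr) \sqrt{S(t)}\,dt + \frac{b}{2} \sqrt{S(t)}\,dW(t) .
\end{aligned} \]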
where X(0) is non random, ∆(u) and Θ(u) are adapted processes.
A stochastic differential equation form of the Ito process {X(t), t ≥ 0} is
\[ dX(t) = \Delta(t)\,dW(t) + \Theta(t)\,dt . \]
All stochastic processes except those that have jumps are actually Ito processes.
Equivalently
\[ d f (t, X(t)) = f_t(t, X(t))\,dt + f_x(t, X(t))\,dX(t) + \frac{1}{2} f_{xx}(t, X(t))\,dX(t)\,dX(t) . \]
We may further rewrite the above expression as
\[ \begin{aligned}
f (t, X(t)) ={}& f (0, X(0)) + \int_0^t f_t(s, X(s))\,ds + \int_0^t f_x(s, X(s))\,\Delta(s)\,dW(s) \\
&+ \int_0^t f_x(s, X(s))\,\Theta(s)\,ds + \frac{1}{2} \int_0^t f_{xx}(s, X(s))\,\Delta^2(s)\,ds .
\end{aligned} \]
This is because $dX(t) = \Delta(t)\,dW(t) + \Theta(t)\,dt$ and $dX(t)\,dX(t) = \Delta^2(t)\,dt$.
Example 9.7.1. Find the stochastic differential of $W^3(t)$ and show that $W^3(t)$ is
an Ito process.
Now using the equivalent integral equation with the condition W(0) = 0, we have
\[ W^3(t) = 3 \int_0^t W(s)\,ds + 3 \int_0^t W^2(s)\,dW(s) . \]
Since 3W(s) is an adapted process, the integral $\int_0^t 3W(s)\,ds$ exists and is finite. Also,
$3W^2(s)$ is mean square integrable, i.e. $E \int_0^t (3W^2(s))^2\,ds < \infty$. By using
Definition 9.7.1, $W^3(t)$ is an Ito process.
Example 9.7.3 Show that tW(t) is an Ito process and find the stochastic differential
of tW(t).
Solution We take f (t, x) = tx and then use the second version of the Ito-Doeblin
formula to get
\[ tW(t) = \int_0^t W(s)\,ds + \int_0^t s\,dW(s) , \]
i.e.
\[ \int_0^t s\,dW(s) = t\,W(t) - \int_0^t W(s)\,ds . \]
Since W(t) is an adapted process and f (t) = t is mean square integrable, tW(t) is
an Ito process for any t > 0. The stochastic differential of tW(t) is
\[ d(tW(t)) = W(t)\,dt + t\,dW(t) . \]
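The integration by parts relation of Example 9.7.3 has an exact discrete counterpart (summation by parts), which the sketch below checks on one simulated path. The grid size, seed and function name are assumptions for illustration; the Riemann sum for the time integral uses right end points so that the discrete identity holds exactly up to rounding.

```python
import math
import random


def check_integration_by_parts(t=1.0, n=1000, seed=3):
    rng = random.Random(seed)
    dt = t / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Ito sum for the integral of s dW(s), integrand at the left end point
    ito_sum = sum((i * dt) * (w[i + 1] - w[i]) for i in range(n))
    # Riemann sum for the integral of W(s) ds, using right end points
    riemann_sum = sum(w[i + 1] * dt for i in range(n))
    return ito_sum, t * w[-1] - riemann_sum


lhs, rhs = check_integration_by_parts()
print(lhs, rhs)
```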
Recall that we proved earlier that a Brownian motion {W(t), t ≥ 0} with W(0) = 0
is a martingale and has continuous sample paths with [W, W](t) = t. A natural
converse question now arises: if the above stated properties hold for a
process, will the process be a Brownian motion? The following theorem
provides an answer to this question.
Theorem 9.7.2 (Levy’s Theorem) Let {M(t), t ≥ 0} be a martingale with re-
spect to a filtration {Ft , t ≥ 0}. Suppose M(0) = 0 and M(t) has continuous paths.
Further let [M, M](t) = t for all t ≥ 0. Then {M(t), t ≥ 0} is a Brownian motion
with associated filtration {Ft , t ≥ 0}.
Proof. We know that Brownian motion is a martingale and increments W(t)−W(s)
for s < t are normally distributed with mean 0 and variance t − s. Here we are
already given that M(t) is a martingale. Therefore if we prove that M(t) is normally
distributed with mean 0 and variance t, then it is a Brownian motion. For this
we apply Ito-Doeblin formula to M(t). Therefore for any function f (t, x) whose
partial derivatives exist and are continuous, we have
\[ d f (t, M(t)) = f_t(t, M(t))\,dt + f_x(t, M(t))\,dM(t) + \frac{1}{2} f_{xx}(t, M(t))\,dt . \tag{9.27} \]
Here dM(t) dM(t) = dt since [M, M](t) = t. Equation (9.27) in integral form is
\[ f (t, M(t)) = f (0, M(0)) + \int_0^t \Bigl[ f_t(s, M(s)) + \frac{1}{2} f_{xx}(s, M(s)) \Bigr] ds + \int_0^t f_x(s, M(s))\,dM(s) . \tag{9.28} \]
Since M(t) is a martingale, $I(t) = \int_0^t f_x(s, M(s))\,dM(s)$ is also a martingale. Also
I(0) = 0 = E(I(t)). Now taking expectation on both sides in (9.28) we get
\[ E f (t, M(t)) = f (0, M(0)) + E \int_0^t \Bigl[ f_t + \frac{1}{2} f_{xx} \Bigr] ds + 0 . \tag{9.29} \]
We are interested in finding the strong solution of S(t), if it exists. For this we
first verify that conditions of Theorem 9.7.1 are satisfied. But this is true because
µ and σ are constants. Now we assume that S(t) = f (t, W(t)) and make use of the
second version of Ito-Doeblin formula. This gives
\[ d f (t) = f_t\,dt + f_x\,dW(t) + \frac{1}{2} f_{xx}\,dt , \]
\[ f_x = \sigma f \tag{9.30} \]
\[ f_t + \frac{1}{2} f_{xx} = \mu f . \tag{9.31} \]
Now solving equation (9.30), we get $f (t, x) = e^{\sigma x} k(t)$, for some function k(t). From
here we get $f_t = k'(t)\,e^{\sigma x}$ and $f_{xx} = \sigma^2 e^{\sigma x} k(t)$, which on substituting in equation
(9.31) gives
\[ k'(t)\,e^{\sigma x} = \Bigl( \mu - \frac{\sigma^2}{2} \Bigr) e^{\sigma x} k(t) . \]
Solving the above equation, we get
\[ k(t) = S(0)\,e^{\left( \mu - \frac{\sigma^2}{2} \right) t} . \]
Here we may observe that for fixed t, S(t) follows lognormal distribution. Hence,
it can be verified that
(i) $E(S(t)) = E(S(0))\,e^{\mu t}$.
(ii) $E((S(t))^2) = E(S^2(0))\,e^{(2\mu + \sigma^2) t}$.
Further, we observe
(i) If $\mu > \frac{\sigma^2}{2}$ then S(t) → ∞ as t → ∞ almost surely.
(ii) If $\mu < \frac{\sigma^2}{2}$ then S(t) → 0 as t → ∞ almost surely.
(iii) If $\mu = \frac{\sigma^2}{2}$ then S(t) will fluctuate between arbitrarily large and arbitrarily small
values as t → ∞.
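The two moment formulas can be checked by simulation. The sketch below assumes the closed form $S(t) = S(0)\,e^{(\mu - \sigma^2/2)t + \sigma W(t)}$, obtained by combining $f (t, x) = e^{\sigma x} k(t)$ with the expression for k(t) above, a non-random S(0) = 1, and illustrative parameter values µ = 0.05, σ = 0.2, t = 1; the function name is hypothetical.

```python
import math
import random


def gbm_terminal_samples(s0, mu, sigma, t, n_samples, seed=7):
    """Draw samples of S(t) from the closed-form solution of the SDE."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        w_t = rng.gauss(0.0, math.sqrt(t))   # W(t) ~ N(0, t)
        out.append(s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * w_t))
    return out


samples = gbm_terminal_samples(1.0, 0.05, 0.2, 1.0, 50_000)
mean_est = sum(samples) / len(samples)
second_moment_est = sum(s * s for s in samples) / len(samples)
print(mean_est, math.exp(0.05))                       # compare with e^{mu t}
print(second_moment_est, math.exp(2 * 0.05 + 0.2**2))  # compare with e^{(2 mu + sigma^2) t}
```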
2. Ornstein-Uhlenbeck Process
where −∞ < µ < ∞ and σ > 0 are constants. This equation is often referred to
as Langevin equation. This equation is useful to model the velocity at time t of a
free particle that performs a Brownian motion. If we set dt = 1 then
i.e.
• This chapter gives a brief description of certain basic concepts and important
results of stochastic calculus keeping finance in view. By ignoring the various
technical conditions that are required to make our definitions rigorous, these
concepts are discussed.
• This chapter starts with variation of real-valued functions and then variation
of Brownian motion in Sections 9.2 and 9.3 respectively.
• In Section 9.5, stochastic integral is introduced and its properties are discussed
with few examples. For results concerning the existence of the general Ito
stochastic integral we may refer to Mikosch [97].
• Kiyosi Ito developed the Ito formula in 1951, and that is how these calculus rules
were referred to in earlier texts and research papers. The same result had been
studied independently by Wolfgang Doeblin before him, although Doeblin's work remained
secret, hidden away in the safe of the French Academy of Science. In May
2000, the sealed envelope sent in February 1940 by Doeblin was finally opened.
In recognition of Doeblin's work, Ito's formula is now referred to as the Ito-Doeblin
formula. Interested readers may refer to Vassiliou [144] for these historical
details.
• Two versions of the Ito-Doeblin formula are presented and illustrated with some
examples in Section 9.6. The proofs of various versions of the Ito formula can
be found in standard textbooks on stochastic calculus, for example, Karatzas
and Shreve [75] or Shreve [122].
• The counterpart of the stochastic integral, the stochastic differential equation, is
presented in Section 9.7. SDEs which admit an explicit solution are the exception
rather than the rule. Therefore numerical techniques are used to approximate the
solution of an SDE. Such an approximation is called a numerical
solution. Numerical solutions allow us to simulate sample paths, which is
the basis for Monte Carlo techniques. In this textbook, we have discussed only
exact and closed form solutions of stochastic differential equations; Monte
Carlo simulation will be discussed in a later chapter.
• Some important SDEs which occur frequently in finance are presented
with their solutions in Section 9.8.
• We may want to allow for the possibility that a stock price can experience
sudden jumps. It may be useful to have models that incorporate jump processes
and can be studied using Ito's rule for jump processes. The topic of jump processes
is not presented in this textbook. Option pricing with jumps is studied in
Merton [93].
• Stochastic calculus for processes with jumps is very important in financial appli-
cations. But as this chapter is only introductory in nature, we have avoided any
9.10 Exercises
Exercise 9.1 Let f : [−1, 1] → R be given by
\[ f (x) = \begin{cases} x \sin \frac{1}{x} , & x \ne 0 \\ 0, & x = 0 . \end{cases} \]
Is f continuous? Is f of bounded variation? What will be your answer if, for x ≠ 0,
f (x) is taken as $x^2 \sin \frac{1}{x}$?
Exercise 9.2 Let Y1 , Y2 , . . . be independent random variables each taking two values
+1 and −1 with equal probabilities. Define X(0) = 0 and $X(n) = \sum_{j=1}^{n} Y_j$, (n =
1, 2, . . .). This stochastic process {X(n), n = 0, 1, 2 . . .} is a symmetric random walk.
Show that the quadratic variation of X(n) up to k is k, i.e. [X(n), X(n)](k) = k.
Exercise 9.3 For a Poisson process {X(t), t ≥ 0} with rate 1, find
(i) $E\bigl( \int_0^t X(s)\,dW(s) \bigr)$
(ii) $\mathrm{Var}\bigl( \int_0^t X(s)\,dW(s) \bigr)$.
Exercise 9.4 Find the stochastic differentials of sin(W(t)) and cos(W(t)).
where X(0), µ and σ > 0 are constants. Find the strong solution of the above SDE.
Also, find the distribution of X(t).
Exercise 9.9 Prove that
\[ I(t) = \int_0^t X(s)\,dW(s) \]
is a martingale.
Exercise 9.10 Prove that
\[ W(T) = \int_0^T dW(t) \]
is an Ito process.
Exercise 9.11 Consider the SDE of the form dX(t) = X(t) dW(t) with X(0) = 1.
Prove that its solution $X(t) = e^{W(t) - \frac{t}{2}}$ is an Ito process.
Exercise 9.12 Find the stochastic differential of $W^2(t)$ and show that $W^2(t)$ is an
Ito process.
Exercise 9.13 Using the first version of the Ito-Doeblin formula, evaluate
$\int_0^T W^2(t)\,dW(t)$.
Exercise 9.14 Are the random variables $\int_0^T t\,dW(t)$ and $\int_0^T W(t)\,dt$ independent?
Also, find the mean and variance of these random variables.
Exercise 9.15 An option is called digital option if the pay-off is 1 for S(T) > S(0)
at the time of exercise T, and zero otherwise. Find the arbitrage free price of a
digital option (European) with strike price K = S(0). You may assume that the
stock price follows the SDE.
dS(t) = r S(t) dt + σ S(t) dW(t) ,
where r is the interest rate and W(t) is the Brownian motion under risk neutral
probability measure.
Exercise 9.16 Consider the SDE
dX(t) = c(t) X(t) dt + σ(t) X(t) dW(t), t ∈ [0, T] .
Using the second version of the Ito-Doeblin formula, prove that the solution is
\[ X(t) = X(0) \exp \Bigl\{ \int_0^t \Bigl[ c(s) - \frac{1}{2} \sigma^2(s) \Bigr] ds + \int_0^t \sigma(s)\,dW(s) \Bigr\} , \quad t \in [0, T] . \]
Prove that it has a weak solution, but does not have a strong solution. Further,
prove that, when x0 = 0, the weak solution of X(t) is implicitly given by
\[ W(t) = \int_0^t \mathrm{sgn}(X(s))\,dX(s) . \]
with initial condition X(0) = c. Obtain the strong solution of X(t). Prove that
\[ X(t) = c\,e^{t} + e^{t} \int_0^t e^{-s}\,dW(s) . \]
Exercise 9.22 Let Q(t) = ln S(t) and $dQ(t) = \bigl( \mu - \frac{1}{2} \sigma^2 \bigr) dt + \sigma\,dW(t)$. Find dS(t).
10
Black-Scholes Formula Revisited
10.1 Introduction
It is seldom a surprise to find in literature that the approach taken by mathe-
maticians to study concepts of finance is quite different from the one adopted
by financial economists. The two groups speak different languages and take their
own routes to develop theories for financial mathematics, but arrive at the same
conclusions. Nothing can illustrate our point better than the theory of derivative
pricing. We have already seen that one can resort to any of the several methods for
derivative pricing. For instance, for European options pricing, no matter whether
we follow the binomial lattice approach or the CRR approach, we arrive at the
same Black-Scholes (BS) formula. The two approaches have been fairly simple to
understand, and hence favorite with many.
In this chapter we aim to introduce the reader to the fascinating though complex
world of change of probability measure (one may recall how well the method
of substitution works in the classical Riemann integral calculus). We shall
also see how this new concept in stochastic integration helps us to derive the BS
formula for option pricing. But wait! Why should we derive the BS formula again,
and that too with an altogether new theory? We would like to emphasize here
that our aim is not only to derive the BS formula but, more importantly, to become
familiar with a more formal mathematical approach to the pricing of financial
instruments. On first reading the initial part of this chapter, one may feel like asking
what is really going on! We agree that the discussion is more mathematical
than financial, but it is sometimes useful, even for financial
economists, to recognize and appreciate what the mathematicians are saying.
Our primary aim is to present two important theorems from stochastic calculus,
namely Girsanov's theorem and the Feynman-Kac theorem. It is worth
mentioning that both theorems provide a glimpse of the confluence of mathematical
theory with financial theory. The theorems initially appear to be somewhat complex
rules, but they enable us to perform a change of probability measure in stochastic
processes. The same will subsequently be applied to obtain the BS formula. The
two theorems provide a rigorous analytical framework to build not only the BS
formula but many other financial theories. They also open the gates to more
advanced theories in probability measure theory and partial differential equations
from the perspective of financial concepts. We urge the readers to go through the
entire chapter to appreciate the presented approach.
We now define a non-negative random variable Z on Ω as $Z(\omega_i) = \frac{q_i}{p_i}$, i = 1, 2, 3.
Then Z can be written as
\[ Z(\omega) = \sum_{i=1}^{3} \frac{q_i}{p_i}\,1_{A_i}(\omega), \]
where $1_{A_i}$ is the indicator function of the set $A_i$ (which takes value 1 only when
$\omega = \omega_i$, else value 0). With this Z, note that we can rewrite (10.1) and (10.2) as
follows
where EP denotes the expectation operator with respect to measure P, and EP (Z/
C) denotes the conditional expectation of Z given C ∈ F with respect to P.
Since EP (Z) = EP (Z/Ω) = 1, we have, in general,
Finally, we have $E_Q(Y) = E_P(ZY)$, for any random variable Y on Ω. The random
variable Z can be thought of as a constructed random variable such that
$Z(\omega) = \frac{Q(\omega)}{P(\omega)}$, ω ∈ Ω. Note that we have specifically assumed P(ω) > 0 in the above
discussion to avoid division by zero. However, it is not hard to overcome this
point as we can also write Z(ω)P(ω) = Q(ω). Note that if Z(ω) > 1 (or Z(ω) < 1)
for some ω ∈ Ω, then Q(ω) > P(ω) (or Q(ω) < P(ω)), that is, the probability
P(ω) is revised upwards (or downwards). Also, for Q to be a probability measure,
we must have $\sum_{\omega \in \Omega} Z(\omega) P(\omega) = 1$. Thus the random variable Z is acting as a
revision factor such that $E_P(Z) = 1$.
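A small numerical instance makes the revision factor concrete. The probabilities and the random variable Y below are made-up values on a three-point sample space (assumptions for illustration only); exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction as F

# Illustrative (assumed) measures P and Q on the sample space {w1, w2, w3}
p = {"w1": F(1, 2), "w2": F(1, 3), "w3": F(1, 6)}
q = {"w1": F(1, 4), "w2": F(1, 4), "w3": F(1, 2)}

# Revision factor Z(w) = Q(w) / P(w), well defined because every P(w) > 0
z = {w: q[w] / p[w] for w in p}

# An arbitrary random variable Y on the sample space
y = {"w1": 10, "w2": 20, "w3": 30}

e_p_z = sum(z[w] * p[w] for w in p)           # E_P(Z)
e_q_y = sum(y[w] * q[w] for w in p)           # E_Q(Y)
e_p_zy = sum(z[w] * y[w] * p[w] for w in p)   # E_P(ZY)
print(e_p_z, e_q_y, e_p_zy)
```

Here Z exceeds 1 only at w3, the one outcome whose probability is revised upwards in passing from P to Q.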
We now have a set-up to build on and extend the above idea to an arbitrary
set Ω, which may be uncountably infinite. But before that, we define equivalent
measures.
for all $\mathcal{F}$-measurable functions X such that the integrals exist. In this case,
(Ω, F , Q) is also a measure space, but in general Q(A) ≠ P(A), A ∈ F . Note
that if P(A) = 0, then $1_A(\omega) Z(\omega) = 0$ almost surely with respect to P, implying
Q(A) = 0. The converse is not true. For instance, if Z(ω) = 0 for ω ∈ A where P(A) > 0,
then Q(A) = 0 despite P(A) ≠ 0. But if Z(ω) > 0, ∀ ω ∈ Ω, then it works both
ways.
Definition 10.2.1 (Equivalent measure) Let (Ω, F ) be a measurable space,
that is, F is a σ-field on Ω. Consider two probability measures P and Q on F . We
say that Q is absolutely continuous with respect to P (denoted by Q ≪ P) if
P(A) = 0 ⇒ Q(A) = 0, ∀ A ∈ F .
(ii) Z is P-unique almost surely means that if there is another version of Z, say
Z∗ , then we have P({ω ∈ Ω : Z(ω) = Z∗ (ω)}) = 1. That is the set where Z and
Z∗ do not agree has a measure zero with respect to P.
(iii) The Radon-Nikodym theorem states the existence of Z without providing an explicit
expression for Z. In a practical situation, Z is estimated from a series of
observations or a simulation of sufficiently large sample size.
(iv) The expectations of any random variable $f : \Omega \to \mathbb{R}^n$ under the two
probability measures are related by
\[ \int_{\omega \in A} f (\omega)\,dQ(\omega) = \int_{\omega \in A} f (\omega) Z(\omega)\,dP(\omega), \quad A \in \mathcal{F} . \]
\[ Z = \frac{f_Y(g(X))\,g'(X)}{f_X(X)} . \]
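\[ \begin{aligned}
Z(x) f_X(x) &= \frac{e^{kx}}{M_X(k)}\,\lambda e^{-\lambda x} \\
&= \frac{\lambda - k}{\lambda}\,e^{kx}\,\lambda e^{-\lambda x} \\
&= (\lambda - k)\,e^{-(\lambda - k)x} .
\end{aligned} \]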
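The computation above can be verified pointwise. The sketch below assumes X is exponential with rate λ and k < λ, so that the moment generating function is $M_X(k) = \lambda / (\lambda - k)$, as used in the second line of the display; the parameter values λ = 3, k = 1 and the function name are illustrative.

```python
import math


def tilted_density(x, lam, k):
    """Z(x) f_X(x) for X exponential with rate lam and Z(x) = e^{kx} / M_X(k)."""
    mgf = lam / (lam - k)                   # M_X(k), valid for k < lam
    z = math.exp(k * x) / mgf               # Radon-Nikodym factor Z(x)
    return z * lam * math.exp(-lam * x)     # Z(x) * f_X(x)


# With lam = 3 and k = 1 the result should be the exponential density with rate 2
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, tilted_density(x, 3.0, 1.0), 2.0 * math.exp(-2.0 * x))
```

The change of measure therefore turns an exponential variable with rate λ into one with rate λ − k.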
We simply write it as
Q(A) = EP (EP (Z/F )1A ), A ∈ F.
The much awaited Girsanov theorem is stated as follows. The result was first
proved by R. H. Cameron and W. T. Martin in the 1940’s and by I. V. Girsanov
in 1960.
Theorem 10.3.1 Let (Ω, F , P) be a probability space. Let {W(t), 0 ≤ t ≤ T} be a
Brownian motion with associated filtration {Ft , 0 ≤ t ≤ T}. Let {θ(t), 0 ≤ t ≤ T} be
a measurable process adapted to the filtration {Ft , 0 ≤ t ≤ T}. Define
\[ Z(t) = \exp \Bigl( - \int_0^t \theta(u)\,dW(u) - \frac{1}{2} \int_0^t \theta^2(u)\,du \Bigr) , \quad t \in [0, T]. \]
Let Z = Z(T). A sufficient condition, known as Novikov's condition, for Z to
be a martingale is that
\[ E \Bigl( e^{\frac{1}{2} \int_0^T \theta^2(t)\,dt} \Bigr) < \infty . \]
\[ \begin{aligned}
E_Q(Y) &= E_P(YZ) \\
&= E_P(E_P(YZ / \mathcal{F}_t)) \\
&= E_P(Y\,E_P(Z / \mathcal{F}_t)) \\
&= E_P(Y Z(t)).
\end{aligned} \]
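\[ \begin{aligned}
d(\widetilde{W}(t) Z(t)) &= Z(t)\,d\widetilde{W}(t) + \widetilde{W}(t)\,dZ(t) + dZ(t)\,d\widetilde{W}(t) \\
&= Z(t)(dW(t) + \theta(t)\,dt) + \widetilde{W}(t)(-\theta(t) Z(t)\,dW(t)) \\
&\quad + (-\theta(t) Z(t)\,dW(t))(dW(t) + \theta(t)\,dt) \\
&= Z(t)\,dW(t) - \theta(t) \widetilde{W}(t) Z(t)\,dW(t) \\
&= (1 - \theta(t) \widetilde{W}(t)) Z(t)\,dW(t),
\end{aligned} \]
where in the second last relation we have used that dW(t)dW(t) = dt and dW(t)dt =
0. Therefore,
\[ \widetilde{W}(t) Z(t) = \widetilde{W}(0) Z(0) + \int_0^t Z(u)\,dW(u) - \int_0^t \theta(u) \widetilde{W}(u) Z(u)\,dW(u). \]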
The two Ito integrals on the right hand side of the above expression are
martingales, so the process $\widetilde{W}(t) Z(t)$ is a martingale under P. Therefore, for any
0 ≤ s ≤ t ≤ T, we have,
\[ E_P(\widetilde{W}(t) Z(t) / \mathcal{F}_s) = \widetilde{W}(s) Z(s). \]
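Now, for 0 ≤ s ≤ t ≤ T,
\[ \begin{aligned}
E_Q(\widetilde{W}(t) / \mathcal{F}_s) &= E_Q(1_A \widetilde{W}(t)), \quad A \in \mathcal{F}_s \\
&= E_P(1_A \widetilde{W}(t) Z(t)) \\
&= E_P(1_A\,E_P(\widetilde{W}(t) Z(t) / \mathcal{F}_s)) \\
&= E_P\Bigl( \Bigl( 1_A \frac{1}{Z(s)}\,E_P(\widetilde{W}(t) Z(t) / \mathcal{F}_s) \Bigr) Z(s) \Bigr) \\
&= E_Q\Bigl( 1_A \frac{1}{Z(s)}\,E_P(\widetilde{W}(t) Z(t) / \mathcal{F}_s) \Bigr) \\
&= \frac{1}{Z(s)}\,E_P(\widetilde{W}(t) Z(t) / \mathcal{F}_s) \\
&= \frac{1}{Z(s)}\,\widetilde{W}(s) Z(s) \\
&= \widetilde{W}(s).
\end{aligned} \]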
Hence, $\widetilde{W}(t)$ is a martingale under the probability measure Q.
Noting that $d\widetilde{W}(t)\,d\widetilde{W}(t) = dt$ and that $\int_0^t \theta(u)\,du$, being a Riemann
integral, is a continuous function of t (so that $\widetilde{W}(t)$ has continuous paths), it follows
from Levy's theorem (Theorem 9.7.2) that $\widetilde{W}(t)$ is a Brownian motion under Q.
The process $\{\widetilde{W}(t), 0 \le t \le T\}$ is adapted to the
filtration {Ft , t ≥ 0}.
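The conclusion of the theorem can be illustrated by simulation in the simplest case of a constant θ. In that case $Z(T) = e^{-\theta W(T) - \theta^2 T/2}$ and $\widetilde{W}(T) = W(T) + \theta T$, and expectations under Q are computed as $E_P(Z(T)\,\cdot\,)$. The sketch below is only a Monte Carlo check under assumed values θ = 0.5, T = 1; the function name and sample size are illustrative.

```python
import math
import random


def girsanov_check(theta=0.5, T=1.0, n_samples=100_000, seed=11):
    """Estimate E_P(Z), E_P(Z * W_tilde) and E_P(Z * W_tilde^2) for constant theta."""
    rng = random.Random(seed)
    sum_z = sum_zw = sum_zw2 = 0.0
    for _ in range(n_samples):
        w_T = rng.gauss(0.0, math.sqrt(T))                 # W(T) under P
        z = math.exp(-theta * w_T - 0.5 * theta**2 * T)    # Z(T)
        w_tilde = w_T + theta * T                          # W_tilde(T)
        sum_z += z
        sum_zw += z * w_tilde
        sum_zw2 += z * w_tilde**2
    return sum_z / n_samples, sum_zw / n_samples, sum_zw2 / n_samples


e_z, e_q_w, e_q_w2 = girsanov_check()
print(e_z, e_q_w, e_q_w2)  # expected to be close to 1, 0 and T respectively
```

Under P the shifted variable has mean θT, but after weighting by Z(T) its mean is close to 0 and its second moment is close to T, as a Brownian motion under Q requires.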
We also let β(t) to be the price of some risk-free asset which satisfies the fol-
lowing ordinary differential equation
Then,
dV(t) = a(t)dS(t) + b(t)dβ(t).
The discounted price of one share of stock is $\widetilde{S}(t) = e^{-rt} S(t)$, t ∈ [0, T]. Applying
Ito's Lemma given at (9.20) to it, we have,
\[ \begin{aligned}
d\widetilde{S}(t) &= -r e^{-rt} S(t)\,dt + e^{-rt}\,dS(t) \\
&= -r e^{-rt} S(t)\,dt + e^{-rt} S(t)(\mu\,dt + \sigma\,dW(t)) \\
&= \widetilde{S}(t)((\mu - r)\,dt + \sigma\,dW(t)) \\
&= \sigma \widetilde{S}(t)\,d\widetilde{W}(t) ,
\end{aligned} \tag{10.5} \]
where we denote $\widetilde{W}(t) = W(t) + \frac{\mu - r}{\sigma}\,t$, t ∈ [0, T]. A natural interpretation of
$\frac{\mu - r}{\sigma}$ is that the numerator is the expected return minus the risk-free rate, that is, the
risk premium, while the denominator is the risk. The ratio is the risk premium per
unit of risk and is called the market price of risk. In other words, the ratio reflects
the additional expected return necessary to induce investors to take risk. Recall
that in the study of general market equilibrium, such as the capital asset pricing
model (CAPM), the appropriate risk is the systematic risk beta. It is important
to note that in (10.5) the drift term is completely removed. When we remove the
drift, what we are doing is removing the risk premium and the risk-free rate.
From Girsanov's theorem (Theorem 10.3.1), there exists an equivalent measure
Q, $Q(A) = \int_A Z(T)\,dP$, A ∈ F , where Z is a Radon-Nikodym derivative of Q
with respect to P, which turns $\widetilde{W}(t)$ into a Brownian motion with respect to Q.
Note that $E_P(W(t)) = 0$, on account of W(t) being a Brownian motion under P;
hence $E_P(\widetilde{W}(t)) = \frac{\mu - r}{\sigma}\,t$. Thus $\widetilde{W}(t)$ is not a Brownian motion under P, but it is
a Brownian motion under Q. Furthermore, the solution of equation (10.5), given by
\[ \widetilde{S}(t) = \widetilde{S}(0)\,e^{-\frac{\sigma^2}{2} t + \sigma \widetilde{W}(t)} , \quad t \in [0, T], \]
is a martingale under Q. This measure Q is nothing else but the risk neutral
probability measure (RNPM) (the probability measure under which the discounted
price process becomes a martingale). We shall denote Q by $\widetilde{P}$, as a convention.
To summarize, we have to adjust the drift of the stock price process by changing
the probability measures such that we obtain a martingale.
Theorem 10.4.1 There exist two probability measures, namely the Wiener measure
$P$ and the risk neutral measure $\widetilde{P}$, such that the stock price $S(t)$ at
time $t$ has the following properties:
(i) Under the probability measure $P$,
$S(t) = S(0)\,e^{(\mu - \frac{\sigma^2}{2})t + \sigma W(t)}$.
(ii) Under the probability measure $\widetilde{P}$,
$S(t) = S(0)\,e^{(r - \frac{\sigma^2}{2})t + \sigma\widetilde{W}(t)}$, and
$\{\widetilde{W}(t), t \ge 0\}$ is a Wiener process.
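The statement of the theorem can be illustrated by a small simulation. The sketch below is only a sanity check under stated assumptions: the stock follows $dS(t) = \mu S(t)dt + \sigma S(t)dW(t)$, the parameter values are hypothetical, and the helper names `simulate_terminal_prices` and `discounted_mean` are ours. It samples $S(T)$ once with drift $r$ (the risk neutral measure) and once with drift $\mu$ (the original measure) and compares the discounted sample means.

```python
import math
import random


def simulate_terminal_prices(s0, drift_rate, sigma, horizon, n_paths, seed=0):
    """Sample S(T) = S(0) exp((drift_rate - sigma^2/2) T + sigma W(T)), W(T) ~ N(0, T)."""
    rng = random.Random(seed)
    drift = (drift_rate - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


def discounted_mean(prices, rate, horizon):
    """Sample mean of e^{-rT} S(T)."""
    return math.exp(-rate * horizon) * sum(prices) / len(prices)


# Hypothetical parameters, chosen only for illustration.
S0, MU, R, SIGMA, T = 100.0, 0.12, 0.05, 0.2, 1.0
N_PATHS = 200_000

under_risk_neutral = simulate_terminal_prices(S0, R, SIGMA, T, N_PATHS)
under_original = simulate_terminal_prices(S0, MU, SIGMA, T, N_PATHS)

mean_risk_neutral = discounted_mean(under_risk_neutral, R, T)
mean_original = discounted_mean(under_original, R, T)

print(mean_risk_neutral)  # close to S0: the discounted price is a martingale
print(mean_original)      # close to S0 * exp((MU - R) * T): a drift remains
```

Under the risk neutral drift the discounted mean stays near $S(0)$, whereas under the original drift it grows like $e^{(\mu - r)T}$, which is exactly the risk premium that the change of measure removes.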
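Consider the discounted portfolio value process
\[
\widetilde{V}(t) = e^{-rt}V(t) = e^{-rt}\big(a(t)S(t) + b(t)\beta(t)\big).
\]
Then,
\begin{align*}
d\widetilde{V}(t) &= -re^{-rt}V(t)\,dt + e^{-rt}\,dV(t) \\
&= -re^{-rt}\big(a(t)S(t) + b(t)\beta(t)\big)\,dt + e^{-rt}\big(a(t)\,dS(t) + b(t)\,d\beta(t)\big) \\
&= a(t)e^{-rt}\big(-rS(t)\,dt + dS(t)\big) + b(t)e^{-rt}\big(-r\beta(t)\,dt + d\beta(t)\big) \\
&= a(t)\,d\widetilde{S}(t),
\end{align*}
where the last relation follows on account of (10.4) and the definition of
$\widetilde{S}(t)$. Thus, the increment in the value of the discounted portfolio
process $\{\widetilde{V}(t), 0 \le t \le T\}$ is coming from the discounted stock price
process $\{\widetilde{S}(t), 0 \le t \le T\}$. Also, note that $\widetilde{V}(0) = V(0)$.
We thus have
\begin{align*}
\widetilde{V}(t) &= V(0) + \int_0^t a(u)\,d\widetilde{S}(u) \\
&= V(0) + \sigma\int_0^t a(u)\widetilde{S}(u)\,d\widetilde{W}(u).
\end{align*}
Since $\widetilde{W}(t)$ is a Brownian motion under $\widetilde{P}$ and
$a(t)\widetilde{S}(t)$ is a process adapted to $\mathcal{F}_t$, $t \in [0, T]$, the
process $\widetilde{V}(t)$ constitutes a martingale. Therefore,
\[
\widetilde{V}(t) = E_{\widetilde{P}}\big(\widetilde{V}(T) \mid \mathcal{F}_t\big), \quad t \in [0, T],
\]
implying
\[
V(t) = e^{rt}E_{\widetilde{P}}\big(e^{-rT}V(T) \mid \mathcal{F}_t\big), \quad t \in [0, T]. \tag{10.6}
\]
It is easy to note that $V(0) = E_{\widetilde{P}}\big(e^{-rT}V(T)\big)$, because
$\mathcal{F}_0 = \{\emptyset, \Omega\}$.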
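\[
X(T) = \frac{1}{D(T)}\,1_A, \quad A \in \mathcal{F}_T = \mathcal{F},
\]
\[
E_{\widetilde{P}_1}\big(\widetilde{V}(T)\big) = V(0),
\]
\[
E_{\widetilde{P}_2}\big(\widetilde{V}(T)\big) = V(0).
\]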
The above is called the risk neutral pricing formula. The applicability of the
risk neutral pricing formula is much wider than only the BS formula, as many of the
assumptions made in, say, the CRR model have been dropped here and we are working
in a more general scenario for any derivative security which can be hedged.
Furthermore, the price of the derivative security at $t = 0$ is
\[
X(0) = E_{\widetilde{P}}\big(e^{-rT}X(T)\big).
\]
We urge you to take a pause and carefully examine the above formula. Is it not a
somewhat familiar expression that we have come across earlier too, for instance,
in Chapter 3, except that there we have used the notation $V(0)$ for what is $X(0)$
here and $V_{RP}(t)$ for what is $V(t)$ in the above discussion? The readers must
appreciate that all the hard mathematical concepts finally lead to a path we are so
familiar with, giving us hope that we are close to deriving the BS formula. The
discussion to follow is the ultimate in this context.
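Consider a European call option on the stock. The payoff to the holder of the
call on maturity is $X(T) = C(T) = (S(T) - K)^+$, where $K$ is the strike price of
the call. Then
\begin{align*}
C(t) &= e^{rt}E_{\widetilde{P}}\big(e^{-rT}C(T) \mid \mathcal{F}_t\big), \quad t \in [0, T] \\
&= E_{\widetilde{P}}\big(e^{-r(T-t)}(S(T) - K)^+ \mid \mathcal{F}_t\big). \tag{10.9}
\end{align*}
Since $S(t)$ is a random process following the generalized Brownian motion under
$\widetilde{P}$, we have
\[
S(t) = S(0)\,e^{\sigma\widetilde{W}(t) + (r - \frac{\sigma^2}{2})t}, \quad t \in [0, T].
\]
In other words,
\[
S(T) = S(t)\,e^{\sigma(\widetilde{W}(T) - \widetilde{W}(t)) + (r - \frac{\sigma^2}{2})(T - t)}, \quad t \in [0, T].
\]
Let $\varsigma = T - t$. Then,
\[
C(t) = E_{\widetilde{P}}\Big(e^{-r\varsigma}\big(S(t)\,e^{\sigma(\widetilde{W}(T) - \widetilde{W}(t)) + (r - \frac{\sigma^2}{2})\varsigma} - K\big)^+ \,\Big|\, \mathcal{F}_t\Big).
\]
Now, $\widetilde{W}(T) - \widetilde{W}(t)$ is the increment in the Brownian motion over
the remaining time $\varsigma$, so it is independent of $\mathcal{F}_t$. Moreover, $S(t)$
is $\mathcal{F}_t$-measurable. Thus, the conditional expectation in the above formula
reduces to an ordinary expectation in which $S(t)$ is treated as a known constant.
Furthermore, $\widetilde{W}(T) - \widetilde{W}(t) \sim N(0, \varsigma)$. Setting
\[
-Y = \frac{\widetilde{W}(T) - \widetilde{W}(t)}{\sqrt{T - t}} \sim N(0, 1),
\]
we get,
\begin{align*}
C(t) &= E_{\widetilde{P}}\Big(e^{-r\varsigma}\big(S(t)\,e^{\sigma\sqrt{\varsigma}(-Y) + (r - \frac{\sigma^2}{2})\varsigma} - K\big)^+\Big) \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-r\varsigma}\big(S(t)\,e^{\sigma\sqrt{\varsigma}(-y) + (r - \frac{\sigma^2}{2})\varsigma} - K\big)^+ e^{-\frac{1}{2}y^2}\,dy.
\end{align*}
From now onwards, mimic the argument used in the proof presented in Section
4.3, Chapter 4 to get the Black-Scholes formula for a European option. We skip
redoing it here.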
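Although we skip the algebra, the last integral can be checked numerically against the Black-Scholes formula. The following is a minimal sketch, assuming hypothetical parameter values and a plain trapezoidal rule on a truncated range; the function names `black_scholes_call` and `call_by_integration` are ours, and the closed form used is the standard Black-Scholes call price.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, r, sigma, tau):
    """Standard Black-Scholes price of a European call with time to maturity tau."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * normal_cdf(d1) - k * math.exp(-r * tau) * normal_cdf(d2)


def call_by_integration(s, k, r, sigma, tau, y_max=10.0, steps=20_000):
    """Trapezoidal approximation of the risk neutral expectation over y in [-y_max, y_max]."""

    def integrand(y):
        terminal = s * math.exp(-sigma * math.sqrt(tau) * y + (r - 0.5 * sigma ** 2) * tau)
        payoff = max(terminal - k, 0.0)
        density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        return math.exp(-r * tau) * payoff * density

    h = 2.0 * y_max / steps
    total = 0.5 * (integrand(-y_max) + integrand(y_max))
    for i in range(1, steps):
        total += integrand(-y_max + i * h)
    return total * h


# Hypothetical data: S(t) = 100, K = 100, r = 5%, sigma = 20%, one year to maturity.
closed_form = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
by_integral = call_by_integration(100.0, 100.0, 0.05, 0.2, 1.0)
print(closed_form, by_integral)
```

The two numbers agree to several decimal places, which is what the analytical argument of Section 4.3 establishes exactly.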
The Feynman-Kac theorem can be used in both directions. That is, if we know
that X(t) follows the SDE (10.10) and we are given the PDE with boundary
condition, then we can always obtain the solution g(t, x) as described in (10.12).
On the other hand if we know that the solution g(t, x) is given by (10.12) and that
X(t) follows the process in (10.10), then we are assured that g(t, x) satisfies the
PDE (10.11).
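The generator of the process in (10.10) is defined as the operator
\[
\mathcal{A} = \mu(t, x(t))\frac{\partial}{\partial x} + \frac{1}{2}\sigma^2(t, x(t))\frac{\partial^2}{\partial x^2}.
\]
The PDE (10.11) can then be rewritten as follows
\[
\frac{\partial g}{\partial t} + \mathcal{A}g - rg = 0.
\]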
Let us now examine how we can apply the above theorem to derive the BS
formula for a derivative security.
Let the stock price S(t) be driven by the process
price K. Note that (10.16) is same as (10.8) except that in (10.8) the notation has
been X(t) for denoting derivative price on stock whose price is S(t). Now onwards,
we can follow the discussion of Section 10.5 after relation (10.8) to get the BS
formula for price of European call option. We leave it for readers to recognize that
all approaches for derivative pricing finally lead to a similar result.
It is also a good opportunity to have a closer look at the Black-Scholes (BS)
PDE (10.16). We would like to mention here that PDEs like (10.16) have motivated
a community of numerical analysts to play a serious role in financial
mathematics. Though this PDE has a closed form solution, which is nothing but the
BS formula, there are many other PDEs which naturally arise in financial theory
and do not admit closed form solutions; these require efficient numerical methods
to approximate their solutions.
We sketch below the method by which the closed form solution of (10.16) can
be obtained.
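Introduce the variables $\tau = \nu(T - t)$ and $\xi = \alpha\big(\ln\frac{x}{K} + \beta(T - t)\big)$,
where the constants $\alpha$, $\beta$, and $\nu$ will be appropriately chosen later. Define
\[
C(t, x) = e^{-r(T - t)}\,y(\tau, \xi).
\]
Then,
\begin{align*}
C_t = \frac{\partial C}{\partial t} &= re^{-r(T - t)}y + e^{-r(T - t)}y_t \\
&= re^{-r(T - t)}y + e^{-r(T - t)}(-\alpha\beta y_\xi - \nu y_\tau), \\
xC_x = x\frac{\partial C}{\partial x} &= xe^{-r(T - t)}y_x = e^{-r(T - t)}\alpha y_\xi, \\
C_{xx} = \frac{\partial C_x}{\partial x} &= \frac{\partial}{\partial x}\Big(e^{-r(T - t)}\frac{\alpha}{x}y_\xi\Big).
\end{align*}
The latter on some simplification yields
\[
x^2 C_{xx} = e^{-r(T - t)}\big(\alpha^2 y_{\xi\xi} - \alpha y_\xi\big).
\]
We leave the above for the readers to verify. Since $C$ satisfies the BS PDE,
\[
C_t + rxC_x + \frac{1}{2}\sigma^2 x^2 C_{xx} - rC = 0,
\]
we must have,
\[
-\alpha\beta y_\xi - \nu y_\tau + r\alpha y_\xi + \frac{1}{2}\sigma^2\big(\alpha^2 y_{\xi\xi} - \alpha y_\xi\big) = 0. \tag{10.17}
\]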
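The claim that the call price satisfies the BS PDE can also be checked numerically. The sketch below is only an illustration under assumptions: it takes the standard Black-Scholes call formula as $C(t, x)$, uses hypothetical parameter values, and approximates the partial derivatives by central finite differences. The function names are ours.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price(t, x, strike, r, sigma, maturity):
    """Standard Black-Scholes call price C(t, x) for stock price x at time t."""
    tau = maturity - t
    d1 = (math.log(x / strike) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * normal_cdf(d1) - strike * math.exp(-r * tau) * normal_cdf(d2)


def bs_pde_residual(t, x, strike, r, sigma, maturity, dt=1e-4, dx=1e-2):
    """Finite-difference value of C_t + r x C_x + (1/2) sigma^2 x^2 C_xx - r C."""

    def c(tt, xx):
        return call_price(tt, xx, strike, r, sigma, maturity)

    c_t = (c(t + dt, x) - c(t - dt, x)) / (2.0 * dt)
    c_x = (c(t, x + dx) - c(t, x - dx)) / (2.0 * dx)
    c_xx = (c(t, x + dx) - 2.0 * c(t, x) + c(t, x - dx)) / dx ** 2
    return c_t + r * x * c_x + 0.5 * sigma ** 2 * x ** 2 * c_xx - r * c(t, x)


# Hypothetical data: t = 0.25, x = 100, K = 95, r = 5%, sigma = 20%, T = 1.
residual = bs_pde_residual(0.25, 100.0, 95.0, 0.05, 0.2, 1.0)
print(residual)  # a number very close to zero
```

The residual is zero up to discretization error, which is the property that the change of variables above exploits.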
• We have discounted the stock price and then worked with the discounted stock
price to change its probability measure to RNPM. We then evaluate the option
price by applying the probability distribution of the discounted stock price to
the option payoff on maturity. In this way the option price is its expected payoff
at expiration without discounting. Mathematicians prefer to convert the stock
price to a martingale, requiring that the discounting be done beforehand.
On the other hand, without discounting the stock price, we could have worked
directly with the SDE $dS(t) = rS(t)dt + \sigma S(t)d\widetilde{W}(t)$, instead of
$dS(t) = \mu S(t)dt + \sigma S(t)dW(t)$ (see (10.3)). We would then have first evaluated
the expected option payoff at expiration and done the discounting only
thereafter. Financial economists prefer to follow this path. That is because it
is in line with the idea that the price of any asset is its expected future value
discounted to the present at an appropriate rate.
Whichever approach one adopts, the point to appreciate is that the fundamental
process of taking the expectation would not have altered except for the simple
linear adjustment by $e^{-rt}$. The same is depicted in (10.9).
• The other important highlight of the chapter is the application of the
Feynman-Kac theorem in derivative pricing. Again, this theorem has many variants.
It has been proved for the multidimensional case as well as in a variety of
settings. We again urge the interested readers to explore the abstract world of
this theorem on the web.
• The chapter also emphasizes that, be it economists, financial engineers,
hard-core mathematicians, optimization researchers, numerical analysts, or real
practitioners, all have a significant role to play in developing the subject of
finance. The beautiful confluence of many streams of mathematics into one is
indeed remarkable. All this makes ‘finance’ awesome.
10.8 Exercises
1. Let $X$ be a random variable such that $X \sim N(0, 1)$ with respect to a
probability measure $P$. Suppose $Q$ is another probability measure such that
$E_Q(X) = E_P(XG)$, where $G = e^{-\gamma X - \frac{1}{2}\gamma^2}$. Prove that
$X \sim N(-\gamma, 1)$ with respect to $Q$.
(Note: Changes of probability measures can be used to shift the means of
random variables.)
2. (a)Let X ∼ N(µ, 1). Define another random variable Y = X + θ. What is the
new probability measure Q such that Y, under Q, has the same distribution
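\[
\frac{dQ}{dP}(X) = \frac{\sigma_1}{\sigma_2}\exp\left(\frac{(X - \mu_1)^2}{2\sigma_1^2} - \frac{(X - \mu_2)^2}{2\sigma_2^2}\right).
\]
\[
\frac{dQ}{dP} = \exp\big((1 - \lambda)T + N(T)\ln(\lambda)\big).
\]
Prove that $\{N(t), 0 \le t \le T\}$ is a Poisson process under $Q$ with rate $\lambda$.
Also see Klebaner [78].)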
5. Let $\{W(t), t > 0\}$ be a Brownian motion on the probability measure space
$(\Omega, \mathcal{F}, P)$. Find probability measure(s) $Q$ on $(\Omega, \mathcal{F})$ that
are mutually absolutely continuous with respect to $P$, and under which the
following $Y(t)$ becomes a martingale:
(a) $dY(t) = 2dt - 3dW(t)$, (b) $dY(t) = 4dt + dW(t)$, (c) $dY(t) = -2dW(t)$.
6. Let $\{W(t), t > 0\}$ be a Brownian motion with respect to a probability measure
$P$ and associated filtration $\{\mathcal{F}(t), t > 0\}$. If
$d\widetilde{W}(t) = \theta dt + dW(t)$, then prove that there exists a probability
measure $Q$ such that $\widetilde{W}(t)$ is a Brownian motion with respect to $Q$.
(Hint: Take $Z(t) = e^{-\theta W(t) - \frac{1}{2}\theta^2 t}$ in the Girsanov theorem.)
(Note: The above result is a particular case of the Girsanov theorem when the
drift term is a constant $\theta$. However, we encourage the readers to rework the
proof independently.)
7. Let $Q$ be a risk-neutral probability measure for the investor investing in rupees.
Suppose the dollar/rupee exchange rate $Y(t)$ obeys the stochastic differential
equation
\[
dY(t) = \mu(t)Y(t)dt + \sigma(t)Y(t)dW(t).
\]
If the risk-less rates of return for dollar investors and rupee investors are $r_d(t)$
and $r_{rp}(t)$, then describe the exchange rate $Y(t)$ under $Q$. (Hint: Note the drift
is $r_d(t) - r_{rp}(t)$, then use the Girsanov theorem.)
8. Let $\{W(t) = (W_1(t), W_2(t)), t \le T\}$ be a 2-dimensional Brownian motion on
the probability measure space $(\Omega, \mathcal{F}, P)$. Find an equivalent probability
measure $Q$ on $(\Omega, \mathcal{F})$ under which the following $Y(t)$ becomes a martingale
\[
dY(t) = \begin{pmatrix} 2 \\ 4 \end{pmatrix} dt
+ \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} dW_1(t) \\ dW_2(t) \end{pmatrix}, \quad t \le T.
\]
(Hint: Explore the multidimensional version of Girsanov's theorem and use
it.)
11
Interest Rates and Interest Rate Derivatives
11.1 Introduction
Up till now we have assumed that the market interest rate is constant and
observable. Be it general concept building or a discussion on option pricing theory,
European or American, the market interest rate is taken to remain the same from
the time of purchase/investment till the maturity of the instrument. This has
greatly simplified the discussion on pricing and hedging of these instruments. In
practice, however, this cannot be the case when options and derivatives are written
either on interest rates or on securities whose values depend on interest rates
(e.g., bond options, swaps, caps, floors etc.). Any fluctuation in interest
rates can bring risk to investment in these instruments, and it is exactly what an
investor is looking to hedge. It thus makes no sense to assume a constant interest
rate. In fact, pricing and hedging interest rate derivatives require models that can
describe the evolution of the entire term structure of interest rates. This provides
a good reason to study and understand what interest rate theory is and how it is
built.
One thing that we have to understand is that besides equity derivatives (like
the ones we have studied in option pricing theory or portfolio optimization), there
are several other types of derivatives available in the market. Among them, the
interest rate derivatives are the popular ones. The interest rate derivatives market
constitutes the largest derivatives market in the world. An interest rate derivative
is a derivative where the underlying asset is the right to pay or receive a notional
amount of money at a given interest rate. Interest rate theory is fairly complex and
requires greater insight, so much so that until recently it was generally covered
only in the appendix of a few introductory textbooks on financial mathematics.
Except when either the text is aimed at advanced level learning/modelling or it
primarily focuses on interest rate theory, there are not many references available
at a very basic level. The difficulties are mainly due to the very nature of the
fixed income instruments, the implementation and calibration of various models, and
above all the fact that interest rate theory is not standardized yet. There is no
well-accepted standard general model that can capture the dynamics of interest
rates like the Black-Scholes model for equities.
This chapter is not intended to present the entire contents of the otherwise vast
theory on interest rates, but rather to briefly provide fragments of it so that
the readers can appreciate the significance of the concept as a whole. Of course,
we cite here a few good books and notes [21, 24, 87, 92, 127, 146, 149] on interest
rates that have been published in the last decade or so.
debt investment and because of a relative low risk of default they do not usually
come with a high interest rate and would largely be for investment rather than
speculation purpose.
Unless otherwise stated, by a bond we shall mean a coupon bond. Whenever
we wish to talk about a zero coupon bond we shall be stating it explicitly as ZCB.
We shall be making two assumptions in the sequel. One that the issuer of the
bond fulfills all commitments made at the issue date of bond, and second that the
ZCBs are traded for all maturities T in the market.
We shall be using the notation B(t, T) to denote the price of a T-maturity bond
at time t, 0 ≤ t ≤ T.
Let the redemption date of a ZCB be T, and assume that its face value is Rs
1, that is, B(T, T) = 1. Note that B(T, T) might be less than 1 if the issuer of the
T-bond defaults. If the market rate of interest (continuously compounded) $r$ is
constant, then
\[
B(t, T)\,e^{r(T - t)} = 1,
\]
equivalently,
\[
B(t, T) = e^{-r(T - t)}.
\]
If the face value of a ZCB is $F$ then $B(t, T) = Fe^{-r(T - t)}$.
Example 11.2.1 A ZCB is issued by the government on August 16, 2010 at an
interest rate 2.53% payable semiannually. The bond will mature on August 16,
2040, with a face value Rs 100. Find the price of the bond.
Solution This is a 30-year bond with F = Rs 100 and r = 0.0253/2 = 0.01265, T =
60 (because of semiannual payments). Hence,
\[
B(0, 60) = 100\,e^{-(60)(0.01265)} = \text{Rs } 46.81.
\]
Thus, one unit of the ZCB will be trading at Rs 46.81 on the issue date. So if an
investor purchases 500 units of this bond on August 16, 2010 then he has invested
Rs 23405 and will get Rs 50000 after 30 years.
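The computation in Example 11.2.1 can be reproduced with a few lines of code. The sketch below simply evaluates $B(t, T) = Fe^{-r(T - t)}$ with time measured in semiannual periods, as in the example; the function name `zcb_price` is ours.

```python
import math


def zcb_price(face, rate_per_period, maturity, t=0.0):
    """Price at time t of a ZCB paying `face` at `maturity`: F * exp(-r (T - t))."""
    return face * math.exp(-rate_per_period * (maturity - t))


# Data of Example 11.2.1: F = 100, r = 0.0253 / 2 per half-year, T = 60 half-years.
price_at_issue = zcb_price(100.0, 0.0253 / 2, 60)
print(round(price_at_issue, 2))  # 46.81

# The same function prices the bond at a later purchase date, e.g. t = 10 half-years.
price_after_five_years = zcb_price(100.0, 0.0253 / 2, 60, t=10)
print(price_after_five_years)
```

Passing a non-integer `t` handles purchase dates that fall between two semiannual dates.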
Example 11.2.2 Consider the same data as in Example 11.2.1. (a) Suppose an
investor wishes to purchase the bond on August 16, 2015. What price does he have
to pay for the bond? (b) If the same bond is purchased on December 1, 2015, find
the bond price.
Solution (a) The main difference is in the purchase date, which is 5 years after
the issue of the bond. Because of semiannual payment, t = 10, and thus
\[
B(10, 60) = 100\,e^{-(0.01265)(60 - 10)} = \text{Rs } 53.13.
\]
(b) There are 106 days between August 16, 2015 and December 1, 2015 (assuming
that the bond can be traded on all 365 days of the year). Therefore, t = 2(5 +
(106/365)) = 10.581, where the factor of 2 is because of semiannual payment and
5 is the number of years from the issue of the bond to August 16, 2015. So,
\[
B(10.581, 60) = 100\,e^{-(0.01265)(60 - 10.581)} = \text{Rs } 53.52.
\]
[Figure: time line with coupon payments Fc at times 1, 2, ..., n − 1 and Fc + F at time n]
Fig. 11.1. Cash flow of a coupon paying bond
Example 11.2.3 Assume that a company issues a 2-year bond with a coupon rate
of 10% payable semiannually and face value of Rs 100. Find the cash flow of the
bondholder.
where $\hat{c}_i = c_i$ $(i = 1, \ldots, n - 1)$, and $\hat{c}_n = c_n + F$. The left side
$B(0, T)$ means the price of the T-(coupon) bond and the right side $B(0, t_i)$ means
the price of the $t_i$-ZCB. This in turn amounts to saying that if we can develop
and analyze the price dynamics of ZCBs of all maturities T, then we can translate
the same to analyze the dynamics of (coupon) bonds.
Example 11.2.4 A bond is issued by the government on August 16, 2010 at an
interest rate 8.53% and coupon rate 7.5% payable semiannually. The bond will
mature on August 16, 2040, with a face value Rs 100. Find the price of the bond.
Solution The bond is a 30-year bond with c = 0.075/2 = 0.0375, r = 0.0853/2 =
0.04265, F = Rs 100, T = 60, n = 60. Hence, using the aforementioned expression,
the bond price on the date of issue is given by
\[
B(0, 60) = 100(0.0375)\,\frac{1 - e^{-(60)(0.04265)}}{e^{0.04265} - 1} + 100\,e^{-(60)(0.04265)} = \text{Rs } 87.14.
\]
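The closed form used in Example 11.2.4 is the sum of the discounted coupons written as a geometric series. A minimal sketch, with function names of our own choosing, evaluates the price both ways so that one can see the two expressions agree; with the data of the example it gives a value of roughly Rs 87.1.

```python
import math


def coupon_bond_price_sum(face, coupon_rate, rate, n_periods):
    """Price as an explicit sum of discounted coupons plus the discounted face value."""
    coupons = sum(face * coupon_rate * math.exp(-rate * i) for i in range(1, n_periods + 1))
    return coupons + face * math.exp(-rate * n_periods)


def coupon_bond_price_closed(face, coupon_rate, rate, n_periods):
    """Same price with the coupon sum replaced by its geometric-series closed form."""
    annuity = (1.0 - math.exp(-rate * n_periods)) / (math.exp(rate) - 1.0)
    return face * coupon_rate * annuity + face * math.exp(-rate * n_periods)


# Data of Example 11.2.4: F = 100, c = 0.0375, r = 0.04265 per half-year, n = 60.
price_sum = coupon_bond_price_sum(100.0, 0.0375, 0.04265, 60)
price_closed = coupon_bond_price_closed(100.0, 0.0375, 0.04265, 60)
print(price_sum, price_closed)
```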
Example 11.2.5 A Rs 100 face value 10-year bond with a coupon rate of 8%
payable semiannually is purchased by an investor at the price of Rs 98. Find the
annual market rate of interest on the bond.
Solution We need to find r given that T = 20 (because of semiannual data),
B(0, 20) = Rs 98, F = Rs 100, c = 0.04 (again because of semiannual coupon).
Thus,
\[
98 = (100)(0.04)\,\frac{1 - e^{-20r}}{e^{r} - 1} + 100\,e^{-20r}.
\]
Set $e^{r} = \xi$. Then $r = \ln\xi$, and $\xi$ is a solution of the following equation.
\[
49\xi^{21} - 51\xi^{20} - 50\xi + 52 = 0, \quad \xi \ne 1.
\]
A numerical technique can be applied to solve the above equation for $\xi$. We can
verify that $\xi$ is close to 1.0415 and so r is close to 4.066%. Thus, the interest rate
is 8.132% per annum convertible semiannually.
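One possible numerical technique is bisection, since the bond price is a decreasing function of r. The sketch below is an illustration only (the function names and the bracketing interval are our assumptions); it solves the pricing equation of Example 11.2.5 directly for r rather than the polynomial in $\xi$.

```python
import math


def bond_price(rate, face=100.0, coupon_rate=0.04, n_periods=20):
    """Price of the bond of Example 11.2.5 as a function of the per-period rate."""
    coupons = sum(face * coupon_rate * math.exp(-rate * i) for i in range(1, n_periods + 1))
    return coupons + face * math.exp(-rate * n_periods)


def solve_rate(target_price, lo=1e-6, hi=1.0, tol=1e-12):
    """Bisection on [lo, hi]; relies on bond_price being decreasing in the rate."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bond_price(mid) > target_price:
            lo = mid  # price too high, so the rate must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


semiannual_rate = solve_rate(98.0)
print(semiannual_rate)            # close to 0.0407
print(math.exp(semiannual_rate))  # the root xi, close to 1.0415
print(2 * semiannual_rate)        # annual rate convertible semiannually
```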
Sometimes, instead of the coupon rate, the coupon payments of the bond are
specified. For instance, suppose it is specified that a bond makes n coupon
payments per year for T years in the amount $\frac{C}{n}$, and pays F at maturity.
Let r be the annualized interest rate and $h = \frac{T}{n}$. Then the price of the
bond is given by
\[
B(0, T) = \frac{C}{n}\sum_{i=1}^{n} e^{-ihr} + Fe^{-rT}.
\]
It can be observed that bond prices have a close-knit relationship with the
interest rate r (that is, B(0, T) can be treated as a function of r, although this
is not explicitly indicated in the notation B(0, T)), and that is precisely what we
are aiming to investigate in the sections to follow. Some bonds have greater
sensitivity to changes in interest rates. This risk is measured by computing how
the bond's price is likely to change when market interest rates go up or down. A
bond's modified duration, a figure derived from several factors, measures this
risk. The change in the bond price for a unit change in the yield, r, is described
by
\[
\frac{dB}{dr} = -h\left(\frac{C}{n}\sum_{i=1}^{n} i\,e^{-ihr} + nFe^{-rnh}\right), \quad h = \frac{T}{n}.
\]
The percentage change in the bond price for a unit change in the interest rate
is called the modified duration of the bond. Thus, the modified duration of the
bond is $-\frac{1}{B(0, T)}\frac{dB}{dr}$. Modified duration is stated in years. For
example, a 3-year duration means the bond will decrease in value by 3% if the
interest rate rises by 1% and increase in value by 3% if the interest rate falls by
1%. For more details on this aspect, we refer to [87].
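The derivative formula and the modified duration are easy to evaluate numerically. The following sketch assumes, as the price formula above suggests, n equally spaced payments of C/n at times ih with h = T/n; the bond data are hypothetical and the function names are ours. The analytic derivative is compared with a central finite difference as a sanity check.

```python
import math


def bond_price(r, coupon_total, n, maturity, face):
    """B(0, T) = (C/n) * sum_{i=1..n} exp(-i h r) + F exp(-r T), with h = T / n."""
    h = maturity / n
    coupons = coupon_total / n * sum(math.exp(-i * h * r) for i in range(1, n + 1))
    return coupons + face * math.exp(-r * maturity)


def bond_price_derivative(r, coupon_total, n, maturity, face):
    """dB/dr = -h * ((C/n) * sum_{i=1..n} i exp(-i h r) + n F exp(-r n h))."""
    h = maturity / n
    weighted = coupon_total / n * sum(i * math.exp(-i * h * r) for i in range(1, n + 1))
    return -h * (weighted + n * face * math.exp(-r * n * h))


def modified_duration(r, coupon_total, n, maturity, face):
    """-(1/B) dB/dr, stated in the same time unit as the maturity."""
    price = bond_price(r, coupon_total, n, maturity, face)
    return -bond_price_derivative(r, coupon_total, n, maturity, face) / price


# Hypothetical bond: C = 24, n = 6 payments, T = 3 years, F = 100, r = 5%.
args = (24.0, 6, 3.0, 100.0)
duration = modified_duration(0.05, *args)
eps = 1e-6
numeric_slope = (bond_price(0.05 + eps, *args) - bond_price(0.05 - eps, *args)) / (2 * eps)
print(duration)
print(bond_price_derivative(0.05, *args), numeric_slope)
```

For a small change $\Delta r$ in the rate, the relative price change is then approximately $-\Delta r$ times the modified duration.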
There are many other risks involved in investing in bonds, including default
risk (when the bond issuer is unable to make interest payments and/or redemption
repayment), market risk (the risk that the bond market as a whole declines),
currency risk for foreign investors, interest rate risk (if the prevailing interest rate
rises, the price of the bond will fall making the bond less attractive), inflation
risk, to name a few. Some examples do exist where a government has defaulted on
its domestic currency debt, such as the one in Russia in 1998, and very recently
(2012) in Greece, however such examples are rare. Therefore our already assumed
condition that there is no default payment on part of the issuer does not seem out
of place. In the section to follow we shall be talking more about interest rate risk
on bonds.
\[
B(t, T) = e^{-Y(t, T)(T - t)},
\]
that is,
\[
Y(t, T) = -\frac{1}{T - t}\ln(B(t, T)).
\]
Note that $Y(t, T) > 0$ since $B(t, T) < 1$ for $t < T$.
In case the face value of a ZCB is $F$, then
$Y(t, T) = -\frac{1}{T - t}\ln\frac{B(t, T)}{F}$. Bond
yield is an example of a long rate of interest.
Definition 11.3.2 (Yield to Maturity (YTM)) It is the yield earned on a
bond when the bond is held until maturity, assuming that all coupons and principal
payments shall be made on schedule.
For a T-year annual coupon bond with face value F and purchase price B(t, T),
the YTM, denoted by $Y_m$, is the solution of the following equation
\[
B(t, T) = Fc\sum_{i=1}^{n_r} e^{-ihY_m} + Fe^{-(T - t)Y_m}, \quad h = \frac{T - t}{n_r}, \tag{11.2}
\]
where c is the coupon rate and $n_r$ is the remaining number of coupon payment
periods from the date of purchase t to the date of maturity T of the bond. Note
that the YTM, $Y_m$, of a ZCB is the same as the bond yield Y(t, T). However, for a
(coupon) bond, the YTM is not the same as Y(t, T).
Definition 11.3.3 ( Term Structure and Spot Rates) The function Y(t, T)
of two variables t and T, t < T, corresponding to ZCBs, is called the term structure
of interest rates. The yields Y(0, T), T > 0, described by the current bond prices of
T-ZCBs are called spot rates.
Let us present an example to clarify the difference between spot rates and
YTM.
Example 11.3.1 Consider a Rs 100 face value 2-year bond which pays annual
coupon at the rate of 5%. Suppose the spot rates are Y(0, 1) = 0.08 and Y(0, 2) =
0.1. Find the YTM of the bond.
Solution First we compute the bond price with the data F = Rs 100, T = 2, c =
0.05.
\begin{align*}
B(0, 2) &= Fc\sum_{i=1}^{2} e^{-iY(0, i)} + Fe^{-2Y(0, 2)} \\
&= (100)(0.05)e^{-0.08} + (100)(0.05)e^{-(2)(0.1)} + 100e^{-(2)(0.1)} \\
&= \text{Rs } 90.58.
\end{align*}
The YTM of the bond, say $Y_m$, is computed using (11.2), that is,
\[
90.58 = 5e^{-Y_m} + 105e^{-2Y_m}.
\]
After some simplification we get $Y_m = 0.099495$, or the YTM is 9.95%.
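The two steps of Example 11.3.1, pricing from the spot rates and then solving (11.2) for the single rate $Y_m$, can be sketched as follows. The function names and the bisection bracket are our assumptions; bisection works because the price is decreasing in the yield.

```python
import math


def price_from_spot_rates(face, coupon_rate, spot_rates):
    """Discount the i-th annual coupon at Y(0, i) and the face value at Y(0, n)."""
    n = len(spot_rates)
    coupons = sum(face * coupon_rate * math.exp(-i * y)
                  for i, y in enumerate(spot_rates, start=1))
    return coupons + face * math.exp(-n * spot_rates[-1])


def price_from_ytm(face, coupon_rate, n, ytm):
    """Discount every cash flow at the single rate ytm, as in (11.2) with h = 1."""
    coupons = sum(face * coupon_rate * math.exp(-i * ytm) for i in range(1, n + 1))
    return coupons + face * math.exp(-n * ytm)


def yield_to_maturity(price, face, coupon_rate, n, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the rate at which price_from_ytm matches the given price."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if price_from_ytm(face, coupon_rate, n, mid) > price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Data of Example 11.3.1: F = 100, c = 5%, Y(0, 1) = 0.08, Y(0, 2) = 0.10.
price = price_from_spot_rates(100.0, 0.05, [0.08, 0.10])
ytm = yield_to_maturity(price, 100.0, 0.05, 2)
print(round(price, 2))  # 90.58
print(ytm)              # close to 0.0995
```

The YTM lies between the two spot rates and close to the two-year rate, because most of the cash flow arrives at the end of the second year.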
In practice, the yield rates or term structure on bonds, Y(t, T), 0 < t ≤ T ≤ T̂,
where T̂ is some hypothetical maximum time horizon such that all ZCBs trading in
markets will mature at or before T̂, are not observable while the current (purchase)
price of the bonds can be observed from the market. What we can do then is to
solve for the YTM that equates the discounted future cash flows (coupon payments
and redemption value) to the price of the bond.
Example 11.3.2 Consider a Rs 100 face value 10-year bond issued on August 16,
2010 which pays coupon semiannually at a rate 10% on August 16 and February
16 of each year. An investor purchases the bond at a price Rs 102.54 on December
31, 2017. Find the YTM of the investor.
Solution There are 183 days between two coupon dates and 46 days between
the date of purchase, December 31, 2017, and the date of the next coupon on
February 16, 2018 (assuming 365 trading days in a year). The investor will also
receive the next 6 coupon payments starting on February 16, 2018. A little insight
into the problem gives the YTM $Y_m$ of the investor as a solution of the following
equation
\[
102.54 = \big((100)(0.05)(1 + e^{-Y_m} + e^{-2Y_m} + e^{-3Y_m} + e^{-4Y_m} + e^{-5Y_m}) + 100e^{-5Y_m}\big)\,e^{-\frac{46}{183}Y_m},
\]
that is, with $\xi = e^{-Y_m}$,
\[
102.54 = \big(5(1 + \xi + \xi^2 + \xi^3 + \xi^4 + \xi^5) + 100\xi^5\big)\,\xi^{\frac{46}{183}}.
\]
The above can be solved for $\xi$ by some numerical technique. We can verify that
$\xi$ is close to 0.95. The YTM per annum is thus close to $2Y_m \times 100\% = 10.26\%$.
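As an illustration of such a numerical technique, the sketch below solves the equation of Example 11.3.2 for $Y_m$ by bisection. The function names and the bracket [0, 1] are our assumptions; the cash flows and the fraction 46/183 are those of the example.

```python
import math


def price_between_coupons(y, face=100.0, coupon=5.0, n_coupons=6, fraction=46 / 183):
    """Value the remaining payments on the next coupon date, then discount over the
    fraction of a coupon period that is left until that date."""
    value_at_next_coupon = sum(coupon * math.exp(-k * y) for k in range(n_coupons))
    value_at_next_coupon += face * math.exp(-(n_coupons - 1) * y)
    return value_at_next_coupon * math.exp(-fraction * y)


def solve_yield(target_price, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection; the price is decreasing in the semiannual yield y."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if price_between_coupons(mid) > target_price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


semiannual_ytm = solve_yield(102.54)
print(math.exp(-semiannual_ytm))  # xi, close to 0.95
print(2 * semiannual_ytm)         # annual YTM, close to 0.1026
```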
The term structure of interest rates is also known as yield curve. It is a very
common bond valuation method. The yield curve is constructed by interpolat-
ing the market data of yield to maturities and the corresponding maturity dates
of finitely many benchmark fixed-income bonds. The yield curve measures the
market’s expectations of future interest rate. The short-term interest rate can be
thought of as shortest maturity yield or perhaps the overnight rate offered by the
market. The exact shape of the curve can be different at any point in time. Any
change in the shape of a normal yield curve is an indication that the investors
need to change their outlook on economy.
Under normal market conditions, wherein investors believe that there will
be no significant changes in the economy, such as in inflation rates, and that the
economy will continue to grow at a normal rate, the yield curve generally looks
like the one shown in Fig. 11.2.
[Figure: yield curve plotted as interest rate % against maturity]
Fig. 11.2. Normal yield curve
A flat yield curve usually occurs when there are mixed indications in the mar-
ket. There are speculations that short-term interest rates will rise and simulta-
neously there are signals that long-term interest rates will fall. If the initial term
structure is flat, then the yields Y(0, T) may be independent of T as shown in Fig.
11.3(a).
The market expects long-term fixed income securities to offer higher yields than
short-term fixed income securities. This is a normal expectation because short-
term instruments generally hold less risk than long-term instruments. Sometimes,
however, abnormal conditions in the economic environment can result in short-
term interest rates rising above that offered by long-term fixed income investments.
The result is a negative yield curve depicted in Fig. 11.3(b).
[Figure: two panels, each plotting the yield curve as interest rate % against maturity]
Fig. 11.3. (a) Flat yield curve; (b) Negative yield curve
which means that one unit of the same bond would be trading at a cost of Rs
94.18 if the bond yield becomes 6%. The logarithmic return on the bond is
\[
\ln\frac{94.18}{90} = 0.0454 \text{ or } 4.54\%.
\]
On the other hand, if the bond yield reduces to 4% after 1 year, then the price
of the bond becomes
\[
B(1, 2) = 100e^{-0.04} = 96.08,
\]
which means that one unit of the bond will be trading at Rs 96.08. The logarithmic
return, in this case, is
\[
\ln\frac{96.08}{90} = 0.0654 \text{ or } 6.54\%.
\]
In case the yield Y(1) remains the same, that is, 5.27%, then it can easily be
seen that B(1, 2) = Rs 94.87 with logarithmic return 5.265%.
Let us ask another question. Can we get a yield of 11% on the bond after
one year? Suppose yes; then $B(1, 2) = 90e^{0.11} = 100.46$, which is not possible as
$B(1, 2) \not> \text{Rs } 100$, the face value of the bond.
When the term structure is not flat, the premium of a bond with respect to
its face value may vary with the time to maturity even for bonds with the same
coupon rate of interest. The same is illustrated in the following example.
Example 11.3.4 Consider a Rs 100 face value bond with coupon rate of 4.0%
per annum paid semiannually. The spot rates of interest in the market are given
as follows.
Find the price of the bond if it matures in (a) 6 months (b) 2 years, and (c) 4
years.
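Solution (a) Note that T = 1 and Y(0, 1) = 0.03/2 = 0.015. Thus,
\[
B(0, 1) = 2e^{-0.015} + 100e^{-0.015} = \text{Rs } 100.48.
\]
(b) Here T = 4, and
\begin{align*}
B(0, T) &= Fc\sum_{i=1}^{T} e^{-iY(0, i)} + Fe^{-Y(0, T)T} \\
&= 2\big(e^{-0.015} + e^{-(2)(0.015)} + e^{-(3)(0.0175)} + e^{-(4)(0.0175)}\big) + 100e^{-(4)(0.0175)} \\
&= \text{Rs } 96.92.
\end{align*}
(c) Here T = 8, and
\begin{align*}
B(0, T) &= Fc\sum_{i=1}^{T} e^{-iY(0, i)} + Fe^{-Y(0, T)T} \\
&= 2\big(e^{-(3)(0.015)} + e^{-(7)(0.0175)} + e^{-(11)(0.02)} + e^{-(15)(0.0225)}\big) + 100e^{-(8)(0.0225)} \\
&= \text{Rs } 90.24.
\end{align*}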
Note that we have a premium bond for the case of (a) with the price higher
than the face value while for the cases (b) and (c), we have a discount bond with
the price lower than the face value.
Definition 11.4.1 (Forward Rate) The forward rate, denoted by $f(0, t, T)$, is the
rate for the period [t, T] such that the present price of a ZCB with maturity T
can be generated by locking in the present price of a ZCB with maturity t and
investing at this rate from time t to time T. That is, the continuously compounded
forward rate for [t, T] prevailing currently (at time zero) satisfies
\[
B(0, T) = B(0, t)\,e^{-(T - t) f(0, t, T)}.
\]
In other words,
\begin{align*}
f(0, t, T) &= -\frac{1}{T - t}\big(\ln(B(0, T)) - \ln(B(0, t))\big) \\
&= \frac{TY(0, T) - tY(0, t)}{T - t}. \tag{11.3}
\end{align*}
Since $B(t, t) = 1$, we have
\[
f(t, t, T) = -\frac{1}{T - t}\ln(B(t, T)) = Y(t, T).
\]
Let us see some more examples and interpretation to clarify what we mean by
forward rate.
Example 11.4.2 If the one-year spot rate is 8% and the two-year spot rate is
9%, what is f (0, 1, 2)? Interpret the result from an investor perspective.
Solution From (11.3), we have f (0, 1, 2) = 2(0.09) − 0.08 = 0.10 or 10%. Consider
an individual investing in a 2-year ZCB yielding 9%. Equivalently, it is same as if
an investor receives 8% over the first year and simultaneously locks in 10% over
the second year.
Forward rate can also be viewed as an interest rate which is specified at a
current time t for a loan that will occur at a specified future date T. Forward
interest rates also include a term structure which shows the different forward
rates offered to loans of different maturities.
Example 11.4.3 Suppose you wish to take a loan of Rs 100000 for your child's
admission one month from now, and you expect to have the means to repay the loan
along with interest 6 months from now. Assume that the market spot rates for
1 month and 6 months are respectively 0.35% and 0.55%. What interest rate will
the bank offer you on this loan?
Solution You can arrange Rs 100000 by choosing the following strategy. Compute
the discounted value of Rs 100000 for 1 month from now, that is,
$(100000)e^{-0.0035}$ = Rs 99650.61. So, you can take a loan of Rs 99650.61 today
for a period of 6 months. Invest this amount in 1-month ZCBs of Rs 100 face
value; each unit costs $100e^{-0.0035}$ = Rs 99.65061 today, so the loan buys 1000
units, which pay (1000)(100) = Rs 100000 after 1 month from today, and this can
be used for the child's admission. Now think of repaying the loan after 6 months
from today. The principal along with interest comes out to be
$(99650.61)e^{(6)(0.0055)}$ = Rs 102993.94.
Instead, suppose you decide to wait and take a loan of Rs 100000 after 1 month
from now. In order to maintain a no-arbitrage position, the interest rate that the
bank should charge on your loan, to be repaid after 5 months, is
$\frac{1}{5}\ln\frac{102993.94}{100000} = 0.0059$. It amounts to saying that an interest
of 7.08% (per annum) will be charged by the bank on your loan. Observe that the
same interest rate can be obtained directly from (11.3).
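Formula (11.3) and the replication argument of Example 11.4.3 can be compared in a few lines. This is a small sketch with a function name of our own; time is measured in the same unit in which the spot rates are quoted (years in Example 11.4.2, months in Example 11.4.3).

```python
import math


def forward_rate(t, spot_t, maturity, spot_maturity):
    """Continuously compounded forward rate f(0, t, T) from (11.3)."""
    return (maturity * spot_maturity - t * spot_t) / (maturity - t)


# Example 11.4.2: one-year spot rate 8%, two-year spot rate 9%.
f_0_1_2 = forward_rate(1, 0.08, 2, 0.09)
print(f_0_1_2)  # approximately 0.10

# Example 11.4.3: spot rates 0.35% for 1 month and 0.55% for 6 months.
f_loan = forward_rate(1, 0.0035, 6, 0.0055)

# The same rate via the replication argument: borrow today, repay after 6 months.
borrowed_today = 100000 * math.exp(-0.0035)
repayment = borrowed_today * math.exp(6 * 0.0055)
implied_rate = math.log(repayment / 100000) / 5
print(f_loan, implied_rate)  # both approximately 0.0059
print(12 * f_loan)           # approximately 0.0708 per annum
```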
To clarify how the forward rates are computed in real markets, we construct a
forward rate agreement (FRA). A prototypical FRA is an over the counter (OTC)
contract involving three time instants, say t, T and T + τ, t < T < (T + τ), where t
is the current time, T is the expiry time, and (T + τ) is the maturity time. Suppose
that today is day t, and that at time T we want to lend Rs 1 for the period τ,
earning the implied forward rate $f(t, T, T + \tau)$ over the interval from T to T + τ.
In other words, we want to accomplish on day t the position that allows cash
going out on day T and coming in on day T + τ. To meet this cash flow, we need
to borrow on day t with a T-day maturity (to generate a cash outflow on day
T) and lend with a (T + τ)-day maturity (to generate a cash inflow on day T + τ).
Moreover, we want the borrowing and lending to be equal on day t so that there
is no initial cash flow. In an FRA, the same is achieved by setting up the following
portfolio at time t.
(a) Take a short position of one unit of T-maturity ZCB having face value Rs 1.
(b) Take a long position by purchasing $\frac{B(t, T)}{B(t, T + \tau)}$ units of
(T + τ)-maturity ZCB having face value Rs 1.
The value of the portfolio at t is
$B(t, T) - \frac{B(t, T)}{B(t, T + \tau)}B(t, T + \tau) = 0$. At time T,
close the short position of Rs 1 in the T-maturity bond. At the later time T + τ,
receive an amount $\frac{B(t, T)}{B(t, T + \tau)}$ from the long position in the
(T + τ)-maturity ZCB. In other words, under the no arbitrage condition, the
discounted value of $\frac{B(t, T)}{B(t, T + \tau)}$ at time T must
be equal to Rs 1. The yield Y that can explain this payment is the forward rate
for [T, T + τ] prevailing at time t and is given by
\[
\frac{B(t, T)}{B(t, T + \tau)} = 1 \cdot e^{((T + \tau) - T) f(t, T, T + \tau)}, \tag{11.4}
\]
which means,
\[
f(t, T, T + \tau) = -\frac{1}{\tau}\big(\ln B(t, T + \tau) - \ln B(t, T)\big).
\]
We define the instantaneous forward rate prevailing at time t for investing at time
T as follows.
\[ f(t,T) = -\lim_{\tau \downarrow 0} \frac{\ln B(t,T+\tau) - \ln B(t,T)}{\tau} = -\frac{\partial}{\partial T}\ln B(t,T). \]
If we know f(t, T) for all 0 < t ≤ T, we can easily recover B(t, T) for all values
of 0 < t ≤ T as
\[ B(t,T) = e^{-\int_t^T f(t,s)\,ds}. \]
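The two relations above can be checked numerically. The following is a minimal sketch, assuming hypothetical bond prices generated from a flat 5% curve; the function names are ours and the integral is approximated by the midpoint rule.

```python
import math

def forward_rate(b_t_T, b_t_T_tau, tau):
    """Continuously compounded forward rate f(t, T, T+tau) from two ZCB prices, as in (11.4)."""
    return -(math.log(b_t_T_tau) - math.log(b_t_T)) / tau

def bond_price_from_forward_curve(f, t, T, steps=10_000):
    """Recover B(t, T) = exp(-integral_t^T f(t, s) ds) using the midpoint rule."""
    h = (T - t) / steps
    integral = sum(f(t + (k + 0.5) * h) for k in range(steps)) * h
    return math.exp(-integral)

# Hypothetical flat 5% curve (an assumption for illustration): B(0, T) = exp(-0.05 T)
b1 = math.exp(-0.05 * 1.0)
b2 = math.exp(-0.05 * 1.5)
f_1_15 = forward_rate(b1, b2, 0.5)   # forward rate for the period [1, 1.5]

# A hypothetical upward sloping instantaneous forward curve f(0, s) = 0.03 + 0.01 s
b_0_2 = bond_price_from_forward_curve(lambda s: 0.03 + 0.01 * s, 0.0, 2.0)
print(f_1_15, b_0_2)
```

On a flat curve the forward rate coincides with the spot rate, which gives a quick sanity check of the sign conventions.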
Definition 11.4.2 The function f(0, T), T ≥ 0, of variable T, is called the initial
forward rate curve, while the interest rate f(t, t) is called the instantaneous
short rate, that is, the rate that we can lock in at time t to borrow at time t.
Thus the instantaneous short rate can be thought of as a market rate at which
money can be borrowed for a very short duration, such as an overnight loan
to be repaid later. Suppose a bank in India requires an amount, say S, to clear its
liabilities due the next day, and presently it is not in a position to arrange for S on
its own. Then the bank can take an overnight loan S1 from, say, the RBI (central bank
of India) at an instantaneous rate (viz., 1 day lending rate) offered by the RBI
which makes S1 grow to S overnight. The bank can repay the amount S1 along
with its interest to the RBI at some later date at the borrowing rate offered by the RBI.
Now suppose that, instead of the continuously compounded forward interest rate, we
take a simple interest rate to explain the FRA payment in (11.4). Let the
simple interest rate be L(t, T). Then, by an argument similar to the one leading to (11.4),
we have
\[ \frac{B(t,T)}{B(t,T+\tau)} = 1 + \tau L(t,T), \]
equivalently,
\[ L(t,T) = \frac{1}{\tau}\left(\frac{B(t,T)}{B(t,T+\tau)} - 1\right). \]
For 0 ≤ t < T, L(t, T) is the forward LIBOR, the interest rate locked at time
t for investment over the time [T, T + τ]. Also, L(T, T) is called spot LIBOR (or
simply LIBOR), and τ is tenor of LIBOR and is usually taken as 0.25 or 0.5 year.
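As a small illustration, the simple forward rate can be computed directly from two ZCB prices. This is a sketch under assumed prices (0.97 and 0.95 are hypothetical, not market data), and the helper names are ours; the second function converts the simple rate into the continuously compounded forward rate that explains the same payment.

```python
import math

def forward_libor(b_t_T, b_t_T_tau, tau):
    """Simple forward rate L(t, T) for the period [T, T + tau] from two ZCB prices."""
    return (b_t_T / b_t_T_tau - 1.0) / tau

def simple_to_continuous(simple_rate, tau):
    """Continuously compounded rate f with exp(tau * f) = 1 + tau * simple_rate."""
    return math.log(1.0 + tau * simple_rate) / tau

# Hypothetical prices: B(t, T) = 0.97 and B(t, T + 0.5) = 0.95, tenor of half a year
libor = forward_libor(0.97, 0.95, 0.5)
f_cont = simple_to_continuous(libor, 0.5)
print(round(libor, 6), round(f_cont, 6))
```

For the same pair of bond prices the simple rate is always slightly larger than the continuously compounded one.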
Many large financial institutions trade deposits with each other, in a given
currency, for maturities ranging from just overnight to one year. These are traded
at market interest rates. The most commonly used market interest rate is LIBOR
(London Interbank Offered Rate). The LIBOR is the rate at which financial insti-
tutions are willing to lend, on average (actually there are fifteen different LIBOR
rates for fifteen maturities: overnight, one week, 2 weeks, one month, and so on).
It is an average indicative quote of the interbank lending market. It is calculated
by Thomson Reuters for ten currencies (including USD, AUD, GBP, DKK, EUR,
CAD, JPY, to name a few), and published daily by the British Bankers Associ-
ation. On the other hand, the LIBID (London Interbank Bid Rate) is the rate
that these financial institutions are prepared to pay to borrow money, on average.
Normally, LIBID < LIBOR. The LIBOR is a fundamental point of reference for
financial institutions. Moreover, many fixed income instruments, like forward rate
agreements or mortgage rates, are indexed to the LIBOR. The following is a small
example of EUR LIBOR interest rates having maturity 1 day.
When reference is made to the Indian interest rate, it often means the MIBOR
(Mumbai Inter-Bank Offered Rate) and the MIBID (Mumbai Inter-Bank Bid
Rate). The MIBOR was launched on June 15, 1998 by the Committee for the
Development of the Debt Market, as an overnight rate. The National Stock Ex-
change of India launched the 14-day MIBOR on November 10, 1998, and the one
month and three month MIBORs on December 1, 1998. Since the launch, MIBOR
rates have been used as benchmark rates for the majority of money market deals
made in India.
Besides bonds, there are various other interest rate derivative (debt) instruments
in the market. Some are easy to understand and model while some others
involve complex financial intricacies. The interest rate derivative market is enormous
and it is not possible for us to cover all aspects of it in one chapter. Still we
would like to briefly touch upon interest rate swaps, one of the largest and fastest
growing derivative instruments. A “plain vanilla” interest rate swap is a contract
between two parties, often called counter-parties, in which they agree to exchange
interest payments of two different kinds on a predefined principal amount,
on a periodic basis over a fixed time period. Typically, payments made by one
counter-party are based on a fixed interest rate for the term of the contract, while
payments made by the other counter-party are based on a floating interest rate for the
same term. In doing so, the principal amount is not physically exchanged; rather,
interest payments are exchanged on a notional principal. By convention, the
fixed-rate counter-party is designated the buyer of the swap and the floating-rate
counter-party the seller of the swap. This type of contract is based on the needs
of the parties and their estimates of the level of, and changes in, interest rates
during the period of the swap contract.
For example, when we say a 3-year 8% fixed for six-month LIBOR floating
Rs 10 lakh swap, we mean that a fixed-rate party is required to pay 8% fixed-rate
interest on a notional principal of Rs 10 lakh to a floating-rate party in exchange for
a variable-rate interest that depends on a pre-specified six-month LIBOR rate on
Rs 10 lakh, and the transaction is to be settled every six months. Fig. 11.4 explains
this kind of transaction.
[Fig. 11.4: Counterparty A (buyer) pays the fixed rate to Counterparty B (seller); Counterparty B pays the floating rate to Counterparty A.]
Example 11.4.4 A swap buyer company A and the swap seller company B enter
into a 5-year swap on January 10, 2001, on terms that company A pays fixed
interest 3.25% annually on the notional principal of Rs 10 lakh and company B
The net cash flow figures shown in the last column above are expressed from
company A’s point of view. On the first three payment dates, the floating payment
received by company A exceeds the fixed payment, so company A receives a net
cash inflow on these dates. On the last two payment dates, company A must pay
company B, so it is company B that benefits. All payments are in rupees.
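The netting of swap payments can be sketched in a few lines. The floating rates below are hypothetical values chosen only so that the first three exceed the fixed rate of 3.25% and the last two fall below it; they are not the figures of the example, and annual settlement on a notional of Rs 10 lakh is assumed.

```python
def swap_net_cash_flows(notional, fixed_rate, floating_rates):
    """Net cash flow to the fixed-rate payer (the buyer) on each payment date."""
    return [notional * (floating - fixed_rate) for floating in floating_rates]

# Hypothetical floating resets (assumed for illustration, not taken from the text)
floating = [0.0400, 0.0375, 0.0350, 0.0300, 0.0275]
flows = swap_net_cash_flows(1_000_000, 0.0325, floating)

for year, flow in enumerate(flows, start=1):
    direction = "A receives" if flow > 0 else "A pays"
    print(f"year {year}: {direction} Rs {abs(flow):.0f}")
```

A positive entry means the floating payment received by the buyer exceeds the fixed payment it owes; only this net amount changes hands.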
Another very basic interest rate derivative is the repo. A repo (repurchase
agreement) is a way of borrowing against collateral. Suppose a financial institution
A borrows money from another financial institution B (usually a bank or the RBI; in
the discussion to follow we assume it to be the RBI) to meet its short term needs by selling
certain securities (like bonds) with an agreement that it will buy back the securities
at some fixed future time (the next day, after a week, etc.) at a predetermined
price. It is equivalent to saying that A gets a loan against collateral (the
security) and pays an interest rate to B. The following figure captures this idea.
[Figure: repo transaction. At the start, Bank A (borrower) hands over the asset to Bank B (lender) and receives Rs 10000; at the end, Bank A repays Rs 10000 + 5% and gets the asset back.]
The repo rate is also called the short term lending rate. When the repo rate increases,
borrowing from the Reserve Bank of India (RBI) becomes more expensive. Therefore,
if the RBI wants to make it more expensive for the banks to
borrow money, it increases the repo rate; similarly, if it wants to make it cheaper
for banks to borrow money, it reduces the repo rate. If banks are short of funds
they can borrow rupees from the RBI at the repo rate, the interest rate with a 1
day maturity. On the other hand, the reverse repo rate is the interest rate that
banks receive if they deposit money with the RBI. This reverse repo rate is always
lower than the repo rate. The RBI uses the reverse repo rate tool when it feels
there is too much money floating in the banking system. Increases or decreases
in the repo and reverse repo rates have an effect on the interest rates on banking
products such as loans, mortgages and savings. An increase in the reverse repo
rate means that the RBI will borrow money from the banks at a higher rate of
interest. As a result, banks would prefer to keep their money with the RBI. To
conclude the discussion on short-term interest rates, we would like to bring to
the reader’s notice a very common news line when the repo rate and/or reverse
repo rate are changed: “The RBI today hiked short-term lending and borrowing
rates sharply by 50 basis points”. One basis point is 1/100 of
1%, so 50 basis points means a 0.5% change in the existing short-term lending and
borrowing interest rates.
therefore, “can our most reliable friend, the binomial lattice, be again called to
help us to model the short rate?” Luckily, the answer is YES.
We first partition the time horizon for which we intend to design the short term
interest rate into a finite number of periods, such as days or weeks. The
lattice is drawn in a right angled triangle form; it is assumed that at time t, the
two branches from any node lead either to an “up” state or to a “flat” state. An index
i is used to denote how many ups have been taken to reach the node. Thus, each
node of the lattice is indexed by a pair (t, i), where t is the time and i is the node
index at time t. Then, each node (t, i) is assigned a short rate $r_{ti} \ge 0$. Fig. 11.6
shows a binomial lattice model, say for a 5-day short rate, with a basic time
span of 1 day. The time t is shown at the bottom of the lattice while index i
denotes the ups (or height) of that node.
[Fig. 11.6. Binomial lattice model for term structure: nodes (t, i), with time t = 0, 1, ..., 4 on the horizontal axis and node index i on the vertical axis.]
This lattice forms the basis for pricing interest rate securities by using
martingale pricing. For example, if $S_{ti}$ is the value of a non-dividend/coupon paying
security at time t and state i, then we insist that
\[ S_{ti} = \frac{1}{1 + r_{ti}} \left( p_u S_{t+1,i+1} + p_d S_{t+1,i} \right), \tag{11.5} \]
where $p_d = 1 - p_u$ and $p_u$ is the risk neutral probability measure (RNPM). Since
the probability $p_u$ is assigned rather than computed, here, it is a convention to
take it as $\frac{1}{2}$. Observe that such a model is arbitrage-free by construction.
If the security pays a coupon/dividend $C_{ti}$ at node (t, i), then formula (11.5)
should be taken as follows.
\[ S_{ti} = \frac{1}{1 + r_{ti}} \left( p_u S_{t+1,i+1} + p_d S_{t+1,i} \right) + C_{ti}. \]
Example 11.5.1 Suppose the current short term rate is 7% per annum, and the
up factor is 1.2 while the flat factor is 0.8. Construct a short term interest lattice
for 6 years from now.
Solution The lattice is described in Table 11.1. The entry at node (t, i) is 0.07 × 1.2^i × 0.8^(t−i).
                                                  0.1742
                                        0.1452    0.1161
                              0.1210    0.0968    0.0774
                    0.1008    0.0806    0.0645    0.0516
          0.084     0.0672    0.0538    0.043     0.0344
0.07      0.056     0.0448    0.0358    0.0287    0.0229
t=0       t=1       t=2       t=3       t=4       t=5
Table 11.1. Short term interest rate lattice
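The construction of such a lattice is mechanical and can be sketched as follows, assuming (as the table entries suggest) that the rate at node (t, i) is the initial rate multiplied by the up factor i times and by the flat factor t − i times. The function name is ours.

```python
def short_rate_lattice(r0, up, flat, periods):
    """rates[t][i] = r0 * up**i * flat**(t - i): short rate at time t after i up moves."""
    return [[r0 * up**i * flat**(t - i) for i in range(t + 1)]
            for t in range(periods)]

# Data of Example 11.5.1: 7% initial rate, up factor 1.2, flat factor 0.8, t = 0, ..., 5
rates = short_rate_lattice(0.07, 1.2, 0.8, 6)
for t, column in enumerate(rates):
    print(f"t={t}:", [round(r, 4) for r in column])
```

Each printed list is one column of the table, read from the bottom node (i = 0) upwards.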
Example 11.5.2 Assuming the data of Example 11.5.1, find the price of a ZCB
maturing 4 years from now.
Solution The term structure lattice is described in Table 11.1. To compute the
bond price of a 4-year ZCB, we assign a face value Rs 1 to this bond at t = 4.
Then we work backwards and use formula (11.5) with $r_{ti}$ read from Table 11.1.
                                        1.0000
                              0.8921    1.0000
                    0.8255    0.9254    1.0000
          0.7858    0.8782    0.9490    1.0000
0.7642    0.8496    0.9161    0.9654    1.0000
t=0       t=1       t=2       t=3       t=4
Table 11.2. 4 year ZCB price
For instance, the top entry in the second column, i.e. node (1, 1), is computed using
$r_{11} = 0.084$ from Table 11.1 and formula (11.5), to get
\[ \frac{1}{1 + 0.084}\left(\frac{1}{2}(0.8255) + \frac{1}{2}(0.8782)\right) = 0.7858. \]
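The backward recursion of this example can be sketched in code. The sketch below assumes the same lattice rule as Table 11.1 and $p_u = \frac{1}{2}$; the function names are ours.

```python
def short_rate_lattice(r0, up, flat, periods):
    """rates[t][i] = r0 * up**i * flat**(t - i)."""
    return [[r0 * up**i * flat**(t - i) for i in range(t + 1)]
            for t in range(periods)]

def zcb_lattice(rates, maturity, face=1.0, p_up=0.5):
    """values[t][i]: price at node (t, i) of a ZCB paying `face` at t = maturity."""
    values = [[0.0] * (t + 1) for t in range(maturity + 1)]
    values[maturity] = [face] * (maturity + 1)
    for t in range(maturity - 1, -1, -1):          # work backwards in time
        for i in range(t + 1):
            expected = p_up * values[t + 1][i + 1] + (1 - p_up) * values[t + 1][i]
            values[t][i] = expected / (1 + rates[t][i])   # formula (11.5)
    return values

rates = short_rate_lattice(0.07, 1.2, 0.8, 6)
bond = zcb_lattice(rates, maturity=4)
print(round(bond[0][0], 4))   # price today of the 4-year ZCB with face value Rs 1
```

The value at node (0, 0) should agree with the bottom left entry of Table 11.2 up to rounding.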
The option contracts in which the underlying asset is a bond are called bond
options. We have to realize that the price of a call option on a bond decreases
and the price of a put option on a bond increases as the short-term interest rate
rises (through the impact of the short-term interest rate on the underlying bond price,
which falls when rates rise).
There is no significant difference between stock options and bond options
except that the underlying asset is a bond rather than a stock. Another characteristic
difference to appreciate is that bonds are long term investments and, because of a
very low risk of default, they are held more for investment than for speculation.
This is why bond options are traded on an OTC (over the counter) basis, that is,
traded between two private parties and not listed on an exchange, unlike stock
options which are highly speculative and generally traded in exchange markets.
Despite this difference, the mathematics of bond option pricing is similar to the
stock option pricing we are already familiar with.
For instance, let us look at the put-call parity for European bond options.
Consider a portfolio where we purchase one ZCB, take a short position (sell) on
one European call bond option, and take a long position (buy) on one European
put bond option, both bond options have same time to maturity T and same strike
price K.
At t = 0, the portfolio worth is V(0) = B(0, 0) − C(0, 0) + P(0, 0), where B(0, 0)
denotes the current price of a ZCB, and C(0, 0), P(0, 0) are the prices of a call option
and a put option on the underlying ZCB at t = 0 respectively. At T, the value of
this portfolio is
\[ V(T) = \begin{cases} B(0,T) + K - B(0,T), & \text{if } B(0,T) < K \\ B(0,T) - (B(0,T) - K), & \text{if } B(0,T) \ge K. \end{cases} \]
Thus, no matter what the state is, the portfolio is worth K at the time of expiration.
With no arbitrage in force, the payoff from the portfolio is risk-free, and we can
discount its value at the spot rate Y(0, T), the YTM of a T-ZCB, to get
\[ B(0,0) - C(0,0) + P(0,0) = K e^{-Y(0,T)\,T}, \]
which is similar to the put-call parity for stock options.
Let us investigate pricing of bond option through some simple examples.
Example 11.5.3 Compute the price of a European call option on the ZCB of
Example 11.5.2 that expires in 3 years and has strike price Rs 93.
Solution At time of expiry t = 3, K = 93, hence C(i, 3) = Max{100 B(i, 3) −
K, 0}, i = 0, 1, 2, 3. The same is depicted in the last column in Table 11.3. There-
after the call price is computed iterating backward and using (11.5).
                              0
                    0         0
          0.4102    0.8894    1.8983
0.9643    1.6534    2.6025    3.54
t=0       t=1       t=2       t=3
Table 11.3. 3-year European call bond option price
The European call price is Rs 0.9643 on a ZCB governed by short term interest
rates given in Table 11.1.
Example 11.5.4 Compute the price of a European put option on the ZCB of
Example 11.5.2 that expires in 3 years and has strike price Rs 93.
Solution We shall work out the put price using Table 11.2 and formula (11.5),
working backward. Table 11.4 depicts the complete working of the same.
                              3.7908
                    1.9318    0.4622
          0.9909    0.2166    0
0.511     0.1025    0         0
t=0       t=1       t=2       t=3
Table 11.4. 3-year European put bond option price
The European put price is thus Rs 0.511.
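Both option examples follow the same recipe: evaluate the payoff at expiry and roll it back with (11.5). A minimal sketch, assuming the lattice rule of Table 11.1, $p_u = \frac{1}{2}$ and a bond face value of Rs 100, is given below; the function names are ours. The last lines check the lattice version of put-call parity, in which the strike is discounted along the lattice.

```python
def short_rate_lattice(r0, up, flat, periods):
    """rates[t][i] = r0 * up**i * flat**(t - i)."""
    return [[r0 * up**i * flat**(t - i) for i in range(t + 1)]
            for t in range(periods)]

def backward_induction(rates, terminal_values, p_up=0.5):
    """Roll node values given at time n = len(terminal_values) - 1 back to t = 0 using (11.5)."""
    values = list(terminal_values)
    for t in range(len(terminal_values) - 2, -1, -1):
        values = [(p_up * values[i + 1] + (1 - p_up) * values[i]) / (1 + rates[t][i])
                  for i in range(t + 1)]
    return values[0]

rates = short_rate_lattice(0.07, 1.2, 0.8, 6)
strike, face = 93.0, 100.0

# Price at t = 3 of the ZCB maturing at t = 4, node by node (i = 0, 1, 2, 3)
bond_at_expiry = [face / (1 + rates[3][i]) for i in range(4)]

call = backward_induction(rates, [max(b - strike, 0.0) for b in bond_at_expiry])
put = backward_induction(rates, [max(strike - b, 0.0) for b in bond_at_expiry])

# Parity on the lattice: bond - call + put equals the strike discounted from t = 3
bond_today = backward_induction(rates, bond_at_expiry)
discounted_strike = backward_induction(rates, [strike] * 4)
print(round(call, 4), round(put, 4), round(bond_today - call + put - discounted_strike, 10))
```

The parity check holds on the lattice because the payoff of the bond minus the call plus the put equals K at every expiry node, and the backward recursion is linear.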
By now we have realized that the term structure dynamics is characterized by
the evolution of the short term interest rates. In the sections to follow we shall
be describing two basic models which capture the dynamics of short term interest
rates and consequently the bond pricing.
Moreover, as we have
\[ r(t) = r(0)e^{-\beta t} + \frac{\alpha}{\beta}(1 - e^{-\beta t}) + \sigma e^{-\beta t}\int_0^t e^{\beta s}\,dW(s), \tag{11.8} \]
The immediate above relation needs a bit of explanation. Recall that from
the definition of the Brownian motion the increments $\Delta W_i = W(t_{i+1}) - W(t_i)$
have variance $t_{i+1} - t_i$. So, for $I = \int_0^t f(s)\,dW(s) \approx \sum_i f(t_i)\,\Delta W_i$, the variance is
$\mathrm{Var}(I) = \sum_i f^2(t_i)\,\mathrm{Var}(\Delta W_i) = \sum_i f^2(t_i)(t_{i+1} - t_i)$, implying $\mathrm{Var}(I) = \int_0^t f^2(s)\,ds$. We
have used this fact in computing Var(r(t)) from (11.8). Next on simplifying the
Var(r(t)) relation, we get
\[ \mathrm{Var}(r(t)) = \frac{\sigma^2}{2\beta}(1 - e^{-2\beta t}). \tag{11.9} \]
Moreover, as the increments of the Brownian motion are independent and normally
distributed, it follows from (11.8) that r(t) is also normally distributed.
The above discussion can be summarized to explicitly state the distribution of
r(t) as follows.
\[ r(t) \sim N\left( r(0)e^{-\beta t} + \frac{\alpha}{\beta}(1 - e^{-\beta t}),\; \frac{\sigma^2}{2\beta}(1 - e^{-2\beta t}) \right). \]
Observe that when t → ∞, E(r(t)) → $\frac{\alpha}{\beta}$ while Var(r(t)) → $\frac{\sigma^2}{2\beta}$. Hence, in
this case, $r(t) \sim N\left(\frac{\alpha}{\beta}, \frac{\sigma^2}{2\beta}\right)$. Observe that the variance of the short rate converges
to a finite value in contrast to the case of Brownian motion.
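The mean and variance of r(t) are easy to tabulate. The following sketch uses hypothetical parameter values (α = 0.006, β = 0.1, σ = 0.02, r(0) = 0.05, chosen only for illustration) and shows both moments approaching their long run limits α/β and σ²/(2β).

```python
import math

def vasicek_mean(r0, alpha, beta, t):
    """E(r(t)) = r0 exp(-beta t) + (alpha / beta)(1 - exp(-beta t))."""
    return r0 * math.exp(-beta * t) + (alpha / beta) * (1 - math.exp(-beta * t))

def vasicek_variance(sigma, beta, t):
    """Var(r(t)) = sigma^2 / (2 beta) * (1 - exp(-2 beta t)), as in (11.9)."""
    return sigma**2 / (2 * beta) * (1 - math.exp(-2 * beta * t))

# Hypothetical parameters (assumptions, not calibrated values)
r0, alpha, beta, sigma = 0.05, 0.006, 0.1, 0.02

for t in (0.0, 1.0, 5.0, 20.0, 1000.0):
    print(t, round(vasicek_mean(r0, alpha, beta, t), 6),
          round(vasicek_variance(sigma, beta, t), 6))
```

With these values the mean rises from 0.05 towards α/β = 0.06 and the variance rises from 0 towards σ²/(2β) = 0.002.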
We now provide an interpretation of the term $\frac{\alpha}{\beta}$. Recall the Vasicek model SDE
(11.6), $dr(t) = (\alpha - \beta r(t))\,dt + \sigma\,dW(t)$.
(i) If $r(t) = \frac{\alpha}{\beta}$, then the drift (term associated with dt) is zero.
(ii) If $r(t) > \frac{\alpha}{\beta}$, then the drift is negative. The process will try to pull down r(t).
(iii) If $r(t) < \frac{\alpha}{\beta}$, then the drift is positive. The process will try to pull up r(t).
Hence the drift is always directed to $\frac{\alpha}{\beta}$, which may thus be interpreted as the long
run mean of the short rate r(t). The parameter β then represents the strength of
this mean reversion.
Let us see what happens to the distribution of r(t) when the volatility σ = 0.
In that case the Vasicek model SDE (11.6) reduces to
\[ dr(t) = (\alpha - \beta r(t))\,dt, \]
that is,
\[ \frac{dr}{\frac{\alpha}{\beta} - r} = \beta\,dt. \]
The solution is
\[ r(t) = \frac{\alpha}{\beta} + \left( r(0) - \frac{\alpha}{\beta} \right) e^{-\beta t}. \]
Again observe that when t → ∞, r(t) → $\frac{\alpha}{\beta}$ from below when r(0) < $\frac{\alpha}{\beta}$ and from
above when r(0) > $\frac{\alpha}{\beta}$. This depicts the mean reverting nature of the short
interest rate process {r(t), t ≥ 0}.
An obvious limitation of this model is that r(t), having a normal distribution, can
assume negative values with positive probability. Although in practice, on
calibration of the Vasicek model, it is found that the probability of r(t) taking
on negative values is negligible, it cannot be completely ruled out. In this respect the
Vasicek model is not realistic enough, because a short interest rate r(t) < 0 is
difficult to interpret.
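Since r(t) is normally distributed in the Vasicek model, the probability of a negative rate can be computed from the normal distribution function. The sketch below uses hypothetical parameters, chosen only for illustration, and compares a low and a high volatility; the function name is ours.

```python
import math

def vasicek_prob_negative(r0, alpha, beta, sigma, t):
    """P(r(t) < 0) when r(t) is normal with the Vasicek mean and variance."""
    mean = r0 * math.exp(-beta * t) + (alpha / beta) * (1 - math.exp(-beta * t))
    var = sigma**2 / (2 * beta) * (1 - math.exp(-2 * beta * t))
    z = (0.0 - mean) / math.sqrt(var)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF at z

# Hypothetical parameters (assumptions): r(0) = 5%, alpha = 0.006, beta = 0.1, horizon 5 years
p_low_vol = vasicek_prob_negative(0.05, 0.006, 0.1, 0.005, t=5.0)
p_high_vol = vasicek_prob_negative(0.05, 0.006, 0.1, 0.02, t=5.0)
print(p_low_vol, p_high_vol)
```

How small this probability is depends strongly on the ratio of the mean to the standard deviation, so it should be checked for the calibrated parameters at hand rather than assumed negligible.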
risk. It was introduced in 1985 as an extension of the Vasicek model. The dynamics
of the short interest rate process, {r(t), t ≥ 0}, is described by the following SDE.
\[ dr(t) = (\alpha - \beta r(t))\,dt + \sigma\sqrt{r(t)}\,dW(t), \tag{11.10} \]
where α, β, σ are positive constants, and W(t) is a Wiener process which models the
random market risk factor. The parameter σ determines the volatility of the
interest rate.
Example 11.7.1 Assume a particular short interest rate follows the following
CIR model
\[ dr(t) = 0.22(0.06 - r(t))\,dt + 0.45\sqrt{r(t)}\,dW(t). \]
At some particular time t, r(t) = 0.05, and then r(t) suddenly becomes 0.02. What
is the resulting change in the volatility?
Solution The volatility in the CIR model is $\sigma\sqrt{r(t)}$. For σ = 0.45 and r(t) = 0.05,
the volatility is $0.45\sqrt{0.05}$, while for r = 0.02, the volatility is $0.45\sqrt{0.02}$. The
change in volatility is thus $0.45\sqrt{0.02} - 0.45\sqrt{0.05} = -0.0370$.
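The arithmetic of this example can be verified in a couple of lines; the function name is ours.

```python
import math

def cir_volatility(sigma, r):
    """Diffusion coefficient sigma * sqrt(r) of the CIR model."""
    return sigma * math.sqrt(r)

change = cir_volatility(0.45, 0.02) - cir_volatility(0.45, 0.05)
print(round(change, 4))   # about -0.0370: the volatility falls when the rate falls
```

In contrast to the Vasicek model, the volatility here shrinks as the rate approaches zero.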
Unlike the Vasicek model, the CIR model is not Gaussian and is therefore
considerably more difficult to analyze. Furthermore, it does not have the closed
form solution like the one (11.8) we have for the Vasicek model. But still we can
find the distribution of r(t). For this text we restrict ourselves to find only the first
two moments of r(t) by applying the Ito Lemma.
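Let $g(t, r) = e^{\beta t} r$. From Ito Lemma, we have
\[ \begin{aligned}
dg &= \frac{\partial g}{\partial t}\,dt + \frac{\partial g}{\partial r}\,dr + \frac{1}{2}\frac{\partial^2 g}{\partial r^2}(dr)^2 \\
&= \beta e^{\beta t} r(t)\,dt + e^{\beta t}\,dr \\
&= \beta e^{\beta t} r(t)\,dt + e^{\beta t}\bigl((\alpha - \beta r(t))\,dt + \sigma\sqrt{r(t)}\,dW(t)\bigr) \\
&= \alpha e^{\beta t}\,dt + \sigma e^{\beta t}\sqrt{r(t)}\,dW(t).
\end{aligned} \]
An equivalent integration form is
\[ \begin{aligned}
e^{\beta t} r(t) &= r(0) + \alpha\int_0^t e^{\beta s}\,ds + \sigma\int_0^t e^{\beta s}\sqrt{r(s)}\,dW(s) \\
&= r(0) + \frac{\alpha}{\beta}(e^{\beta t} - 1) + \sigma\int_0^t e^{\beta s}\sqrt{r(s)}\,dW(s).
\end{aligned} \tag{11.11} \]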
Though (11.11) looks very similar to (11.8) obtained for the Vasicek
model, there is a noticeable difference. Observe that the term $\sqrt{r(s)}$ appears
in the Ito integral on the right hand side of (11.11). Thus we are not able to obtain a
closed form expression for r(t). We need to resort to some numerical techniques
for computing the value of r(t) at a given t.
Going back to our discussion, we compute the first moment, that is, the expectation
of r(t), from (11.11). Since the expectation of an Ito integral is zero, we get
\[ E(e^{\beta t} r(t)) = r(0) + \frac{\alpha}{\beta}\left(e^{\beta t} - 1\right), \tag{11.12} \]
equivalently
\[ E(r(t)) = r(0)e^{-\beta t} + \frac{\alpha}{\beta}\left(1 - e^{-\beta t}\right). \]
Note that E(r(t)) in the CIR model is the same as E(r(t)) in the Vasicek
model.
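Next we wish to compute the variance Var(r(t)). For this, let $h(t, r) = (g(t, r))^2 = e^{2\beta t} r^2$.
Applying the Ito Lemma on h(t, r), we have
\[ \begin{aligned}
dh &= \frac{\partial h}{\partial t}\,dt + \frac{\partial h}{\partial r}\,dr + \frac{1}{2}\frac{\partial^2 h}{\partial r^2}(dr)^2 \\
&= 2\beta e^{2\beta t} r^2(t)\,dt + 2e^{2\beta t} r(t)\,dr + e^{2\beta t}(dr)^2 \\
&= 2\beta e^{2\beta t} r^2(t)\,dt + 2e^{2\beta t} r(t)\bigl((\alpha - \beta r(t))\,dt + \sigma\sqrt{r(t)}\,dW(t)\bigr) + e^{2\beta t}\sigma^2 r(t)\,dt,
\end{aligned} \]
where the last term in the above equality follows on account of dW(t)dW(t) =
dt, dW(t)dt = 0, dtdt = 0. Simplifying the above relation, we get
\[ dh = (2\alpha + \sigma^2)\,e^{2\beta t} r(t)\,dt + 2\sigma e^{2\beta t} r(t)^{3/2}\,dW(t). \]
Taking expectation, and using the fact that the expectation of the Ito integral
is zero, we get
\[ \begin{aligned}
E(h(t, r)) &= r^2(0) + (2\alpha + \sigma^2)\int_0^t e^{\beta s} E(g(s, r))\,ds \\
&= r^2(0) + (2\alpha + \sigma^2)\int_0^t e^{\beta s}\left(r(0) + \frac{\alpha}{\beta}(e^{\beta s} - 1)\right)ds \\
&= r^2(0) + \frac{2\alpha + \sigma^2}{\beta}\left(r(0) - \frac{\alpha}{\beta}\right)(e^{\beta t} - 1) + \frac{(2\alpha + \sigma^2)\alpha}{2\beta^2}(e^{2\beta t} - 1).
\end{aligned} \]
We have used (11.12) for E(g(s, r)) in the second last equality. Therefore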
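The variance can now be assembled from E(h(t, r)) and (11.12). The following is a worked sketch of that step, obtained here by direct algebra from the two expressions above rather than quoted from the text, using $E(r^2(t)) = e^{-2\beta t}E(h(t, r))$:

```latex
\begin{aligned}
\mathrm{Var}(r(t)) &= e^{-2\beta t}\,E(h(t, r)) - \bigl(E(r(t))\bigr)^2 \\
&= \frac{\sigma^2}{\beta}\, r(0)\left(e^{-\beta t} - e^{-2\beta t}\right)
   + \frac{\alpha\sigma^2}{2\beta^2}\left(1 - e^{-\beta t}\right)^2 .
\end{aligned}
```

As t → ∞ this expression tends to $\frac{\alpha\sigma^2}{2\beta^2}$, so the variance of the short rate in the CIR model also converges to a finite value.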
Example 11.8.1 Find the value of a ZCB when the short rate is governed by the
SDE dr = 0.
Solution As discussed above, the value of the ZCB, B(t, T) = g(t, r), is the solution
of (11.15). Since dr = 0 implies r is a constant, (11.15) reduces to $g_t = rg$. A
solution is given by
\[ \ln(g(t, r)) = c + rt, \]
where c is a constant. Using the boundary condition g(T, r) = 1 yields c = −rT.
Thus, $B(t, T) = e^{-r(T-t)}$, which is in agreement with our knowledge about bond
value when interest rate is a constant.
The next example is a particular case of the Ho-Lee model in which the short term
interest rate can be described by the SDE
\[ dr = \theta(t)\,dt + \sigma\,d\widetilde{W} \]
for some appropriate function θ(t). Comparing the above expression with (11.14),
we have δ(t, r(t)) = θ(t) and γ(t, r(t)) = σ.
Example 11.8.2 Find the value of a ZCB when the short rate is governed by the
SDE
\[ dr = a\,dt + \sigma\,d\widetilde{W}, \quad a \text{ is a constant.} \]
Solution The given SDE is similar to the Ho-Lee model SDE with θ(t) = a. Thus,
δ(t, r(t)) = a and γ(t, r(t)) = σ. The value of the ZCB, B(t, T) = g(t, r), is a
solution of (11.15), which, in the considered case, becomes
\[ g_t + a g_r + \frac{1}{2}\sigma^2 g_{rr} = rg. \]
Assume that $g(t, r) = e^{-r(T-t)} A(t, T)$ is a solution of the above PDE. Then, by
means of classical calculus, we have
\[ (rA + A_t) - aA(T - t) + \frac{1}{2}\sigma^2 A (T - t)^2 = rA, \]
that is,
\[ d\ln A = \left((T - t)a - \frac{1}{2}(T - t)^2\sigma^2\right)dt. \]
Integrating and thereafter using the boundary condition that B(T, T) = g(T, r) =
A(T, T) = 1, we get the constant of integration zero. Consequently,
\[ \ln A(t, T) = -\frac{(T - t)^2}{2}\,a + \frac{1}{6}(T - t)^3\sigma^2. \]
We thus have an explicit formula for $B(t, T) = e^{-r(T-t)} A(t, T)$, where A(t, T) can be
computed using the expression immediately above.
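The formula of this example can be checked numerically by substituting it back into the PDE. The sketch below uses hypothetical values a = 0.01 and σ = 0.01 and central finite differences; the function names are ours, and the residual is only an approximation of the left hand side minus the right hand side.

```python
import math

def zcb_price_constant_drift(r, t, T, a, sigma):
    """B(t, T) = exp(-r (T - t)) A(t, T) with ln A = -a (T - t)^2 / 2 + sigma^2 (T - t)^3 / 6."""
    tau = T - t
    log_a = -0.5 * a * tau**2 + sigma**2 * tau**3 / 6.0
    return math.exp(-r * tau + log_a)

def pde_residual(r, t, T, a, sigma, h=1e-4):
    """Central difference estimate of g_t + a g_r + 0.5 sigma^2 g_rr - r g."""
    g = lambda rr, tt: zcb_price_constant_drift(rr, tt, T, a, sigma)
    g_t = (g(r, t + h) - g(r, t - h)) / (2 * h)
    g_r = (g(r + h, t) - g(r - h, t)) / (2 * h)
    g_rr = (g(r + h, t) - 2 * g(r, t) + g(r - h, t)) / h**2
    return g_t + a * g_r + 0.5 * sigma**2 * g_rr - r * g(r, t)

# Hypothetical inputs: short rate 5%, two years to maturity
price = zcb_price_constant_drift(0.05, 0.0, 2.0, a=0.01, sigma=0.01)
residual = pde_residual(0.05, 0.0, 2.0, a=0.01, sigma=0.01)
print(round(price, 6), residual)
```

Setting a = 0 and σ = 0 recovers the constant rate price $e^{-r(T-t)}$ of Example 11.8.1.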
We now go back to the interest rate models studied in Sections 11.6 and 11.7.
Recall the Vasicek model. We want to find the bond price of a ZCB with face
value 1 and maturity time T when the dynamics of the market spot rate r(t) is
governed by the Vasicek model SDE (11.6). For this we need to solve the PDE
(11.15) under the Vasicek model. Comparing (11.6) with (11.13), we have δ(t, r(t)) =
α − βr(t) and γ(t, r(t)) = σ. We assume that a solution of (11.15) is of the form
\[ g(t, r) = e^{-rC(t,T) + A(t,T)}, \quad t \in [0, T], \]
where A(t, T) and C(t, T) are non-random functions to be determined.
Note that, for notational convenience, we have avoided writing the variables in all
functions. Thus, from (11.15), we get
\[ \left(A_t - rC_t - (\alpha - \beta r)C + \frac{1}{2}\sigma^2 C^2 - r\right) g = 0. \]
For this to hold for all values of r, we must have
\[ C_t = \beta C - 1, \qquad A_t = \alpha C - \frac{1}{2}\sigma^2 C^2. \]
Also, the terminal condition g(T, r) = 1 in (11.15) reduces to A(T, T) = 0, C(T, T) =
0. It is easy to solve the first equation to get
\[ C(t, T) = \frac{1}{\beta} + k e^{\beta t}. \]
Using the terminal condition C(T, T) = 0, we get $k = -\frac{e^{-\beta T}}{\beta}$. Thus,
\[ C(t, T) = \frac{1}{\beta}\left(1 - e^{-\beta(T - t)}\right). \]
Substituting the function C(t, T) in the second equation and thereafter solving
the resultant first order differential equation with terminal condition A(T, T) = 0, we can get the
function A(t, T). After some work (which we leave for the readers to complete) we
can see that
\[ A(t, T) = \frac{\left(\frac{1}{2}\sigma^2 - \alpha\beta\right)\bigl((T - t) - C\bigr)}{\beta^2} - \frac{\sigma^2 C^2}{4\beta}. \]
Once the bond price B(t, T) is known then the bond yield Y(t, T) can be determined
by using
\[ Y(t, T) = -\frac{1}{T - t}\ln(B(t, T)) = \frac{1}{T - t}\bigl(r(t)C(t, T) - A(t, T)\bigr). \]
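The formulas for C(t, T), A(t, T) and the yield translate directly into code. The sketch below uses hypothetical parameters (α = 0.006, β = 0.1, σ = 0.02, r(t) = 0.05, chosen only for illustration); the function name is ours.

```python
import math

def vasicek_zcb(r, t, T, alpha, beta, sigma):
    """Price exp(-r C + A) and yield (r C - A) / (T - t) of a unit ZCB in the Vasicek model."""
    tau = T - t
    C = (1 - math.exp(-beta * tau)) / beta
    A = (0.5 * sigma**2 - alpha * beta) * (tau - C) / beta**2 - sigma**2 * C**2 / (4 * beta)
    price = math.exp(-r * C + A)
    bond_yield = (r * C - A) / tau
    return price, bond_yield

# Hypothetical parameters (assumptions, not calibrated values)
price, bond_yield = vasicek_zcb(0.05, 0.0, 2.0, alpha=0.006, beta=0.1, sigma=0.02)
print(round(price, 6), round(bond_yield, 6))
```

A useful sanity check is the case σ = 0, where the price must equal the exponential of minus the integral of the deterministic rate path $\frac{\alpha}{\beta} + (r - \frac{\alpha}{\beta})e^{-\beta s}$ over the life of the bond.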
Next, suppose we wish to find the bond price B(t, T) when the spot rate r(t)
follows the dynamics of the CIR model. Recall the CIR model SDE (11.10). Comparing
it with (11.13), we have δ(t, r(t)) = α − βr(t) and $\gamma(t, r(t)) = \sigma\sqrt{r(t)}$.
Again we assume that the solution of the PDE (11.15) is of the form
\[ g(t, r) = e^{-rC(t,T) + A(t,T)}, \quad t \in [0, T], \]
where A(t, T) and C(t, T) are non-random functions to be determined. In the CIR
model case, the PDE (11.15) becomes
\[ \left(A_t - \alpha C + \left(-C_t + \beta C + \frac{1}{2}\sigma^2 C^2 - 1\right) r\right) g = 0. \]
The above equation holds for all r only when the following hold.
\[ C_t = \frac{1}{2}\sigma^2 C^2 + \beta C - 1, \qquad A_t = \alpha C. \]
The first of the two equations is a Riccati equation and can easily be solved by
a standard technique to get C(t, T). Once through with it, we can use the C(t, T)
in the second equation to solve it for A(t, T). In this solution procedure we also
need to use the terminal conditions A(T, T) = 0 and C(T, T) = 0. Moreover, the
bond yield is given by $Y(t, T) = \frac{1}{T - t}\bigl(r(t)C(t, T) - A(t, T)\bigr)$.
For both the Vasicek model and the CIR model, the ZCB yield is an affine
function of short rate r(t). Such models are therefore also called the affine yield
models.
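For the CIR model the two equations for C(t, T) and A(t, T) admit a well known closed form solution, which is not derived in this text. The sketch below states that standard closed form (with $\gamma = \sqrt{\beta^2 + 2\sigma^2}$, a symbol introduced here) as an assumption and uses it to price a ZCB with the parameters of Example 11.7.1, that is α = 0.22 × 0.06, β = 0.22, σ = 0.45; the function names are ours.

```python
import math

def cir_coefficients(t, T, alpha, beta, sigma):
    """C(t, T), A(t, T) with C_t = 0.5 sigma^2 C^2 + beta C - 1, A_t = alpha C, C(T,T) = A(T,T) = 0."""
    tau = T - t
    gamma = math.sqrt(beta**2 + 2 * sigma**2)
    e = math.exp(gamma * tau)
    denom = (gamma + beta) * (e - 1) + 2 * gamma
    C = 2 * (e - 1) / denom
    A = (2 * alpha / sigma**2) * math.log(
        2 * gamma * math.exp(0.5 * (beta + gamma) * tau) / denom)
    return C, A

def cir_zcb(r, t, T, alpha, beta, sigma):
    """Price exp(-r C + A) and yield (r C - A) / (T - t) of a unit ZCB in the CIR model."""
    C, A = cir_coefficients(t, T, alpha, beta, sigma)
    return math.exp(-r * C + A), (r * C - A) / (T - t)

alpha, beta, sigma = 0.22 * 0.06, 0.22, 0.45   # parameters of Example 11.7.1
price, bond_yield = cir_zcb(0.05, 0.0, 2.0, alpha, beta, sigma)
print(round(price, 6), round(bond_yield, 6))
```

Because the closed form is quoted rather than derived, it is worth confirming by finite differences that the returned C and A satisfy the two differential equations and the terminal conditions before relying on the prices.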
Summary and Additional Notes 367
discussed in Sections 11.6-11.8, there are several other term structure models
studied in the literature under different set ups. Prominent among them are the
binomial lattice model of Black-Derman-Toy [13] and the continuous-time
Heath-Jarrow-Morton (HJM) model [61]. The HJM model evolves the yield curve in
terms of the forward rates. This model is so significant that every term
structure model driven by the Brownian motion is an HJM model, including
the Vasicek model and the CIR model. For details on how the model dynamics
and its evaluation can be described we urge the interested readers to go through
Chapter 10 in [121]. The continuous-time models of Black-Derman-Toy and
Black-Karasinski [14] instead assume that ln r(t) is normally distributed. The
theory of one-factor models has also been extended to two-factor and
multi-factor models which involve several factors, like the short term rate, the long
term rate, etc. The main difficulty in the latter models lies in their calibration,
which requires a lot of data to determine all the parameters in the model
under consideration. There are several other models, like the Heston model [62],
which assume stochastic volatility, besides stochastic short-term returns,
governed by two SDEs. Certain studies are available on the web which try to
create an appropriate model for a specific market of a specific country. The literature
in this context is too vast to be quoted here, but we encourage readers to
take a look at some excellent texts and web pages on interest rate models.
• Another significant point to bring forth is the calibration of the interest rate
models. Calibration of interest rate models under the risk neutral measure
typically entails the availability of some derivatives such as swaps, caps or
swaptions. The primary tools used to estimate the parameters in a system of
stochastic differential equations are the Bayesian method, Cholesky decomposition,
principal component analysis, Monte Carlo simulation, various regression
techniques, interpolations and splines, generating functions, and machine learning.
We refer to a few texts [69, 142, 126] in this regard.
11.10 Exercises
Exercise 11.1 Find the spot rate for a 1-year ZCB trading at Rs 92 with a face
value of Rs 100.
Exercise 11.2 A 1-year ZCB with face value Rs 100 is currently selling for Rs
85. What is the interest rate after 6 months if the investment for 6 months in zero
coupon bonds gives a continuously compounding annual return of 12%.
(Hint: Find y(0.5) using B(0, 1) er = Fe−y(0.5)/12 ).
Exercise 11.3 A coupon bond with a face value Rs 100 makes coupon payments
of Rs 1.50 every three months. What is the coupon rate?
Exercise 11.4 Which asset is considered to be riskier, a 3-month ZCB or a
30-year coupon bond maturing in 3 months?
Exercise 11.5 If the 6-month spot rate is 3% and the 1-year spot rate is 5%, then
find the price of a 1-year bond with an 8% annual coupon rate payable semiannually
and a face value Rs 100.
(Hint: Use $B(0, 1) = Fce^{-r_1} + (Fc + F)e^{-2r_2}$ and note c = 0.04).
Exercise 11.6 You intend to purchase a 10-year, Rs 100 face value bond that
pays Rs 5 coupon every 6 months. If the required continuous compounding return
is 5% per annum, how much should you be willing to pay for the bond ?
(Hint: Example 11.2.6 with coupon value).
Exercise 11.7 Compute the price of a 5-year coupon bond with a 6% annual
coupon payable semiannually, an annual bond yield of 8%, and a face value of
Rs 100.
(Hint: Example 11.3.4 with n = 10, c = 0.03, $Y_m$ = 0.04).
Exercise 11.8 What is the yield to maturity on a 10-year ZCB with a face value
of Rs 100 that is selling for Rs 65? (Hint: Use $B(0, T) = Fe^{-TY_m}$ to compute YTM
$Y_m$).
Exercise 11.9 A bond with 20 years remaining to maturity that is selling for Rs 120
has a 9% annual coupon paid semiannually. If the face value of the bond is Rs
100, calculate the bond yield to maturity.
Exercise 11.10 Calculate the YTM on a 1-year coupon bond with annual coupon
Rs 10 and the face value Rs 100. The bond is selling for Rs 95.
(Hint: Example 11.3.5 with coupon rate replaced by coupon value).
Exercise 11.11 Which of the following securities has the higher YTM?
(a) A 1-year ZCB with a face value of Rs 100 that is selling at Rs 90.
(b) A 10-year bond with a 5% coupon and a face value of Rs 100 that is selling at
par.
(Note: A bond selling at par means that it is selling at full face value).
Exercise 11.12 Compute the forward rate when the 6-month spot rate is 3%, and
the 1-year spot rate is 5%.
Exercise 11.13 A 3-month ZCB is selling for Rs 95, a 6-month ZCB is selling
for Rs 92, a 9-month ZCB is selling for Rs 88 and a 1-year ZCB selling for Rs
85. All four bonds have a face value of Rs 100. Compute the spot rates for the
four periods. Also, compute the second, third and fourth periods forward rates.
(Hint: Use Definition 11.3.1 with F = 100 and then (11.3)).
Exercise 11.14 For ZCBs, each bond with face value Rs 1, the following bond
prices per annum are observed.
B(0,1) B(0,2) B(0,3) B(0,4) B(0,5)
0.9654 0.9173 0.8735 0.8115 0.7855
For each maturity year, compute the bond yields and the 1-year implied forward
rate.
Exercise 11.15 The ZCB continuously compounded annual yields are observed
as follows.
Y(0,1) Y(0,2) Y(0,3) Y(0,4) Y(0,5)
0.03 0.035 0.04 0.045 0.05
For each maturity year, compute the ZCB prices and the 1-year implied forward
rate. What are the simple rate annual yields on the bonds?
(Hint: Simple rate annual yield y(0, T) should be such that $1 + y(0, T) = e^{Y(0,T)}$).
Exercise 11.16 Suppose a loan of Rs 100000 will be taken 3 months from now
(date of borrow). It is expected that the loan will be repaid 1 year from now (repay-
ment date). An FRA is designed with guaranteed 6% annual simple interest rate
for 1 year on Rs 100000. The actual interest rate is 4% on the date of borrow.
Determine the settlement of the FRA if the settlement occurs on the date the loan
is (i) borrowed, (ii) repaid.
(Hint: If the FRA is settled at the time the money is borrowed, payments will be
less than when the same is settled on the date of repayment because the borrower
has time to earn interest on the FRA settlement.)
Exercise 11.17 Suppose that the risk neutral probabilities are equal to 0.5 in
every state. If the short-term rates (in percentage per month) are described by the
following table, then find the prices of a 3-month ZCB with face value Rs 100 (take
a one month step).
0.0101
0.0098 0.0091
0.0095 0.0088 0.0081
Exercise 11.18 A discrete-time model is used to model both the price of a non-
dividend paying stock and the short-term interest rate. The stock is selling for Rs
100 and the annual simple interest rate is 5%. After 1 year, only two states, upstate
and downstate, are observed in the economy. The stock prices are Rs 110 (upstate)
and Rs 95 (downstate) and the annual short interest rates are 6% (upstate) and
4% (downstate). Find the price of 2-year ZCB with face value Rs 100.
(Hint: Compute risk neutral probability measure p∗ using 1 period binomial lattice
model for stock. Find B(0, 2) using $B(0, 2) = \frac{1}{1+r}\bigl(p^* B_u(1, 2) + (1 - p^*)B_d(1, 2)\bigr)$.)
Exercise 11.19 Compute the spot rate binomial lattice model for 6 months with
upward and downward parameters 1.25 and 0.9 respectively. Using this lattice,
compute the price of a 4-month ZCB with face value 100. A European call option
is written on the above bond. If the strike price of the call is Rs 85 and expiration
time 3 months then find the price of the call.
Exercise 11.20 An American put option is written on the 3-year ZCB with face
value Rs 100 governed by the following short rate lattice.
0.08
0.064 0.059
0.052 0.047 0.041
0.042 0.038 0.032 0.026
If the strike price of put is Rs 110 and expiration 2 years, then calculate the put
price.
Exercise 11.21 Suppose the following market data for ZCBs with a maturity
payoff of Rs 1 is given.
Calibrate the data on a 2-period interest rate binomial lattice with upstate annual
interest rate $r e^{2\sigma}$ and downstate annual interest rate $r$. Calculate $r$.
Note:
\[
B(0, 2) = B(0, 1)\left[\frac{1}{2(1 + r_u)} + \frac{1}{2(1 + r_d)}\right].
\]
Exercise 11.22 Given a current short rate of 10% and upward and downward parameters
u = 1.15 and d = 0.95, generate a 3-period binomial lattice of the short-term
rate. Using this lattice, determine the value of a 2-period coupon bond with face
value Rs 100. The coupons are paid at every period at 4% on the bond face value.
(Hint: Take B(0, 3) = 104 for all four nodes at third period. Move backward and
at each preceding node add a coupon of 4 to the node value to get the actual value
(bond price) of that node).
Exercise 11.23 Suppose that the short rate process {r(t), t ≥ 0} follows the fol-
lowing SDE
dr = θdt + βdW,
where θ > 0 and β > 0 are constants. Is the process mean reverting? Justify.
Exercise 11.24 Suppose the short rate process {r(t), t ≥ 0} follows the Vasicek
interest rate model with α = 0.01, β = 0.1, σ = 0.02, and the short rate is 10%.
Find the 2-year ZCB price of face value Rs 1. What is the bond yield?
Exercise 11.25 Suppose that the short rate is currently 3% and its volatility mea-
sure is 0.9% per annum. What happens to the volatility measure when the short
rate increases to 5.5% in (i) Vasicek’s model, (ii) the CIR model?
12
Optimal Trading Strategies
12.1 Introduction
The concept of portfolio optimization and diversification has played a key role in
the development and understanding of financial markets. The major breakthrough
came in 1952 with the publication of Harry Markowitz’s theory of portfolio se-
lection [90]. The theory, popularly referred to as mean-variance portfolio theory,
provided an answer to the investor's fundamental question: How should an investor
allocate funds among the possible investment choices? Thus, the major interest
for the investor is to have a balance between the total risk of the portfolio and its
expected return. Investors generally set their priority in terms of minimizing total
risk and maximizing return of the portfolio. Along the lines of Markowitz, Sharpe
[119] proposed the capital asset pricing model (CAPM), which introduces the notions
of systematic risk and specific risk. We have already studied Markowitz’s theory
and CAPM in Chapter 5. An interesting question which every investor faces apart
from holding the shares is about how to acquire them. This is where optimal trad-
ing strategy enters into the picture. The goal of optimal trading strategies is to
formulate a mathematical approach which tries to answer some of those questions
that arise during the phase of implementation of investments.
With the established positions, fund managers or the institutional investors
need to re-balance their portfolios frequently, either to include new stock picks,
sell stocks that are out of favor, or to improve the risk/reward characteristics of the
portfolio. This generates huge orders that must be executed in a fixed time horizon.
The execution costs associated with such orders can be substantial. Numerous
studies have shown that these costs typically comprise the largest component of the
fund tracking error. This is hardly a problem for an individual investor, as trading
volumes are normally small and he/she hardly worries about the execution costs in
acquiring or selling out of a portfolio. On the other hand, the quantities traded
Taxes
Commissions
Investment Delays
Market Impact
average execution price. It causes investors to pay premiums to complete the buy
orders and provides discounts to complete the sell orders. Market impact is caused
by two primary reasons. These are (i) Supply Demand Imbalance (liquidity needs)
and (ii) Information Leakage.
In an efficient market, where the price of every security fully reflects all available
information and hence is equal to its true investment value, prices adjust contin-
uously to ensure that buying demand equals selling demand. As investors seek
to buy shares, they are required to raise their price to attract additional sellers
into the market. Further, due to the immediacy needs, investors often have orders
that are larger than the quantity of shares available at the best market quote.
To achieve immediate execution (market order), it is often necessary to eat into
the limit order book, making each successive transaction more expensive. This
shift in the price of shares is temporary and the market soon returns to the equilibrium
position where demand equals supply. This type of market impact is called temporary
market impact.
Every time an order is released to the market, it conveys information regarding the
investment and trading intentions of investors, which in turn changes the beliefs of
other investors for a long time. This causes the market to believe that the future
prices will be different than originally expected or that there is a change in the stock’s
intrinsic value. This brings a quick price adjustment causing a jump or drop in
the price that remains permanent. This type of market impact is called permanent
market impact.
A portfolio manager while minimizing market impact may increase timing risk
(market risk) of his transaction. To explain clearly, take a simple example of
buying a stock whose price in the market is currently Rs 100. If we need to buy
a large quantity of this stock, we can place a market order to immediately buy
the stock. Then we might drive the price of the stock up so that the average
cost of buying is say Rs 102 or Rs 103. An obvious preventative step to reduce
temporary as well as permanent market impact is to trade more slowly, i.e. break
the large trade into a number of small trades and trade over a longer interval.
In this case, market risk arises because the price of the stock might move in the
adverse direction (increase) due to trades by other market participants in the
same interval. Thus, the trading strategy of an investor is the decision of how fast
to trade, and it depends on the balance one wishes to strike between the
execution cost (due to market impact) and the market risk.
Thus, trading for institutional investors is not about making profits. It is about
rapid execution with minimal execution costs. Therefore, the aim of the ‘optimal
trading strategies’ is to formulate a mathematical approach to address the questions
and issues that arise during the implementation phase of the investment
cycle, such as: How do we estimate trading cost? How long should execution take?
How should the order be sliced? Should we trade aggressively or passively?
To begin with, in the following, we describe
the types of orders available to traders while trading large/small orders in
the market.
Orders are trade instructions. They specify what traders want to trade, whether
to buy or sell, how much, when and how to trade, and, most important, on what
terms. Thus, orders are the fundamental building blocks of the trading strategies.
To trade effectively, a trader must specify exactly what he/she wants. An order
submission strategy is the most important determinant of success of a trader.
Therefore, the proper order used at the right time can make the difference between
a good trade, a costly trade and no trade at all [59].
Traders indicate their willingness to buy or sell by making bids or offers re-
spectively. Traders quote their bids and offers when they arrange their own trades.
Otherwise they use orders to convey their bids and offers to the brokers or auto-
mated trading systems that arrange their trades.
The highest bid price in a market is the best bid. The lowest offer price is the
best offer (or, equivalently best ask). Traders also call them the market bid and
the market offer because they are the best prices available in the market. The
prices at which orders fill are trade prices. The difference between the best bid
and the best ask is the bid/ask spread.
A market is said to be liquid when traders can trade without a significant adverse
effect on the execution price. An order offers liquidity, or equivalently supplies liquidity,
if it gives other traders an opportunity to trade large size quickly at a low
cost. Both buyers and sellers can offer liquidity. Buyers offer liquidity when their
bids give other traders an opportunity to sell; sellers offer liquidity when their offers
give other traders an opportunity to buy.
Markets and traders treat orders differently, depending on whether they are
agency orders or proprietary orders. Agency orders are orders that brokers repre-
sent as agents for their clients. Proprietary trading (also “prop trading”) occurs
when a firm trades stocks, bonds, currencies, commodities, their derivatives, or
other financial instruments, with the firm’s own money as opposed to its cus-
tomers’ money, so as to make a profit for itself.
In the subsequent section we will define various types of orders that traders
send to the market once they decide upon how much to execute, their time limit,
and the side of the trade (i.e. buy or sell).
Liquidity demanders place market orders and liquidity suppliers place limit or-
ders. For a round trip (a purchase and sale together) the liquidity demander pays
the spread and the liquidity supplier earns the spread. The size of the bid-offer
spread in a security is one measure of the liquidity of the market and of the size
of the transaction cost.
Large market orders are more difficult to execute than smaller ones. Traders
willing to take the other side of a large trade are often hard to find.
One reason is the possibility of informed trading. To attract
buyers (or sellers), impatient traders often move prices. Large buyers increase the bid
prices of their orders to encourage sellers to sell to them, and vice versa. The
premiums that large buyers pay and the discounts that large sellers offer are price
concessions.
Let us explain market impact using an example. Assume that the shares
currently offered in the market are 500 shares at the first level with ask
price Rs 100, and 300 shares at the second level with ask price Rs 101. Now a trader
wants to buy 700 shares. He/she therefore has to look at the ask side of the
market, i.e. at whether sellers are available. The trader will buy 500 shares at Rs 100
and the remaining 200 shares at Rs 101, so the impact price at which he/she buys is
(500 × 100 + 200 × 101)/700 = 100.2857. Similarly, suppose 300 shares are
available at the first level on the bid side with bid price Rs 99. If the trader buys
200 shares at Rs 100 and sells 200 shares at Rs 99, then he/she has paid Rs 1 per share
as the bid-ask spread for this round trip.
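The impact price in this example can be reproduced by walking the ask side of the book level by level. The following is a minimal sketch; the function name `average_fill_price` and the list-of-tuples representation of the book are illustrative assumptions, not notation from the text.

```python
def average_fill_price(levels, quantity):
    """Average price paid by a market buy order that walks the ask side.

    levels: list of (price, shares_available), best (lowest) ask first.
    quantity: number of shares to buy.
    """
    remaining = quantity
    cost = 0.0
    for price, available in levels:
        take = min(remaining, available)   # fill as much as this level allows
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough shares on offer to fill the order")
    return cost / quantity


asks = [(100.0, 500), (101.0, 300)]   # the two ask levels from the example
impact_price = average_fill_price(asks, 700)
print(round(impact_price, 4))         # 100.2857
```

An order of at most 500 shares fills entirely at the best ask, so it shows no impact in this sketch.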
The prices at which market orders trade depend on current market conditions.
Since market conditions can change quickly, traders who use market orders are at
risk of trading at worse prices than what they expected. This risk in literature is
termed as execution price uncertainty. Execution price uncertainty is due to quote
changes that may occur between the submission of an order and its execution, and
to the unpredictable price concessions that may be required to fill large orders.
Thus, those traders who are concerned about the execution price risk may prefer
to submit limit orders.
A limit order is an instruction to trade at best price available, but only if it
is no worse than the limit price specified by the trader. For buy orders, the trade
price must be at or below the limit price. For sell orders, the price must be at or
above limit price.
In continuously trading markets, a broker (or an exchange) will attempt to trade
a newly submitted limit order as soon as it arrives. If no trader is immediately
willing to take the opposite side at an acceptable price, the order will not be
traded. Instead, it will stand as an offer to trade until someone is willing to trade
at its limit price, until it expires, or until the trader who submitted it cancels it.
Standing limit orders are placed in a file called a limit order book.
The probability that a limit order will trade depends on its limit price. If the
limit price of a buy order is too low, the order will not trade. Likewise, if a sell
limit price is too high, the order will not trade. Buy limit orders with high prices
and sell limit orders with low prices are aggressively priced. Aggressively priced
limit orders are the easiest limit orders to fill.
Traders classify limit orders with limit prices at which they are placed relative
to the market. The market is the range of prices bounded above by the best offer
(lowest price) and below by the best bid (highest price). A marketable limit order
is an order that the broker can execute immediately when a trader submits it. The
limit price of a marketable limit buy order is at or above the best offer. The broker
therefore can manage to buy immediately from the seller quoting the best offer.
Moreover, marketable limit orders are like market orders, except that they limit
the price concessions that brokers can make to fill them. Marketable limit orders
with very high limit buy prices or very low sell prices are essentially market orders.
Traders use marketable limit orders instead of market orders to limit execution
price uncertainty and to limit what they will pay for liquidity.
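The marketability test described above can be sketched as a one-line comparison against the current quotes. The function name `is_marketable` is hypothetical, and the sell-side rule (limit price at or below the best bid) is an assumption: the text states the rule for buy orders only, and the sell side is taken here as its mirror image.

```python
def is_marketable(side, limit_price, best_bid, best_offer):
    """Return True if a limit order could execute immediately.

    A buy is marketable when its limit price is at or above the best offer.
    The sell-side rule is assumed to be the mirror image of the buy-side rule.
    """
    if side == "buy":
        return limit_price >= best_offer
    if side == "sell":
        return limit_price <= best_bid
    raise ValueError("side must be 'buy' or 'sell'")


best_bid, best_offer = 99.0, 100.0
print(is_marketable("buy", 100.5, best_bid, best_offer))   # True
print(is_marketable("buy", 99.5, best_bid, best_offer))    # False
```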
Limit buy orders that stand at the best bid, and limit sell orders that stand at
the best offer, are at the market. The traders who submit these orders make the
market. To summarize, marketable limit orders are the most aggressively priced
limit orders. Traders who submit standing limit orders offer liquidity to the other
traders. Their limit orders give others the opportunity to trade when they want
to trade. In particular, sell limit orders are call options that give other traders
an opportunity to buy when they want to buy. Buy limit orders, likewise, are put
options that give other traders an opportunity to sell when they want to sell. The
option strike price is the limit price.
Another type of order is the stop order. A stop instruction stops an order from
executing until price reaches a stop price specified by the trader. Traders attach
stop instructions to their orders when they want to buy only after price rises to
the stop price or sell only after price falls to the stop price. Orders with the stop
instructions are called stop orders.
Traders most commonly use stop orders to stop their losses when prices move
against their positions. For example, suppose that a trader buys 100 sugar futures
contracts at Rs 100 each. To limit the potential loss on this position, he/she may
issue a market sell order for 100 contracts with a stop price of Rs 90. If the sugar
price drops to or below Rs 90, his/her broker will immediately try to sell 100 contracts
at the best price then available in the market. Traders often call such orders stop
loss orders. The price at which a stop order executes may not be the stop price.
In the above example, if sugar falls quickly from Rs 97 to Rs 88, his/her broker
may only be able to sell the 100 contracts at Rs 87.5.
When traders attach a stop instruction to a limit order, they must specify
two prices. The stop price indicates when the limit order becomes active, and the
limit price indicates the terms upon which a trade may be arranged. The combined
order is a stop limit order. With a stop limit order, traders do not need to monitor the
market, and thus are free to attend to other business.
Stop orders accelerate price changes. Prices often change because traders on
one side of the market demand more liquidity than what is available. When these
price changes activate stop orders, the stop orders add to the one-sided demand
for liquidity. Stop orders accelerate price changes by adding buying pressure when
prices are rising and selling pressure when prices are falling. They demand liquidity
when it is least available. Traders claim that stop orders add momentum to the
market. Traders who pursue momentum trading strategies buy when prices are
rising and sell when prices are falling. They basically take advantage of stop
orders to create momentum in the market. On the other hand, contrarian traders
employ the opposite trading strategy. They buy when prices are falling and sell
when prices are rising. They therefore stabilize prices when they trade.
A trailing stop order is entered with a stop parameter that creates a moving or
trailing activation price, hence the name. This parameter is entered as a percentage
change or actual specific amount of rise (or fall) in the security price. Trailing stop
sell orders are used to maximize and protect profit as a stock’s price rises and limit
losses when its price falls. Trailing stop buy orders are used to maximize profit
when a stock’s price is falling and limit losses when it is rising.
For example, a trader has bought stock ABC at Rs 10 and immediately places
a trailing stop sell order on ABC with a Rs 1 trailing stop. This sets the stop
price to Rs 9. After the order is placed, ABC does not exceed Rs 10 and falls to
a low of Rs 9.01. The trailing stop order is not executed because ABC has not
fallen Rs 1 from Rs 10. Later, the stock rises to a high of Rs 15, which resets the
stop price to Rs 14. It then falls to Rs 14 (Rs 1 from its high of Rs 15) and the
trailing stop sell order is entered as a market order.
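The mechanics of the ABC example can be traced with a short simulation. This is only a sketch: the function name `trailing_stop_sell` and the price path are illustrative assumptions, chosen to pass through the levels quoted in the example (Rs 9.01, Rs 15 and Rs 14).

```python
def trailing_stop_sell(prices, entry_price, trail):
    """Return (trigger_index, stop_price) for a trailing stop sell order.

    prices: observed prices after the order is placed, in time order.
    The stop trails the highest price seen so far by `trail`; the order
    is triggered at the first price at or below the current stop.
    Returns (None, stop_price) if the stop is never hit.
    """
    high = entry_price
    stop = high - trail
    for i, price in enumerate(prices):
        if price > high:        # new high: the stop ratchets up, never down
            high = price
            stop = high - trail
        if price <= stop:
            return i, stop
    return None, stop


path = [10.0, 9.01, 12.0, 15.0, 14.5, 14.0]   # hypothetical price path
print(trailing_stop_sell(path, entry_price=10.0, trail=1.0))   # (5, 14.0)
```

The dip to Rs 9.01 does not trigger the order because the stop is still at Rs 9; the trigger occurs only at the last price, after the high of Rs 15 has moved the stop to Rs 14.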
Another type of order is the good-till-cancelled order. A good-till-cancelled (GTC)
order is an order to buy or sell a security at a specific or limit price that lasts
until the order is completed or cancelled. A GTC order will not be executed until
the limit price has been reached, regardless of how many days or weeks it might
take. Investors often use GTC orders to set a limit price that is far away from the
current market price. Some brokerage firms may limit the time a GTC order can
remain in effect and may charge more for executing this type of order. An uptick is
when the last (non-zero) price change is positive, and a downtick is when the last
(non-zero) price change is negative. Any tick sensitive instruction can be entered
at the trader’s option, for example buy on downtick, although these orders are
rare.
So far in this section, we have discussed various types of orders that traders
execute. The decision of how much to trade depends upon market
conditions such as the price and volume available in the market. Thus traders generally
use technical indicators, computed from historical price and volume changes in previous
trading intervals, to forecast for the next trading interval.
Nowadays traders prefer to use algorithmic trading strategies that learn the market
conditions and, based on favorable trading rules/circumstances, schedule the
trade for the current trading interval. Traders who trade large blocks
face market impact on their execution price.
In the following sections, we describe some models which provide quantitative
trading strategies that help traders to minimize the market impact.
These models are referred to as execution models.
difference between the theoretical benchmark price and the actual price received
is the implementation shortfall (Perold [107]).
The first quantitative description of estimating and minimizing market impact
was given by Bertsimas and Lo [11] in 1998. Here, given a fixed block S̄
of shares to be executed (acquired/liquidated) within a fixed time interval [0, T],
and given price dynamics that captures price impact, the objective is to find an
optimal sequence of trades (as a function of the state variables) that will minimize
the expected cost of executing S̄ shares within time T. For this, Bertsimas and
Lo [11] considered various price processes and employed dynamic programming
technique to develop optimal trading strategies for a risk neutral investor. We will
discuss their models and also limitations of these models.
Another major contribution in this area is due to Almgren and Chriss [2],
who observed that market impact and timing risk associated with the execution
of trades are conflicting concepts and hence it is not possible to simultaneously
minimize both. They proposed a mean variance based objective function for their
model and obtained the static strategies where the optimal trading path was
determined in advance of trading. Further, a model is called static if there is no
serial correlation among prices, or no change in the market perception of the price
of the stock. In general, we expect the optimal strategy to be dynamic because the
optimal trading path cannot be determined a priori, before trading begins, as one needs to
update the optimal strategy based on observed state variables at each time point.
We shall not discuss the details of Almgren and Chriss’ model here and refer
to [2] for further study in this regard. However, taking motivation from [2], we
shall incorporate variance terms in the original models of Bertsimas and Lo [11]
and present a static approximation technique for the same (Khemchandani et al.
[76]).
The problem statement for obtaining optimal strategy that minimizes the ex-
ecution cost can be formulated mathematically as follows: Consider an investor
who wishes to buy a large block S̄ of shares of some stock over a fixed time in-
terval [0, T]. We divide the interval [0, T] into N subintervals of length τ = T/N
and define the discrete times tk = kτ, for k = 0, 1, . . . , N. Thus t0 = 0 and tN = T.
Here it is understood that the trading begins from the first period. Thus at t1 , S1
shares are purchased at price P1 . In general, at tk , Sk shares are purchased at price
Pk , k = 1, . . . , N. Let Wk be the number of shares that remain to be purchased at
time tk . The investor’s objective is to minimize expected cost of buying S̄ shares
for a given level of risk, say V ∗ . If we use variance of cost of buying S̄ shares as a
measure of timing risk [2], [64], then investor’s problem is represented as follows.
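\[
\begin{aligned}
\min_{\{S_k\}} \quad & E_1\Big[\sum_{k=1}^{N} P_k S_k\Big] \\
\text{subject to} \quad & \mathrm{Var}_1\Big[\sum_{k=1}^{N} P_k S_k\Big] \le V^{*}, \\
& \sum_{k=1}^{N} S_k = \bar{S}, \\
& S_k \ge 0 \quad (k = 1, 2, \ldots, N).
\end{aligned}
\]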
Here E1 and Var1 respectively represent the expected cost and risk which the
trader will face at the start of the interval, assuming that he/she has traded
{S1 , S2 , . . . , SN } shares in the intervals {T1 , T2 , . . . , TN } respectively. The constraint
Sk ≥ 0 implies shares purchased while Sk ≤ 0 implies shares sold. The problem
to sell S̄ shares in time [0, T] is a symmetric problem. We need to maximize the
objective function, i.e. maximize the revenue generated by liquidating S̄ shares.
We can reformulate the above problem as
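\[
\begin{aligned}
\min_{\{S_k\}} \quad & E_1\Big[\sum_{k=1}^{N} P_k S_k\Big] + \lambda\,\mathrm{Var}_1\Big[\sum_{k=1}^{N} P_k S_k\Big] \\
\text{subject to} \quad & \sum_{k=1}^{N} S_k = \bar{S}, \\
& S_k \ge 0 \quad (k = 1, 2, \ldots, N),
\end{aligned}
\tag{12.1}
\]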
This model is the same as the basic model of Bertsimas and Lo [11], who derived the
optimal solution of (12.1) for a risk neutral investor under price process (12.2), using
dynamic programming. The basic requirements for any dynamic programming
problem are the state of the environment at time tk , the control variables, the
randomness, the objective function, and the law of motion. In our context, the
state at time tk , for k = 1, . . . , N, consists of the price Pk−1 realized at the previous
period, and Wk , the number of shares remaining to be purchased at time tk . The state
variables give all the information the investor requires in each period tk to make
a decision regarding the control. The control variable at time tk is the number of
shares Sk purchased. The objective function is given by
\[
\min_{\{S_k\}} E_1\Big[\sum_{k=1}^{N} P_k S_k\Big],
\]
state and control are, the remaining decisions must constitute an optimal policy
with regard to the state resulting from the first decision”. Therefore for every tk ,
k = 1, . . . , N, the sequence $\{S_k^*, S_{k+1}^*, \ldots, S_N^*\}$ must still be optimal for the remaining
program $V_k(P_{k-1}, W_k) = E_k\big[\sum_{i=k}^{N} P_i S_i\big]$. This property is summarized by the Bellman
equation, which relates the optimal value of the objective function in period tk to
its optimal value in period tk+1
By starting at the end (time tN ) and applying the Bellman equation (12.4)
and the law of motion for Pk (12.2) and Wk (12.3) recursively, the optimal value
function can be derived as a function of the state variables. The Bellman equation
(12.4) for time tN is given by
Since this is the last period and WN+1 = 0 by (12.3), there is no choice but to
execute the entire remaining order WN , hence the optimal trade size S∗N = WN . In
the next to last period tN−1 , the Bellman equation is
Substituting the law of motion for PN−1 from (12.2) and WN from (12.3), we get
Using the fact that EN−1 [εN−1 ] = 0, and using the right-hand side of (12.5), the
above equation can be expressed as explicit function of SN−1 as
This can be minimized by taking its derivative with respect to SN−1 , and solving
for its zero, yielding
\[
S^*_{N-1} = W_{N-1}/2, \qquad
V_{N-1}(P_{N-2}, W_{N-1}) = W_{N-1}\left(P_{N-2} + \frac{3}{4}\,\theta W_{N-1}\right).
\]
Continuing through backward recursion, we get
\[
S^*_1 = W_1/N, \qquad
V_1(P_0, W_1) = W_1\left(P_0 + \frac{N+1}{2N}\,\theta W_1\right).
\]
Substituting the initial condition W1 = S̄ from (12.3) into above equation gives
The optimal execution strategy is simply to divide the total order S̄ into N equal
parts and trade them at regular intervals. This strategy is called a “naive” strategy.
This simple trading strategy comes from the fact that the price impact θSk does
not depend on either the prevailing price Pk−1 or the size of the unexecuted order
Wk . Hence, the price impact function is same in each period and independent from
one period to the next.
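The optimality of the naive strategy can be checked numerically by comparing expected costs of different deterministic schedules. The sketch below assumes the linear permanent-impact law of motion P_k = P_{k-1} + θS_k + ε_k with zero-mean noise, so that the expected price after k trades is P_0 plus θ times the shares bought so far. The function name `expected_cost` and all parameter values are illustrative assumptions.

```python
def expected_cost(schedule, p0, theta):
    """Expected cost of a deterministic buy schedule.

    Assumes P_k = P_{k-1} + theta*S_k + eps_k with E[eps_k] = 0, so that
    E[P_k] = p0 + theta * (shares bought up to and including period k).
    """
    cost = 0.0
    bought = 0.0
    for s in schedule:
        bought += s
        cost += (p0 + theta * bought) * s
    return cost


S_bar, N, p0, theta = 100_000, 20, 50.0, 5e-5   # illustrative values only
naive = [S_bar / N] * N
# Alternative schedule: half of the order up front, the rest spread evenly.
front_loaded = [S_bar / 2] + [S_bar / (2 * (N - 1))] * (N - 1)
# Value function V_1 from the backward recursion, evaluated at W_1 = S_bar.
closed_form = S_bar * (p0 + (N + 1) / (2 * N) * theta * S_bar)

print(expected_cost(naive, p0, theta), closed_form)
print(expected_cost(front_loaded, p0, theta))
```

Under these assumptions the naive schedule reproduces the closed-form value, and the front-loaded schedule costs more in expectation.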
static models, Huberman and Stanzl [64] have shown that an equivalent dynamic
program exists and one can find the optimal strategy for risk averse investors. We
used the same result to derive the optimal strategy for risk averse investor for
basic model of Bertsimas and Lo.
For a risk averse investor, the objective function is given by
\[
\min_{\{S_k\}} E_1\Big[\sum_{k=1}^{N} P_k S_k\Big] + \lambda\,\mathrm{Var}_1\Big[\sum_{k=1}^{N} P_k S_k\Big].
\]
The law of motion for state variables Pk and Wk are as before in (12.2) and
(12.3) respectively. The Bellman equation for a risk averse investor is given by
\[
V_k(P_{k-1}, W_k) = \min_{\{S_k\}} \Big\{ E_k\big[P_k S_k + V_{k+1}(P_k, W_{k+1})\big] + \lambda\,\mathrm{Var}_k\big[P_k S_k + V_{k+1}(P_k, W_{k+1})\big] \Big\}.
\tag{12.10}
\]
As before, the Bellman equation at time tN is given by
\[
V_N(P_{N-1}, W_N) = \min_{\{S_N\}} \Big\{ E_N[P_N S_N] + \lambda\,\mathrm{Var}_N[P_N S_N] \Big\}.
\]
Substituting the law of motion for PN−1 from (12.2) and WN from (12.3) and
simplifying, we get a quadratic function explicit in SN−1 which can be minimized
by solving for zero of its derivative as before. The best execution strategy and
optimal value function can be obtained by recursively solving. Here, they are
given by
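\[
S^*_{T-k} = b_k W_{T-k}, \qquad
V_{T-k} = P_{T-k-1} W_{T-k} + (a_k + \lambda\sigma^2)\, W_{T-k}^2,
\]
for k = 0, 1, . . . , T − 1, where
\[
a_k = \theta\left(1 - \frac{\theta}{4(a_{k-1} + \lambda\sigma^2)}\right), \qquad a_0 = \theta,
\tag{12.11}
\]
\[
b_k = 1 - \frac{\theta}{2(a_{k-1} + \lambda\sigma^2)}, \qquad b_0 = 1.
\tag{12.12}
\]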
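The recursions (12.11) and (12.12) translate directly into a trade schedule. The following is a minimal sketch: the function name `risk_averse_schedule` and the parameter values (in particular the variance `sigma2`) are assumptions for illustration and are not the values behind the figures. Only λ ≥ 0 is exercised here.

```python
def risk_averse_schedule(total_shares, n_periods, theta, sigma2, lam):
    """Trade sizes S_1..S_N built from the recursions (12.11)-(12.12).

    b[k] is the fraction of the remaining order traded when k periods
    remain after the current one; a[k] is the matching value coefficient.
    """
    a = [theta]   # a_0 = theta
    b = [1.0]     # b_0 = 1: the last period trades everything that is left
    for _ in range(1, n_periods):
        denom = a[-1] + lam * sigma2          # a_{k-1} + lambda * sigma^2
        b.append(1.0 - theta / (2.0 * denom))
        a.append(theta * (1.0 - theta / (4.0 * denom)))

    remaining = float(total_shares)
    trades = []
    for period in range(1, n_periods + 1):
        k = n_periods - period                # periods left after this one
        s = b[k] * remaining
        trades.append(s)
        remaining -= s
    return trades


# Illustrative parameters only (sigma2 in particular is an assumed value).
neutral = risk_averse_schedule(100_000, 20, theta=5e-5, sigma2=0.01, lam=0.0)
averse = risk_averse_schedule(100_000, 20, theta=5e-5, sigma2=0.01, lam=1e-5)
print([round(s) for s in neutral[:3]])
print([round(s) for s in averse[:3]])
```

With λ = 0 the fractions reduce to b_k = 1/(k + 1), which gives the equal split of the naive strategy; with λ > 0 each b_k is larger, so more of the order is traded in the early periods.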
The optimal strategies for risk neutral, risk averse and risk seeking investors
are compared using a numerical example. The results are shown in Fig 12.2 and
Fig 12.3. Risk averse investor follows an aggressive strategy trading heavily in the
initial phases. Risk seeking investor on the other hand follows passive strategy,
waiting for the price to fall and trades at the end. Risk-neutral investor follows
the naive strategy.
[Figure 12.2: number of shares traded (S) against time interval (T), with curves for λ = 0, λ = 1e−5 and λ = −1e−5.]
Fig. 12.2. Comparison of strategies for risk neutral, risk averse and risk seeking investors for Model 1.
S represents the number of shares acquired in each period. The black bold curve corresponds to a risk
neutral investor (λ = 0). The black dashed curve corresponds to a risk averse investor ($\lambda = 1 \times 10^{-5}$), while
the black dotted curve corresponds to a risk seeking investor ($\lambda = -1 \times 10^{-5}$).
[Figure 12.3: number of shares (axis labelled "No. of Shares traded (S)", scale ×10^4) against time interval (T), with curves for λ = −1e−5, λ = 1e−5 and λ = 0.]
Fig. 12.3. Comparison of strategies for risk neutral, risk averse and risk seeking investors for Model
1. W represents the number of unexecuted shares. The black bold curve corresponds to a risk neutral
investor (λ = 0). The black dashed curve corresponds to a risk averse investor ($\lambda = 1 \times 10^{-5}$), while the
black dotted curve corresponds to a risk seeking investor ($\lambda = -1 \times 10^{-5}$).
where δk is the coefficient of XN−k which can be computed recursively, see [11] for
details.
In contrast to the case of a linear price-impact function with no information,
the best execution strategy (12.15) varies over time as a linear function of the
remaining shares WN−k and the information variable XN−k . The first term of (12.15)
is simply the naive strategy of dividing the remaining shares WN−k at time N − k
evenly over the remaining k + 1 periods. The second term of the (12.15) is an
adjustment that arises from the presence of serially correlated information XN−k .
Note that the number of shares traded at time tk is a function of Xk , and therefore
the optimal strategy obtained is dynamic in nature.
If ρ = 0, the term δk vanishes and the naive strategy is an optimal solution.
If ρ > 0, then δk is positive, implying that positive realizations of XN−k
increase the number of shares purchased at time tN−k . Similarly, if ρ < 0, then δk
is negative, and positive realizations of XN−k decrease the number of shares
purchased at time tN−k .
Though the strategy obtained in (12.15) is dynamic in nature, the above model
suffers from a number of drawbacks that are mentioned below.
• Prices Pk are assumed to follow arithmetic random walk in (12.13) implying a
positive probability for negative prices.
• Price impact and information have only permanent effect on prices whereas
several recent empirical studies suggest that some combination of permanent
and temporary impact exists in the market.
• The percentage price impact - as a percentage of the execution price - is a
decreasing function of the price level, which is counterfactual.
To overcome above limitations, Bertsimas and Lo [11] suggested the linear per-
centage temporary price impact model.
Pk = P̃k + ∆k .
The no impact price may be viewed as the price which would prevail in the
market in the absence of any market impact. To ensure that prices do not go
negative, we assume geometric Brownian motion for the no-impact price P̃k , which is
defined as
where Zk are i.i.d. normal random variables with mean $\mu_z$ and variance $\sigma_z^2$.
The price impact ∆k captures the effect of trade size Sk on the transaction price,
hence
Xk = ρXk−1 + ηk ,
where ηk is white noise with mean 0 and variance $\sigma_\eta^2$. We set Xk to be an AR(1)
(auto-regressive with lag 1) process. The parameters θ and γ measure the sensitiv-
ity of price impact to trade size and market conditions. The closed form solution
can be obtained using dynamic programming [11] as
where $\delta_{w,k}$, $\delta_{x,k}$ and $\delta_{1,k}$ are fixed coefficients. We skip the details here, but
interested readers are encouraged to see [11].
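The information variable can be simulated directly from its AR(1) definition X_k = ρX_{k−1} + η_k. The sketch below is illustrative only: the function name `ar1_path`, the Gaussian choice for the white noise and the parameter values are assumptions.

```python
import random


def ar1_path(x0, rho, n_steps, sigma_eta, rng):
    """Simulate X_k = rho * X_{k-1} + eta_k.

    eta_k is drawn as N(0, sigma_eta^2), an assumed form of white noise.
    """
    path = [x0]
    for _ in range(n_steps):
        path.append(rho * path[-1] + rng.gauss(0.0, sigma_eta))
    return path


rng = random.Random(0)
# With the noise switched off, the path decays geometrically: X_k = rho**k * X_0.
noiseless = ar1_path(1.0, 0.5, 3, 0.0, rng)
print(noiseless)   # [1.0, 0.5, 0.25, 0.125]

noisy = ar1_path(0.0, 0.5, 5, 1.0, rng)   # one random path of length 6
```

The noiseless path makes the role of ρ visible: a shock to X decays at rate ρ per period, which is the serial correlation that the optimal strategy exploits.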
The linear percentage temporary price impact model resolves a number of
problems discussed with the aforementioned models.
• First, P̃k is guaranteed to be non-negative, and Pk is also guaranteed to be
non-negative under mild restrictions on ∆k .
• Second, by separating the transaction price Pk into a no-impact price compo-
nent P̃k and the impact component ∆k , the price impact of a trade is temporary,
moving the current transaction price but having no effect on future prices.
• Third, the percentage price impact increases linearly with the trade size.
• Fourth, the linear percentage temporary price impact law of motion implies
a natural decomposition of execution costs, decoupling market microstructure
effects from price dynamics. The objective function can be separated into two
terms:
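\[
\min_{\{S_k\}} E_1\Big[\sum_{k=1}^{N} P_k S_k\Big]
= \min_{\{S_k\}} \Big\{ E_1\Big[\sum_{k=1}^{N} \tilde{P}_k S_k\Big] + E_1\Big[\sum_{k=1}^{N} \Delta_k S_k\Big] \Big\}.
\]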
The first term is the no-impact cost of execution and second term is the total
impact cost. This decomposition is precisely the one proposed by Perold [107]
in his definition of implementation shortfall, but now applied to executing S̄.
Further, the closed form solution obtained above does not impose a non-negativity
constraint, and it is possible that the optimal strategy sells the
stock and buys it again in the time interval [0, T]. This limitation of dynamic
programming can be overcome by the quadratic programming approach discussed in
Section 12.6. Other types of constraints, such as a shrinking portfolio constraint,
a participation rate constraint or a tax-motivated constraint, are also possible [77], but
they are difficult to work out with the dynamic programming approach.
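subject to
\[
S_1 + S_2 = \bar{S}, \qquad S_1, S_2 \ge 0.
\]
\[
\begin{aligned}
P_1 &= P_0 + \theta S_1 + \varepsilon_1, \\
P_2 &= P_1 + \theta S_2 + \varepsilon_2 \\
    &= (P_0 + \theta S_1 + \varepsilon_1) + \theta S_2 + \varepsilon_2 .
\end{aligned}
\]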
Therefore,
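\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS \\
\text{subject to}\quad & e^{T}S = \bar{S}, \\
& S \ge 0,
\end{aligned}
\tag{12.19}
\]
where
\[
C = \begin{pmatrix} P_0 \\ P_0 \end{pmatrix}, \quad
Q = \begin{pmatrix} \theta & \theta/2 \\ \theta/2 & \theta \end{pmatrix}, \quad
S = \begin{pmatrix} S_1 \\ S_2 \end{pmatrix}.
\]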
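\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS \\
\text{subject to}\quad & e^{T}S = \bar{S}, \\
& S \ge 0,
\end{aligned}
\tag{12.20}
\]
where
\[
C = \begin{pmatrix} P_0 \\ P_0 \\ \vdots \\ P_0 \end{pmatrix}, \quad
Q = \begin{pmatrix}
\theta & \theta/2 & \cdots & \theta/2 \\
\theta/2 & \theta & \cdots & \theta/2 \\
\vdots & \vdots & \ddots & \vdots \\
\theta/2 & \theta/2 & \cdots & \theta
\end{pmatrix}, \quad
S = \begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_N \end{pmatrix}.
\]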
We encourage the readers to see that the optimal solution of the above QPP will
be the naive strategy, that is,
\[
S^{*} = \begin{pmatrix} \bar{S}/N \\ \bar{S}/N \\ \vdots \\ \bar{S}/N \end{pmatrix}
\]
where
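\[
\mathrm{Var}[P_k] = k\sigma^{2}, \qquad k = 1, \dots, N.
\]
Thus, for a risk averse investor with risk aversion parameter λ, we have the
following quadratic programming problem
\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS + \lambda S^{T}\Sigma S \\
\text{subject to}\quad & e^{T}S = \bar{S}, \\
& S \ge 0.
\end{aligned}
\]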
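As a rough numerical check of the two programs above, the following Python sketch builds C and Q for the linear permanent impact model and solves the equality-constrained problem through its KKT linear system. It assumes that Σ has entries σ² min(k, l), which is what Var[Pk] = kσ² suggests for a random walk with independent increments; this form of Σ, the function names and all parameter values are illustrative assumptions, not taken from the text. The non-negativity constraint is not imposed by the solver; for these inputs it happens to hold at the solution.

```python
import numpy as np

def build_problem(p0, theta, sigma, n):
    """Return C, Q and an assumed covariance matrix Sigma for n periods."""
    c = np.full(n, float(p0))
    # theta on the diagonal, theta/2 off the diagonal, as in (12.20)
    q = 0.5 * theta * (np.ones((n, n)) + np.eye(n))
    idx = np.arange(1, n + 1)
    # Assumption: Cov[P_k, P_l] = sigma^2 * min(k, l)
    cov = sigma**2 * np.minimum.outer(idx, idx)
    return c, q, cov

def solve_equality_qp(c, h, total):
    """Minimise c'S + S'HS subject to e'S = total via the KKT system."""
    n = len(c)
    e = np.ones((n, 1))
    kkt = np.block([[2.0 * h, e], [e.T, np.zeros((1, 1))]])
    rhs = np.concatenate([-c, [total]])
    return np.linalg.solve(kkt, rhs)[:n]

total = 100_000.0
c, q, cov = build_problem(p0=50.0, theta=5e-5, sigma=0.5, n=20)
naive = solve_equality_qp(c, q, total)                 # risk neutral (lambda = 0)
averse = solve_equality_qp(c, q + 1e-5 * cov, total)   # lambda = 1e-5, illustrative
print(np.round(naive[:3]), np.round(averse[:3]))
```

With λ = 0 every period receives S̄/N shares, while a positive λ shifts trading towards the early periods, which is the qualitative pattern of a risk averse schedule.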
The price process is simulated assuming εk ∼ N(0, σ²). The results are shown in
Table 12.2. The solutions obtained from the two techniques are exactly the same, as expected.
Period P1 S1 W1 P2 S2 W2
1 50.75666 16215 100000 50.75666 16215 100000
2 51.22827 13596 83785 51.22827 13596 83785
3 51.81408 11403 70189 51.81408 11403 70189
4 52.32833 9565 58786 52.32833 9565 58786
5 52.5864 8028 49221 52.5864 8028 49221
6 53.07228 6740 41193 53.07228 6740 41193
7 53.50411 5664 34453 53.50411 5664 34453
8 53.7376 4764 28789 53.7376 4764 28789
9 53.97918 4013 24025 53.97918 4013 24025
10 54.1704 3388 20012 54.1704 3388 20012
11 54.29048 2868 16624 54.29048 2868 16624
12 54.50313 2439 13756 54.50313 2439 13756
13 54.53384 2085 11317 54.53384 2085 11317
14 54.89655 1796 9232 54.89655 1796 9232
15 54.9577 1564 7436 54.9577 1564 7436
16 55.04097 1381 5872 55.04097 1381 5872
17 55.23632 1240 4491 55.23632 1240 4491
18 55.30066 1138 3251 55.30066 1138 3251
19 55.34233 1073 2113 55.34233 1073 2113
20 55.29029 1040 1040 55.29029 1040 1040
Table 12.2. Arithmetic random walk with linear permanent market impact model. {P1, S1, W1} is the
solution using dynamic programming. {P2, S2, W2} is the solution using quadratic programming. The
two strategies are exactly the same.
section and the results are compared with those of the dynamic programming
approach.
known. Therefore, for a risk neutral investor, the optimization problem is defined
as
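\[
\begin{aligned}
\min_{S_k}\quad & E_1\Big[\sum_{k=1}^{N} P_k S_k\Big] \\
\text{subject to}\quad & \sum_{k=1}^{N} S_k = \bar{S}, \\
& S_1, \dots, S_N \ge 0.
\end{aligned}
\tag{12.24}
\]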
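We can express $E_1\big[\sum_{k=1}^{N} P_k S_k\big]$ as a quadratic function (see [55]) of $S_1, \dots, S_N$ as
\[
E_1\Big[\sum_{k=1}^{N} P_k S_k\Big] = C^{T}S + S^{T}QS ,
\]
\[
Q = \begin{pmatrix}
\theta & \theta/2 & \cdots & \theta/2 \\
\theta/2 & \theta & \cdots & \theta/2 \\
\vdots & \vdots & \ddots & \vdots \\
\theta/2 & \theta/2 & \cdots & \theta
\end{pmatrix}.
\]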
We urge the readers to work out the above details. Therefore, a risk neutral
investor solves the following quadratic programming problem (QPP1 ) at time t1
\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS \\
\text{subject to}\quad & e^{T}S = \bar{S}, \\
& S \ge 0.
\end{aligned}
\tag{12.25}
\]
Let $\{S_1^{(1)}, S_2^{(1)}, \dots, S_N^{(1)}\}$ be the optimal strategy for optimization problem (12.25).
The investor acquires $S_1^{(1)}$ shares at time t1 . At the beginning of time t2 the price
Quadratic Programming Approach for Optimal Execution: Dynamic Models 401
and information parameters are updated. Let P1 and X2 be the price and information
available at the beginning of time t2 . Moreover, the number of shares remaining to be
acquired from time t2 onwards is $W_2 = \bar{S} - S_1^{(1)}$. So, the investor reformulates
the quadratic program (QPP2 ) as follows
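\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS \\
\text{subject to}\quad & e^{T}S = W_2, \\
& S \ge 0,
\end{aligned}
\tag{12.26}
\]
\[
Q = \begin{pmatrix}
\theta & \theta/2 & \cdots & \theta/2 \\
\theta/2 & \theta & \cdots & \theta/2 \\
\vdots & \vdots & \ddots & \vdots \\
\theta/2 & \theta/2 & \cdots & \theta
\end{pmatrix}.
\]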
Let $\{S_2^{(2)}, S_3^{(2)}, \dots, S_N^{(2)}\}$ be the optimal solution of (12.26). Now, the investor acquires
$S_2^{(2)}$ shares at time t2 and updates the remaining number of shares to be purchased
from time t3 onwards to $W_3 = W_2 - S_2^{(2)}$. Using the latest price and information available at
time t2 , the investor reformulates the quadratic program. Therefore, at each step, a
new optimization problem is solved. In the last period, the investor acquires the
remaining number of shares.
For N = 5, the results of all quadratic programs are shown in Table 12.3. The
following parameter values are used for calculation
will be same as that obtained from QPP2 . This can be seen in Table 12.4. The
following parameter values are used for calculation
[Figure 12.4: Shares Traded (S) against Time Interval (T).]
Fig. 12.4. Best execution strategy without constraints for N = 20 using DP and QP. The black dotted
curve represents the DP strategy while the black bold curve represents the QP strategy. The two curves coincide.
[Figure 12.5: shares traded against Time Interval (T).]
Fig. 12.5. Best execution strategy for N = 20 using DP and QP. The black dotted curve represents the
DP strategy without constraints while the black bold curve represents the QP strategy with non-negativity
constraints. S1 is never negative and the two curves coincide.
[Figure 12.6: Shares Traded against Time Interval (T).]
Fig. 12.6. Best execution strategy for N = 30 using DP and QP. The black dotted curve represents the
DP strategy with no constraints while the black bold curve represents the QP strategy with a non-negativity
constraint. The effect of imposing the non-negativity constraint is clearly visible. The strategies follow the same
trend, but the QP strategy becomes zero as the DP strategy goes negative.
and the results are shown in Table 12.5. The static approximation procedure gives
a strategy similar to the one given by dynamic programming without the non-negativity
constraint, but the optimal strategy with non-negativity constraints is different.
The optimal strategy by dynamic programming with non-negativity constraints
for time t1 takes into account that there is a finite probability that the strategy at time
t2 goes negative. Therefore, the optimal solution trades more at time t1 as compared to
the optimal solution without constraints.
For a risk averse investor, the quadratic programming problem at time period
t1 will be
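\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS + \lambda S^{T}\Sigma S \\
\text{subject to}\quad & e^{T}S = \bar{S}, \\
& S \ge 0,
\end{aligned}
\]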
where the matrices C and S are defined in (QPP1 ) above, and Σ is the variance-covariance
matrix of the prices Pk , as discussed in [55]. As before, we reformulate the
quadratic program at each time period to get a dynamic strategy. Fig. 12.7 shows
the best execution strategy with non-negativity constraints obtained using the static
approximation procedure. As λ is increased, the strategy becomes more and more
aggressive, that is, it trades more in the beginning to reduce the risk. For lower values
of λ, the strategy follows a similar trend as the market information variable Xk (black
bold line with markers).
[Fig. 12.7: Effect of risk aversion parameter on the strategy. Shares Purchased (S), in units of 10^4, against Time Interval (T) for λ = 0.001, 0.0001, 0.00001 and 0.000001; the right-hand axis runs from −0.06 to 0.08.]
In this model, the no-impact price P̃k is modelled using the geometric Brownian
motion and market information is modeled using AR(1) process:
Xk = ρXk−1 + ηk , (12.29)
where Zk are i.i.d. normal random variables with mean µz and variance σz² , and ηk
is white noise with mean 0 and variance ση² . The execution price Pk at time tk
comprises two components, the no-impact price P̃k and the price impact ∆k
Pk = P̃k + ∆k , (12.30)
At time t1 , initial price P̃0 and market information X1 are known. Thus, a risk
neutral investor solves the following quadratic programming problem with Q and
C defined in [55] at time t1
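\[
\begin{aligned}
\min_{S}\quad & C^{T}S + S^{T}QS \\
\text{subject to}\quad & e^{T}S = \bar{S}, \\
& S \ge 0,
\end{aligned}
\tag{12.32}
\]
where
\[
C = \begin{pmatrix}
q\tilde{P}_0 (1 + \gamma X_1) \\
q^{2}\tilde{P}_0 (1 + \gamma\rho X_1) \\
\vdots \\
q^{N}\tilde{P}_0 (1 + \rho^{N-1}\gamma X_1)
\end{pmatrix}, \quad
S = \begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_N \end{pmatrix},
\]
\[
Q = \begin{pmatrix}
\theta q\tilde{P}_0 & 0 & \cdots & 0 \\
0 & \theta q^{2}\tilde{P}_0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \theta q^{N}\tilde{P}_0
\end{pmatrix}.
\]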
In this model Q is a diagonal matrix. This is due to the fact that we consider
only temporary market impact. The number of shares traded at time t1 does not
affect the future price. Therefore, the execution price P2 for shares traded at time
For N = 5, the strategies for all the quadratic programs are shown in Table
12.6.
Table 12.6. Dynamic strategy for N = 5 for Model 3. The ith column denotes the solution of (QPPi ).
The number of shares to be traded in period t2 obtained from (QPP1 ) is not equal to that obtained from
(QPP2 ). For each period, we use the latest market information, solve the corresponding QP and
follow this strategy until new information arrives. Since we assume that market information is updated
at each period, the diagonal gives the number of shares to trade in that period.
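The re-solving procedure for Model 3 can be sketched in Python as follows. Because Q is diagonal, each quadratic program with the budget and non-negativity constraints can be solved by a one-dimensional search on the Lagrange multiplier; this is a convenience of the sketch, not necessarily the solver used for Table 12.6. The sketch assumes that q = E[exp(Zk)] = exp(µz + σz²/2) and that the no-impact price evolves as P̃k = P̃k−1 exp(Zk). The function names and all parameter values are hypothetical.

```python
import math
import random

def solve_diagonal_qp(c, d, total, iters=200):
    """Minimise sum(c_i*s_i + d_i*s_i**2) s.t. sum(s) = total, s >= 0 (d_i > 0).

    The KKT conditions give s_i = max(0, (nu - c_i) / (2*d_i)); bisect on nu.
    """
    def shares(nu):
        return [max(0.0, (nu - ci) / (2.0 * di)) for ci, di in zip(c, d)]
    lo, hi = min(c), max(c) + 2.0 * max(d) * total
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(shares(mid)) < total:
            lo = mid
        else:
            hi = mid
    return shares(hi)

def lpt_coefficients(p_tilde, x, n, theta, gamma, rho, q):
    """C and the diagonal of Q for the n remaining periods."""
    c = [q**i * p_tilde * (1.0 + gamma * rho**(i - 1) * x) for i in range(1, n + 1)]
    d = [theta * q**i * p_tilde for i in range(1, n + 1)]
    return c, d

def rolling_strategy(total, n, p0, x1, theta, gamma, rho,
                     mu_z, sigma_z, sigma_eta, seed=0):
    rng = random.Random(seed)
    q = math.exp(mu_z + 0.5 * sigma_z**2)      # assumption: q = E[exp(Z_k)]
    p_tilde, x, remaining, trades = p0, x1, float(total), []
    for k in range(n):
        c, d = lpt_coefficients(p_tilde, x, n - k, theta, gamma, rho, q)
        plan = solve_diagonal_qp(c, d, remaining)
        # Trade only the first entry of the plan; buy the rest in the last period.
        s = remaining if k == n - 1 else min(plan[0], remaining)
        trades.append(s)
        remaining -= s
        p_tilde *= math.exp(rng.gauss(mu_z, sigma_z))   # assumed no-impact price step
        x = rho * x + rng.gauss(0.0, sigma_eta)         # AR(1) update, as in (12.29)
    return trades

trades = rolling_strategy(total=100_000, n=5, p0=50.0, x1=0.0, theta=5e-7,
                          gamma=0.05, rho=0.5, mu_z=0.0, sigma_z=0.01,
                          sigma_eta=0.02)
print([round(s) for s in trades])
```

Each pass through the loop corresponds to one column of a table such as Table 12.6: a full plan is computed from the latest information, but only its first entry is executed.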
In Fig. 12.8, the two strategies obtained using the DP and QPP approaches are
exactly the same, as the optimal strategy using DP is never negative, whereas in Fig.
12.10, the two strategies differ.
[Figure 12.8: strategies S1 and S2, in units of 10^4 shares, against Time Interval (T).]
Fig. 12.8. Best execution strategy for N = 10 for Model 3 (Linear percentage temporary price impact)
using DP and QP. The black dotted curve represents the DP strategy with no constraints. The black bold
curve represents the QP strategy with a non-negativity constraint. Since S1 is never negative, the two curves
coincide.
[Figure: market information variable X against time period.]
[Figure 12.10: strategies S1 and S2 (Shares Traded, in units of 10^4) against Time Interval (T).]
Fig. 12.10. Best execution strategy for N = 10 for Model 3 (Linear percentage temporary price impact)
using DP and QP. The black dotted curve represents the DP strategy with no constraints. The black bold curve
represents the QPP strategy with a non-negativity constraint.
Summary and Additional Notes 409
[Figure: market information variable X against time period.]
and Littman [73] for basics of reinforcement learning. Nevmyvaka et al. [101]
provide a good application of reinforcement learning in the area of optimal
trading strategies.
• This chapter presents optimal trading strategies in the context of a single asset
only. But a more realistic case is that of a portfolio. Bertsimas et al. [12], and
Aitsahlia et al. [1] discuss optimal execution for portfolios.
• Similar to the concept of efficient frontier in Markowitz's model of portfolio
optimization, Kissell and Glantz [77] introduced the concept of efficient trading
frontier in the context of optimal trading strategies.
12.12 Exercises
Exercise 12.1 Compute the associated transaction cost of a buy order for 100,000
shares of ABC if the price at the beginning of trading was Rs 50 per share and the
average execution price was Rs 52 per share and all shares were executed.
Exercise 12.2 Consider the set of parameters,
Plot the efficient frontier for various values of λ lying between 0 and 1 using all the
price processes mentioned in the chapter.
13
Credit Risk Management
13.1 Introduction
Credit risk is the risk that a trading partner does not meet its obligations in full
on the due date or at any time thereafter. Such an event is called a default, hence
another term for credit risk is default risk. There are three well known types
of credit risk: default risk/counterparty risk, credit spread risk and downgrade
risk. Credit risk has to be taken into account in many commercial and retail
banking activities. For example, deposits, mortgage
lending and credit cards are part of retail banking, whereas commercial banking
includes loans, letters of credit and asset finance.
For example, in credit card issuance a financial service provider/bank agrees to pay
retailers for purchases made by credit card holders in exchange for an unsecured
promise to repay the card balance. Any given debt has a short expected life:
as card holders pay off some of their balances and then buy new goods, new
exposures roll in to take the place of old ones.
Financial services/banks need a mechanism to quantify the risk factors relevant
for an obligor’s ability and willingness to pay. Credit scoring has been used as
a tool in modern banking to assess the risk factors, owing to the large number of
applications received on a daily basis and the increased regulatory requirements for
banks. In this chapter, we shall present some of the basic concepts with emphasis
on statistical and data mining algorithms, and their application to credit scoring.
Risk management is a core activity conducted by banks, insurance and investment
companies, or any financial institution that evaluates risk due to losses.
Losses can result from either counter-party default, or from a decline in market
value stemming from the credit quality migration of an issuer or counter-party.
There are two primary types of models that attempt to describe default processes
in the credit risk literature. These are (i) structural models and (ii) reduced
form models. Structural models were pioneered by Black, Scholes and Merton.
The basic idea, common to all structural type models is that a company defaults
on its debt if the value of the assets of a company falls below a certain default
point. For this reason, these models are also known as firm-value models. In these
models it has been demonstrated that default can be modeled as an option and,
as a result, researchers have been able to apply the same principles used for option
pricing to the valuation of risky corporate securities.
The second group of credit models, known as reduced form models, is more
recent. These models, most notably the Jarrow-Turnbull and Duffie-Singleton
models, do not look inside the firm. Instead, they model directly the likelihood
of default or downgrade. Apart from modeling the current probability of default,
some researchers attempt to model a ‘forward curve’ of default probabilities. This
can be used to price instruments of varying maturities.
In the recent past, managing credit risk at the portfolio level has attracted many
researchers. In order to exploit the advantages of a credit portfolio, one should
know the risk of the portfolio and the factors that affect the portfolio risk profile. Thus
portfolio managers are interested in knowing the effect of changing the portfolio
mix, and how risk-based pricing at the individual contract and the portfolio
level would be influenced by the level of expected losses and credit risk capital.
Traditionally used tools for assessing and optimizing market risk assume that
the portfolio return-loss is normally distributed. With this assumption, the two
statistical measures, mean and standard deviation, could be used to balance return
and risk. However, to cope with skewed return-loss distributions, conditional value-
at-risk (CVaR) has been introduced as the risk measure. This measure is also
known as mean excess loss, mean shortfall, or tail VaR. By definition, (1 − β)-
CVaR is the expected loss exceeding (1 − β)-value-at-risk (VaR), i.e. it is the mean
value of the worst β × 100% losses. For instance, at (1 − β) = 0.95, or equivalently
β = 0.05, CVaR is the average of the 5% worst losses.
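The definition above translates directly into an empirical estimate from a sample of losses. The following Python sketch sorts the losses, takes the worst β × 100% of them, and reports the smallest loss in that tail as the VaR and the tail average as the CVaR. The function name and the simulated lognormal losses are illustrative assumptions.

```python
import numpy as np

def var_cvar(losses, beta):
    """Empirical (1-beta)-VaR and (1-beta)-CVaR from a sample of losses.

    beta is the tail probability, e.g. beta = 0.05 for the 95% level.
    """
    ordered = np.sort(np.asarray(losses, dtype=float))[::-1]   # worst losses first
    k = max(1, int(np.ceil(beta * len(ordered) - 1e-12)))       # size of the worst-beta tail
    tail = ordered[:k]
    # VaR is the smallest loss in the tail, CVaR the average of the tail
    return tail[-1], tail.mean()

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)   # hypothetical skewed losses
var95, cvar95 = var_cvar(sample, beta=0.05)
print(var95, cvar95)
```

By construction the CVaR estimate is never smaller than the VaR estimate at the same level.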
In this chapter we focus on the quantification of credit risk. We will discuss
different approaches to estimate the probability that a company will default. In the
subsequent sections we will explain how a bank or other financial institution can
estimate its loss given that the default has occurred. The last part of the chapter also
covers credit rating migration and default correlation, and the application of machine
learning techniques to credit scoring.
Basic Terminology 413
where qi denotes the probability that the counter party is solvent (not refutable) at
the ith period and CFi and r are defined as above.
The credit/default premium is paid by companies with lower grade bonds or
by individuals with poor credit. As an illustration, companies with poor financial
state will tend to compensate investors for the additional risk by issuing bonds
with high yields. Individuals with poor credit must pay higher interest rates in
order to borrow money from the bank.
Definition 13.2.3 (Credit Spread) Credit spread is the yield spread, or differ-
ence in yield between different securities, due to different credit quality. The credit
spread reflects the additional net yield an investor can earn from a security with
more credit risk relative to one with less credit risk. The credit spread of a particu-
lar security is often quoted in relation to the yield on a credit risk-free benchmark
security or reference rate, typically either U.S. Treasury bonds or LIBOR.
For instance, the credit spread could be the difference between yields on government
bonds and those on single A-rated industrial bonds. A company must
offer a higher return on its bonds because its credit quality is worse than
that of the government.
Definition 13.2.4 (Recovery Rate: Loss Given Default) Recovery rate (R) is
defined as the proportion of the “claimed amount” received in the event of a
default. Loss given default (LGD) is the percentage we expect to lose when default occurs.
Obviously R = 1 − LGD.
When default occurs, a portion of the value of the portfolio can usually be
recovered. Because of this, a recovery rate is always considered when evaluating
credit losses. It represents the percentage value which we expect to recover, given
default.
Definition 13.2.5 (Credit Exposure) It is the maximum loss that a portfolio
can experience at the time of default, taken at a certain confidence level.
Definition 13.2.6 (Default Probability) Probability of default is the likelihood
that a loan will not be repaid.
A bank assigns to every customer a default probability (DP); a loss fraction
called the loss given default (LGD), describing the fraction of the loan's exposure
expected to be lost in case of default; and the exposure at default (EAD), the amount
exposed to loss in the considered time period. The loss of any obligor is then defined by
a loss variable L̃ = EAD × LGD × L with L = 1D and P(D) = DP, where D denotes the
event that the obligor defaults in a certain period of time (most often one year),
1D is the indicator (characteristic) function of D and P(D) denotes the probability of D.
Since E[1D ] = P(D), the expected loss is E[L̃] = EAD × LGD × DP when EAD and LGD are treated as constants.
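As a small worked example, the expected loss of a single obligor can be computed from the three quantities just introduced, using R = 1 − LGD. The function name and the numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
def expected_loss(ead, recovery_rate, default_prob):
    """Expected loss EAD x LGD x DP, with LGD = 1 - recovery rate."""
    lgd = 1.0 - recovery_rate
    return ead * lgd * default_prob

# Hypothetical loan: exposure 1,000,000, recovery rate 40%, default probability 2%
print(expected_loss(1_000_000, 0.40, 0.02))   # roughly 12,000
```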
The task of assigning a default probability to every customer in a bank credit
portfolio is far from being easy. There are essentially two approaches to compute
default probabilities: calibration of default probabilities from ratings and calibra-
tion of default probabilities from market data. We intend to discuss these in the
subsequent sections.
that the present value of the cost of defaults equals the excess of the price of the
risk free bond over the price of the corporate bond.
Probability of Default Assuming no Recovery
Let y(T) be the yield on a T year corporate zero-coupon bond, and y∗ (T) be
the yield on a T year risk free zero-coupon bond. Let D(T) be the probability that
the corporation will default between time zero and time T.
The value of a T year risk free zero coupon bond with principal 100 is $100e^{-y^{*}(T)T}$,
while the value of a similar corporate bond is $100e^{-y(T)T}$. The expected loss from
default is therefore $100\,(e^{-y^{*}(T)T} - e^{-y(T)T})$.
If we assume that there is no recovery in the event of default, the calculation
of D(T) is relatively easy. There is a probability D(T) that the corporate bond
will be worth zero at maturity and a probability 1 − D(T) that it will be worth
Rs 100. The value of the bond is therefore $\{[D(T) \times 0] + [(1 - D(T)) \times 100]\}\,e^{-y^{*}(T)T}$.
The yield on the bond is y(T), so that
\[
D(T) = \frac{e^{-y^{*}(T)T} - e^{-y(T)T}}{e^{-y^{*}(T)T}},
\]
i.e.
\[
D(T) = 1 - e^{-[y(T) - y^{*}(T)]T} .
\]
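If instead a recovery rate R is assumed in the event of default, the corresponding estimate is
\[
D(T) = \frac{1 - e^{-[y(T) - y^{*}(T)]T}}{1 - R} .
\]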
The above estimation of default probability is based on the following set of
assumptions
(i) Amount claimed in the event of default equals the no-default value of the bond.
(ii) Claim made in the event of default equals the bond’s face value plus accrued
interest.
(iii) Probabilities are calculated on zero coupon bonds.
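The two formulas for D(T) can be evaluated with a few lines of Python. This is a minimal sketch with hypothetical yields; the function name is an assumption, and setting the recovery rate to zero reproduces the no-recovery case.

```python
import math

def default_probability(y_corp, y_riskfree, maturity, recovery_rate=0.0):
    """Default probability implied by zero-coupon yields (continuous compounding)."""
    spread = y_corp - y_riskfree
    return (1.0 - math.exp(-spread * maturity)) / (1.0 - recovery_rate)

# Hypothetical 5-year yields: 6% corporate, 5% risk free
print(default_probability(0.06, 0.05, 5))          # no recovery
print(default_probability(0.06, 0.05, 5, 0.5))     # 50% recovery
```

For a given spread, a higher assumed recovery rate implies a higher default probability, since each default then costs the bondholder less.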
Over time, bonds are liable to move from one rating category to another.
This is sometimes referred to as credit rating migration. Rating agencies generate,
from historical data, a rating transition matrix whose entries correspond to the
percentage probability of a bond moving from one rating to another during a certain
period of time. Next we present an approach termed CreditMetrics, proposed
for estimating risks associated with default.
riskier the bond higher the yield; thus the yield on the BBB bond is higher than
that on the AA bond which is in turn higher than the risk-free yield. Thus higher
yield for risky bonds is compensation for the possibility of not receiving future
coupons or the principal.
In the CreditMetrics framework, the transition matrix has as its entries the
probabilities of a change of credit rating at the end of a given time horizon; for
example, the probability of an upgrade from AA to AAA might be 5.5%. The
time horizon for the CreditMetrics data set is one year. Unless the time horizon
is very long, the highest probability is typically for the bond to remain at its
initial rating. The risk-free yield, the spreads and the transition matrix contain
sufficient information for the CreditMetrics method to derive distributions for the
possible future values of a single bond. However, when it comes to examining the
behavior of a portfolio of risky bonds, we must consider whether there is any
relationship between the re-rating or default of one bond and that of another. In other
words, are bonds issued by different companies or governments in some sense
correlated? This is where the CreditMetrics correlation data set comes in. This
data set gives the correlations between major indices in many countries.
The CreditMetrics methodology is about calculating the possible values of a
risky portfolio comprising bonds at some time in the future (the time horizon) and
estimating the probability of occurrence of default. There are many credit rating
agencies that compile data on individual companies or countries and estimate
the likelihood of default. The most famous of these are Standard & Poor’s and
Moody’s. These agencies assign a credit rating or grade to firms as an estimate
of their creditworthiness. Standard & Poor’s rate businesses as one of AAA, AA,
A, BBB, BB, B, CCC or Default. Moody’s use Aaa, Aa, A, Baa, Ba, B, Caa,
Ca, C. Both these companies also have finer grades within each of these primary
categories. The Moody’s grades are described in the Table 13.1 below.
Transition matrices, refer to Table 13.2, measure rating movements over time.
The diagonal of a transition matrix represents the share of ratings that remain
unchanged during the course of the reference period of one year. For instance,
84.67% of issuers rated in the ‘A’ category at the beginning of 2009 were still
rated ‘A’ by year-end, compared with only 69.34% of issuers in the ‘B’ category
(see Table 13.2). Here NR stands for “not rated”. The probabilities are based
on historical data and are therefore real world probabilities.
For the illustration of the concepts we shall be using the transition matrix
provided by Standard & Poor’s as on 15th March 1996, see Table 13.3.
The credit rating agencies continuously gather data on individual firms and de-
Let the problem be to find CreditVaR for a senior unsecured BBB rated bond
maturing exactly in 5 years and paying an annual coupon of 6%. Following are
the steps involved in calculating Credit-VaR.
rating agencies.
Step 3: Specify the forward pricing model.
The valuation of a bond is derived from the zero-curve corresponding to the
rating of the issuer. Since there are seven possible credit qualities, seven “spread”
curves are required to price the bond in all possible states, all obligors within
the same rating class being marked-to-market with the same curve. The spot zero
curve is used to determine the current spot value of the bond. The forward price of
the bond in 1 year from now is derived from the forward zero-curve, 1 year ahead,
which is then applied to the residual cash flows from year one to the maturity of
the bond. Table 13.4 gives the 1-year forward zero-curves for each credit rating.
The 1-year forward price of the bond, if the obligor stays BBB, is then
\[
V_{BBB} = 6 + \frac{6}{1.0410} + \frac{6}{(1.0467)^{2}} + \frac{6}{(1.0525)^{3}} + \frac{106}{(1.0563)^{4}} = 107.55 .
\]
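The forward valuation in Step 3 can be sketched in Python as below. The function name is an assumption; the rates are the four BBB forward zero rates that appear in the formula above. Evaluated with these rounded rates, the result agrees with the quoted 107.55 to within a few hundredths; the small gap presumably comes from rounding in the printed rates.

```python
def forward_bond_value(coupon, face, forward_zero_rates):
    """Value at the 1-year horizon: the coupon paid then plus discounted residual flows."""
    n = len(forward_zero_rates)
    value = coupon                                    # coupon received at the horizon
    for t, r in enumerate(forward_zero_rates, start=1):
        cash = coupon + (face if t == n else 0.0)     # principal repaid with the last coupon
        value += cash / (1.0 + r) ** t
    return value

bbb_curve = [0.0410, 0.0467, 0.0525, 0.0563]   # 1-year forward zero rates for BBB
v_bbb = forward_bond_value(6.0, 100.0, bbb_curve)
print(round(v_bbb, 2))
```

Repeating the call with the forward curve of each other rating class gives the remaining entries of a table such as Table 13.5.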
If we replicate the same calculations for each rating category we obtain the
values shown in Table 13.5.
Step 4: Derive the forward distribution of the changes in portfolio value.
The distribution of the changes in the bond value, at the 1-year horizon, due
to an eventual change in credit quality is shown in Table 13.6. The first
entry in the last column of Table 13.6 is obtained as 109.37 − 107.55 = 1.82. The
other entries in this column are obtained in a similar manner. This distribution
exhibits a long downside tail. The first percentile of the distribution of ∆V, which
corresponds to CreditVaR at the 99% confidence level, is 83.64 − 107.55 = −23.91.
The CreditVaR is much smaller in magnitude if we compute the first percentile assuming
a normal distribution for ∆V: in that case CreditVaR at the 99% confidence level
would be only −7.43.
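\[
\begin{aligned}
m = \mathrm{mean}(\Delta V) &= \sum_i p_i\,\Delta V_i \\
&= 0.02\% \times 1.82 + 0.33\% \times 1.64 + \dots + 0.18\% \times (-56.42) \\
&= -0.46 ,
\end{aligned}
\]
\[
\begin{aligned}
\sigma^{2} = \mathrm{Var}(\Delta V) &= \sum_i p_i\,(\Delta V_i - m)^{2} \\
&= 0.02\%(1.82 + 0.46)^{2} + 0.33\%(1.64 + 0.46)^{2} + \dots + 0.18\%(-56.42 + 0.46)^{2} \\
&= 8.95 ,
\end{aligned}
\]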
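The moments above can be checked numerically with the following Python sketch. Only the first two and the last probability/value pairs appear explicitly in the text here; the remaining entries are filled in from the standard CreditMetrics BBB example and should be read as an assumption about the contents of Tables 13.3 and 13.6.

```python
from statistics import NormalDist

# Year-end ratings AAA, AA, A, BBB, BB, B, CCC, Default for a BBB bond.
# Middle entries are assumed from the standard CreditMetrics example.
probs = [0.0002, 0.0033, 0.0595, 0.8693, 0.0530, 0.0117, 0.0012, 0.0018]
delta_v = [1.82, 1.64, 1.11, 0.0, -5.53, -9.45, -23.91, -56.42]

mean = sum(p * v for p, v in zip(probs, delta_v))
var = sum(p * (v - mean) ** 2 for p, v in zip(probs, delta_v))
std = var ** 0.5

# First percentile under a normal approximation with the same mean and variance
normal_q01 = mean + NormalDist().inv_cdf(0.01) * std
print(round(mean, 2), round(var, 2), round(std, 2), round(normal_q01, 2))
```

Under these inputs the variance is about 8.95, so the standard deviation is about 2.99, and the normal approximation gives a first percentile close to the −7.43 quoted earlier.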
his current rating BB, and 91.05% is the probability that obligor two stays in
rating class A.
However, this table is not very useful in practice when we need to assess the
diversification effect on a large loan or bond portfolio. Indeed, the actual correla-
tions between the changes in credit quality are different from zero. Correlations
are expected to be higher for firms within the same industry or in the same region,
than for firms in unrelated sectors. In addition, correlations vary with the relative
state of the economy in the business cycle. If there is a slowdown in the economy,
or a recession, most of the assets of the obligor will decline in value and quality,
and the likelihood of multiple defaults increases substantially. The contrary hap-
pens when the economy is performing well as default correlations go down. Thus,
we cannot expect default and migration probabilities to stay stationary over time.
There is clearly a need for a structural model that bridges the changes of default
probabilities to fundamental variables whose correlations stay stable over time.
Furthermore, for the sake of simplicity, CreditMetrics/CreditVaR have chosen the
equity price as a proxy for the asset value of the firm that is not directly ob-
servable. This is another strong assumption in CreditMetrics that may affect the
accuracy of the method. CreditMetrics estimates the correlations between the eq-
uity returns of various obligors, the model then infers the correlations between
changes in credit quality directly from the joint distribution of equity returns.
The proposed framework initially developed by Merton [93] is the option pricing
approach to the valuation of corporate securities. The firm’s assets value, Vt , is
assumed to follow a standard geometric Brownian motion, i.e.
\[
V_t = V_0 \exp\Big\{ \big(\mu - \sigma^{2}/2\big)\,t + \sigma\sqrt{t}\,Z_t \Big\}, \tag{13.1}
\]
with Zt ∼ N(0, 1), µ and σ2 being respectively the mean and variance of the in-
If $p_{Def}$ denotes the probability of the BB-rated obligor defaulting, then the
critical asset value $V_{Def}$ is such that
\[
p_{Def} = P[V_t \le V_{Def}],
\]
which can be translated into a normalized threshold $Z_{CCC}$, such that the area in
the left tail below $Z_{CCC}$ is $p_{Def}$. Indeed, according to (13.1), default occurs when
Zt satisfies
\begin{align}
p_{Def} &= P\left[ \frac{\ln(V_{Def}/V_0) - (\mu - \sigma^{2}/2)\,t}{\sigma\sqrt{t}} \ge Z_t \right] \tag{13.2} \\
&= P\left[ Z_t \le -\frac{\ln(V_0/V_{Def}) + (\mu - \sigma^{2}/2)\,t}{\sigma\sqrt{t}} \right] \tag{13.3} \\
&= \Phi(-d_2), \tag{13.4}
\end{align}
where the normalized return $r = \dfrac{\ln(V_t/V_0) - (\mu - \sigma^{2}/2)\,t}{\sigma\sqrt{t}}$ is N(0, 1). $Z_{CCC}$ is
simply the threshold point in the standard normal distribution corresponding to
a cumulative probability of $p_{Def}$. Then the critical asset value $V_{Def}$, which triggers
default, is such that $Z_{CCC} = -d_2$, where
\[
d_2 = \frac{\ln(V_0/V_{Def}) + (\mu - \sigma^{2}/2)\,t}{\sigma\sqrt{t}} ,
\]
and is also called the distance-to-default.
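Equations (13.2) to (13.4) can be sketched in Python using only the standard library. The function names and all numerical inputs are hypothetical; the default probability of 1.06% is used purely as an illustration.

```python
import math
from statistics import NormalDist

N01 = NormalDist()   # standard normal distribution

def distance_to_default(v0, v_def, mu, sigma, t):
    """d2 as defined above."""
    return (math.log(v0 / v_def) + (mu - 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))

def default_probability(v0, v_def, mu, sigma, t):
    """p_Def = Phi(-d2), equation (13.4)."""
    return N01.cdf(-distance_to_default(v0, v_def, mu, sigma, t))

def critical_asset_value(v0, p_def, mu, sigma, t):
    """Invert (13.4): the asset value that triggers default for a given p_Def."""
    z = N01.inv_cdf(p_def)   # normalized threshold, Z_CCC in the text
    return v0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)

# Hypothetical firm: V0 = 100, drift 8%, volatility 25%, horizon 1 year
v_def = critical_asset_value(100.0, 0.0106, 0.08, 0.25, 1.0)
print(v_def, default_probability(100.0, v_def, 0.08, 0.25, 1.0))
```

The round trip from a default probability to the critical asset value and back recovers the starting probability, which is a convenient consistency check on the threshold construction.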
Accordingly, $Z_B$ is the threshold point corresponding to a cumulative probability
of being either in default or in rating CCC, i.e., $p_{Def} + p_{CCC}$, etc. Further, since
asset returns are not directly observable, CreditMetrics/CreditVaR chose equity
returns as a proxy, which is equivalent to assuming that the firm’s activities are all equity
financed. Now, for the time being, assume that the correlation between asset rates
of return is known and is denoted by ρ, which is assumed to be equal to 0.20 in
our example. The normalized log-returns on both assets follow a joint normal
distribution:
\[
f(r_{BB}, r_A; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^{2}}}
\exp\left\{ \frac{-1}{2(1 - \rho^{2})} \big[ r_{BB}^{2} - 2\rho\, r_{BB}\, r_A + r_A^{2} \big] \right\} .
\]
We can then easily compute the probability for both obligors of being in any
combination of ratings, for example that they remain in the same rating classes BB and
A, respectively:
\[
P(-1.23 < r_{BB} < 1.37,\; -1.51 < r_A < 1.98)
= \int_{-1.23}^{1.37} \int_{-1.51}^{1.98} f(r_{BB}, r_A; \rho)\, dr_A\, dr_{BB} = 0.7365 .
\]
events of default for obligors 1 and 2 are denoted as DEF1 and DEF2 , respectively,
and P(DEF1, DEF2) is the joint probability of default. Then, it can be shown that
the default correlation is
\[
\mathrm{corr}(DEF1, DEF2) = \frac{P(DEF1, DEF2) - P1 \cdot P2}{\sqrt{P1(1 - P1)\,P2(1 - P2)}} .
\]
The joint probability of both obligors defaulting is, according to Merton’s
model,
\[
P(DEF1, DEF2) = \Pr[V_1 \le V_{Def1},\; V_2 \le V_{Def2}] ,
\]
where V1 and V2 denote the asset values for both obligors at time t, and $V_{Def1}$
and $V_{Def2}$ are the corresponding critical values which trigger default. It is further
equivalent to
\[
P(DEF1, DEF2) = \Pr[r_1 \le -d_2^{1},\; r_2 \le -d_2^{2}] = N_2(-d_2^{1}, -d_2^{2}, \rho),
\]
where r1 and r2 denote the normalized asset returns for obligors 1 and 2, respectively,
and $d_2^{1}$ and $d_2^{2}$ are the corresponding distances to default. N2 (x, y, ρ) denotes
the cumulative standard bivariate normal distribution and ρ is the correlation
coefficient between x and y.
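Both the rectangle probability quoted earlier and the default correlation can be evaluated numerically. The Python sketch below computes bivariate normal probabilities by integrating the conditional distribution of one return given the other with Simpson's rule; this is a simple substitute for a library routine for N2, not the method used in the text. The individual default probabilities passed in are hypothetical, and −8 is used in place of −∞ as the lower integration limit.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def rect_prob(x_lo, x_hi, y_lo, y_hi, rho, n=2000):
    """P(x_lo < X < x_hi, y_lo < Y < y_hi) for standard bivariate normal (X, Y).

    Integrates phi(x) * P(y_lo < Y < y_hi | X = x) over x by Simpson's rule (n even).
    """
    s = math.sqrt(1.0 - rho**2)
    def g(x):
        upper = N01.cdf((y_hi - rho * x) / s)
        lower = N01.cdf((y_lo - rho * x) / s)
        return N01.pdf(x) * (upper - lower)
    h = (x_hi - x_lo) / n
    acc = g(x_lo) + g(x_hi)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * g(x_lo + i * h)
    return acc * h / 3.0

def default_correlation(p1, p2, rho, lower=-8.0):
    """Joint default probability N2(-d2^1, -d2^2, rho) and the default correlation."""
    joint = rect_prob(lower, N01.inv_cdf(p1), lower, N01.inv_cdf(p2), rho)
    corr = (joint - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    return joint, corr

stay = rect_prob(-1.23, 1.37, -1.51, 1.98, rho=0.20)                # both ratings unchanged
joint, corr = default_correlation(p1=0.0106, p2=0.0006, rho=0.20)   # hypothetical inputs
print(round(stay, 4), joint, corr)
```

The default correlation produced this way is typically much smaller than the asset return correlation ρ that drives it.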
CreditMetrics does not itself answer the question regarding the pricing of credit
risk and its underlying modeling. Forward zero curves are supplied as an input to
the framework that could account for the drawback of CreditMetrics approach.
However, forward curves are the result of an estimation process of the term struc-
ture of the credit spreads. This leads to an integrating framework which provides
an idea about credit portfolio diversification.
aforementioned models.
sitions), making it difficult to control and optimize. Also, VaR has some other
undesirable properties, such as the lack of sub-additivity.
By contrast, CVaR is considered a more consistent measure of risk than VaR.
CVaR supplements the information provided by VaR and calculates the quantity
of the excess loss. Since CVaR is greater than or equal to VaR, portfolios with a
low CVaR also have a low VaR. Under quite general conditions, CVaR is a convex
function with respect to positions [140], allowing the construction of efficient opti-
mization algorithms. CVaR has been compared with the widely accepted VaR risk
performance measure for which various estimation techniques have been proposed,
see for example [39, 109].
Bucay and Rosen [22] applied the CreditMetrics methodology to a portfolio of cor-
porate and sovereign bonds issued in emerging markets. They estimated the credit
risk of this portfolio by taking into account both defaults and credit
migrations. Along similar lines, we use the CVaR optimization routine developed
in Chapter 6 for the bond portfolio discussed in [3].
The test portfolio has been compiled by a group of financial institutions to
assess the state of the art of portfolio credit risk models. The portfolio consists of
197 emerging markets bonds, issued by 86 obligors in 29 countries. The date of the
analysis was October 13, 1998 and the mark-to-market value of the portfolio was
8.8 billion USD. Most instruments are denominated in US dollars, but 11 fixed rate
bonds are denominated in seven other currencies: DEM, GBP, ITL, JPY, TRL,
XEU and ZAR. Bond maturities range from a few months to 98 years and the
portfolio duration is approximately five years.
Let $x = (x_1, x_2, \ldots, x_n)$ be obligor weights (positions) expressed as multiples
of current holdings, $b = (b_1, b_2, \ldots, b_n)$ be future values of each instrument with
no credit migration (benchmark scenario), and $y = (y_1, y_2, \ldots, y_n)$ be the future
(scenario-dependent) values with credit migration. The loss due to credit migra-
tion for the portfolio is defined as $f(x, y) = (b - y)^T x$. The CVaR optimization
problem is formulated as
$$\min_{x \in X \subset \mathbb{R}^n} \Phi(x) = \Phi(z, \alpha),$$
where X is the feasible set in $\mathbb{R}^n$. For the definition of Φ please refer to Chapter
6. This set could also contain a mean return constraint, box
constraints on the positions of instruments, etc. However, to keep the presentation simple
and easy to read, we have not discussed them in the optimization routine. The interested
reader may refer to [3, 91], where the performance function is approximated
using scenarios $y_j$, j = 1, . . . , J, which are sampled with the density function p(y).
As discussed in Chapter 6, minimization of the CVaR function Φ(x) could be
reduced to the following linear programming problem:
$$\min_{x,\,z,\,\alpha}\; \alpha + \nu \sum_{j=1}^{J} z_j$$
subject to
$$\begin{aligned}
f(x, y_j) - \alpha &\le z_j && (j = 1, 2, \ldots, J) \\
e^T x &= 1 \\
z_j &\ge 0 && (j = 1, 2, \ldots, J) \\
l_i \le x_i &\le u_i && (i = 1, 2, \ldots, n),
\end{aligned} \tag{13.5}$$
where $\nu = \dfrac{1}{(1 - \beta)J}$ and β is the confidence level.
Also, if (x∗ ; α∗ ; z∗ ) is an optimal solution of the optimization problem (13.5),
then x∗ is an approximation of the optimal solution of the CVaR optimization
problem, the function Φ(z∗ ; α∗ ) equals approximately the optimal CVaR, and α∗ is
an approximation of VaR at the optimal point. Thus, by solving problem (13.5)
we can simultaneously find approximations of the optimal CVaR and the corre-
sponding VaR.
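The role of α can be checked numerically for a fixed portfolio. The Python sketch below is only an illustration under our own assumptions: the ten scenario losses are hypothetical, the portfolio x is held fixed (so no optimization over positions is performed), and the auxiliary function α + ν Σ max(loss − α, 0) is the form discussed in Chapter 6. Because this function is convex and piecewise linear in α, it suffices to try each scenario loss as a candidate for α.

```python
def cvar_objective(losses, alpha, beta):
    """F(alpha) = alpha + nu * sum_j max(loss_j - alpha, 0), nu = 1/((1-beta) J)."""
    nu = 1.0 / ((1.0 - beta) * len(losses))
    return alpha + nu * sum(max(loss - alpha, 0.0) for loss in losses)


def var_and_cvar(losses, beta):
    """Minimize F over alpha. The kinks of F are at the scenario losses,
    so the minimum is attained at one of them."""
    best_alpha = min(losses, key=lambda a: cvar_objective(losses, a, beta))
    return best_alpha, cvar_objective(losses, best_alpha, beta)


# Ten hypothetical scenario losses f(x, y_j) for one fixed portfolio x.
scenario_losses = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 2.5, 4.0, 6.0, 10.0]
var_75, cvar_75 = var_and_cvar(scenario_losses, beta=0.75)
print(var_75, cvar_75)
```

The minimizing α plays the role of VaR and the minimum value that of CVaR, so the latter is never smaller than the former.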
Further, in [3, 91], two sets of additional constraints, mentioned below, are considered
in the optimization of CVaR to evaluate the performance against the bench-
mark case. These sets of constraints are
(i) no short positions allowed, and the positions can be at most doubled in size.
(ii) positions, both long and short, can be at most doubled in size.
The first constraint simply means that $l_i = 0$ and $u_i = 2$, i.e. $0 \le x_i \le 2$,
and the second constraint implies that $-2 \le x_i \le 2$. Further, the authors of [3] have
required that the re-balanced portfolio should maintain the future expected
value, in the absence of any credit migration, i.e. $\sum_{i=1}^{n} b_i x_i = \sum_{i=1}^{n} b_i$. After adding
these sets of constraints, the results of the optimization, in the case of no short
positions (no short), and in the case of both long and short positions (long and
short), are presented in Table 13.10.
Table 13.10 shows that the two risk measures, VaR and CVaR, are significantly
improved after the optimization. When no short positions are allowed, VaR and
CVaR reduce by about 60%. For example, at β = 0.99, CVaR is lowered to
559 million from the original 1320 million USD. Allowing both short and long
positions slightly improves the reductions, but not significantly. Thus, it has been
observed that risk measures could be reduced by about 60% with the multiple
obligor optimization.
Table 13.11 shows that the two risk measures, expected loss and standard de-
viation, are also dramatically improved when CVaR is minimized. For example,
in the case of both long and short positions, the expected loss and standard deviation
are reduced by about 50%. The corresponding position weights for the original twelve
largest risk contributors are presented in Table 13.12.
the firm agree to that? If the customer starts to fall behind in repayments what
actions should the firm take? Techniques that help with these decisions are called
behavioural scoring.
In order to grant a loan to a customer, the following steps are considered in the
lending process:
(i) Solicitation: Either the firm solicits applications for loans, for instance via
advertising, or a client approaches the firm with a new request.
(ii) Information gathering: Information about the applicant is gathered via various
means, including interviews, visits, review of financial data or accounts, and possible
use of credit reference agencies, ratings agencies or other available data, in order
to rate the client.
(iii) Recommendation: On the basis of information gathered and consideration of
why the client wants the money an internal rating is assigned and a lending
decision is recommended. This is then reviewed, perhaps by a loan committee
in the case of a corporate loan or branch staff in the case of a retail exposure.
(iv) Closing administration: Any collateral is perfected, documentation is finalized
and signed and funds are made available.
(v) Monitoring: The performance of the obligor and their condition are re-
assessed periodically.
One aim of internal ratings is to allow many applications
from different kinds of corporates in different countries to be assessed on an
equitable basis. It also allows the bank to set break-even spreads for internal
ratings classes which encourage lending to good counterparties.
The analogue of internal ratings for retail exposures is credit scoring. Credit
scoring can be formally defined as a statistical (or quantitative) method that is
used to predict the probability that a loan applicant or existing borrower will
default or become delinquent [95]. This helps to determine whether credit should
be granted to a borrower [100]. Credit scoring can also be defined as a system-
atic method for evaluating credit risk that provides a consistent analysis of the
factors that have been determined to cause or affect the level of risk [135]. Data
for a system should have discriminating power that could be based on certain
parameters like gross income, age, number of years spent in the current position,
gender, etc. This can then be used both reactively, to decide whether to accede
to a mortgage application, for instance, and proactively, to solicit applications for a
particular type of credit card.
Credit scoring has many benefits that accrue not only to the lenders but also
to the borrowers. For example, credit scores help to reduce discrimination as
scoring models provide an objective analysis of a customer's creditworthiness.
With the help of the credit scores, financial institutions are able to quantify the
risks associated with granting credit to a particular applicant in a shorter time.
Further, credit scores can help financial institutions determine the interest rate
that they should charge their customers and to price portfolios [6]. Higher-risk
customers are charged a higher interest rate and vice versa. Based on the customers'
credit scores, the financial institutions are also able to determine the credit limits
to be set for the customers [106, 118]. This further helps financial institutions to
manage their accounts more effectively and profitably.
Credit scores are also used as a basis to adjust premiums. Generally, customers
with bad credit scores have a higher chance of filing insurance claims compared
with customers with good credit scores. Therefore, the former are charged a higher
premium. Further, the credit information is used to assess a customer's account-
ability and performance under the conditions of an insurance policy.
The methods generally used for credit scoring are based on statistical pattern
recognition techniques. Historically, discriminant analysis and linear regression
were the most widely used techniques for building scorecards. Later the logit and
probit models were suggested in Martin [88] and Ohlson [105]. All these models
belong to the class of generalized linear models (GLM) and could also be inter-
preted using a latent (score) variable. Their core decision element is a linear score
function (graphically represented as a hyperplane in a multidimensional space)
separating successful and failing companies. The company score is computed as a
value of that function. In the case of the probit and logit models the score is
directly transformed, via a link function, into a probability of default (PD).
The major disadvantage of these popular approaches is the enforced linearity of
the score and, in the case of logit and probit models, the prespecified form of the
link function (logit and Gaussian) between PDs and the linear combination of
predictors.
Some of the techniques that have been previously used, but rather infrequently,
to construct credit scoring models include genetic algorithms, k-nearest neighbor,
linear programming, and expert systems (Thomas et al. [135]). In recent years,
new techniques have been increasingly used to construct credit scoring models.
In particular, the decision tree approach has become a popular technique for
developing credit scoring models because the resulting decision trees are easily
interpreted and visualized. Further, neural networks are also commonly used. All
the methods and techniques mentioned above can be considered as important
data mining techniques for predictive modeling.
connected with classifying a bad applicant as good. Usually $c_B > c_G$, because the costs
incurred due to misclassifying a bad customer are financially more damaging than
the cost associated with the former kind of error. If applicants with x are assigned to
class G, the expected cost is $c_B\,p(B|x)$ and the expected loss for the whole sample
is $c_B \sum_{x \in A_G} p(B|x)\,p(x) + c_G \sum_{x \in A_B} p(G|x)\,p(x)$, where p(x) is the probability that the
measurement vector is equal to x. This is minimized when the applicants assigned to
group G are those whose measurement vectors lie in $A_G = \{x : c_B\,p(B|x) \le c_G\,p(G|x)\}$,
which is equivalent to $A_G = \{x : p(G|x) \ge \frac{c_B}{c_G + c_B}\}$. Without
loss of generality, the misclassification costs can be normalized to $c_G + c_B = 1$.
In this case, the rule for classification is to assign an applicant with x to class
G if $p(G|x) > c_B$ and otherwise to class B. An important task is to specify the
cost of lending errors and to accurately specify the optimal cutoff-score for credit
scoring, as banks have to choose the optimal trade-off between profitability and
risk. Credit policies that are too restrictive may ensure minimal costs in terms of
defaulted loans, but the opportunity costs of rejected loans may exceed potential
bad debt costs and thus profit is not maximized. On the other hand, policies that
are too liberal may result in high losses from bad debt.
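The cost-based rule can be written down in a few lines. The following Python sketch uses hypothetical function names and hypothetical cost values; it simply compares the expected cost of accepting an applicant with the expected cost of rejecting one.

```python
def cutoff(cost_bad_as_good, cost_good_as_bad):
    """Threshold c_B / (c_G + c_B) on p(G|x) above which an applicant is accepted."""
    return cost_bad_as_good / (cost_good_as_bad + cost_bad_as_good)


def classify(p_good, cost_bad_as_good, cost_good_as_bad):
    """Assign class 'G' when accepting has the lower expected cost, i.e. when
    c_B * p(B|x) <= c_G * p(G|x); otherwise assign class 'B'."""
    expected_cost_accept = cost_bad_as_good * (1.0 - p_good)
    expected_cost_reject = cost_good_as_bad * p_good
    return "G" if expected_cost_accept <= expected_cost_reject else "B"


# Hypothetical costs: accepting a bad applicant is five times as costly
# as rejecting a good one, so the cutoff on p(G|x) is 5/6.
print(cutoff(5.0, 1.0))
print(classify(0.90, 5.0, 1.0), classify(0.80, 5.0, 1.0))
```

Raising the cost of accepting a bad applicant relative to rejecting a good one pushes the cutoff towards 1, which corresponds to a more restrictive credit policy.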
In the sequel, we discuss three algorithms which are very popular in the
domain of credit scoring. These are logistic regression, Fisher's linear discrimi-
nant analysis and support vector machines. Our presentation here is motivated
by Thomas et al. [135].
between the independent variables and the dependent variable, and it does not re-
quire the assumption of normality. But it assumes that the independent variables
are linearly related to the logit (i.e. $\ln\frac{p}{1-p}$) of the dependent variable.
Mw = S−1 (mG − mB )w
The advantages of the method are that, in the nonparametric case, SVM re-
quires no data structure assumptions such as normal distribution and continuity.
SVM can perform a nonlinear mapping from the original input space into a high
dimensional feature space, and the method is capable of handling both continuous
and categorical predictors. The weaknesses of this method are that it is difficult
to interpret unless the features are interpretable, and that the standard formulations do
not contain any specification of business constraints.
Given a training set of instance-label pairs $(x_i, y_i)$, i = 1, 2, . . . , m, where $x_i \in \mathbb{R}^n$
and $y_i \in \{+1, -1\}$, SVM finds an optimal separating hyperplane with the maximum
margin by solving the following optimization problem
$$\text{(LP)} \quad \min_{w,\,b,\,\xi}\; \frac{1}{2}\, w^T w + C \sum_{i=1}^{m} \xi_i$$
subject to
$$y_i (w^T x_i + b) + \xi_i - 1 \ge 0, \qquad \xi_i \ge 0 \qquad (i = 1, 2, \ldots, m),$$
where C is a penalty parameter on the training error and $\xi_i$ is the non-negative
slack variable. Applying the Karush-Kuhn-Tucker (KKT) [98] conditions for the
constrained optimum, LP is transformed into the dual Lagrangian
LD(α):
$$\text{LD}(\alpha) \quad \max_{\alpha}\; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$
subject to
$$\sum_{i=1}^{m} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \quad (i = 1, 2, \ldots, m).$$
To find the optimal hyperplane, the dual Lagrangian LD(α) must be maximized
with respect to non-negative $\alpha_i$. The solution $\alpha_i^*$ of the dual optimization problem
determines the parameters $w^*$ and $b^*$ of the optimal hyperplane. Thus, the optimal
hyperplane decision function $f(x) = \mathrm{sign}(w^{*T} x + b^*)$ can be written as
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{m} y_i \alpha_i^* \langle x_i, x \rangle + b^*\right).$$
The nonlinear SVM maps the training samples from
the input space into a higher-dimensional feature space via a mapping function
φ. In the dual Lagrangian, the inner products are replaced by the kernel function
$\langle \phi(x_i), \phi(x_j) \rangle = K(x_i, x_j)$, and the nonlinear SVM dual Lagrangian LD(α) is
similar to that in the linear generalized case, i.e.
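$$\max_{\alpha}\; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
subject to
$$\sum_{i=1}^{m} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \quad (i = 1, 2, \ldots, m).$$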
Following the steps described in the linear generalized case, we obtain a decision
function of the following form
$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{m} y_i \alpha_i^* \langle \phi(x), \phi(x_i) \rangle + b^*\right)
= \mathrm{sgn}\left(\sum_{i=1}^{m} y_i \alpha_i^* K(x, x_i) + b^*\right). \tag{13.8}$$
Proper setting of the kernel parameters can improve the SVM classification accuracy.
With the RBF kernel, there are two parameters C and γ to be determined in the
SVM model. The grid search approach [63] is one way of finding the best C
and γ when using the radial basis function (RBF) kernel. In addition to
proper parameter setting, feature subset selection can improve the SVM clas-
sification accuracy. To pursue even small improvements in credit scoring accuracy,
many methods have been investigated in the last decade. Artificial Neural Net-
works (ANNs) and genetic algorithms (GA) are the most commonly used soft computing
methods in credit scoring modelling [134, 135].
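To make the decision function (13.8) concrete, the following Python sketch evaluates it for an RBF kernel. It is a sketch under stated assumptions: the kernel is taken in the usual form K(u, v) = exp(−γ‖u − v‖²), and the support vectors, multipliers and bias are hypothetical values chosen by hand (they satisfy the dual constraint Σ αᵢyᵢ = 0 but are not the output of any solver).

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support_vectors, labels, alphas, bias, gamma):
    """Evaluate sgn(sum_i y_i * alpha_i * K(x, x_i) + b), as in (13.8)."""
    score = bias + sum(
        y * a * rbf_kernel(x, sv, gamma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1


# Hypothetical two-point toy problem (values chosen by hand, not by a solver).
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [+1, -1]
alphas = [1.0, 1.0]

print(decision((0.1, 0.1), support_vectors, labels, alphas, bias=0.0, gamma=0.5))
print(decision((1.9, 2.0), support_vectors, labels, alphas, bias=0.0, gamma=0.5))
```

A grid search over C and γ would repeat the training step for each candidate pair and keep the pair with the best validation accuracy; only the evaluation step is shown here.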
13.12 Drawbacks
Although credit scoring has significant benefits, its limitations should also be
noted. One major problem that can arise when constructing a credit scoring
model is that the model may be built using a biased sample of customers who
have been granted credit [58]. This may occur because applicants (i.e. potential
customers) who are rejected are not included in the data for constructing the
model. Hence, the sample is biased (i.e. different from the general population) as
good customers are too heavily represented. The credit scoring model built using
this sample may not perform well on the entire population since the data used
to build the model is different from the data that the model will be applied to.
The second problem that can arise when constructing credit scoring models is the
change of patterns over time. The key assumption for any predictive modeling is
that the past can predict the future [10]. In credit scoring, this means that the
characteristics of past applicants who are subsequently classified as good or bad
creditors can be used to predict the credit status of new applicants. Sometimes,
the tendency for the distribution of the characteristics to change over time is so
fast that it requires constant refreshing of the credit scoring model to stay relevant.
One of the consequences of credit scoring is the possibility that end-users become
so reliant on the technology that they reduce the need for prudent judgment and
the need to exercise their knowledge on special cases. In other instances, end-
users may unintentionally apply more resources than necessary to work the entire
portfolio. This could run into the risk of a self-fulfilling prophecy [84]. In the U.S., a
new industry has emerged that is dedicated to helping borrowers improve their credit
scores by rearranging finances [136], rather than obeying the simple rule: pay your
bills on time and keep your debt low. Such score-polishing actions could potentially
distort the patterns of credit default. Despite the limitations highlighted above,
there is no doubt that credit scoring continues to be a major tool in predicting
credit risk in customer lending. It is envisaged that organizations using credit
scoring appropriately will gain important strategic advantage and competitive
edge over their rivals.
13.14 Exercises
Exercise 13.1 Let A = {(2, 0), (0, 0)} and B = {(4, 4)} be two sets in R2 . It is
claimed that the line y = 3 is the hard margin classifier. Verify this claim analyt-
ically.
Exercise 13.2 Let $x = (0, 2, -3)^T$ and $y = (-2, 1, 4)^T$. Determine $\|(x)_+\|_1$, $\|(y)_+\|_1$,
$\|(x + y)_+\|_1$ and $\|(x)_+ + (y)_+\|_1$. Show that the polynomial kernel of degree 2 is a
Mercer kernel.
Exercise 13.3 Let A = {(0, 0), (2, 0), (0, 2)} and B = {(0, −1), (1, 0), (0, 1)} be two
datasets with class labels +1 and -1, respectively.
1. Use a polynomial kernel of degree 2 to find a soft margin classifier.
2. Write the optimization problem (both the primal and dual forms) if a Gaussian
kernel with σ = 1 is employed for finding the soft margin classifier.
Exercise 13.4 Perform the simulation of Merton's Model for Credit-VaR for a
bond portfolio when the correlation is 0.4.
14
Monte Carlo Simulation
14.1 Introduction
Monte Carlo methods (or Monte Carlo simulations) are a class of computational
algorithms that rely on repeated random sampling to compute their results. The
use of Monte Carlo simulations in quantitative finance or specifically in financial
engineering for valuation of options was probably first suggested by Phelim Boyle
in 1977 [19]. Ever since then the Monte Carlo simulation has become an essential
tool in pricing the derivative securities and also in risk management. These ap-
plications have, in turn, stimulated research into new Monte Carlo methods and
renewed interest in some older techniques. This chapter develops the Monte Carlo
methods and studies their applications in finance. It also uses simulation, which is
a fictitious representation of reality, as a vehicle for presenting models and ideas
from financial engineering. The subject of Monte Carlo methods can be viewed as
a branch of “experimental mathematics” in which one uses random numbers to
conduct experiments. Typically experiments/simulations are carried on systems
using anywhere from hundreds to billions of random numbers.
A Monte Carlo method is a numerical method based on random sampling that
can be used to solve a mathematical or statistical problem. It provides approximate
solutions to a variety of mathematical problems by performing statistical sampling
experiments. These methods derive their collective name from the fact that Monte
Carlo, a district of Monaco, is famous for its casinos and roulette wheels.
One of the most important uses of Monte Carlo methods is in evaluating diffi-
cult multi-dimensional integrals for which we have very few methods for computa-
tion. Note that these methods only provide an approximation of the actual value.
The attempt to minimize this error is the reason for many different Monte Carlo
methods. For example, a model of a random process that produces (or mimics)
traffic movements on a particular highway is a simulation. Now suppose, we are
interested to know the average number of cars passing through that highway or
the probability that the waiting time is more than 5 minutes, then in these cases
we need the Monte Carlo simulation.
A Monte Carlo simulation uses repeated sampling to determine the properties
of some statistical phenomenon. The Monte Carlo simulation technique has for-
mally existed since the early 1940s, where it had its applications in research into
nuclear fusion. One important feature of Monte Carlo simulation is that it is pos-
sible to estimate the order of magnitude of estimation error, in terms of statistical
confidence intervals. Thus, Monte Carlo simulation includes the distribution of
the random variables, analysis of the output and efficiency of the simulation.
In this chapter, we focus on generating random numbers and variables, and
Monte Carlo techniques, and show their applications in the area of finance. We
shall be referring mainly to Glasserman [49] and Jackel [69].
bers are called pseudo-random numbers. These random numbers are tested with
rigorous statistical tests to ensure that the numbers are random.
The most common method used for pseudo-random number generation is a
recursive technique called the linear congruential generator or Lehmer generator.
It is defined by the recursive formula
$$x_{n+1} = (A x_n + C) \bmod M,$$
where the integers A, C and M are parameters that can be adjusted for convenience
and to ensure the desired nature of the sequence of pseudo-random numbers, and
mod stands for modulus operation. This generator is initiated with a seed x0 .
In general, the random integers so generated should be mapped into the interval
[0, 1]. For a linear congruential generator taking possible values x ∈ {0, 1, . . . , M−1},
it is suggested to use u = x/M or (x+0.5)/M to get values which are approximately
uniformly distributed in the unit interval [0, 1]. Note that, there are many tests
that can be applied to determine whether the hypothesis of independent uniform
variables is credible.
As an illustration, let x0 = 4, A = 81, C = 35 and M = 256 (8 bits). This
generates the sequence 103, 186, 253, 48, 83, . . .
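The first few terms for these parameters can be reproduced with a short Python sketch. It assumes the usual recursion x_{n+1} = (A x_n + C) mod M; the function names are ours.

```python
def lcg(seed, a, c, m):
    """Yield x_1, x_2, ... from the recursion x_{n+1} = (a * x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def to_unit_interval(x, m):
    """Map an integer in {0, ..., m-1} into (0, 1) using u = (x + 0.5) / m."""
    return (x + 0.5) / m


gen = lcg(seed=4, a=81, c=35, m=256)
first_terms = [next(gen) for _ in range(5)]
print(first_terms)                                   # [103, 186, 253, 48, 83]
print([to_unit_interval(x, 256) for x in first_terms])
```

With these parameters the generator visits all 256 residues before repeating, which is the longest period possible for M = 256; such a short period is of course far too small for practical simulation.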
equal subintervals, for example [0, 1/2) and [1/2, 1). Therefore, it is possible for
n trials to coincidentally all lie in the first half of the interval, while the (n + 1)th
point still falls within the other half with probability 1/2. This is not the case
with the quasi-random sequences, in which the outputs are constrained by a low-
discrepancy requirement that has a net effect of points being generated in a highly
correlated manner (i.e. the next point “knows” where the previous points are).
Illustration 14.2.1: Value of pi using Monte Carlo simulation
We consider a unit circle within a square with sides equal to 2 (see Fig. 14.1).
[Fig. 14.1: the unit circle $x^2 + y^2 = 1$ inside the square $-1 \le x, y \le 1$.]
Now if we pick a random point (x, y) where x and y are between −1 and 1,
the probability that this random point lies inside the unit circle is given by the
ratio of the area of the unit circle to the area of the square. Thus
$$P(x^2 + y^2 < 1) = \frac{\text{Area of the unit circle}}{\text{Area of the square}} = \frac{\pi}{4}.$$
Therefore if we pick a random point N times in the square and observe that M
times the point lies inside the unit circle, the probability that a random point lies
inside the unit circle is estimated by
$$\tilde{P}(x^2 + y^2 < 1) = \frac{M}{N},$$
where $\tilde{P}$ indicates that it is an empirical (discrete) estimate, because M and N are integers.
If N becomes very large then, as a consequence of the law of large numbers,
the above two probabilities become approximately equal. This gives
$$\pi \approx \frac{4M}{N}.$$
Therefore to use Monte Carlo simulation, we generate a large number of random
(x, y) positions in the square, i.e. x and y are between −1 and 1. We next determine
which of these positions are inside the circle. Each time it is inside the circle, we
add ‘one’ to the counter. Hence by the methodology of Monte Carlo simulation
we have
$$\pi \approx 4\,\frac{\text{Number of points inside the circle}}{\text{Total number of points generated in the square}}.$$
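The procedure just described can be sketched in Python as follows. The function name and the seed argument are our own choices, and Python's built-in generator stands in for whichever uniform generator is actually used.

```python
import random


def estimate_pi(num_points, seed=None):
    """Estimate pi as 4 * (points inside the unit circle) / (points generated)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            inside += 1            # add 'one' to the counter
    return 4.0 * inside / num_points


print(estimate_pi(200_000, seed=1))
```

The estimate converges slowly: its standard error is of order N^{-1/2}, so each additional correct digit requires roughly a hundred times more points.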
Thus
$$F_X^{-1}(u) = F_\lambda^{-1}(u) = -\frac{1}{\lambda}\ln(1 - u).$$
Therefore to generate the given exponential distribution, we first generate U ∼
U[0, 1] and then use the inversion method. This is because if U ∼ U[0, 1] then
$-\frac{1}{\lambda}\ln(1 - U)$ is exponentially distributed with rate λ > 0. As U and 1 − U have
the same distribution, we can use $-\frac{1}{\lambda}\ln(U)$ as well. The details are summarized
as follows.
generate U ∼ U[0, 1]
X ← −ln(U)/λ (or X ← −ln(1 − U)/λ)
return X.
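A minimal Python sketch of this inversion step is given below. The function name is ours; the form with 1 − U is used because Python's uniform generator returns values in [0, 1), which keeps the logarithm finite.

```python
import math
import random


def exponential_by_inversion(rate, rng=random):
    """Return X = -ln(1 - U) / rate with U ~ U[0, 1)."""
    u = rng.random()            # in [0, 1), so 1 - u lies in (0, 1]
    return -math.log(1.0 - u) / rate


rng = random.Random(2024)
sample = [exponential_by_inversion(2.0, rng) for _ in range(100_000)]
print(sum(sample) / len(sample))   # should be close to the mean 1/rate = 0.5
```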
Illustration 14.3.2: Generation of a Discrete Distribution
Let X be a discrete random variable with possible values ai (i = 1, 2, . . . , n). We
assume that a1 < a2 < . . . < an . The cumulative distribution function of X is given
by
$$q_0 = 0, \qquad q_i = \sum_{j=1}^{i} P(X = a_j) = F_X(a_i) \quad (i = 1, 2, \ldots, n).$$
Therefore to generate the given discrete distribution we employ the following steps.
Step 1. Generate U ∼ U[0, 1].
Step 2. Find k ∈ {1, 2, . . . , n} such that $q_{k-1} < U \le q_k$.
Step 3. Set X = ak .
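The three steps can be sketched in Python as follows. The names are ours and the example distribution is hypothetical; a binary search over the cumulative probabilities carries out Step 2.

```python
import bisect
import itertools
import random


def make_discrete_sampler(values, probabilities):
    """Build a sampler for P(X = values[i]) = probabilities[i] by CDF inversion."""
    cumulative = list(itertools.accumulate(probabilities))   # q_1, ..., q_n

    def sample(rng=random):
        u = rng.random()                                      # Step 1
        # Step 2: smallest index k with u <= q_k, i.e. q_{k-1} < u <= q_k.
        k = bisect.bisect_left(cumulative, u)
        # Step 3 (the min guards against rounding error in q_n).
        return values[min(k, len(values) - 1)]

    return sample


sampler = make_discrete_sampler([1, 2, 3], [0.2, 0.5, 0.3])
print([sampler() for _ in range(10)])
```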
The inversion method is a very efficient tool for generating random numbers.
However very few distributions possess CDF whose (generalized) inverse can be
evaluated efficiently. For example, CDF of a Gaussian distribution is not even
available in the closed form. Note however that the generalized inverse of the CDF
is just one possible transformation and that there might be other transformations
that yield the desired distribution. An example of such a method is the Box-Muller
method for generating Gaussian random variables.
Illustration 14.3.3: Box-Muller Method for Generating Normal Distri-
bution
This algorithm generates a sample from the standard bivariate normal distri-
bution, each component of which is thus a univariate standard normal variate.
This algorithm is based on the lemma given below.
Lemma 14.3.1 Let $X_1, X_2 \sim N(0, 1)$ be independent, i.e. $Z = (X_1, X_2) \sim
N(0, I_2)$, $I_2$ being the (2 × 2) identity matrix. Then
(i) $R = X_1^2 + X_2^2$ is exponentially distributed with mean 2, i.e. $P(R \le x) = 1 - e^{-x/2}$.
(ii) Given R, the point $(X_1, X_2)$ is uniformly distributed on the circle of radius
$\sqrt{R}$ centered at the origin.
(iii) R and $\theta = \tan^{-1}(X_2/X_1)$ are independent random variables.
θ ← 2πU2
X1 ← √R cos(θ), X2 ← √R sin(θ)
return X1, X2.
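A complete version of the algorithm can be sketched in Python as follows. The first steps are not shown above, so the sketch assumes the standard form suggested by part (i) of the lemma: R is generated by inversion as R = −2 ln U1 from a uniform U1 that is independent of U2. The function name is ours.

```python
import math
import random


def box_muller(rng=random):
    """Return two independent N(0, 1) variates built from two uniforms."""
    u1 = 1.0 - rng.random()          # in (0, 1], keeps log(u1) finite
    u2 = rng.random()
    r = -2.0 * math.log(u1)          # R ~ exponential with mean 2
    theta = 2.0 * math.pi * u2       # angle uniform on [0, 2*pi)
    root = math.sqrt(r)
    return root * math.cos(theta), root * math.sin(theta)


print(box_muller(random.Random(0)))
```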
The idea of transformation methods, like the inversion method, is to generate
random samples from a distribution other than the target distribution and to
transform them such that they come from the desired target distribution. In many
situations we cannot find such a transformation in closed form. In these cases we
have to find other ways of correcting for the fact that we sample from the wrong
distribution. The next two algorithms present two such ideas namely the rejection
sampling and importance sampling [102].
1. Acceptance-Rejection Sampling
In the sequel we describe the acceptance-rejection Monte Carlo method.
The acceptance-rejection method, introduced by Von Neumann, is among the
most widely applicable mechanisms for generating random samples. This method
generates samples from a target distribution by first generating candidates from
a more convenient distribution and then rejecting a random subset of generated
candidates. The rejection mechanism is designed so that the accepted samples are
indeed distributed according to the target distribution.
The basic idea of rejection sampling is to sample from an instrumental distri-
bution and reject samples that are “unlikely” under the target distribution.
Assume that we want to have a sample from a target distribution whose density
f is known to us. The simple idea underlying rejection sampling (and other Monte
Carlo algorithms) is the rather trivial identity
$$f(x) = \int_0^{f(x)} 1\, du = \int_0^{\infty} \mathbf{1}_{\{0 \le u \le f(x)\}}\, du.$$
Using the above identity we can draw a sample from Beta(3, 5) by drawing a sample
from the uniform distribution on the area under the density {(x, u) : 0 ≤ u ≤ f (x)}
(the area shaded in dark gray in Fig. 14.3). We sample from the light gray
rectangle and only keep the samples that fall in the area under the curve.
Mathematically speaking, we sample independently X ∼ U[0, 1] and U ∼
U[0, 2.4]. We keep the pair (X, U) if U < f (X), otherwise we reject it. The
conditional probability that a pair (X, U) is kept given X = x is
$P(U < f(X) \mid X = x) = P(U < f(x)) = f(x)/2.4$. As X and U were drawn independently we
can rewrite our algorithm as
Draw X from U[0, 1] and accept X with probability f (X)/2.4, otherwise reject X.
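This box-based scheme can be sketched in Python as follows. The function names are ours; the density is written out using 1/B(3, 5) = 105, and its maximum, attained at x = 1/3, is about 2.30, so the box height 2.4 is sufficient.

```python
import random


def beta_3_5_density(x):
    """Density of Beta(3, 5): f(x) = 105 * x^2 * (1 - x)^4 on [0, 1]."""
    return 105.0 * x ** 2 * (1.0 - x) ** 4


def sample_beta_3_5(rng=random, box_height=2.4):
    """Draw (X, U) uniformly in [0, 1] x [0, box_height]; keep X when U < f(X)."""
    while True:
        x = rng.random()
        u = rng.uniform(0.0, box_height)
        if u < beta_3_5_density(x):
            return x


rng = random.Random(5)
draws = [sample_beta_3_5(rng) for _ in range(50_000)]
print(sum(draws) / len(draws))   # should be close to the mean 3/8
```

On average a fraction 1/2.4 of the proposed pairs is accepted, since the area under the density is 1 and the area of the box is 2.4.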
The method proposed in the above example is based on bounding the density of
the Beta distribution by a box. Whilst this is a powerful idea, it cannot be directly
applied to other distributions, as the density might be unbounded or have infinite
support. However, we might be able to bound the density f (x) by Mg(x), where
g(x) is a density that we can easily sample from.
Algorithm (Rejection Sampling) Given two densities f, g with f (x) < Mg(x) for
all x, we can generate a sample from f by employing the following steps.
Step 1. Draw X from density g.
Step 2. Accept X as a sample from f with probability $\dfrac{f(X)}{M g(X)}$, otherwise go back
to Step 1. This step is equivalent to the following step.
Generate U ∼ U[0, 1]. If $U \le f(X)/(M g(X))$, accept X, otherwise go to Step 1.
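The above algorithm can be justified mathematically as detailed below. For any
subset $\mathcal{X}$ of the domain of the random variable X, we have
$$P(X \in \mathcal{X} \text{ and is accepted}) = \int_{\mathcal{X}} g(x)\,\frac{f(x)}{M g(x)}\,dx = \frac{\int_{\mathcal{X}} f(x)\,dx}{M},$$
and thus, taking $\mathcal{X}$ to be the whole domain of X,
$$P(X \text{ is accepted}) = \frac{1}{M}.$$
This yields
$$P(X \in \mathcal{X} \mid X \text{ is accepted}) = \frac{P(X \in \mathcal{X} \text{ and is accepted})}{P(X \text{ is accepted})}
= \frac{\int_{\mathcal{X}} f(x)\,dx / M}{1/M} = \int_{\mathcal{X}} f(x)\,dx.$$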
Fig. 14.4. Rejection sampling from N(0, 1) distribution using a Cauchy proposal
as the Cauchy distribution has heavier tails than the Gaussian distribution.
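Rejection sampling from N(0, 1) with a standard Cauchy proposal can be sketched in Python as follows. The function names are ours. The constant M is taken as the supremum of f(x)/g(x), which for these two densities equals √(2π/e) ≈ 1.52 and is attained at x = ±1; the Cauchy variates themselves are generated by inversion.

```python
import math
import random

M = math.sqrt(2.0 * math.pi / math.e)   # sup_x f(x)/g(x), attained at x = +/-1


def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cauchy_density(x):
    return 1.0 / (math.pi * (1.0 + x * x))


def sample_normal_by_rejection(rng=random):
    """Step 1: draw X from the Cauchy proposal g (by inversion of its CDF).
    Step 2: accept X when U <= f(X) / (M g(X)), otherwise repeat."""
    while True:
        x = math.tan(math.pi * (rng.random() - 0.5))
        if rng.random() <= normal_density(x) / (M * cauchy_density(x)):
            return x


rng = random.Random(3)
draws = [sample_normal_by_rejection(rng) for _ in range(40_000)]
print(sum(draws) / len(draws))
```

On average a fraction 1/M ≈ 0.66 of the proposals is accepted. The roles of the two densities cannot be swapped: a Gaussian proposal cannot dominate the heavier Cauchy tails with any finite M.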
2. Importance Sampling
In rejection sampling we have compensated for the fact that we sampled from
the instrumental distribution g(x) instead of f (x) by rejecting some of the values
proposed by g(x). Importance sampling is based on the idea of using weights to
correct for the fact that we sample from the instrumental distribution g(x) instead
of the target distribution f (x).
Importance sampling is based on the identity
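$$P(X \in \mathcal{X}) = \int_{\mathcal{X}} f(x)\,dx = \int_{\mathcal{X}} g(x)\,\frac{f(x)}{g(x)}\,dx = \int_{\mathcal{X}} g(x)\,w(x)\,dx,$$
for all g(·) such that g(x) > 0 for (almost) all x with f (x) > 0, where $w(x) = f(x)/g(x)$.
Suppose we are interested in estimating
$$\alpha = E[h(X)] = \int_{\mathcal{X}} h(x)\,f(x)\,dx = \int_{\mathcal{X}} h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx = \int_{\mathcal{X}} h(x)\,w(x)\,g(x)\,dx.$$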
f (x)
where dQ(x) = g(x) dx. Note that $E_P[\theta] = E_Q[\theta]$, whereas $\mathrm{Var}_Q[\theta] < \mathrm{Var}_P[\theta]$.
Hence, we should generate the samples of the random variable X using pdf g(x)
instead of f (x).
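A small Python sketch may make the weighting idea concrete. It is our own example rather than one from the text: the target f is N(0, 1), h is the indicator of the event {X > c}, and the instrumental density g is taken to be N(c, 1), so that w(x) = f(x)/g(x) = exp(−cx + c²/2). The function names are illustrative.

```python
import math
import random


def tail_prob_plain(c, n, rng):
    """Plain Monte Carlo: sample from f = N(0,1) and count exceedances of c."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > c)
    return hits / n


def tail_prob_importance(c, n, rng):
    """Importance sampling: sample from g = N(c,1) and weight by w = f/g."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(c, 1.0)                         # draw from g
        if x > c:                                     # h(x) = 1 on {x > c}
            total += math.exp(-c * x + 0.5 * c * c)   # w(x) = f(x)/g(x)
    return total / n


if __name__ == "__main__":
    c = 4.0
    exact = 0.5 * math.erfc(c / math.sqrt(2.0))       # P(Z > c) for Z ~ N(0,1)
    print("exact     :", exact)
    print("plain MC  :", tail_prob_plain(c, 50000, random.Random(1)))
    print("importance:", tail_prob_importance(c, 50000, random.Random(1)))
```

For a rare event such as c = 4, the plain estimator sees almost no exceedances at this sample size, while the weighted estimator typically lands within a few percent of the exact value.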
A common pitfall of importance sampling is that the tails of the distributions matter. While g(x) might be roughly the same shape as f(x), serious difficulties arise if g(x) gets small much faster than f(x) out in the tails. In such a case, though it is improbable (by definition) that we will realize a value $x_i$ from the far tails of g(x), if we do so then the Monte Carlo estimator will take a jolt, for the weight $f(x_i)/g(x_i)$ of such an improbable $x_i$ may be orders of magnitude larger than the typical values of $f(x)/g(x)$ that we see.
Illustration 14.3.6: Geometric Distribution with Parameter p.
This is a discrete distribution which describes the number of independent trials necessary to achieve a single success when the probability of a success on each trial is p. We know that the probability mass function is
$$p(i) = p(1-p)^{i-1}, \quad i = 1, 2, \ldots,$$
and the cumulative distribution function is
$$F(x) = P(X \le x) = 1 - (1-p)^{[x]}, \quad x \ge 0,$$
where [x] denotes the integer part of x. We wish to output an integer value of x which satisfies the inequalities
$$F(x-1) < U \le F(x).$$
Solving these inequalities for integer x, we obtain
\begin{align*}
1-(1-p)^{x-1} &< U \le 1-(1-p)^{x}, \\
(1-p)^{x-1} &> 1-U \ge (1-p)^{x}, \\
(x-1)\ln(1-p) &> \ln(1-U) \ge x\ln(1-p), \\
x-1 &< \frac{\ln(1-U)}{\ln(1-p)} \le x,
\end{align*}
where the inequalities reverse in the last line because $\ln(1-p) < 0$. We should therefore choose the smallest integer for X which is greater than or equal to $\dfrac{\ln(1-U)}{\ln(1-p)}$, or equivalently, $X = 1 + \left\lfloor \dfrac{\ln(1-U)}{\ln(1-p)} \right\rfloor$ or $1 + \left\lfloor \dfrac{-E}{\ln(1-p)} \right\rfloor$, where we write $-\ln(1-U) = E$, an exponentially distributed random variable with parameter 1. In MATLAB, the geometric random number generator is called geornd (note that geornd counts the failures before the first success, so its values start at 0 rather than 1).
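The inverse transform rule derived above can be sketched in Python as follows. The function name is illustrative, and the guard for U = 0 is our own addition, since the generator can return exactly 0.

```python
import math
import random


def geometric_inverse_transform(p, rng):
    """Number of trials up to and including the first success (values 1, 2, ...).

    Uses X = smallest integer >= ln(1 - U) / ln(1 - p), for 0 < p < 1.
    """
    u = rng.random()                                   # U in [0, 1)
    x = math.ceil(math.log(1.0 - u) / math.log(1.0 - p))
    return max(1, x)                                   # guard for the edge case U = 0


if __name__ == "__main__":
    rng = random.Random(2)
    p = 0.25
    xs = [geometric_inverse_transform(p, rng) for _ in range(50000)]
    print("sample mean:", sum(xs) / len(xs), "theoretical mean 1/p:", 1.0 / p)
```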
$$I_N = \frac{b-a}{N} \sum_{i=1}^{N} f(x_i).$$
It is well known that $I_N$ is unbiased, i.e. its expectation is I and, because of the law of large numbers, $I_N \to I$ with probability 1 as $N \to \infty$. The approximation error is of order $O(N^{-1/2})$. This follows from the central limit theorem, or can be viewed through the root mean square error $\sqrt{E(I_N - I)^2} = \sqrt{\mathrm{Var}\, I_N} = \dfrac{\sigma}{\sqrt{N}}$, where σ is the standard deviation of a single term $(b-a) f(x_i)$.
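As a minimal Python sketch of the crude Monte Carlo estimator $I_N$, assuming the points $x_i$ are drawn independently and uniformly on [a, b] (the function name and the test integrand are our own):

```python
import math
import random


def crude_monte_carlo(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] by (b - a) times the sample mean of f."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))      # x_i ~ U[a, b]
    return (b - a) * total / n


if __name__ == "__main__":
    estimate = crude_monte_carlo(lambda x: math.exp(-x), 0.0, 1.0, 100000, random.Random(3))
    print("estimate:", estimate, "exact:", 1.0 - math.exp(-1.0))
```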
We next describe the stratified sampling method. Consider again the following integral
$$I = \int_a^b f(x)\,dx.$$
The basic principle of the stratified sampling method is to divide the interval [a, b] into n subintervals and then to perform a crude Monte Carlo method on each subinterval. Thus we write
$$\int_a^b f(x)\,dx = \int_a^{a_1} f(x)\,dx + \int_{a_1}^{a_2} f(x)\,dx + \ldots + \int_{a_{n-1}}^{b} f(x)\,dx,$$
and then apply the crude Monte Carlo method to each of the integrals on the right hand side. The reason we might use this method is that, instead of finding the variance in one go, we can find it by adding up the variances of the estimates on each subinterval. This method is generally helpful when the function is step-like or has flat stretches. Thus the advantage of the stratified sampling method is that we get to split the curve into parts that could have certain advantageous properties when evaluated on their own.
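A possible Python sketch of stratified sampling is given below. It assumes n equal-width subintervals with the same number of uniform points in each; both choices, and the function name, are ours rather than the text's.

```python
import math
import random


def stratified_monte_carlo(f, a, b, n_strata, n_per_stratum, rng):
    """Sum of crude Monte Carlo estimates over n_strata equal-width subintervals."""
    width = (b - a) / n_strata
    total = 0.0
    for k in range(n_strata):
        left = a + k * width
        subtotal = 0.0
        for _ in range(n_per_stratum):
            subtotal += f(rng.uniform(left, left + width))
        total += width * subtotal / n_per_stratum    # crude estimate on this subinterval
    return total


if __name__ == "__main__":
    estimate = stratified_monte_carlo(lambda x: math.exp(-x), 0.0, 1.0, 10, 1000, random.Random(4))
    print("estimate:", estimate, "exact:", 1.0 - math.exp(-1.0))
```

With 10 strata of 1000 points each, the estimate of the integral of exp(−x) over [0, 1] is usually much closer to the exact value than a crude estimate with the same total of 10000 points, because f varies little within each subinterval.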
of the stock and the strike price. Then, we repeat this many times, averaging the discounted payoffs to estimate the present value of the option.
We present the steps involved as follows.
Step 1. Generate several random price paths.
Step 2. Calculate the associated exercise value for each path.
Step 3. Find the average payoff.
Step 4. Find discounted average payoff.
Now we present the Monte Carlo algorithm for this option pricing. By drawing random samples from the standard normal distribution and then using the Black-Scholes model, the Monte Carlo algorithm can be described as follows.
For i = 1:N
    E(i) = normrnd(0, 1);
    S(i) = S(0) exp((r − 0.5σ²)T + σ√T E(i));
    V(i) = exp(−rT) max(S(i) − K, 0);
end
$$\hat{A}_N = \frac{1}{N} \sum_{i=1}^{N} V(i).$$
Note that the estimator $\hat{A}_N$ is an unbiased and consistent estimator, since $E(\hat{A}_N) = A$ and $\hat{A}_N \to A$ with probability 1 as $N \to \infty$.
In the above mechanism, the payoff is determined by the terminal stock price
S(T) and does not otherwise depend on the evolution of S(t) between times 0 and
T. Each simulated path of the underlying asset consists of two points S(0) and
S(T).
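The algorithm above can be sketched in Python for a European call and compared with the Black-Scholes closed-form value. The parameter values in the usage lines are arbitrary illustrations and the function names are our own.

```python
import math
import random


def mc_european_call(s0, k, r, sigma, t, n, rng):
    """Average of discounted call payoffs over n simulated terminal prices."""
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    discount = math.exp(-r * t)
    total = 0.0
    for _ in range(n):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))   # S(T)
        total += discount * max(s_t - k, 0.0)                    # V(i)
    return total / n


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call, for comparison."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


if __name__ == "__main__":
    print("Monte Carlo  :", mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 100000, random.Random(5)))
    print("Black-Scholes:", black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
```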
To simulate the payoff of a derivative security which may depend explicitly on the values of the underlying asset at multiple time points, we divide the time interval [0, T] into smaller subintervals and simulate the asset price at each intermediate date. We present the algorithm as follows. For $0 = t_0 < t_1 < \ldots < t_n = T$,
$$S(t_{i+1}) = S(t_i) \exp\left((r - 0.5\sigma^2)(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,E(i)\right), \quad i = 0, 1, \ldots, n-1,$$
where the E(i) are independent N(0, 1) samples.
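The recursion over the time grid can be sketched in Python as below. The function name and the quarterly grid in the usage lines are illustrative assumptions.

```python
import math
import random


def simulate_gbm_path(s0, r, sigma, times, rng):
    """Simulate the asset price at each date of an increasing grid t_0 < ... < t_n."""
    path = [s0]
    for t_prev, t_next in zip(times[:-1], times[1:]):
        dt = t_next - t_prev
        z = rng.gauss(0.0, 1.0)                        # E(i) ~ N(0, 1)
        growth = math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)
    return path


if __name__ == "__main__":
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(simulate_gbm_path(100.0, 0.05, 0.2, grid, random.Random(6)))
```

A path-dependent payoff (for instance one based on the average of the simulated prices) can then be evaluated on each path and averaged exactly as in the single-date algorithm.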
We next illustrate the Monte Carlo method for an interest rate model.
Illustration 14.5.1: Interest Rate Model Consider
$$dr_t = \alpha(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad t \in [0, T], \qquad (14.1)$$
where α and b are positive constants. If $r_0 > 0$, then $r_t$ will never be negative. If $2\alpha b \ge \sigma^2$, then $r_t$ remains strictly positive for all t, almost surely.
with $Z_i$, (i = 1, 2, . . . , n) i.i.d. random variables each having the N(0, 1) distribution. Here $r^{+}_{t_i} = \max(r_{t_i}, 0)$.
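Using probability theory, observe that for 0 < u < t, given $r_u$, $r_t$ is distributed as
$$r_t = c\left[\left(Z + \sqrt{\lambda}\right)^2 + \chi^2_{d-1}\right],$$
where Z ∼ N(0, 1), $\chi^2_{d-1}$ is a chi-square random variable with d − 1 degrees of freedom, and
$$c = \frac{\sigma^2\left(1 - e^{-\alpha(t-u)}\right)}{4\alpha}; \qquad d = \frac{4b\alpha}{\sigma^2}; \qquad \lambda = \frac{4 r_u \alpha\, e^{-\alpha(t-u)}}{\sigma^2\left(1 - e^{-\alpha(t-u)}\right)} = \frac{r_u e^{-\alpha(t-u)}}{c}.$$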
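We now present the details of the algorithm.
Case 1: d > 1.
For i = 0, 1, . . . , n − 1
Step 1. $c_i = \dfrac{\sigma^2\left(1 - e^{-\alpha(t_{i+1}-t_i)}\right)}{4\alpha}$, $\lambda_i = \dfrac{r_{t_i}\, e^{-\alpha(t_{i+1}-t_i)}}{c_i}$
Step 2. Generate $Z_i \sim N(0, 1)$
Step 3. Generate $X_i \sim \chi^2_{d-1}$
Step 4. $r_{t_{i+1}} \leftarrow c_i\left[\left(Z_i + \sqrt{\lambda_i}\right)^2 + X_i\right]$
end
Case 2: d ≤ 1.
For i = 0, 1, . . . , n − 1
Step 1. $c_i = \dfrac{\sigma^2\left(1 - e^{-\alpha(t_{i+1}-t_i)}\right)}{4\alpha}$, $\lambda_i = \dfrac{r_{t_i}\, e^{-\alpha(t_{i+1}-t_i)}}{c_i}$
Step 2. Generate $N \sim \text{Poisson}(\lambda_i/2)$
Step 3. Generate $X_i \sim \chi^2_{d+2N}$
Step 4. $r_{t_{i+1}} \leftarrow c_i X_i$
end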
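Both cases of the exact simulation scheme can be sketched in Python with the standard library only. Two implementation choices here are our own assumptions: a chi-square variable with ν degrees of freedom is drawn as a gamma variable with shape ν/2 and scale 2, and the Poisson variable is generated by counting unit-rate exponential arrivals, which is simple but slow when λ/2 is large. The function names are illustrative.

```python
import math
import random


def chi_square(dof, rng):
    """Chi-square with (possibly non-integer) dof > 0, drawn as Gamma(dof/2, scale 2)."""
    return rng.gammavariate(0.5 * dof, 2.0)


def poisson(mean, rng):
    """Poisson draw: number of unit-rate exponential arrivals before time `mean`."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def simulate_cir_exact(r0, alpha, b, sigma, times, rng):
    """Exact simulation of the model (14.1) on the grid t_0 < t_1 < ... < t_n."""
    d = 4.0 * b * alpha / sigma ** 2
    rates = [r0]
    for t_prev, t_next in zip(times[:-1], times[1:]):
        decay = math.exp(-alpha * (t_next - t_prev))
        c = sigma ** 2 * (1.0 - decay) / (4.0 * alpha)
        lam = rates[-1] * decay / c
        if d > 1.0:                                    # Case 1
            z = rng.gauss(0.0, 1.0)
            x = chi_square(d - 1.0, rng)
            r_next = c * ((z + math.sqrt(lam)) ** 2 + x)
        else:                                          # Case 2
            n = poisson(lam / 2.0, rng)
            x = chi_square(d + 2.0 * n, rng)
            r_next = c * x
        rates.append(r_next)
    return rates


if __name__ == "__main__":
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(simulate_cir_exact(0.03, 0.5, 0.04, 0.1, grid, random.Random(7)))
```

A quick sanity check is that the sample mean of the simulated $r_T$ is close to $r_0 e^{-\alpha T} + b(1 - e^{-\alpha T})$, the known conditional mean of this model.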
Illustration 14.5.2: Portfolio Optimization
Let us consider a portfolio consisting of two stocks A and B. Let $S_A(0)$ and $S_B(0)$ respectively denote the prices of these stocks at time t = 0. The prices of these stocks at time t are denoted by $S_A(t)$ and $S_B(t)$. At time t = 0, the portfolio value V(0) is given by
$$V(0) = n_A S_A(0) + n_B S_B(0),$$
where the portfolio consists of $n_A$ units of stock A and $n_B$ units of stock B. Also let us suppose $S_A(t)$ and $S_B(t)$ follow geometric Brownian motions with drift $\mu_A$ and volatility $\sigma_A$, and drift $\mu_B$ and volatility $\sigma_B$, respectively. Further, let $S_A(t)$ and $S_B(t)$ be independent. We are interested in estimating the probability that the value of the portfolio drops by more than 15% by time T using Monte Carlo simulation.
The desired probability is $\theta = P\left(\dfrac{V(T)}{V(0)} \le 0.85\right)$. The steps for estimating θ are as follows.
For i = 1 to N
Generate $X_i = \left(S_A^{(i)}(T),\, S_B^{(i)}(T)\right)$.
Compute
$$I(X_i) = \begin{cases} 1, & \text{if } \dfrac{n_A S_A^{(i)}(T) + n_B S_B^{(i)}(T)}{n_A S_A(0) + n_B S_B(0)} \le 0.85, \\ 0, & \text{otherwise.} \end{cases}$$
end.
Set $\hat{\theta}_N = \dfrac{1}{N}\left(I(X_1) + I(X_2) + \ldots + I(X_N)\right)$.
The estimator $\hat{\theta}_N$ is an unbiased and consistent estimator, since $E(\hat{\theta}_N) = \theta$ and $\hat{\theta}_N \to \theta$ with probability 1 as $N \to \infty$.
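The estimation steps above can be sketched in Python as follows. The function names, the argument layout and the parameter values in the usage lines are illustrative assumptions; each terminal price is drawn from the exact lognormal law of a geometric Brownian motion.

```python
import math
import random


def terminal_gbm(s0, mu, sigma, t, rng):
    """One draw of S(T) for a geometric Brownian motion with drift mu and volatility sigma."""
    z = rng.gauss(0.0, 1.0)
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)


def portfolio_drop_probability(stock_a, stock_b, t, threshold, n_sims, rng):
    """Estimate P(V(T)/V(0) <= threshold); each stock is (units, s0, mu, sigma)."""
    n_a, s_a0, mu_a, sigma_a = stock_a
    n_b, s_b0, mu_b, sigma_b = stock_b
    v0 = n_a * s_a0 + n_b * s_b0
    hits = 0
    for _ in range(n_sims):
        s_a = terminal_gbm(s_a0, mu_a, sigma_a, t, rng)   # independent draws for A and B
        s_b = terminal_gbm(s_b0, mu_b, sigma_b, t, rng)
        if (n_a * s_a + n_b * s_b) / v0 <= threshold:     # indicator I(X_i)
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    a = (10, 50.0, 0.08, 0.3)
    b = (20, 30.0, 0.05, 0.2)
    print(portfolio_drop_probability(a, b, 1.0, 0.85, 50000, random.Random(9)))
```

When the portfolio holds only one stock, the estimate can be checked against the closed form $\Phi\big((\ln 0.85 - (\mu - \sigma^2/2)T)/(\sigma\sqrt{T})\big)$.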
14.7 Exercises
Exercise 14.1 Consider the linear congruential generator $x_{n+1} = (A x_n + C) \bmod M$ with M = 64. Generate the pseudo-random numbers with A = 4, C = 1 and $x_0 = 2$. Determine the period of the generator starting with seed $x_0 = 2$ and with seed $x_0 = 3$.
Exercise 14.2 Consider the uniformly distributed random variable X on the interval [0, 1]. Find a function of X which is uniformly distributed on the interval [0, 2]. Generate a sequence of random numbers uniformly distributed on the interval [0, 2].
Exercise 14.3 Using inverse transform method, develop an algorithm for simu-
lation of Poisson distribution with parameter λ.
, 1
e−x dx, by the crude Monte Carlo method. Also, eval-
4
Exercise 14.4 Evaluate
0
uate it by acceptance-rejection method.
Exercise 14.5 Consider a multivariate normal random vector $X = (X_1, X_2, \ldots, X_n)$ having mean vector $(\mu_1, \mu_2, \ldots, \mu_n)$ and covariance matrix Σ of order n × n. We wish to generate this n-dimensional random vector. The procedure involves a decomposition of Σ into factors such that $A^{\top} A = \Sigma$. Suppose $Z = (Z_1, Z_2, \ldots, Z_n)$ is a vector of independent standard normal random variables; the required n-dimensional random vector is given by $X = (\mu_1, \mu_2, \ldots, \mu_n) + ZA$.
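Exercise 14.6 Consider a Vasicek interest rate model
$$dr_t = \alpha(b - r_t)\,dt + \sigma\,dW_t,$$
where α, b and σ are positive constants. Prove that the algorithm for the exact simulation at times $0 = t_0 < t_1 < \ldots < t_n = T$ is given by
$$r_{t_{i+1}} = e^{-\alpha(t_{i+1}-t_i)}\, r_{t_i} + b\left(1 - e^{-\alpha(t_{i+1}-t_i)}\right) + \sigma\sqrt{\frac{1}{2\alpha}\left(1 - e^{-2\alpha(t_{i+1}-t_i)}\right)}\; Z_{i+1},$$
where the $Z_i$ are independent standard normal samples.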
(Hint: refer to Chapter 11.)
15
MATLAB Codes for Selected Problems in Finance
15.1 Introduction
MATLAB has a very useful toolbox, namely Financial Toolbox, which contains
many standard programs of finance. The aim of presenting the MATLAB codes
in this book is to encourage readers to attempt writing codes for similar problems
in financial mathematics, taking these codes as building blocks.
The symbol % in MATLAB code marks the start of a comment; comments are added to the codes to facilitate understanding of the programs.
S0 = input('S0 = ');
u = input('u = ');
d = input('d = ');
T = input('T = ');
k = input('k = '); % Strike price
r = input('r = ');
R = 1 + r;
p = (R-d)/(u-d);
BinomialLattice = zeros(T,T);
r = 0; % power of d in the binomial lattice
for i = T:-1:1 % row generation
    r = 1;
    % i+j = T+1 => j = T+1-i % constructing a lower triangle
    for j = T+1-i:T % column generation
        BinomialLattice(i,j) = S0*u^(T-i)*d^(r-1);
        r = r+1;
    end
end
disp('Binomial lattice for stock movement ');
disp(BinomialLattice);
                        c0(i,j) = max(BinomialLattice(i,j) - k, ...
                            (1/R)*(p*c0(i-1,j+1)+(1-p)*c0(i,j+1)));
                    end
                end
            end
            disp('American call option binomial lattice mm');
            disp(c0);
        else
            p0 = zeros(T,T);
            p0(:,T) = max(k - BinomialLattice(:,T), 0);
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1
                        % an up move leads to (i-1,j+1), a down move to (i,j+1)
                        p0(i,j) = max(k - BinomialLattice(i,j), ...
                            (1/R)*(p*p0(i-1,j+1)+(1-p)*p0(i,j+1)));
                    end
                end
            end
            disp('American put option binomial lattice ');
            disp(p0);
        end
        break;
    case 2
        fprintf('\n 1 : for call option \n 2 : for put option ');
        choice_p_c = input('your choice ');
        c0 = zeros(T,T);
        c0(:,T) = max(BinomialLattice(:,T) - k, 0);
        if choice_p_c == 1
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1 % construct lower triangle
                        c0(i,j) = (1/R)*(p*c0(i-1,j+1)+(1-p)*c0(i,j+1));
                    end
                end
            end
            disp('European call option binomial lattice ');
            disp(c0);
        else
            p0 = zeros(T,T);
            p0(:,T) = max(k - BinomialLattice(:,T), 0);
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1
                        % an up move leads to (i-1,j+1), a down move to (i,j+1)
                        p0(i,j) = (1/R)*(p*p0(i-1,j+1)+(1-p)*p0(i,j+1));
                    end
                end
            end
            disp('European put option binomial lattice ');
            disp(p0);
        end
        break;
    otherwise
        disp('Unknown Choice.');
end
S0 = input('S0 = ');
T = input('T = '); % Expiration time
n = input('n = '); % no of periods
k = input('k = '); % Strike price
r = input('r = ');
sigma = input('Sigma = '); % volatility of the stock price.
deltat = T/n;
R = exp(r*deltat);
u = exp(sigma*sqrt(deltat));
d = 1/u;
p = (R-d)/(u-d); % risk neutral probability measure.
BinomialLattice = zeros(T,T);
r = 0; % power of d in the binomial lattice
for i = T:-1:1 % row generation
    r = 1;
    % i+j = T+1 => j = T+1-i % create lower triangle display
    for j = T+1-i:T % column generation
        BinomialLattice(i,j) = S0*u^(T-i)*d^(r-1);
        r = r+1;
    end
end
disp('Binomial lattice for stock movement ');
disp(BinomialLattice);
    case 1
        fprintf('\n 1 : for call option \n 2 : for put option ');
        choice_p_c = input('your choice ');
        c0 = zeros(T,T);
        c0(:,T) = max(BinomialLattice(:,T) - k, 0);
        if choice_p_c == 1
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1 % create lower triangle
                        % an up move (probability p) leads to (i-1,j+1),
                        % a down move (probability 1-p) leads to (i,j+1)
                        c0(i,j) = max(BinomialLattice(i,j) - k, ...
                            (1/R)*(p*c0(i-1,j+1)+(1-p)*c0(i,j+1)));
                    end
                end
            end
            disp('American call option binomial lattice mm');
            disp(c0);
        else
            p0 = zeros(T,T);
            p0(:,T) = max(k - BinomialLattice(:,T), 0);
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1
                        p0(i,j) = max(k - BinomialLattice(i,j), ...
                            (1/R)*(p*p0(i-1,j+1)+(1-p)*p0(i,j+1)));
                    end
                end
            end
            disp('American put option binomial lattice ');
            disp(p0);
        end
        break;
    case 2
        fprintf('\n 1 : for call option \n 2 : for put option ');
        choice_p_c = input('your choice ');
        c0 = zeros(T,T);
        c0(:,T) = max(BinomialLattice(:,T) - k, 0);
        if choice_p_c == 1
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1 % To make lower triangle
                        c0(i,j) = (1/R)*(p*c0(i-1,j+1)+(1-p)*c0(i,j+1));
                    end
                end
            end
            disp('European call option binomial lattice ');
            disp(c0);
        else
            p0 = zeros(T,T);
            p0(:,T) = max(k - BinomialLattice(:,T), 0);
            for j = T-1:-1:1 % j = column traversal
                for i = T:-1:2 % i = row traversal
                    if i+j >= T+1
                        p0(i,j) = (1/R)*(p*p0(i-1,j+1)+(1-p)*p0(i,j+1));
                    end
                end
            end
            disp('European put option binomial lattice ');
            disp(p0);
        end
        break;
    otherwise
        disp('Unknown Choice.');
end
CH = 1;
while CH == 1
        if ch == 2 % put option
            c0 = s0 * normcdf(d1) - k * exp(-r*T) * normcdf(d2);
            p0 = c0 - s0 + k * exp(-r*T);
            fprintf('\n Value of put option = %4.4f\n',p0);
        end
        if ch ~= 1 && ch ~= 2
            disp('Invalid choice');
        end
    end
    CH = input('1-continuing option pricing else any other key: ');
end
wmin = pC*e/(e'*pC*e);
port = zeros(10,1);
effr = zeros(11,1); % one entry per value of lambda
effsigma = zeros(11,1);
for i = 1:11
    lambda = (i-1)/10;
    port = (1-lambda)*wmin + (lambda*w2);
    effr(i,1) = meanR * port;
    effsigma(i,1) = sqrt(port'*C*port); % portfolio standard deviation
end
plot(effsigma, effr);
xlabel('Sigma');
ylabel('Return');
% Set-up
if ~exist('quadprog')
    msgbox('The optimization toolbox is required to run this demo.', 'Product dependency')
    return
end
rand('state', 0)
weights = rand(1000, n);
[portRisk,portReturn] = portstats(returns,covariances,weights);
hold on
plot(portRisk, portReturn, '.r')
title('Mean-Variance efficient frontier for random portfolios')
hold off
% MR = market return,
% it can be taken from historical data of any market.
% Data collection/generation
C = zeros(20,1);
for i = 1:20
    MERR = mean(MR .* R(:,i));
    C(i,1) = MERR - MMR * mean(R(:,i)); % covariance of asset i with the market
end
VMR = var(MR, 1); % normalized by N, consistent with the covariance above
beta = zeros(20,1);
for i = 1:20
    beta(i) = C(i,1)/VMR;
end
for i = 1:length(beta)
    r(i) = rf + beta(i)*(MMR-rf);
end
for i = 1:20
    fprintf('\t a%d \t %f \t\t %f\n', i, beta(i), r(i));
end
% The Geometric_brownian(N,r,sigma,T)
% simulates a Geometric Brownian motion on [0,T].
W = [0; cumsum(randn(N,1))]/sqrt(N);
% cumsum function is running sum of normal N(0,1/N) variables.
t = t * T;
W = W * sqrt(T);
X = (r-(sigma^2)/2)*t + sigma*W;
Y = exp(X);
plot(t,Y);
title(['Sample Path of Geometric Brownian motion with diffusion coefficient = ', ...
    num2str(sigma), ', Interest rate = ', num2str(r)])
xlabel('Time')
IntLattice = zeros(T,T);
p = zeros(T+1,T+1);
x = 1; % unit face value; converted to a percentage (x100) later
p(:,T+1) = x;
for j = T:-1:1 % j = column traversal
    for i = T+1:-1:2 % i = row traversal
        if i+j >= T+2 % To make a lower triangle
            p(i,j) = (1.0/(1 + IntLattice(i-1,j)))* ...
                (0.5*p(i-1,j+1)+0.5*p(i,j+1));
        end
    end
end
C = zeros(T,T);
for i = T+1:-1:2
    C(i-1,T) = max(p(i,T)*100-k, 0);
end
for j = T-1:-1:1 % j = column traversal
    for i = T:-1:2 % i = row traversal
        if i+j >= T+1 % To make a lower triangle
            C(i,j) = (1.0/(1 + IntLattice(i,j)))* ...
                (0.5*C(i,j+1)+0.5*C(i-1,j+1));
        end
    end
end
P = zeros(T,T);
for i = T+1:-1:2
    P(i-1,T) = max(k - p(i,T)*100, 0);
end
482 Financial Mathematics: An Introduction
13. F. Black, E. Derman and W. Toy, A one-factor model of interest rates and its
application to treasury bond options, Financial Analysts, Vol. 22, pp.33–39,
1990.
14. F. Black and P. Karasinski, Bond and option pricing when short rates are
lognormal, Financial Analysts, Vol. 47, pp. 52–59, 1991.
15. F. Black and M. Scholes, The pricing of options and corporate liabilities,
Journal of Political Economy, Vol. 81, pp. 637–659, 1973.
16. M. E. Blume and D. B. Kein, The valuation of callable bonds, Wharton Fi-
nancial Institution Center,
http://finance.wharton.upenn.edu/∼rlwctr/papers/8914.PDF
17. E. Bouyé, V. Durrleman, A. Nikeghbali, G. Riboulet and T. Roncalli, Copulas
for finance: A reading guide and some applications, 2000.
http://thierry-roncalli.com/download/copula-survey.pdf
18. E. Bouyé, V. Durrleman, A. Nikeghbali, G. Riboulet and T. Roncalli, Copulas:
an open field for risk management, 2003.
http://www.thierry-roncalli.com/download/copula-rm.pdf
19. P. Boyle, Options: a Monte Carlo approach, Journal of Financial Economics,
Vol. 4, pp. 323–338, 1977.
20. P. P. Boyle and T. Vorst, Option replication in discrete time with transaction
costs, Journal of Finance, Vol. 47, pp. 271-293, 1992.
21. D. Brigo and F. Mercurio, Interest Rate Models - Theory and Practice: With
Smile, Inflation and Credit, Springer Finance, 2nd edition, 2006.
22. N. Bucay and D. Rosen, Credit risk of an international bond portfolio: a case
study, ALGO Research Quarterly, Vol. 2(1), pp. 9–29, 1999.
23. X. Cai, K. -L. Teo, X. Yang and X. Y. Zhou, Portfolio optimization under a
minimax rule, Management Science, Vol. 46, pp. 957–972, 2000.
24. A. J. G. Cairns, Interest Rate Models, Princeton University Press, 2004.
25. M. Capinski and T. Zastawniak, Mathematics for Finance: An Introduction
to Financial Engineering, 2nd edition, Springer 2010.
26. M. M. Chawla, On the bionomial tree method for the valuation of options,
International Journal of Applied Mathematics, Vol. 20(2), pp. 235–251, 2007.
27. M. M. Chawla, On mean variance portfolio optimization, International Jour-
nal of Applied Mathematics, Vol. 21(3), pp. 473–494, 2008.
28. Y.-W. Chen and C.-J. Lin, Combining SVMs with various feature selec-
tion strategies, 2005. Available from http://www.csie.ntu.edu.tw/ cjlin/ pa-
pers/features.pdf.
29. U. Cherubini, E. Luciano and W. Vecchiato, Copula Methods in Finance,
John Wiley & Sons, 2004.
References 487
http://www.elan.com.mx/biblioteca/FINANCE%20MATERIAL%
20CM/Wiley%20Copula%20Methods%20in%20Finance.pdf.
30. V. K. Chopra and W. T. Ziembia, The effect of errors in means, variances,
and co-variances on optimal portfolio choices, Worldwide Asset and Liability
Modeling (W. T. Ziembia and J. M. Mulvey editors), Cambridge University
Press, 1998.
31. N. A. Chriss, Black-Scholes and Beyond Option Pricing Models, McGraw Hill,
1997.
32. G. Cornuejols and R. Tütüncü, Optimization Methods in Finance, Cambridge
University Press, 2007.
33. J. C. Cox, S. A. Ross and M. Rubinstein, Option pricing: a simplified approach,
Journal of Financial Economics, Vol. 7(3), pp. 229-263, 1979.
34. J. C. Cox, J. E. Ingersoll and S. A. Ross, A theory of the term structure of
interest rates, Econometrica, Vol. 53, pp. 385–407, 1985.
35. R. Deaves and M. Parlar, A generalized bootstrap method
to determine the yield curve, Macmaster University, 1999.
http://www.stats.uwo.ca/events/Colloquium/W09/pdf/Rodrigo.pdf
36. X. -T. Deng, Z. -F. Li and S. -Y. Wang, A minimax portfolio selection strategy
with equilibrium, European Journal of Operational Research, Vol. 166, pp.
278–292, 2005.
37. D. Dentcheva and A. Ruszczynski, Portfolio optimization with stochastic dom-
inance constraints, Journal of Banking and Finance, Vol. 30, pp. 438–451,
2006.
38. E. Derman, I. Kani, and N. A. Chriss, Goldman Sachs–quantitative strategies
research notes, implied trinomial trees of the volatility smile, 1996.
http:// www.ederman.com/new/docs/gs-implied-trinomial-trees.pdf.
39. D. Duffie and J. Pan. An overview of Value-at-Risk, Journal of Derivatives,
Vol. 4, pp. 7–49, 1997.
40. P. Eizek and K. Komorad, Implied trinomial trees, SFB 649 Discussion Paper
2005-007,
http://edoc.hu-berlin.de/series/sfb-649-papers/2005-7/PDF/7.pdf.
41. E. J. Elton, M. J. Gruber, S. J. Brown and W. Goetzmann, Modern Portfolio
Theory Investment Analysis, 8th edition, John Wiley & Sons, 2009.
42. P. Embrechts, A.J. McNeil and D. Straumann, Correlation and dependency
in risk management: properties and pitfalls, Risk Management: Value at Risk
and Beyond ( M. Dempster and H.K. Moffatt editors), Cambridge University
Press, 2000.
43. T. W. Epps, Pricing Derivative Securities, World Scientific Publishing, 2000.
488 Financial Mathematics: An Introduction
44. C.I. Fábián, G. Mitra and D. Roman, Processing second order stochastic dom-
inance models using cutting plane representations, Mathathematical Program-
ming (Ser A), Vol. 130(1), pp. 33–57, 2011.
45. A. Fensterstock, Credit scoring and the next step, Business Credit, Vol. 107(3),
pp. 46–49, 2005.
46. D. Filipović, Interest Rate Models, http://www.mathematik.uni-
muenchen.de/∼filipo/ZINSMODELLE/zinsmodelle1.pdf
47. P. C. Fishburn, Utility Theory for Decision Making, John Wiley & Sons, 1970.
http://www.dtic.mil/cgi-bin/GetTRDoc?AD=AD0708563.
48. M. Garman and S. Kohlhagen, Foreign currency option values, Journal of
International Money and Finance, Vol. 2, pp. 231–237, 1983.
49. P. Glasserman, Monte Carlo methods in Financial Engineering, Springer,
2003.
50. P. Glasserman, P. Heidelberger and P. Shahabuddin, Variance reduction tech-
niques for estimating Value-at-Risk, Management Science, Vol. 46(10), pp.
1349–1364, 2000.
51. C. Goffman, Introduction to Real Analysis, A Harper International Edition,
1966.
52. A. J. Goldman and A. W. Tucker, Linear Equalities and Related Systems,
Priceton University Press, pp. 53–97, 1956.
53. M. Grigoriu, Stochastic Calculus: Applications in Science and Engineering,
Birkhäuser, 2002.
54. L. Güntay, N.R. Prabhala and H. Unal, Callable bonds
and hedging, Wharton Financial Institution Center,
http://fic.wharton.upenn.edu/fic/papers/02/0213.pdf
55. N. Gupta and A. Chaudhary, Optimal trading strategies, M.Tech. Disserta-
tion, Department of Mathematics, IIT Delhi, 2009.
56. J. H. Halton, On the efficiency of certain quasi-random sequences of points in
evaluating multi-dimensional integrals, Numerical Mathematics, Vol. 2, 84–90,
1960.
57. H. Han and X. Wu, A fast numerical method for the Black-Scholes equation
of American options SIAM Journal on Numerical Analysis, Vol. 41, pp. 2081–
2095, 2003.
58. D. J. Hand, Modelling consumer credit risk, IMA Journal of Management
Mathematics, Vol. 12(1), pp. 139–155, 2001.
59. L. Haris, Trading and Exchanges Market Microstructure for Practitioners, Ox-
ford University Press, 2003.
References 489
60. M. Haugh, Continuous-time short rate models, Term structure models: IEOR
E4710, 2010. http://www.columbia.edu/∼mh2078/cts shortrate models.pdf
61. D. Heath, R. Jarrow and A. Morton, Bond pricing and the term structure of
interest rates: a new method for contingent claim valuation, Econometrica,
Vol. 60, pp. 77–105, 1992.
62. S. Heston, A closed form solution for options with stochastic volatality and
applications to bond and currency options, Review of Financial Studies, Vol.
6, pp. 327–343, 1993.
63. C. W. Hsu, C. C. Chang and C. J. Lin, A practical guide to support vector
classification, 2003.
http://www.csie.ntu.edu.tw/∼cjlin/papers/guide/guide.pdf.
64. G. Huberman and W. Stanzl, Optimal liquidity trading, Review of Finance,
Vol. 9, pp. 165-200, 2005.
65. J. C. Hull, Options, Futures and Other Derivatives, 8th edition, Prentice Hall,
2012.
66. J. Hull and A. White, Pricing interest rate derivative securities, Review of
Financial Studies, Vol. 3, pp. 573–592, 1990.
67. P. J. Hunt and J. E. Kennedy, Financial Derivatives in Theory and Practice,
John Wiley & Sons, 2004.
68. R. Jarrow and A. Rudd, Option Pricing, Irwin Professional Publishers, 1983.
69. P. Jäckel, Monte Carlo Methods in Finance, John Wiley & Sons, 2002.
70. N. Ju and Zhong, An approximate formula for pricing American options.
http://www.rhsmith.umd.edu/faculty/nju/JZ99JD.pdf
71. T. Joro and P. Na, Portfolio perference evaluation in mean-variance-skewness
framework, European Journal of Operation Research, Vol. 175, pp. 516–542,
2006.
72. O. Josef, Models of interest rate evolution-Vasicek and CIR models, Journal
of Applied Mathematics, Vol. 2(3), pp. 117–125, 2009.
73. L. Kaelbling, M. Littman and A. Moore, Reinforcement learning: a survey,
Journal of Artificial Intelligence Research, Vol. 4, pp. 237–285, 1996.
74. P. Kall, Stochastic Linear Programming, Springer, 1976.
75. I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, 2nd
edition, Springer, 1991.
76. R. Khemchandani, N. Gupta, A. Chaudhary and S. Chandra, Optimal exe-
cution with weighted impact functions: a quadratic programming approach,
Optimization Letters, 2012. DOI 10.1007/s11590-012-0441-4
77. R. Kissell and M. Glantz, Optimal trading strategies: quantitative approaches
for managing market impact and trading risk, AMACON, 2003.
490 Financial Mathematics: An Introduction
2011. http://home.datacomm.ch/paulsoderlind/Courses/OldCourses/
Fin2MiQEFAll.pdf
128. S. Sriboonchitta, W. K. Wong, S. Dhompongsa and H. T. Nguyen, Stochastic
Dominance and Applications to Finance, Risk and Economics, CRC Press,
2009.
129. M. C. Steinbach, Markowitz revisted: mean variance models in financial port-
folio analysis SIAM Review, Vol. 43(1), pp. 31–85, 2001.
130. R. E. Steuer, Y. Qi and M. Hirschberger, Portfolio optimization: new ca-
pabilities and future methods, Zeitschrift fr Betriebswirtschaft, Vol. 76, pp.
199–219, 2006.
131. W. Sun, A. C. Fan, L. -W. Chen, T. Schouwenaars and M. A. Albota, Op-
timal Rebalancing Strategy Using Dynamic Programming for Institutional
Portfolios, Available at SSRN 2004.
132. R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT
Press, Cambridge, 1998.
133. California Debt and Investment Advisory Commission, Understanding inter-
est rate swap math pricing, 2007.
http://www.treasurer.ca.gov/cdiac/publications/math.pdf
134. L. C. Thomas, A survey of credit and behavioral scoring forecasting financial
risk of lending to consumers, International Journal of Forecasting, Vol. 16, pp.
163–167, 2000.
135. L. C Thomas, D. Edelman and J. N. Crook, Credit Scoring and its Appli-
cation, SIAM Monograph on Mathematical Modelling and its Computation,
2002.
136. Y. Tian, A modified lattice approach to option pricing, Journal of Futures
Markets, Vol. 13(5), pp. 563–577, 1993.
137. H. Timmons, The cracks in credit scoring: as loan defaults rise, worries grow
about how creditors pick borrowers, Business Week (November) 25, 2002.
138. R. S. Tsay, Analysis of Financial Time Series, John Wiley & Sons, 2005.
139. R. H. Tütüncü and M. Koenig, Robust asset allocation, Annals of Operation
Research, Vol. 132, pp. 157–87, 2004.
140. S. Uryasev and R.T. Rockafellar, Optimization of conditional Value-at-Risk,
Research Report 99-4, ISE Dept., University of Florida, 1999.
141. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
142. J. R. Varma, Bond valuation and the pricing of interest rate options in India,
The ICFAI Journal of Applied Finance, Vol. 2(2), pp. 161–176, 1996.
143. O. A. Vasicek, An equilibrium characterization of the term structure, Journal
of Financial Economics, Vol. 15, pp. 177–188, 1977.
494 Financial Mathematics: An Introduction
144. P. -C. G. Vassiliou, Applied Stochastic Finance, Vol. 2, John Wiley & Sons,
2012.
145. F. de Weert, Exotic Options Trading, John Wiley & Sons, 2008.
146. G. West, Interest Rate Derivatives: Lecture Notes, 2010.
http://www.finmod.co.za/ird.pdf.
147. P. Wilmott, Introduces Quantitative Finance, John Wiley & Sons, 2007.
148. L. Y. Yu, X. D. Ji and S. Y. Wang, Stochastic programming models in
financial optimization: a survey, Advance Modeling and Optimization, Vol. 5,
pp. 1-26, 2003.
149. R. Zagst, Interest-Rate Management, Springer, 2002.
150. X. Y. Zhou and D. Li, Continuous time mean- variance selection: a stochastic
LQ framework, Applied Mathematics and Optimization, Vol. 42, pp. 19–33,
2000.
Index