
B10b Mathematical Models of Financial Derivatives Hilary Term 2012

Michael Monoyios, Mathematical Institute, University of Oxford. March 5, 2012

Useful books
The books by Shreve [13, 14] are excellent on, respectively, probabilistic aspects of the binomial model and on stochastic calculus for finance. Etheridge [5] is a good stochastic calculus primer for finance, Björk [1] covers many finance topics outside the scope of the course, Wilmott et al. [16] is good on the PDE aspects of the subject, and background on financial derivatives is given in Hull [8]. Jacod and Protter [9] and Grimmett and Stirzaker [6] are excellent for background probability material.

The lecture notes


These notes contain some background material, which is marked with an asterisk. In general, this will not be examinable. It is good to know the basic probability theory underlying conditional expectation and martingales, though this will not be directly examined. It is essential to know the properties of Brownian motion, particularly its quadratic variation, and to have an idea of how this leads to the Itô formula and to properties of the Itô integral, though you are not required to know the theory of the construction of the Itô integral.

Contents

I Introduction to derivative securities

1 Financial derivatives
  1.1 Underlying assets
  1.2 Interest rates and time value of money
  1.3 Forwards and futures
    1.3.1 Valuation of forward contracts
  1.4 Arbitrage
    1.4.1 Forward on dividend-paying stock
  1.5 Options
    1.5.1 European call and put price bounds
    1.5.2 Combinations of options
  1.6 Some history*

II The binomial model

2 Probability spaces
  2.1 Finite probability spaces
  2.2 Random variables
  2.3 Stochastic processes

3 The binomial stock price process
  3.1 Filtration as information flow in the binomial model*

4 Conditional expectation and martingales
  4.1 Conditional expectation
    4.1.1 Conditional expectation in the three-period binomial model
    4.1.2 Partial averaging
    4.1.3 Properties of conditional expectation
  4.2 Martingales

5 Equivalent measures*
  5.1 Radon-Nikodym martingales*
  5.2 Conditional expectation and the Radon-Nikodym theorem*

6 Arbitrage pricing in the binomial model
  6.1 Equivalent martingale measures and no arbitrage*
    6.1.1 Fundamental theorems of asset pricing*
  6.2 Pricing by replication in the binomial model
    6.2.1 Replication in a one-period binomial model
  6.3 Completeness of the multiperiod binomial model

7 American options
  7.1 Value of hedging portfolio for an American option*
  7.2 Stopping times*
  7.3 Properties of American derivative securities*

III Continuous time

8 Brownian motion
  8.1 Random walk*
  8.2 BM as scaled limit of symmetric random walk*
  8.3 Brownian motion
  8.4 Properties of BM
  8.5 Quadratic variation of BM
    8.5.1 First variation
    8.5.2 Quadratic variation of Brownian motion

9 The Itô integral
  9.1 Construction of the Itô integral*
  9.2 Itô integral of an elementary integrand*
  9.3 Properties of the Itô integral of an elementary process*
  9.4 Itô integral of a general integrand*
  9.5 Properties of the general Itô integral

10 The Itô formula
  10.1 Itô's formula for one Brownian motion
  10.2 Itô's formula for Itô processes
  10.3 Stochastic differential equations
  10.4 Multidimensional Brownian motion
    10.4.1 Cross-variations of Brownian motions
  10.5 Two-dimensional Itô formula
  10.6 Multidimensional Itô formula
    10.6.1 Multidimensional Itô process
  10.7 Extensions*
  10.8 Connections with PDEs: the Feynman-Kac theorem
  10.9 The Girsanov theorem*

11 The Black-Scholes-Merton model
  11.1 Portfolio wealth evolution
  11.2 Perfect hedging
    11.2.1 Riskless portfolio argument
  11.3 Solution of the BSM equation
  11.4 BS option pricing formulae
  11.5 Sensitivity parameters (Greeks)*
    11.5.1 Delta
    11.5.2 Theta
    11.5.3 Gamma
    11.5.4 Vega
  11.6 Probabilistic (martingale) interpretation of perfect hedging*
  11.7 Black-Scholes analysis for dividend-paying stock
  11.8 Time-dependent parameters

12 Options on futures contracts
  12.1 The mechanics of futures markets
  12.2 Options on futures contracts

13 American options in the BS model
  13.1 Smooth pasting condition for the American put
  13.2 Optimal stopping representation*
  13.3 The American call with no dividends

14 Exotic options
  14.1 Digital options
  14.2 Pay-later options
  14.3 Multi-stage options: compounds and choosers
    14.3.1 Multi-stage options
    14.3.2 Compound options
    14.3.3 Chooser options
  14.4 Barrier options
    14.4.1 PDE approach to valuing barrier options

Part I

Introduction to derivative securities


1 Financial derivatives
A European derivative security (or contingent claim) is a financial contract paying to its holder a random amount (called the payoff of the claim) at some future time T (called the maturity time of the derivative). An American derivative delivers the payoff at a random time τ ≤ T chosen by the holder of the contract. The payoff is (typically) contingent on the value of some other underlying security (or securities), or on the level of some non-traded reference index.

A basic example is a forward contract. The holder of such a contract agrees to buy an asset at some future time T for a fixed price K (the delivery price) that is decided at initiation of the contract. Hence, the forward contract has a value (to the holder) at maturity of S_T − K, where S_T is the underlying asset value at maturity.

The origins of derivatives lie in medieval agreements between farmers and merchants to trade the farmer's harvest at some future date, at a price set in advance. This allowed farmers to fix the selling price of their crop, and reduced the risk of having to sell at a lower price than their cost of production, which might happen in a bumper harvest year. This is one motivation for the existence of derivatives: they give random payoffs which can be used to eliminate uncertainty from future asset price trades. The act of removing uncertainty in finance is called hedging.

Consider a farmer whose unit cost of crop production is C. His profit on selling the crop would be S_T − C, where S_T is the market crop price at harvest time. If the farmer were to sell (that is, take a short position in) a forward contract with delivery price K > C, at some time t < T, then his overall payoff at T would be (S_T − C) − (S_T − K) = K − C > 0. The risk of the crop price being less than the cost of production has been removed.
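The farmer's hedge can be checked numerically: the short forward cancels the randomness in the crop price. A minimal sketch (the cost C, delivery price K, and sample harvest prices are illustrative assumptions, not values from the notes):

```python
# Farmer's hedge: sell the crop at S_T and hold a short forward with delivery price K.
# Total payoff at T is (S_T - C) - (S_T - K) = K - C, independent of S_T.

def hedged_profit(s_T: float, c: float, k: float) -> float:
    """Profit at maturity from selling the crop and being short one forward."""
    crop_profit = s_T - c           # sell crop at the market price, cost of production c
    forward_payoff = s_T - k        # payoff to the *holder* of the forward
    return crop_profit - forward_payoff  # short position: subtract the holder's payoff

# The hedged profit is K - C whatever the harvest price turns out to be.
for s_T in [5.0, 10.0, 20.0]:
    assert hedged_profit(s_T, c=8.0, k=9.0) == 9.0 - 8.0
```

The point of the sketch is that the random term S_T cancels algebraically, which is exactly the hedging idea formalised later in the course.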
Of course, since derivatives have random payoffs, they can also be used to take risk by speculating on the future values of asset prices, and they are often a cheaper device for doing so than investing in the underlying asset. For example, another classical example of a derivative is a European call option on a stock, with payoff (S_T − K)^+, where S_T is the (random) stock price at time T and K ≥ 0 is a constant called the strike price of the option. This allows the holder of the call to collect a positive payoff if the stock price is above K, and the cost of acquiring a call option is usually only a fraction of the cost of buying the stock itself.

This course will be about how to assign a value to a derivative at any time t ≤ T. This will involve modelling the randomness in the underlying asset price process S = (S_t)_{0≤t≤T}. To do this, we will need the notion of a stochastic process on a filtered probability space. We shall see that the key to valuing derivatives is to use the underlying asset to remove the risk from selling (or buying) the derivative. That is, derivative valuation is via a hedging argument.

1.1 Underlying assets

Typical assets which are traded in financial markets, and which can be the underlying assets for a derivative contract, include:

- shares (stocks),
- commodities (metals, oil, other physical products),
- currencies,
- bonds (assets used as borrowing tools by governments and companies), which pay fixed amounts at regular intervals to the bond holder.

An agent who holds an asset will be said to hold a long position in the asset, or to be long in the asset. An agent who has sold an asset will be said to hold a short position in the asset, or to be short in the asset.

For the most part in this course, we will focus on derivative securities which have a stock as underlying asset. The stock price will be a stochastic process denoted by S = (S_t)_{0≤t≤T} on a probability space (Ω, F, P). This means that for each t ∈ [0, T], S_t, the value of the stock at time t, is a random variable.

1.2 Interest rates and time value of money

Let us measure time in some convenient units, say years. If an interest rate r is quoted per annum with compounding frequency at time intervals Δt, this means that an amount A invested for a time period Δt will grow to A(1 + rΔt). If this is re-invested for another period Δt, the balance becomes A(1 + rΔt)^2, and so on. So after n periods, with t := nΔt, we have A(1 + rΔt)^n = A(1 + rt/n)^n. A continuously compounded interest rate corresponds to the limit n → ∞, or Δt → 0. In this case, after time t an amount A will grow to

lim_{n→∞} A(1 + rt/n)^n = A e^{rt}.

So an amount A invested at time zero for a time t will grow to an amount A e^{rt}, where r is the continuously compounded risk-free interest rate. We call A e^{rt} the future value of A invested at time zero, and the factor e^{rt} is called an accumulation factor. By the same token, receiving an amount A at time t is equivalent to receiving A e^{−rt} at time zero. We call A e^{−rt} the present value of A received at time t (we say that A is discounted to the present), and the factor e^{−rt} is called a discount factor.

It is usually convenient (but nothing more) to assume that interest is continuously compounded. We do not need to assert that, in reality, interest is continuously compounded, in order to use a continuously compounded interest rate in all our analysis. If the interest is actually compounded m times a year at an interest rate of R per annum, then we can still use a continuously compounded interest rate r simply by making the identification

A(1 + R/m)^{mt} = A exp(rt),    (1.1)

so that there is a one-to-one correspondence between the interest rate R (compounded m times per annum) and the continuously compounded interest rate r. In this course, we will nearly always use continuously compounded rates when considering continuous-time models.

A differential version of the above arguments is as follows. In continuous time, we model the time evolution of cash in a bank account in terms of a riskless asset which we shall call a money market account: its value at time t > 0 is the value of $1 invested at time zero and continuously earning interest which is reinvested. We shall denote the value of this asset at time t by B_t, which satisfies

dB_t = r B_t dt,  B_0 = 1,    (1.2)

where r is the (assumed constant) interest rate. Then the value of the bank account at time t is given by B_t = e^{rt}, which we recognise as the familiar accumulation factor we encountered above. For more advanced modelling we could assume that interest rates are time-varying (and possibly stochastic). In this case the money market account satisfies

dB_t = r_t B_t dt,  B_0 = 1,    (1.3)

where r_t is the instantaneous (or short-term) interest rate. We have allowed for this to be time-varying, and r_t represents the interest rate in the time interval [t, t + dt). From (1.3) we see that

B_t = exp( ∫_0^t r_u du ),    (1.4)

and this is the accumulation factor in this case. This is the factor by which $1 invested at time zero has grown by time t, when the interest generated is continually reinvested.
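The limiting argument for continuous compounding, and the rate conversion in (1.1), are easy to check numerically. A minimal sketch (the particular values of A, r, R and m are illustrative assumptions):

```python
import math

def compound(a: float, r: float, t: float, n: int) -> float:
    """Value of amount a after time t, compounded n times over [0, t] at annual rate r."""
    return a * (1.0 + r * t / n) ** n

A, r, t = 100.0, 0.05, 2.0
# As n grows, discrete compounding approaches the continuous limit A * exp(r*t).
assert abs(compound(A, r, t, 1_000_000) - A * math.exp(r * t)) < 1e-3

# Conversion (1.1): a rate R compounded m times per annum corresponds to a
# continuously compounded rate r_c via (1 + R/m)^m = exp(r_c),
# i.e. r_c = m * log(1 + R/m).
m, R = 12, 0.06
r_c = m * math.log(1.0 + R / m)
assert abs((1.0 + R / m) ** (m * t) - math.exp(r_c * t)) < 1e-12
```

The second assertion holds to floating-point precision because the identification (1.1) is exact once r_c is chosen this way; only the first involves a genuine limit.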

1.3 Forwards and futures

A forward contract is a contract which obliges the holder to buy an underlying asset at some future time T (the maturity time) for a price K (the delivery price) that is fixed at contract initiation. Hence, at time T, when the stock price is S_T, the contract is worth S_T − K (the payoff of the forward) to the holder. This payoff is shown in Figure 1.

Figure 1: Forward contract payoff as a function of the terminal asset price S_T

A futures contract is a rather specialised forward contract, traded on an organised exchange, and such that, if a contract is traded at some time t ≤ T, the delivery price is set to a special value F_{t,T}, called the futures price (or forward price) of the asset, chosen so that the value of the futures contract at initiation (that is, at time t) is zero.

1.3.1 Valuation of forward contracts

In what follows we value forward contracts on a non-dividend-paying stock, that is, an asset with price process S = (S_t)_{0≤t≤T} that pays no income to its holder.

Lemma 1.1. The value at time t ≤ T of a forward contract with delivery price K and maturity T, on an asset with price process S = (S_t)_{0≤t≤T}, is f_{t,T} ≡ f(t, S_t; T) ≡ f(t, S_t; K, T), given by

f_{t,T} = S_t − K exp(−r(T − t)),  0 ≤ t ≤ T.    (1.5)

Proof. This is a simple hedging argument which provides our first example of a riskless hedging strategy. Start with zero wealth at time t ≤ T, and sell the contract at time t for some price f_{t,T}. Hedge this sale by purchasing the asset for price S_t. This requires borrowing of S_t − f_{t,T}. At time T, sell the asset for price K under the terms of the forward contract, and require that this is enough to pay back the loan. Hence we must have K = (S_t − f_{t,T}) exp(r(T − t)), and the result follows.

Corollary 1.2. The forward price of the asset at time t ≤ T is given by F_{t,T} = S_t exp(r(T − t)), 0 ≤ t ≤ T.

Proof. Set f_{t,T} = 0 in Lemma 1.1, and then by definition we must have K = F_{t,T}.
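Lemma 1.1 and Corollary 1.2 translate directly into code. A minimal sketch (the function names and sample parameters are my own, not from the notes):

```python
import math

def forward_value(s_t: float, k: float, r: float, tau: float) -> float:
    """Forward value f_{t,T} = S_t - K e^{-r*tau}, with tau = T - t (Lemma 1.1)."""
    return s_t - k * math.exp(-r * tau)

def forward_price(s_t: float, r: float, tau: float) -> float:
    """Delivery price F_{t,T} = S_t e^{r*tau} making the contract worth zero (Corollary 1.2)."""
    return s_t * math.exp(r * tau)

s, r, tau = 100.0, 0.05, 0.5
# A forward struck at the forward price has zero value at initiation.
assert abs(forward_value(s, forward_price(s, r, tau), r, tau)) < 1e-12
```

Note the one-line relationship between the two functions: the forward price is just the root in K of the forward value, which is how Corollary 1.2 is proved.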

1.4 Arbitrage

The simple argument above for valuing a forward contract is an example of valuation by the principle of no arbitrage. If the relationship in Lemma 1.1 is violated, then an elementary example of a riskless profit opportunity, called an arbitrage, ensues. Here is a formal definition of arbitrage.

Definition 1.3 (Arbitrage). Let X = (X_t)_{0≤t≤T} denote the wealth process of a trading strategy. An arbitrage over [0, T] is a strategy satisfying X_0 = 0, P[X_T ≥ 0] = 1 and P[X_T > 0] > 0.

So an arbitrage is guaranteed not to lose money and has a positive probability of making a profit. If the valuation formula (1.5) for the forward contract is violated, an immediate arbitrage opportunity occurs, as we now illustrate. Suppose f_{t,T} > S_t − K exp(−r(T − t)). Then one can short the forward contract and buy the stock, by borrowing S_t − f_{t,T} at time t. At maturity, one sells the stock for K under the terms of the forward contract and uses the proceeds to pay back the loan, yielding a profit of K − (S_t − f_{t,T}) exp(r(T − t)) > 0. This is an arbitrage. A symmetrical argument applies if f_{t,T} < S_t − K exp(−r(T − t)) (and you should supply this). The principle of riskless hedging and no arbitrage will also apply, rather less trivially, to the valuation of options later in the course.

An equivalent way of looking at no arbitrage is sometimes called the law of one price: two portfolios which give the same payoff at T should have the same value at any time t ≤ T. Let us show how this applies to the valuation of a forward contract. Consider the following two portfolios at time t ≤ T:

- a long position in one forward contract,
- a long position in the stock plus a short cash position of K exp(−r(T − t)).

At time T, these are both worth S_T − K, so their values at time t ≤ T must be equal, yielding f_{t,T} = S_t − K exp(−r(T − t)), as before. Notice that the second portfolio perfectly replicates (or perfectly hedges) the payoff of the forward contract, meaning that it reproduces the payoff S_T − K. Denote the position in the stock that is needed to perfectly hedge a forward contract by H_t^f. Then we have H_t^f = 1 for all t ∈ [0, T], and note that

H_t^f = f_x(t, S_t) = 1,  0 ≤ t ≤ T,

where f(t, x) := x − K e^{−r(T−t)}. This is a simple example of a delta hedging rule, in which one differentiates the pricing function of the derivative with respect to the variable representing the underlying asset price, in order to obtain the hedging strategy. We will see a similar result when valuing options.

1.4.1 Forward on dividend-paying stock

The stock in the preceding analysis was assumed to pay no dividends. Now assume that the stock pays dividends as a continuous income stream with dividend yield q. This means that in the interval [t, t + dt), the income received by someone holding the stock will be q S_t dt. Suppose that at time t ∈ [0, T] an agent holds n_t shares of stock. The income received in the next infinitesimal time interval is q n_t S_t dt. If this is immediately re-invested in shares, the holding in shares satisfies dn_t = q n_t dt. Hence, if the initial holding is n_0 at time zero, we have

n_t = n_0 exp(qt),  0 ≤ t ≤ T.

In particular, we have

n_t = n_T exp(−q(T − t)),  0 ≤ t ≤ T.

This means that in order to hold one share of stock at time T, one may buy exp(−q(T − t)) shares at t ≤ T and re-invest the dividends in the stock. If we use this to value a forward contract on the dividend-paying stock we arrive at the following.

Lemma 1.4. The value at time t ≤ T of a forward contract with delivery price K and maturity T, on a stock with price process S = (S_t)_{0≤t≤T} paying dividends at a dividend yield q, is given by

f_{t,T} = S_t exp(−q(T − t)) − K exp(−r(T − t)),  0 ≤ t ≤ T.    (1.6)

Proof. This is again a hedging argument. Start with zero wealth at time t ≤ T, and sell the contract at time t for some price f_{t,T}. Hedge this sale by purchasing exp(−q(T − t)) shares at price S_t. This requires borrowing of exp(−q(T − t)) S_t − f_{t,T}. Re-invest all dividends immediately in the stock, so as to hold one share at time T. At time T, sell the asset for price K under the terms of the forward contract, and ensure that this is enough to pay back the loan. Hence we must have K = (S_t exp(−q(T − t)) − f_{t,T}) exp(r(T − t)), and the result follows.

Corollary 1.5. The forward price of the dividend-paying asset at time t ≤ T is given by F_{t,T} = S_t exp((r − q)(T − t)), 0 ≤ t ≤ T.

Proof. Set f_{t,T} = 0 in Lemma 1.4, and then by definition we must have K = F_{t,T}.

Remark 1.6 (Forwards and futures on currencies). A foreign currency is treated as an asset which pays a dividend yield equal to the foreign interest rate r_f. Hence, if S = (S_t)_{0≤t≤T} is the exchange rate (the value in dollars of one unit of foreign currency), then a forward contract on the foreign currency has value at time t ≤ T given by

f_{t,T} = S_t exp(−r_f(T − t)) − K exp(−r(T − t)),  0 ≤ t ≤ T,

where T is the maturity and K is the delivery price.
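Lemma 1.4, Corollary 1.5 and Remark 1.6 admit the same computational treatment, with the dividend yield q (or the foreign rate r_f) discounting the stock leg. A minimal sketch (names and parameter values are illustrative assumptions):

```python
import math

def forward_value_div(s_t: float, k: float, r: float, q: float, tau: float) -> float:
    """f_{t,T} = S_t e^{-q*tau} - K e^{-r*tau} for dividend yield q (Lemma 1.4)."""
    return s_t * math.exp(-q * tau) - k * math.exp(-r * tau)

def forward_price_div(s_t: float, r: float, q: float, tau: float) -> float:
    """F_{t,T} = S_t e^{(r-q)*tau} (Corollary 1.5)."""
    return s_t * math.exp((r - q) * tau)

s, r, q, tau = 50.0, 0.04, 0.02, 1.0
# A forward struck at the forward price is worth zero at initiation.
assert abs(forward_value_div(s, forward_price_div(s, r, q, tau), r, q, tau)) < 1e-12
# With q = 0 we recover the non-dividend case of Lemma 1.1.
assert forward_value_div(s, 40.0, r, 0.0, tau) == s - 40.0 * math.exp(-r * tau)
```

A currency forward, as in Remark 1.6, is the special case q = r_f of the same two functions.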

1.5 Options

An option is a contract that gives the holder the right but not the obligation to buy or sell an asset for some price that is defined in advance. The two most basic option types are a European call and a European put. A European call option on a stock is a contract that gives its holder the right (but not the obligation) to purchase the stock at some future time T (the maturity time) for a price K (the strike price or exercise price) that is fixed at contract initiation. If S = (S_t)_{0≤t≤T} denotes the underlying asset's price process, the payoff of a call option is (S_T − K)^+, as shown in Figure 2.

Figure 2: Call option payoff

Figure 3: Put option payoff

A European put option on a stock is a contract that entitles the holder to sell the underlying stock for a fixed price K, the strike price, at a future time T. If S = (S_t)_{0≤t≤T} denotes the underlying asset's price process, the payoff of a put option is (K − S_T)^+, as shown in Figure 3. The act of choosing to buy or sell the asset under the terms of the option contract is called exercising the option. Options which can be exercised at any time before the maturity date T are called American options, whilst European options can only be exercised at T. Hence, an American call (respectively, put) option allows the holder to buy (respectively, sell) the underlying stock for price K at any time before maturity.

Lemma 1.7 (Put-call parity). The European call and put prices c(t, S_t) and p(t, S_t) of options with the same strike K on a non-dividend-paying traded stock with price S_t at time t ∈ [0, T] are related by

c(t, S_t) − p(t, S_t) = S_t − K e^{−r(T−t)},  0 ≤ t ≤ T.

Proof. The payoffs of a call and put satisfy

c(T, S_T) − p(T, S_T) = (S_T − K)^+ − (K − S_T)^+ = S_T − K,

which shows the (obvious) fact that a long position in a call combined with a short position in a put is equivalent to a long position in a forward contract. Hence, their prices at t ≤ T must satisfy c(t, S_t) − p(t, S_t) = f_{t,T} = S_t − K e^{−r(T−t)}, 0 ≤ t ≤ T, where f_{t,T} is the value of a forward contract at t ≤ T.

Remark 1.8. The same argument applied to a dividend-paying stock yields

c(t, S_t) − p(t, S_t) = f_{t,T} = S_t e^{−q(T−t)} − K e^{−r(T−t)},  0 ≤ t ≤ T,

where q is the dividend yield.

1.5.1 European call and put price bounds

Put-call parity is a model-independent result. From it, we see that c(t, S_t) ≥ f(t, S_t). That is, a call option is always at least as valuable as a forward contract (an obvious fact).¹ A call option gives its holder the right to buy the underlying stock, which means that its value can never be greater than that of the stock, so c(t, S_t) ≤ S_t.² From this we deduce model-independent bounds on a European call option price on a non-dividend-paying stock:

S_t − K e^{−r(T−t)} ≤ c(t, S_t) ≤ S_t,  0 ≤ t ≤ T.

¹ Equivalently, comparing the payoff of a call with that of one share plus a short cash position of K e^{−r(T−t)}, we obtain S_T − K ≤ (S_T − K)^+, and therefore c(t, S_t) ≥ S_t − K e^{−r(T−t)}.
² Equivalently, comparing the payoffs of a share and a call, we have S_T ≥ (S_T − K)^+, and therefore c(t, S_t) ≤ S_t.

These bounds are shown in Figure 4.
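Put-call parity and these bounds are easy to verify numerically. A minimal sketch (the function name put_from_call and all parameter values are illustrative assumptions):

```python
import math

def put_from_call(c: float, s_t: float, k: float, r: float, tau: float) -> float:
    """Put price implied by put-call parity: p = c - S_t + K e^{-r*tau}."""
    return c - s_t + k * math.exp(-r * tau)

# Payoff identity behind the parity proof: (S_T - K)^+ - (K - S_T)^+ = S_T - K.
K = 10.0
for s_T in [0.0, 5.0, 10.0, 17.5]:
    assert max(s_T - K, 0.0) - max(K - s_T, 0.0) == s_T - K

# Example: a (hypothetical) call price of 10.0 implies the parity put price,
# and that call price respects the bounds S - K e^{-r*tau} <= c <= S.
s, r, tau = 100.0, 0.05, 1.0
p = put_from_call(10.0, s, 100.0, r, tau)
assert s - 100.0 * math.exp(-r * tau) <= 10.0 <= s
```

The loop is literally the algebra of Lemma 1.7's proof, checked pointwise on a few terminal prices.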


Bounds on Value of European Call 20

K=10, r=10%, sigma=25%, T=1year 15

10

call price, C

C=S 5

C=SK.exp(rT)

10

8 10 stock price, S

12

14

16

18

Figure 4: Bounds on European call option value In Figure 4 we have plotted the upper and lower bounds of a European call value (the dotted graphs) as well as the Black-Scholes value of the call (the solid graph). We will show in these lectures how this function arises. If the above call option pricing bounds are violated, then arbitrage opportunities arise. For example, if If c(t, St ) < St Ker(T t) one should buy the call and short the stock, which gives a cash amount St c(t, St ) to be invested in a bank account. At time T , we have two possibilities: 1. ST K, in which case the call is not exercised. The arbitrageur buys the stock in the market to close out the short position, using the proceeds from the bank account, which stand at (St c(t, St ))er(T t) prior to buying the stock. This leaves a prot of (St c(t, St ))er(T t) ST > (St c(t, St ))er(T t) K = er(T t) (St Ker(T t) c(t, St )) > 0. 2. ST > K, in which case the call is exercised. The arbitrageur buys the stock for K to close out the short position, using the proceeds from the bank account, which stand at (St c(t, St ))er(T t) prior to buying the stock. This leaves a prot of (St c(t, St ))er(T t) K = er(T t) (St Ker(T t) c(t, St )) > 0.
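The arbitrage argument can be checked numerically: whenever the lower bound is violated, the strategy's terminal profit is positive whatever S_T turns out to be. A minimal sketch (all parameter values are illustrative assumptions):

```python
import math

def arb_profit(s_T: float, s_t: float, c: float, K: float, r: float, tau: float) -> float:
    """Terminal profit from buying the call and shorting the stock at t,
    banking S_t - c, when the lower call bound is violated."""
    bank = (s_t - c) * math.exp(r * tau)     # banked proceeds grown at rate r
    cost_to_cover = K if s_T > K else s_T    # exercise the call, or buy in the market
    return bank - cost_to_cover

s_t, K, r, tau = 100.0, 95.0, 0.05, 1.0
c_bad = 8.0  # violates c >= S_t - K e^{-r*tau} (about 9.63 here)
assert c_bad < s_t - K * math.exp(-r * tau)
# The profit is strictly positive for terminal prices below, at, and above the strike.
for s_T in [80.0, 95.0, 120.0]:
    assert arb_profit(s_T, s_t, c_bad, K, r, tau) > 0
```

The two branches of cost_to_cover correspond exactly to cases 1 and 2 above.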


We can derive similar model-independent bounds on a put option price. A put option gives its holder the right to receive an amount K for the stock, so the most it can be worth at maturity is K (if the final stock price is S_T = 0). Hence, its current value can never be greater than the present value of K, so that

p(t, S_t) ≤ K e^{−r(T−t)},  0 ≤ t ≤ T.

Similarly, for the value of a put at expiry we have p(T, S_T) = (K − S_T)^+ ≥ K − S_T. That is, a put option is at least as valuable as a short position in a forward contract. Hence we have the lower bound

p(t, S_t) ≥ K e^{−r(T−t)} − S_t,  0 ≤ t ≤ T.

The results in this section are model-independent. To say more about option values we need a model for the dynamic evolution of a stock price. One of the simplest continuous-time models is the Black-Scholes-Merton (BSM) model, which we shall describe later, and one of the simplest discrete-time models is the binomial model, which we shall also see shortly.

1.5.2 Combinations of options

Options can be combined to give a variety of payoffs for different hedging purposes, or for speculation on movements in the underlying asset price, and they are often used to do so because the option premiums are relatively small in some cases, thus proving very attractive to gamblers. A straddle is a call and a put with the same strike and maturity. The payoff of a long position in a straddle is

(S_T − K)^+ + (K − S_T)^+ = K − S_T if S_T < K, and S_T − K if S_T ≥ K.    (1.7)

This payoff is illustrated in Figure 5.

Figure 5: Straddle payoff
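The straddle payoff (1.7) can be sketched directly from the call and put payoffs. A minimal sketch (the function names and strike value are my own):

```python
def call_payoff(s_T: float, k: float) -> float:
    return max(s_T - k, 0.0)

def put_payoff(s_T: float, k: float) -> float:
    return max(k - s_T, 0.0)

def straddle_payoff(s_T: float, k: float) -> float:
    """Long straddle: a call plus a put with the same strike and maturity, as in (1.7)."""
    return call_payoff(s_T, k) + put_payoff(s_T, k)

K = 10.0
# Piecewise form of (1.7): K - S_T below the strike, S_T - K above it.
assert straddle_payoff(4.0, K) == K - 4.0
assert straddle_payoff(16.0, K) == 16.0 - K
assert straddle_payoff(K, K) == 0.0
```

The payoff is large whenever S_T moves far from K in either direction, which is why a straddle is a bet on volatility rather than on direction.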

1.6 Some history*

As remarked earlier, the origins of derivatives lie in medieval agreements between farmers and merchants to insure farmers against low crop prices. In the 1860s the Chicago Board of Trade was founded to trade commodity futures (contracts that set trading prices of commodities in advance), formalising the act of hedging against future price changes of important products.

Options were first valued by Bachelier in 1900 in his PhD thesis, a translation of which can be found in the book by Davis and Etheridge [3]. Bachelier introduced a stochastic process now known as Brownian motion (BM) to model stock price movements in continuous time. Bachelier did this before a rigorous treatment of BM was available in mathematics. His work was decades ahead of its time, both mathematically and economically speaking, and was therefore not given the credit it deserved at the time. In the decades that followed, mathematicians and physicists (Einstein, Wiener, Lévy, Kolmogorov, Feller, to name but a few) developed a rigorous theory of Brownian motion, and Itô developed a rigorous theory of stochastic integration with respect to Brownian motion, leading to the notion of a stochastic calculus, which we shall encounter. In the 1960s, economists re-discovered Bachelier's work, and this was one of the ingredients that led to the modern theory of option valuation.

In the early 1970s a combination of forces existed which made markets more risky, derivatives more prominent, and their valuation and trading possible. The system of fixed exchange rates that existed before 1970 collapsed, and the Middle East oil crises caused a big increase in the volatility of financial prices. This increased the demand for risk management products such as options. At the same time Black and Scholes [2] and Merton [11] (BSM) published their seminal work on how to price options, based on managing the risk associated with selling such an asset. This breakthrough, for which Scholes and Merton received a Nobel Prize (Black having passed away in 1995), coincided with the opening of the Chicago Board Options Exchange (CBOE), giving individuals both a means to value option contracts and a marketplace where they could profit from this knowledge of the fair price.
Following on from this, the financial deregulation of the 1980s, allied to technological developments which made it possible to trade securities globally and to run large portfolios of complex products, caused a huge increase in risky trading across international borders. This opened up yet more risks across currencies, interest rates and equities, and financial institutions very skilfully (or opportunistically, perhaps) created markets to trade derivatives and to sell these products to customers. This has led to the massive increase in derivative trading that we now see, with the volume of derivative contracts traded now dwarfing that in the associated underlying assets. The papers of Black-Scholes [2] and Merton [11] attracted mathematicians to the subject, and led to a mathematically rigorous approach to valuing derivatives, based on probability and martingale theory, inspired by Harrison and Pliska [7]. This led directly to modern financial mathematics, and has also contributed to the advent of derivatives written on a plethora of underlying stochastic reference entities, such as interest rates, weather indices and default events, as well as on more traditional traded underlying securities such as stocks and currencies.

Part II

The binomial model


2 Probability spaces
We shall use a rigorous approach to probability theory, based on measure theory, to model the evolution of a stock price. The reason for taking this path is that it enables us to treat both discrete and continuous random variables with the same foundations, and enables a general definition of the concept of conditional expectation, and hence of a stochastic process known as a martingale. A good reference for this material is Shreve [13], which applies these ideas to finance, while Durrett [4], Jacod and Protter [9], and Williams [15] are good on the pure probability aspects. The first building block in modelling a probabilistic situation is a set, Ω, called the sample space, which represents the possible outcomes of a random experiment. The number of elements in Ω might be finite or infinite, but we shall start off with the simpler, finite case. The elements ω of Ω are called sample points.


2.1 Finite probability spaces

Let Ω be a set with finitely many elements. Let F be the set of all subsets of Ω.³

Definition 2.1. A probability measure P is a function mapping F into [0, 1] with the following properties:

1. P(Ω) = 1.

2. If A1, A2, . . . is a sequence of disjoint sets in F, then

    P( ∪_{k=1}^∞ Ak ) = Σ_{k=1}^∞ P(Ak).
The interpretation is that, for a set A ∈ F, there is a probability in [0, 1] that the outcome of a random experiment will lie in the set A. We think of P(A) as this probability. The set A ∈ F is called an event. For A ∈ F we define

    P(A) := Σ_{ω∈A} P(ω).    (2.1)
We can define P(A) in this way because A has only finitely many elements, and so only finitely many terms appear in the sum on the right-hand side of the above equation.

Example 2.2 (3-coin toss sample space). Let Ω be the finite set

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

the set of all possible outcomes of three coin tosses, where H stands for a head and T for a tail. Each sample point ω ∈ Ω is a sequence of length three; denote the tth element of ω by ωt, t ∈ {1, 2, 3}. Hence a general sample point is written as ω = (ω1ω2ω3). For example, when ω = HTH, ω1 = H, ω2 = T, ω3 = H. Let F be the collection of all subsets of Ω (F is a σ-algebra). Suppose the probability of H on each coin toss is p ∈ (0, 1); then the probability of T is q := 1 − p. For each ω = (ω1ω2ω3) ∈ Ω we define

    P(ω) := p^(number of H in ω) q^(number of T in ω).

Then for each A ∈ F we define P(A) according to (2.1).

Definition 2.3. Let Ω be a nonempty set. A σ-algebra is a collection, G, of subsets of Ω with the following three properties:

1. ∅ ∈ G,

2. if A ∈ G then A^c ∈ G,

3. if A1, A2, . . . is a sequence of sets in G, then ∪_{k=1}^∞ Ak is also in G.

We interpret σ-algebras as a record of information; we develop some intuition shortly. Given a set Ω, a σ-algebra F, and a probability measure P, the pair (Ω, F) is called a measurable space, and the triple (Ω, F, P) is called a probability space.

Definition 2.4. Let Ω be a nonempty finite set. A filtration is a sequence of σ-algebras F0, F1, . . . , FT, such that each σ-algebra in the sequence contains all the sets contained by the previous σ-algebra. Let T = {0, 1, . . . , T}. A probability space (Ω, F, P) equipped with a filtration F = (Ft)_{t∈T}, with each Ft ⊆ F, is called a filtered probability space.

Remark 2.5. When dealing with filtrations we shall usually assume that F0 is trivial, that is F0 = {∅, Ω}, so that A ∈ F0 implies P(A) ∈ {0, 1}.
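The measure of Example 2.2 is easy to check numerically. The sketch below (a minimal illustration; the names and the value p = 0.6 are my own, not from the notes) builds the 3-coin sample space, assigns P(ω) = p^(#H) q^(#T), and verifies P(Ω) = 1 and P{H on first toss} = p via (2.1).

```python
from itertools import product

p = 0.6                     # probability of H on each toss (illustrative value)
q = 1.0 - p                 # probability of T

# Sample space for three coin tosses: all sequences of H/T of length 3.
omega_space = ["".join(w) for w in product("HT", repeat=3)]

def P(w):
    """P(omega) = p^{#H} * q^{#T}, as in Example 2.2."""
    return p ** w.count("H") * q ** w.count("T")

def prob(event):
    """P(A) = sum of P(omega) over omega in A, as in equation (2.1)."""
    return sum(P(w) for w in event)

assert len(omega_space) == 8
assert abs(prob(omega_space) - 1.0) < 1e-12            # P(Omega) = 1
A_H = [w for w in omega_space if w[0] == "H"]          # first toss is H
assert abs(prob(A_H) - p) < 1e-12                      # P(A_H) = p
```

The finiteness of Ω is what makes `prob` a plain finite sum, exactly as remarked after (2.1).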
³ F is a σ-algebra (see later).


2.2 Random Variables

Definition 2.6. Let Ω be a nonempty finite set and let F be the σ-algebra of all subsets of Ω. A random variable X is a measurable function mapping Ω into the real line R, that is, the set X^(−1)(A) = {ω ∈ Ω : X(ω) ∈ A} ∈ F for every Borel set A ⊆ R.

We often denote a random variable X by X(ω), where ω ∈ Ω, to reinforce the fact that the random variable is a mapping from the sample space to the space of real numbers. Since a random variable maps Ω into R, we can look at the preimage under the random variable of sets in R, that is, sets of the form {ω ∈ Ω : X(ω) ∈ A ⊆ R}, which is, of course, a subset of Ω. The complete list of subsets of Ω that you can get as preimages (under X) of sets in R turns out to be a σ-algebra, whose content is exactly the information obtained by observing X, and is called the σ-algebra generated by the random variable X.

Definition 2.7. Let Ω be a nonempty finite set and let F be the σ-algebra of all subsets of Ω. Let X be a random variable on (Ω, F). The σ-algebra σ(X) generated by X is defined to be the collection of all sets of the form {ω ∈ Ω : X(ω) ∈ A}, where A is a subset of R. Let G be a sub-σ-algebra of F. We say that X is G-measurable if every set in σ(X) is also in G.

Notation 2.8. We usually write {X ∈ A} for {ω ∈ Ω : X(ω) ∈ A}.

Remark 2.9. An equivalent characterisation of measurability is that X : Ω → R is G-measurable if the event {X ≤ x} is an element of G for all x ∈ R.

Definition 2.10. Let Ω be a nonempty finite set and let F be the σ-algebra of all subsets of Ω. Let P be a probability measure on (Ω, F) and let X be a random variable on (Ω, F). Given any set A ⊆ R, we define the induced measure of the set A to be

    μ_X(A) := P{ω ∈ Ω : X(ω) ∈ A} ≡ P{X ∈ A}.

So the induced measure of a set A tells us the probability that X takes a value in A. By the distribution of a random variable X, we mean any of the several ways of characterizing μ_X. A common way of doing this is to give the cumulative distribution function F_X(x) of X, defined by

    F_X(x) := P{ω ∈ Ω : X(ω) ≤ x} ≡ P{X ≤ x}.

If X is discrete, μ_X places a mass f_X(x_i) = P{X = x_i} = μ_X{x_i}, i = 1, . . . , n, at each possible value x_i of X. We call f_X(x_i), i = 1, . . . , n, the probability mass function of X, and knowing this for all possible values of X is equivalent to knowing the cumulative distribution function. (Later we shall encounter continuous random variables which have probability density functions, in which case the induced measure of a set A ⊆ R is the integral of the density over the set A.)

A simple example of a random variable is the following.

Definition 2.11. The indicator function I_A(ω) of an event A ∈ F is

    I_A(ω) := 1 if ω ∈ A, and 0 if ω ∉ A.

Remark 2.12. We make a clear distinction between random variables and their distributions. A random variable is a mapping from Ω to R, nothing more, and has an existence quite apart from any discussion of probabilities. The distribution of a random variable is a measure μ_X on R, i.e. a way of assigning probabilities to sets in R. It depends on the random variable X and on the probability measure P we use on Ω. If we change P, we change the distribution of the random variable X, but not the random variable itself. Thus, a random variable can have more than one distribution (e.g. an objective or market distribution, and a risk-neutral distribution, and we shall see such constructs in finance). In a similar vein, two different random variables can have the same distribution.

Definition 2.13. Let Ω be a nonempty finite set and let F be the σ-algebra of all subsets of Ω. Let P be a probability measure on (Ω, F) and let X be a random variable on (Ω, F). The expected value or expectation of X is defined to be

    E[X] := Σ_{ω∈Ω} X(ω)P(ω).    (2.2)

Notice that this is a sum over the sample space Ω. If Ω is a finite set, then X can take only finitely many values x_1, . . . , x_K. In this case we can partition Ω into the subsets {X = x_1} (≡ {ω ∈ Ω : X(ω) = x_1}), . . . , {X = x_K}, which allows us to write (2.2) as

    E[X] := Σ_{ω∈Ω} X(ω)P(ω)
          = Σ_{k=1}^K Σ_{ω∈{X=x_k}} X(ω)P(ω)
          = Σ_{k=1}^K x_k Σ_{ω∈{X=x_k}} P(ω)
          = Σ_{k=1}^K x_k P{X = x_k}    (the familiar form)
          = Σ_{k=1}^K x_k μ_X{x_k}.

Thus, although expectation is defined as a sum over the sample space Ω, we can also write it as a sum over R.

Remark 2.14. When the sample space is infinite and, in particular, uncountable, the summation in the definition of expectation is replaced by an integral. In general, the integral over an abstract measurable space (Ω, F) with respect to a probability measure P is a so-called Lebesgue integral (which has all the linearity and comparison properties we associate with ordinary integrals). The expectation E[X] becomes the Lebesgue integral over Ω of X with respect to P, written as

    E[X] = ∫_Ω X dP ≡ ∫_Ω X(ω) dP(ω) = ∫_R x dμ_X(x).    (2.3)

When X takes on a continuum of values and has a density f_X, then dμ_X(x) = f_X(x) dx and the integral on the right-hand side of (2.3) reduces to the familiar Riemann integral ∫_R x f_X(x) dx. We do not delve into the construction of Lebesgue integrals over abstract spaces here. Merely think of the RHS of (2.3) as an alternative notation for the sum Σ_{ω∈Ω} X(ω)P(ω). See Williams [15] or Shreve [14] for more details on Lebesgue integration.

It is easy to see (exercise) that, for an indicator function I_A of an event A ∈ F, the definition of expectation leads to E[I_A] = P(A). The expectation operator is linear, so for two random variables X, Y and constants a, b, we have E[aX + bY] = aE[X] + bE[Y].

The variance of X is the expected value of (X − E[X])²:

    var(X) := Σ_{ω∈Ω} (X(ω) − E[X])² P(ω) = E[(X − E[X])²] = E[X²] − (E[X])²,

where the last equality comes from expanding the square in the definition of variance and applying the expectation operator to each term.
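The two ways of writing the expectation (a sum over Ω, and the "familiar form" as a sum over the values of X), and the two expressions for the variance, can be checked on the 3-coin space. A minimal sketch, with X = number of heads and p = 1/2 (illustrative choices of mine, not from the notes):

```python
from itertools import product

p, q = 0.5, 0.5
omega_space = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega_space}

# X = number of heads in three tosses.
X = {w: w.count("H") for w in omega_space}

# Expectation as a sum over the sample space, as in (2.2).
EX = sum(X[w] * P[w] for w in omega_space)

# The same expectation as a sum over the values of X (the familiar form).
values = set(X.values())
EX_familiar = sum(x * sum(P[w] for w in omega_space if X[w] == x) for x in values)
assert abs(EX - EX_familiar) < 1e-12
assert abs(EX - 1.5) < 1e-12          # E[#heads] = 3p = 1.5 for p = 1/2

# Variance via both expressions: E[(X - E[X])^2] and E[X^2] - (E[X])^2.
var1 = sum((X[w] - EX) ** 2 * P[w] for w in omega_space)
var2 = sum(X[w] ** 2 * P[w] for w in omega_space) - EX ** 2
assert abs(var1 - var2) < 1e-12
assert abs(var1 - 0.75) < 1e-12       # 3pq = 0.75
```

The partition of Ω into the sets {X = x_k} is what the inner sum in `EX_familiar` computes.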


The moment generating function M(a) of the random variable X is M(a) := E[exp(aX)]. The characteristic function φ(a) of X is defined by

    φ(a) := E[exp(iaX)],  i = √−1.    (2.4)
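As an illustration of the moment generating function, the sketch below (parameter values are mine) computes M(a) = E[exp(aX)] directly from the definition for X = number of heads in three fair tosses, and compares it with the closed form (q + p e^a)³, which holds because the tosses are independent (independence is treated formally in Section 4).

```python
import math
from itertools import product

p, q, a = 0.5, 0.5, 0.3     # illustrative values; X = number of heads in 3 tosses
omega_space = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega_space}

# M(a) = E[exp(aX)] computed directly from the definition ...
M_direct = sum(math.exp(a * w.count("H")) * P[w] for w in omega_space)

# ... and via the closed form (q + p e^a)^3 for a sum of independent tosses.
M_closed = (q + p * math.exp(a)) ** 3
assert abs(M_direct - M_closed) < 1e-12
```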

2.3 Stochastic processes

Definition 2.15 (Discrete-time stochastic process). Let T = {0} ∪ N = {0, 1, 2, . . .} be a discrete time set. A discrete-time stochastic process (Xt)_{t∈T} is a sequence of random variables.

Thus, a d-dimensional stochastic process is a parametrised collection of random variables (Xt)_{t∈T} defined on a probability space (Ω, F, P) and assuming values in R^d for d ∈ N. The parameter space T is thought of as a time index set. For a finite discrete-time T-period model, we would have T = {0, 1, . . . , T}. For an infinite horizon model we would have T = {0} ∪ N = {0, 1, . . .}.⁴

There are a number of ways of thinking about a stochastic process. First, for each fixed t ∈ T we have a random variable Xt : Ω → R^d, that is, the function ω ↦ Xt(ω), ω ∈ Ω.

On the other hand, if we fix ω ∈ Ω we can consider the map t ↦ Xt(ω), t ∈ T, which is called a path (or sample path) of X. We think of t as time and of each ω ∈ Ω as the outcome of an experiment (such as trading in a financial market). With this in mind, Xt(ω) would represent the result at time t if the outcome of the experiment is ω (or the price at time t of an asset being traded in a market, in the state ω). Sometimes we write X(t, ω) ≡ Xt(ω). Thus we may also regard the process as a function of two variables,

    (t, ω) ↦ X(t, ω),  (t, ω) ∈ T × Ω,

from T × Ω into R^d. This is often a natural point of view in stochastic analysis, where it is crucial to have X(t, ω) jointly measurable in (t, ω). Now, we may identify each ω ∈ Ω with the sample path of X, i.e. with the function t ↦ Xt(ω), t ∈ T (the so-called coordinate mapping process). In this way we may regard Ω as a subset of the space (R^d)^T of all R^d-valued functions on T.

Definition 2.16. Let T = {0} ∪ N. A stochastic process (Xt)_{t∈T} on a filtered space (Ω, F, (Ft)_{t∈T}) is adapted to the filtration (Ft)_{t∈T} if Xt is Ft-measurable for each t ∈ T.

3 The binomial stock price process

We will use a stochastic process S on a filtered probability space (Ω, F, F := (Ft)_{t∈T}, P), where T is some time index set, to represent the evolution of a stock price. We will use a discrete-time framework. For a finite horizon, T-period model, the time set will be T = {0, 1, 2, . . . , T}.
⁴ If we use a continuous-time model, T could be the half-line: T = [0, ∞), or a finite interval: T = [0, T].


For each t ∈ T, St will be a one-dimensional random variable on the measurable space (Ω, F). We assume also that for each t ∈ T, St is Ft-measurable, so that S is an F-adapted process. This encapsulates the idea that the information at time t ∈ T, represented by Ft, is sufficient information to know the values of Ss for all s ≤ t. The sample space will be finite, say Ω = {ω^1, . . . , ω^N}, and the probability measure P, called the physical measure (or objective measure, or the market measure), will be such that P(ω^n) = p_n > 0, for n = 1, . . . , N. We shall assume FT = F and also F0 = {∅, Ω}. In this setting, a random variable Y on (Ω, F, P) is a vector in R^N with components Y(ω^n), for n = 1, . . . , N. That is, Y(ω) ≡ (Y^1, . . . , Y^N) = (Y(ω^1), . . . , Y(ω^N)).

We introduce a riskless asset (or cash account, or money market account, or bond) with price process S^0. It is riskless because its gross return S^0_{t+1}/S^0_t = 1 + r_t ≥ 1 is known in advance. Specifically, its price will be assumed to evolve according to

    S^0_{t+1} = (1 + r_t)S^0_t,  t = 0, 1, . . . , T − 1,

where (r_t)_{t=0}^{T−1} is the interest rate process, an F-adapted process satisfying r_t ≥ 0 almost surely for all t ∈ {0, 1, . . . , T − 1}. Let us take this process to be constant, r_t = r > 0 for all t ∈ {0, 1, . . . , T − 1}. We shall also assume S^0_0 = 1, so that with constant interest rate r, we have

    S^0_t = (1 + r)^t,  t = 0, 1, . . . , T.

Hence, S^0_t represents the value at time t of one unit of currency (say $1) invested at time zero. So the return on the riskless asset is 1 + r ≥ 1, and this encapsulates the risk-free nature of this asset. In contrast, the return on the stock can be greater or less than 1, encapsulating that it is a risky asset.
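The riskless account recursion and its closed form can be sketched in a few lines (the values r = 0.05 and T = 5 are illustrative, not from the notes):

```python
# Deterministic riskless account: S^0_{t+1} = (1 + r) S^0_t with S^0_0 = 1,
# so S^0_t = (1 + r)^t.
r, T = 0.05, 5
S0_account = [1.0]
for t in range(T):
    S0_account.append((1 + r) * S0_account[-1])

# The recursion reproduces the closed form at every date.
for t in range(T + 1):
    assert abs(S0_account[t] - (1 + r) ** t) < 1e-12
```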

Example 3.1 (One-period binomial model). Let Ω be the finite set Ω = {H, T}, the set of outcomes of a single coin toss. Take a 1-period model, with T = {0, 1}. The risky stock price process is (St)_{t∈{0,1}}, and S1(ω) takes on two possible values, S1(H) or S1(T), given by (see Figure 6)

    S1(ω) = S0 u, if ω = H;  S0 d, if ω = T,

where u > 1 + r > d > 0 are constants.

Figure 6: One-period binomial process for stock price. We have associated a probability p ∈ (0, 1) with an upward stock price move: S0 moves up to S0 u with probability p, or down to S0 d with probability 1 − p.

Example 3.2 (Three-period binomial model). Let Ω be the finite set

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

the set of all possible outcomes of three coin tosses. The sample points are of the form ω = ω1ω2ω3, with ωt, t = 1, 2, 3, representing the outcome of the tth toss.

In this 3-period model, with T = {0, 1, 2, 3}, the risky stock price process is (St)_{t=0}^3, and St(ω) = St(ω1 . . . ωt) is the stock price after t tosses, with evolution given by (see Figure 7)

    S_{t+1}(ω) = St u, if ω_{t+1} = H;  St d, if ω_{t+1} = T,  t = 0, 1, 2,

where u > 1 + r > d > 0 are constants. We also write

    S_{t+1}(ω1 . . . ω_{t+1}) = St(ω1 . . . ωt)u, if ω_{t+1} = H;  St(ω1 . . . ωt)d, if ω_{t+1} = T,  t = 0, 1, 2,

whenever we wish to emphasise that St actually depends only on the outcome of the first t coin tosses.

Figure 7: Binomial process for stock price. We have associated a probability p ∈ (0, 1) with an upward stock price move: St moves up to St u with probability p, or down to St d with probability 1 − p.

With Ft representing the σ-algebra determined by the first t tosses, we take F = F3. There are a total of 8 sample points ω ∈ Ω. We shall sometimes label these as the set Ω = {ω^1, . . . , ω^8}. So ω^1 = HHH, ω^2 = HHT, and so on.

Example 3.3 (T-period binomial model). As Example 3.2, but now let Ω be the set of all outcomes of T coin tosses, so T = {0, 1, . . . , T}, and each ω ∈ Ω is of the form ω = (ω1ω2 . . . ωT), with each ωt ∈ {H, T}, for each t ∈ {1, . . . , T}.

It is easy to see that at time t ∈ T the possible stock prices are S_t^(j), j = 0, 1, . . . , t, given by

    S_t^(j) = S0 u^j d^(t−j),  j = 0, 1, . . . , t,  t ∈ T.

3.1 Filtration as information flow in the binomial model*

A filtration F = (Ft)_{t∈T} represents information flow as time marches forward. Let us explain this idea, and the associated notion of measurability, with reference to the three-period binomial model of a stock price given in Example 3.2. Let Ω be the finite set of all outcomes of 3 coin tosses:

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Each sample point is a sequence ω = (ω1ω2ω3), with ωt, t = 1, 2, 3, representing the outcome of the tth coin toss. Suppose the coin has probability p ∈ (0, 1) for H and q := 1 − p for T. With each coin toss independent, we have

    P(ω) = p^3, for ω = HHH,
    P(ω) = p^2 q, for ω ∈ {HHT, HTH, THH},
    P(ω) = pq^2, for ω ∈ {HTT, THT, TTH},
    P(ω) = q^3, for ω = TTT.

We can easily write down all the stock prices in the tree as:


    S1(H) = uS0,  S1(T) = dS0,
    S2(HH) = u^2 S0,  S2(HT) = S2(TH) = udS0,  S2(TT) = d^2 S0,
    S3(HHH) = u^3 S0,  S3(HHT) = S3(HTH) = S3(THH) = u^2 d S0,
    S3(HTT) = S3(THT) = S3(TTH) = u d^2 S0,  S3(TTT) = d^3 S0.

Define the following two subsets of Ω:

    AH = {HHH, HHT, HTH, HTT},  AT = {THH, THT, TTH, TTT},

corresponding to the events that the first coin toss results in H and T respectively. Using the definition

    P(A) := Σ_{ω∈A} P(ω),

we find

    P(AH) = P{HHH, HHT, HTH, HTT} = P{H on first toss} = p,
    P(AT) = P{THH, THT, TTH, TTT} = P{T on first toss} = q,

precisely in accordance with intuition. Here are two σ-algebras of subsets of the set Ω:

    F0 = {∅, Ω},  F1 = {∅, Ω, AH, AT}.

As an easy exercise, you can verify that both the above collections of sets are indeed σ-algebras. Let us shed some light on the sense in which σ-algebras are a record of information. Suppose, after the three coin tosses, that you are not told the outcome ω, but that you are told, for each set in F1, whether or not the outcome is in that set. For instance, you might be told that the outcome is not in AH, but is in AT (you would also be told that the outcome is not in ∅ but is in Ω, which is obvious). In effect, you have been told that the first toss was a T, and nothing more. In this sense the σ-algebra F1 contains the information of the first toss, or the information up to time 1.

The σ-algebra F0, corresponding to information at time zero (before the coin has been tossed, so no information on ω is available), is composed of all sets A such that F0 is indeed a σ-algebra, and such that one can answer the question "is ω ∈ A?" given that one has no information on the coin tosses. Hence all we can say is that ω ∈ Ω and ω ∉ ∅, and so F0 = {∅, Ω}.

At time 1, the coin has been tossed once. Then one knows that either ω1 = H or that ω1 = T. In this case one can answer the following questions.

Is ω ∈ ∅? No.

Is ω ∈ Ω? Yes.

Is ω ∈ AH? Yes, if ω1 = H; otherwise no.

Is ω ∈ AT? Yes, if ω1 = T; otherwise no.

One cannot answer a question such as: is ω ∈ {HHH, HHT}? The σ-algebra F1 corresponding to information at time 1 is composed of all the sets A such that F1 is indeed a σ-algebra, and such that one can answer the question "is ω ∈ A?", given that one has information on the outcome of the first coin toss. This is why we must have F1 = {∅, AH, AT, Ω}. Define

    AHH = {HHH, HHT},  AHT = {HTH, HTT},  ATH = {THH, THT},  ATT = {TTH, TTT},

corresponding to the events that the first two coin tosses result in HH, HT, TH and TT respectively.

corresponding to the events that the rst two coin tosses result in HH, HT, TH and TT respectively. Consider the collection of sets F2 = {, , AHH , AHT , ATH , ATT , plus all unions of these}. Then F2 can be written as (check) F2 = {, , AHH , AHT , ATH , ATT , AH , AT , AHH ATH , AHH ATT , AHT ATH , AHT ATT , Ac , Ac , Ac , Ac }. HH HT TH TT Then F2 is indeed a -algebra (a tedious and lengthy verication) which contains the information of the rst two tosses or the information up to time 2. This is because, if you know the outcome of the rst two coin tosses, you can say whether the outcome of all three tosses satises A for each A F2 . Similarly, F3 F, the set of all subsets of , contains full information about the outcome of all three tosses, whilst the trivial -algebra F0 contains no information. Knowing whether the outcome of the three tosses is in (it is not) and whether it is in (it is) tells you nothing about . The sequence of -algebras F0 , F1 , F2 , F3 is a ltration. Let us show that the stock price process (St )tT is indeed F-adapted. That is, the random variable St is known after t coin tosses, equivalently, St is Ft -measurable, for each t T. First, S0 must be a deterministic constant: S0 () = a R, ,

since one has no information on the outcome of the coin tosses at time zero. Then sets of the form {S0 ∈ A ⊆ R} are clearly either ∅ or Ω, so S0 is F0-measurable. The random variable S1 must be of the form

    S1(ω) = a ∈ R, if ω ∈ AH;  b ∈ R, if ω ∈ AT,

since the only information available at time 1 is whether ω1 = H or ω1 = T, and we notice that S1 is indeed of the above form. Continuing to argue in this fashion, we find that at each time t ∈ T, the event {ω ∈ Ω : St(ω) ∈ A ⊆ R} = {St ∈ A} is in Ft. The stochastic process S = (St)_{t∈T} is said to be adapted to the filtration F = (Ft)_{t∈T}, as in Definition 2.16.
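Adaptedness in this finite setting amounts to a simple statement: St(ω) depends only on the first t tosses, so it is constant on each set of outcomes sharing the same initial segment. A minimal sketch checking this (helper names are my own):

```python
from itertools import product

S0, u, d = 4.0, 2.0, 0.5
omega_space = ["".join(w) for w in product("HT", repeat=3)]

def S(t, w):
    """Stock price after t tosses of the outcome w (uses only w[:t])."""
    price = S0
    for toss in w[:t]:
        price *= u if toss == "H" else d
    return price

# F_t-measurability here: S_t is constant on each atom of F_t, i.e. on each
# group of outcomes that agree on the first t tosses.
for t in range(4):
    for w1 in omega_space:
        for w2 in omega_space:
            if w1[:t] == w2[:t]:
                assert S(t, w1) == S(t, w2)
```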


Now suppose S0 = 4, u = 2 and d = 1/2. Then S0, S1, S2 and S3 are all random variables, denoted by St(ω) for t = 0, 1, 2, 3 and ω ∈ Ω. We may calculate the values of S2(ω) for all ω ∈ Ω, as

    S2(HHH) = S2(HHT) = 16,
    S2(HTH) = S2(HTT) = S2(THH) = S2(THT) = 4,
    S2(TTH) = S2(TTT) = 1.

Now consider the preimage under the random variable S2 of certain sets in R. Specifically, consider the interval [4, 29]. The preimage under S2 of this interval is {ω ∈ Ω : S2(ω) ∈ [4, 29]}. We can characterise this subset of Ω in terms of one of the sets given earlier in the list of sets in F2. We have:

    {ω ∈ Ω : S2(ω) ∈ [4, 29]} = ATT^c.

Suppose we list, in as minimal a fashion as possible, the subsets of Ω that we can get as preimages under S2 of sets in R, along with sets which can be built by taking unions of these; then this collection of sets turns out to be a σ-algebra, the σ-algebra generated by the random variable S2, denoted σ(S2). Now, if ω ∈ AHH, then S2(ω) = 16. If ω ∈ AHT ∪ ATH, then S2(ω) = 4. If ω ∈ ATT, then S2(ω) = 1. Hence σ(S2) is composed of {∅, Ω, AHH, AHT ∪ ATH, ATT}, plus all relevant unions and complements. Using the identities

    AHH ∪ (AHT ∪ ATH) = ATT^c,
    AHH ∪ ATT = (AHT ∪ ATH)^c,
    (AHT ∪ ATH) ∪ ATT = AHH^c,

we obtain

    σ(S2) = {∅, Ω, AHH, AHT ∪ ATH, ATT, AHH ∪ ATT, AHH^c, ATT^c}.

The information content of the σ-algebra σ(S2) is exactly the information learned by observing S2. So, suppose the coin is tossed three times and you do not know the outcome ω, but you are told, for each set in σ(S2), whether ω is in the set. For instance, you might be told that ω is not in AHH, is in AHT ∪ ATH, and is not in ATT. Then you know that in the first two throws there was a head and a tail, but you are not told in which order they occurred. This is the same information you would have got by being told that the value of S2(ω) is 4.

Note that F2 contains all the sets which are in σ(S2), and even more. In other words, the information in the first two throws is greater than the information in S2. In particular, if you see the first two tosses you can distinguish AHT from ATH, but you cannot make this distinction from knowing the value of S2 alone.
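The atoms of σ(S2) can be produced mechanically by grouping sample points according to the value of S2, which makes the partition {AHH, AHT ∪ ATH, ATT} visible. A minimal sketch with the same parameters S0 = 4, u = 2, d = 1/2:

```python
from itertools import product

S0, u, d = 4.0, 2.0, 0.5
omega_space = ["".join(w) for w in product("HT", repeat=3)]

def S2(w):
    """Stock price after the first two tosses of w."""
    price = S0
    for toss in w[:2]:
        price *= u if toss == "H" else d
    return price

# Atoms of sigma(S2): group sample points by the value of S2.
atoms = {}
for w in omega_space:
    atoms.setdefault(S2(w), set()).add(w)

assert atoms[16.0] == {"HHH", "HHT"}                   # A_HH
assert atoms[4.0] == {"HTH", "HTT", "THH", "THT"}      # A_HT ∪ A_TH
assert atoms[1.0] == {"TTH", "TTT"}                    # A_TT
```

Every set in σ(S2) is a union of these three atoms (together with ∅), which is why σ(S2) has 2³ = 8 elements.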

4 Conditional expectation and martingales

We review independence in a finite probability space (Ω, F, P). Many of the definitions as written here extend to general probability spaces.

Definition 4.1 (Independence of sets). Two sets A ∈ F and B ∈ F are independent if

    P(A ∩ B) = P(A)P(B).

To see that this is a correct definition, suppose that a random experiment is conducted, and ω is the outcome. The probability that ω ∈ A is P(A). Suppose you are not told ω, but you are told that ω ∈ B. Conditional on this information, the probability that ω ∈ A is

    P(A|B) := P(A ∩ B)/P(B).


The sets A and B are independent if and only if this conditional probability is the unconditional probability P(A), i.e. knowing that ω ∈ B does not change the probability you assign to A. This discussion is symmetric with respect to A and B; if A and B are independent and you know that ω ∈ A, the conditional probability you assign to B is still the unconditional probability P(B). Note that whether two sets are independent depends on the probability measure P.

Definition 4.2 (Independence of σ-algebras). Let G and H be sub-σ-algebras of F. We say that G and H are independent if every set in G is independent of every set in H, i.e.

    P(A ∩ B) = P(A)P(B), for every A ∈ G, B ∈ H.

Definition 4.3 (Independence of random variables). Two random variables X and Y are independent if the σ-algebras they generate, σ(X) and σ(Y), are independent.

The above definition says that for independent random variables X and Y, every set defined in terms of X is independent of every set defined in terms of Y. Suppose X and Y are independent random variables. The measure induced by X on R is μ_X(A) := P{X ∈ A}, for A ⊆ R. Similarly, the measure induced by Y is μ_Y(B) := P{Y ∈ B}, for B ⊆ R. The pair (X, Y) takes values in the plane R², and we define the measure induced by the pair (X, Y) as

    μ_{X,Y}(C) := P{(X, Y) ∈ C},  C ⊆ R².

In particular, C could be a rectangle, i.e. a set of the form A × B, where A ⊆ R and B ⊆ R. In this case

    {(X, Y) ∈ A × B} = {X ∈ A} ∩ {Y ∈ B},

and X and Y are independent if and only if

    μ_{X,Y}(A × B) = P({X ∈ A} ∩ {Y ∈ B}) = P{X ∈ A}P{Y ∈ B} = μ_X(A)μ_Y(B).

In other words, for independent random variables X and Y, the joint distribution represented by μ_{X,Y} factorises into the product of the marginal distributions represented by the measures μ_X and μ_Y.

Theorem 4.4. Suppose X and Y are independent random variables. Let g and h be functions from R to R. Then g(X) and h(Y) are also independent random variables.

Proof. Put W = g(X) and Z = h(Y). We must consider sets in σ(W) and σ(Z). A typical set in σ(W) is of the form

    {ω ∈ Ω : W(ω) ∈ A} = {ω ∈ Ω : g(X(ω)) ∈ A},

which is defined in terms of the random variable X, and is therefore in σ(X). So every set in σ(W) is also in σ(X). Similarly, every set in σ(Z) is also in σ(Y). Since every set in σ(X) is independent of every set in σ(Y), we conclude that every set in σ(W) is independent of every set in σ(Z).

Definition 4.5. Let X1, X2, . . . be a sequence of random variables. We say that these random variables are independent if for every sequence of sets A1 ∈ σ(X1), A2 ∈ σ(X2), . . ., and for every positive integer n,

    P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1)P(A2) . . . P(An).

Theorem 4.6.
If two random variables X and Y are independent, and if g and h are functions from R to R, then

    E[g(X)h(Y)] = E[g(X)] E[h(Y)],

provided all the expectations are defined.

Proof. Note that by Theorem 4.4 it is enough to prove the result for g(x) = x and h(y) = y. We prove this only in a finite probability space, when X, Y can take on only finitely many values x_i, i = 1, . . . , K and y_j, j = 1, . . . , L. We use the fact that in this case the expectation of X has the familiar form E[X] = Σ_{i=1}^K x_i P{X = x_i}. So we have

    E[XY] = Σ_{i=1}^K Σ_{j=1}^L x_i y_j P({X = x_i} ∩ {Y = y_j})
          = Σ_{i=1}^K Σ_{j=1}^L x_i y_j P{X = x_i} P{Y = y_j}
          = ( Σ_{i=1}^K x_i P{X = x_i} ) ( Σ_{j=1}^L y_j P{Y = y_j} )
          = E[X] E[Y].
Remark 4.7. For general probability spaces, the above theorem is proved using the Lebesgue integral representation of expectation, and an argument which Shreve [14] (Section 1.5) calls the "standard machine". Let g(x) = I_A(x) and h(y) = I_B(y) be indicator functions. Then the equation we are trying to prove becomes

    P({X ∈ A} ∩ {Y ∈ B}) = P{X ∈ A}P{Y ∈ B},

which is true because X and Y are independent. Now this is extended to simple functions (linear combinations of indicator functions) by linearity of expectation. Sequences of such functions can always be constructed that converge to general functions g and h, and then an integral convergence theorem, the Monotone Convergence Theorem,⁵ gives the result.

The covariance of two random variables X and Y is

    cov(X, Y) := E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y],

so var(X) = cov(X, X). According to Theorem 4.6, two independent random variables have zero covariance (though the converse is not necessarily true!). For independent random variables, the variance of their sum is the sum of their variances. Indeed, for any two random variables X and Y, and Z = X + Y,

    var(Z) = var(X + Y) = var(X) + var(Y) + 2cov(X, Y),

so that for independent X and Y, var(X + Y) = var(X) + var(Y). This argument extends to any finite number of random variables: if we are given independent random variables X1, X2, . . . , Xn, then

    var(X1 + . . . + Xn) = var(X1) + . . . + var(Xn).

Example 4.8. Toss a coin twice, so Ω = {HH, HT, TH, TT}, with probability p ∈ (0, 1) for H and probability q = 1 − p for T on each toss. Let A = {HH, HT} and B = {HT, TH}. We
⁵ The Monotone Convergence Theorem is as follows. Let Xn, n = 1, 2, . . . be a sequence of random variables converging almost surely to a random variable X (that is, P{Xn → X} = 1 as n → ∞). Assume that 0 ≤ X1 ≤ X2 ≤ . . . almost surely. Then

    ∫_Ω X dP = lim_{n→∞} ∫_Ω Xn dP,

or equivalently E[X] = lim_{n→∞} E[Xn].

have P(A) = p² + pq = p, P(B) = pq + qp = 2pq, and P(A ∩ B) = P{HT} = pq. The sets A and B are independent if and only if 2p²q = pq, that is, if and only if p = 1/2.

Let G = F1 be the σ-algebra determined by the first toss and H be the σ-algebra determined by the second toss. Then, writing AH := {HH, HT} and AT := {TH, TT}, we have

    G = {∅, Ω, AH, AT},  H = {∅, Ω, {HH, TH}, {HT, TT}}.

It is easy to see that these two σ-algebras are independent. For example, if we choose AH from G and {HH, TH} from H, we find

    P(AH)P{HH, TH} = p(p² + pq) = p²,
    P(AH ∩ {HH, TH}) = P{HH} = p².

This will be true no matter which sets we choose from G and H. This captures the notion that the coin tosses are independent of each other.
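The "independent if and only if p = 1/2" claim of Example 4.8 can be verified directly (the helper name is mine):

```python
from itertools import product

def check_independence(p):
    """Return True iff A = {HH, HT} and B = {HT, TH} are independent under p."""
    q = 1.0 - p
    P = {w: (p if w[0] == "H" else q) * (p if w[1] == "H" else q)
         for w in ("".join(t) for t in product("HT", repeat=2))}
    A = {"HH", "HT"}
    B = {"HT", "TH"}
    PA = sum(P[w] for w in A)          # = p
    PB = sum(P[w] for w in B)          # = 2pq
    PAB = sum(P[w] for w in A & B)     # = P{HT} = pq
    return abs(PAB - PA * PB) < 1e-12

assert check_independence(0.5) is True     # independent only for a fair coin
assert check_independence(0.6) is False    # 2p^2 q != pq when p != 1/2
```

Note how independence of the *sets* A and B depends on the measure P, exactly as remarked after Definition 4.1.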

4.1

Conditional expectation

The conditional expectation E[X|G] of a random variable X given the -algebra G is, as we shall see, a random variable that expresses mathematically the best estimate of X given the information represented by G. To motivate this idea, we will use our familiar example of a binomial model. Recall the three-period binomial model of Example 3.2 analysed in Section 3.1. We say that a set A is determined by the rst t coin tosses if, knowing only the outcome of the rst t tosses, we can decide whether the outcome of all the tosses is in A. We denote the collection of sets determined by the rst t tosses by Ft (and this is a -algebra). The random variable St is in fact Ft -measurable for each t T. In Section 3.1 we encountered some of the Ft -algebras for the case T = {0, 1, 2, 3}. Denition 4.9 (Information carried by a random variable). Let X be a random variable on . We say that a set A is determined by the random variable X if, knowing only the value X() of the random variable, we can decide whether or not A. Another way of saying this is that for every x R, either X 1 (x) A or X 1 (x) A = . The collection of subsets of determined by X is a -algebra, called the -algebra generated by X, and denoted by (X). If the random variable X takes nitely many dierent values, then (X) is generated by the collection of sets { : X() = xi }, where xi , (i = 1, . . . , K) are the possible values of X. We can write this collection as {X 1 (xi )} or {X 1 (X()| }. These sets are called the atoms of the -algebra (X). In general, if X is a random variable R then (X) is given by (X) = {X 1 (B); B B(R)}, where B denotes the Borel -algebra on R. Section 3.1 gave an example from the 3 coin-toss experiment of the sets determined by the stock price at time 2, S2 , which form the -algebra generated by S2 . This is given by (S2 ) = {, , AHH , ATT , AHT ATH , plus complements and unions}, where AHH = {HHH, HHT} ATT = {TTH, TTT} AHT ATH = { : S2 () = u2 S0 }, = { : S2 () = d2 S0 }, = { : S2 () = udS0 }.


4.1.1 Conditional expectation in the three-period binomial model

To talk about conditional expectation, we need to introduce a probability measure on our coin-toss sample space Ω. As usual, we do this by defining p ∈ (0, 1) as the probability of H on any throw, and q = 1 − p as the probability of T. For a set A ⊆ Ω, the indicator function IA(ω) of the set A is defined in the usual way, and

E[IA X] = ∫_A X dP = ∑_{ω∈A} X(ω)P(ω).

We can think of E[IA X] as a partial average of X over the set A.

We give an example of a conditional expectation in the 3-period binomial model, to motivate the general definition which follows. Let us estimate S1, given S2. Denote this estimate by E[S1|S2]. We expect the properties of E[S1|S2] to be:

E[S1|S2] should depend on ω, that is, it is a random variable: E[S1|S2] = E[S1|S2(ω)] = E[S1|S2](ω);

If the value of S2 is known then the value of E[S1|S2] should also be known. In particular, if ω = HHH or ω = HHT, then S2(ω) = u²S0 and, even without knowing ω, we know that S1(ω) = uS0. We therefore define E[S1|S2](HHH) = E[S1|S2](HHT) = uS0. Similarly we define E[S1|S2](TTT) = E[S1|S2](TTH) = dS0.

Finally, if A = AHT ∪ ATH = {HTH, HTT, THH, THT}, then S2(ω) = udS0, so that S1(ω) = uS0 or S1(ω) = dS0. So, to get E[S1|S2] in this case, we take a weighted average, as follows. For ω ∈ A we define

E[S1|S2](ω) = (1/P(A)) ∫_A S1 dP,  ω ∈ A = AHT ∪ ATH,

which is a partial average of S1 over the set A, normalised by the probability of A. Now, P(A) = 2pq and ∫_A S1 dP = pq(u + d)S0, so that for ω ∈ A

E[S1|S2](ω) = (1/2)(u + d)S0,  ω ∈ A = AHT ∪ ATH.

(In other words, the best estimate of S1, given that S2 = udS0, is the average of the possibilities uS0 and dS0.) Then we have that

∫_A E[S1|S2] dP = ∫_A S1 dP.

In conclusion, we can write E[S1|S2](ω) = g(S2(ω)), where

g(x) = uS0, if x = u²S0;  (1/2)(u + d)S0, if x = udS0;  dS0, if x = d²S0.

In other words E[S1|S2] is random only through dependence on S2 (and hence is σ(S2)-measurable). The random variable has two fundamental properties:

1. E[S1|S2] is σ(S2)-measurable;

2. For every A ∈ σ(S2),

∫_A E[S1|S2] dP = ∫_A S1 dP,

which is known as the partial averaging property.

Definition 4.10 (Conditional expectation). Let (Ω, F, P) be a probability space, and let G be a sub-σ-algebra of F. Let X be a random variable on (Ω, F, P). Then the conditional expectation E[X|G] is defined to be any random variable Y that satisfies

1. Y = E[X|G] is G-measurable;

2. For every set A ∈ G, we have the partial averaging property

∫_A Y dP ≡ ∫_A E[X|G] dP = ∫_A X dP.
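The partial averaging property in Definition 4.10 can be checked by brute force on the three-period model. The following Python sketch computes E[S1|S2] as a weighted average over the atoms of σ(S2) and verifies partial averaging on each atom; the parameters S0 = 4, u = 2, d = 1/2, p = 2/3 are illustrative, not fixed by the notes.

```python
from itertools import product
from fractions import Fraction

# Illustrative parameters (not from the notes): S0 = 4, u = 2, d = 1/2, p = 2/3
S0, u, d, p = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(2, 3)
q = 1 - p

Omega = list(product("HT", repeat=3))              # three coin tosses

def P(w):                                          # physical probability of a path
    return p ** w.count("H") * q ** w.count("T")

def S(t, w):                                       # stock price after t tosses
    return S0 * u ** w[:t].count("H") * d ** w[:t].count("T")

def cond_exp_S1_given_S2(w):
    """E[S1|S2](w): average of S1 over the atom {S2 = S2(w)} of sigma(S2)."""
    atom = [v for v in Omega if S(2, v) == S(2, w)]
    return sum(S(1, v) * P(v) for v in atom) / sum(P(v) for v in atom)

# Partial averaging (Definition 4.10): on every atom A of sigma(S2),
# the integral of E[S1|S2] over A equals the integral of S1 over A.
for x in {S(2, w) for w in Omega}:
    A = [w for w in Omega if S(2, w) == x]
    assert sum(cond_exp_S1_given_S2(w) * P(w) for w in A) == \
           sum(S(1, w) * P(w) for w in A)
```

On the atom {S2 = udS0} the computed value is (u + d)S0/2, exactly as derived above, and note that it does not depend on p.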

Note (we do not prove this here) that there is always a random variable Y satisfying the above properties (provided that E|X| < ∞), i.e. conditional expectations always exist. There can be more than one random variable satisfying the above properties, but if Y′ is another one, then Y = Y′ with probability 1 (or almost surely (a.s.)). For random variables X, Y it is standard notation to write E[X|Y] := E[X|σ(Y)].

Here are some ways to interpret E[X|G]. A random experiment is performed, i.e. an element ω of Ω is selected. The value of ω is partially but not fully revealed to us, so we cannot compute the exact value of X(ω). Based on what we know about ω, we compute an estimate of X(ω) which, since it depends on the partial information we have about ω, depends on ω, i.e. E[X|G] = E[X|G](ω) is a function of ω, though this dependence is often not shown explicitly.

If the σ-algebra G contains finitely many sets, there will be a smallest set A ∈ G containing ω, which is the intersection of all sets in G containing ω. The way ω is partially revealed to us is that we are told it is in A, but not told which element of A it is. We then define E[X|G](ω) to be the average (with respect to P) value of X over the set A. Thus, for all ω ∈ A, E[X|G](ω) will be the same.

4.1.2 Partial averaging

The partial averaging property is

∫_A E[X|G] dP = ∫_A X dP,  A ∈ G.

We can rewrite this as

E[IA E[X|G]] = E[IA X]. (4.1)

Note that IA(ω) (which equals 1 for ω ∈ A and 0 otherwise) is a G-measurable random variable. Equation (4.1) suggests (and it is indeed true) that the following holds.

Lemma 4.11. If V is any G-measurable random variable, then provided E|V E[X|G]| < ∞,

E[V E[X|G]] = E[V X]. (4.2)

Proof. Here is a sketch of the proof in a general probability space. First use (4.1) and linearity of expectations to prove (4.2) when V is a simple G-measurable random variable, i.e. V = ∑_{k=1}^K ck I_{Ak}, where each Ak ∈ G and each ck is constant. Next consider the case that V is a nonnegative G-measurable random variable, not necessarily simple. Such a V can be written as the limit of an increasing (almost surely) sequence of simple random variables Vn. We write (4.2) for each Vn and pass to the limit n → ∞, using the Monotone Convergence Theorem, to obtain (4.2) for V. Finally, the general (integrable) G-measurable random variable V can be written as the difference of two nonnegative random variables, V = V⁺ − V⁻, and since (4.2) holds for V⁺ and V⁻ it must hold for V as well. This is the standard machine argument.

Based on Lemma 4.11, we can replace the second condition in the definition of conditional expectation by (4.2), so that the defining properties of Y = E[X|G] are:

1. Y = E[X|G] is G-measurable;

2. For every G-measurable random variable V, we have

E[V E[X|G]] = E[V X]. (4.3)

Notice that we can write (4.3) as E[V(E[X|G] − X)] = 0, which allows an interpretation of E[X|G] as the projection of the vector X on to the subspace of G-measurable random variables. Then E[X|G] − X is perpendicular to any V in the subspace.

4.1.3 Properties of conditional expectation

Conditional expectations have many useful properties, which we list below (some with sketch proofs). All the X below satisfy E|X| < ∞.

1. E[E[X|G]] = E[X].

Proof. Just take A = Ω in the partial averaging property (or, equivalently, V = I_Ω ≡ 1 in (4.2)).

The conditional expectation of X is thus an unbiased estimator of the random variable X.

2. If X is G-measurable, then E[X|G] = X.

Proof. The partial averaging property ∫_A Y dP ≡ ∫_A E[X|G] dP = ∫_A X dP holds trivially when Y is replaced by X. Then, if X is G-measurable, it satisfies the first requirement in the definition of conditional expectation as well.

In other words, if the information content of G is sufficient to determine X, then the best estimate of X based on G is X itself.

3. (Linearity) For a1, a2 ∈ R, E[a1 X1 + a2 X2|G] = a1 E[X1|G] + a2 E[X2|G].

Proof. By linearity of integrals (i.e. of expectations), as follows: E[a1 X1 + a2 X2|G] is G-measurable and satisfies, for any A ∈ G,

∫_A E[a1 X1 + a2 X2|G] dP = ∫_A (a1 X1 + a2 X2) dP  (partial averaging)
= a1 ∫_A X1 dP + a2 ∫_A X2 dP  (linearity of integrals)
= a1 ∫_A E[X1|G] dP + a2 ∫_A E[X2|G] dP  (partial averaging)
= ∫_A (a1 E[X1|G] + a2 E[X2|G]) dP  (linearity of integrals),

and this is the partial averaging property.

4. (Positivity) If X ≥ 0 almost surely, then E[X|G] ≥ 0 almost surely.

Proof. Let A = {ω ∈ Ω : E[X|G](ω) < 0}. This set is in G since E[X|G] is G-measurable. Now, the partial averaging property implies that

∫_A E[X|G] dP = ∫_A X dP.

The RHS of this is ≥ 0 and the LHS is < 0 unless P(A) = 0. Therefore we must have P(A) = 0, i.e. E[X|G] ≥ 0 almost surely.

5. (Jensen's inequality) If φ : R → R is convex and E|φ(X)| < ∞, then

E[φ(X)|G] ≥ φ(E[X|G]).

Proof. Recall the usual Jensen inequality: E[φ(X)] ≥ φ(E[X]). The proof of the conditional version follows exactly the same lines (see the proof of Theorem 23.9 in Jacod and Protter [9]).

6. (Tower property) If H is a sub-σ-algebra of G, then

E[E[X|G]|H] = E[X|H], a.s.

Proof. If A ∈ H, then it is also true that A ∈ G, since H is a sub-σ-algebra of G. Hence

∫_A E[E[X|G]|H] dP = ∫_A E[X|G] dP = ∫_A X dP = ∫_A E[X|H] dP,

so by a.s. uniqueness of conditional expectations, there is a unique H-measurable random variable on the left and right hand sides of the above, so we must have E[E[X|G]|H] = E[X|H] a.s.

The intuition here is that G contains more information than H. If we estimate X based on the information in G, and then estimate the estimator based on the smaller amount of information in H, then we get the same result as if we had estimated X directly based on the information in H.

7. (Taking out what is known) If Z is G-measurable, then E[ZX|G] = Z E[X|G].


Proof. Note that Z E[X|G] is G-measurable (since the product of G-measurable functions is G-measurable), so it satisfies the first property of a conditional expectation. So we check the partial averaging property. For A ∈ G we have

∫_A Z E[X|G] dP = E[IA Z E[X|G]] = E[IA ZX] = ∫_A ZX dP = ∫_A E[ZX|G] dP

(the second equality obtained using (4.3) with V = IA Z), so the partial averaging property holds.

8. (Role of independence) If X is independent of H (i.e. if σ(X) and H are independent σ-algebras), then

E[X|H] = E[X]. (4.4)

Proof. Observe first that E[X] is H-measurable, since it is not random. So we only need to check the partial averaging property; we require that

∫_A E[X] dP = ∫_A X dP,  A ∈ H.

If X is an indicator of some set B, which by assumption must be independent of H, then the partial averaging equation we must check is

∫_A P(B) dP = ∫_A IB dP.

The LHS is P(A)P(B), and the RHS is

∫_Ω IA IB dP = ∫_Ω I_{A∩B} dP = P(A ∩ B),

and so the partial averaging property holds because the sets A and B are independent. The partial averaging property for general X independent of H then follows by the standard machine.

The intuition behind (4.4) is that if X is independent of H, then the best estimate of X based on the information in H is E[X], the same as the best estimate of X based on no information.

Remark 4.12. There are also analogues of integral convergence theorems such as Fatou's Lemma, and the Monotone and Dominated Convergence Theorems, for conditional expectations as opposed to ordinary expectations.

Example 4.13. Consider the 3-period binomial model of stock price movements, with sample space given by Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. The stock price, initially S0, goes up or down depending on the coin toss. If St(ω) = St(ω1 . . . ωt) is the stock price after t tosses, it evolves as

St+1(ω) = St u, if ωt+1 = H;  St d, if ωt+1 = T,

where u > 1 > d > 0 are constants. Let Ft represent the σ-algebra determined by the first t tosses. The σ-algebra determined by the first toss is F1 = {∅, Ω, AH, AT}, where AH (respectively AT) is the event corresponding to a H (respectively a T) on the first toss.

Using the partial averaging property on the sets AH and AT, we can show (the obvious fact) that E[S2|F1](ω) = (pu + qd)S1(ω), as follows: E[S2|F1] is constant on AH (and constant on AT) and must satisfy the partial averaging property on these sets:

∫_{AH} E[S2|F1] dP = ∫_{AH} S2 dP,  ∫_{AT} E[S2|F1] dP = ∫_{AT} S2 dP.

(Obviously the partial averaging property is true on ∅ (both integrals are zero) and it will be true on Ω if it is true on AH and AT, since AH ∪ AT = Ω.) On AH we have

∫_{AH} E[S2|F1] dP = P(AH) E[S2|F1](ω) = p E[S2|F1](ω),  ω ∈ AH

(since E[S2|F1] is constant over AH), whilst on the other hand

∫_{AH} S2 dP = p²u²S0 + pq ud S0.

Hence

E[S2|F1](ω) = pu²S0 + q ud S0 = (pu + qd)uS0 = (pu + qd)S1(ω),  ω ∈ AH.

Similarly, we can show that E[S2|F1](ω) = (pu + qd)S1(ω) for ω ∈ AT. So overall we get

E[S2|F1](ω) = (pu + qd)S1(ω),  ω ∈ Ω.

With F0 = {∅, Ω}, we can show similarly that E[S1|F0] = (pu + qd)S0. F0 contains no information, so any F0-measurable random variable must be constant (nonrandom). Therefore E[S1|F0] is that constant which satisfies the averaging property

∫_Ω E[S1|F0] dP = ∫_Ω S1 dP = E[S1] = (pu + qd)S0,

and so we have E[S1|F0] = (pu + qd)S0.

In a T-period binomial model, we have

E[St+1|Ft] = (pu + qd)St,  t = 0, 1, . . . , T − 1.

To show this, define, for any t = 0, 1, . . . , T − 1, the random variable

X := St+1/St.

Then X = u if ωt+1 = H and X = d if ωt+1 = T, and X is independent of Ft because each coin toss is independent. Hence

E[St+1|Ft] = E[XSt|Ft] = St E[X|Ft] = St E[X] = (pu + qd)St.

Note that the stock price is a Markov process.
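The identity E[St+1|Ft] = (pu + qd)St can also be verified by enumerating all coin-toss paths, computing the conditional expectation as a partial average over each Ft-atom. A Python sketch (the parameters S0 = 4, u = 2, d = 1/2, p = 2/3 are illustrative, not from the notes):

```python
from itertools import product
from fractions import Fraction

# Illustrative parameters (not from the notes): S0 = 4, u = 2, d = 1/2, p = 2/3
S0, u, d, p = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(2, 3)
q = 1 - p
T = 3
Omega = list(product("HT", repeat=T))

def P(w):                                      # physical probability of a path
    return p ** w.count("H") * q ** w.count("T")

def S(t, w):                                   # stock price after t tosses
    return S0 * u ** w[:t].count("H") * d ** w[:t].count("T")

def cond_exp_next(t, w):
    """E[S_{t+1}|F_t](w): average S_{t+1} over the F_t-atom containing w,
    i.e. over all paths agreeing with w on the first t tosses."""
    atom = [v for v in Omega if v[:t] == w[:t]]
    return sum(S(t + 1, v) * P(v) for v in atom) / sum(P(v) for v in atom)

# Check E[S_{t+1}|F_t] = (pu + qd) S_t pathwise, for every t and every path
ok = all(cond_exp_next(t, w) == (p * u + q * d) * S(t, w)
         for t in range(T) for w in Omega)
```

The exact rational arithmetic of `Fraction` makes the equality checks exact rather than approximate.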


4.2 Martingales

Definition 4.14 (Martingale). A stochastic process M = (Mt)t=0,...,T on a filtered probability space (Ω, F, F := (Ft)t=0,...,T, P) is a martingale with respect to the filtration F = (Ft)t=0,...,T if

1. each Mt is Ft-measurable (so the process (Mt) is adapted to the filtration (Ft));

2. for each t ∈ {0, 1, . . . , T}, E|Mt| < ∞;

3. E[Mt+1|Ft] = Mt, t = 0, 1, . . . , T − 1.

So martingales tend to go neither up nor down. A supermartingale tends to go down, i.e. the third condition above is replaced by E[Mt+1|Ft] ≤ Mt. A submartingale tends to go up, i.e. E[Mt+1|Ft] ≥ Mt.

A simple argument using the tower property and induction shows the following.

Lemma 4.15. E[Mt+u|Ft] = Mt, for arbitrary u ∈ {1, . . . , T − t}.

Proof. Consider E[Mt+2|Ft]. By the tower property,

E[Mt+2|Ft] = E[E[Mt+2|Ft+1]|Ft] = E[Mt+1|Ft] = Mt,

and continuing in this fashion we get E[Mt+u|Ft] = Mt, for u = 1, 2, . . . , T − t.

Let X be an integrable random variable (E|X| < ∞) on a filtered probability space (Ω, F, F := (Ft)t=0,...,T, P). Define

Mt := E[X|Ft],  t ∈ {0, 1, . . . , T}.

We then have:

Lemma 4.16. M := (Mt)t=0,...,T is a (P, F)-martingale.

Proof. We have

E[Mt+1|Ft] = E[E[X|Ft+1]|Ft] = E[X|Ft] = Mt  (by the tower property).

Definition 4.17 (Predictable process). On a filtered probability space (Ω, F, (Ft)t=0,...,T, P), a predictable process (φt)t=1,...,T is one such that, for each t ∈ {1, . . . , T}, φt is Ft−1-measurable.

Proposition 4.18. Let (Mt)t=0,...,T be a martingale on a filtered probability space (Ω, F, F := (Ft)t=0,...,T, P). Let (φt)t=1,...,T be a bounded predictable process. Then the process N := (Nt)t=0,...,T defined by

N0 := 0,  Nt := ∑_{s=1}^t φs(Ms − Ms−1),  t = 1, . . . , T,

is a (P, F)-martingale.


Remark 4.19. The process N is called a martingale transform or discrete-time stochastic integral, and is sometimes denoted Nt = ∫₀ᵗ φs dMs or Nt = (φ · M)t.

Proof. We have

E[Nt+1|Ft] = E[∑_{s=1}^{t+1} φs(Ms − Ms−1) | Ft]
= E[∑_{s=1}^t φs(Ms − Ms−1) + φt+1(Mt+1 − Mt) | Ft]
= E[Nt + φt+1(Mt+1 − Mt) | Ft]
= Nt + φt+1(E[Mt+1|Ft] − Mt)  (since Nt and φt+1 are Ft-measurable)
= Nt,

the last equality because E[Mt+1|Ft] − Mt = 0.
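Proposition 4.18 can be illustrated numerically: take M to be a symmetric random walk on three fair coin tosses (a martingale) and any predictable integrand φ, and check E[Nt+1|Ft] = Nt on every path. A sketch in Python; the doubling integrand below is an arbitrary illustrative choice, not from the notes.

```python
from itertools import product
from fractions import Fraction

# Three fair coin tosses, outcomes recorded as +1/-1 (illustrative setup)
T = 3
Omega = list(product([1, -1], repeat=T))

def M(t, w):                                   # symmetric random walk: a martingale
    return sum(w[:t])

def phi(t, w):                                 # a predictable integrand:
    return 2 ** abs(M(t - 1, w))               # depends only on the first t-1 tosses

def N(t, w):                                   # martingale transform (phi . M)_t
    return sum(phi(s, w) * (M(s, w) - M(s - 1, w)) for s in range(1, t + 1))

def cond_exp_N(t, w):                          # E[N_{t+1}|F_t](w), uniform measure
    atom = [v for v in Omega if v[:t] == w[:t]]
    return Fraction(sum(N(t + 1, v) for v in atom), len(atom))

# The transform of a martingale by a predictable process is a martingale
transform_is_martingale = all(cond_exp_N(t, w) == N(t, w)
                              for t in range(T) for w in Omega)
```

Under the uniform (fair-coin) measure each Ft-atom average is a plain average, which is why `cond_exp_N` divides by the atom size.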

Proposition 4.20. On a filtered probability space (Ω, F, F := (Ft)t=0,...,T, P), an adapted sequence of real random variables M = (Mt)t=0,...,T is a (P, F)-martingale if and only if for any predictable process φ = (φt)t=1,...,T, we have

E[∑_{s=1}^t φs(Ms − Ms−1)] = 0,  t = 1, . . . , T.

Proof. If (Mt)t=0,...,T is a martingale, define the process X := (Xt)t=0,...,T by X0 = 0 and, for t = 1, . . . , T, Xt := ∑_{s=1}^t φs(Ms − Ms−1), for any predictable process φ = (φt)t=1,...,T. Then, by Proposition 4.18, X is also a martingale, so E[Xt] = X0 = 0.

Conversely, if E[∑_{s=1}^t φs(Ms − Ms−1)] = 0 holds for any predictable φ, take u ∈ {0, 1, . . . , T − 1}, let A ∈ Fu be given, and define a predictable process φ by setting φu+1 = IA, φt = 0 for all other t ∈ {1, . . . , T}. Then

0 = E[∑_{s=1}^T φs(Ms − Ms−1)] = E[IA(Mu+1 − Mu)] = E[E[IA(Mu+1 − Mu)|Fu]] = E[IA(E[Mu+1|Fu] − Mu)].

Since this holds for all A ∈ Fu it follows that E[Mu+1|Fu] = Mu, so M is a martingale.

5 Equivalent measures*

Here is a deep theorem, which we do not prove.

Theorem 5.1 (Radon-Nikodym). Let P and Q be two probability measures on a space (Ω, F). Assume that for every A ∈ F satisfying P(A) = 0, we also have Q(A) = 0. Then we say Q is absolutely continuous with respect to P. Under this assumption, there is a nonnegative random variable Z such that

Q(A) = ∫_A Z dP,  A ∈ F, (5.1)

and Z is called the Radon-Nikodym derivative of Q with respect to P.


The random variable Z is often written as

Z = dQ/dP.

Equation (5.1) implies the apparently stronger condition

EQ[X] = E[XZ],

for every random variable X for which E|XZ| < ∞. To see this, note that (5.1) in Theorem 5.1 is equivalent to EQ[IA] = E[IA Z], A ∈ F. This is then extended to general X via the standard machine argument.

If Q is absolutely continuous with respect to P and P is absolutely continuous with respect to Q, then we say that P and Q are equivalent. In other words, P and Q are equivalent if and only if

P(A) = 0 ⟺ Q(A) = 0,  A ∈ F.

If P and Q are equivalent and Z is the Radon-Nikodym derivative of Q w.r.t. P, then 1/Z is the Radon-Nikodym derivative of P w.r.t. Q, i.e.

EQ[X] = E[XZ],  for all X, (5.2)
E[Y] = EQ[Y/Z],  for all Y, (5.3)

and letting X and Y be related by Y = XZ we see that the above two equations are the same.

Example 5.2 (Radon-Nikodym theorem in 2-period coin toss space). Let Ω = {HH, HT, TH, TT}, the set of coin toss sequences of length 2. Let P correspond to probability 1/3 for H and 2/3 for T, and let Q correspond to probability 1/2 for H and 1/2 for T. Then the Radon-Nikodym derivative of Q w.r.t. P is easily seen to be

Z(ω) = Q(ω)/P(ω),

so that

Z(HH) = 9/4,  Z(HT) = 9/8,  Z(TH) = 9/8,  Z(TT) = 9/16.

5.1 Radon-Nikodym martingales*

Let Ω be a finite set (such as the set of all sequences of T coin tosses). Let F = (Ft)t=0,...,T be a filtration. Let P be a probability measure and let Q be a measure absolutely continuous with respect to P, written as Q ≪ P. Assume

P(ω) > 0,  Q(ω) > 0,  ω ∈ Ω,

so that P and Q are equivalent. The Radon-Nikodym derivative of Q with respect to P is

Z(ω) = Q(ω)/P(ω).

Define the P-martingale

Zt := E[Z|Ft],  t = 0, 1, . . . , T.

We can check that (Zt) is indeed a martingale:

E[Zt+1|Ft] = E[E[Z|Ft+1]|Ft] = E[Z|Ft] = Zt.

Lemma 5.3. For t ∈ {0, 1, . . . , T}, if X is Ft-measurable, then EQ[X] = E[XZt].

Proof.

EQ[X] = E[XZ] = E[E[XZ|Ft]] = E[X E[Z|Ft]] = E[XZt].

Note that Lemma 5.3 implies that if X is Ft-measurable, then for any A ∈ Ft, EQ[IA X] = E[IA XZt], or equivalently,

∫_A X dQ = ∫_A XZt dP,  A ∈ Ft.

Lemma 5.4. If X is Ft-measurable and 0 ≤ s ≤ t, then

EQ[X|Fs] = (1/Zs) E[XZt|Fs].

Proof. Note first that (1/Zs) E[XZt|Fs] is Fs-measurable. So for any A ∈ Fs, we have

∫_A (1/Zs) E[XZt|Fs] dQ = ∫_A E[XZt|Fs] dP  (Lemma 5.3)
= ∫_A XZt dP  (partial averaging)
= ∫_A X dQ  (Lemma 5.3)
= ∫_A EQ[X|Fs] dQ  (partial averaging).

Example 5.5 (Radon-Nikodym theorem in 2-period coin toss space, continued). We show in Figure 8 the values of the martingale Zt. Note that we always have Z0 = 1, since

Z0 = E[Z] = ∫_Ω Z dP = Q(Ω) = 1.

    Z0 = 1
    Z1(H) = 3/2,  Z1(T) = 3/4
    Z2(HH) = 9/4,  Z2(HT) = Z2(TH) = 9/8,  Z2(TT) = 9/16

Figure 8: The values of the Radon-Nikodym martingale Zt in the 2-period binomial model example
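The values in Figure 8 can be reproduced by computing Zt = E[Z|Ft] directly as a partial average under P. A Python sketch of Example 5.5:

```python
from itertools import product
from fractions import Fraction

# Example 5.5 in code: P gives H probability 1/3, Q gives H probability 1/2
pH, qH = Fraction(1, 3), Fraction(1, 2)
Omega = list(product("HT", repeat=2))

def P(w):
    return (pH if w[0] == "H" else 1 - pH) * (pH if w[1] == "H" else 1 - pH)

def Q(w):
    return (qH if w[0] == "H" else 1 - qH) * (qH if w[1] == "H" else 1 - qH)

def Z(w):                                      # Radon-Nikodym derivative dQ/dP
    return Q(w) / P(w)

def Zt(t, w):                                  # Z_t = E[Z|F_t] under P
    atom = [v for v in Omega if v[:t] == w[:t]]
    return sum(Z(v) * P(v) for v in atom) / sum(P(v) for v in atom)

vals = {
    "Z0": Zt(0, ("H", "H")),
    "Z1(H)": Zt(1, ("H", "H")),
    "Z1(T)": Zt(1, ("T", "H")),
    "Z2(HH)": Zt(2, ("H", "H")),
    "Z2(HT)": Zt(2, ("H", "T")),
    "Z2(TT)": Zt(2, ("T", "T")),
}
```

On an F1-atom the partial average collapses to Q(atom)/P(atom), which is why Z1(H) = (1/2)/(1/3) = 3/2.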

5.2 Conditional expectation and Radon-Nikodym theorem*

In this section we give another application of the Radon-Nikodym theorem. This material is for background only. Let (Ω, F, Q) be a probability space. Let G be a sub-σ-algebra of F, and let X be a nonnegative random variable with

∫_Ω X dQ = 1. (5.4)

We can construct the conditional expectation (under Q) of X given G. Recall that this is the (unique) G-measurable random variable EQ[X|G] that satisfies the partial averaging property

∫_A EQ[X|G] dQ = ∫_A X dQ,  A ∈ G.

On G we can define two probability measures P and P̃ by

P(A) = Q(A),  A ∈ G,

and

P̃(A) = ∫_A X dQ,  A ∈ G.

Notice that P̃ is indeed a probability measure since it satisfies P̃(Ω) = 1, by (5.4). Now, whenever Y is a G-measurable random variable, we have

∫_Ω Y dP = ∫_Ω Y dQ, (5.5)

since if Y = IA for some A ∈ G then (5.5) is just the definition of P, and the full result follows from the standard machine. Also, if A ∈ G and P(A) = 0, then Q(A) = 0, so that P̃(A) = 0. In other words, the measure P̃ is absolutely continuous with respect to the measure P. The Radon-Nikodym Theorem then implies that there exists a G-measurable random variable Z such that

P̃(A) = ∫_A Z dP,  A ∈ G,

that is,

∫_A X dQ = ∫_A Z dP,  A ∈ G,

or, by (5.5),

∫_A X dQ = ∫_A Z dQ,  A ∈ G,

i.e. Z has the partial averaging property, and since it is G-measurable, it is the conditional expectation (under Q) of X given G. In other words, the existence of conditional expectations is a consequence of the Radon-Nikodym Theorem.

6 Arbitrage pricing in the binomial model

Take the T-period binomial stock price process S of Example 3.3 on the filtered probability space (Ω, F, F := (Ft)t∈T, P), generated by T coin tosses, with time index set T = {0, 1, 2, . . . , T}. The sample space Ω is finite, and the probability measure P is called the physical measure (or objective measure, or the market measure). We assume FT = F and F0 = {∅, Ω}. There is a riskless asset with price process S⁰, assumed to evolve according to

S⁰t+1 = (1 + r)S⁰t,  t = 0, 1, . . . , T − 1,

where r ≥ 0 is the constant interest rate. We shall also assume S⁰0 = 1, so that with constant interest rate r, we have

S⁰t = (1 + r)^t,  t = 0, 1, . . . , T. (6.1)

Hence, S⁰t represents the value at time t of one unit of currency (say $1) invested at time zero.

The sample space Ω is the set of all outcomes of T coin tosses, so each ω ∈ Ω is of the form ω = (ω1 ω2 . . . ωT), with ωt ∈ {H, T} for each t ∈ {1, . . . , T}. The evolution of the stock price process S = (St)t=0,...,T is given by

St+1 = St u, if ωt+1 = H;  St d, if ωt+1 = T,  t = 0, 1, . . . , T − 1,

where u > 1 + r > d > 0.

We introduce a single agent with initial wealth X0 at time zero. At each time period, the investor chooses his investment portfolio. The agent's trading strategy is the two-dimensional stochastic process

(φ⁰t, φt),  t ∈ {1, . . . , T},
where, for t ∈ {1, . . . , T}, φ⁰t denotes the number of units of the riskless asset held over the interval [t − 1, t), and φt denotes the number of units of the stock held over the interval [t − 1, t). Hence, the positions in the portfolio at time t, for t ∈ {1, . . . , T}, are decided at time t − 1 and kept until time t, when new asset price quotations are available.

Assumption 6.1. The portfolio process (φ⁰t, φt), t ∈ {1, . . . , T}, is predictable, so that for each t ∈ {1, . . . , T}, φ⁰t and φt are Ft−1-measurable.

The vector of initial portfolio weights is (φ⁰1, φ1). Denote the agent's portfolio wealth process by X = (Xt)t=0,...,T. Then the initial wealth is given by

X0 = φ⁰1 + φ1 S0.  (budget constraint) (6.2)

Equation (6.2) is a budget constraint: the agent splits all his initial wealth between cash and the risky stock. The time-1 wealth is

X1 = (1 + r)φ⁰1 + φ1 S1, (6.3)

where we have assumed that no wealth has been taken out of the portfolio for (say) consumption and no outside income has been injected into the portfolio. Equation (6.3) is thus one form of a self-financing condition on the portfolio wealth evolution. Using the budget constraint (6.2) we re-cast (6.3) into the form

X1 = (1 + r)X0 + φ1(S1 − (1 + r)S0). (6.4)

At time 1, that is, at the beginning of the period [1, 2), in response to the new risky asset price S1, the investor adjusts his asset holdings to (φ⁰2, φ2) so that X1 is also given by

X1 = φ⁰2 S⁰1 + φ2 S1 = (1 + r)φ⁰2 + φ2 S1,

where, since the portfolio is self-financing, there is no withdrawal or injection of funds, and all wealth changes arise only from changes in asset prices and from re-adjustments in the portfolio holdings. We can therefore write the self-financing condition at time 1 as

φ⁰2 S⁰1 + φ2 S1 = φ⁰1 S⁰1 + φ1 S1.

Similar self-financing portfolio rebalancing occurs at each time t ∈ {1, . . . , T − 1}. Define the wealth process X = (Xt)t∈T, where Xt denotes the wealth at time t (that is, at the end of the interval [t − 1, t) and the beginning of the interval [t, t + 1)), for t = 0, 1, . . . , T − 1, with XT the final wealth at the end of the interval [T − 1, T]. We then have the following evolution.


At the beginning of the interval [t − 1, t), and just after portfolio rebalancing has taken place, the wealth is Xt−1, given by

Xt−1 = φ⁰t S⁰t−1 + φt St−1 = φ⁰t (1 + r)^{t−1} + φt St−1, (6.5)

where the last equality follows from the expression (6.1) for the value of the riskless asset at any time in T. The position (φ⁰t, φt) is held over [t − 1, t), and the wealth Xt achieved at the end of this interval (and hence at the start of the interval [t, t + 1)) is

Xt = φ⁰t S⁰t + φt St = φ⁰t (1 + r)^t + φt St. (6.6)

At this time, t, the portfolio is rebalanced to (φ⁰t+1, φt+1) so that Xt is also given by

Xt = φ⁰t+1 S⁰t + φt+1 St.

Hence the general self-financing condition is

φ⁰t+1 S⁰t + φt+1 St = φ⁰t S⁰t + φt St,  t = 1, . . . , T − 1.

We can raise this to a definition.

Definition 6.2. A trading strategy (φ⁰t, φt)t=1,...,T is self-financing if for every t = 1, . . . , T − 1, we have

φ⁰t+1 S⁰t + φt+1 St = φ⁰t S⁰t + φt St.

Using (6.5) to eliminate φ⁰t from (6.6) we can write the portfolio wealth evolution as

Xt = (1 + r)Xt−1 + φt(St − (1 + r)St−1),  t = 1, . . . , T. (6.7)

and the discounted wealth process is similarly dened by Xt = Xt Xt 0 = (1 + r)t , St t = 0, 1, . . . , T.

Then, in terms of discounted quantities, the wealth evolution equation (6.7) becomes Xt = Xt1 + t (St St1 ), t = 1, . . . , T. (6.8)

Iterating this evolution from time zero to t T we obtain


t

Xt = X0 +
s=1

s (Ss Ss1 ),

t = 1 . . . , T.

(6.9)

From this we see that the wealth process is completely specied by the initial wealth X0 and the choice of stock portfolio . When we need to emphasise the dependence of wealth on the chosen portfolio we write X() X. We dene the sum in (6.9) as the (discrete-time) stochastic integral of with respect to S, denoted by ( S):
t

( S)t :=
s=1

s (Ss Ss1 ),

t = 1, . . . , T.

38

6.1

Equivalent martingale measures and no arbitrage*

Denition 6.3 (Equivalent martingale measure). An equivalent martingale measure (EMM), also called a risk-neutral measure, is a probability measure Q P such that the discounted stock price S is a Q-martingale. Lemma 6.4. If a martingale measure exists, then the discounted wealth process of a self-nancing portfolio process is a Q-martingale. Proof. If a martingale measure Q exists, we have EQ [St |Ft1 ] = St1 . Then, from (6.8) we obtain EQ [Xt |Ft1 ] = Xt1 , t = 1, . . . , T, so that the discounted wealth process is also a Q-martingale. We can also obtain this result from the fact that the discounted wealth process is a nite sum of stochastic integrals. When S is a martingale, the stochastic integral is called a martingale transform, and is itself a martingale (see Proposition 4.18). For any self-nancing strategy we have the discounted wealth process given by (6.9), and combining this with Proposition 4.18, the discounted wealth process X is then a Q-martingale. 6.1.1 Fundamental theorems of asset pricing*

Recall the denition of arbitrage. Denition 6.5 (Arbitrage). An arbitrage is a strategy satisfying X0 = 0, P[XT () 0] = 1 and P[XT () > 0] > 0. It is possible to give conditions under which such opportunities do not exist, and these are usually called Fundamental Theorems of Asset Pricing (FTAPs). We do not prove them here, but shall see them in action in the binomial model. Theorem 6.6 (First Fundamental Theorem of Asset Pricing (FTAP I)). A nite sample space, discrete-time nancial market is arbitrage-free if and only if there exists an equivalent martingale measure. Remark 6.7. The proof of Theorem 6.6 is easy in one direction: suppose there exists an equivalent martingale measure Q. Then for any self-nancing strategy we have, from Lemma 6.4, that the discounted wealth process is a Q-martingale, so EQ [XT ] = X0 . This immediately precludes the possibility of arbitrage. For suppose X is such that X0 = 0 and XT 0 P-a.s., so that XT 0 Q-a.s. (since P and Q are equivalent). But since EQ [XT ] = X0 = 0, it must be the case that XT = 0, Q-a.s. This implies that XT = 0 P-a.s., since P and Q are equivalent, so there is no arbitrage. Assume the market is arbitrage free, so there exists at least one martingale measure Q. Denition 6.8. A European contingent claim with expiration time T is a non-negative FT measurable random variable Y , which is called the payo of the claim. Denition 6.9. A European contingent claim Y is said to be attainable (or hedgeable or replicable) if there exists a constant X0 and a portfolio process = (t )T such that the self-nancing t=1 wealth process (Xt )T satises t=0 XT () = Y (), .

In this case, for t = 0, . . . , T , we call Vt := Xt the no-arbitrage price at time t of Y , and the portfolio which attains XT = Y is called the replicating portfolio for the claim. Denition 6.10 (Complete market). A nancial market is said to be complete if every contingent claim is attainable. Otherwise, the market is said to be incomplete. 39

Theorem 6.11 (Second Fundamental Theorem of Asset Pricing (FTAP II)). A finite-state discrete-time arbitrage-free market is complete if and only if there is a unique equivalent martingale measure.

Here are some examples of European claims.

A European call option, with payoff Y = (ST − K)⁺ for fixed strike K ≥ 0.

A European put option, with payoff Y = (K − ST)⁺ for fixed strike K ≥ 0.

A fixed strike lookback call option, with payoff Y = (MT − K)⁺ for fixed strike K ≥ 0, where MT is the maximum of the stock price over {0, 1, . . . , T}, that is MT = max_{t≤T} St.

A floating strike lookback call option, with payoff Y = (ST − mT)⁺, where mT is the minimum of the stock price over {0, 1, . . . , T}, that is mT = min_{t≤T} St.

An arithmetic average fixed strike Asian call option, with payoff Y = (AT − K)⁺ for fixed strike K ≥ 0, where AT is the arithmetic average of the stock price over {0, 1, . . . , T}, that is AT = (1/(T + 1)) ∑_{t=0}^T St.

In a complete market, all contingent claims are attainable. So, given a contingent claim Y, there is a unique trading strategy with wealth process X = (Xt)t∈T such that XT = Y almost surely. This immediately implies that, to avoid arbitrage, the price of the claim at any time t ∈ T must be Vt := Xt, as in Definition 6.9.

Denote the discount factor from time t ∈ T to time zero by Dt. So, with constant interest rate r, Dt = (1 + r)^{−t} = 1/S⁰t.

Lemma 6.12. The no-arbitrage price of an attainable claim Y is given by

Vt = (1/Dt) EQ[DT Y |Ft],  t ∈ T. (6.10)

Any other price for the claim will lead to an arbitrage opportunity.

Proof. Let X = (Xt)t∈T be the wealth process of the replicating strategy. The discounted wealth process X̄ = DX is a Q-martingale, so satisfies

EQ[DT XT|Ft] = Dt Xt,  t ∈ T.

Using XT = Y and the definition Vt := Xt yields (6.10).

To show that there is arbitrage if (6.10) is violated, consider buying or selling the claim at time zero (and a similar argument would hold at any time t ∈ T). First, suppose V0 > EQ[DT Y]. Sell the claim for V0 and use the proceeds to invest in the replicating portfolio, which requires an initial investment of X0 = EQ[DT Y]. The wealth in this portfolio at time T is given by XT = Y, by assumption. Therefore, one can, at time zero, invest V0 − X0 > 0 in the bank, use the proceeds from the replicating portfolio to settle one's obligations from the claim, and make a riskless profit of (V0 − X0)(1 + r)^T > 0. This is an arbitrage. Similarly, if V0 < EQ[DT Y], one buys the claim and sells the replicating portfolio, leading to a riskless profit of (X0 − V0)(1 + r)^T > 0.
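Formula (6.10) can be checked against the one-step backward recursion that it implies. A Python sketch pricing a European call in a 3-period tree; the parameters S0 = 4, u = 2, d = 1/2, r = 1/4, K = 4 are illustrative, not from the notes.

```python
from itertools import product
from fractions import Fraction

# Illustrative parameters: S0 = 4, u = 2, d = 1/2, r = 1/4, call strike K = 4
S0, u, d, r, K = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(4)
T = 3
pQ = (1 + r - d) / (u - d)                     # risk-neutral probabilities
qQ = (u - (1 + r)) / (u - d)
Omega = list(product("HT", repeat=T))

def Q(w):
    return pQ ** w.count("H") * qQ ** w.count("T")

def S(t, w):
    return S0 * u ** w[:t].count("H") * d ** w[:t].count("T")

def payoff(w):                                 # claim Y = (S_T - K)^+
    return max(S(T, w) - K, Fraction(0))

def V(t, w):                                   # V_t = (1/D_t) E^Q[D_T Y | F_t]
    atom = [v for v in Omega if v[:t] == w[:t]]
    cond = sum(payoff(v) * Q(v) for v in atom) / sum(Q(v) for v in atom)
    return cond / (1 + r) ** (T - t)

def V_rec(t, w):                               # one-step backward recursion
    if t == T:
        return payoff(w)
    return (pQ * V_rec(t + 1, w[:t] + ("H",) + w[t + 1:])
            + qQ * V_rec(t + 1, w[:t] + ("T",) + w[t + 1:])) / (1 + r)

prices_agree = all(V(t, w) == V_rec(t, w) for t in range(T + 1) for w in Omega)
```

With these parameters pQ = qQ = 1/2 and the time-zero price is V0 = 64/25.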

6.2 Pricing by replication in the binomial model

In this section we shall show that in the binomial model it is possible to replicate any European claim, so the model is complete. If we were to rely on the second FTAP we could deduce this immediately, since we can define a unique EMM Q via

Q(ωt = H) = pQ := (1 + r − d)/(u − d),  Q(ωt = T) = qQ := (u − (1 + r))/(u − d),  t ∈ T.
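That this Q makes the discounted stock price a martingale reduces to a one-step computation, since EQ[St+1|Ft] = (pQ u + qQ d)St. A minimal Python check (the values u = 2, d = 1/2, r = 1/4 are illustrative, not from the notes):

```python
from fractions import Fraction

# Illustrative parameters: u = 2, d = 1/2, r = 1/4 (any u > 1 + r > d > 0 works)
u, d, r = Fraction(2), Fraction(1, 2), Fraction(1, 4)
pQ = (1 + r - d) / (u - d)                     # risk-neutral up probability
qQ = (u - (1 + r)) / (u - d)                   # risk-neutral down probability

# One-step mean growth of the stock under Q; equals 1 + r, so the
# discounted price S_t / (1+r)^t is a Q-martingale
one_step_mean = pQ * u + qQ * d
```

The condition u > 1 + r > d is exactly what makes pQ and qQ lie strictly between 0 and 1, so that Q is equivalent to P.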

We will explicitly construct a replicating strategy and see that this measure emerges naturally.

Recall the T-period binomial model. The stock price, initially S0, goes up or down depending on the outcome of a coin toss. The coin is tossed a total of T times. Denote the elements of Ω by ω = (ω1 . . . ωT), where ωt (t = 1, . . . , T) is a H or a T, and represents the outcome of the tth toss. Then Ω is the set of all sequences of H and T with T components. Let F be the σ-algebra of all subsets of Ω. Under the physical measure P, the probability of H on each coin toss is p ∈ (0, 1), and the probability of T is q := 1 − p. For each ω = (ω1 . . . ωT) ∈ Ω we define

P(ω) = p^{number of H in ω} q^{number of T in ω}.

Then for each A ∈ F we define

P(A) = ∑_{ω∈A} P(ω).

If St(ω) = St(ω1 . . . ωt) is the stock price after t tosses, then its evolution is given by

St+1(ω) = St u, if ωt+1 = H;  St d, if ωt+1 = T,

where u > 1 + r > 1 > d > 0 are constants, and r is the one-period interest rate. We also write

St+1(ω1 . . . ωt+1) = St(ω1 . . . ωt)u, if ωt+1 = H;  St(ω1 . . . ωt)d, if ωt+1 = T.

Let Ft represent the σ-algebra determined by the first t tosses. We take F = FT.

6.2.1 Replication in a one-period binomial model

We already know how to value a claim, using (6.10) from Lemma 6.12. Here we derive an explicit replicating strategy in a one-period binomial model, and extend this to multi-period binomial models in the next section. Let T = 1, so the model has one period, with time set T = {0, 1}. Suppose an agent sells a claim on the stock at time zero that expires at time 1. There are just two points ω ∈ Ω, given by ω = H and ω = T. The claim pays off an amount Y at time 1, where Y is an F_1-measurable random variable. This measurability condition is important: it says that the value of the claim at its maturity date is determined by the coin toss, i.e. by the value of the stock price at time 1. This is why it does not make sense to use some stock unrelated to the derivative security in valuing it. Suppose a trader sells the claim at time zero for some price V_0. The trader attempts to manage the risk from this sale by building a hedging portfolio composed of a number Δ_1 of shares of the underlying stock and a number θ_1 of shares of the riskless asset (which has initial value S^0_0 = 1). We suppose that the proceeds from the sale of the claim, V_0, are all that the trader uses to construct the hedging portfolio. Therefore the initial wealth in the hedging portfolio is

X_0 = V_0 = θ_1 + Δ_1 S_0.  (6.11)

As the stock price evolves in time the hedging portfolio and option value will also evolve. The option payoff is the random variable Y(ω) (so for, say, a call option, Y(ω) = (S_1(ω) − K)^+, where K is the option's strike and S_1(ω) is the stock price after one coin toss). The trader's hedge portfolio wealth at time 1 is X_1(ω), given by

X_1(ω) = (1 + r)θ_1 + Δ_1 S_1(ω).

Eliminating θ_1 using (6.11), we write X_1 as

X_1(ω) = (1 + r)X_0 + Δ_1(S_1(ω) − (1 + r)S_0).

If the hedging portfolio is to successfully manage the risk from the option sale, its value must replicate the option payoff in each possible final state, so we require X_1(ω) = Y(ω) for ω = H and ω = T, yielding the equations

(1 + r)X_0 + Δ_1(S_1(H) − (1 + r)S_0) = Y(H),
(1 + r)X_0 + Δ_1(S_1(T) − (1 + r)S_0) = Y(T).

Solving these equations for Δ_1 gives

Δ_1 = (Y(H) − Y(T))/(S_1(H) − S_1(T)).

Then, the initial wealth is computed from either of the two replication equations above as

X_0 = (1/(1 + r))(p^Q Y(H) + q^Q Y(T)),

where we have used (6.2) and S_1(H) = uS_0, S_1(T) = dS_0. (The cash position θ_1 required can be obtained using (6.11).) If the trader holds the portfolio (θ_1, Δ_1), then he will be able to meet all his obligations from the claim. Therefore the current claim price is the initial wealth required to do this, V_0 = X_0, as given by (6.11). So we get the claim price at time zero as

V_0 = (1/(1 + r)) E^Q[Y].

The measure Q is the unique EMM for this one-period market, and is also known as the risk-neutral probability measure. It is clear that Q is equivalent to the physical measure P, and that Q is indeed a martingale measure, in that

E^Q[S_1/(1 + r)] = S_0.

It is also clear that the discounted wealth process, and hence the discounted claim price process, is a Q-martingale, just as we would expect from the FTAPs.
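As a numerical check, the one-period replication argument is easy to sketch in code. The parameters below (u = 2, d = 1/2, r = 1/4, S_0 = 4 and a call payoff of strike 5) are illustrative choices only, and the function name is ours:

```python
# One-period binomial replication: a minimal sketch with illustrative
# parameters (u, d, r, S0 and the call payoff are example choices).

def one_period_price(S0, u, d, r, payoff):
    """Return (V0, delta, theta): claim price, stock holding, bank holding."""
    S_H, S_T = u * S0, d * S0
    Y_H, Y_T = payoff(S_H), payoff(S_T)
    # Delta solves the two replication equations
    delta = (Y_H - Y_T) / (S_H - S_T)
    # Risk-neutral probabilities p^Q, q^Q from (6.2)
    pQ = (1 + r - d) / (u - d)
    qQ = (u - (1 + r)) / (u - d)
    V0 = (pQ * Y_H + qQ * Y_T) / (1 + r)
    theta = V0 - delta * S0          # cash position from X0 = theta + delta*S0
    return V0, delta, theta

V0, delta, theta = one_period_price(4.0, 2.0, 0.5, 0.25, lambda s: max(s - 5.0, 0.0))
# Replication check: (1+r)*theta + delta*S1 equals the payoff in both states
for S1 in (8.0, 2.0):
    X1 = 1.25 * theta + delta * S1
    assert abs(X1 - max(S1 - 5.0, 0.0)) < 1e-9
```

For these numbers p^Q = q^Q = 1/2, Δ_1 = 1/2 and V_0 = 1.2, and the final loop confirms that the hedging portfolio replicates the payoff in both states.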

6.3 Completeness of the multiperiod binomial model

The above analysis can be generalised to a multiperiod binomial model, and the next theorem rigorously demonstrates that a portfolio process to hedge any contingent claim in the binomial model exists, and derives an expression for Δ_t, t = 1, …, T. Define the unique EMM Q by setting the Q-probability of H on each coin toss to be p^Q, and the Q-probability of T to be q^Q := 1 − p^Q, given by (6.2).

Theorem 6.13. The T-period binomial model is complete. In particular, let Y be a European claim with maturity time T, and define

V_t(ω_1 … ω_t) := (1 + r)^t E^Q[(1 + r)^{−T} Y | F_t](ω_1 … ω_t),  t = 0, …, T,
Δ_t(ω_1 … ω_{t−1}) := (V_t(ω_1 … ω_{t−1}H) − V_t(ω_1 … ω_{t−1}T))/(S_t(ω_1 … ω_{t−1}H) − S_t(ω_1 … ω_{t−1}T)),  t = 1, …, T.

Then, starting with initial wealth X_0 := V_0 = E^Q[(1 + r)^{−T} Y], the self-financing wealth process corresponding to the portfolio process Δ_1, …, Δ_T is the process V_0, …, V_T.

Proof. Let V_0, …, V_T and Δ_1, …, Δ_T be defined as in the theorem. Observe that V_T = Y almost surely. Start with wealth X_0 = V_0 = E^Q[(1 + r)^{−T} Y] and consider the self-financing wealth process corresponding to Δ_1, …, Δ_T. This wealth satisfies the recursive formula

X_{t+1} = (1 + r)X_t + Δ_{t+1}(S_{t+1} − (1 + r)S_t),  t = 0, 1, …, T − 1.

We need to show that, with X_t, V_t, Δ_t defined as above, we have

X_t = V_t almost surely, for all t ∈ {0, …, T}.  (6.12)

We proceed by induction. For t = 0, (6.12) holds by definition of X_0. Now assume that (6.12) holds for some fixed value of t ∈ {0, …, T − 1}, i.e. for each fixed (ω_1 … ω_t) we have X_t(ω_1 … ω_t) = V_t(ω_1 … ω_t). Then we need to show that

X_{t+1}(ω_1 … ω_t H) = V_{t+1}(ω_1 … ω_t H),
X_{t+1}(ω_1 … ω_t T) = V_{t+1}(ω_1 … ω_t T).

We shall prove the first equality, and note that the second can be proved similarly (an exercise). Note first that ((1 + r)^{−t} V_t)_{t=0}^T is a martingale under Q, since

E^Q[(1 + r)^{−(t+1)} V_{t+1} | F_t] = E^Q[ E^Q[(1 + r)^{−T} Y | F_{t+1}] | F_t ]  (by definition of V_{t+1})
= E^Q[(1 + r)^{−T} Y | F_t]  (by the tower property)
= (1 + r)^{−t} V_t.

So in particular,

V_t(ω_1 … ω_t) = E^Q[(1 + r)^{−1} V_{t+1} | F_t](ω_1 … ω_t)
= (1/(1 + r))(p^Q V_{t+1}(ω_1 … ω_t H) + q^Q V_{t+1}(ω_1 … ω_t T)).

Since (ω_1 … ω_t) will be fixed for the rest of the proof, we simplify notation by suppressing these symbols. For example, the last equation is written as

V_t = (1/(1 + r))(p^Q V_{t+1}(H) + q^Q V_{t+1}(T)).  (6.13)

Now we compute

X_{t+1}(H) = (1 + r)X_t + Δ_{t+1}(S_{t+1}(H) − (1 + r)S_t)
= (1 + r)V_t + Δ_{t+1}(S_{t+1}(H) − (1 + r)S_t)  (since X_t = V_t)
= p^Q V_{t+1}(H) + q^Q V_{t+1}(T) + [(V_{t+1}(H) − V_{t+1}(T))/(S_{t+1}(H) − S_{t+1}(T))](S_{t+1}(H) − (1 + r)S_t)
= p^Q V_{t+1}(H) + q^Q V_{t+1}(T) + q^Q (V_{t+1}(H) − V_{t+1}(T))
= V_{t+1}(H),

where we have used S_{t+1}(H) = S_t u and S_{t+1}(T) = S_t d, so that (S_{t+1}(H) − (1 + r)S_t)/(S_{t+1}(H) − S_{t+1}(T)) = (u − (1 + r))/(u − d) = q^Q.

Example 6.14 (European call in 2-period model). Let u = 2, d = 1/u, r = 1/4, S_0 = 4, so that p^Q = q^Q = 1/2. Consider a European call with expiration time 2 and payoff function Y = (S_2 − 5)^+. The possible stock prices in this model are shown in Figure 9. We first note that there are four elements ω ∈ Ω = {HH, HT, TH, TT}, so in principle there are four possible final stock prices. But in fact, two of the outcomes lead to the same stock price. We say that the stock price is path-independent, since it only depends on the number of H and T in the sequence ω = (ω_1, ω_2) (where ω_t, t = 1, 2, is either H or T), and does not depend on the order in which the H and T occur. Thus, for example, S_2(HT) = S_2(TH) = 4. The terminal option payoffs for each ω are

Y(HH) = 11,  Y(HT) = Y(TH) = Y(TT) = 0,

[Figure 9: Two-period binomial lattice: S_0 = 4; S_1(H) = 8, S_1(T) = 2; S_2(HH) = 16, S_2(HT) = S_2(TH) = 4, S_2(TT) = 1.]

and these are of course the option values at time 2:

V_2(HH) = 11,  V_2(HT) = V_2(TH) = V_2(TT) = 0.

Then using the binomial algorithm in Theorem 6.13 we work backwards in time, using the fact that the discounted value process is a Q-martingale, to obtain

V_1(H) = (1/(1 + r))(p^Q V_2(HH) + q^Q V_2(HT)) = (4/5)((1/2)(11) + (1/2)(0)) = 22/5,
V_1(T) = (1/(1 + r))(p^Q V_2(TH) + q^Q V_2(TT)) = (4/5)((1/2)(0) + (1/2)(0)) = 0,
V_0 = (1/(1 + r))(p^Q V_1(H) + q^Q V_1(T)) = (4/5)((1/2)(22/5) + (1/2)(0)) = 44/25 = 1.76.
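The backward induction of this example is easy to reproduce in code. The following sketch uses exact rational arithmetic (Python's fractions module) with the parameters of Example 6.14; variable names are ours:

```python
from fractions import Fraction as F

# Backward induction for the 2-period European call of Example 6.14:
# u = 2, d = 1/2, r = 1/4, S0 = 4, strike 5.
u, d, r, S0, K = F(2), F(1, 2), F(1, 4), F(4), F(5)
pQ = (1 + r - d) / (u - d)   # = 1/2
qQ = 1 - pQ

def S(path):                  # path is a string of 'H'/'T'
    return S0 * u**path.count('H') * d**path.count('T')

V2 = {w: max(S(w) - K, F(0)) for w in ('HH', 'HT', 'TH', 'TT')}
V1 = {w: (pQ * V2[w + 'H'] + qQ * V2[w + 'T']) / (1 + r) for w in ('H', 'T')}
V0 = (pQ * V1['H'] + qQ * V1['T']) / (1 + r)

print(V1['H'], V1['T'], V0)   # prints: 22/5 0 44/25
```

The values 22/5, 0 and 44/25 = 1.76 agree with the hand calculation above.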

7 American options

We briefly discuss the pricing of American derivative securities in the binomial model. American derivative securities can be exercised at any time prior to maturity.

Definition 7.1. In a discrete-time framework with time set T = {0, 1, …, T}, an American derivative security with maturity T is a sequence of nonnegative random variables (Y_t)_{t=0}^T such that for each t ∈ T, Y_t is F_t-measurable. The owner of an American derivative security can exercise at any time t ∈ T, and if he does, he receives the payment Y_t.

For example, an American put option of strike K on a stock price S = (S_t)_{t=0}^T can be exercised at any time t ∈ T to give the owner a payment Y_t := (K − S_t)^+, which is called the intrinsic value of the option at time t.

Recall the pricing of European securities. Consider a binomial model with T periods, so the time set is T = {0, 1, …, T}. Suppose Y_T is the payoff of a European derivative. For t ∈ T, we define by backward recursion

V_T := Y_T,  V_t := (1/(1 + r))[p^Q V_{t+1}(H) + q^Q V_{t+1}(T)],  t = 0, …, T − 1,  (7.1)

where, as before, the second equation is shorthand for

V_t(ω_1 … ω_t) = E^Q[(1 + r)^{−1} V_{t+1} | F_t](ω_1 … ω_t) = (1/(1 + r))(p^Q V_{t+1}(ω_1 … ω_t H) + q^Q V_{t+1}(ω_1 … ω_t T)).

Then V_t is the value of the option at time t ∈ T, and the hedging portfolio over [t − 1, t) is given by

Δ_t = (V_t(H) − V_t(T))/(S_t(H) − S_t(T)) = (V_t(H) − V_t(T))/(S_{t−1}(u − d)),  t = 1, …, T,

which is shorthand for

Δ_t(ω_1 … ω_{t−1}) = (V_t(ω_1 … ω_{t−1}H) − V_t(ω_1 … ω_{t−1}T))/(S_t(ω_1 … ω_{t−1}H) − S_t(ω_1 … ω_{t−1}T)),  t = 1, …, T.

Now suppose the option is American, with payoff Y = (Y_t)_{t=0}^T. At any time t ∈ T, the holder of the American derivative can exercise the option and receive the payment Y_t. Hence, the hedging portfolio should create a wealth process X which satisfies

X_t ≥ Y_t, for all t ∈ T, almost surely.

This is because the value of the American option at time t is at least as much as the so-called intrinsic value Y_t, and the value of the hedging portfolio at that time must equal the value of the option. This suggests that, to price an American derivative, we should replace the European algorithm (7.1) by the following American algorithm:

V_T = Y_T,  V_t = max( Y_t, (1/(1 + r))[p^Q V_{t+1}(H) + q^Q V_{t+1}(T)] ),  t = 0, …, T − 1,  (7.2)

which checks whether the intrinsic value is greater than the discounted risk-neutral expectation, which would signify that the option should be exercised in that state. Then V_t is the value of the American derivative at time t ∈ T.

Example 7.2 (American put in a 2-period model). Consider an American put option in a 2-period binomial model with u = 2, d = 1/u, r = 1/4, S_0 = 4, so that p^Q = q^Q = 1/2. Let the option have payoff function Y_t = (5 − S_t)^+. The possible stock prices in this model are shown in Figure 10. The terminal values of the option are given by V_2 = Y_2 = (5 − S_2)^+ and these are also shown in the figure.
[Figure 10: Stock price and terminal value of American put: S_0 = 4; S_1(H) = 8, S_1(T) = 2; S_2(HH) = 16, V_2(HH) = 0; S_2(HT) = S_2(TH) = 4, V_2(HT) = V_2(TH) = 1; S_2(TT) = 1, V_2(TT) = 4.]

Then the values of the option at time 1 are:

V_1(H) = max( (5 − 8)^+, (4/5)((1/2)(0) + (1/2)(1)) ) = max(0, 2/5) = 2/5,
V_1(T) = max( (5 − 2)^+, (4/5)((1/2)(1) + (1/2)(4)) ) = max(3, 2) = 3.

In particular, we notice that at time 1, in the state ω_1 = T, the option should be exercised, as the intrinsic value is greater than the discounted risk-neutral expectation of later values. The option value at time zero is

V_0 = max( (5 − 4)^+, (4/5)((1/2)(2/5) + (1/2)(3)) ) = max(1, 34/25) = 34/25 = 1.36.

Now let us attempt to construct the hedging portfolio for this option. We begin with initial wealth X_0 = 34/25, and we compute Δ_1 via the replication condition for ω_1 = H:

X_1(H) = (1 + r)X_0 + Δ_1(S_1(H) − (1 + r)S_0) = V_1(H) = 2/5,

which yields Δ_1 = −13/30. We could just as well calculate Δ_1 by looking at the wealth X_1(T), as follows:

X_1(T) = (1 + r)X_0 + Δ_1(S_1(T) − (1 + r)S_0) = V_1(T) = 3,

which also yields Δ_1 = −13/30. Now let us try to compute Δ_2 in a similar manner:

X_2(HH) = (1 + r)X_1(H) + Δ_2(H)(S_2(HH) − (1 + r)S_1(H)) = V_2(HH) = 0,

which yields Δ_2(H) = −1/12. The same result is obtained if one considers the wealth X_2(HT). Now let us try to compute Δ_2(T) as follows:

X_2(TH) = (1 + r)X_1(T) + Δ_2(T)(S_2(TH) − (1 + r)S_1(T)) = V_2(TH) = 1,

which yields Δ_2(T) = −11/6. However, if we try to compute Δ_2(T) using X_2(TT), we get

X_2(TT) = (1 + r)X_1(T) + Δ_2(T)(S_2(TT) − (1 + r)S_1(T)) = V_2(TT) = 4,

which yields Δ_2(T) = −1/6. In other words, we get different answers for Δ_2(T), the position in stock that should be chosen at the start of the interval [1, 2) when ω_1 = T! This apparent anomaly has arisen because X_1(T) = 3 (since the American put is exercised when ω_1 = T) rather than 2, which would be the case if the option were European (and you can check that in this case the above calculations would both have yielded Δ_2(T) = −1). This example shows that we need to analyse the hedging portfolio for an American option more closely.
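The American algorithm (7.2) for Example 7.2 can be verified with a short script. The sketch below uses exact rational arithmetic (Python's fractions module); the names are ours:

```python
from fractions import Fraction as F

# American put of Example 7.2 via the algorithm (7.2):
# u = 2, d = 1/2, r = 1/4, S0 = 4, intrinsic value (5 - S_t)^+.
u, d, r, S0, K = F(2), F(1, 2), F(1, 4), F(4), F(5)
pQ = qQ = F(1, 2)

def S(path):                  # path is a string of 'H'/'T'
    return S0 * u**path.count('H') * d**path.count('T')

Y = lambda path: max(K - S(path), F(0))     # intrinsic value
V2 = {w: Y(w) for w in ('HH', 'HT', 'TH', 'TT')}
V1 = {w: max(Y(w), (pQ * V2[w + 'H'] + qQ * V2[w + 'T']) / (1 + r))
      for w in ('H', 'T')}
V0 = max(Y(''), (pQ * V1['H'] + qQ * V1['T']) / (1 + r))

# Early exercise at time 1 in the state T: intrinsic value 3 beats the
# discounted expectation 2, and V0 = 34/25 = 1.36.
assert V1['T'] == F(3) and V0 == F(34, 25)
```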

7.1 Value of hedging portfolio for an American option*

Consider the following generalisation of the evolution of the wealth of a self-financing portfolio, equation (6.7):

X_t = (1 + r)(X_{t−1} − C_{t−1}) + Δ_t(S_t − (1 + r)S_{t−1}),  t = 1, …, T,  (7.3)

where, for t ∈ {0, 1, …, T − 1}, C_t represents the amount of wealth consumed at time t. In other words, we are allowing for some funds to be withdrawn from the self-financing portfolio. We found earlier that, for a self-financing portfolio, the discounted wealth process ((1 + r)^{−t} X_t)_{t=0}^T is a martingale. The consequence of allowing consumption from the portfolio is that the discounted wealth process will be a supermartingale (i.e. it will tend to go down). To appreciate why this adjustment might be needed, consider the American algorithm in (7.2). We see that the value of the option can be greater than that given by a discounted risk-neutral expectation, because of the possibility of early exercise. In other words, we might have

V_t > E^Q[ (1/(1 + r)) V_{t+1} | F_t ],

or, equivalently,

E^Q[(1 + r)^{−(t+1)} V_{t+1} | F_t] < (1 + r)^{−t} V_t,

so that the discounted option value is a supermartingale. (It turns out that the value process of an American option is the smallest supermartingale that dominates the payoff, though we do not prove this here.) To see how consumption enters the hedging portfolio, consider the situation in which

V_t > E^Q[ (1/(1 + r)) V_{t+1} | F_t ].  (7.4)

Then the holder of the American option should exercise (this is the case in the state ω_1 = T in Example 7.2), so that hedging should stop at this point (which is why we had difficulty isolating what the hedging portfolio should be in the example). If the holder of the option does not exercise, then the seller of the option may consume to close the gap between the left and right hand sides of (7.4). By doing this, he can ensure that X_t = V_t for all t ∈ T, where V_t is the value defined by the American algorithm. In Example 7.2, we had V_1(T) = 3, V_2(TH) = 1, V_2(TT) = 4, so that

E^Q[ (1/(1 + r)) V_2 | F_1 ](T) = (4/5)( (1/2)(1) + (1/2)(4) ) = 2,

and there is a gap of size 1 in (7.4). If the owner of the option does not exercise it at time 1 in the state ω_1 = T, then the seller can consume an amount 1 at time 1. Thereafter he uses the usual hedging portfolio

Δ_t = (V_t(H) − V_t(T))/((u − d)S_{t−1}).

In the example, we had V_1(T) = Y_1(T), which means that, acting optimally, the holder of the option should exercise. It turns out that it is optimal for the owner of the American option to exercise whenever its value V_t agrees with its intrinsic value Y_t.

7.2 Stopping times*

Definition 7.3 (Stopping time). Let (Ω, F, P) be a probability space and let F := (F_t)_{t=0}^T be a filtration. A stopping time with respect to F (or an F-stopping time) is a random variable τ: Ω → {0, 1, …, T} ∪ {∞} with the property that

{ω ∈ Ω : τ(ω) = t} ∈ F_t,  t = 0, 1, …, T, ∞,

with the convention that F_∞ := F = F_T.

Definition 7.4 (Information up to a stopping time). Let (Ω, F, P) be a probability space and let F := (F_t)_{t=0}^T be a filtration. Let τ be an F-stopping time. We say that a set A ⊆ Ω is determined by time τ provided that

A ∩ {ω ∈ Ω : τ(ω) = t} ∈ F_t,  t = 0, 1, …, T, ∞.

The collection of sets determined by τ is a σ-algebra, which we denote by F_τ.

Definition 7.5 (Value of a stochastic process at a stopping time). If (Ω, F, P) is a probability space, F := (F_t)_{t=0}^T is a filtration, (X_t)_{t=0}^T is an F-adapted stochastic process and τ is an F-stopping time, then X_τ is an F_τ-measurable random variable whose value at ω is given by X_τ(ω) := X_{τ(ω)}(ω).

Example 7.6 (Example 7.2 continued). Consider the American put option of Example 7.2. Define the stopping time

τ(ω) := min{t ∈ {0, 1, 2} : V_t = (5 − S_t)^+ and V_t ≠ 0}.

The stopping time τ corresponds to stopping the first time the value of the option agrees with its intrinsic value, provided this is non-zero, and is in fact an optimal exercise time. Let us write down the values of τ(ω) for different ω. For ω = HH we saw in Example 7.2 that the put expires worthless and is never exercised. In this case, we define τ(HH) = ∞. For ω = HT, though, the option is exercised at maturity, so that τ(HT) = 2. For ω ∈ A_T (the set of outcomes with ω_1 = T), the option is exercised at time 1 (i.e. in the state ω_1 = T), which implies that τ(ω) = 1 for ω ∈ A_T. Hence, in summary, we have

τ(ω) = ∞ if ω = HH;  τ(ω) = 2 if ω = HT;  τ(ω) = 1 if ω ∈ A_T.

Let us verify that τ is indeed a stopping time. Examine the sets {ω ∈ Ω : τ(ω) = t}, for t = 0, 1, 2, ∞, and check whether they lie in F_t. We have

{ω : τ(ω) = 0} = ∅ ∈ F_0,
{ω : τ(ω) = 1} = A_T ∈ F_1,
{ω : τ(ω) = 2} = {HT} ∈ F_2,
{ω : τ(ω) = ∞} = {HH} ∈ F_2 = F.

In each case, {ω : τ(ω) = t} ∈ F_t, so τ is a stopping time.

Let us show that the set {HT} is determined by the time τ, but that the set {TH} is not. For A = {HT} and A = {TH}, examine the sets A ∩ {ω : τ(ω) = t}, for t = 0, 1, 2, ∞, and check whether they lie in F_t, with F_∞ = F = F_2. We have

{HT} ∩ {ω : τ(ω) = 0} = ∅ ∈ F_0,
{HT} ∩ {ω : τ(ω) = 1} = {HT} ∩ A_T = ∅ ∈ F_1,
{HT} ∩ {ω : τ(ω) = 2} = {HT} ∩ {HT} = {HT} ∈ F_2,
{HT} ∩ {ω : τ(ω) = ∞} = {HT} ∩ {HH} = ∅ ∈ F,

so that {HT} is determined by τ. However, for {TH} we get

{TH} ∩ {ω : τ(ω) = 0} = ∅ ∈ F_0,
{TH} ∩ {ω : τ(ω) = 1} = {TH} ∩ A_T = {TH} ∉ F_1,

so that {TH} is not determined by τ. In a similar manner one can show that the set {HH} is determined by τ, as is A_T, but that {TT} is not, so that the atoms of the σ-algebra F_τ are {HH}, {HT} and A_T = {TH, TT} (note that these are the sets on which τ takes the values ∞, 2 and 1, which signifies that having the information in F_τ tells you the value of τ). This means that F_τ is composed of these sets, plus complements and unions, along with ∅ and Ω.
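The verification above can be mechanised: a set belongs to a finite σ-algebra exactly when it is a union of that σ-algebra's atoms. The sketch below encodes the two-period coin-toss filtration by its atoms; the helper name in_sigma_algebra is ours:

```python
# Check that tau from Example 7.6 is a stopping time: for each t, the event
# {tau = t} must lie in F_t, i.e. be a union of atoms of F_t.
INF = float('inf')
tau = {'HH': INF, 'HT': 2, 'TH': 1, 'TT': 1}

atoms = {
    0: [{'HH', 'HT', 'TH', 'TT'}],            # F_0: trivial
    1: [{'HH', 'HT'}, {'TH', 'TT'}],          # F_1: first toss known
    2: [{'HH'}, {'HT'}, {'TH'}, {'TT'}],      # F_2: everything known
}

def in_sigma_algebra(event, atom_list):
    """A set lies in the sigma-algebra iff each atom is inside it or disjoint."""
    return all(atom <= event or atom.isdisjoint(event) for atom in atom_list)

for t in (0, 1, 2):
    event = {w for w, s in tau.items() if s == t}
    assert in_sigma_algebra(event, atoms[t]), t
# {tau = infinity} = {HH} only needs to lie in F_2 = F:
assert in_sigma_algebra({w for w, s in tau.items() if s == INF}, atoms[2])
```

The same helper confirms the claims about F_τ: {HT} is determined by τ (its intersection with each {τ = t} lies in F_t), while {TH} is not, since {TH} ∩ {τ = 1} = {TH} is not a union of atoms of F_1.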

7.3 Properties of American derivative securities*

We list (without proof) properties of such an American contingent claim. See Shreve [13] for proofs in the binomial model.

Theorem 7.7 (Properties of American derivative securities). The properties of an American derivative security in a complete discrete-time financial market, with unique EMM Q and underlying filtration F = (F_t)_{0≤t≤T}, are as follows.

1. The value V_t of the security at time t ∈ T is

V_t = max_τ (1 + r)^t E^Q[(1 + r)^{−τ} Y_τ | F_t],  t ∈ T,

where the maximum is over all F-stopping times τ satisfying τ ≥ t almost surely.

2. The discounted value process ((1 + r)^{−t} V_t)_{t=0}^T is the smallest supermartingale which satisfies

V_t ≥ Y_t,  t ∈ T, almost surely.

3. Any stopping time τ* which satisfies

V_0 = E^Q[(1 + r)^{−τ*} Y_{τ*}]

is an optimal exercise time. In particular,

τ* := min{t ∈ T : V_t = Y_t}

is an optimal exercise time.

4. The hedging portfolio is given by

Δ_t(ω_1 … ω_{t−1}) = (V_t(ω_1 … ω_{t−1}H) − V_t(ω_1 … ω_{t−1}T))/(S_t(ω_1 … ω_{t−1}H) − S_t(ω_1 … ω_{t−1}T)),  t = 1, …, T.

5. Suppose for some t ∈ T and ω ∈ Ω, we have V_t(ω) = Y_t(ω). Then the owner of the derivative security should exercise it. If he does not, then the seller of the security can immediately consume

V_t(ω) − E^Q[ (1/(1 + r)) V_{t+1} | F_t ](ω)

and still maintain the hedge.

Proof. See Shreve [13], Chapter 4.

Part III

Continuous time

8 Brownian motion

8.1 Random walk*

Toss a coin infinitely many times, so that the sample space Ω is the set of all infinite sequences ω = (ω_1 ω_2 …) of H and T. One can construct a well-defined probability space (Ω, F, P) called the space of infinite coin tosses (though this is not completely trivial, as Ω is an uncountably infinite space), as well as a filtration (F_t)_{t≥0} on this space. We do not have time to delve into this here; for more details see Chapters 1 and 2 of Shreve [14]. Assume that each toss is independent, and that on each toss the probability of H is p, so that the probability of T is q := 1 − p. Define

Y_j(ω) := α if ω_j = H,  β if ω_j = T,  j = 1, 2, …,  (8.1)

where α and β are constants. The random variable Y_j, which always takes one of two values, is sometimes called a Bernoulli random variable. Define a process M = (M_k)_{k=0}^∞ by

M_0 := 0,  M_k := Σ_{j=1}^k Y_j,  k = 1, 2, ….  (8.2)

The process (M_k)_{k=0}^∞ is called a random walk. It is the sum of independent, identically distributed (i.i.d.) Bernoulli variables, and is sometimes called a binomial random variable.

Remark 8.1 (Symmetric random walk). For α = 1, β = −1, p = q = 1/2, the process (M_k)_{k=0}^∞ is a symmetric random walk, whose analogue in continuous time is Brownian motion, as we shall see.
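A quick simulation sketch of the random walk (8.2), with the Bernoulli steps of (8.1); parameter names mirror the text, while the sample sizes and tolerances below are arbitrary choices. For the symmetric walk, E[M_k] = 0 and Var(M_k) = k:

```python
import random

# Simulate the random walk (8.2) with Bernoulli steps Y_j = alpha (prob p)
# or beta (prob q = 1 - p), as in (8.1).
def random_walk(k, alpha=1.0, beta=-1.0, p=0.5, rng=random):
    m = 0.0
    for _ in range(k):
        m += alpha if rng.random() < p else beta
    return m

random.seed(0)
n_paths, k = 20000, 100
samples = [random_walk(k) for _ in range(n_paths)]
mean = sum(samples) / n_paths
var = sum(x * x for x in samples) / n_paths - mean**2
# For the symmetric walk, the mean should be near 0 and the variance near k.
assert abs(mean) < 0.5 and abs(var - k) < 10
```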

8.2 BM as scaled limit of symmetric random walk*

On the infinite coin toss space (Ω, F, P), define the random variables

X_j(ω) := 1 if ω_j = H,  −1 if ω_j = T,  j = 1, 2, …,

with P{ω_j = H} = P{ω_j = T} = 1/2, so that each X_j has mean zero and variance 1 (i.e. E[X_j | F_{j−1}] = 0 and E[X_j² | F_{j−1}] = 1). So X_1, X_2, … is a sequence of independent, identically distributed random variables. Then define the symmetric random walk M via

M_0 := 0,  M_k := Σ_{j=1}^k X_j,  k = 1, 2, ….

By the Law of Large Numbers we know that

(1/k) M_k → 0, almost surely, as k → ∞.

By the Central Limit Theorem we know that for large k, M_k/√k is approximately standard normal:

(1/√k) M_k → Z ~ N(0, 1), in distribution, as k → ∞.

Brownian motion arises if we suitably speed up the tossing of the coins and scale the size of each random walk increment. To this end, if t ≥ 0 is of the form t = k/n =: kΔt (so Δt = 1/n is the time between coin tosses) for positive integers k, n, then define a continuous time process via

W_t^{(n)} := (1/√n) M_{nt} = √t (M_k/√k) = √Δt M_{t/Δt},  t ≥ 0,

with linear interpolation used to define W_t^{(n)} for any times t ≥ 0 not of the form k/n. Take the limit k → ∞, with t fixed.⁶ Then since (1/√k) M_k → Z ~ N(0, 1) as k → ∞, we have that

W_t^{(n)} → W_t ~ N(0, t), as n → ∞,

and we call the process (W_t)_{t≥0} a standard Brownian motion. Notice that with t = kΔt, we have (though this is purely formal)

dW_t^{(n)}/dt = lim_{Δt→0} (W_{t+Δt}^{(n)} − W_t^{(n)})/Δt = lim_{Δt→0} (1/√Δt) X_{k+1}.

If, instead of W_t^{(n)}, we were to define

V_t^{(n)} := (1/n) M_{nt} = (t/k) M_k = Δt M_{t/Δt},

then by the Law of Large Numbers, V_t^{(n)} → 0 as n → ∞, and

dV_t^{(n)}/dt = lim_{Δt→0} (V_{t+Δt}^{(n)} − V_t^{(n)})/Δt = lim_{Δt→0} X_{k+1} = ±1,

so while the derivative of V^{(n)} is defined (unlike that of W^{(n)}), the process V^{(n)} is trivially zero in the limit. In other words, the Brownian particle can only have motion if it has infinite velocity. This is a manifestation of the fact that the paths of W are almost surely continuous but not differentiable, as we will see again in a short while.

⁶ Equivalently, n → ∞, or equivalently, Δt → 0, where Δt := 1/n is the time interval between coin tosses. Then k = nt = t/Δt, and hence t = kΔt, so we are speeding up the coin tossing, and since t/k = Δt, W_t^{(n)} = √Δt M_{t/Δt} = √Δt M_k, so that we are scaling each increment of the random walk by √Δt.

Remark 8.2 (Random walks and the binomial model). In the binomial model, the logarithm of the stock price process follows a random walk. A similar analysis as above can be used to show that the continuous time limit of a binomial model has stock price process given by

S_t = S_0 exp( (b − σ²/2)t + σW_t ),  t ≥ 0,  (8.3)

where b and σ > 0 are constants related to the binomial parameters u, d and to the probability p of the stock price rising in the binomial tree, by

log u = (b − σ²/2)Δt + σ√Δt,  log d = (b − σ²/2)Δt − σ√Δt,  p = 1/2.

The process (8.3) is known as geometric Brownian motion.
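Returning to the scaled walk of Section 8.2, a simulation sketch can check that the time-t marginal of W^(n) has mean near 0 and variance near t for large n, consistent with the N(0, t) limit (sample sizes and tolerances below are arbitrary choices):

```python
import math
import random

# Scaled symmetric random walk: W^(n)_t = M_{nt} / sqrt(n). Its time-t
# marginal should have mean ~0 and variance ~t for large n.
def scaled_walk(t, n, rng):
    steps = int(n * t)
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(steps)) / math.sqrt(n)

rng = random.Random(1)
t, n, n_paths = 2.0, 500, 20000
samples = [scaled_walk(t, n, rng) for _ in range(n_paths)]
mean = sum(samples) / n_paths
var = sum(x * x for x in samples) / n_paths - mean**2
assert abs(mean) < 0.1 and abs(var - t) < 0.2
```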

8.3 Brownian motion

We shall see that Brownian motion (BM) is a continuous stochastic process which is Markov, Gaussian, and a martingale. Let X ~ N(μ, σ²) denote that a random variable X is normally distributed with mean μ and variance σ².

Definition 8.3 (Brownian motion). A standard 1-dimensional Brownian motion (BM) is a continuous adapted process W := (W_t, F_t)_{0≤t<∞} on some filtered probability space (Ω, F, F := (F_t)_{t≥0}, P) with the properties that W_0 = 0 a.s. and, for 0 ≤ s < t, W_t − W_s is independent of F_s and normally distributed as W_t − W_s ~ N(0, t − s).

The filtration (F_t)_{t≥0} is a part of the definition of BM. However, if we are given (W_t)_{t≥0} but no filtration, and if we know that W has stationary, independent increments and that W_t = W_t − W_0 ~ N(0, t), then with (F_t)_{t≥0} being the filtration generated by the BM, W = (W_t, F_t)_{0≤t<∞} is a BM in the sense of Definition 8.3 (see [10], Problem 1.4).

Here is another definition of BM, based on a process called quadratic variation, one definition of which is given below. Let M_2 denote the space of right-continuous square-integrable martingales on a complete filtered probability space (Ω, F, F := (F_t)_{t≥0}, P): that is, for M := (M_t)_{t≥0} ∈ M_2 we have M_0 = 0 a.s. and E[M_t²] < ∞, for all t ≥ 0.

Definition 8.4 (Quadratic variation). For M ∈ M_2, the quadratic variation (QV) of M is the unique, increasing adapted process [M] such that [M]_0 = 0 a.s. and such that (M_t² − [M]_t)_{t≥0} is a martingale.

Definition 8.5 (Cross-variation). For X, Y ∈ M_2, define their cross-variation process ([X, Y]_t)_{t≥0} by

[X, Y]_t := (1/4)([X + Y]_t − [X − Y]_t).

For X, Y ∈ M_2^c (i.e. continuous), this is the unique adapted process [X, Y] of finite variation such that [X, Y]_0 = 0 a.s. and such that ((XY − [X, Y])_t)_{t≥0} is a martingale.

Remark 8.6. For Brownian motion W := (W_t)_{t≥0}, we have [W]_t = t, since W_t² − t is a martingale (see Problem Sheet 2). Indeed, Brownian motion may be defined as the unique continuous martingale that satisfies this property.

We denote by (F_t)_{t≥0} the filtration generated by Brownian motion. Its required properties are: for each t, W_t is F_t-measurable; for each t and for t < t_1 < t_2 < … < t_n, the Brownian motion increments W_{t_1} − W_t, W_{t_2} − W_{t_1}, …, W_{t_n} − W_{t_{n−1}} are independent of F_t.

Here is one way to construct F_t. First fix t. Let s ∈ [0, t] and A ∈ B(R) be given. Put the set {W_s ∈ A} = {ω ∈ Ω : W_s(ω) ∈ A} in F_t. Do this for all possible numbers s ∈ [0, t] and all Borel sets A ∈ B(R). Then put in every other set required by the σ-algebra properties. This σ-algebra F_t contains exactly the information learned by observing the Brownian motion up to time t, and (F_t)_{t≥0} is called the filtration generated by the Brownian motion.

8.4 Properties of BM

Stationarity. We say a stochastic process X = (X_t)_{t≥0} is stationary if X_t has the same distribution as X_{t+h} for any h > 0. Brownian motion has stationary increments. To see this, define the increment process I = (I_t)_{t≥0} by I_t := W_{t+h} − W_t. Then I_t ~ N(0, h), and I_{t+h} = W_{t+2h} − W_{t+h} ~ N(0, h) has the same distribution. This is equivalent to saying that the process (W_{t+h} − W_t)_{h≥0} has the same distribution for all t.

Martingale property. The independent increments property allows us to show that BM is a martingale. For 0 ≤ s ≤ t we have

E[W_t | F_s] = E[W_t − W_s + W_s | F_s] = E[W_t − W_s | F_s] + W_s = E[W_t − W_s] + W_s = W_s.

Covariance of BM at different times. Let 0 ≤ s ≤ t be given. Then W_s and W_t − W_s are independent, and (W_s, W_t) are jointly normal with E[W_s] = E[W_t] = E[W_t − W_s] = 0, var(W_s) = s, var(W_t) = t, var(W_t − W_s) = t − s, so that the covariance of W_s and W_t is

cov(W_s, W_t) := E[(W_s − E[W_s])(W_t − E[W_t])]
= E[W_s W_t]
= E[W_s(W_t − W_s + W_s)]
= E[W_s(W_t − W_s)] + E[W_s²]
= E[W_s]E[W_t − W_s] + s  (by independence)
= s.

Thus, for any s ≥ 0, t ≥ 0 (not necessarily s ≤ t), we have cov(W_s, W_t) = E[W_s W_t] = s ∧ t = min(s, t), or, equivalently, for s ≤ t the covariance matrix of the vector (W_s, W_t) is

C = ( s  s
      s  t )   (positive definite, symmetric).

Definition 8.7 (Transition density). Fix x ∈ R, t_0 ∈ R_+. Then

P(W_{t_0+t} ∈ [y, y + dy] | W_{t_0} = x) = p(t, x, y) dy,

where the transition density of Brownian motion is the function

p(t, x, y) = (1/√(2πt)) exp( −(x − y)²/(2t) ),  y ∈ R, t > 0.

This is the probability density that the BM moves from x to y ∈ R in a time period t.
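The identity cov(W_s, W_t) = min(s, t) can be checked by Monte Carlo, building W_t from W_s plus an independent N(0, t − s) increment (sample size and tolerance below are arbitrary choices):

```python
import random

# Monte Carlo sketch of cov(W_s, W_t) = min(s, t): since E[W_s] = E[W_t] = 0,
# the covariance is just E[W_s W_t].
def cov_estimate(s, t, n_paths, rng):
    acc = 0.0
    for _ in range(n_paths):
        Ws = rng.gauss(0.0, s ** 0.5)
        Wt = Ws + rng.gauss(0.0, (t - s) ** 0.5)   # independent increment
        acc += Ws * Wt
    return acc / n_paths

rng = random.Random(2)
s, t = 1.0, 3.0
c = cov_estimate(s, t, 200000, rng)
assert abs(c - min(s, t)) < 0.05   # theory: cov = s = 1
```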

Starting points other than zero. For a standard Brownian motion W that starts at zero we have a probability space (Ω, F, P) that satisfies P{W_0 = 0} = 1. Then for t ≥ 0, W_t ~ N(0, t). For x ∈ R, we can define a process W_t^x := x + W_t which will satisfy P{W_0^x = x} = 1 and, for t ≥ 0, W_t^x ~ N(x, t). Equivalently, we can define another probability measure P^x (or, more formally, a probability space (Ω, F, P^x)) under which P^x{W_0 = x} = 1, and with W having stationary independent increments under P^x: for s ≤ t, W_t − W_s ~ N(0, t − s) and independent of F_s. Then, under P^x, W_t ~ N(x, t). In this case, we say that W is a Brownian motion starting at x. We see that such a Brownian motion is equivalent to x + W, where W is a standard Brownian motion starting at zero. Note that:

If x ≠ 0, then P^x puts all its probability on a completely different set from P.

The distribution of W_t under P^x is the same as the distribution of W_t^x = x + W_t under P, that is, Law(W^x, P) = Law(W, P^x).

Markov property. We can show that W is a Markov process as follows. Recall that the Markov property is equivalent to stating that for s ≥ 0, t ≥ 0, we have E[h(W_{s+t}) | F_s] = g(W_s), where h and g are functions. Consider

E[h(W_{s+t}) | F_s] = E[h(W_{s+t} − W_s + W_s) | F_s].

Use the properties that W_{s+t} − W_s is independent of F_s, and that W_s is F_s-measurable, along with the following independence lemma: if X, Y are random variables on a probability space (Ω, F, P), and if G is a sub-σ-algebra of F, with X G-measurable and Y independent of G, then if f(x, y) is a function of two variables, and if we define g(x) := E[f(x, Y)], we have E[f(X, Y) | G] = g(X).

In this lemma, take G = F_s, X = W_s, Y = W_{s+t} − W_s, and f(x, y) = h(x + y). Then define

g(x) := E[h(W_{s+t} − W_s + x)] = E[h(x + W_t)]  (since W_t ~ N(0, t) has the same distribution as W_{s+t} − W_s)
= E^x[h(W_t)].

Then

E[h(W_{s+t}) | F_s] = g(W_s) = E^{W_s}[h(W_t)],

which is the Markov property. In fact, Brownian motion has the strong Markov property (though we do not prove this).

Strong Markov property. Fix x ∈ R and define τ := min{t ≥ 0 : W_t = x}. Then we have

E[h(W_{τ+t}) | F_τ] = g(x) = E^x[h(W_t)].

8.5 Quadratic variation of BM

Definition 8.8 (pth variation). Let P = {t_0, t_1, …, t_n} be a partition of [0, t], i.e. 0 = t_0 ≤ t_1 ≤ … ≤ t_n = t. The mesh of the partition is defined to be

||P|| := max_{k=0,…,n−1} |t_{k+1} − t_k|.

The pth variation of a function f: R_+ → R on an interval [0, t], [f, f]_t^{(p)}, is defined by

[f, f]_t^{(p)} := lim_{||P||→0} Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)|^p.  (8.4)

In particular, if p = 1 this is called the total variation (or the first variation) and if p = 2 this is called the quadratic variation.

8.5.1 First variation
Consider the first variation (or total variation), [f, f]_t^{(1)}, of a function f. Suppose f is differentiable. Then the Mean Value Theorem⁷ implies that in each subinterval [t_k, t_{k+1}] there is a point t_k* such that

f(t_{k+1}) − f(t_k) = (t_{k+1} − t_k) f′(t_k*).

Then

Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)| = Σ_{k=0}^{n−1} |f′(t_k*)| (t_{k+1} − t_k),

and so

[f, f]_t^{(1)} = lim_{||P||→0} Σ_{k=0}^{n−1} |f′(t_k*)| (t_{k+1} − t_k) = ∫_0^t |f′(s)| ds.

Thus, first variation measures the total amount of up and down motion of the path of f over the interval [0, t].

8.5.2 Quadratic variation of Brownian motion

To simplify notation, we write [f, f]_t^{(2)} = [f]_t for the quadratic variation of a function f over the interval [0, t].

Lemma 8.9. If f is differentiable, then [f]_t = 0.

Proof. By the Mean Value Theorem,

Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)|² = Σ_{k=0}^{n−1} |f′(t_k*)|² (t_{k+1} − t_k)² ≤ ||P|| Σ_{k=0}^{n−1} |f′(t_k*)|² (t_{k+1} − t_k),

and so

[f]_t = lim_{||P||→0} Σ_{k=0}^{n−1} |f′(t_k*)|² (t_{k+1} − t_k)²
≤ lim_{||P||→0} ||P|| · lim_{||P||→0} Σ_{k=0}^{n−1} |f′(t_k*)|² (t_{k+1} − t_k)
= lim_{||P||→0} ||P|| ∫_0^t |f′(s)|² ds
= 0.

⁷ The Mean Value Theorem states that if f is differentiable in (a, b), then there is a point x ∈ (a, b) at which f(b) − f(a) = (b − a)f′(x).

Theorem 8.10. For Brownian motion W = (W_t)_{t≥0} we have [W]_t = t, or more precisely,

P{ω ∈ Ω : [W]_t(ω) = t} = 1.

In particular, the paths of Brownian motion are not differentiable.

Some words of intuition. Since W_t ~ N(0, t), its moment generating function M(a) is given by

M(a) = E[exp(aW_t)] = exp( (1/2)a²t ).

Expanding the exponentials as Taylor series in powers of a yields

E[W_t] = 0,  E[W_t²] = t,  E[W_t³] = 0,  E[W_t⁴] = 3t².

Hence the variance of W_t² is

var[W_t²] = E[W_t⁴] − (E[W_t²])² = 3t² − t² = 2t².

The important observation is that, for small t, the variance of W_t² will be negligible compared to its expected value. Put another way, the randomness in W_t² is negligible compared to its mean, for small t. This suggests that if we take a fine enough partition P of [0, t], a finite set of points 0 = t_0 < t_1 < … < t_n = t with mesh ||P|| = max |t_{k+1} − t_k| small enough, then, writing D_k := W_{t_{k+1}} − W_{t_k} and Δt_k := t_{k+1} − t_k, we conjecture that Σ_{k=0}^{n−1} D_k² will closely resemble

Σ_{k=0}^{n−1} E[D_k²] = Σ_{k=0}^{n−1} Δt_k = t.

This can be made rigorous, as we show below, and the limit of Σ_{k=0}^{n−1} D_k² as the partition becomes finer is the quadratic variation of Brownian motion over the interval [0, t]. We shall first prove that the quadratic variation of Brownian motion over [0, t] is equal to t in mean square, and then we shall prove that the result holds almost surely. Recall that a sequence (X_n)_{n∈N} of random variables converges in mean square (or in L²(Ω, F, P)) to a random variable X if E[|X_n − X|²] → 0 as n → ∞, and converges to X almost surely if P{ω ∈ Ω : X_n(ω) → X(ω)} = 1.

Proof of Theorem 8.10 I: convergence in L2 . Let P = {t0 , t1 , . . . , tn } be a partition of [0, t]. Set Dk := Wtk+1 Wtk and dene the sample quadratic variation
n1

QP :=
k=0

2 Dk .

Then QP t =

n1 2 [Dk (tk+1 tk )]. k=0

55

We want to show that $Q_P - t \to 0$ in mean square as $\|P\| \to 0$. Consider an individual summand $D_k^2 - (t_{k+1} - t_k)$. This has expectation zero, so
\[
\mathbb{E}[Q_P - t] = \mathbb{E} \sum_{k=0}^{n-1} [D_k^2 - (t_{k+1} - t_k)] = 0.
\]
Therefore, if we compute $\mathbb{E}[(Q_P - t)^2] = \mathrm{var}(Q_P - t)$ and find it to approach zero as $\|P\| \to 0$, then we have shown that the quadratic variation of Brownian motion is equal to $t$ in mean square or, equivalently, that $\mathrm{var}(Q_P) \to 0$ as $\|P\| \to 0$, so that $Q_P$ essentially becomes non-stochastic as $\|P\| \to 0$. (In fact $[W]_t = t$ with probability one, or almost surely (a.s.), as we shall see.)

Now, for $j \ne k$, the terms $D_j^2 - (t_{j+1} - t_j)$ and $D_k^2 - (t_{k+1} - t_k)$ are independent (due to the independent increments property of BM), so
\begin{align*}
\mathrm{var}(Q_P - t) &= \sum_{k=0}^{n-1} \mathrm{var}[D_k^2 - (t_{k+1} - t_k)] \\
&= \sum_{k=0}^{n-1} \mathbb{E}[D_k^4 - 2(t_{k+1} - t_k)D_k^2 + (t_{k+1} - t_k)^2] \\
&= \sum_{k=0}^{n-1} [3(t_{k+1} - t_k)^2 - 2(t_{k+1} - t_k)^2 + (t_{k+1} - t_k)^2] \\
&= 2 \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \\
&\le 2\|P\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = 2\|P\| t.
\end{align*}
Thus we have
\[
\mathbb{E}[Q_P - t] = 0, \qquad \mathrm{var}(Q_P - t) \le 2\|P\| t.
\]
As $\|P\| \to 0$, $\mathrm{var}(Q_P - t) \to 0$, i.e. $\mathbb{E}[(Q_P - t)^2] \to 0$ (as $n \to \infty$), so $Q_P \to t$ in $L^2$.

Proof of Theorem 8.10, II: a.s. convergence. To show that the convergence is also almost sure, consider the dyadic partition $t_k = kt/2^m$, $k = 0, 1, \ldots, 2^m$, i.e. we partition $[0,t]$ into $2^m$ intervals of width $t/2^m$, so that the mesh of the partition approaches zero as $m \to \infty$. The sample quadratic variation over $[0,t]$ may then be written as
\[
Q_m(t) := \sum_{k=0}^{2^m - 1} \left( W_{(k+1)t/2^m} - W_{kt/2^m} \right)^2 =: \sum_{k=0}^{2^m - 1} (\Delta W_k)^2,
\]
where we have written $\Delta W_k = W_{(k+1)t/2^m} - W_{kt/2^m}$. We have $\Delta W_k \sim N(0, t/2^m)$, $\Delta W_k, \Delta W_j$ are independent for $k \ne j$, and hence $(\Delta W_k)^2, (\Delta W_j)^2$ are independent for $k \ne j$. Recall that for $X \sim N(0,v)$ we have $\mathbb{E}[X^4] = 3v^2$, so that
\[
\mathrm{var}[X^2] = \mathbb{E}[X^4] - (\mathbb{E}[X^2])^2 = 3v^2 - v^2 = 2v^2.
\]
Therefore, from $\mathbb{E}[(\Delta W_k)^2] = t/2^m$ we get
\[
\mathbb{E}[Q_m(t)] = t,
\]
regardless of $m$. Further, by the independence of the squared increments we have
\begin{align*}
\mathbb{E}[(Q_m(t) - t)^2] = \mathrm{var}(Q_m(t)) &= \mathrm{var}\left( \sum_{k=0}^{2^m - 1} (\Delta W_k)^2 \right) \\
&= \sum_{k=0}^{2^m - 1} \mathrm{var}[(\Delta W_k)^2] \\
&= 2^m \cdot 2\left( \frac{t}{2^m} \right)^2 = \frac{2t^2}{2^m} \to 0, \quad \text{as } m \to \infty.
\end{align*}
Therefore, since the limit of $Q_m(t)$ as $m \to \infty$ is $[W]_t$, we have established the mean square convergence
\[
[W]_t = \lim_{m \to \infty} Q_m(t) = t, \quad \text{in } L^2.
\]

Now we show almost sure convergence, using the Chebyshev inequality and the Borel-Cantelli lemmas (see, for instance, Grimmett and Stirzaker [6], Section 7.3).$^8$ By Chebyshev's inequality we have, for $a > 0$,
\[
\mathbb{P}\{|Q_m(t) - t| > a\} \le \frac{\mathbb{E}[(Q_m(t) - t)^2]}{a^2} = \frac{2t^2}{a^2 2^m}.
\]
So
\[
\mathbb{P}\{|Q_m(t) - t| > 1/m\} \le m^2\, \mathbb{E}[(Q_m(t) - t)^2] = \frac{2t^2 m^2}{2^m}.
\]
Write $A_m = \{|Q_m(t) - t| > 1/m\}$, and consider the sequence of events $(A_m)_{m=1}^\infty$. Then $\sum_{m=1}^\infty \mathbb{P}(A_m) < \infty$, so by the Borel-Cantelli lemmas, the event that infinitely many of the $A_m$ occur has probability given by
\[
\mathbb{P}\left( \limsup_{m \to \infty} A_m \right) = \mathbb{P}\left( \bigcap_{m=1}^\infty \bigcup_{k=m}^\infty A_k \right) = 0.
\]
In other words, $|Q_m(t) - t| \le 1/m$ for all sufficiently large $m$, almost surely, so
\[
[W]_t = \lim_{m \to \infty} Q_m(t) = t, \quad \text{almost surely.} \qquad \square
\]
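The dyadic construction in the proof is easy to visualise numerically. In the sketch below (an illustrative aside, not part of the notes; the horizon $t = 1$, the seed and the grid sizes are assumptions), one Brownian path is simulated on the finest dyadic grid and then coarsened, and the sample quadratic variation $Q_m(t)$ is computed for several $m$:

```python
import numpy as np

rng = np.random.default_rng(0)
t, m_max = 1.0, 16
n = 2 ** m_max
# Brownian path on the finest dyadic grid: cumulative sum of N(0, t/n) increments.
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(t / n), n))])

for m in (4, 8, 16):
    step = 2 ** (m_max - m)            # coarsen to 2^m subintervals
    Q_m = np.sum(np.diff(W[::step]) ** 2)
    print(m, Q_m)                       # Q_m approaches t = 1 as m grows
```

As $m$ increases, $Q_m(t)$ settles near $t = 1$; the fluctuation around $t$ has variance $2t^2/2^m$, matching the computation in the proof.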

$^8$Chebyshev's inequality follows from the following result, which is Theorem 7.3.1 in [6].

Theorem 8.11. Let $h : \mathbb{R} \to [0,\infty)$ be a non-negative function. Then
\[
\mathbb{P}(h(X) \ge a) \le \frac{\mathbb{E}[h(X)]}{a}, \quad a > 0.
\]

Proof. Let $A := \{h(X) \ge a\}$. Then $h(X) \ge a I_A$. Taking expectations gives the result. $\square$

Setting $h(x) = |x|$ gives Markov's inequality. Taking $h(x) = x^2$ gives Chebyshev's inequality: $\mathbb{P}(|X| \ge a) \le \mathbb{E}[X^2]/a^2$.

The Borel-Cantelli lemmas (Theorem 7.3.10 in [6]) state:

Theorem 8.12 (Borel-Cantelli lemmas). Let $A_1, A_2, \ldots$ be an infinite sequence of events from some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $A$ be the event that infinitely many of the $A_n$ occur (written $\{A_n \text{ i.o.}\}$), given by
\[
A := \{A_n \text{ i.o.}\} = \limsup_{n \to \infty} A_n = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k.
\]
Then:
1. $\mathbb{P}(A) = 0$ if $\sum_{n=1}^\infty \mathbb{P}(A_n) < \infty$;
2. $\mathbb{P}(A) = 1$ if $\sum_{n=1}^\infty \mathbb{P}(A_n) = \infty$ and $A_1, A_2, \ldots$ are independent events.

Path length* Given a continuous function $f : [0,t] \to \mathbb{R}$, its total variation over $[0,t]$ is, over any partition $P = \{0 = t_0 \le t_1 \le \ldots \le t_n = t\}$ of $[0,t]$,
\[
TV(f) \equiv [f,f]_t^{(1)} = \lim_{\|P\| \to 0} \sum_{i=0}^{n-1} |f(t_{i+1}) - f(t_i)|.
\]
This may be infinite, or some finite number, in which case we say that $f$ has bounded variation. Consider an element of arc length $\Delta s_i$ along $f$ in the interval $[t_i, t_{i+1}]$. If this interval is small, we have $(\Delta s_i)^2 \approx (\Delta t_i)^2 + (\Delta f_i)^2$, where we have written $\Delta t_i = t_{i+1} - t_i$ and $\Delta f_i = f(t_{i+1}) - f(t_i)$. By the triangle inequality we have
\[
|\Delta f_i| \le \Delta s_i \le |\Delta f_i| + \Delta t_i.
\]
Denoting the total arc length (or path length) of $f$ over $[0,t]$ by $s(f)$, we therefore have, in the limit $\|P\| \to 0$,
\[
TV(f) \le s(f) \le TV(f) + t.
\]
Therefore, finite path length $\iff TV(f) < \infty$.

In contrast, the quadratic variation of $f$ over $[0,t]$ satisfies
\[
[f]_t \equiv \lim_{\|P\| \to 0} \sum_{i=0}^{n-1} |\Delta f_i||\Delta f_i|
\le \lim_{\|P\| \to 0} \left( \max_{i=0,\ldots,n-1} |\Delta f_i| \right) \lim_{\|P\| \to 0} \sum_{i=0}^{n-1} |\Delta f_i|
\le \lim_{\|P\| \to 0} \left( \max_{i=0,\ldots,n-1} |\Delta f_i| \right) TV(f).
\]
For any continuous function, $\lim_{\|P\| \to 0} \max_{i=0,\ldots,n-1} |\Delta f_i| = 0$,$^9$ so we conclude that
\[
TV(f) < \infty \implies [f]_t = 0 \text{ for all } t \ge 0.
\]
In other words, since $[W]_t = t > 0$, paths of Brownian motion $(W_s)_{0 \le s \le t}$ over the interval $[0,t]$ have infinite path length. Because the total variation of Brownian motion is infinite (i.e. Brownian paths are very long), one is not able to give meaning to integrals with respect to Brownian motion, $\int_0^t b_s \, dW_s$, via a path-by-path procedure. Thus we are led to a new type of integral, the Itô stochastic integral, which we shall describe shortly.

Remark 8.13 (Heuristics). If we (formally) write $dW_t$ for the infinitesimal increase in $W_t$ (corresponding to the infinitesimal time interval $dt$), then we have $\int_0^t dW_s \, dW_s = t$, which is often summarised by the formula $dW_t \, dW_t = dt$. A better way to write this would be $d[W]_t = dt$. Formally, note that if $dW_t \, dW_t = dt$, then in some sense $dW_t = \sqrt{dt}$, so that $dW_t/dt = 1/\sqrt{dt} \to \infty$ as $dt \to 0$. In other words, Brownian motion is nowhere differentiable, as we saw earlier.

Lévy's characterisation of Brownian motion* BM $W$ is a martingale with continuous paths whose quadratic variation is $[W]_t = t$. In fact, this is a complete characterisation of BM, given in the following theorem (see Shreve [14], Section 4.6.3 for more details).

Theorem 8.14 (Lévy's theorem, 1-dimensional). Let $M$ be a martingale relative to a filtration, with $M_0 = 0$, continuous paths, and $[M]_t = t$ for all $t \ge 0$. Then $M$ is a BM.

$^9$This is a standard theorem from real analysis, proven from compactness arguments.

9 The Itô integral

Now we consider how to define an integral with respect to Brownian motion. The probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (with $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ the filtration generated by Brownian motion) is given, and always lurks in the background, even when not explicitly mentioned. Recall that Brownian motion $W_t(\omega) : [0,\infty) \times \Omega \to \mathbb{R}$ has the properties:

1. $W_0 = 0$ (technically, $\mathbb{P}\{\omega : W_0(\omega) = 0\} = 1$);
2. $W_t$ is a continuous function of $t$;
3. if $0 = t_0 \le t_1 \le \ldots \le t_n = t$, then the increments $W_{t_1} - W_{t_0}, \ldots, W_{t_n} - W_{t_{n-1}}$ are independent and normal, with
\[
\mathbb{E}[W_{t_{k+1}} - W_{t_k}] = 0, \qquad \mathbb{E}[(W_{t_{k+1}} - W_{t_k})^2] = t_{k+1} - t_k, \quad k = 0, 1, \ldots, n-1.
\]

9.1 Construction of the Itô integral*

We want to construct the Itô integral, which we write as
\[
I_t = \int_0^t b_s \, dW_s, \quad t \ge 0.
\]
The integrator is Brownian motion $(W_t)_{t \ge 0}$, with associated filtration $(\mathcal{F}_t)_{t \ge 0}$ and the following properties:

1. $s \le t$ implies that every set in $\mathcal{F}_s$ is also in $\mathcal{F}_t$;
2. $W_t$ is $\mathcal{F}_t$-measurable, for every $t \ge 0$;
3. for $t \le t_1 \le \ldots \le t_n$, the increments $W_{t_1} - W_t, W_{t_2} - W_{t_1}, \ldots, W_{t_n} - W_{t_{n-1}}$ are independent of $\mathcal{F}_t$.

The integrand is a process $b = (b_t)_{t \ge 0}$, where

1. $b_t$ is $\mathcal{F}_t$-measurable for every $t \ge 0$ (i.e. $(b_t)_{t \ge 0}$ is adapted to the filtration $(\mathcal{F}_t)_{t \ge 0}$);
2. $b$ is square-integrable:
\[
\mathbb{E}\left[ \int_0^t b_s^2 \, ds \right] < \infty, \quad t \ge 0.
\]

Remark 9.1. For a differentiable function $f(t)$, we can define
\[
\int_0^t b(s) \, df(s) = \int_0^t b(s) f'(s) \, ds.
\]
This won't work when the integrator is Brownian motion, because the paths of Brownian motion are not differentiable.

9.2 Itô integral of an elementary integrand*

Let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,t]$, i.e. $0 = t_0 \le t_1 \le \ldots \le t_n = t$. Assume that $b$ is constant on each interval $[t_k, t_{k+1})$, so that $b_s = b_{t_k}$ for $s \in [t_k, t_{k+1})$. We call such a process $b$ an elementary process, or a simple process. The Itô integral $I_t$ of such a process is defined (first for $t = t_n$) by
\[
I_t := \sum_{k=0}^{n-1} b_{t_k}(W_{t_{k+1}} - W_{t_k});
\]
the general case is given in Definition 9.3 below.

Remark 9.2 (Interpretation as gains from trading). We can interpret the processes $b$ and $W$ as follows. Think of $W_t$ as the price per share of an asset at time $t$. Think of $t_0, t_1, \ldots, t_n$ as the trading dates for the asset. Think of $b_{t_k}$ as the number of shares of the asset held in the interval $[t_k, t_{k+1})$, i.e. acquired at trading date $t_k$ and held until trading date $t_{k+1}$ (so the process $\phi$, defined such that $\phi_{t_{k+1}} = b_{t_k}$, is a predictable process). Then the Itô integral $I_t$ can be interpreted as the gain from trading at time $t$.

Definition 9.3 (Itô integral of an elementary process). If $t_k \le t \le t_{k+1}$, then the Itô integral $I_t = \int_0^t b_s \, dW_s$ of the elementary process $b$ is defined by
\[
I_t := \int_0^t b_s \, dW_s := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}).
\]

9.3 Properties of the Itô integral of an elementary process

Adaptedness For each $t \ge 0$, $I_t$ is $\mathcal{F}_t$-measurable.

Linearity With
\[
I_t = \int_0^t b_s \, dW_s, \qquad J_t = \int_0^t a_s \, dW_s,
\]
then for $\alpha, \beta \in \mathbb{R}$,
\[
\alpha I_t + \beta J_t = \int_0^t (\alpha b_s + \beta a_s) \, dW_s.
\]

Martingale property $(I_t)_{t \ge 0}$ is a martingale. Let us prove this for the case of an integrand which is an elementary process.

Theorem 9.4 (Martingale property). The process $I = (I_t)_{t \ge 0}$ defined by
\[
I_t := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \quad t_k \le t \le t_{k+1},
\]
is a $(\mathbb{P}, \mathbb{F})$-martingale.

Proof. Let $0 \le s \le t$ be given. We treat the more difficult case in which $s$ and $t$ are in different subintervals, i.e. there are partition points $t_\ell$ and $t_k$, with $\ell < k$, such that $s \in [t_\ell, t_{\ell+1}]$ and $t \in [t_k, t_{k+1}]$.

Write
\begin{align}
I_t &= \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}) \nonumber \\
&= \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_\ell}(W_{t_{\ell+1}} - W_{t_\ell}) + \sum_{j=\ell+1}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}). \tag{9.1}
\end{align}
We compute conditional expectations. Since each summand of the first sum is $\mathcal{F}_s$-measurable, we have
\[
\mathbb{E}\left[ \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) \,\Big|\, \mathcal{F}_s \right] = \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}),
\]
and
\[
\mathbb{E}\left[ b_{t_\ell}(W_{t_{\ell+1}} - W_{t_\ell}) \,|\, \mathcal{F}_s \right] = b_{t_\ell}\left( \mathbb{E}[W_{t_{\ell+1}} | \mathcal{F}_s] - W_{t_\ell} \right) = b_{t_\ell}(W_s - W_{t_\ell}).
\]
These are the conditional expectations of the first two terms on the RHS of (9.1). They add up to $I_s$, and so contribute this to $\mathbb{E}[I_t | \mathcal{F}_s]$. We show that the third and fourth terms contribute zero:
\begin{align*}
\mathbb{E}\left[ \sum_{j=\ell+1}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) \,\Big|\, \mathcal{F}_s \right]
&= \sum_{j=\ell+1}^{k-1} \mathbb{E}\left[ \mathbb{E}\left[ b_{t_j}(W_{t_{j+1}} - W_{t_j}) | \mathcal{F}_{t_j} \right] | \mathcal{F}_s \right] \\
&= \sum_{j=\ell+1}^{k-1} \mathbb{E}\left[ b_{t_j}\left( \mathbb{E}[W_{t_{j+1}} | \mathcal{F}_{t_j}] - W_{t_j} \right) | \mathcal{F}_s \right] = 0,
\end{align*}
and
\[
\mathbb{E}\left[ b_{t_k}(W_t - W_{t_k}) | \mathcal{F}_s \right] = \mathbb{E}\left[ b_{t_k}\left( \mathbb{E}[W_t | \mathcal{F}_{t_k}] - W_{t_k} \right) | \mathcal{F}_s \right] = 0.
\]
Hence $\mathbb{E}[I_t | \mathcal{F}_s] = I_s$, as required. $\square$

The Itô isometry Because $(I_t)_{t \ge 0}$ is a martingale and $I_0 = 0$, we have $\mathbb{E}[I_t] = 0$ for all $t \ge 0$. It follows that $\mathrm{var}(I_t) = \mathbb{E}[I_t^2]$, a quantity given by the formula in the next theorem.

Theorem 9.5 (Itô isometry). The Itô integral of the elementary process $b$, defined by
\[
I_t := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \tag{9.2}
\]
satisfies
\[
\mathbb{E}[I_t^2] = \mathbb{E}\left[ \int_0^t b_s^2 \, ds \right], \quad t \ge 0.
\]

Proof. To simplify notation, write $D_j = W_{t_{j+1}} - W_{t_j}$, $j = 0, \ldots, k-1$, and $D_k = W_t - W_{t_k}$, so that (9.2) is written as $I_t = \sum_{j=0}^k b_{t_j} D_j$. Then
\[
I_t^2 = \sum_{j=0}^k b_{t_j}^2 D_j^2 + 2 \sum_{0 \le i < j \le k} b_{t_i} b_{t_j} D_i D_j.
\]
First we show that the expected value of the cross terms is zero. For $i < j$, the random variable $b_{t_i} b_{t_j} D_i$ is $\mathcal{F}_{t_j}$-measurable, while the Brownian increment $D_j$ is independent of $\mathcal{F}_{t_j}$, so $\mathbb{E}[D_j | \mathcal{F}_{t_j}] = \mathbb{E}[D_j] = 0$. Therefore,
\[
\mathbb{E}[b_{t_i} b_{t_j} D_i D_j] = \mathbb{E}\left[ \mathbb{E}[b_{t_i} b_{t_j} D_i D_j | \mathcal{F}_{t_j}] \right] = \mathbb{E}\left[ b_{t_i} b_{t_j} D_i \, \mathbb{E}[D_j | \mathcal{F}_{t_j}] \right] = 0.
\]

Now consider the squared terms $b_{t_j}^2 D_j^2$. The random variable $b_{t_j}^2$ is $\mathcal{F}_{t_j}$-measurable, while the squared Brownian increment $D_j^2$ is independent of $\mathcal{F}_{t_j}$, so $\mathbb{E}[D_j^2 | \mathcal{F}_{t_j}] = \mathbb{E}[D_j^2] = t_{j+1} - t_j$ for $j = 0, \ldots, k-1$, and $\mathbb{E}[D_k^2 | \mathcal{F}_{t_k}] = \mathbb{E}[D_k^2] = t - t_k$. Therefore,
\begin{align*}
\mathbb{E}[I_t^2] &= \sum_{j=0}^k \mathbb{E}[b_{t_j}^2 D_j^2]
= \sum_{j=0}^k \mathbb{E}\left[ \mathbb{E}[b_{t_j}^2 D_j^2 | \mathcal{F}_{t_j}] \right]
= \sum_{j=0}^k \mathbb{E}\left[ b_{t_j}^2 \, \mathbb{E}[D_j^2 | \mathcal{F}_{t_j}] \right]
= \sum_{j=0}^k \mathbb{E}\left[ b_{t_j}^2 \, \mathbb{E}[D_j^2] \right] \\
&= \sum_{j=0}^{k-1} \mathbb{E}[b_{t_j}^2 (t_{j+1} - t_j)] + \mathbb{E}[b_{t_k}^2 (t - t_k)].
\end{align*}
But $b_{t_j}$ is constant on $[t_j, t_{j+1})$, so $b_{t_j}^2 (t_{j+1} - t_j) = \int_{t_j}^{t_{j+1}} b_s^2 \, ds$ and, similarly, $b_{t_k}^2 (t - t_k) = \int_{t_k}^t b_s^2 \, ds$, so
\[
\mathbb{E}[I_t^2] = \sum_{j=0}^{k-1} \mathbb{E}\left[ \int_{t_j}^{t_{j+1}} b_s^2 \, ds \right] + \mathbb{E}\left[ \int_{t_k}^t b_s^2 \, ds \right]
= \mathbb{E}\left[ \sum_{j=0}^{k-1} \int_{t_j}^{t_{j+1}} b_s^2 \, ds + \int_{t_k}^t b_s^2 \, ds \right]
= \mathbb{E}\left[ \int_0^t b_s^2 \, ds \right]. \qquad \square
\]

Quadratic variation of the integral We now consider the quadratic variation process $([I]_t)_{t \ge 0}$ of the integral process $I = (I_t)_{t \ge 0}$. Brownian motion $W_t = \int_0^t 1 \, dW_s$ has quadratic variation $[W]_t = \int_0^t 1^2 \, ds = t$. We say that Brownian motion accumulates quadratic variation at the rate of one per unit time. In the Itô integral $I_t = \int_0^t b_s \, dW_s$, BM is scaled in a time- and path-dependent way (i.e. depending on $(s,\omega) \in [0,t] \times \Omega$) by the integrand $b_s$. Because increments are squared in the computation of quadratic variation, the QV of BM will be scaled by $b_s^2$ as it enters the integral. The following theorem gives the precise statement.

Theorem 9.6 (Quadratic variation of the Itô integral). Let $b$ be a simple process. Then the Itô integral
\[
I_t = \int_0^t b_s \, dW_s, \quad t \ge 0,
\]
has quadratic variation process $([I]_t)_{t \ge 0}$ given by
\[
[I]_t = \int_0^t b_s^2 \, ds, \quad t \ge 0.
\]
We say that the Itô integral accumulates quadratic variation at rate $b_s^2$, $s \in [0,t]$, per unit time, and that the quadratic variation accumulated up to time $t$ by the integral is $[I]_t = \int_0^t b_s^2 \, ds$.

Proof. First compute the quadratic variation accumulated by the integral on one of the subintervals $[t_j, t_{j+1})$ on which $b_s = b_{t_j}$, $s \in [t_j, t_{j+1})$, is constant. Choose partition points $t_j = s_0 < s_1 < \ldots < s_m = t_{j+1}$, and consider
\[
\sum_{i=0}^{m-1} (I_{s_{i+1}} - I_{s_i})^2 = \sum_{i=0}^{m-1} [b_{t_j}(W_{s_{i+1}} - W_{s_i})]^2 = b_{t_j}^2 \sum_{i=0}^{m-1} (W_{s_{i+1}} - W_{s_i})^2. \tag{9.3}
\]
As $m \to \infty$ and the mesh of the partition, $\max_{i=0,\ldots,m-1}(s_{i+1} - s_i)$, approaches zero, the term $\sum_{i=0}^{m-1} (W_{s_{i+1}} - W_{s_i})^2$ converges to the QV accumulated by BM over $[t_j, t_{j+1}]$, which is $t_{j+1} - t_j$. Therefore the limit of the RHS of (9.3), which is the QV accumulated by the integral over $[t_j, t_{j+1}]$, is
\[
b_{t_j}^2 (t_{j+1} - t_j) = \int_{t_j}^{t_{j+1}} b_s^2 \, ds,
\]
where we have used the fact that $b_s$ is constant for $s \in [t_j, t_{j+1})$. Similarly, the QV accumulated by the integral over $[t_k, t]$ is $\int_{t_k}^t b_s^2 \, ds$. Adding up all these contributions proves the theorem. $\square$

Informally, we establish the theorem in differential form via
\[
dI_t = b_t \, dW_t \implies d[I]_t = dI_t \, dI_t = b_t^2 \, dW_t \, dW_t = b_t^2 \, d[W]_t = b_t^2 \, dt,
\]
just as we wrote $d[W]_t = dW_t \, dW_t = dt$ earlier. In fact, one can do a lot of the calculations in Itô calculus simply by applying the informal multiplication rules:
\[
dW_t \, dW_t = dt, \qquad dW_t \, dt = dt \, dt = 0.
\]
Note the contrast between Theorems 9.5 and 9.6. The QV $[I]_t$ is computed path-by-path, so the result can depend on the path, and so in principle is random. The variance of the integral is precisely the expectation of the QV, as given by the Itô isometry (i.e. it is an average of the QV over all possible paths), and so is non-random.

9.4 Itô integral of a general integrand*

Fix $t > 0$. Let $b$ be a process (not necessarily an elementary process) such that $b_s$ is $\mathcal{F}_s$-measurable for all $s \in [0,t]$, and $\mathbb{E}\left[ \int_0^t b_s^2 \, ds \right] < \infty$. We then have the following result.

Theorem 9.7. There is a sequence of elementary processes $(b^{(n)})_{n=1}^\infty$ such that
\[
\lim_{n \to \infty} \mathbb{E}\left[ \int_0^t |b_s^{(n)} - b_s|^2 \, ds \right] = 0.
\]

Proof. See [12], Section 3.1, or [10], Section 3.2 and Problem 3.2.5. $\square$

We have shown how to define
\[
I_t^{(n)} = \int_0^t b_s^{(n)} \, dW_s
\]
for every $n \in \mathbb{N}$. We now define the general Itô integral by
\[
\int_0^t b_s \, dW_s := \lim_{n \to \infty} \int_0^t b_s^{(n)} \, dW_s.
\]
The only difficulty with this approach is that we need to make sure the above limit exists. Suppose $m$ and $n$ are large positive integers. Then
\begin{align*}
\mathbb{E}[|I_t^{(n)} - I_t^{(m)}|^2] = \mathrm{var}(I_t^{(n)} - I_t^{(m)})
&= \mathbb{E}\left[ \left( \int_0^t (b_s^{(n)} - b_s^{(m)}) \, dW_s \right)^2 \right] \\
&= \mathbb{E}\left[ \int_0^t (b_s^{(n)} - b_s^{(m)})^2 \, ds \right] && \text{(Itô isometry)} \\
&\le \mathbb{E}\left[ \int_0^t (|b_s^{(n)} - b_s| + |b_s - b_s^{(m)}|)^2 \, ds \right] && \text{(triangle inequality)} \\
&\le 2\mathbb{E}\left[ \int_0^t |b_s^{(n)} - b_s|^2 \, ds \right] + 2\mathbb{E}\left[ \int_0^t |b_s - b_s^{(m)}|^2 \, ds \right] && ((a+b)^2 \le 2(a^2+b^2))
\end{align*}
which approaches zero as $m, n \to \infty$, by Theorem 9.7. This guarantees that the sequence $(I_t^{(n)})_{n=1}^\infty$ is a Cauchy sequence in $L^2(\Omega, \mathcal{F}, \mathbb{P})$ and so has a limit.

9.5 Properties of the general Itô integral

The general Itô integral is
\[
I_t = \int_0^t b_s \, dW_s,
\]
where $b$ is any adapted, square-integrable process. Its properties are inherited from the properties of Itô integrals of simple processes and are summarised below.

Adaptedness For each $t \ge 0$, $I_t$ is $\mathcal{F}_t$-measurable.

Linearity If
\[
I_t = \int_0^t b_s \, dW_s, \qquad J_t = \int_0^t a_s \, dW_s,
\]
then for $\alpha, \beta \in \mathbb{R}$,
\[
\alpha I_t + \beta J_t = \int_0^t (\alpha b_s + \beta a_s) \, dW_s.
\]

Martingale property $(I_t)_{t \ge 0}$ is a martingale. In fact, we have the converse result, known as the martingale representation theorem (which we do not prove; see [12], for example).

Theorem 9.8 (Itô representation theorem for Brownian motion). Let $(W_t)_{t \ge 0}$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$, with $(\mathcal{F}_t)_{t \ge 0}$ the natural filtration $\mathcal{F}_t = \sigma(W_s, 0 \le s \le t)$. Suppose that $X \in L^2(\Omega, \mathcal{F}_t, \mathbb{P})$. Then there exists an adapted process $b$ such that $\mathbb{E}\left[ \int_0^t b_s^2 \, ds \right] < \infty$, $t \ge 0$, and
\[
X = \mathbb{E}[X] + \int_0^t b_s \, dW_s.
\]

Theorem 9.9 (Martingale representation theorem for Brownian motion). Let $(W_t)_{t \ge 0}$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$, with $(\mathcal{F}_t)_{t \ge 0}$ the natural filtration $\mathcal{F}_t = \sigma(W_s, 0 \le s \le t)$. Suppose that the process $M = (M_t)_{t \ge 0}$ is a square-integrable martingale with respect to this filtration, i.e. $M_t \in L^2(\Omega, \mathcal{F}_t, \mathbb{P})$ for all $t \ge 0$. Then there exists an adapted process $b$ such that $\mathbb{E}\left[ \int_0^t b_s^2 \, ds \right] < \infty$, $t \ge 0$, and
\[
M_t = M_0 + \int_0^t b_s \, dW_s.
\]

The Itô isometry The variance of the Itô integral is $\mathrm{var}(I_t) = \mathbb{E}[I_t^2]$, given by
\[
\mathbb{E}[I_t^2] = \mathbb{E}\left[ \int_0^t b_s^2 \, ds \right].
\]

Continuity $I_t$ is a continuous function of the upper limit of integration $t$.

Quadratic variation The Itô integral
\[
I_t = \int_0^t b_s \, dW_s, \quad t \ge 0,
\]
has quadratic variation process $([I]_t)_{t \ge 0}$ given by
\[
[I]_t = \int_0^t b_s^2 \, ds.
\]

Example 9.10. Consider the Itô integral
\[
I_t = \int_0^t W_s \, dW_s.
\]
We approximate the integrand by an elementary process $b_s^{(n)}$, $s \in [0,t]$, in the following way. Partition the interval $[0,t]$ into $n$ time intervals of length $\Delta t = t/n$, so that
\[
0 = t_0 < t_1 = \frac{t}{n} < \ldots < t_k = \frac{kt}{n} < \ldots < t_n = t,
\]
and define $b_s^{(n)}$ by
\[
b_s^{(n)} = W_{t_k} = W_{kt/n}, \quad \text{if } \frac{kt}{n} \le s < \frac{(k+1)t}{n}, \quad k = 0, \ldots, n-1.
\]
Then by definition
\[
I_t = \int_0^t W_s \, dW_s = \lim_{n \to \infty} \sum_{k=0}^{n-1} W_{kt/n} \left( W_{(k+1)t/n} - W_{kt/n} \right).
\]
To simplify notation, write $W_k \equiv W_{kt/n}$, so that
\[
\int_0^t W_s \, dW_s = \lim_{n \to \infty} \sum_{k=0}^{n-1} W_k (W_{k+1} - W_k).
\]
Then we note that
\[
W_{k+1}^2 - W_k^2 = (W_{k+1} - W_k)^2 + 2W_k W_{k+1} - 2W_k^2 = (W_{k+1} - W_k)^2 + 2W_k (W_{k+1} - W_k),
\]
so that
\[
\sum_{k=0}^{n-1} W_k (W_{k+1} - W_k) = \frac{1}{2} \sum_{k=0}^{n-1} (W_{k+1}^2 - W_k^2) - \frac{1}{2} \sum_{k=0}^{n-1} (W_{k+1} - W_k)^2 = \frac{1}{2} W_n^2 - \frac{1}{2} \sum_{k=0}^{n-1} (W_{k+1} - W_k)^2,
\]
using $W_0 = 0$. Now we let $n \to \infty$ and use the definition of quadratic variation to get
\[
\int_0^t W_s \, dW_s = \frac{1}{2} (W_t^2 - [W]_t) = \frac{1}{2} (W_t^2 - t).
\]

Remark 9.11 (Reason for the $\frac{1}{2}t$ term). If $f$ is a differentiable function with $f(0) = 0$, then
\[
\int_0^t f(s) \, df(s) = \int_0^t f(s) f'(s) \, ds = \frac{1}{2} f^2(s) \Big|_0^t = \frac{1}{2} f^2(t).
\]
In contrast, for Brownian motion, we have
\[
\int_0^t W_s \, dW_s = \frac{1}{2} (W_t^2 - t).
\]
The extra term $-\frac{1}{2}t$ comes from the nonzero quadratic variation of Brownian motion. It has to be there, because $\mathbb{E}\left[ \int_0^t W_s \, dW_s \right] = 0$ (the Itô integral is a martingale), but $\mathbb{E}\left[ \frac{1}{2} W_t^2 \right] = \frac{1}{2} t$. Note that this remark is equivalent to our initial characterisation of Brownian motion in Remark 8.6.

10 The Itô formula

10.1 Itô's formula for one Brownian motion

We want a rule to differentiate expressions of the form $f(W_t)$. If $W_t$ were differentiable, then the ordinary chain rule would give
\[
\frac{d}{dt} f(W_t) = f'(W_t) W_t',
\]
which could be written in differential notation as
\[
df(W_t) = f'(W_t) W_t' \, dt = f'(W_t) \, dW_t.
\]
However, $W_t$ is not differentiable, and in particular has nonzero quadratic variation, so the correct formula has an extra term, namely
\[
df(W_t) = f'(W_t) \, dW_t + \frac{1}{2} f''(W_t) \, d[W]_t,
\]
with the understanding that $d[W]_t = dt$. This is a version of Itô's formula in differential form. Integrating it, we obtain a version of Itô's formula in integral form.

Theorem 10.1 (Itô formula for one BM). If $f(x)$ is a $C^2(\mathbb{R})$ function and $t \ge 0$, then
\[
f(W_t) - f(W_0) = \int_0^t f'(W_s) \, dW_s + \frac{1}{2} \int_0^t f''(W_s) \, d[W]_s. \tag{10.1}
\]

Remark 10.2 (Differential versus integral forms). The mathematically meaningful form of Itô's formula is its integral form, because we have solid definitions for the integrals appearing on the RHS of (10.1). For pencil-and-paper computations, the more convenient form is the differential form.

Proof of Theorem 10.1. Fix $t > 0$ and let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,t]$. By Taylor's theorem we have
\begin{align*}
f(W_t) - f(W_0) &= \sum_{k=0}^{n-1} [f(W_{t_{k+1}}) - f(W_{t_k})] \\
&= \sum_{k=0}^{n-1} \left[ f'(W_{t_k})(W_{t_{k+1}} - W_{t_k}) + \frac{1}{2} f''(W_{t_k})(W_{t_{k+1}} - W_{t_k})^2 + \cdots \right] \\
&\longrightarrow \int_0^t f'(W_s) \, dW_s + \frac{1}{2} \int_0^t f''(W_s) \, d[W]_s, \quad \text{as } \|P\| \to 0,
\end{align*}
with the higher-order terms disappearing and the second summation converging to the integral against quadratic variation, i.e. for the Itô integral
\[
I_t = \int_0^t b_s \, dW_s = \lim_{\|P\| \to 0} \sum_{k=0}^{n-1} b_{t_k}(W_{t_{k+1}} - W_{t_k}),
\]
we have
\[
\int_0^t b_s^2 \, ds = [I]_t = \lim_{\|P\| \to 0} \sum_{k=0}^{n-1} (I_{t_{k+1}} - I_{t_k})^2 = \lim_{\|P\| \to 0} \sum_{k=0}^{n-1} b_{t_k}^2 (W_{t_{k+1}} - W_{t_k})^2. \qquad \square
\]

A heuristic derivation would simply state that, by Taylor's theorem,
\[
df(W_t) = f'(W_t) \, dW_t + \frac{1}{2} f''(W_t) \, dt,
\]
where we have used $dW_t \, dW_t = dt$ in the last term on the RHS, and higher-order terms are neglected.

Definition 10.3 (Geometric Brownian motion). Geometric Brownian motion is the process $S = (S_t)_{t \ge 0}$ given by
\[
S_t = S_0 \exp\left( \sigma W_t + \left( \mu - \frac{1}{2} \sigma^2 \right) t \right),
\]
where $\mu$ and $\sigma > 0$ are constant, and the parameter $\sigma$ is called the volatility of the process $S$.

Define
\[
f(t,x) = S_0 \exp\left( \sigma x + \left( \mu - \frac{1}{2} \sigma^2 \right) t \right),
\]
so that $S_t = f(t, W_t)$ and
\[
f_t(t,x) = \left( \mu - \frac{1}{2} \sigma^2 \right) f(t,x), \qquad f_x(t,x) = \sigma f(t,x), \qquad f_{xx}(t,x) = \sigma^2 f(t,x),
\]
with the subscripts denoting partial derivatives. Then by Itô's formula
\begin{align*}
dS_t = df(t, W_t) &= f_t(t, W_t) \, dt + f_x(t, W_t) \, dW_t + \frac{1}{2} f_{xx}(t, W_t) \, dt \\
&= \left( \mu - \frac{1}{2} \sigma^2 \right) f(t, W_t) \, dt + \sigma f(t, W_t) \, dW_t + \frac{1}{2} \sigma^2 f(t, W_t) \, dt \\
&= \mu S_t \, dt + \sigma S_t \, dW_t,
\end{align*}

which is geometric Brownian motion in differential form. Geometric Brownian motion in integral form may be written as
\[
S_t = S_0 + \int_0^t \mu S_s \, ds + \int_0^t \sigma S_s \, dW_s.
\]

Quadratic variation of geometric Brownian motion In the integral form of geometric Brownian motion,
\[
S_t = S_0 + \int_0^t \mu S_s \, ds + \int_0^t \sigma S_s \, dW_s,
\]
the Riemann integral
\[
F(t) = \int_0^t \mu S_s \, ds
\]
is differentiable, with $F'(t) = \mu S_t$. This term has zero quadratic variation. The Itô integral
\[
G(t) = \int_0^t \sigma S_s \, dW_s
\]
is not differentiable. It has quadratic variation
\[
[G]_t = \int_0^t \sigma^2 S_s^2 \, ds.
\]
Thus the quadratic variation of $S$ is given by the quadratic variation of $G$, i.e.
\[
[S]_t = [G]_t = \int_0^t \sigma^2 S_s^2 \, ds.
\]
In differential notation we write
\[
d[S]_t = dS_t \, dS_t = \sigma^2 S_t^2 \, dt,
\]
which follows from the informal multiplication rules involving the differentials $dt$ and $dW_t$:
\[
d[W]_t = dW_t \, dW_t = dt, \qquad dW_t \cdot dt = dt \cdot dW_t = dt \cdot dt = 0.
\]
Remark 10.4. Note that
\[
\int_0^t \frac{d[S]_s}{S_s^2} = \int_0^t \sigma^2 \, ds = \sigma^2 t,
\]
indicating that for geometric Brownian motion, the quadratic variation, when scaled by the square of the stock price process, is a measure of the volatility of the process $S$.
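A simulation makes the rate interpretation concrete. The sketch below (an illustrative aside, not part of the notes; the parameters $S_0 = 1$, $\mu = 0.05$, $\sigma = 0.2$ and the seed are assumptions) builds a geometric Brownian path from its closed form and compares its sample quadratic variation with $\int_0^t \sigma^2 S_s^2 \, ds$:

```python
import numpy as np

rng = np.random.default_rng(3)
S0, mu, sigma, t, n = 1.0, 0.05, 0.2, 1.0, 2 ** 18
dt = t / n

# Brownian path and GBM via its closed form S_t = S_0 exp(sigma W_t + (mu - sigma^2/2) t).
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
times = np.linspace(0.0, t, n + 1)
S = S0 * np.exp(sigma * W + (mu - 0.5 * sigma ** 2) * times)

sample_qv = np.sum(np.diff(S) ** 2)                 # pathwise [S]_t
predicted = np.sum(sigma ** 2 * S[:-1] ** 2 * dt)   # int_0^t sigma^2 S_s^2 ds
print(sample_qv, predicted)
```

On a fine grid the two quantities agree closely, illustrating that $S$ accumulates quadratic variation at rate $\sigma^2 S_s^2$ per unit time.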

10.2 Itô's formula for Itô processes

Definition 10.5 (Itô process). Let $(W_t, \mathcal{F}_t)_{t \ge 0}$ be a standard Brownian motion. An Itô process is a stochastic process of the form
\[
X_t = X_0 + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s, \quad t \ge 0, \tag{10.2}
\]
where $X_0$ is non-random and $a, b$ are adapted stochastic processes satisfying $\int_0^t |a_s| \, ds < \infty$ and $\mathbb{E}\left[ \int_0^t b_s^2 \, ds \right] < \infty$ for every $t \ge 0$. In differential form we write (10.2) as
\[
dX_t = a_t \, dt + b_t \, dW_t.
\]

Lemma 10.6 (Quadratic variation of an Itô process). The quadratic variation of the Itô process (10.2) is the process $([X]_t)_{t \ge 0}$ given by
\[
[X]_t = \int_0^t b_s^2 \, ds, \quad t \ge 0.
\]

Proof. This is immediate from the fact that the quadratic variation of $\int_0^t a_s \, ds$ is zero. $\square$

Definition 10.7 (Integral with respect to an Itô process). Let $(X_t)_{t \ge 0}$ be the Itô process (10.2) and let $(\Gamma_t)_{t \ge 0}$ be an adapted process satisfying
\[
\mathbb{E}\left[ \int_0^t \Gamma_s^2 b_s^2 \, ds \right] < \infty, \qquad \int_0^t |\Gamma_s a_s| \, ds < \infty,
\]
for every $t \ge 0$. The Itô integral of $\Gamma$ with respect to $X$ is the process $J = (J_t)_{t \ge 0}$ defined by
\[
J_t := \int_0^t \Gamma_s \, dX_s := \int_0^t \Gamma_s b_s \, dW_s + \int_0^t \Gamma_s a_s \, ds.
\]

Theorem 10.8 (Itô formula for Itô processes). Let $(X_t)_{t \ge 0}$ be the Itô process (10.2) and let $f(t,x) \in C^{1,2}([0,\infty) \times \mathbb{R})$. Then, for every $t > 0$,
\begin{align*}
f(t, X_t) &= f(0, X_0) + \int_0^t f_t(s, X_s) \, ds + \int_0^t f_x(s, X_s) \, dX_s + \frac{1}{2} \int_0^t f_{xx}(s, X_s) \, d[X]_s \\
&= f(0, X_0) + \int_0^t \left[ f_t(s, X_s) + a_s f_x(s, X_s) + \frac{1}{2} b_s^2 f_{xx}(s, X_s) \right] ds + \int_0^t b_s f_x(s, X_s) \, dW_s.
\end{align*}

Proof. As for the Itô formula with respect to BM, using the fact that $[X]_t = [I]_t = \int_0^t b_s^2 \, ds$. $\square$

It is usually easier to remember and use this theorem in the differential form
\[
df(t, X_t) = f_t(t, X_t) \, dt + f_x(t, X_t) \, dX_t + \frac{1}{2} f_{xx}(t, X_t) \, d[X]_t,
\]
where $d[X]_t = dX_t \, dX_t$ is computed according to the rules
\[
dt \, dt = dt \, dW_t = dW_t \, dt = 0, \qquad dW_t \, dW_t = dt.
\]

Example 10.9 (Generalised geometric Brownian motion). Define the Itô process
\[
X_t = \int_0^t \sigma_s \, dW_s + \int_0^t \left( \mu_s - \frac{1}{2} \sigma_s^2 \right) ds, \quad t \ge 0,
\]
where $\sigma, \mu$ are adapted processes. Then
\[
dX_t = \sigma_t \, dW_t + \left( \mu_t - \frac{1}{2} \sigma_t^2 \right) dt, \qquad d[X]_t = \sigma_t^2 \, d[W]_t = \sigma_t^2 \, dt.
\]
A common model for an asset price process $S = (S_t)_{t \ge 0}$ is given by $S_t = S_0 e^{X_t}$, with $S_0 > 0$ non-random, which is called a generalised geometric Brownian motion. We write $S_t = f(X_t)$, where $f(x) = S_0 e^x$. The Itô formula gives
\[
dS_t = \mu_t S_t \, dt + \sigma_t S_t \, dW_t.
\]
Applying the Itô formula to the function $g(t, S_t) = \log S_t$, we find that
\[
d(\log S_t) = dX_t = \sigma_t \, dW_t + \left( \mu_t - \frac{1}{2} \sigma_t^2 \right) dt.
\]

10.3 Stochastic differential equations

Given an Itô process $X := (X_t)_{t \ge 0}$ satisfying
\[
X_t = x + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dW_s,
\]
which we usually write in differential form
\[
dX_t = \mu_t \, dt + \sigma_t \, dW_t, \tag{10.3}
\]
then given a function $f \in C^{1,2}([0,\infty) \times \mathbb{R})$ (i.e. $f = f(t,x)$ for $t \in [0,\infty)$, $x \in \mathbb{R}$, $f : [0,\infty) \times \mathbb{R} \to \mathbb{R}$, differentiable at least once with respect to $t$ and at least twice with respect to $x$), the process $(Y_t)_{t \ge 0}$ defined by $Y_t := f(t, X_t)$ has differential given by
\[
dY_t \equiv df(t, X_t) = f_t(t, X_t) \, dt + f_x(t, X_t) \, dX_t + \frac{1}{2} f_{xx}(t, X_t) \, d[X]_t,
\]
where $d[X]_t = dX_t \, dX_t$ is computed according to the rules
\[
dt \, dt = dt \, dW_t = dW_t \, dt = 0, \qquad dW_t \, dW_t = dt.
\]
In integral form, $Y_t$ is given by
\[
Y_t = Y_0 + \int_0^t \left[ f_t(s, X_s) + \mu_s f_x(s, X_s) + \frac{1}{2} \sigma_s^2 f_{xx}(s, X_s) \right] ds + \int_0^t \sigma_s f_x(s, X_s) \, dW_s.
\]

Markovian diffusions If, in (10.3), we have $\mu_t = \mu(t, X_t)$, $\sigma_t = \sigma(t, X_t)$ for well-behaved (see precise conditions later) functions $\mu(t,x)$, $\sigma(t,x)$, so that
\[
dX_t = \mu(t, X_t) \, dt + \sigma(t, X_t) \, dW_t,
\]
which is called a stochastic differential equation (SDE) for $X$, then the process $X$ is Markovian:
\[
\mathbb{E}[h(X_T) \,|\, \mathcal{F}_t] = \mathbb{E}[h(X_T) \,|\, X_t], \quad 0 \le t \le T,
\]
and the integral equation for $Y$ may be written
\[
Y_t = Y_0 + \int_0^t \left( f_t(s, X_s) + \mathcal{A}f(s, X_s) \right) ds + \int_0^t \sigma(s, X_s) f_x(s, X_s) \, dW_s,
\]
where $\mathcal{A}$ is called the generator of the diffusion $X$, and is defined by
\[
\mathcal{A}f(t,x) := \mu(t,x) f_x(t,x) + \frac{1}{2} \sigma^2(t,x) f_{xx}(t,x).
\]

Solutions to stochastic differential equations We ask whether there exists a well-defined process $X$ satisfying the stochastic differential equation (SDE)
\[
dX_t = \mu(t, X_t) \, dt + \sigma(t, X_t) \, dW_t, \tag{10.4}
\]
or, more precisely, whether there exists a process $X$ satisfying $X_0 = x$ and
\[
X_t = x + \int_0^t \mu(s, X_s) \, ds + \int_0^t \sigma(s, X_s) \, dW_s, \quad t \ge 0.
\]
The basic existence result is as follows. Suppose there is a constant $K$ such that for all $x, y, t$ we have
\[
|\mu(t,x) - \mu(t,y)| \le K|x - y|, \qquad |\sigma(t,x) - \sigma(t,y)| \le K|x - y|, \qquad |\mu(t,x)| + |\sigma(t,x)| \le K(1 + |x|).
\]
(The first two conditions are Lipschitz continuity in $x$.) Then the SDE (10.4) has a unique, adapted, continuous, Markovian solution, and there exists a constant $C$ such that
\[
\mathbb{E}[|X_t|^2] \le Ce^{Ct}(1 + |x|^2).
\]
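Under conditions like these, SDEs can also be approximated numerically. The following Euler-Maruyama sketch is a generic illustration (the scheme is standard, but the coefficient choices, step count and seed below are assumptions, not from the notes); for GBM, $\mu(t,x) = \mu x$ and $\sigma(t,x) = \sigma x$ satisfy the Lipschitz and linear-growth conditions above:

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, t, n, rng):
    """One Euler-Maruyama path: X_{k+1} = X_k + mu(t_k, X_k) dt + sigma(t_k, X_k) dW_k."""
    dt = t / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + mu(k * dt, x[k]) * dt + sigma(k * dt, x[k]) * dW
    return x

# Illustrative GBM coefficients mu(t, x) = 0.05 x, sigma(t, x) = 0.2 x (assumed values).
rng = np.random.default_rng(4)
path = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x, 1.0, 1.0, 1000, rng)
print(path[-1])  # approximate value of X_1 along this path
```

The scheme simply freezes the coefficients on each small time step, in the same spirit as the elementary-process approximation of the Itô integral.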

Example 10.10 (Exponential martingales). Let $\theta$ be a process adapted to the filtration of the Brownian motion $W$. Define the process $Z = (Z_t)_{0 \le t \le T}$ by
\[
Z_t = \exp\left( \int_0^t \theta_s \, dW_s - \frac{1}{2} \int_0^t \theta_s^2 \, d[W]_s \right).
\]
In Problem Sheet 2 we show via the Itô formula that
\[
dZ_t = \theta_t Z_t \, dW_t,
\]
and deduce that $Z$ is a martingale provided that $\mathbb{E}\left[ \int_0^T \theta_t^2 Z_t^2 \, dt \right] < \infty$.

Remark 10.11 (Novikov condition). A sufficient condition for $Z$ to be a martingale is the Novikov condition
\[
\mathbb{E}\left[ \exp\left( \frac{1}{2} \int_0^T \theta_t^2 \, dt \right) \right] < \infty.
\]
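For constant $\theta$ the Novikov condition holds trivially and $Z_t = \exp(\theta W_t - \frac{1}{2}\theta^2 t)$ is a lognormal random variable with mean exactly $1$, consistent with the martingale property $\mathbb{E}[Z_t] = Z_0 = 1$. A Monte Carlo sanity check (an illustrative aside, not part of the notes; $\theta$, $t$, the path count and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, t, paths = 0.5, 2.0, 500_000

# W_t ~ N(0, t); Z_t = exp(theta W_t - theta^2 t / 2) is lognormal with mean 1.
W_t = rng.normal(0.0, np.sqrt(t), paths)
Z_t = np.exp(theta * W_t - 0.5 * theta ** 2 * t)
print(Z_t.mean())   # approx 1
```

The drift correction $-\frac{1}{2}\theta^2 t$ is exactly what cancels the convexity of the exponential, so the sample mean stays at $1$ for any horizon.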

10.4 Multidimensional Brownian motion

Definition 10.12 ($d$-dimensional Brownian motion). A $d$-dimensional Brownian motion is a process $W_t = (W_t^1, \ldots, W_t^d)$ with the following properties: each $W_t^i$ ($i = 1, \ldots, d$) is a one-dimensional Brownian motion; if $i \ne j$, then the processes $W_t^i$ and $W_t^j$ are independent.

Associated with a $d$-dimensional Brownian motion, we have a filtration $(\mathcal{F}_t)_{t \ge 0}$ such that: for each $t$, the random vector $W_t$ is $\mathcal{F}_t$-measurable; for each $t \le t_1 \le \ldots \le t_n$, the vector increments $W_{t_1} - W_t, \ldots, W_{t_n} - W_{t_{n-1}}$ are independent of $\mathcal{F}_t$.

10.4.1 Cross-variations of Brownian motions

Because each component $W_t^i$ of $W_t$ is a one-dimensional Brownian motion, we have $[W^i]_t = t$, $i = 1, \ldots, d$. However, if we define the cross-variation between $W^i$ and $W^j$ as
\[
[W^i, W^j]_t := \lim_{\|P\| \to 0} \sum_{k=0}^{n-1} (W_{t_{k+1}}^i - W_{t_k}^i)(W_{t_{k+1}}^j - W_{t_k}^j), \quad i, j = 1, \ldots, d,
\]
where $P = \{t_0, t_1, \ldots, t_n\}$ is a partition of $[0,t]$, then we have:

Theorem 10.13. If $i \ne j$, then $[W^i, W^j]_t = 0$.

Proof. Let $P = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0,t]$. For $i \ne j$, define the sample cross-variation of $W^i$ and $W^j$ on $[0,t]$ to be
\[
C_P := \sum_{k=0}^{n-1} (W_{t_{k+1}}^i - W_{t_k}^i)(W_{t_{k+1}}^j - W_{t_k}^j).
\]

The increments appearing on the RHS of the above equation are all independent of one another and all have mean zero. Therefore $\mathbb{E}[C_P] = 0$. We compute $\mathrm{var}(C_P) = \mathbb{E}[C_P^2]$. First note that
\begin{align*}
C_P^2 &= \sum_{k=0}^{n-1} (W_{t_{k+1}}^i - W_{t_k}^i)^2 (W_{t_{k+1}}^j - W_{t_k}^j)^2 \\
&\quad + 2 \sum_{\ell < k} (W_{t_{\ell+1}}^i - W_{t_\ell}^i)(W_{t_{\ell+1}}^j - W_{t_\ell}^j)(W_{t_{k+1}}^i - W_{t_k}^i)(W_{t_{k+1}}^j - W_{t_k}^j).
\end{align*}
All the increments appearing in the sum of cross terms are independent of one another and have mean zero. Therefore
\[
\mathrm{var}(C_P) = \mathbb{E}[C_P^2] = \sum_{k=0}^{n-1} \mathbb{E}\left[ (W_{t_{k+1}}^i - W_{t_k}^i)^2 (W_{t_{k+1}}^j - W_{t_k}^j)^2 \right].
\]
But $(W_{t_{k+1}}^i - W_{t_k}^i)^2$ and $(W_{t_{k+1}}^j - W_{t_k}^j)^2$ are independent of one another, and each has expectation $t_{k+1} - t_k$. It follows that
\[
\mathrm{var}(C_P) = \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \le \|P\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = \|P\| \, t.
\]
As $\|P\| \to 0$ we have $\mathrm{var}(C_P) \to 0$, so $C_P$ converges in mean square$^{10}$ to the constant $\mathbb{E}[C_P] = 0$. $\square$

$^{10}$The convergence also holds almost surely, though we do not prove this here.
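The vanishing cross-variation is easy to observe numerically. In this sketch (an illustrative aside, not part of the notes; horizon, grid size and seed are assumptions), two independent Brownian paths are generated on a fine grid; their sample cross-variation is near zero while each sample quadratic variation is near $t$:

```python
import numpy as np

rng = np.random.default_rng(6)
t, n = 1.0, 2 ** 18

# Increments of two independent Brownian motions on a uniform grid.
dW1 = rng.normal(0.0, np.sqrt(t / n), n)
dW2 = rng.normal(0.0, np.sqrt(t / n), n)

cross = np.sum(dW1 * dW2)   # sample cross-variation C_P, near 0
qv1 = np.sum(dW1 ** 2)      # sample quadratic variation of W^1, near t = 1
print(cross, qv1)
```

The standard deviation of the sample cross-variation on a uniform grid is $t/\sqrt{n}$, matching the bound $\mathrm{var}(C_P) \le \|P\| t$ from the proof.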

Lévy's characterisation of Brownian motion* Lévy's characterisation of BM (as given in Theorem 8.14) extends to the multi-dimensional case (see Shreve [14], Section 4.6.3 for more details).

Theorem 10.14 (Lévy's theorem, $d$-dimensional). Let $M$ be a $d$-dimensional martingale relative to a filtration, with $M_0 = 0$, continuous paths, and $[M^i, M^j]_t = \delta_{ij} t$ for all $t \ge 0$. Then $M$ is a $d$-dimensional BM.

10.5

Two-dimensional It formula o

There is a multi-dimensional version of the It formula. We content ourselves for now with the o following two-dimensional version. The formula generalises (as we shall see) to any number of processes driven by a Brownian motion of any number (not necessarily the same number) of dimensions. Let W := (W 1 , W 2 )T (T denoting transposition) be a two-dimensional Brownian motion (so that W 1 , W 2 are independent Brownian motions), and let X := (X 1 , X 2 )T be a two-dimensional It process following o dXt = at dt + bt dWt , where at = so that (10.5) is equivalent to
1 dXt 2 dXt
10 The

(10.5)

a1 t a2 t

bt =

b11 t b21 t

b12 t b22 , t

= =

a1 dt + b11 dWt1 + b12 dWt2 , t t t a2 dt + b21 dWt1 + b22 dWt2 , t t t

convergence also holds almost surely, though we do not prove this here.

72

or in integral form

    X^1_t = x_1 + ∫_0^t a^1_s ds + ∫_0^t b^{11}_s dW^1_s + ∫_0^t b^{12}_s dW^2_s,
    X^2_t = x_2 + ∫_0^t a^2_s ds + ∫_0^t b^{21}_s dW^1_s + ∫_0^t b^{22}_s dW^2_s,

or in compact form

    X_t = x + ∫_0^t a_s ds + ∫_0^t b_s dW_s,

where x = (x_1, x_2)^T. Such processes, consisting of a nonrandom initial condition, plus a Riemann integral, plus one or more Itô integrals, are examples of semimartingales. The integrands a_s, b_s can be any adapted processes such that the relevant integrals exist. The adaptedness of the integrands guarantees that X is also adapted.

Theorem 10.15 (Two-dimensional Itô formula). Let f(t, x_1, x_2) be a function f : [0, ∞) × R^2 → R. Then the process Y := (Y_t)_{t≥0} defined by Y_t := f(t, X^1_t, X^2_t) ≡ f(t, X_t) follows

    dY_t = f_t(t, X^1_t, X^2_t) dt + f_{x_1}(t, X^1_t, X^2_t) dX^1_t + f_{x_2}(t, X^1_t, X^2_t) dX^2_t
           + (1/2) f_{x_1 x_1}(t, X^1_t, X^2_t) d[X^1]_t + (1/2) f_{x_2 x_2}(t, X^1_t, X^2_t) d[X^2]_t
           + f_{x_1 x_2}(t, X^1_t, X^2_t) d[X^1, X^2]_t,

where d[X^i, X^j]_t = dX^i_t dX^j_t, i, j = 1, 2, are computed according to the rules

    dW^i_t dW^j_t = δ_{ij} dt,   dt dt = dt dW^i_t = dW^i_t dt = 0,

with δ_{ij} = 1 if i = j and δ_{ij} = 0 if i ≠ j. In integral form the theorem is

    Y_t − Y_0 = f(t, X^1_t, X^2_t) − f(0, X^1_0, X^2_0)
              = ∫_0^t f_t(s, X^1_s, X^2_s) ds + ∫_0^t f_{x_1}(s, X^1_s, X^2_s) dX^1_s + ∫_0^t f_{x_2}(s, X^1_s, X^2_s) dX^2_s
                + (1/2) ∫_0^t f_{x_1 x_1}(s, X^1_s, X^2_s) d[X^1]_s + (1/2) ∫_0^t f_{x_2 x_2}(s, X^1_s, X^2_s) d[X^2]_s
                + ∫_0^t f_{x_1 x_2}(s, X^1_s, X^2_s) d[X^1, X^2]_s.
Markovian diffusion case. If, in (10.5), we have a_t = a(t, X_t), b_t = b(t, X_t) for well-behaved (see precise conditions later) functions a(t, x), b(t, x), so that

    dX_t = a(t, X_t) dt + b(t, X_t) dW_t,

then the process X is Markovian:

    E[h(X_T)|F_t] = E[h(X_T)|X_t],   0 ≤ t ≤ T.

The integral equation for Y may be written

    Y_t = Y_0 + ∫_0^t ( f_t(s, X_s) + Af(s, X_s) ) ds + ∫_0^t (∇f(s, X_s))^T b(s, X_s) dW_s,

where A is the generator of the two-dimensional diffusion X, and is defined by

    Af(t, x) ≡ Af(t, x_1, x_2) := Σ_{i=1}^2 a_i(t, x) f_{x_i}(t, x) + (1/2) Σ_{i=1}^2 Σ_{j=1}^2 (bb^T)_{ij}(t, x) f_{x_i x_j}(t, x)
             ≡ a^T(t, x) ∇f(t, x) + (1/2) Σ_{i=1}^2 Σ_{j=1}^2 (bb^T)_{ij}(t, x) f_{x_i x_j}(t, x),    (10.6)

and where ∇f(t, x) = (f_{x_1}(t, x), f_{x_2}(t, x))^T.
Exercise 10.16 (The product rule). Let X^1, X^2 be Itô processes:

    X^1_t = x_1 + ∫_0^t a^1_s ds + ∫_0^t b^{11}_s dW^1_s + ∫_0^t b^{12}_s dW^2_s,
    X^2_t = x_2 + ∫_0^t a^2_s ds + ∫_0^t b^{21}_s dW^1_s + ∫_0^t b^{22}_s dW^2_s.

Use the two-dimensional Itô formula to derive the product rule

    d(X^1_t X^2_t) = X^1_t dX^2_t + X^2_t dX^1_t + dX^1_t dX^2_t,

or, in integral form,

    X^1_t X^2_t = x_1 x_2 + ∫_0^t X^1_s dX^2_s + ∫_0^t X^2_s dX^1_s + ∫_0^t d[X^1, X^2]_s,

the last term being [X^1, X^2]_t.
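The multiplication rules behind this exercise — dt dt = dt dW^i = 0 and dW^i dW^j = δ_{ij} dt — can be illustrated numerically: over a fine partition of [0, t], sums of products of Brownian increments approximate the corresponding (co)variations. A minimal simulation sketch (not part of the notes' formal development; all parameter choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 1.0, 200_000
dt = t / n

# Increments of two independent Brownian motions on [0, t].
dW1 = rng.normal(0.0, np.sqrt(dt), n)
dW2 = rng.normal(0.0, np.sqrt(dt), n)

qv1 = np.sum(dW1 ** 2)       # approximates [W^1]_t = t
cov12 = np.sum(dW1 * dW2)    # approximates [W^1, W^2]_t = 0
cross_dt = np.sum(dW1) * dt  # a "dW dt" term, vanishing as n grows

print(qv1, cov12, cross_dt)
```

With 2 × 10^5 steps the three sums are close to their limits t, 0 and 0; the Monte Carlo error shrinks like n^{-1/2}.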

10.6 Multidimensional Itô formula

10.6.1 Multidimensional Itô process

Let W_t = (W^1_t, . . . , W^d_t)^T be a vector of d independent Brownian motions, i.e. W_t is d-dimensional Brownian motion. We can use the Brownian motion vector to form the following n Itô processes X^1_t, . . . , X^n_t:

    dX^1_t = a^1_t dt + b^{11}_t dW^1_t + · · · + b^{1d}_t dW^d_t,
    ⋮
    dX^n_t = a^n_t dt + b^{n1}_t dW^1_t + · · · + b^{nd}_t dW^d_t,

or, in matrix notation, with X = (X^1, . . . , X^n)^T,

    dX_t = a_t dt + b_t dW_t,    (10.7)

where

    X_t = (X^1_t, . . . , X^n_t)^T,   a_t = (a^1_t, . . . , a^n_t)^T,   b_t = (b^{ij}_t), an n × d matrix.    (10.8)

Note that the coefficients a and b are required to satisfy certain conditions so that the integrals implicit in the above equations are well defined. In particular, their elements should all be adapted processes, so that we know their values at time t if we know X_t.

Theorem 10.17 (Multidimensional Itô formula). Suppose X_t satisfies (10.7). Let f(t, x) = (f_1(t, x), . . . , f_p(t, x))^T be a twice differentiable map from [0, ∞) × R^n into R^p. Then the process Y_t := f(t, X_t) is again an Itô process, whose k-th component, Y^k_t, is given by the multidimensional Itô formula as

    dY^k_t = (∂f_k/∂t)(t, X_t) dt + Σ_{i=1}^n (∂f_k/∂x_i)(t, X_t) dX^i_t + (1/2) Σ_{i=1}^n Σ_{j=1}^n (∂²f_k/∂x_i∂x_j)(t, X_t) dX^i_t dX^j_t,    (10.9)

where dX^i_t dX^j_t is computed according to the rules

    dW^i_t dW^j_t = δ_{ij} dt,   dt dt = dW^i_t dt = dt dW^i_t = 0.
Example 10.18. Let W = (W^1, . . . , W^n) be Brownian motion in R^n, for n ≥ 2. Consider

    R_t := |W_t| = ((W^1_t)² + · · · + (W^n_t)²)^{1/2},

which is a process describing the distance of the n-dimensional Brownian motion from the origin. Now, the function f(t, x) = |x| is not differentiable at the origin, but since W_t never hits the origin (almost surely, or with probability one) when n ≥ 2 (see, for example, Øksendal [12], Exercise 9.7), the multidimensional Itô formula still works. Take X_t = W_t, so that dX_t = dW_t, and consider the process Y_t = R_t = f(t, X_t) = f(t, W_t) = |W_t| = ((W^1_t)² + · · · + (W^n_t)²)^{1/2}. Then f(t, x) = (x_1² + · · · + x_n²)^{1/2}, so that

    ∂f/∂t = 0,   ∂f/∂x_i = x_i/|x|,   ∂²f/∂x_i∂x_j = δ_{ij}/|x| − x_i x_j/|x|³.

Then

    dR_t = Σ_{i=1}^n (W^i_t/|W_t|) dW^i_t + (1/2) Σ_{i=1}^n Σ_{j=1}^n ( δ_{ij}/|W_t| − W^i_t W^j_t/|W_t|³ ) δ_{ij} dt
         = Σ_{i=1}^n (W^i_t/R_t) dW^i_t + ((n − 1)/(2R_t)) dt.
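A quick sanity check on this example: applying the multidimensional Itô formula to f(x) = |x|² instead gives d(R_t²) = n dt + 2 Σ_i W^i_t dW^i_t, so E[R_t²] = nt. The sketch below (illustrative, with n = 3 and t = 1 chosen for concreteness) verifies this by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_dim, t, n_paths = 3, 1.0, 100_000

# Terminal values of an n_dim-dimensional Brownian motion at time t:
# each coordinate is N(0, t), independent across coordinates.
W_t = rng.normal(0.0, np.sqrt(t), size=(n_paths, n_dim))
R_sq = np.sum(W_t ** 2, axis=1)  # R_t^2 = |W_t|^2

# Ito applied to f(x) = |x|^2 gives E[R_t^2] = n_dim * t.
print(R_sq.mean())  # approximately n_dim * t = 3.0
```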

10.7 Extensions*

We state without proof some results concerning extensions of the stochastic integral to more general integrands and integrators.

Definition 10.19 (Local martingale). Let X = (X_t, F_t)_{t≥0} be a process. If there exists a non-decreasing sequence (τ_n)_{n=1}^∞ of stopping times of (F_t) such that (X^{(n)}_t)_{t≥0} defined by X^{(n)}_t := X_{t∧τ_n} is a martingale for each n ≥ 1 and P[lim_{n→∞} τ_n = ∞] = 1, then we say that X is a local martingale; if, in addition, X_0 = 0 a.s., we write X ∈ M^{loc} (respectively, X ∈ M^{c,loc} if X is continuous).

We state the following facts without proof.

Every martingale is a local martingale, but the converse is not true.

If X, Y ∈ M^{c,loc}, then there exists a unique continuous, adapted, bounded variation process [X, Y], with [X, Y]_0 = 0 a.s., such that XY − [X, Y] ∈ M^{c,loc}. For X = Y we write [X] = [X, X], and this process is non-decreasing.

A lower bounded local martingale is a supermartingale.

The Itô integral I_t = ∫_0^t b_s dW_s has so far been defined for integrands satisfying E[∫_0^t b²_s ds] < ∞. It is possible to relax this condition to the weaker condition that ∫_0^t b²_s ds < ∞ almost surely, and the Itô integral is then a local martingale. The process I = (I_t)_{t≥0} is still F-adapted, but there is no guarantee that the martingale or Itô isometry properties hold.

The stochastic integral I_t = ∫_0^t b_s dX_s can be generalised to include integrators X which are general semimartingales (not necessarily continuous), i.e. processes of the form X = X_0 + M + A, where M ∈ M^{loc} and A is a finite variation process.

10.8 Connections with PDEs: Feynman-Kac theorem

There is a remarkable connection between stochastic calculus for Markov diffusions and partial differential equations (PDEs). Consider the one-dimensional diffusion

    dX_t = a(t, X_t) dt + b(t, X_t) dW_t.    (10.10)

The process X = (X_t)_{t≥0} is a Markov process, satisfying

    E[h(X_T)|F_t] = E[h(X_T)|X_t],   0 ≤ t ≤ T,    (10.11)

for a function h(x) such that the above expectations are defined. A consequence of the Markov property is that the right-hand side of (10.11) is a function of (t, X_t) only. Write

    v(t, x) := E[h(X_T)|X_t = x].    (10.12)

Lemma 10.20. The process Y = (Y_t)_{0≤t≤T} defined by Y_t := v(t, X_t) is a martingale.

Proof. By the Markov property, we have Y_t = E[h(X_T)|X_t] = E[h(X_T)|F_t]. Then, for 0 ≤ s ≤ t ≤ T,

    E[Y_t|F_s] = E[ E[h(X_T)|X_t] | F_s ]
               = E[ E[h(X_T)|F_t] | F_s ]   (by the Markov property)
               = E[h(X_T)|F_s]              (by the tower property)
               = E[h(X_T)|X_s]
               = Y_s.

Theorem 10.21 (Feynman-Kac). The function v(t, x) in (10.12) satisfies the PDE

    v_t(t, x) + a(t, x)v_x(t, x) + (1/2)b²(t, x)v_xx(t, x) = 0,   v(T, x) = h(x).    (10.13)

Proof. By the Itô formula

    dY_t = dv(t, X_t) = [ v_t(t, X_t) + a(t, X_t)v_x(t, X_t) + (1/2)b²(t, X_t)v_xx(t, X_t) ] dt + b(t, X_t)v_x(t, X_t) dW_t.

Since Y is a martingale, the coefficient of the dt term must be zero for all (t, X_t), and (10.13) follows.

Note that the PDE (10.13) may be written

    v_t(t, x) + Av(t, x) = 0,   v(T, x) = h(x),    (10.14)

where A is the generator of the diffusion (10.10):

    Av(t, x) = a(t, x)v_x(t, x) + (1/2)b²(t, x)v_xx(t, x).

Note also that the theorem is still valid if we replace h(X_T) in (10.12) by h(T, X_T), a function dependent on T as well as X_T. Finally, there is an obvious generalisation to a multi-dimensional situation. We content ourselves with the following two-dimensional version. Suppose we have a two-dimensional diffusion X = (X^1, X^2) following

    dX_t = a(t, X_t) dt + b(t, X_t) dW_t,    (10.15)

where

    a(t, X_t) = (a_1(t, X_t), a_2(t, X_t))^T,   b(t, X_t) = ( b_{11}(t, X_t)  b_{12}(t, X_t) ; b_{21}(t, X_t)  b_{22}(t, X_t) ),

so that (10.15) is equivalent to

    dX^1_t = a_1(t, X_t) dt + b_{11}(t, X_t) dW^1_t + b_{12}(t, X_t) dW^2_t,
    dX^2_t = a_2(t, X_t) dt + b_{21}(t, X_t) dW^1_t + b_{22}(t, X_t) dW^2_t.

Let h(x) ≡ h(x_1, x_2) be a function h : R^2 → R. Define the function

    v(t, x) := E[h(X_T)|X_t = x].    (10.16)

The generator of the diffusion (10.15) is A, given by (10.6):

    Af(t, x) ≡ Af(t, x_1, x_2) := Σ_{i=1}^2 a_i(t, x) f_{x_i}(t, x) + (1/2) Σ_{i=1}^2 Σ_{j=1}^2 (bb^T)_{ij}(t, x) f_{x_i x_j}(t, x).

Theorem 10.22 (Feynman-Kac, two-dimensional). The function v(t, x) in (10.16) satisfies the PDE

    v_t(t, x) + Av(t, x) = 0,   v(T, x) = h(x),

where A is the generator of the diffusion (10.15).
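The one-dimensional Feynman-Kac representation can be illustrated numerically. As a sketch (parameters and the choice of diffusion are mine, for illustration): for the geometric diffusion dX_t = aX_t dt + bX_t dW_t and h(x) = x², the function v(t, x) = x² e^{(2a+b²)(T−t)} solves v_t + axv_x + (1/2)b²x²v_xx = 0 with v(T, x) = x², so a Monte Carlo estimate of E[h(X_T)|X_t = x] should reproduce it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Diffusion dX = a X dt + b X dW (geometric BM), chosen because
# v(t,x) = E[X_T^2 | X_t = x] = x^2 exp((2a + b^2)(T - t)) is known
# in closed form and solves the Feynman-Kac PDE.
a, b, T, t, x = 0.05, 0.3, 1.0, 0.0, 2.0

# Sample X_T exactly (the terminal distribution of GBM is lognormal),
# avoiding any time-discretisation error.
Z = rng.normal(size=500_000)
X_T = x * np.exp((a - 0.5 * b**2) * (T - t) + b * np.sqrt(T - t) * Z)

mc = np.mean(X_T ** 2)
exact = x**2 * np.exp((2 * a + b**2) * (T - t))
print(mc, exact)
```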

10.9 The Girsanov Theorem*

Given a Brownian motion W := (W_t)_{0≤t≤T} on (Ω, F, F, P), with the filtration F := (F_t)_{0≤t≤T} being that generated by W, and given an adapted process λ := (λ_t)_{0≤t≤T}, define the (local) martingale Z by

    Z_t := E(−λ · W)_t := exp( −∫_0^t λ_s dW_s − (1/2)∫_0^t λ²_s ds ),   0 ≤ t ≤ T,

where E is the so-called Doléans exponential. We have that Z follows

    dZ_t = −λ_t Z_t dW_t.

Then, provided λ satisfies the Novikov condition

    E[ exp( (1/2)∫_0^T λ²_t dt ) ] < ∞,    (10.17)

we can define a new probability measure Q ∼ P on F ≡ F_T by

    Q(A) = ∫_A Z_T dP,   A ∈ F,

and the process

    W^Q_t := W_t + ∫_0^t λ_s ds,   0 ≤ t ≤ T,

is a Q-Brownian motion. We write Z_T = dQ/dP and we have, for any F-measurable random variable X,

    E^Q[X] = E[X Z_T].    (10.18)

Remark 10.23. The Novikov condition (10.17) is sufficient to guarantee that Z is a (P, F)-martingale, so that E[Z_T] = 1 and Q is indeed a probability measure.

As well as (10.18) we have the following results connecting conditional expectations under Q and P. Let 0 ≤ t ≤ T.

If X is F_t-measurable, then E^Q[X] = E[X Z_t].

Bayes formula: if X is F_t-measurable and 0 ≤ s ≤ t ≤ T, then Z_s E^Q[X|F_s] = E[X Z_t|F_s].

There is a multi-dimensional version of Girsanov's Theorem. Once again we content ourselves with a two-dimensional version. Given a two-dimensional Brownian motion W = (W^1, W^2) on a stochastic basis (Ω, F, F := (F_t)_{0≤t≤T}, P), and a two-dimensional adapted process λ = (λ^1, λ^2), define a (local) martingale Z by

    Z_t = E(−λ · W)_t ≡ E(−λ^1 · W^1 − λ^2 · W^2)_t
        := exp( −∫_0^t λ^1_s dW^1_s − ∫_0^t λ^2_s dW^2_s − (1/2)∫_0^t ((λ^1_s)² + (λ^2_s)²) ds ).

Then, provided we have the two-dimensional Novikov condition

    E[ exp( (1/2)∫_0^T ((λ^1_t)² + (λ^2_t)²) dt ) ] < ∞,    (10.19)

we can define a new probability measure Q ∼ P on F ≡ F_T by

    Q(A) = ∫_A Z_T dP,   A ∈ F,

and the process W^Q = (W^{Q,1}, W^{Q,2}) defined by

    W^{Q,1}_t := W^1_t + ∫_0^t λ^1_s ds,   W^{Q,2}_t := W^2_t + ∫_0^t λ^2_s ds,

is a two-dimensional Q-Brownian motion.
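The identity E^Q[X] = E[X Z_T] can be checked by simulation for a constant λ (an illustrative sketch, parameters chosen arbitrarily): under Q, W_t = W^Q_t − λt, so E^Q[W_T] = −λT, while E[Z_T] = 1 confirms that Q is a probability measure.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, T, n = 0.5, 1.0, 1_000_000

W_T = rng.normal(0.0, np.sqrt(T), n)
Z_T = np.exp(-lam * W_T - 0.5 * lam**2 * T)  # Doleans exponential at T

# E^Q[X] = E[X Z_T]; under Q, W_t = W^Q_t - lam*t, so E^Q[W_T] = -lam*T.
mean_Z = np.mean(Z_T)          # approximately 1: Q is a probability measure
q_mean_W = np.mean(W_T * Z_T)  # approximately -lam * T = -0.5
print(mean_Z, q_mean_W)
```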

11 The Black-Scholes-Merton model

This is the classical option pricing model dating back to Black-Scholes [2] and Merton [11]. The basic idea is that one can use absence of arbitrage to value an option on a stock: if the option is valued correctly, then it should not be possible to make sure profits by taking positions in the option and the stock. In our rendition below we employ the equivalent notion that one can trade the underlying stock to reproduce the option payoff, and in the absence of arbitrage the wealth of the replicating portfolio must be the option value.

We are given a filtered probability space (Ω, F, F = (F_t)_{t≥0}, P) on which we define a standard Brownian motion W. A single stock follows the SDE

    dS_t = μS_t dt + σS_t dW_t,

where the drift μ and volatility σ are constants. By the Itô formula

    d(log S_t) = (μ − (1/2)σ²) dt + σ dW_t,

so that

    S_t = S_0 exp( (μ − (1/2)σ²)t + σW_t ),   t ≥ 0.

There is also a riskless asset with price process S^0 following

    dS^0_t = rS^0_t dt,   S^0_0 = 1,

where r ≥ 0 is the interest rate. Hence the price of the riskless asset is given by the usual accumulation factor

    S^0_t = exp(rt),   t ≥ 0.

The model makes a number of idealised assumptions, of a continuous-time frictionless market. This means that continuous trading is possible, with the absence of any exploitable arbitrage opportunities; there are no trading costs or limits or taxes (so assets can be held in any amount), assets are divisible, and short-selling is always permitted. We also assume constant parameters in the price model (this can be relaxed to some extent) and that the stock pays no dividends (this can be relaxed).

11.1 Portfolio wealth evolution

An agent trades a portfolio of stock and cash in a self-financing manner, meaning all profits and losses are generated by price changes and by adjusting the proportion of wealth allocated to the stock and the bond. Let X = (X_t)_{0≤t≤T} denote the wealth process of the agent. Denote the (adapted) processes for the number of shares in the bond and in the stock by H^0 and H, so that the wealth of the agent at time t ∈ [0, T] is

    X_t := H^0_t S^0_t + H_t S_t,   0 ≤ t ≤ T.    (11.1)

The self-financing condition asserts that once we set the portfolio up we will neither put any more money into it nor take any out; any increase in the number of stocks must be financed by selling bonds, any increase in the number of bonds must be financed by selling stocks, and nothing is sold unless the funds are needed to buy something else. In other words, on seeing the new bond and stock prices and deciding how many units of each to buy and sell, the change in the number of stocks must be financed by the change in the number of bonds, and vice versa. Mathematically, this is expressed as

    (S^0_t + dS^0_t) dH^0_t + (S_t + dS_t) dH_t = 0,

or

    S^0_t dH^0_t + d[S^0, H^0]_t + S_t dH_t + d[S, H]_t = 0   (self-financing condition).    (11.2)

Applying the Itô product rule to the definition (11.1) of the wealth process we have

    dX_t = H^0_t dS^0_t + S^0_t dH^0_t + d[S^0, H^0]_t + H_t dS_t + S_t dH_t + d[S, H]_t,

and augmenting this with the self-financing condition (11.2) we arrive at the evolution of the wealth of a self-financing portfolio, given by

    dX_t = H^0_t dS^0_t + H_t dS_t = rH^0_t S^0_t dt + H_t dS_t.

Many books simply take this as the definition of a self-financing portfolio. Using the definition (11.1) of X we can write

    dX_t = H_t dS_t + r(X_t − H_t S_t) dt.

11.2 Perfect hedging

Consider selling a European claim with payoff h(S_T) at time zero. Suppose that there exists a function v : [0, T] × R_+ → R_+ such that the claim's value at t ∈ [0, T] is v(t, S_t) (so we suppose the claim is sold for v(0, S_0)). The goal is to characterise the function v(t, x) that is consistent with the no-arbitrage principle. We suppose that the proceeds from the option sale are invested in a self-financing portfolio, so X_0 = v(0, S_0). We want to show that we can achieve replication, that is, find a portfolio whose final value matches the option payoff. To achieve this we insist that the portfolio wealth matches the option value at all times t ∈ [0, T]:

    X_t = v(t, S_t),   0 ≤ t ≤ T.    (11.3)

For this to hold we will also require dX_t = dv(t, S_t). By the Itô formula, the infinitesimal change in the process v(t, S_t) is

    dv(t, S_t) = v_t(t, S_t) dt + v_x(t, S_t) dS_t + (1/2)v_xx(t, S_t) d[S]_t.

Impose (11.3). Equating terms multiplying dS_t gives the appropriate holding of shares as

    H_t = v_x(t, S_t) =: Δ_t,   0 ≤ t ≤ T.    (11.4)

This is the celebrated Black-Scholes (BS) delta hedging rule, and the quantity in (11.4) is called the delta of the claim. Using (11.4) and equating terms multiplying dt in (11.3) yields that the option pricing function v(t, x) must satisfy the BS PDE

    v_t(t, x) + rxv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0.

For v to represent the option pricing function we would also require the terminal condition v(T, S_T) = h(S_T). We then have a terminal value problem for v:

    v_t(t, x) + rxv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0,   v(T, x) = h(x).    (11.5)

Provided we can solve this PDE, then, to avoid arbitrage, v(t, S_t) must be the unique option price at time t ∈ [0, T]. If it were not, an immediate arbitrage opportunity affords itself. For instance, if the claim is available in the market at time zero at a price V_0 > v(0, S_0), then one can sell the claim and invest in the replicating portfolio. The excess V_0 − v(0, S_0) can be invested in the bank account. At time T, one uses the proceeds from the replicating portfolio to pay one's obligations under the claim, leaving a profit of (V_0 − v(0, S_0))e^{rT} > 0. A symmetric argument is possible if V_0 < v(0, S_0), with reversed positions in the claim and the replicating portfolio.

11.2.1 Riskless portfolio argument

An alternative, yet equivalent, route to the BS PDE is to construct a riskless portfolio involving the option and the stock. Take a position of one unit in the claim and a short position of H shares, that is, a position −H in the stock, so the overall portfolio has wealth Y given by

    Y_t = v(t, S_t) − H_t S_t,

where we once again assume the price process for the claim is given by some function v(t, S_t). The dynamics of the portfolio are given by

    dY_t = dv(t, S_t) − H_t dS_t = v_t(t, S_t) dt + v_x(t, S_t) dS_t + (1/2)v_xx(t, S_t) d[S]_t − H_t dS_t.

Choose H such that the terms involving dS_t vanish (so that the terms involving dW_t vanish). This implies that H must be chosen such that

    H_t = v_x(t, S_t),   0 ≤ t ≤ T,

matching the delta hedging condition we found earlier. With this choice, the portfolio value will only contain a finite variation term (that is, a dt term) in its dynamics, so in the absence of arbitrage it must be a riskless portfolio satisfying dY_t = rY_t dt. Combining this with the above choice for H yields once again that the pricing function v must satisfy the BS PDE as before. Notice that the BS PDE has no dependence on the stock's P-drift μ. This is a legacy of removing all risk associated with the claim.

11.3 Solution of the BSM equation

By the Feynman-Kac theorem, a solution to (11.5) is given by

    v(t, x) = E^Q[e^{−r(T−t)} h(S_T)|S_t = x],    (11.6)

where E^Q denotes expectation under a measure Q ∼ P, under which S follows

    dS_t = rS_t dt + σS_t dW^Q_t,    (11.7)

where W^Q is a Q-Brownian motion. This measure is called a risk-neutral measure, or an equivalent local martingale measure (ELMM), because under it, the discounted stock price is a local martingale:

    d(e^{−rt} S_t) = σe^{−rt} S_t dW^Q_t.

We will see later a probabilistic argument that gives a justification for the risk-neutral valuation result (11.6). Equation (11.7) is very suggestive. Under the measure Q, the stock price has an average growth rate of r, so behaves like a riskless asset. This is no accident. The measure Q has arisen out of an argument in which all risk associated with a claim was eliminated by dynamic trading. The result of this is that the claim can be priced by expectation, with the caveat that one treats the stock as though its price grows, on average, like that of a riskless asset. For this reason the formula (11.6) is often called a risk-neutral valuation formula.

11.4 BS option pricing formulae

We will use the risk-neutral valuation formula

    v(t, x) = E^Q[e^{−r(T−t)} h(S_T)|S_t = x]

to derive formulae for some European options.

European call price. For a European call, h(x) = (x − K)^+, and under the ELMM Q the log-stock price is Gaussian. Given S_t = x, under Q we have

    log S_T = log x + (r − (1/2)σ²)(T − t) + σ(W^Q_T − W^Q_t),   0 ≤ t ≤ T.

Hence the probability law of log S_T under Q, given S_t = x, is

    Law^Q[log S_T|S_t = x] = N(m(x, t, T), Σ²(t, T)),

where N(m, s²) denotes the Gaussian probability law of mean m and variance s², and where

    m(x, t, T) = log x + (r − (1/2)σ²)(T − t),   Σ²(t, T) = σ²(T − t),   0 ≤ t ≤ T.

In terms of Y := log S, we have, writing c(t, x) for the call option pricing function:

    c(t, x) = e^{−r(T−t)} E^Q[(e^{Y_T} − K) 1_{{Y_T > log K}}|Y_t = log x].

Then an easy computation using Gaussian integrals gives the celebrated Black-Scholes formula for a call option as c(t, S_t), where

    c(t, x) = xΦ(y) − Ke^{−r(T−t)}Φ(y − σ√(T−t)),    (11.8)

    y = (1/(σ√(T−t))) [ log(x/K) + (r + (1/2)σ²)(T − t) ],    (11.9)

where Φ(·) denotes the standard cumulative normal distribution function, defined by

    Φ(y) := (1/√(2π)) ∫_{−∞}^y exp(−u²/2) du,

so that Φ(y) is the probability that a standard normal random variable (one with mean zero and variance 1) is less than or equal to y. We have Φ(−y) = 1 − Φ(y), by the symmetry with respect to negation of the function exp(−u²/2). The call price function is plotted as a function of stock price in Figure 11.

Figure 11: Black-Scholes call value as a function of stock price, showing the intrinsic and time value of a European call. The parameters are K = 10, r = 10%, σ = 25%, T = 1 year.

European put price. The Black-Scholes put valuation formula can be obtained from (11.8) by put-call parity, as

    p(t, S_t) = Ke^{−r(T−t)}Φ(−y + σ√(T−t)) − S_tΦ(−y),

where we have used the property Φ(−y) = 1 − Φ(y).
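The call formula (11.8)-(11.9) and the put formula above translate directly into code. A minimal sketch (function and variable names are mine), using the error function for Φ, with an at-the-money example:

```python
from math import erf, exp, log, sqrt

def Phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, r, sigma, tau):
    """Black-Scholes call price, with time to maturity tau = T - t."""
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def bs_put(x, K, r, sigma, tau):
    """Black-Scholes put price, via the same y as the call."""
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return K * exp(-r * tau) * Phi(-y + sigma * sqrt(tau)) - x * Phi(-y)

c = bs_call(1.0, 1.0, 0.0, 0.2, 1.0)
p = bs_put(1.0, 1.0, 0.0, 0.2, 1.0)
print(c, p)  # at the money with r = 0: c = p = 2*Phi(0.1) - 1, approx 0.0797
```

With r = 0 and x = K, put-call parity gives c − p = x − Ke^{−rτ} = 0, which the two functions reproduce exactly.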

11.5 Sensitivity parameters (Greeks)*

11.5.1 Delta

The derivatives of the call pricing function c(t, x) with respect to various variables are called sensitivity parameters, or Greeks. We have already come across one of these, the delta, which is

    c_x(t, x) = Φ(y),

with y defined in (11.9). The call option delta is plotted as a function of stock price in Figure 12. It is positive, meaning that if one sells a call option, it is hedged with a dynamically adjusted long position in the stock. For a put option, the delta can be computed using put-call parity, which implies that c_x(t, x) − p_x(t, x) = 1, so that

    p_x(t, x) = −Φ(−y),

and this function is plotted in Figure 13. It is negative, meaning that if one sells a put option, it is hedged with a dynamically adjusted short position in the stock.

11.5.2 Theta

The theta of a call is

    c_t(t, x) = −rKe^{−r(T−t)}Φ(y − σ√(T−t)) − (σx/(2√(T−t)))φ(y),

where φ = Φ′ denotes the standard normal density.

Figure 12: Black-Scholes call delta as a function of stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.

Figure 13: Black-Scholes put delta as a function of stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.

Because Φ(·) and φ(·) are always positive, theta is always negative, meaning that the price of an option declines as we approach maturity (if all other factors remain unchanged). This is true regardless of whether the option is a call or a put (as you can easily verify using put-call parity).

11.5.3 Gamma

The gamma of a call is

    c_xx(t, x) = φ(y)/(σx√(T−t)),

which is always positive, and is equal to p_xx(t, x), the put gamma (again this follows easily from put-call parity). The BS gamma is plotted in Figure 14.

Figure 14: Black-Scholes call gamma as a function of stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.

Gamma is closely related to volatility and to the risk introduced into the BS hedging program if trading is not continuous. To get an intuitive understanding of this effect, notice that gamma measures how quickly the delta of an option changes as the stock price changes. If the magnitude of gamma is small, then delta changes slowly, so a trader will not have to re-hedge very often in order to maintain delta neutrality. On the other hand, if the magnitude of gamma is large, then the trader must re-hedge very often to maintain delta neutrality. To make a portfolio gamma neutral, one cannot use a position in the underlying asset, as this has zero gamma. In other words, gamma neutrality can only be achieved by adding more option positions to one's portfolio.

11.5.4 Vega

The vega of an option is the derivative of the option price with respect to volatility, and is given by

    c_σ(t, x; σ) = x√(T−t) φ(y).

This function is plotted versus the stock price in Figure 15. Note the similarity with the gamma plot, which captures our intuitive notion that gamma does indeed measure sensitivity to volatility in some way.

Figure 15: Variation of vega with stock price. The parameters are K = 1, r = 0.1, T = 0.5, t = 0, q = 0, σ = 0.25.
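The closed-form Greeks above can be cross-checked against finite differences of the pricing function. An illustrative sketch (not part of the notes; function names and parameter values are mine) comparing the analytic delta, gamma and vega with central-difference approximations:

```python
from math import erf, exp, log, pi, sqrt

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def bs_call(x, K, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau = T - t."""
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

x, K, r, sigma, tau = 1.0, 1.0, 0.1, 0.25, 0.5
y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))

# Closed-form Greeks from the notes.
delta = Phi(y)
gamma = phi(y) / (sigma * x * sqrt(tau))
vega = x * sqrt(tau) * phi(y)

# Central finite differences of the pricing function.
h = 1e-4
delta_fd = (bs_call(x + h, K, r, sigma, tau) - bs_call(x - h, K, r, sigma, tau)) / (2 * h)
gamma_fd = (bs_call(x + h, K, r, sigma, tau) - 2 * bs_call(x, K, r, sigma, tau)
            + bs_call(x - h, K, r, sigma, tau)) / h**2
vega_fd = (bs_call(x, K, r, sigma + h, tau) - bs_call(x, K, r, sigma - h, tau)) / (2 * h)

print(delta, delta_fd, gamma, gamma_fd, vega, vega_fd)
```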

11.6 Probabilistic (martingale) interpretation of perfect hedging*

The wealth process in the BSM model satisfies

    dX_t = rX_t dt + H_t S_t σ(λ dt + dW_t),

where λ := (μ − r)/σ is called the market price of risk of the stock. Define the discounted stock price process S̃ and discounted wealth process X̃ by

    S̃_t := e^{−rt} S_t,   X̃_t := e^{−rt} X_t,   t ≥ 0.

Then S̃, X̃ satisfy, under the physical measure P,

    dS̃_t = σS̃_t(λ dt + dW_t),   dX̃_t = H_t dS̃_t = σH_t S̃_t(λ dt + dW_t).

By the Girsanov theorem, if we define the measure Q ∼ P by

    dQ/dP = Z_T := exp( −λW_T − (1/2)λ²T ),

then the process W^Q, defined by W^Q_t := W_t + λt, t ≥ 0, is a Brownian motion under Q. Hence the discounted stock price and discounted wealth process are local Q-martingales (and in fact, can be shown to be martingales in the BSM model). The measure Q, known as an equivalent (local) martingale measure (ELMM), or as a risk-neutral measure, is defined as one such that discounted traded asset prices are local Q-martingales, and notice that it is uniquely defined in the BSM model. This is a consequence of the model being complete (though we do not prove this here).

Earlier, we found a portfolio wealth process which could replicate the payoff of an option, and in this case it must satisfy X_t = v(t, S_t) for all t ∈ [0, T], else there is arbitrage. This led us to the BS PDE and, via the Feynman-Kac theorem, to the representation for the option value:

    v(t, S_t) = E^Q[e^{−r(T−t)} v(T, S_T)|S_t],   0 ≤ t ≤ T.

Hence the discounted option price is a Q-martingale, and hence the discounted wealth process of a replicating strategy must also be a Q-martingale.

There is a purely probabilistic route to these conclusions, with the arguments as follows. Begin with the Q-dynamics of the wealth process of any self-financing trading strategy:

    β_t X_t = X_0 + ∫_0^t σβ_s H_s S_s dW^Q_s,   0 ≤ t ≤ T,    (11.10)

where β_t := e^{−rt} is the discount factor for t ∈ [0, T]. Introduce a European contingent claim with F_T-measurable payoff C at time T, and then define the Q-martingale

    M_t := E^Q[β_T C|F_t],   0 ≤ t ≤ T.

Then by the representation of Brownian martingales as stochastic integrals (the martingale representation theorem), there exists an adapted process ψ : Ω × [0, T] → R with E^Q[∫_0^T ψ²_t dt] < ∞ such that we have

    M_t = M_0 + ∫_0^t ψ_s dW^Q_s,   0 ≤ t ≤ T.

Since X̃ = βX is a Q-martingale, we make the identification M = X̃, so X_0 = M_0 = E^Q[β_T C], and

    X̃_t = β_t X_t = E^Q[β_T C|F_t],   0 ≤ t ≤ T,    (11.11)

by construction. Equivalently,

    β_t X_t = E^Q[β_T C] + ∫_0^t ψ_s dW^Q_s,   0 ≤ t ≤ T.

Comparing this with (11.10), we choose the portfolio process H to be determined by σβHS = ψ. Moreover, since we have β_T X_T = X̃_T = M_T = β_T C, we have

    X_T = C,   a.s.,

so that replication is guaranteed. This argument relies only on the martingale representation theorem and the existence of a unique measure Q such that the discounted wealth process is a local Q-martingale. The parameters μ, σ in the BSM model could just as well have been random, provided they were F-adapted processes. Notice that the wealth process of the replicating portfolio is actually a Q-martingale, not just a local martingale.

Since X_T = C for the replication portfolio wealth process, we must have that the value of the claim at all earlier times is equal to the wealth process. Let V be the value process of the claim. We then have

    V_t = X_t,   0 ≤ t ≤ T,    (11.12)

with V_T = C. Moreover, using (11.11) and (11.12) we arrive at

    V_t = e^{−r(T−t)} E^Q[C|F_t],   0 ≤ t ≤ T,

which is a risk-neutral valuation formula, valid for any European claim.

We must also have dV_t = dX_t. Under Q, we have the dynamics of the discounted wealth process

    d(β_t X_t) = σβ_t H_t S_t dW^Q_t.

For the discounted claim value we have

    d(β_t V_t) = β_t dV_t + V_t dβ_t,

with no cross-variation term since β is of finite variation. Hence, we have

    d(β_t V_t) = β_t dV_t − rβ_t V_t dt.

Since V = X when X is the replication portfolio wealth process, we have, under Q:

    β_t dV_t − rβ_t V_t dt = σβ_t H_t S_t dW^Q_t.

Now suppose the model is Markovian, and assume V_t = v(t, S_t). Then we obtain

    β_t[(v_t + L^{S,Q}v)(t, S_t) − rv(t, S_t)] dt + σβ_t S_t v_x(t, S_t) dW^Q_t = σβ_t H_t S_t dW^Q_t,    (11.13)

where L^{S,Q} is the generator of S under Q, given by

    L^{S,Q}v(t, x) = rxv_x(t, x) + (1/2)σ²x²v_xx(t, x).

Then (11.13) implies that the hedging strategy for the claim is given by

    H_t = v_x(t, S_t),   0 ≤ t ≤ T,

the delta hedging rule we found before, and that the claim pricing function must satisfy

    v_t(t, x) + L^{S,Q}v(t, x) − rv(t, x) = 0,

which is the BS PDE.

We are seeing a manifestation of deep results connecting absence of arbitrage with existence of ELMMs and with completeness. These are called the Fundamental Theorems of Asset Pricing (FTAPs). In continuous asset price models such as the BSM model, the theorems state that no-arbitrage is equivalent to the existence of an ELMM, and completeness is equivalent to there being a unique ELMM. Here is an easy part of the statements in the FTAPs to prove.

Lemma 11.1. If a model has an equivalent martingale measure Q such that the discounted wealth process X̃ := βX is a Q-martingale, then it admits no arbitrage.

Proof. Suppose there is an arbitrage. Then there exists a portfolio wealth process X with

    X_0 = 0,   X_T ≥ 0 a.s.,   and P[X_T > 0] > 0.

By the martingale property,

    E^Q[X̃_T] = X̃_0 = 0.    (11.14)

But since X_T ≥ 0 P-almost surely, we have X_T ≥ 0 Q-almost surely (since Q ∼ P). Similarly, P[X_T > 0] > 0 implies Q[X_T > 0] > 0. These properties imply that E^Q[X̃_T] > 0, which contradicts (11.14), so we conclude that there is no arbitrage.
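The risk-neutral valuation formula V_t = e^{−r(T−t)} E^Q[C|F_t] is easy to illustrate numerically: simulate S_T under Q and compare the discounted expected call payoff with the Black-Scholes formula. An illustrative sketch (parameters chosen arbitrarily; function names are mine):

```python
import numpy as np
from math import erf, exp, log, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, r, sigma, tau):
    y = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

rng = np.random.default_rng(4)
x, K, r, sigma, T = 1.0, 1.0, 0.05, 0.2, 1.0

# Simulate S_T under Q, where dS = r S dt + sigma S dW^Q.
Z = rng.normal(size=1_000_000)
S_T = x * np.exp((r - 0.5 * sigma**2) * T + sigma * sqrt(T) * Z)

# Risk-neutral valuation: V_0 = e^{-rT} E^Q[(S_T - K)^+].
mc_price = exp(-r * T) * np.maximum(S_T - K, 0.0).mean()
print(mc_price, bs_call(x, K, r, sigma, T))
```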

11.7 Black-Scholes analysis for dividend-paying stock

Suppose S pays dividends at a constant dividend yield q. Then the wealth dynamics for a portfolio with H shares of S and H^0 shares of S^0 become

    dX_t = H_t dS_t + qH_t S_t dt + H^0_t dS^0_t,

since the dividend income received in the interval [t, t + dt) is qH_t S_t dt. It is easy to apply the same replication analysis as in Section 11.2 to once again yield the same delta hedging rule as before:

    H_t = v_x(t, S_t),   0 ≤ t ≤ T,

and this time the option pricing function satisfies the BS PDE with dividend yield q, given as

    v_t(t, x) + (r − q)xv_x(t, x) + (1/2)σ²x²v_xx(t, x) − rv(t, x) = 0,

with v(T, x) = h(x) (for a claim with path-independent payoff). We then obtain the risk-neutral pricing formula

    v(t, x) = E^Q[e^{−r(T−t)} h(S_T)|S_t = x],

where E^Q denotes expectation under Q ∼ P, under which S follows

    dS_t = (r − q)S_t dt + σS_t dW^Q_t.

One can then go through a similar computation for the price function c(t, x) of a call option, to obtain

    c(t, x) = xe^{−q(T−t)}Φ(y) − Ke^{−r(T−t)}Φ(y − σ√(T−t)),

    y = (1/(σ√(T−t))) [ log(x/K) + (r − q + (1/2)σ²)(T − t) ].

Notice that we can obtain this formula by the replacement x → xe^{−q(T−t)} in the original BS formula (11.8). Using put-call parity, namely

    c(t, S_t) − p(t, S_t) = S_t e^{−q(T−t)} − Ke^{−r(T−t)},

we can compute the put option price function as

    p(t, x) = Ke^{−r(T−t)}Φ(−y + σ√(T−t)) − xe^{−q(T−t)}Φ(−y).
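A sketch of the dividend-adjusted formulae (function names are mine), with dividend-adjusted put-call parity and the x → xe^{−q(T−t)} replacement as consistency checks:

```python
from math import erf, exp, log, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call_div(x, K, r, q, sigma, tau):
    """Call on a stock paying a continuous dividend yield q."""
    y = (log(x / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return x * exp(-q * tau) * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def bs_put_div(x, K, r, q, sigma, tau):
    """Put on a stock paying a continuous dividend yield q."""
    y = (log(x / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return K * exp(-r * tau) * Phi(-y + sigma * sqrt(tau)) - x * exp(-q * tau) * Phi(-y)

x, K, r, q, sigma, tau = 1.1, 1.0, 0.05, 0.03, 0.25, 0.75
c = bs_call_div(x, K, r, q, sigma, tau)
p = bs_put_div(x, K, r, q, sigma, tau)
# Put-call parity with dividends: c - p = x e^{-q tau} - K e^{-r tau}.
print(c - p, x * exp(-q * tau) - K * exp(-r * tau))
```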

11.8 Time-dependent parameters

It is also straightforward to adapt the BS analysis to deal with time-dependent parameters. So suppose μ, σ, r, q are time-dependent, but not stochastic, that is, the stock price process under P evolves according to

    dS_t = μ(t)S_t dt + σ(t)S_t dW_t,

where μ(t), σ(t) are deterministic functions of time. For a portfolio with H shares of S, the wealth process evolves according to

    dX_t = H_t dS_t + q(t)H_t S_t dt + r(t)(X_t − H_t S_t) dt,

where r(t), q(t) are deterministic functions of time. The replication analysis goes through unchanged from the case with constant parameters, and we get that the hedging strategy for a claim with pricing function v(t, x) is given by the usual delta-hedging rule:

    H_t = v_x(t, S_t),   0 ≤ t ≤ T.

The claim pricing function then solves the BS PDE with time-dependent parameters:

    v_t(t, x) + (r(t) − q(t))xv_x(t, x) + (1/2)σ²(t)x²v_xx(t, x) − r(t)v(t, x) = 0,

with v(T, x) = h(x) (for a claim with path-independent payoff). We then obtain the risk-neutral pricing formula

    v(t, x) = E^Q[ exp( −∫_t^T r(s) ds ) h(S_T) | S_t = x ],

where E^Q denotes expectation under Q ∼ P, under which S follows

    dS_t = (r(t) − q(t))S_t dt + σ(t)S_t dW^Q_t.

By applying the Itô formula to log S it is straightforward to arrive at the law of the log-stock price process under Q:

    Law(log S_T|S_t = x) = N(m, Σ²),

where

    m = log x + ∫_t^T ( r(s) − q(s) − (1/2)σ²(s) ) ds,   Σ² = ∫_t^T σ²(s) ds.

It is then evident that we can obtain BS-style formulae for option prices if we make the following replacements in the standard formulae: r → r̄, q → q̄, σ → σ̄, where

    r̄(T − t) = ∫_t^T r(s) ds,   q̄(T − t) = ∫_t^T q(s) ds,   σ̄²(T − t) = ∫_t^T σ²(s) ds.
12
12.1

Options on futures contracts


The mechanics of futures markets

Consider a futures contract on a non-dividend-paying stock with maturity T . If entered into at time t T , then it obliges the holder to buy the stock at time T for the so-called futures price Ft,T = St er(T t) . The cost of the contract at time t is zero. So the futures contract is a forward contract with maturity T and delivery price K = Ft,T . The change in value of a forward contract with delivery price K and maturity T is df (t, T ; K) = d(St Ker(T
t)

) = dSt rKer(T

t)

) dt.

So the change in value for the holder of a futures contract entered into at time t ought to be the amount df (t, T ; Ft,T ) = dSt rSt dt. Note that dFt,T = d(St er(T
t)

) = er(T

t)

( dSt rSt dt) = er(T

t)

df (t, T ; Ft,T ).

(12.1)

The mechanics of futures markets are such that the holder of a futures contract receives the amount dFt,T in the interval [t, t + dd), despite the fact that this is not the change in value of the associated forward contract. For the rest of this section, the maturity time T of a futures contract will be xed, so we write Ft Ft,T from now on. Denition 12.1. A futures contract with maturity T is a contract which costs nothing to acquire at any time t [0, T ], and is such that after each time interval [t, t + dt) the contract holder receives the amount dFt , where Ft = St er(T t) .

12.2 Options on futures contracts

A European option, with maturity T′ ≤ T, on a futures contract with maturity T, is a contract with payoff h(F_{T′}) at time T′. For instance, a call option with strike K on a futures contract pays (F_{T′} − K)+ at time T′. Note that if T′ = T, then since F_T = S_T, the futures option in this case pays the same as a conventional option on the stock. If one holds a dynamic portfolio of futures contracts with position H = (H_t)_{0≤t≤T}, plus some cash, the associated portfolio wealth at time t evolves according to

dX_t = H_t dF_t + rX_t dt.

Notice that this evolution is precisely that which we would obtain for an asset with price process F and dividend yield q = r. This means that we can value a futures option with BS-style formulae provided we set the underlying asset price to F and the dividend yield to r, as we now demonstrate.

Consider a European futures option with maturity T, and with price process (v(t, F_t))_{0≤t≤T}, where v(t, x) is some function. This evolves according to

dv(t, F_t) = v_t(t, F_t) dt + v_x(t, F_t) dF_t + (1/2)v_xx(t, F_t) d[F]_t.

We attempt to hedge this option with a dynamic portfolio of futures contracts. Imposing the replication condition X_t = v(t, F_t), 0 ≤ t ≤ T, and hence also requiring dX_t = dv(t, F_t), gives the required hedge as

H_t = v_x(t, F_t),    0 ≤ t ≤ T.

Then using the fact that d[F]_t = e^{2r(T−t)} d[S]_t (from (12.1)), we have d[F]_t = σ²F_t² dt (in the BS model), so the futures option price function solves the PDE

v_t(t, x) + (1/2)σ²x² v_xx(t, x) − rv(t, x) = 0,    v(T, x) = h(x).

Hence we obtain, via the Feynman-Kac theorem, the risk-neutral valuation formula

v(t, x) = E^Q[ e^{−r(T−t)} h(F_T) | F_t = x ],

where, under Q, F follows dF_t = σF_t dW_t^Q, with W^Q a Q-BM. Note that F is a Q-martingale, since we have

F_T = F_t exp( σ(W_T^Q − W_t^Q) − (1/2)σ²(T − t) ),    t ≤ T,

so E^Q[F_T | F_t] = F_t for t ≤ T. We observe that we can indeed recover the option price formula from the standard BS model with dividends by treating the futures option as being written on an underlying asset with price process F and dividend yield q = r. Hence, the call price function for a futures option is given by

c(t, x) = e^{−r(T−t)}[ xΦ(y) − KΦ(y − σ√(T−t)) ],    y = (1/(σ√(T−t))) [ log(x/K) + (1/2)σ²(T−t) ].

Lemma 12.2 (Put-call parity). For European call and put futures options with common maturity T and strike K, the prices at t ≤ T satisfy

c(t, F_t) − p(t, F_t) = e^{−r(T−t)}(F_t − K),    0 ≤ t ≤ T.

Proof. One can either observe that the required relation follows from the standard put-call parity relation for an asset with price process F and dividend yield q = r, or else compute that c(T, F_T) − p(T, F_T) = F_T − K. Then a discounted risk-neutral expectation, using the fact that the discounted option prices as well as the futures price are Q-martingales, gives the result.

From the put-call parity relation, the put pricing function is obtained as

p(t, x) = e^{−r(T−t)}[ KΦ(σ√(T−t) − y) − xΦ(−y) ],    y = (1/(σ√(T−t))) [ log(x/K) + (1/2)σ²(T−t) ].
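The two formulae can be checked against the parity relation numerically; the following is a sketch with illustrative parameter values:

```python
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def futures_call(F, K, tau, r, sigma):
    """Call on a futures contract (the formula in the text, i.e. BS with
    underlying F and dividend yield q = r)."""
    y = (log(F / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    return exp(-r * tau) * (F * Phi(y) - K * Phi(y - sigma * sqrt(tau)))

def futures_put(F, K, tau, r, sigma):
    """Put on a futures contract, from the formula below Lemma 12.2."""
    y = (log(F / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    return exp(-r * tau) * (K * Phi(sigma * sqrt(tau) - y) - F * Phi(-y))
```

Since Φ(z) + Φ(−z) = 1, the difference of the two functions collapses to e^{−rτ}(F − K), which is exactly Lemma 12.2.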


13 American options in the BS model

We give an informal treatment of American option pricing in the BS model. The full theory requires the machinery of stopping times and the theory of optimal stopping, both of which are beyond the scope of the course. An American claim with (path-independent) payoff function h(·) and maturity T pays h(S_t) (the so-called intrinsic value) to its holder if exercised at time t ∈ [0, T]. For instance, an American put pays (K − S_t)+ if exercised at t ∈ [0, T]. It would thus never be exercised if S_t ≥ K, so will only be exercised if S_t < K, and we conjecture this would happen if the stock price dropped to a low enough level. We conjecture the existence of some critical stock price at each t ∈ [0, T], denoted S_f(t), such that the option is exercised at t ∈ [0, T] if S_t ≤ S_f(t). The function S_f : [0, T] → R+ is called the optimal exercise boundary, and is an example of a free boundary. This means it is not known a priori, and must be computed as part of the solution to the pricing problem. Let (v(t, S_t))_{0≤t≤T} denote the price process of an American claim. We must have

v(t, S_t) ≥ h(S_t),    t ∈ [0, T].

If this were not so, an immediate arbitrage opportunity would ensue: one could buy the option and exercise it immediately to make profit h(S_t) − v(t, S_t). By analogy with the American algorithm we developed in the binomial model, we conjecture that v(t, S_t) is given by

v(t, S_t) = max( h(S_t), lim_{h↓0} E^Q[ e^{−rh} v(t + h, S_{t+h}) | S_t ] ).

Using the Itô formula and the Leibniz rule, we have

lim_{h↓0} (1/h) ( E^Q[v(t + h, S_{t+h}) | S_t] − v(t, S_t) ) = v_t(t, S_t) + rS_t v_x(t, S_t) + (1/2)σ²S_t² v_xx(t, S_t).

Using this along with e^{−rh} = 1 − rh + O(h²), we subtract v(t, S_t) from both sides of the conjectured relation above, divide by h, and let h ↓ 0, to obtain that the pricing function satisfies

max[ h(x) − v(t, x), L_BS v(t, x) ] = 0,    (13.1)

where L_BS denotes the BS operator:

L_BS v(t, x) := v_t(t, x) + rx v_x(t, x) + (1/2)σ²x² v_xx(t, x) − rv(t, x).

We interpret (13.1) as follows. At time t ∈ [0, T] the option holder faces two possibilities:

1. Exercise the option, in which case the maximisation in (13.1) is achieved by the first term, and we have v(t, S_t) = h(S_t) (corresponding to S_t ≤ S_f(t) for a put), and L_BS v(t, S_t) < 0 (so the discounted option price would be decreasing on average, that is, e^{−rt}v(t, S_t) is a Q-supermartingale). We say that the stock price is in the stopping region.

2. Do not exercise the option, in which case the maximisation in (13.1) is achieved by the second term, and we have v(t, S_t) > h(S_t) (corresponding to S_t > S_f(t) for a put), and L_BS v(t, S_t) = 0 (so the discounted option price would be constant on average, that is, e^{−rt}v(t, S_t) is a Q-martingale). We say that the stock price is in the continuation region.

We conclude that the American option price function solves the free boundary problem

v(t, x) ≥ h(x),    L_BS v(t, x) ≤ 0,

with the first inequality holding as equality when we are in the stopping region, in which case the second inequality is strict, or else the first inequality is strict and the second holds as equality, when we are in the continuation region.

Example 13.1 (American put). There is some free boundary S_f(t), which must be determined as part of the solution to the problem, such that the American put pricing function p^A(t, x) satisfies

p^A(t, S_t) > (K − S_t)+,    L_BS p^A(t, S_t) = 0,    if S_t > S_f(t),
p^A(t, S_t) = K − S_t,    L_BS p^A(t, S_t) < 0,    if S_t ≤ S_f(t),

where we have used the fact that S_f(t) ≤ K to write (K − S_t)+ = K − S_t in the stopping region.

13.1 Smooth pasting condition for American put

It turns out (though we do not prove this here) that the pricing function p^A(t, x) is continuous in t and x. We also have the so-called smooth pasting condition (also not proven here)

∂p^A/∂x (t, S_f(t)) = −1.

That is, the pricing function joins the payoff smoothly, and the derivative ∂p^A/∂x (t, x) is continuous in x. This condition is not straightforward to establish rigorously. It cannot be deduced from p^A(t, S_f(t)) = K − S_f(t), because we do not know a priori where S_f(t) lies, and we need an extra condition to fix it and to be able to solve the free boundary problem. In summary, the American put pricing function satisfies

p^A(t, x) ≥ (K − x)+,    L_BS p^A(t, x) = 0,    x > S_f(t)    (continuation region),
p^A(t, x) = K − x,    L_BS p^A(t, x) < 0,    x ≤ S_f(t)    (stopping region),

along with boundary conditions

p^A(t, S_f(t)) = K − S_f(t),    ∂p^A/∂x (t, S_f(t)) = −1,    p^A(T, x) = (K − x)+,    lim_{x→∞} p^A(t, x) = 0.

The free boundary problem can also be written as a so-called linear complementarity problem; see [16], Chapter 7, for more details.
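A standard way to approximate the free boundary problem numerically is a binomial tree, in which the dynamic programming comparison max(continuation value, intrinsic value) is applied at every node. This sketch is not part of the notes; the Cox-Ross-Rubinstein parametrisation used below is one common choice of lattice:

```python
from math import exp, sqrt

def american_put_binomial(x, K, T, r, sigma, n=400):
    """CRR binomial approximation of the American put price.  At each node
    we take the maximum of the discounted continuation value and the
    intrinsic value K - S, mirroring the free boundary problem in the text."""
    dt = T / n
    u = exp(sigma * sqrt(dt))
    d = 1.0 / u
    p = (exp(r * dt) - d) / (u - d)      # risk-neutral up-probability
    disc = exp(-r * dt)
    # terminal payoffs; node (n, j) has stock price x u^j d^(n-j)
    vals = [max(K - x * u ** j * d ** (n - j), 0.0) for j in range(n + 1)]
    for i in range(n - 1, -1, -1):       # backward induction
        for j in range(i + 1):
            cont = disc * (p * vals[j + 1] + (1 - p) * vals[j])
            intrinsic = K - x * u ** j * d ** (i - j)
            vals[j] = max(cont, intrinsic)
    return vals[0]
```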

13.2 Optimal stopping representation*

We do not prove this here, but the American option pricing function for a claim with intrinsic value (h(S_t))_{0≤t≤T} is given by

v(t, x) = sup_{τ∈T(t,T)} E^Q[ e^{−r(τ−t)} h(S_τ) | S_t = x ],    0 ≤ t ≤ T,    (13.2)

where T(t, T) denotes the class of F-stopping times with values in [t, T], and F denotes the underlying Brownian filtration.

13.3 The American call with no dividends

If S pays no dividends, we have c^A(t, x) = c(t, x), that is, the American call has the same value as its European counterpart, and so is never exercised before maturity. For r > 0, here is a simple argument. We have, for t < T and r > 0,

c^A(t, x) ≥ c(t, x) ≥ x − Ke^{−r(T−t)} > x − K.

But since the last term is the exercise value, the American option price is strictly greater than the exercise value for r > 0, so the option is never exercised early. A more rigorous approach, which can establish the property for all r ≥ 0, is to use the optimal stopping representation (13.2) and then to show that (e^{−rt}(S_t − K)+)_{0≤t≤T} is a Q-submartingale (via the conditional Jensen inequality), so tends to rise, making it optimal to wait at all times t < T. Here is the argument.

Lemma 13.2. Let h(x) be a non-negative convex function of x ≥ 0 satisfying h(0) = 0. Then the discounted intrinsic value (e^{−rt}h(S_t))_{0≤t≤T} of an American claim that pays h(S_t) upon exercise is a Q-submartingale.

Proof. Since h is convex, we have

h((1 − λ)x_1 + λx_2) ≤ (1 − λ)h(x_1) + λh(x_2),    0 ≤ λ ≤ 1,    0 ≤ x_1 ≤ x_2.

Taking x_1 = 0, x_2 = x and using h(0) = 0 gives

h(λx) ≤ λh(x),    for all x ≥ 0, λ ∈ [0, 1].    (13.3)

For 0 ≤ s ≤ t ≤ T, 0 ≤ e^{−r(t−s)} ≤ 1, so (13.3) implies that

E^Q[ e^{−r(t−s)}h(S_t) | F_s ] ≥ E^Q[ h(e^{−r(t−s)}S_t) | F_s ].    (13.4)

But using the conditional Jensen inequality and the fact that (e^{−rt}S_t)_{0≤t≤T} is a Q-martingale, we have

E^Q[ h(e^{−r(t−s)}S_t) | F_s ] ≥ h( E^Q[e^{−r(t−s)}S_t | F_s] ) = h(S_s).    (13.5)

Combining (13.4) and (13.5) gives the submartingale property for (e^{−rt}h(S_t))_{0≤t≤T}.

Theorem 13.3. Let h(x) be a non-negative convex function of x ≥ 0 satisfying h(0) = 0. Then the value of the American claim expiring at time T and having intrinsic value (h(S_t))_{0≤t≤T} is the same as the value of the European claim with payoff h(S_T).

Proof. Set s = t and t = T in Lemma 13.2 to obtain

E^Q[ e^{−r(T−t)}h(S_T) | F_t ] ≥ h(S_t),    0 ≤ t ≤ T,    (13.6)

which says that the value of the European claim always dominates the exercise value of the American claim; that is, early exercise is never optimal.

Applying the above results to the call payoff shows that an American call on a non-dividend-paying stock is never exercised early.
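The chain of inequalities for the call can be confirmed numerically with the BS formula; the sample parameters below are illustrative, and the lower bound c ≥ x − Ke^{−rτ} is the standard no-arbitrage bound:

```python
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, tau, r, sigma):
    """Standard BS call (no dividends)."""
    d1 = (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return x * Phi(d1) - K * exp(-r * tau) * Phi(d1 - sigma * sqrt(tau))
```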

14 Exotic options

Any option which is not a plain vanilla call or put is called an exotic option. There are (usually) no markets in these options and they are bought over-the-counter (OTC). Effective risk management of these products is important as they are much less liquid than standard options. They often have discontinuous payoffs and can have large deltas near expiration, which can make them difficult to hedge. We recall that we can price any option with payoff function h(S_T) via the risk-neutral valuation formula

v(t, x) = E^Q[ e^{−r(T−t)} h(S_T) | S_t = x ],    0 ≤ t ≤ T,

where Q denotes the EMM under which the stock price follows dS_t = rS_t dt + σS_t dW_t^Q, and that the risk-neutral pricing formula corresponds to the Feynman-Kac solution of the BS PDE

v_t(t, x) + rx v_x(t, x) + (1/2)σ²x² v_xx(t, x) − rv(t, x) = 0,    v(T, x) = h(x).

Finally, note that the risk-neutral valuation formula generalises to any payoff, not necessarily one that is just a function of the final stock price. For a (possibly path-dependent) European claim payoff C (some F_T-measurable random variable), its price at t ≤ T is given by

V_t = E^Q[ e^{−r(T−t)} C | F_t ],    0 ≤ t ≤ T.

14.1 Digital options

Digital (or binary, or cash-or-nothing) options have discontinuous payoffs. The cash-or-nothing digital call with strike K and maturity T has payoff h^{c/n}(S_T) given by

h^{c/n}(S_T) = 0, if S_T < K;    1, if S_T ≥ K.

The price of a digital call at t ≤ T is c^{c/n}(t, S_t), where

c^{c/n}(t, x) = e^{−r(T−t)} E^Q[ 1_{S_T > K} | S_t = x ],

and the usual computation with Gaussian integrals yields

c^{c/n}(t, x) = e^{−r(T−t)} Φ(y − σ√(T−t)),    y = (1/(σ√(T−t))) [ log(x/K) + (r + (1/2)σ²)(T−t) ].

An alternative way of obtaining the above formula is to observe that the standard call option has price function given by

c(t, x) = e^{−r(T−t)} E^Q[ (S_T − K)1_{S_T > K} | S_t = x ] = xΦ(y) − Ke^{−r(T−t)}Φ(y − σ√(T−t)).

The first and second terms on the RHS are

e^{−r(T−t)} E^Q[ S_T 1_{S_T > K} | S_t = x ] = xΦ(y),
Ke^{−r(T−t)} E^Q[ 1_{S_T > K} | S_t = x ] = Ke^{−r(T−t)}Φ(y − σ√(T−t)),

and the second of these gives the formula for the digital call. A digital (or cash-or-nothing) put option pays 1 if the terminal stock price is less than the strike and zero otherwise, so the put price can be obtained via the put-call parity relation

c^{c/n}(t, S_t) + p^{c/n}(t, S_t) = e^{−r(T−t)},    0 ≤ t ≤ T,

which follows from c^{c/n}(T, S_T) + p^{c/n}(T, S_T) = 1 along with a risk-neutral expectation. Because the payoffs of digital options are discontinuous functions of the terminal value of the asset, some of the sensitivity parameters (or Greeks) are very sensitive to small changes in the asset price as the time approaches maturity. The pin risk associated with these options is that if the price of the underlying oscillates around the strike price near expiration, the hedger will have to buy and sell large numbers of assets very quickly to replicate the option. At some point the risk from small price changes may exceed the maximum liability of the digital. An asset-or-nothing call option has payoff

h^{a/n}(S_T) = 0, if S_T < K;    S_T, if S_T ≥ K.

Once again, a computation using Gaussian integrals can be used to give the price at time t ≤ T. Or we can use the observation that a vanilla call option can be decomposed into a portfolio of an asset-or-nothing option plus a short position in K cash-or-nothing options. We have

c(t, x) = xΦ(y) − Ke^{−r(T−t)}Φ(y − σ√(T−t)) = c^{a/n}(t, x) − Kc^{c/n}(t, x),    0 ≤ t ≤ T.

Hence we have c^{a/n}(t, x) = xΦ(y). Thus, a European call option can be replicated by buying one asset-or-nothing call and selling K cash-or-nothing calls.
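The decomposition c = c^{a/n} − K c^{c/n} and the digital put-call parity relation can be coded directly from the formulae above (parameter values in the test are illustrative):

```python
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def _y(x, K, tau, r, sigma):
    # the quantity y appearing in the digital formulae in the text
    return (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))

def digital_call(x, K, tau, r, sigma):
    """Cash-or-nothing call, paying 1 on {S_T >= K}."""
    y = _y(x, K, tau, r, sigma)
    return exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def digital_put(x, K, tau, r, sigma):
    """Cash-or-nothing put, paying 1 on {S_T < K}."""
    y = _y(x, K, tau, r, sigma)
    return exp(-r * tau) * Phi(sigma * sqrt(tau) - y)

def asset_or_nothing_call(x, K, tau, r, sigma):
    return x * Phi(_y(x, K, tau, r, sigma))

def vanilla_call(x, K, tau, r, sigma):
    # one asset-or-nothing call minus K cash-or-nothing calls
    return asset_or_nothing_call(x, K, tau, r, sigma) - K * digital_call(x, K, tau, r, sigma)
```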


14.2 Pay-later options

Digital options can be used in the valuation of pay-later (or contingent premium) options. A pay-later option is a standard European option which costs nothing to initiate: the holder pays the premium at maturity, and then only if the option is in the money. At maturity time T, therefore, a pay-later call option is equivalent to a portfolio consisting of a long position in a standard call (strike K), plus a short position in a certain number a(t) ∈ R of cash-or-nothing calls (strike K), chosen such that the contract has zero value at initiation (time t ≤ T). In other words, at maturity time T, the value of the pay-later call initiated at t ≤ T is c^{p/l}(T, S_T), given by

c^{p/l}(T, S_T) = c(T, S_T) − a(t)c^{c/n}(T, S_T).

Hence, the price function of the pay-later call is given by

c^{p/l}(t, x) = E^Q[ e^{−r(T−t)}( c(T, S_T) − a(t)c^{c/n}(T, S_T) ) | S_t = x ] = c(t, x) − a(t)c^{c/n}(t, x) = 0,

the last equality by definition. That is, a(t) is chosen so that the pay-later option has zero value at time t. Hence, the required value of a(t) is given by

a(t) = c(t, x)/c^{c/n}(t, x) = xe^{r(T−t)} Φ(y)/Φ(y − σ√(T−t)) − K.
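A quick check that this choice of a(t) does give the pay-later contract zero initial value (the helper names and parameter values are illustrative):

```python
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def _y(x, K, tau, r, sigma):
    return (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))

def vanilla_call(x, K, tau, r, sigma):
    y = _y(x, K, tau, r, sigma)
    return x * Phi(y) - K * exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def digital_call(x, K, tau, r, sigma):
    y = _y(x, K, tau, r, sigma)
    return exp(-r * tau) * Phi(y - sigma * sqrt(tau))

def pay_later_units(x, K, tau, r, sigma):
    """The number a(t) of cash-or-nothing calls sold against the long
    vanilla call so that the pay-later contract has zero initial value."""
    y = _y(x, K, tau, r, sigma)
    return x * exp(r * tau) * Phi(y) / Phi(y - sigma * sqrt(tau)) - K
```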

14.3 Multi-stage options: compounds and choosers

14.3.1 Multi-stage options

Some option contracts allow decisions to be made, or stipulate conditions, at intermediate dates during the life of the contract. These are called multi-stage options. An example is a forward start call option initiated at time zero. At some time T_1 < T, the option holder receives an at-the-money call option (one whose strike is equal to the asset price at T_1, S_{T_1}) with maturity time T. The procedure for valuing an option with expiry at T and some intermediate stage with date T_1 is to first determine the final payoff of the option at time T, then determine the value of this payoff at the intermediate time T_1, and then determine the value at t ≤ T_1 by using the value of the contract at time T_1 as a terminal payoff. Denote the price function of the multi-stage option by v^{m/s}(t, x). Then

v^{m/s}(t, x) = E^Q[ e^{−r(T_1−t)} v^{m/s}(T_1, S_{T_1}) | S_t = x ],    0 ≤ t ≤ T_1.

We shall see an example of this shortly in the valuation of a chooser option. Two other examples of multi-stage options, forward start options and ratchet options, are discussed in Problem Sheet 4.

14.3.2 Compound options

These are options on options. For example, the call-on-a-call option gives the holder the right to buy a call option for a fixed price (or strike) K_1 at maturity time T_1. The underlying call option has strike K_2 and expiry T_2 > T_1. The underlying call has price process (c(t, S_t))_{0≤t≤T_2}. Write c(t, S_t) ≡ c(t, S_t; K_2, T_2) to emphasise the strike and maturity of this call. At time T_1 the price of the compound option is v(T_1, S_{T_1}), given by

v(T_1, S_{T_1}) = ( c(T_1, S_{T_1}; K_2, T_2) − K_1 )+.

Then the value at time t ≤ T_1 given S_t = x will be

v(t, x) = e^{−r(T_1−t)} E^Q[ ( c(T_1, S_{T_1}; K_2, T_2) − K_1 )+ | S_t = x ],    0 ≤ t ≤ T_1.

An analytic expression for this function can be found in terms of the cumulative distribution function of the bivariate normal distribution.
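Short of the bivariate-normal formula, the two-stage valuation can be sketched by Monte Carlo: simulate S_{T_1} under Q, value the inner call there with the BS formula, and discount the compound payoff. This is an illustrative sketch, not the analytic approach mentioned in the text:

```python
import random
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, tau, r, sigma):
    d1 = (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return x * Phi(d1) - K * exp(-r * tau) * Phi(d1 - sigma * sqrt(tau))

def compound_call_mc(x, K1, T1, K2, T2, r, sigma, n=20_000, seed=42):
    """Monte Carlo price of a call-on-a-call: sample S_{T1} from its
    lognormal Q-law, evaluate the inner call by the BS formula, and
    discount the payoff (inner - K1)^+ back to time zero."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s1 = x * exp((r - 0.5 * sigma ** 2) * T1 + sigma * sqrt(T1) * z)
        inner = bs_call(s1, K2, T2 - T1, r, sigma)
        total += max(inner - K1, 0.0)
    return exp(-r * T1) * total / n
```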

14.3.3 Chooser options

A chooser option allows the holder at time T_1 the choice of buying, for an amount K_1, either a call or a put, these options having strike K_2 and maturity T_2 > T_1. The terminal payoff at time T_2 is therefore (S_{T_2} − K_2)+ or (K_2 − S_{T_2})+. At time T_1, the chooser option will be exercised if either of the underlying call or put at that time are worth more than the chooser strike K_1, that is, if c(T_1, S_{T_1}) > K_1 or p(T_1, S_{T_1}) > K_1. The holder of the chooser option will select the vanilla option with the larger T_1-value. Hence the value of the chooser option at T_1 is given by

v(T_1, S_{T_1}) = max( c(T_1, S_{T_1}; K_2, T_2) − K_1, p(T_1, S_{T_1}; K_2, T_2) − K_1, 0 ).

For K_1 = 0, we can use put-call parity to write

v(T_1, S_{T_1}) = c(T_1, S_{T_1}; K_2, T_2) + max( 0, K_2 e^{−r(T_2−T_1)} − S_{T_1} ).

Hence the chooser with strike K_1 = 0 is equivalent to a call of maturity T_2 and strike K_2 plus a put of maturity T_1 and strike K_2 e^{−r(T_2−T_1)}. The value of the chooser with strike K_1 = 0 at t ≤ T_1 is then given by

v(t, S_t) = c(t, S_t; K_2, T_2) + p(t, S_t; K_2 e^{−r(T_2−T_1)}, T_1).
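The put-call parity step used above for the K_1 = 0 chooser is easy to verify numerically at time T_1: max(c, p) = c + max(0, K_2 e^{−r(T_2−T_1)} − S_{T_1}). The parameter values below are illustrative:

```python
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, tau, r, sigma):
    d1 = (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return x * Phi(d1) - K * exp(-r * tau) * Phi(d1 - sigma * sqrt(tau))

def bs_put(x, K, tau, r, sigma):
    # via put-call parity: p = c - x + K e^{-r tau}
    return bs_call(x, K, tau, r, sigma) - x + K * exp(-r * tau)

def chooser_value_T1(s, K2, r, sigma, tau):
    """Value at T1 of the K1 = 0 chooser; tau = T2 - T1."""
    return max(bs_call(s, K2, tau, r, sigma), bs_put(s, K2, tau, r, sigma))
```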

14.4 Barrier options

Barrier options are said to be path-dependent, as the history of the asset price process determines the payout at expiry. A barrier option is activated or de-activated if the asset price crosses a barrier. These options are called weakly path-dependent, as their value only depends on time and the current asset price. There are two types: knock-in options (which are activated if the barrier is breached) and knock-out options (which are de-activated if the barrier is breached). The knock-in options are classified as follows:

1. An up-and-in option is activated if the barrier is hit from below.

2. A down-and-in option is activated if the barrier is hit from above.

The knock-out options are classified as follows:

1. An up-and-out option is de-activated (so becomes worthless) if the barrier is hit from below.

2. A down-and-out option is de-activated (so becomes worthless) if the barrier is hit from above.

A reverse knock-out option is one which has a knock-out condition when the option is in-the-money. In this case there is a discontinuous payoff and the delta hedging of such options is subject to pin risk. Call options with a barrier allow a considerable price reduction over the plain vanilla call. If you want to buy a call, and believe the asset price will not fall very much, then a down-and-out call is much cheaper than a standard call (though of course you run the risk of losing the option if the barrier is crossed). More complex barrier options, with multiple barriers, can be constructed. For example, a double knock-out option has two barriers, say B_2 > B_1, and is de-activated if the barrier B_1 is breached from above, and also if the barrier B_2 is breached from below. American digital options can be viewed as barrier options, since they become active, and are exercised, as soon as the asset price reaches the strike. For instance, an American cash-or-nothing call pays h^{c/n}(S_t) if exercised at t ∈ [0, T], where

h^{c/n}(S_t) = 0, if S_t < K;    1, if S_t ≥ K.

If the current asset price lies above the strike, the option is immediately exercised, as there will be no greater payoff (and might be a lesser payoff) from waiting. If the current asset price is below the strike, the option would not be exercised, but as soon as the strike is breached from below, the option is exercised. Barrier options can sometimes incorporate a rebate. In this case, if the option knocks out, then the holder receives a rebate R. This is equivalent to adding an American digital option whose strike is equal to the barrier. For example, consider a down-and-out call with barrier B and rebate R. If the barrier is breached from above, the call is cancelled, and the holder receives cash R. Hence, the holder of the down-and-out call with rebate holds a standard down-and-out call plus a long position in R American cash-or-nothing puts, each with strike B and payoff g^{c/n}(S_t) if exercised at t ∈ [0, T], given by

g^{c/n}(S_t) = 1, if S_t < B;    0, if S_t ≥ B.

14.4.1 PDE approach to valuing barrier options

We outline a PDE approach to valuing barrier options, by considering a down-and-out call option. We only need to consider knock-out options, as the value of knock-in options can be found from the observation that a portfolio of a knock-out option plus a knock-in option with the same barrier and strike is equivalent to a standard option. Hence, for a down-and-out call with price c^{d/o}(t, S_t) and a down-and-in call with price c^{d/i}(t, S_t) at time t ∈ [0, T], we have

c^{d/o}(t, S_t) + c^{d/i}(t, S_t) = c(t, S_t),

with c(t, S_t) the price of a call with the same strike as the two barrier options. First consider the case where the barrier lies below the strike: B < K. Denote the running minimum of the stock price by

m_t := min_{0≤s≤t} S_s,    0 ≤ t ≤ T.

The payoff of the down-and-out call is

h^{d/o}(S_T, m_T) = (S_T − K)+ 1_{m_T > B}.

Suppose that at t ∈ [0, T], we have m_t > B; then the barrier has not been breached from above, and hence we also have S_t > B. In this case the option has not been cancelled. Assuming (as we have) that the down-and-out call option price is a function only of time and current stock price (this can be justified if we were to use a probabilistic approach to the valuation, as discussed briefly further below), we write it as c^{d/o}(t, S_t), for S_t > B. In this region of the (t, S_t)-plane, the option price will therefore satisfy the BS PDE

L_BS c^{d/o}(t, S_t) = 0,    for S_t > B.

Recall that for a standard call we would have boundary conditions c(T, S_T) = (S_T − K)+, c(t, 0) = 0, lim_{x→∞} c(t, x) ∼ x. For the barrier option, we would still have the first and third of these as long as the option is still active, but since the option is cancelled (and hence becomes worthless) as soon as the barrier is breached from above, in place of the boundary condition at zero stock price we must instead have the boundary condition at the barrier:

c^{d/o}(t, B) = 0.    (14.1)

So the problem of valuing the down-and-out call becomes one of finding a solution to the BS PDE subject to this altered boundary condition.


We perform the change of variables given in Problem Sheet 3 which converts the BS PDE to the heat equation. We dene y(x) := log and then write u(, y) := c(t, x) exp(a by), K 1 a := (k + 1)2 , 4 1 b := (k 1), 2 k := 2r , 2 x , K (t) := 1 2 (T t), 2

where c(t, x) is the vanilla call option price function given St = x, for t [0, T ]. Then u satises the heat equation with an appropriate initial condition: u 2u (, y), (, y) = y 2 u(0, y) = exp 1 (k + 1)y exp 2 1 (k 1)y 2
+

=: hc (y).

(14.2)

We apply the same change of variables to the down-and-out call. That is, we write cd/o (t, x) = Kud/o ( (t), y(x)) exp (a (t) + by(x)) , for some function ud/o (, y). The value of y corresponding to the barrier level B is yB := log(B/K), so the boundary condition (14.1) becomes ud/o (, yB ) = 0. (14.3)

(Notice also that limx cd/o (t, x) = x translates to limy ud/o (, y) = exp(a + (1 b)y.) Since the down-and-out call is a standard call for x > B, we have ud/o (, y) = u(, y), ud/o (0, y) = hc (y), for y > yB .

So the function ud/o (, y) satises the heat equation with initial condition ud/o (0, y) = hc (y) for y > yB , along with the boundary condition (14.3). Having translated the BS PDE to the heat equation, we can now apply the so-called method of images to the problem. The argument is as follows. The problem (14.2) for the standard call option is analogous to the ow of heat in an innite rod (since < y < ) with initial condition u(0, y) = hc (y). The corresponding problem for the down-and-out call is analogous to the ow of heat in a semi-innite rod with the temperature held at zero at one end, corresponding to the boundary condition (14.3). The method of images relies on the observation that solutions to the heat equation are unaffected by translation or reection of the spatial co-ordinate: if u(, y) solves the heat equation, then so do u(, y + y0 ) and u(, y + y0 ), for any constant y0 . To apply the method of images to the down-and-out problem (in the semi-innite interval) we solve a problem in an innite interval made up of two semi-innite problems with equal and opposite initial temperature distributions which cancel at the point yB , so that we get the correct boundary condition (14.3). We thus reect the initial data about the point yB , and at the same time change its sign (thus creating an image solution) and combine the original solution and the image solution so as to respect the boundary condition (14.3). The required initial condition is thus given by ud/o (0, y) = hc (y) hc (2yB y), which automatically satises ud/o (0, yB ) = 0. The solution for the down-and-out call at arbitrary time is given by ud/o (, y) = u(, y) u(, 2yB y), (14.4) where u is the solution of the vanilla call problem, and by construction we respect the boundary condition (14.3). In other words we have written ud/o (, y) = u(, y) + u0 (, y), 98

where u0 (, y) = u(, 2yB y) is a solution of a problem on an innite interval with antisymmetric initial data. Translating the solution (14.4) into the original variables, we have cd/o (t, x) = Kud/o (, y)ea +by = K(u(, y) u(, 2yB y))ea +by = c(t, x) Ku(, 2yB y))ea +by . But for any y R we have Using 2yB y = log we obtain Ku(, 2yB y))ea +by = c(t, B 2 /x)e2b(yyB ) , or x B Hence the down-and-out price function is given by Ku(, 2yB y) = cd/o (t, x) = c(t, x) x B
12r/ 2

Ku(, y) = c(t, Key )ea by . B 2 /x K

c(t, B 2 /x).

12r/ 2

c(t, B 2 /x).
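The image-solution formula can be sanity-checked in code: it vanishes on the barrier and is dominated by the vanilla call. The parameter values are illustrative; the function is only meaningful for B < K and x > B:

```python
from math import log, sqrt, exp, erf

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(x, K, tau, r, sigma):
    d1 = (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return x * Phi(d1) - K * exp(-r * tau) * Phi(d1 - sigma * sqrt(tau))

def down_and_out_call(x, K, B, tau, r, sigma):
    """Down-and-out call with barrier B < K, by the image formula:
    c^{d/o}(t, x) = c(t, x) - (x/B)^{1 - 2r/sigma^2} c(t, B^2/x)."""
    power = 1.0 - 2.0 * r / sigma ** 2
    return bs_call(x, K, tau, r, sigma) - (x / B) ** power * bs_call(B * B / x, K, tau, r, sigma)
```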

If the barrier is above the strike, B > K, then the payoff of the down-and-out call is discontinuous. The payoff at the terminal time is that of a standard call option with strike B together with (B − K) cash-or-nothing calls of strike B. We can apply the same reasoning as above to construct the reflected solution, and we find that the price function of such a down-and-out call is given by

c^{d/o}(t, x) = c(t, x; B) + (B − K)c^{c/n}(t, x; B) − (x/B)^{1−2r/σ²} [ c(t, B²/x; B) + (B − K)c^{c/n}(t, B²/x; B) ],

where c(t, x; B) and c^{c/n}(t, x; B) denote the price functions of a vanilla call and a cash-or-nothing call of strike B.


References
[1] T. Björk, Arbitrage theory in continuous time, Oxford University Press, third ed., 2009.

[2] F. Black and M. Scholes, The pricing of options and corporate liabilities, J. Polit. Econ., 81 (1973), pp. 637-659.

[3] M. H. A. Davis and A. Etheridge, Louis Bachelier's Theory of Speculation: the origins of modern finance, Princeton University Press, 2006.

[4] R. Durrett, Probability: theory and examples, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, fourth ed., 2010.

[5] A. Etheridge, A course in financial calculus, Cambridge University Press, Cambridge, 2002.

[6] G. R. Grimmett and D. R. Stirzaker, Probability and random processes, Oxford University Press, New York, third ed., 2001.

[7] J. M. Harrison and S. R. Pliska, Martingales and stochastic integrals in the theory of continuous trading, Stochastic Process. Appl., 11 (1981), pp. 215-260.

[8] J. C. Hull, Options, futures and other derivatives, Pearson, eighth ed., 2011.

[9] J. Jacod and P. Protter, Probability essentials, Universitext, Springer-Verlag, Berlin, second ed., 2003.

[10] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus, vol. 113 of Graduate Texts in Mathematics, Springer-Verlag, New York, second ed., 1991.

[11] R. C. Merton, Theory of rational option pricing, Bell J. Econom. and Management Sci., 4 (1973), pp. 141-183.

[12] B. Øksendal, Stochastic differential equations, Universitext, Springer-Verlag, Berlin, sixth ed., 2003. An introduction with applications.

[13] S. E. Shreve, Stochastic calculus for finance. I, Springer Finance, Springer-Verlag, New York, 2004. The binomial asset pricing model.

[14] S. E. Shreve, Stochastic calculus for finance. II, Springer Finance, Springer-Verlag, New York, 2004. Continuous-time models.

[15] D. Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.

[16] P. Wilmott, S. Howison, and J. Dewynne, The mathematics of financial derivatives, Cambridge University Press, Cambridge, 1995. A student introduction.

100
