
Kyriakos Chourdakis

FINANCIAL ENGINEERING
An introduction using the Matlab system
December 2007

Contents

1 Elements of stochastic calculus
   1.1 The sample space
       σ-algebras and Borel sets
       Generated σ-algebras
       σ-algebras and information
   1.2 Measures and probability
       Measurable functions
       Probability measures
       Equivalent probability measures
       Conditional probability
       Expectations
       Independence
   1.3 Stochastic process
       Filtration
       Distributions of a process
   1.4 Brownian motion and diffusions
       Properties of the Brownian motion
       The Brownian motion is a martingale
       The Brownian Motion is Gaussian, Markov and continuous
       The Brownian motion is a diffusion
       The Brownian motion is wild
       Dealing with diffusions
   1.5 Stochastic differential equations
       Variation processes and the Itô integral
       The Stratonovich integral
       Itô diffusions and Itô processes
       Itô's formula
   1.6 The partial differential equation approach
       Generators
       Stopping times
   1.7 The Feynman-Kac formula
   1.8 Girsanov's transformation

2 The Black-Scholes world
   2.1 The original derivation
       The Black-Scholes assumptions
       The replicating portfolio
       Arbitrage opportunities
       The Black-Scholes partial differential equation
   2.2 The fundamental theorem of asset pricing
       The fundamental theorem of asset pricing and Girsanov's theorem
       A second derivation of the Black-Scholes formula
       Expectation under the true measure P
       Expectation under the risk neutral measure Q
       The Feynman-Kac form
   2.3 Exotic options
       Exercise timing
       Payoff structures
       Path dependence
   2.4 The Greeks
       The Delta
       Dynamic Delta hedging
       Gamma
       Dynamic Delta-Gamma hedging
       Gamma and uncertain volatility
       Vega
       Dividends and foreign exchange options
   2.5 Implied volatilities
   2.6 Stylized facts
       Leptokurtosis
       Skewness
       Volatility features
       Price discontinuities

3 Finite difference methods
   3.1 Derivative approximations
   3.2 Parabolic PDEs
       A PDE as a system of ODEs
       The grid
       Explicit finite differences
       Stability and convergence
       Implicit finite differences
       The Crank-Nicolson and the θ-method
       Boundaries
   3.3 A PDE solver in Matlab
       Plain vanilla options
       Early exercise features
       Barrier features
   3.4 Computing the Greeks
   3.5 Multidimensional PDEs
       Finite difference approaches
       Boundary conditions
       Alternative direction implicit methods
       A two-dimensional solver in Matlab
   3.6 Extensions

4 Transform methods
   4.1 The setup
       Fourier transforms
       Characteristic functions
       The dampened cumulative density
   4.2 Option pricing using transforms
       The Delta-Probability decomposition
       The Fourier transform of the modified call
   4.3 An example in Matlab
       The characteristic functions
       Numerical Fourier inversion
   4.4 Applying Fast Fourier Transform methods
       FFT inversion for the probability density function
       FFT inversion for European call option pricing
   4.5 The fractional FFT
   4.6 Adaptive FFT methods and other tricks
   4.7 Summary

5 Maximum likelihood estimation
   5.1 The likelihood function
   5.2 Properties of the ML estimators
       The score and the information matrix
       Consistency and asymptotic normality
       Hypothesis testing and confidence intervals
   5.3 Some examples
       Linear ARMA models
       Lévy models
   5.4 Likelihood ratio tests
   5.5 The Kalman filter

6 Volatility
   6.1 Some general features
       Historical volatility
       Implied volatility
       The implied volatility surface
       Two modeling approaches
   6.2 Autoregressive conditional heteroscedasticity
       The Arch model
       The Garch model
       The Garch likelihood
       Estimation examples
       Other extensions
       Garch option pricing
       Utility based option pricing
       Distribution based
       The Heston and Nandi model
   6.3 The stochastic volatility framework
       The Hull and White model
       The Stein and Stein model
       The Heston model
       Girsanov's theorem and option pricing
       Example: The Heston model
       The PDE approach
       The Feynman-Kac link
       Example: The Heston model
       Estimation and filtering
       Calibration
       Calibration example
   6.4 The local volatility model
       Interpolation methods
       Implied densities
       Local volatilities

7 Fixed income securities
   7.1 Yields and compounding
   7.2 The yield curve
       The Nelson-Siegel-Svensson parametrization
       The dynamics of the yield curve
       The forward curve
   7.3 The short rate
       Short rate and bond pricing
       The hedging portfolio
       The price of risk
   7.4 One-factor short rate models
       The Vasicek model
       Lognormal models
       The CIR model
   7.5 Models with time varying parameters
       The Ho-Lee model
       The Hull-White model
       Interest rate trees
       Calibration of interest rate trees
       The first stage
       The second stage
       Pricing and price paths
       The Black-Karasinski model
   7.6 Multi-factor models
       Factors and principal component analysis
   7.7 Forward rate models
       Calibration of HJM models
       Short versus forward rate models
   7.8 Bond derivatives
       The Black-76 formula
   7.9 Changes of numéraire
   7.10 The Libor market model

8 Credit risk

A Using Matlab with Microsoft Excel
   A.1 Setting up Matlab with the C/C++ compiler
   A.2 Writing the Matlab functions
   A.3 Writing the VBA code
   A.4 The Excel add-in
   A.5 Invoking and packaging

References

Index

Figures

1.1  Construction of a Brownian motion trajectory
1.2  Zooming into a Brownian motion sample path
1.3  A sample path of an Itô integral
1.4  Asset price trajectories
2.1  Behavior of a Call option Delta
2.2  Dynamic Delta hedging
2.3  Sample output of the dynamic Delta hedging procedure. A call option is sold at time t = 0 and is subsequently Delta hedged to maturity
2.4  Behavior of a call option Gamma
2.5  Dynamic Delta-Gamma hedging
2.6  Histograms for Delta and Delta-Gamma hedging
2.7  Delta hedging with uncertain volatility
2.8  Behavior of a call option Vega
3.1  Finite difference approximation schemes
3.2  A two-dimensional grid
3.3  The Explicit FDM
3.4  The Implicit FDM
3.5  The Crank-Nicolson FDM
3.6  Early exercise region for American options
3.7  European versus American option prices
3.8  Greeks for American and European puts
3.9  Oscillations of the Greeks in FDM
3.10 The structure of the Q-matrix that approximates a two-dimensional diffusion
3.11 Two-dimensional PDE grid
4.1  Damping the transform
4.2  Finite difference approximation schemes
4.3  Numerical Fourier inversion using quadrature
4.4  Normal density function using Fourier inversion
4.5  Normal inverse Gaussian density function using Fourier inversion
4.6  Comparison of the FFT and the fractional FFT
5.1  Example of likelihood functions
5.2  Bias and asymptotic normality
6.1  Historical volatility
6.2  Mixtures of normals
6.3  The VIX volatility index
6.4  Implied volatility surface
6.5  Filtered volatility for DJIA and SPX
6.6  Calibrated option prices for Heston's model
6.7  The ill-posed inverse problem
6.8  Implied and local volatilities
6.9  Static arbitrage tests
7.1  Yields curves using the Nelson-Siegel-Svensson parametrization
7.2  Historical yield curves
7.3  Historical Nelson-Siegel parameters
7.4  Simulation of CIR yield curves
7.5  Calibration of the Black-Karasinski model
7.6  Price path for a ten year bond
7.7  Price path for bond options
7.8  Yield curve factor loadings
7.9  Yield and one-year forward curves
7.10 Pull-to-par and bond options
7.11 Cash flows for interest rate caplets and caps
7.12 Typical Black volatilities for caplets and caps
A.1  Screenshots of the Matlab Excel Builder
A.2  The folders created by …
A.3  Screenshot of the … add-in

Listings

1.1  …
2.1  …: Black-Scholes Greeks
2.2  …: Dynamic Delta hedging
3.1  …: Payoff and boundaries for a call
3.2  …: Payoff and boundaries for a put
3.3  …: θ-method solver for the Black-Scholes PDE
3.4  …: Implementation of the θ-method solver
3.5  …: PSOR method
3.6  …: θ-method solver with early exercise
3.7  …: Implementation of PSOR for an American put
3.8  …: Solver with barrier features
3.9  …: Implementation for a discretely monitored barrier option
3.10 …: PDE approximations for the Greeks
3.11 …: Payoff and boundaries for a two-asset option
3.12 …: Solver for a two dimensional PDE (part I)
3.13 …: Solver for a two dimensional PDE (part II)
3.14 …: Implementation of the two dimensional solver
4.1  …: Characteristic function of the normal distribution
4.2  …: Characteristic function of the normal inverse Gaussian distribution
4.3  …: Trapezoidal integration of a characteristic function
4.4  …: Call pricing using the FFT
4.5  …: Fractional FFT
4.6  …: Call pricing using the FRFT
4.7  …: Integration over an integral using the FRFT
4.8  …: Transform a characteristic function into a cumulative density function
5.1  …, …, and …: Simulation and maximum likelihood estimation of ARMA models
6.1  …: Garch likelihood function
6.2  …: Estimation of a Garch model
6.3  …: Egarch likelihood function
6.4  …: Characteristic function of the Heston model
6.5  …: Sum of squares for the Heston model
6.6  …: Calibration of the Heston model
6.7  …: Nadaraya-Watson smoother
6.8  …: Implied volatility surface smoothing
6.9  …: Tests for static arbitrage
6.10 …: Construction of implied densities and the local volatility surface
7.1  …: Yields based on the Nelson-Siegel-Svensson parametrization
7.2  …: Calibration of the Nelson-Siegel formula to a yield curve
7.3  …: Create Hull-White trees for the short rate
7.4  …: Compute the price path of a payoff based on a Hull-White tree for the short rate
7.5  …: The price path of a payoff based on the Hull-White tree when American features are present
7.6  …: Implementation of the Black-Karasinski model using a Hull-White interest rate tree
A.1  Matlab file …
A.2  Matlab file …
A.3  VBA module (…)
A.4  VBA Activation Handlers (…)
A.5  VBA User Input Handlers I (…)
A.6  VBA User Input Handlers II (…)
A.7  VBA Add-in installation (…)

1 Elements of stochastic calculus

In order to understand the evolution of asset prices in continuous time in general, and in the Black-Scholes framework in particular, one needs some tools from stochastic calculus. In this part we will give an overview of the main ideas from a probabilistic point of view. In the next chapter we will put these ideas to use in developing derivative pricing within the Black-Scholes paradigm. Our objective at this stage is to construct stochastic processes in continuous time that have the potential to capture the probabilistic/dynamic behavior of assets. We want to be as rigorous as possible in our definitions without leaving any exploitable loopholes, but we don't want to be too abstract. The theory is covered (in increasing mathematical complexity) in Øksendal (2003), the two volumes of Rogers and Williams (1994a,b) and Protter (2004). Expositions that have some elements of maths relating to finance are (again in increasing complexity) Hull (2003), Neftci (2000), Bingham and Kiesel (2000), and Shreve (2004a,b).

1.1 The sample space
We like to think of stochastic processes (or asset prices in our case) as the outcome of an experiment or as the result of the state of nature. Each state of the world is a configuration that potentially affects the value of the stochastic process.

Definition 1. We will denote the set of all states of the world with $\Omega$, and we call it the state space or sample space. The elements $\omega \in \Omega$ are called the states of the world, sample points or sample paths.

Of course these states of the world are very complicated multidimensional configurations, and are typically not even numerical. In most cases they are not directly revealed to us (the observer).

Definition 2. A random variable quantifies the outcome of the experiment, by mapping events to real numbers (or vectors), and this is what we actually observe.


Therefore, a random variable $X$ is just a function
$$X : \Omega \to \mathbb{R} : \omega \mapsto X(\omega)$$

Example 1. As an example, say that we toss a coin three times; the sample space will be the set (with $2^3 = 8$ elements)
$$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$

This sample space is not numerical, but we can define the random variables $X$ and $Y$ in the following way:
1. $X = X(\omega) = $ the number of $H$ in $\omega$
2. $Y = Y(\omega) = |\,$number of $H$ in $\omega$ $-$ number of $T$ in $\omega\,|$

In the following table we summarize the possible sample space outcomes, together with the corresponding values of the two different random variables $X$ and $Y$.

  ω       HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
  X(ω)     3    2    2    2    1    1    1    0
  Y(ω)     3    1    1    1    1    1    1    3
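Since the book's examples are in Matlab, the table can also be generated programmatically. The following is a minimal sketch of ours (the variable names are not from the book's listings):

```matlab
% Coin-toss sample space and the random variables X and Y.
% X counts heads; Y is the absolute difference between heads and tails.
Omega = {'HHH','HHT','HTH','THH','HTT','THT','TTH','TTT'};
X = zeros(1,8); Y = zeros(1,8);
for i = 1:8
    nH = sum(Omega{i} == 'H');   % heads in sample point i
    nT = sum(Omega{i} == 'T');   % tails in sample point i
    X(i) = nH;
    Y(i) = abs(nH - nT);
end
disp([X; Y])   % reproduces the two rows of the table above
```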

Apparently, the random variable $X$ counts the number of heads thrown, while the random variable $Y$ counts the absolute difference between the heads and tails throws. Of course this implies that the probabilistic behavior of the random variable will depend solely on the probabilistic behavior of the states of the world. In particular, we can write the probability (although we have not formally defined yet what a probability is)
$$\Pr[X(\omega) = x] = \Pr[\{\omega \in \Omega : X(\omega) = x\}]$$
In the example above the sample space was small and discrete, and for that reason the analysis was pretty much straightforward. Unfortunately, this is not usually the case, and a typical sample space is not discrete. If the sample space is continuous, expressions like $\Pr[\omega]$ for elements $\omega \in \Omega$ will be mostly zero, and therefore not of much interest. Therefore, rather than assigning probabilities to elements of $\Omega$ we need to assign them to subsets of $\Omega$. A natural question that follows is this: can we really assign probabilities to any subset of $\Omega$, no matter how weird and complicated it is? The answer to this question is generally no. We can construct sets, like the Vitali set, for which we cannot define probabilities.¹ Subsets of the sample space that are nice enough to allow us to define probabilities on them are called sigma algebras.

¹ Note that this does not mean that the probability is zero; it means that even if we assume that the probability is zero we are driven to paradoxes. In fact, the probability of such a set cannot exist, and we cannot allow such a set to be considered for that purpose.

Definition 3. A subset of the power set, $\mathcal{F} \subseteq \mathcal{P}(\Omega)$, is called a $\sigma$-algebra on $\Omega$ if complements and countable unions also belong to the set $\mathcal{F}$:
1. $F \in \mathcal{F} \Rightarrow F^c \in \mathcal{F}$ (complements)
2. $F_1, F_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_{i=1}^{\infty} F_i \in \mathcal{F}$ (countable unions)

It turns out that $\sigma$-algebras are just the families of sets we need to define probabilities upon, as they are nice enough not to lead us to complications and paradoxes. Probabilities will be well defined on elements of $\mathcal{F}$. The elements of a $\sigma$-algebra are therefore called events. As we will see, probabilities are just special cases of a large and very important class of set functions called measures. It is measures in general that are defined on $\sigma$-algebras. The pair $(\Omega, \mathcal{F})$ is called a measurable space, to indicate the fact that it is prepared to be measured.

Example 2. For a sample space $\Omega$ there will exist many $\sigma$-algebras, and some will be larger than others. For the specific sample space of the previous example, where a coin is tossed three times, some $\sigma$-algebras that we may define are the following:
1. The minimal $\sigma$-algebra is $\mathcal{F}_0 = \{\emptyset, \Omega\}$. It is apparent that this is the smallest possible set that will satisfy the conditions.
2. $\mathcal{F}_1 = \{\emptyset, \{HHH, HHT, HTH, HTT\}, \{THH, THT, TTH, TTT\}, \Omega\}$ is another $\sigma$-algebra on $\Omega$. Apparently $\mathcal{F}_0 \subset \mathcal{F}_1$.
3. The powerset of $\Omega$ is also a $\sigma$-algebra, $\mathcal{F} = \mathcal{P}(\Omega)$. In fact this is the largest (or maximal) $\sigma$-algebra that we can define on a discrete set. If $\Omega$ was continuous, say an interval of the real line, the powerset is not a $\sigma$-algebra we can use, as it includes elements like the Vitali set. The largest useful $\sigma$-algebra that we use in this case is the Borel $\sigma$-algebra.

An example of a set that is not a $\sigma$-algebra is $\mathcal{E} = \{\emptyset, \{HHH\}, \{HTH\}, \{THH\}, \{TTH\}, \Omega\}$, since the complement of $\{HHH\}$, namely $\Omega \setminus \{HHH\}$, is not in $\mathcal{E}$.

We usually work with subsets of the real numbers, and, as we hinted in the previous subsection, $\sigma$-algebras that are defined on the real numbers (or any other Euclidean space) are very important. We saw that when we want to define a large $\sigma$-algebra, the powerset is not an option, since it includes pathological cases. The Borel algebra takes its place for such sets.

Definition 4. More formally, the Borel ($\sigma$-)algebra is the smallest $\sigma$-algebra that contains all open sets of $\mathbb{R}$ (or $\mathbb{R}^n$).
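The closure conditions of Definition 3 can be checked mechanically for the finite family $\mathcal{F}_1$ of Example 2. A sketch of ours: events are encoded as 0/1 rows over the eight sample points, and on a finite sample space closure under complements and pairwise unions is all that countable closure can require.

```matlab
% Check that the family from Example 2 is a sigma-algebra on Omega.
% Rows are indicator vectors over the 8 sample points listed earlier.
Fam = [0 0 0 0 0 0 0 0;    % empty set
       1 1 1 0 1 0 0 0;    % {HHH,HHT,HTH,HTT} (first toss is H)
       0 0 0 1 0 1 1 1;    % {THH,THT,TTH,TTT} (first toss is T)
       1 1 1 1 1 1 1 1];   % Omega
ok = true;
for i = 1:size(Fam,1)
    ok = ok && ismember(1 - Fam(i,:), Fam, 'rows');               % complements
    for j = 1:size(Fam,1)
        ok = ok && ismember(max(Fam(i,:), Fam(j,:)), Fam, 'rows');% unions
    end
end
disp(ok)   % prints 1: the family is closed, hence a sigma-algebra
```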


Roughly speaking, Borel sets are constructed from open intervals in $\mathbb{R}$, by taking in addition all possible unions, intersections and complements. We denote the Borel algebra with $\mathcal{B} = \mathcal{B}(\mathbb{R}^n)$. In fact, it is very difficult to find a set that does not belong to the Borel algebra, and the ones that don't are so complicated that we cannot enumerate their elements.

So far we have defined $\sigma$-algebras and we have shown ways to describe them by expressing some property of their elements. We can also define a $\sigma$-algebra based on a collection of reference subsets of $\Omega$.

Definition 5. In particular, given a family $\mathcal{G}$ of subsets of $\Omega$, there is a $\sigma$-algebra which is the smallest one that contains $\mathcal{G}$. This is the $\sigma$-algebra generated by $\mathcal{G}$, and we denote it with $\mathcal{F}(\mathcal{G}) = \mathcal{F}_{\mathcal{G}}$.

The generated $\sigma$-algebra will be equal to the intersection of all $\sigma$-algebras that contain $\mathcal{G}$ (since it is the smallest one with that property)
$$\mathcal{F}(\mathcal{G}) = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-algebra on } \Omega \text{, and } \mathcal{G} \subseteq \mathcal{F}\}$$

A random variable can also create a $\sigma$-algebra. Given a random variable $X$, there is a $\sigma$-algebra which is the smallest one that contains the pre-images of $X$,
$$X^{-1}(G) : G \subseteq \mathbb{R} \text{, and } G \text{ is open}$$
This is the $\sigma$-algebra generated by $X$, and it is denoted by
$$\mathcal{F}(X) = \mathcal{F}_X = \{X^{-1}(B) : B \in \mathcal{B}\}$$

Example 3. Following our coin example, the random variable $X$ will generate the $\sigma$-algebra
$$\mathcal{F}_X = \{\emptyset, \{HHH\}, \{HHT, HTH, THH\}, \{HTT, THT, TTH\}, \{TTT\}, \text{all complements, all unions, all intersections}\}$$
It should be straightforward to verify that $\mathcal{F}_X$
1. is a $\sigma$-algebra
2. contains all sets $X^{-1}(G)$, for $G \in \mathcal{B}$
3. is the smallest such set

For $Y$, the generated $\sigma$-algebra is
$$\mathcal{F}_Y = \{\emptyset, \{HHT, HTH, THH, HTT, THT, TTH\}, \{HHH, TTT\}, \text{all complements, all unions, all intersections}\}$$


In the theory of stochastic processes $\sigma$-algebras are closely linked with information. Intuitively, the generated $\sigma$-algebra $\mathcal{F}_X$ captures the information we acquire by observing realizations of the random variable $X$. Knowing the realization $X = x$ allows us to decide in which elements of the $\sigma$-algebra $\mathcal{F}_X$ the sample point $\omega$ belongs. The sample point $\omega$ is the one that created the realization $X(\omega) = x$.

If we have two random variables $X$, $Y$ on the same measurable space $(\Omega, \mathcal{F})$, and if $\mathcal{F}_Y \subseteq \mathcal{F}_X$, then knowing the realization of $X$ gives us enough information to determine what the realization of $Y$ is, without observing it directly. In particular, there exists a function $f$ such that $Y = f(X)$. If in addition the inverse set relationship does not hold, $\mathcal{F}_X \not\subseteq \mathcal{F}_Y$, then this function is not invertible, and knowledge of the realization of $Y$ does not determine $X$ uniquely. That is to say, observing $Y$ does not offer us enough information to infer the value of $X$. In the case where the $\sigma$-algebras are the same, $\mathcal{F}_Y = \mathcal{F}_X$, the two variables contain exactly the same information: observing one is the same as observing the other.

Example 4. In our coin example it is easy to confirm that $\mathcal{F}_Y \subseteq \mathcal{F}_X$, but $\mathcal{F}_X \not\subseteq \mathcal{F}_Y$. This will mean that if we are given the realization of the random variable $X$, then we should be able to uniquely determine the realization of the random variable $Y$, but not vice versa. As an example, say that we observe $X = 2$. Then we know that the sample point that was selected from the sample space will belong to the set $F = \{HHT, HTH, THH\}$, since only for these points $X(\omega) = 2$. It is easy to verify that $F \in \mathcal{F}_X$, although $F \notin \mathcal{F}_Y$. In fact, for all $\omega \in F$ we have $Y(\omega) = 1$. Therefore observing $X = 2$ uniquely determines the value $Y = 1$. On the other hand, say that we observe the random variable $Y$, and we have the realization $Y = 3$, indicating that the sample point selected belongs to the set $F' = \{HHH, TTT\}$. Now, of course, $F' \in \mathcal{F}_Y$ but it does not belong to the $\sigma$-algebra generated by $X$, that is $F' \notin \mathcal{F}_X$. In fact, if we are given $Y = 3$ we are not given enough information to decide if $X = 3$ or $X = 0$.
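This asymmetry of information is easy to see numerically; a quick sketch, reusing the Omega, X and Y arrays from the earlier snippet:

```matlab
% Observing X = 2 pins down Y, but observing Y = 3 does not pin down X.
Y(X == 2)   % returns 1 1 1 : Y is determined to be 1
X(Y == 3)   % returns 3 0   : X could be either 3 or 0
```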

1.2 Measures and probability
In the previous section we paved the way for the introduction of the measure function. We introduced the sample space and the sets of its subsets that form $\sigma$-algebras, which are exactly the sets that are well-behaved enough to be measured. We saw that things are relatively straightforward when the sample space is discrete, where a natural $\sigma$-algebra is the powerset. When the sample space is continuous we have to be more careful when constructing $\sigma$-algebras, as there are sets that we need to exclude (like the Vitali set). The Borel algebra is here the natural choice.

Definitions of probability date back as far as Carneades of Cyrene (214-129 BC), a prominent student at Plato's academy. More recently, Abraham De Moivre (1711) and Pierre-Simon Laplace (1812) also attempted to formalize the everyday notion of probability. Modern probability took off in the 1930s, largely inspired by the axiomatic foundations of measure theory by Andrei Nikolayevich Kolmogorov.²

Definition 6. Given a measurable space $(\Omega, \mathcal{F})$ we can define a measure as a function that maps elements of the $\sigma$-algebra to the real numbers, $\mu : \mathcal{F} \to \mathbb{R}$, and also has the following two properties:
1. the measure of the empty set is zero: $\mu(\emptyset) = 0$
2. the measure of disjoint sets is the sum of their measures, also called $\sigma$-additivity: for all $F_1, F_2, \ldots \in \mathcal{F}$ with $F_i \cap F_j = \emptyset$ for all $i \neq j$,
$$\mu\Big(\bigcup_{i=1}^{\infty} F_i\Big) = \sum_{i=1}^{\infty} \mu(F_i)$$

The measure can be thought of as the mathematical equivalent of our everyday notion of measure, as in the length of line segments, the volume of solid bodies, the probability of events, the time needed to travel between points, and so on. After augmenting the measurable space with a measure, the triplet $(\Omega, \mathcal{F}, \mu)$ is called a measure space. The subsets of $\Omega$ that are elements of $\mathcal{F}$ are called measurable sets, indicating that they can be potentially measured by $\mu$. Note that we can define more than one measure on the same measurable space, creating a whole array of measure spaces $(\Omega, \mathcal{F}, \mu_1)$, $(\Omega, \mathcal{F}, \mu_2)$, and so on.

Measurable functions
Based on the notion of measurable sets, we can turn to functions that map from one measurable space $(\Omega, \mathcal{F})$ to another measurable space $(\Gamma, \mathcal{G})$. Of special interest are functions with the property that the pre-images of measurable sets in the destination space are also measurable sets in the departure space.

Definition 7. A function $f$ that maps from a measurable space $(\Omega, \mathcal{F})$ to $(\Gamma, \mathcal{G})$, $f : \Omega \to \Gamma$, is an $(\mathcal{F}, \mathcal{G})$-measurable function if for all $G \in \mathcal{G}$ we have
$$f^{-1}(G) \in \mathcal{F}$$

² In the words of Kolmogorov: "The theory of probability as [a] mathematical discipline can and should be developed from axioms in exactly the same way as geometry and algebra."


If the function maps from $\Omega$ to the Euclidean space $\Gamma = \mathbb{R}^n$, augmented with the Borel $\sigma$-algebra $\mathcal{G} = \mathcal{B}(\mathbb{R}^n)$, then we call the function just $\mathcal{F}$-measurable (shortened to $\mathcal{F}$-meas). A random variable $X$ is indeed a function $X : \Omega \to \mathbb{R}^n$, therefore we can talk of measurable random variables. In particular, by the definition of the generated $\sigma$-algebra, a random variable will always be measurable with respect to the $\sigma$-algebra it generates: $X$ is $\mathcal{F}_X$-meas. If in addition $f$ maps from a Euclidean space $\Omega = \mathbb{R}^m$, also augmented with the corresponding Borel algebra, $\mathcal{F} = \mathcal{B}(\mathbb{R}^m)$, then the function is called just measurable.

Example 5. In our coin example we defined two random variables, $X$ and $Y$, from the sample space of three coin tosses,
$$X, Y : \Omega \to \mathbb{R}$$
Each one of these random variables will induce a measurable space on $\Omega$, via the $\sigma$-algebra it generates: $X$ induces $(\Omega, \mathcal{F}_X)$, and $Y$ induces $(\Omega, \mathcal{F}_Y)$.

By construction, $X$ is $\mathcal{F}_X$-meas and $Y$ is $\mathcal{F}_Y$-meas. On the other hand, while $Y$ is $\mathcal{F}_X$-meas, $X$ is not an $\mathcal{F}_Y$-meas random variable. In terms of information, $Y$ being $\mathcal{F}_X$-meas means that knowing $X$ will determine $Y$, but not the other way round.

Probability measures
As we indicated in the last subsections, measures are the mathematical equivalent of our everyday notion of measure. In the context of stochastic processes we are not interested in general measures, but in a small subset: the probability measures.

Definition 8. A probability measure $P$ is just a measure on $(\Omega, \mathcal{F})$, with the added property that $P(\Omega) = 1$. The measure space $(\Omega, \mathcal{F}, P)$ is called a probability space.

Therefore, for a function $P : \mathcal{F} \to \mathbb{R}$ to be a probability measure there are three requirements:
1. $P(\emptyset) = 0$
2. $P(\Omega) = 1$
3. for all $F_1, F_2, \ldots \in \mathcal{F}$ with $F_i \cap F_j = \emptyset$ for all $i \neq j$,
$$P\Big(\bigcup_{i=1}^{\infty} F_i\Big) = \sum_{i=1}^{\infty} P(F_i)$$


It is obvious that probability measures are not unique on a measurable space. Given $(\Omega, \mathcal{F})$ we can define different probability spaces $(\Omega, \mathcal{F}, P_1)$, $(\Omega, \mathcal{F}, P_2)$, and so on. Given a probability space and a random variable $X : \Omega \to \mathbb{R}^n$, we can define a probability measure on the Euclidean space $\mathbb{R}^n$ endowed with its Borel algebra, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, in the following way:
$$P_X : \mathcal{B}(\mathbb{R}^n) \to [0, 1] : P_X(B) = P(X^{-1}(B))$$
It is straightforward to verify that $P_X$ is a probability measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. It is important to remember that the probability measure is defined on events of the sample space, but it induces a probability measure on the real numbers through random variables. That means that the same random variable $X$ can induce different probability measures on $\mathbb{R}^n$, based on different probability spaces. For the same $B \in \mathcal{B}(\mathbb{R}^n)$,
$$(\Omega, \mathcal{F}, P): \; P_X(B) = P(X^{-1}(B)) \qquad (\Omega, \mathcal{F}, Q): \; Q_X(B) = Q(X^{-1}(B))$$
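As an illustration (ours, with made-up numbers): on the coin space, a fair coin and a biased coin give two measures $P$ and $Q$ on the same measurable space, and the same $X$ induces different measures on $\{0, 1, 2, 3\}$. The sketch reuses the X array from the earlier snippet:

```matlab
% The same random variable X induces different measures P_X and Q_X
% under a fair coin (P) and a biased coin with Pr(H) = 0.6 (Q).
P = ones(1,8) / 8;                 % fair coin: every omega has prob 1/8
Q = 0.6.^X .* 0.4.^(3 - X);        % biased coin: prob depends on #heads
PX = zeros(1,4); QX = zeros(1,4);
for k = 0:3
    PX(k+1) = sum(P(X == k));      % induced P_X({k})
    QX(k+1) = sum(Q(X == k));      % induced Q_X({k})
end
disp([0:3; PX; QX])                % distributions of X under P and Q
```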

In practice we cannot manipulate the sample space directly, since typically we might not even know what the sample space is. Instead, we assume that the sample space exists and a measurable space is well defined, but we work with the induced probability measure. Furthermore, with some abuse of notation, we also denote this induced measure with $P$. Different induced probability measures $P_X$, $Q_X$ will then be due to different measures $P$, $Q$ on the measurable space $(\Omega, \mathcal{F})$. These different measures can be associated with differences of beliefs, differences of behavior, or other issues. For example, in finance investors are interested in the probabilistic behavior of a speculative asset price, which is a random variable $S : \Omega \to \mathbb{R}_+$. In a simple setting, all investors might know and agree on the true (induced) probability measure $P$, but they might behave as if the probability measure was a different one, say $Q$. There can be many ways that this discrepancy can be theoretically explained: we will see that it can be a consequence of the risk aversion of investors, market frictions like transaction costs or liquidity constraints, or other causes.

Equivalent probability measures
As we pointed out in the previous subsection, each random variable can induce a multitude of different probability measures. We can categorize different probability measures according to some of their properties. It turns out that the most important of these classifications is the one that looks at the sets to which probability measures assign zero probability. Measures that agree on these sets are called equivalent.


Definition 9. Given two probability measures $P$, $Q$ on a measurable space $(\Omega, \mathcal{F})$, we say that $Q$ is absolutely continuous with respect to $P$, and we write $Q \ll P$, if
$$P(F) = 0 \Rightarrow Q(F) = 0 \quad \text{for all } F \in \mathcal{F}$$
The Radon-Nikodym derivative of $Q$ with respect to $P$ is defined as
$$M = \frac{dQ}{dP}$$
which makes sense since both $P$ and $Q$ are real-valued functions. If $Q \ll P$ and $P \ll Q$ the probability measures are called equivalent, and we write $P \sim Q$.

Absolute continuity implies that impossible events under $P$ will also be impossible under $Q$. If the measures are equivalent then they agree on the subsets of $\Omega$ that have zero probability.

Conditional probability
The conditional probability is one of the main building blocks of probability theory, and deals with situations where some partial knowledge about the outcome of the experiment shrinks the sample space. Consider a probability space $(\Omega, \mathcal{F}, P)$ and two events $A, F \in \mathcal{F}$. If we assume that a randomly selected sample point $\omega \in A$, we want to investigate the probability that $\omega \in F$. Since we know that $\omega \in A$, the sample space has shrunk to $A \subseteq \Omega$, and the appropriate sigma algebra is constructed as $\mathcal{F}_A = \{F : F = G \cap A, \, G \in \mathcal{F}\}$. The members of $\mathcal{F}_A$ are conditional events, that is to say the event $F$ is the event $G$ conditional on the event $A$. We denote the conditional events as $F = G|A$. It is not hard to verify that $\mathcal{F}_A$ is indeed a $\sigma$-algebra on $A$:
1. The empty set $\emptyset = (\emptyset \cap A) \in \mathcal{F}_A$ trivially.
2. Also, for an element $(G|A) \in \mathcal{F}_A$ the complement (in the set $A$) is $(G|A)^c = G^c \cap A \in \mathcal{F}_A$, since $G^c \in \mathcal{F}$.
3. Finally, the countable union $\bigcup_i (G_i|A) = \bigcup_i (G_i \cap A) = \big(\bigcup_i G_i\big) \cap A \in \mathcal{F}_A$.

Therefore $(A, \mathcal{F}_A)$ is a measurable space.

Definition 10. Consider a probability space $(\Omega, \mathcal{F}, P)$ and an event $A \in \mathcal{F}$ with $P(A) > 0$. The conditional probability is defined, for all $F \in \mathcal{F}$, as
$$P_A(F) = P(F|A) = \frac{P(F \cap A)}{P(A)}$$

We can verify easily that $P_A$ is a probability measure on $(\Omega, \mathcal{F})$, which makes $(\Omega, \mathcal{F}, P_A)$ a probability space.³ For all events $F \in \mathcal{F}$ where $P(F \cap A) = 0$, the conditional probability $P(F|A) = 0$. This means that these two events cannot happen at the same time.

We argued above that by conditioning on the event $A$ we shrink the measurable space $(\Omega, \mathcal{F})$ to the smaller measurable space $(A, \mathcal{F}_A)$. In fact, equipped with the measure $P_A$, the latter becomes a probability space. It is easy to verify that $P_A$ is a probability measure on $(A, \mathcal{F}_A)$, since $P(A|A) = 1$. Thus, we can claim that by conditioning on $A$ the probability space $(\Omega, \mathcal{F}, P)$ shrinks to the probability space $(A, \mathcal{F}_A, P_A)$.

We can also successively condition on a family of events $A_1, A_2, \ldots, A_n$. In fact, we can derive the following useful identity:
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2|A_1) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})$$

Another consequence of the definition is the celebrated Bayes theorem, which states that if $\{F_i\}_{i \in I}$ is a collection of events with $\bigcup_{i \in I} F_i = \Omega$, and $A \in \mathcal{F}$ is another event, then
$$P(F_j|A) = \frac{P(F_j) P(A|F_j)}{\sum_{i \in I} P(F_i) P(A|F_i)}$$
Bayes theorem is extensively used to update expectations and forecasts based on new evidence as this is gathered. This is an example of the filtering problem.

³ This is indeed an example of different probability measures defined on the same measurable space.
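A quick numeric sanity check of Bayes' theorem on the coin space; this is our sketch, reusing the Omega, X and P arrays from the snippets above. The partition is "first toss H" versus "first toss T", and the evidence is the event $X = 2$:

```matlab
% Bayes' theorem versus the direct definition of P(F1|A).
F1 = logical([1 1 1 0 1 0 0 0]);   % first toss is H
F2 = ~F1;                          % first toss is T
A  = (X == 2);                     % evidence: exactly two heads
direct = sum(P(F1 & A)) / sum(P(A));      % P(F1 and A) / P(A)
PA_F1  = sum(P(F1 & A)) / sum(P(F1));     % P(A|F1)
PA_F2  = sum(P(F2 & A)) / sum(P(F2));     % P(A|F2)
bayes  = sum(P(F1))*PA_F1 / (sum(P(F1))*PA_F1 + sum(P(F2))*PA_F2);
disp([direct bayes])               % both equal 2/3
```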

Expectations
Given a probability space $(\Omega, \mathcal{F}, P)$, consider an $\mathcal{F}$-meas random variable $X$, and assume that the random variable is integrable,
$$\int_\Omega |X(\omega)| \, dP(\omega) < \infty$$
A very important quantity is the expectation of $X$.

Definition 11. The expectation of $X$ with respect to the probability measure $P$ is given by the integral
$$E X = \int_\Omega X(\omega) \, dP(\omega) = \int_{\mathbb{R}} x \, dP_X(x)$$
The conditional expectation given a sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ is a random variable $E[X|\mathcal{G}]$ that has the properties
1. $E[X|\mathcal{G}]$ is $\mathcal{G}$-measurable
2. For all $G \in \mathcal{G}$,
$$\int_G E[X|\mathcal{G}] \, dP = \int_G X \, dP$$

The conditional expectation is a random variable, since for different $\omega$ the quantity $E[X|\mathcal{G}](\omega)$ will be different. One can use the Radon-Nikodym derivative to compute expectations under different equivalent probability measures. In particular, expectations under $Q$ are written as
$$E_Q X = \int_{\mathbb{R}} x \, dQ_X(x) = \int_{\mathbb{R}} x \frac{dQ}{dP}(x) \, dP_X(x) = \int_{\mathbb{R}} x M(x) \, dP_X(x) = E_P[M(X) X]$$
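On the discrete coin space the change-of-measure identity can be checked in two lines; a sketch reusing the P, Q and X arrays from above:

```matlab
% E_Q[X] directly, and via the Radon-Nikodym derivative M = dQ/dP.
M = Q ./ P;                             % well defined: P(omega) > 0 everywhere
disp([sum(Q .* X), sum(P .* M .* X)])   % both equal 1.8 when Pr(H) = 0.6
```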


Independence
Two events $F_1, F_2 \in \mathcal{F}$ are independent if $P(F_1 \cap F_2) = P(F_1) \cdot P(F_2)$. Two $\sigma$-algebras $\mathcal{F}_1$, $\mathcal{F}_2$ are independent if all pairs $F_1 \in \mathcal{F}_1$ and $F_2 \in \mathcal{F}_2$ are independent. Two random variables $X_1$ and $X_2$ are independent if the corresponding generated $\sigma$-algebras $\mathcal{F}_{X_1}$ and $\mathcal{F}_{X_2}$ are independent.

Some useful properties of conditional expectations:
1. If $X$ is $\mathcal{G}$-measurable, then $E[X|\mathcal{G}] = X$
2. If $X$ is independent of $\mathcal{G}$, then $E[X|\mathcal{G}] = E X$
3. If $\mathcal{H} \subseteq \mathcal{G}$ then $E[E[X|\mathcal{G}]|\mathcal{H}] = E[X|\mathcal{H}]$ (tower property)
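The conditional expectation can be computed explicitly on the coin space: conditioning on the first toss, $E[X|\mathcal{G}]$ is the average of $X$ over each cell of the partition. A sketch of ours, reusing the P, X and F1 arrays from the snippets above; the law of iterated expectations drops out immediately:

```matlab
% E[X|G] for G generated by the first toss: constant on each cell.
EX_G = zeros(1,8);
EX_G(F1)  = sum(P(F1)  .* X(F1))  / sum(P(F1));    % average on {first H}
EX_G(~F1) = sum(P(~F1) .* X(~F1)) / sum(P(~F1));   % average on {first T}
disp(EX_G)                                % 2 on {first H}, 1 on {first T}
disp([sum(P .* EX_G), sum(P .* X)])       % both 1.5: E[E[X|G]] = E[X]
```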

1.3 Stochastic process
Of course a random variable is sufficient if we want to describe uncertainty at a single point in time. For example, we can assume that an asset price at a future date is a random variable that depends on the state of the world on that date. But typically we are interested not only in this static profile of the asset price, but also in the dynamics that might lead there. Therefore, by collecting a number of random variables, resembling the asset price at different times, we construct a stochastic process.

Definition 12. A stochastic process is a parameterized family of random variables $\{X(t)\}_{t \in T}$, where all random variables are defined on the same probability space $(\Omega, \mathcal{F}, P)$,
$$X_t : \Omega \to \mathbb{R}$$

In our setting the subscript $t$ denotes time, but it could well be a spatial coordinate. The set $T$ will determine if the stochastic process is defined in continuous or in discrete time. In particular, if $T = \{0, 1, 2, \ldots\}$ then we have a discrete time process, while if $T = [0, \infty)$ the process is cast in continuous time. There are two different ways to look at the realizations of a stochastic process:
1. If we fix time $t$ we have a random variable $X(t, \omega)$, for all $\omega \in \Omega$.
2. If we fix a state of the world $\omega \in \Omega$ we have the trajectory or path $X(t, \omega)$, for all $t \in T$.

There are also different ways to denote a stochastic process, and we use the one that clarifies the way we view it at the time, for example $X_t$, $X(t)$, $X_t(\omega)$, or $X(\omega)(t)$.


Example 6. Let us revisit our coin experiment, where we flip 3 times. The state space will collect all possible outcomes,
$$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$
We define the collection of random variables
$$X(t, \omega) = \text{number of } H \text{ in the first } t \text{ throws of } \omega$$

These random variables define a stochastic process on $T = \{0, 1, 2, 3\}$. In this simple case we can tabulate them and keep track of the behavior for all times and sample points:

  ω         HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
  X(0, ω)    0    0    0    0    0    0    0    0
  X(1, ω)    1    1    1    0    1    0    0    0
  X(2, ω)    2    2    1    1    1    1    0    0
  X(3, ω)    3    2    2    2    1    1    1    0

We can fix time, say $t = 2$, and concentrate on the random variable $X(2, \omega)$, which is given by the horizontal slice of the table above:

  ω         HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
  X(2, ω)    2    2    1    1    1    1    0    0

Alternatively we can fix the sample point, say $\omega = THT$, and concentrate on the function $X(t, THT)$:

  t            0  1  2  3
  X(t, THT)    0  0  1  1

Example 7. On the same probability space we can define another stochastic process, say $Y(t, \omega)$, where $Y(t, \omega) = 1$ if we roll an even number of $H$ up to time $t$, and $0$ otherwise (where zero is considered an even number). In that case the possible values of the process are given in the following table:

  ω         HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
  Y(0, ω)    1    1    1    1    1    1    1    1
  Y(1, ω)    0    0    0    1    0    1    1    1
  Y(2, ω)    1    1    0    0    0    0    1    1
  Y(3, ω)    0    1    1    1    0    0    0    1
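Both tables can be generated with a couple of loops; a minimal sketch in the spirit of the book's Matlab listings (variable names are ours):

```matlab
% Tabulate X(t,omega) and Y(t,omega) for the three-toss experiment.
Omega = {'HHH','HHT','HTH','THH','HTT','THT','TTH','TTT'};
Xt = zeros(4, 8);                  % rows: t = 0,...,3; columns: omega
for i = 1:8
    for t = 1:3
        Xt(t+1, i) = sum(Omega{i}(1:t) == 'H');  % heads in first t throws
    end
end
Yt = 1 - mod(Xt, 2);               % Y = 1 iff the heads count is even
disp(Xt), disp(Yt)                 % reproduce the two tables above
```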


Filtration
We discussed in the previous section how $\sigma$-algebras can be associated with information. In particular, we noted that if a random variable $X$ is $\mathcal{F}$-measurable, then we can determine the value of $X(\omega)$ without knowing the exact value of the sample point $\omega$, but by merely knowing in which sets $F \in \mathcal{F}$ the sample point belongs. In the context of stochastic processes information changes: typically information is accumulated and a filtration is defined, but sometimes information can also be destroyed. Therefore, the $\sigma$-algebras with respect to which the random variables $X(t)$ are measurable must evolve to reflect that.

Definition 13. Consider a probability space $(\Omega, \mathcal{F}, P)$. A filtration is a collection of non-decreasing $\sigma$-algebras on $\Omega$, $\mathbb{F} = \{\mathcal{F}_t\}_{t \in T}$, with
$$\mathcal{F}_{t_1} \subseteq \mathcal{F}_{t_2} \quad \text{for all } t_1 \leq t_2$$
where of course $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t \in T$. The quadruple $(\Omega, \mathcal{F}, \mathbb{F}, P)$ is called a filtered space.

A stochastic process $X_t$ is called adapted (or $\mathbb{F}$-adapted if the filtration is ambiguous) if all random variables $X_t$ are $\mathcal{F}_t$-measurable.

In the previous section we discussed how a random variable generates a $\sigma$-algebra which keeps the information gathered by observing the realization of this random variable. Here each collection of random variables $\{X_s\}_{s \leq t}$ will generate a $\sigma$-algebra (for each $t \in T$). This collection of $\sigma$-algebras is called the natural filtration of the stochastic process $X_t$. We denote this filtration by $\mathcal{F}_t = \mathcal{F}(X_s : 0 \leq s \leq t)$, and in fact it is the smallest filtration that makes $X_t$ adapted. It represents the accumulated information we gather by observing the process $X$ up to time $t$. Note that this is in fact different from the $\sigma$-algebra generated by the random variable $X_t$ alone; in fact $\mathcal{F}(X_t) \subseteq \mathcal{F}_t$. Intuitively, when $\omega$ was chosen, the complete path $\{X_t\}_{t \in T}$ was chosen as well, but this path has not been completely revealed to us. Our information consists only of the part $\{X_s\}_{0 \leq s \leq t}$. Based on this information we cannot pinpoint precisely which $\omega$ has been selected, but we can tell with certainty if $\omega$ belongs to some specific subsets of $\Omega$ that form the natural filtration $\mathcal{F}_t$.

Another process $Y_t(\omega)$ will be $\mathcal{F}_t$-adapted if we can ascertain with certainty the value $Y_t$ by observing $X_s$ for $s \leq t$. There are two ways of looking at this dependence:
1. There exist functions $\{f_t\}_{t \in T}$ such that $Y_t = f_t(\{X_s\}_{0 \leq s \leq t})$ for all $t$. The value $Y_t$ is a deterministic function of the history of $X$ up to time $t$.
2. The natural filtration of $Y_t$ is subsumed in the natural filtration of $X_t$, that is to say $\mathcal{F}(Y_s : s \leq t) \subseteq \mathcal{F}(X_s : s \leq t)$ for all $t \in T$.

Example 8. In our coin example the $\sigma$-algebras generated by the random variables $X_t$ are the following:


$$\mathcal{F}(X_0) = \{\emptyset, \Omega\}$$
$$\mathcal{F}(X_1) = \{\emptyset, \{HHH, HHT, HTT, HTH\}, \{THH, THT, TTH, TTT\}, \Omega\}$$
$$\mathcal{F}(X_2) = \{\emptyset, \{HHH, HHT\}, \{TTH, TTT\}, \{HTH, HTT, THH, THT\}, \text{all complements, all unions, all intersections}\}$$
The corresponding filtration will include all unions, intersections and complements of the individual $\sigma$-algebras, namely
$$\mathcal{F}_0 = \mathcal{F}(X_0), \quad \mathcal{F}_1 = \sigma\big(\mathcal{F}(X_0) \cup \mathcal{F}(X_1)\big), \quad \mathcal{F}_2 = \sigma\big(\mathcal{F}(X_0) \cup \mathcal{F}(X_1) \cup \mathcal{F}(X_2)\big)$$
For example the set $\{HTH, HTT\}$ belongs to neither $\mathcal{F}(X_1)$ nor $\mathcal{F}(X_2)$, but it does belong to $\mathcal{F}_2$, since
$$\{HTH, HTT\} = \{HHH, HHT, HTT, HTH\} \cap \{HTH, HTT, THH, THT\}$$
where $\{HHH, HHT, HTT, HTH\} \in \mathcal{F}(X_1)$ and $\{HTH, HTT, THH, THT\} \in \mathcal{F}(X_2)$. Intuitively this element represents the event "first toss is a head and second toss is a tail". Since $X_t$ measures the number of heads, this event cannot be decided upon by just observing $X_1$ or by just observing $X_2$, but it can be deduced by observing both. In particular, it is equivalent to the event "one head up to time $t = 1$" (the event in $\mathcal{F}(X_1)$) and (intersection) "one head up to time $t = 2$" (the event in $\mathcal{F}(X_2)$).

Distributions of a process
Based on the probability space $(\Omega, \mathcal{F}, P)$ we can define the finite-dimensional distributions of the process $X_t$. For any collection of times $\{t_i\}_{i=1}^k$, and Borel events $\{F_i\}_{i=1}^k$ in $\mathcal{B}(\mathbb{R}^n)$, the distribution
$$P(X_{t_1} \in F_1, X_{t_2} \in F_2, \ldots, X_{t_k} \in F_k)$$
characterizes the process and determines many important (but not all) properties. The inverse question is of importance too: given a set of distributions, is there a stochastic process that exhibits them? Kolmogorov's extension theorem gives an answer to that question.

Suppose that for all $k = 1, 2, \ldots$, and for all finite sets of times $\{t_i\}_{i=1}^k$ in $T$, we can provide a probability measure $\nu_{t_1 t_2 \cdots t_k}$ on $(\mathbb{R}^{nk}, \mathcal{B}(\mathbb{R}^{nk}))$ that satisfies the following consistency conditions:

1. For all Borel sets $\{F_i\}_{i=1}^k$ in $\mathbb{R}^n$, the recursive extension
$$\nu_{t_1 t_2 \cdots t_k}(F_1 \times F_2 \times \cdots \times F_k) = \nu_{t_1 t_2 \cdots t_k t_{k+1}}(F_1 \times F_2 \times \cdots \times F_k \times \mathbb{R}^n)$$


2. For all Borel sets $\{F_i\}_{i=1}^k$ in $\mathbb{R}^n$, and for all permutations $\pi$ on the set $\{1, 2, \ldots, k\}$,
$$\nu_{t_{\pi(1)} t_{\pi(2)} \cdots t_{\pi(k)}}(F_1 \times F_2 \times \cdots \times F_k) = \nu_{t_1 t_2 \cdots t_k}(F_{\pi^{-1}(1)} \times F_{\pi^{-1}(2)} \times \cdots \times F_{\pi^{-1}(k)})$$

Then there exists a probability space $(\Omega, \mathcal{F}, P)$ and a stochastic process $X_t$ from $\Omega$ to $\mathbb{R}^n$, which has the measures $\nu$ as its finite-dimensional distributions, that is to say, for all $k = 1, 2, \ldots$
$$P(X_{t_1} \in F_1, X_{t_2} \in F_2, \ldots, X_{t_k} \in F_k) = \nu_{t_1 t_2 \cdots t_k}(F_1 \times F_2 \times \cdots \times F_k)$$

Kolmogorov's extension theorem gives a very small set of conditions that can lead to the existence of stochastic processes. This can be very useful, as we do not need to explicitly construct a process from scratch. Indeed, it is easy to prove the existence of many processes, like ones with infinitely divisible finite-dimensional distributions, based on this theorem. The most important stochastic process is undoubtedly the Brownian motion.

1.4 Brownian motion and diffusions
There are many different definitions and characterizations of the Brownian motion.⁴ Here, in order to utilize Kolmogorov's extension theorem, we will define the Brownian motion by invoking its transition density. For simplicity we will only consider the one-dimensional case, but it is straightforward to see the generalization to more dimensions. We define the Gaussian transition density with parameter $t$, for all $x, y \in \mathbb{R}$, which essentially describes the probability mass of moving from point $x$ to point $y$ over a time interval of length $t$:
$$p(t; x, y) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{(x - y)^2}{2t}\right)$$

For any $k = 1, 2, \ldots$, all times $t_1 \leq t_2 \leq \cdots \leq t_k$ in $T$, and all Borel sets $F_1, F_2, \ldots, F_k \subseteq \mathbb{R}$, we also define the probability measures on $\mathbb{R}^k$ in the following way:
$$\nu_{t_1 t_2 \cdots t_k}(F_1 \times F_2 \times \cdots \times F_k) = \int_{F_1 \times F_2 \times \cdots \times F_k} p(t_1; x, x_1)\, p(t_2 - t_1; x_1, x_2) \cdots p(t_k - t_{k-1}; x_{k-1}, x_k)\, \mathrm{d}x_1 \cdots \mathrm{d}x_k$$

⁴ The name Brownian motion is in honor of the botanist Robert Brown (1773–1858), who did extensive botanic research in Australia and observed the random movements of particles within pollen, the first well documented example of Brownian motion. The stochastic process is also called the Wiener process, in honor of Norbert Wiener (1894–1964), who studied extensively the properties of the process.


Figure 1.1: Construction of a Brownian motion trajectory.


It is easy to verify that the assumptions of Kolmogorov's extension theorem are satisfied, which means that there exists a probability space $(\Omega, \mathcal{F}, P)$ and a process with Gaussian increments, which we denote by $\{B_t\}_{t \geq 0}$ and define as a Brownian motion [BM] (started at $B_0 = x$).

We can also construct the Brownian motion directly, as an infinite sum of random tent-shaped functions. To this end we will need an infinite collection of standard normal random variables $\xi_{j,k}$, for all natural numbers $j = 0, 1, \ldots$, and for all odd numbers $k$ with $k \leq 2^j$. We define the auxiliary functions $H_{j,k}(t)$, which are piecewise constant,
$$H_{j,k}(t) = \begin{cases} +2^{(j-1)/2} & \text{for } (k-1)\, 2^{-j} \leq t < k\, 2^{-j} \\ -2^{(j-1)/2} & \text{for } k\, 2^{-j} \leq t < (k+1)\, 2^{-j} \\ 0 & \text{elsewhere} \end{cases}$$
with $H_{0,1}(t) = 1$. The tent-shaped functions $S_{j,k}(t)$ over the interval $[0, 1]$ are, for each $j$ and $k$, the integrals
$$S_{j,k}(t) = \int_0^t H_{j,k}(s)\, \mathrm{d}s$$
Finally, the Brownian motion is defined as the sum over all appropriate $j$ and $k$, that is to say
$$B_t(\omega) = \sum_{j=0}^{\infty} \sum_{\substack{k \text{ odd} \\ k \leq 2^j}} S_{j,k}(t)\, \xi_{j,k}(\omega)$$
This is of course a function $B : [0, 1] \times \Omega \to \mathbb{R}$. Essentially, at each level $j$ a finer refinement is added to the existing function, with an impact which falls as $2^{-(j+1)/2}$,

Listing 1.1: Construction of a Brownian motion trajectory (the code did not survive extraction).

as the tent functions $S_{j,k}$ show. Listing 1.1 shows how the construction is implemented (a function call gives the support of the Brownian motion over $[0, 1]$ as a vector, and also the first $n$ levels of approximation as the columns of a matrix). The construction of the Brownian motion in this way is given schematically in figure 1.1, where the approximations for levels $n = 2$, $n = 5$ and $n = 10$ are illustrated.
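Since the code of Listing 1.1 was lost, here is a minimal sketch of how such a function could look, following the tent-function construction above (the name bmlevels and its interface are our assumptions, not the author's original code; save it as bmlevels.m).

function [t, W] = bmlevels(n, m)
% Approximate a Brownian motion on [0,1] by the first n levels of the
% tent-function expansion, on a grid of m+1 points.  Column j of W holds
% the approximation that uses levels 0,...,j-1.
t = (0:m)'/m;
W = zeros(m+1, n);
W(:,1) = t * randn;                              % level 0: S_{0,1}(t) = t
for j = 1:n-1
    level = zeros(m+1, 1);
    for k = 1:2:2^j                              % odd k only
        % tent with peak 2^(-(j+1)/2) at k/2^j, support [(k-1)/2^j,(k+1)/2^j]
        tent  = max(0, 2^(-(j+1)/2) - 2^((j-1)/2) * abs(t - k/2^j));
        level = level + tent * randn;
    end
    W(:,j+1) = W(:,j) + level;
end

A call like [t, W] = bmlevels(10, 1024); plot(t, W(:,[2 5 10])) reproduces the flavour of figure 1.1.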

Properties of the Brownian motion

Having defined and constructed the Brownian motion, we now turn to investigating some important properties. We will assume that $\{B_t\}_{t \geq 0}$ is a Brownian motion and $\mathcal{F}_t$ is its natural filtration.

The Brownian motion is a martingale

Given a filtered space $(\Omega, \mathcal{F}, \mathbb{F}, P)$, a stochastic process $X_t$ is a martingale if

1. $X_t$ is adapted to the filtration;


2. the process is integrable, $E[|X_t|] < \infty$, for all $t \geq 0$;
3. the conditional expectation satisfies $E[X_t | \mathcal{F}_s] = X_s$, for all $s \leq t$.

It is not hard to verify that the Brownian motion is a martingale with respect to its natural filtration. This means that the conditional expected increments of a Brownian motion are zero, or that the best forecast one can provide is just the current value: $E[B_t - B_s | \mathcal{F}_s] = 0$, or $E[B_t | \mathcal{F}_s] = B_s$. Also, one can easily show that $B_t^2 - t$ is a martingale. Lévy's theorem also states the converse: given a filtered space, if $\{X_t\}_{t \geq 0}$ is a continuous martingale, and $X_t^2 - t$ is also a martingale, then $X_t$ is a Brownian motion. If we drop the second part and instead require that $X_t^2 - \beta(t)$ is a martingale, for an adapted increasing function $\beta$, then the time-changed process $X_{\beta^{-1}(t)}$ will satisfy the requirements of Lévy's theorem. We can then conclude that every continuous martingale can be represented as a time-changed Brownian motion. Another important martingale is the exponential martingale process, given by
$$M_t = \exp\left(\lambda B_t - \frac{1}{2}\lambda^2 t\right)$$
for any parameter value $\lambda \in \mathbb{R}$.

The Brownian motion is Gaussian, Markov and continuous

By its definition, the Brownian motion is Gaussian, that is, for all times $t_1 \leq t_2 \leq \cdots \leq t_k$ in $T$ the random vector $\mathbf{B} = (B_{t_1}, B_{t_2}, \ldots, B_{t_k})$ has a multivariate normal distribution.

A Markov process has the property that
$$P(X_{t+s} \in F | \mathcal{F}_t) = P\big(X_{t+s} \in F | \mathcal{F}(X_t)\big) \quad \text{for all } s \geq 0,$$
which means that the conditional distribution depends only on the latest value of the process, and not on the whole history. Remember the difference between the $\sigma$-algebra $\mathcal{F}_t$, which belongs to the filtration of the process and therefore includes the history, and $\mathcal{F}(X_t)$, which is generated by a single observation at time $t$. For that reason Markov processes are coined memory-less. The Brownian motion is Markov, once again by its definition.

A Feller semigroup is a family of linear mappings indexed by $t \geq 0$,
$$P_t : C_\infty(\mathbb{R}) \to C_\infty(\mathbb{R}),$$
where $C_\infty(\mathbb{R})$ is the family of continuous functions that vanish at infinity, such that

1. $P_0$ is the identity map;
2. $P_t$ are contraction mappings, $\|P_t\| \leq 1$ for all $t \geq 0$;
3. $P_t$ has the semigroup property, $P_{t+s} = P_t P_s$ for all $s, t \geq 0$; and
4. the limit $\lim_{t \downarrow 0} \|P_t f - f\| = 0$, for all $f \in C_\infty(\mathbb{R})$.

A Feller transition density is a density that is associated with a Feller semigroup. A Markov process with a Feller transition function is called a Feller


process. One can verify that the Brownian motion is indeed a Feller process, for the Feller semigroup which is defined as
$$P_t f(x) = \int_{\mathbb{R}} p(t; x, y)\, f(y)\, \mathrm{d}y$$
Based on the Feller semigroup, expectations of functions of the Brownian motion will be given by
$$E[f(B_{t+s}) | \mathcal{F}_s] = P_t f(B_s) = \int_{\mathbb{R}} p(t; B_s, y)\, f(y)\, \mathrm{d}y$$

The Brownian motion is also a process that has almost surely continuous sample paths. This is due to Kolmogorov's continuity theorem, which states that if for all $T > 0$ we can find constants $\alpha, \beta, D > 0$ such that
$$E|X_{t_1} - X_{t_2}|^\alpha \leq D\, |t_1 - t_2|^{1 + \beta} \quad \text{for all } 0 \leq t_1, t_2 \leq T,$$
then $X_t$ has continuous paths (or at least a continuous version). For the Brownian motion
$$E|B_{t_1} - B_{t_2}|^4 = 3\, |t_1 - t_2|^2,$$
and therefore $B_t$ will have continuous sample paths.

The Brownian motion is a diffusion

A Markov process with continuous sample paths is called a diffusion. A diffusion $\{X_t\}_{t \geq 0}$ is characterized by its local drift $\mu$ and volatility $\sigma$. Loosely speaking, for small $s$ we write the instantaneous drift and volatility as
$$E[X_{t+s} - X_t | \mathcal{F}_t] = \mu(t, X_t)\, s + o(s)$$
$$E[(X_{t+s} - X_t - \mu(t, X_t)\, s)^2 | \mathcal{F}_t] = \sigma^2(t, X_t)\, s + o(s)$$
If the drift and volatility are constant, the process $X_t = \mu t + \sigma B_t$ for a Brownian motion $\{B_t\}_{t \geq 0}$ will be a diffusion. More generally, the instantaneous drift and volatility do not have to be constant, but can depend on the location $X_t$ and the time $t$. Diffusions are then given as solutions to stochastic differential equations.

The Brownian motion is wild

If we fix the sample point $\omega$, a Brownian motion as a function of time, $t \mapsto B_t(\omega)$, is a lot wilder than most ordinary functions. We have shown already, using Kolmogorov's continuity theorem, that a sample path of the Brownian motion is almost everywhere continuous, but it turns out that it is nowhere differentiable. The total variation of a Brownian motion trajectory is unbounded, and the quadratic variation is non-zero:
$$\sum_j |B_{t_{j+1}} - B_{t_j}| \to \infty, \qquad \sum_j |B_{t_{j+1}} - B_{t_j}|^2 \to t,$$

Figure 1.2: Zooming into a Brownian motion sample path.
for partitions $\{t_j\}$ of the time interval $[0, t]$, where $\sup_j |t_{j+1} - t_j| \to 0$. For ordinary functions the total variation would be the length of the curve; this means that to draw a Brownian motion trajectory on a finite interval we would need an infinite amount of ink. Also, the quadratic variation of ordinary smooth functions is zero, since they are not infinitely volatile in arbitrarily small intervals. When we consider a Brownian motion path, it is impossible to find an interval on which it is monotonic, no matter how much we zoom into the trajectory. Therefore we cannot split a Brownian motion path in two parts with a line that is not vertical. Figure 1.2 gives a trajectory of a Brownian motion and illustrates how wild the path is by successively zooming into the process.
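Both variation claims are easy to see numerically. The following small Matlab experiment (ours, not in the original notes; the grid sizes are arbitrary) simulates one Brownian path on a fine grid and evaluates the two sums over successively finer dyadic partitions: the total variation blows up while the quadratic variation settles near $t = 1$.

% Total and quadratic variation of one Brownian path on [0,1].
L  = 18;  N = 2^L;
dB = sqrt(1/N) * randn(N, 1);
B  = [0; cumsum(dB)];
for n = [6 10 14 18]                       % dyadic partitions with 2^n intervals
    incr = diff(B(1 : 2^(L-n) : end));
    fprintf('n = %2d: sum|dB| = %8.2f, sum|dB|^2 = %6.4f\n', ...
            n, sum(abs(incr)), sum(incr.^2));
end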

Dealing with diffusions
As we mentioned in the previous subsection, diffusions arise as solutions to stochastic differential equations. In finance we typically use diffusions to model factors such as stock prices, interest rates, volatility, and others, that affect the value of financial contracts. There are three techniques for solving problems that relate to diffusions:

1. The stochastic differential equation [SDE] approach
2. The partial differential equation [PDE] approach
3. The martingale approach

All approaches are in principle interchangeable, but in practice some are more suited to particular problems. As a matter of fact, in finance we use all


three to tackle different situations. PDEs offer a global view of the problem at hand, while the other two approaches offer a more probabilistic, local view.

1.5 Stochastic differential equations
A stochastic differential equation [SDE] resembles an ordinary differential equation, but some parts or some parameters are assumed random. Therefore, the solution is not a deterministic function but some sort of generalized, stochastic function. The calculus of such functions is called Itô calculus, in honor of Kiyoshi Itô (1915–). Loosely speaking, one can represent an SDE as
$$\frac{\mathrm{d}X_t}{\mathrm{d}t} = \mu(t, X_t) + \text{noise terms}$$
The solution of such a differential equation could be represented, once again loosely, as
$$X_t = X_0 + \int_0^t \mu(s, X_s)\, \mathrm{d}s + \int_0^t \text{noise terms}\, \mathrm{d}s$$
If we write the noise terms in terms of a Brownian motion, say $B_t$, we have a process that has given drift and volatility, called an Itô diffusion:
$$X_t = X_0 + \int_0^t \mu(s, X_s)\, \mathrm{d}s + \int_0^t \sigma(s, X_s)\, \mathrm{d}B_s$$
The last integral, called an Itô integral with respect to a Brownian motion, is not readily defined, and we must clarify what we actually mean by it. Before we do so, note that we usually write the above expression in a shorthand differential form as
$$\mathrm{d}X_t = \mu(t, X_t)\, \mathrm{d}t + \sigma(t, X_t)\, \mathrm{d}B_t$$
It is obvious that, unlike ordinary (Riemann or Lebesgue) integrals, Itô integrals have a probabilistic interpretation, since they depend on a stochastic process. To give a simple motivating example, a Brownian motion can be represented as an Itô integral as
$$B_t = \int_0^t \mathrm{d}B_s$$
Variation processes and the Itô integral

Before we turn to the definition of the Itô integral, we need to give some more information on the variation processes, some of which we have already encountered when discussing the properties of the Brownian motion. For any process $X_t(\omega)$ we define the $p$-th order variation process, which we denote by $\langle X, X \rangle_t^{(p)}$, as the probability limit
$$\langle X, X \rangle_t^{(p)} = \operatorname{plim} \sum_j |X_{t_{j+1}}(\omega) - X_{t_j}(\omega)|^p,$$
for a dyadic partition of $[0, t]$. Therefore, the quadratic variation of the Brownian motion will be the (probability) limit $\langle B, B \rangle_t = \langle B, B \rangle_t^{(2)} = \operatorname{plim} \sum_j |B_{t_{j+1}} - B_{t_j}|^2$. We have already seen that the quadratic variation $\langle B, B \rangle_t = t$, since
$$E\Big[\sum_j \big((\Delta B_{t_j})^2 - \Delta t_j\big)\Big] = 0 \quad \text{and} \quad E\Big[\sum_j \big((\Delta B_{t_j})^2 - \Delta t_j\big)\Big]^2 = 2 \sum_j (\Delta t_j)^2 \to 0$$

We usually write the above expression in shorthand as $(\mathrm{d}B_t)^2 = \mathrm{d}t$. If $B_t$ were an ordinary differentiable function, then the Itô integral could be written in the Riemann sense, by using the derivative of $B_t$:
$$\int_0^t \sigma(s, X_s)\, \mathrm{d}B_s = \int_0^t \sigma(s, X_s)\, \frac{\mathrm{d}B_s}{\mathrm{d}s}\, \mathrm{d}s$$

Here the function $B_s(\omega)$ is nowhere differentiable, and therefore we cannot express the integral in such a simple form, but we can think of it as the limit of Riemann sums. When we fix the sample point $\omega$, the random variable $X_s = X_s(\omega)$ becomes a function over time (albeit a wild and weird one), allowing these Riemann sums to be defined. This indicates that stochastic integrals will also be random variables, since in that sense they are mappings
$$\omega \mapsto \int_0^t \sigma(s, \omega)\, \mathrm{d}B_s(\omega) \in \mathbb{R}$$
For the dyadic partitions of the time interval $[0, t]$, we define the Itô integral as the limit of the random variables
$$\sum_j \sigma(t_j, \omega)\, [B_{t_{j+1}} - B_{t_j}](\omega)$$

The limit is taken with respect to the $L^2$-norm $\|\sigma\|^2 = E \int |\sigma(s, \omega)|^2\, \mathrm{d}s$. More precisely, we first define the integral for simple, step-like functions, then extend it to bounded functions $\sigma$, and finally move to more general functions $\sigma$, such that

1. $(s, \omega) \mapsto \sigma(s, \omega)$ is $\mathcal{B} \times \mathcal{F}$-measurable;
2. $\sigma(s, \omega)$ is $\mathcal{F}_s$-adapted;
3. $E \int_0^t \sigma^2(s, \omega)\, \mathrm{d}s < \infty$.

The final property ensures that the function $\sigma$ is $L^2$-integrable, and allows the required limits to be well defined using the Itô isometry, which states that
$$E\left[\left(\int_0^t \sigma(s, \omega)\, \mathrm{d}B_s(\omega)\right)^2\right] = E\left[\int_0^t \sigma^2(s, \omega)\, \mathrm{d}s\right]$$


In particular, for a sequence of bounded functions $\sigma_n$ that converges (in $L^2$) to $\sigma$,
$$E \int_0^t [\sigma_n(s, \omega) - \sigma(s, \omega)]^2\, \mathrm{d}s \to 0 \quad \text{as } n \to \infty,$$

the corresponding stochastic integrals will also converge.

The Stratonovich integral

One important observation when computing the Itô integral is that the left endpoints of each subinterval in the partition were used to evaluate the integrand $\sigma$. If the integrand were well behaved, then it would not make any difference if we took the right point or the midpoint instead, but since the integrand has infinite variation it matters. If we use the midpoint instead, then the resulting random variable is called the Stratonovich stochastic integral, denoted by $\int_0^t \sigma(s, X_s) \circ \mathrm{d}B_s$, which is the limit of the random variables
$$\sum_j \frac{\sigma(t_j, \omega) + \sigma(t_{j+1}, \omega)}{2}\, [B_{t_{j+1}} - B_{t_j}](\omega)$$

It is easy to see that the Stratonovich integral is not an $\mathcal{F}_t$-adapted random variable, since we need to know the value of the process at the future time point $t_{j+1}$ in order to ascertain the value of the Riemann sums. For that reason it is not used as often as the Itô representation in financial mathematics,⁵ but it has better convergence properties (due to the midpoint approximation of the integral) and it is used when one needs to simulate stochastic processes. In fact, when the process is an Itô diffusion (see below) the two stochastic integrals are related:
$$\int_0^t \sigma(s, X_s) \circ \mathrm{d}B_s = \int_0^t \sigma(s, X_s)\, \mathrm{d}B_s + \frac{1}{2} \int_0^t \frac{\partial \sigma}{\partial x}(s, X_s)\, \sigma(s, X_s)\, \mathrm{d}s$$
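The difference between the two conventions is easy to see numerically. The sketch below (ours; it takes $\sigma(t, x) = x$ and $X = B$, so the correction term above equals $\tfrac{1}{2}t$) compares the left-endpoint and averaged-endpoint Riemann sums for $\int_0^1 B_s\, \mathrm{d}B_s$.

% Ito (left endpoint) versus Stratonovich (averaged endpoints) sums.
N  = 1e5;  dt = 1/N;
B  = [0; cumsum(sqrt(dt) * randn(N, 1))];
dB = diff(B);
ito   = sum(B(1:end-1) .* dB);                      % -> B_1^2/2 - 1/2
strat = sum((B(1:end-1) + B(2:end))/2 .* dB);       % -> B_1^2/2
fprintf('Ito   %8.4f (theory %8.4f)\n', ito,   B(end)^2/2 - 0.5);
fprintf('Strat %8.4f (theory %8.4f)\n', strat, B(end)^2/2);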

Itô diffusions and Itô processes

Consider a Brownian motion $B_t$ on a filtered space $(\Omega, \mathcal{F}, \mathbb{F}, P)$, the filtration it generates $\{\mathcal{F}_t\}_{t \geq 0}$, and two $\mathcal{F}_t$-adapted functions $\mu$ and $\sigma$. As we noted before, an Itô diffusion is a stochastic process on $(\Omega, \mathcal{F}, P)$ of the form
$$X_t = X_0 + \int_0^t \mu(s, X_s)\, \mathrm{d}s + \int_0^t \sigma(s, X_s)\, \mathrm{d}B_s$$

We need a few conditions that ensure regularity and that solutions of the SDE exist and do not explode in finite time:

1. The Itô isometry $E \int_0^t \sigma^2(s, X_s)\, \mathrm{d}s < \infty$, for all times $t \geq 0$.
2. There exist constants $A$, $B$ such that for all $x, y \in \mathbb{R}$ and $s \in [0, t]$
$$|\mu(s, x)| + |\sigma(s, x)| \leq A\, (1 + |x|)$$
$$|\mu(s, x) - \mu(s, y)| + |\sigma(s, x) - \sigma(s, y)| \leq B\, |x - y|$$

⁵ But it can be used, for example, if $t$ is a spatial rather than a time coordinate, since then we could actually observe the complete realization in one go.

An Itô process is a stochastic process on the same filtered space, of the form
$$X_t = X_0 + \int_0^t \mu(s, \omega)\, \mathrm{d}s + \int_0^t \sigma(s, \omega)\, \mathrm{d}B_s$$
In Itô diffusions the information is generated by the Brownian motion; now we let the information be more general, as long as the Brownian motion remains a martingale. We consider a filtration $\mathcal{G}_t$ that makes $B_t$ a martingale, and assume that $\mu$ and $\sigma$ are $\mathcal{G}_t$-adapted. Instead of the integrability and isometry assumptions we need
$$P\left[\int_0^t |\mu(s, \omega)|\, \mathrm{d}s < \infty \text{ for all } t \geq 0\right] = 1$$
$$P\left[\int_0^t \sigma^2(s, \omega)\, \mathrm{d}s < \infty \text{ for all } t \geq 0\right] = 1$$
Itô processes generalize Itô diffusions in two ways:

1. Information: we can have more information than just what we gather by observing the SDE, but this information should not make the Brownian motion predictable.
2. Dependence: drift and volatility can depend on the whole history, rather than only on the latest value of the process, $X_t$.

Unlike Itô diffusions, Itô processes are not always Markov. A diffusion $\mathrm{d}X_t = \mu(t, X_t)\, \mathrm{d}t + \sigma(t, X_t)\, \mathrm{d}B_t$ will coincide in law with a process $\mathrm{d}Y_t = \mu(t, \omega)\, \mathrm{d}t + \sigma(t, \omega)\, \mathrm{d}B_t$ if
$$E[\mu(t, \omega) | \mathcal{F}_t^Y] = \mu(t, Y_t), \quad \text{and} \quad \sigma^2(t, \omega) = \sigma^2(t, Y_t),$$
which essentially states that the process is Markov.

Itô's formula
Itô's formula, or Itô's lemma, is one of the fundamental tools that we have in stochastic calculus. It plays the rôle that the chain rule plays in ordinary calculus. Just like the chain rule is used to solve ODEs or PDEs, a clever application of Itô's formula can significantly simplify an SDE. We consider an Itô process
$$\mathrm{d}X_t = \mu(t, \omega)\, \mathrm{d}t + \sigma(t, \omega)\, \mathrm{d}B_t$$


A function $f(t, x)$ in $C^{1,2}$ (differentiable in $t$ and twice differentiable in $x$) will define a new Itô process via the transformation $Y_t = f(t, X_t)$. Itô's formula describes the dynamics of $Y_t$ in terms of the drift and volatility of $X_t$, and the derivatives of the transformation $f$. In particular, the SDE for $Y_t$ is given by
$$\mathrm{d}Y_t = \frac{\partial f}{\partial t}(t, X_t)\, \mathrm{d}t + \frac{\partial f}{\partial x}(t, X_t)\, \mathrm{d}X_t + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t)\, (\mathrm{d}X_t)^2$$
The trick is that the square $(\mathrm{d}X_t)^2$ is computed using the rules
$$\mathrm{d}t \cdot \mathrm{d}t = \mathrm{d}t \cdot \mathrm{d}B_t = 0, \quad \text{and} \quad (\mathrm{d}B_t)^2 = \mathrm{d}t$$
One can easily prove Itô's formula based on a Taylor expansion of the function $f$.⁶ In particular, one can write for $\Delta t > 0$ the quantity
$$\Delta X_t = X_{t + \Delta t} - X_t = \mu(t, X_t)\, \Delta t + \sigma(t, X_t)\, (B_{t + \Delta t} - B_t) + o(\Delta t)$$
Taking powers of the Brownian increment $\Delta B_t = B_{t + \Delta t} - B_t$ yields
$$E[\Delta B_t] = 0, \quad E[(\Delta B_t)^2] = \Delta t, \quad E[(\Delta B_t)^k] = o(\Delta t) \text{ for all } k \geq 3$$
This implies that the random variable $(\Delta B_t)^2$ will have expected value equal to $\Delta t$ and variance of order $o(\Delta t)$. A consequence is that in the limit $(\Delta B_t)^2 \to \Delta t$, since the variance goes to zero. Now the Taylor expansion for $\Delta Y_t = f(t + \Delta t, X_t + \Delta X_t) - f(t, X_t)$ will give
$$\Delta Y_t = \frac{\partial f}{\partial t}(t, X_t)\, \Delta t + \frac{\partial f}{\partial x}(t, X_t)\, \Delta X_t + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t)\, (\Delta X_t)^2 + o(\Delta t)$$

Passing to the limit yields Itô's formula.

Example 9. Itô's formula can be used to simplify SDEs and cast them in a form that is easier to solve explicitly. Say for example that we are interested in the stochastic integral
$$\int_0^t B_s\, \mathrm{d}B_s$$
where $B_t$ is a standard Brownian motion. We will consider the function $f(t, x) = \frac{x^2}{2}$, and define the process $Y_t = f(t, B_t) = \frac{1}{2} B_t^2$. Using Itô's formula we can specify the dynamics of this process, namely
$$\mathrm{d}Y_t = 0\, \mathrm{d}t + B_t\, \mathrm{d}B_t + \frac{1}{2}\, (\mathrm{d}B_t)^2 = B_t\, \mathrm{d}B_t + \frac{1}{2}\, \mathrm{d}t$$
In other words we can write
$$\mathrm{d}\left(\frac{1}{2} B_t^2\right) = \frac{1}{2}\, \mathrm{d}t + B_t\, \mathrm{d}B_t$$

⁶ This Taylor expansion is valid, since $f$ is a function that is sufficiently smooth.


Figure 1.3: A sample path of an Itô integral.


By taking integrals of both sides, and recognizing that $Y_t = \frac{1}{2} B_t^2$, we can solve for the Itô integral in question:
$$\int_0^t B_s\, \mathrm{d}B_s = \frac{1}{2} B_t^2 - \frac{1}{2} t$$

A trajectory $B_t(\omega)$ for an element $\omega \in \Omega$, and the corresponding solution $\int_0^t B_s(\omega)\, \mathrm{d}B_s(\omega)$, is given in figure 1.3.

Example 10. The most widely used model for a stock price, say $S_t$, satisfies the SDE for a geometric Brownian motion with constant expected return $\mu$ and volatility $\sigma$, given by
$$\mathrm{d}S_t = \mu S_t\, \mathrm{d}t + \sigma S_t\, \mathrm{d}B_t$$
This corresponds loosely to the ODE $\frac{\mathrm{d}S_t}{\mathrm{d}t} = \mu S_t$, which grows exponentially. Motivated by this exponential growth of the ODE we consider the logarithmic function $f(t, x) = \log x$. Using Itô's formula we can construct the SDE for $z_t = \log S_t$:
$$\mathrm{d}z_t = \left(\mu - \frac{1}{2}\sigma^2\right) \mathrm{d}t + \sigma\, \mathrm{d}B_t$$
This SDE has constant coefficients and can be readily integrated to give
$$z_t = z_0 + \left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma B_t$$

We can cast this expression back to the asset price itself

Figure 1.4: Asset price trajectories.

$$S_t = S_0 \exp\left\{\left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma B_t\right\}$$

Note that under the geometric Brownian motion assumption the price of the asset is always positive, an attractive feature in line with the property of limited liability of stocks. Some stock price trajectories for different $\omega$ are given in figure 1.4.
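Trajectories like those of figure 1.4 are produced directly from the closed-form solution; a minimal Matlab sketch (ours, with illustrative parameter values; the implicit expansion of t against B requires Matlab R2016b or later):

% Simulate geometric Brownian motion trajectories.
S0 = 1;  mu = 0.10;  sigma = 0.25;
N  = 250;  dt = 1/N;  t = (0:N)' * dt;
B  = [zeros(1, 5); cumsum(sqrt(dt) * randn(N, 5))];   % five Brownian paths
S  = S0 * exp((mu - sigma^2/2) * t + sigma * B);      % exact solution at grid times
plot(t, S)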

1.6 The partial differential equation approach
In the stochastic differential equation approach we typically consider a function of an Itô diffusion, and then construct the dynamics of the process under this transformation. In many applications we are not interested in the actual paths of the process, but in the expectation of some function at a future date. In the PDE approach we want to investigate the transition mechanics of the process, and based on that the transition mechanics of the expectation at hand.


Generators
Say we are given a Brownian motion $B_t$ on a filtered space. For an SDE that describes the motion of a stochastic process $X_t$, say
$$\mathrm{d}X_t = \mu(t, X_t)\, \mathrm{d}t + \sigma(t, X_t)\, \mathrm{d}B_t,$$
we can define an elliptic operator $L$, which is applied to twice-differentiable functions $f \in C^{(2)}$:
$$L f(x) = \mu(t, x) \frac{\mathrm{d}f}{\mathrm{d}x}(x) + \frac{1}{2} \sigma^2(t, x) \frac{\mathrm{d}^2 f}{\mathrm{d}x^2}(x)$$
This elliptic operator is also the infinitesimal generator of the process, which is given formally in the following definition.

Definition 14. Given an Itô diffusion $X_t$, the (infinitesimal) generator of the process, denoted by $A$, is defined for all functions $f \in C^{(2)}$ as the limit
$$A f(x) = \lim_{s \downarrow 0} \frac{E^x f(X_s) - f(x)}{s} = L f(x)$$

It is Kolmogorov's backward equation that gives us the expectation of the function at a future date, as the solution of a partial differential equation. In particular, if we denote the expectation by $u(t, x) = E[f(X_T) | X_t = x]$, for $f \in C^{(2)}$ we have
$$-\frac{\partial}{\partial t} u(t, x) = A_x u(t, x) = \mu(t, x) \frac{\partial}{\partial x} u(t, x) + \frac{1}{2} \sigma^2(t, x) \frac{\partial^2}{\partial x^2} u(t, x)$$
The subscript of the generator in the above expression just indicates that the derivatives are partial and taken with respect to $x$. The terminal condition for Kolmogorov's backward PDE will be $u(T, x) = f(x)$.

There is a very intuitive way of viewing Kolmogorov's equation. If we consider the expectation $E_t = u(t, X_t) = E[f(X_T) | \mathcal{F}_t]$ as a stochastic process, then we can observe (by applying Itô's formula) that Kolmogorov's backward PDE sets the drift of $E_t$ equal to zero, rendering $E_t$ a martingale. This means that expectations we form at time $t$ are unbiased, in the sense that we do not anticipate any special information that will change them.

If we use the indicator function, then the expectation becomes the conditional probability that the diffusion will take values in a set $F$ at time $T$. In particular, if we denote this conditional probability by $v(t, x; T, F) = P(X_T \in F | X_t = x) = E[\mathbb{I}(X_T \in F) | X_t = x]$, then Kolmogorov's backward PDE takes the form
$$-\frac{\partial}{\partial t} v(t, x; T, F) = \mu(t, x) \frac{\partial}{\partial x} v(t, x; T, F) + \frac{1}{2} \sigma^2(t, x) \frac{\partial^2}{\partial x^2} v(t, x; T, F)$$
for $t \leq T$, with a terminal condition $v(T, x; T, F) = \mathbb{I}(x \in F)$ (which is equal to one if $x \in F$ and zero otherwise). Therefore this PDE describes the evolution


of the probability that we will end up in a certain set of states at some future time $T$. It is called a backward PDE because we start with a terminal condition at the future time $T$ and integrate backwards to the present time $t$.

Kolmogorov's forward equation, also known as the Fokker-Planck equation, considers the transition density $p(t, x; T, y)\, \mathrm{d}y = P(X_T \in \mathrm{d}y | X_t = x)$. It postulates that
$$\frac{\partial}{\partial T} p(t, x; T, y) = -\frac{\partial}{\partial y} \big[\mu(T, y)\, p(t, x; T, y)\big] + \frac{1}{2} \frac{\partial^2}{\partial y^2} \big[\sigma^2(T, y)\, p(t, x; T, y)\big]$$
for $T \geq t$, with an initial condition $p(t, x; t, y) = \delta(x - y)$ (the Dirac function). The PDE gives the evolution of the distribution of $X_T$ given the current state $X_t = x$. It is called a forward PDE because we start from the state at the current time $t$ and integrate forwards towards the future time $T$.

Stopping times
Definition 15. A stopping time $\tau$ is a random variable $\tau : \Omega \to [0, \infty]$, such that
$$\{\omega : \tau(\omega) \leq t\} \in \mathcal{F}_t \quad \text{for all } t \geq 0$$

That is to say, a stopping time is defined by an event that is $\mathcal{F}_t$-measurable. This means that at any time $t \geq 0$ we can ascertain with certainty whether or not the event has happened. Examples of stopping times are first hitting times, first exit times from a set, and so on. Given a stopping time $\tau$ we can define the stopped process, which is simply $X_t^\tau = X_{\min(t, \tau)}$.

Say that $\tau$ is a stopping time with $E^x \tau < \infty$, meaning that the process will be stopped at some point in the future almost surely. Dynkin's formula gives expectations at a stopping time $\tau$, as
$$E^x f(X_\tau) = f(x) + E^x \left[\int_0^\tau A f(X_s)\, \mathrm{d}s\right]$$
Here $f \in C^{(2)}$, and $f$ also has compact support. Note that in the above integral the upper bound is a random variable. Dynkin's formula can be used to assess when a process is expected to be stopped, that is to say the expectation $E^x[\tau(\omega)]$.


exponentially drop towards zero. We can define instead the exit times from the set $[\varepsilon, \bar{S}]$,
$$\tau_\varepsilon = \inf\{t \geq 0 : S_t \geq \bar{S} \text{ or } S_t \leq \varepsilon\}$$
The expected exit time from a compact set is indeed bounded, and therefore Dynkin's formula can now be applied. Here it will be useful to remind ourselves of the solution of the SDE for the geometric Brownian motion,
$$S_t = S_0 \exp\left\{\left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma B_t\right\}$$
Suppose that $\mu < \sigma^2/2$, which means that the expected returns of the asset are not large enough for the price to be expected to grow. In this case, as $t \to \infty$ the expectation of the asset price $E S_t \to 0$, and in fact every trajectory $S_t \to 0$, almost surely. Then every trajectory will exit the set, at least at the lower bound $\varepsilon$. Of course, the process might hit the upper bound $\bar{S}$ first. For that reason, say that the probability $P(S_{\tau_\varepsilon} = \bar{S}) = p$, and of course $P(S_{\tau_\varepsilon} = \varepsilon) = 1 - p$.

Consider the function $f(x) = x^{1 - 2\mu/\sigma^2}$. This might appear to be an odd choice, but we have selected this function because when we apply the generator $A$,
$$A f(x) = \mu \left(1 - \frac{2\mu}{\sigma^2}\right) x^{1 - 2\mu/\sigma^2} + \frac{1}{2}\sigma^2 \left(1 - \frac{2\mu}{\sigma^2}\right)\left(-\frac{2\mu}{\sigma^2}\right) x^{1 - 2\mu/\sigma^2} = 0$$
Therefore Dynkin's formula yields (for the exit times $\tau_\varepsilon$)
$$p\, \bar{S}^{1 - 2\mu/\sigma^2} + (1 - p)\, \varepsilon^{1 - 2\mu/\sigma^2} = S^{1 - 2\mu/\sigma^2}$$
Passing to the limit $\varepsilon \to 0$ we can retrieve the probability of never reaching our target of $\bar{S}$, namely
$$1 - p = 1 - \left(\frac{S}{\bar{S}}\right)^{1 - 2\mu/\sigma^2}$$
This probability becomes higher for lower expected returns or higher volatility.

If $\mu > \sigma^2/2$ then the expected returns are high enough for all sample paths to eventually breach the target $\bar{S}$, since $S_t \to \infty$ as $t \to \infty$, almost surely. The process will exit with probability 1, but $E\tau$ might still be infinite. In this case we consider the function $f(x) = \log x$. Our objective now is for the generator to be constant, in order to simplify the integral $E \int_0^\tau A f(X_s)\, \mathrm{d}s$. In particular,
$$A f(x) = \mu - \frac{1}{2}\sigma^2$$
Dynkin's formula will yield in this case (once again for the exit times $\tau_\varepsilon$)
$$p \log \bar{S} + (1 - p) \log \varepsilon = \log S + \left(\mu - \frac{1}{2}\sigma^2\right) E \tau_\varepsilon$$
Passing to the limit this time will give the expected stopping time
$$E \tau_\varepsilon \to E \tau = \frac{\log(\bar{S}/S)}{\mu - \sigma^2/2} \quad \text{as } \varepsilon \to 0$$
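The hitting probability in the first case is easy to verify by simulation. The sketch below (ours; the parameters are arbitrary, and truncating the horizon slightly understates the true hitting probability) simulates log-prices on a fixed grid and records which paths ever reach the target.

% Monte Carlo check of the hitting probability when mu < sigma^2/2.
S0 = 1;  Sbar = 2;  sigma = 0.3;  mu = 0.02;     % here sigma^2/2 = 0.045 > mu
M = 5000;  N = 20000;  dt = 5e-3;                % truncated horizon T = 100
logS = log(S0) * ones(M, 1);  hit = false(M, 1);
for n = 1:N
    logS = logS + (mu - sigma^2/2)*dt + sigma*sqrt(dt)*randn(M, 1);
    hit  = hit | (logS >= log(Sbar));
end
fprintf('P(hit Sbar) ~ %.3f, theory %.3f\n', ...
        mean(hit), (S0/Sbar)^(1 - 2*mu/sigma^2));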


1.7 The Feynman-Kac formula

The Feynman-Kac formula generalizes Kolmogorov's backward equation, and provides the connection that we need between the SDE and PDE approaches. It is named after Richard Feynman (1918–1988) and Mark Kac (1914–1984), but was published by Kac in 1949. It gives expectations not only of a functional of the terminal value of the process, but also of some functionals that are computed at intermediate points.

In particular, we start with an Itô diffusion which is associated with a generator $A$, and we also consider two functions $f \in C^{(2)}$ and $q \in C$, continuous and lower bounded. We are interested in computing an expectation of the form
$$v(t, x) = E^x \left[\exp\left(-\int_0^t q(X_s)\, \mathrm{d}s\right) f(X_t)\right]$$
The Feynman-Kac formula states that this expectation satisfies the partial differential equation
$$\frac{\partial}{\partial t} v(t, x) = A v(t, x) - q(x)\, v(t, x)$$

with boundary condition $v(0, x) = f(x)$. The Feynman-Kac formula has been very successful in financial mathematics, as it can represent stochastic discount factors through the exponential $\exp\left(-\int_0^t q(X_s)\, \mathrm{d}s\right)$.

Example 12. Suppose that the interest rate follows an Itô diffusion given by
$$\mathrm{d}r_t = \mu(t, r_t)\, \mathrm{d}t + \sigma(t, r_t)\, \mathrm{d}B_t$$
Also suppose that we have an investment that depends on the level of the interest rate, for example a house, with value given by $H(t, r_t)$. This implies that the property value will also follow an Itô diffusion, with dynamics given by Itô's formula. At a future time $T$, the house price will be $H(T, r_T)$, which is of course unknown today. We are interested in buying the property at time $T$, which means that we are interested in the present value of $H(T, r_T)$, namely
$$v(T, r) = E^r \left[\exp\left(-\int_0^T r_s\, \mathrm{d}s\right) H(T, r_T)\right]$$

Say that we have a project with uncertain payoffs that depend on the evolution of a variable $X_t$, which has current value $X_0 = x$. The dynamics are $\mathrm{d}X_t = \mu\, \mathrm{d}t + \sigma\, \mathrm{d}B_t$, and the project will pay $f(X_T) = \alpha X_T^2 + \beta$. We are interested in establishing the present value
$$E^x \left[\exp\{-RT\}\, \big(\alpha X_T^2 + \beta\big)\right]$$
This will be equal to $v(T, x)$, where $v$ satisfies the PDE
$$\frac{\partial}{\partial t} v(t, x) = \mu \frac{\partial}{\partial x} v(t, x) + \frac{1}{2}\sigma^2 \frac{\partial^2}{\partial x^2} v(t, x) - R\, v(t, x)$$
with boundary condition $v(0, x) = \alpha x^2 + \beta$.

1.8 Girsanov's theorem

As we saw in section 1.2, the same measurable space $(\Omega, \mathcal{F})$ can support different probability measures. We also saw how the Radon-Nikodym derivative can be used to compute expectations under different equivalent measures. Girsanov's theorem gives us the tools to specify this Radon-Nikodym derivative for Itô processes. In particular, consider an Itô process on the filtered space $(\Omega, \{\mathcal{F}_t\}_{0 \leq t \leq T}, \mathcal{F}, P)$ that solves the SDE
$$\mathrm{d}X_t = \mu(t, \omega)\, \mathrm{d}t + \sigma(t, \omega)\, \mathrm{d}B_t$$
According to Girsanov's theorem, for each equivalent measure $Q \sim P$ there exists an $\mathcal{F}_t$-adapted process $\theta$ such that the process
$$B_t^Q(\omega) = B_t(\omega) + \int_0^t \theta(s, \omega)\, \mathrm{d}s$$
is a Brownian motion in $(\Omega, \{\mathcal{F}_t^Q\}_{0 \leq t \leq T}, \mathcal{F}^Q, Q)$, where $\mathcal{F}_t^Q = \sigma(B_s^Q : 0 \leq s \leq t)$ is the $\sigma$-algebra generated by $B_t^Q$. If we define the $\mathcal{F}_t$-adapted function
$$\mu^Q(t, \omega) = \mu(t, \omega) - \theta(t, \omega)\, \sigma(t, \omega),$$
then the process $X_t$ can be written as a stochastic differential equation under $Q$ as
$$\mathrm{d}X_t = \mu^Q(t, \omega)\, \mathrm{d}t + \sigma(t, \omega)\, \mathrm{d}B_t^Q$$
This means that if we are given an equivalent measure we can explicitly solve for the function $\mu^Q(t, \omega)$ and write down the SDE that $X_t$ will solve under the new measure.

Girsanov's theorem also allows us the inverse construction: given an adapted function $\theta(t, \omega)$, we can explicitly construct an equivalent probability measure under which $B_t(\omega) + \int_0^t \theta(s, \omega)\, \mathrm{d}s$ is a Brownian motion. We define the exponential martingale
$$M_t = \exp\left\{-\int_0^t \theta(s, \omega)\, \mathrm{d}B_s - \frac{1}{2} \int_0^t \theta^2(s, \omega)\, \mathrm{d}s\right\}$$
Based on this exponential martingale we define the following measure on $(\Omega, \mathcal{F})$:
$$Q(F) = \int_F M_T(\omega)\, \mathrm{d}P(\omega) = E^P[M_T\, \mathbb{I}(\omega \in F)], \quad \text{for all } F \in \mathcal{F},$$
which we represent as $\mathrm{d}Q(\omega) = M_T(\omega)\, \mathrm{d}P(\omega)$, or in terms of the Radon-Nikodym derivative $\left.\frac{\mathrm{d}Q}{\mathrm{d}P}\right|_{\mathcal{F}_t} = M_t$. It follows that


1. $Q$ is a probability measure on $\mathcal{F}$.
2. The process $B_t^Q = \int_0^t \theta(s, \omega)\, \mathrm{d}s + B_t$ is a Brownian motion on the filtered space $(\Omega, \mathcal{F}, \mathbb{F}, Q)$.
3. The Itô process can be written in SDE form as
$$\mathrm{d}X_t = \mu^Q(t, \omega)\, \mathrm{d}t + \sigma(t, \omega)\, \mathrm{d}B_t^Q$$

Essentially, under the new measure the Itô process will have a different drift $\mu^Q$, but the same volatility as the original one: this is the Girsanov transformation. The process $B_t$ will not be a Brownian motion under the new measure, since we select elements $\omega$ using different probability weights. On the other hand, it turns out that the process $\int_0^t \theta(s, \omega)\, \mathrm{d}s + B_t$ will be a Brownian motion.

In practice we are not really interested in the probability distribution or the dynamics on the set $\Omega$, but rather in the distribution of $\mathcal{F}$-measurable random variables $Y(\omega)$. The Radon-Nikodym derivative will allow us to express expectations under different equivalent probability measures. In particular,
$$E^Q[Y] = E^P[M_T Y]$$
This is the relation that is routinely used in financial economics, as we very often want to change probability measures so that they adjust with respect to the risk aversion profile of the agents, or with respect to different numéraire securities.

Example 13. Say that the price of an asset follows a geometric Brownian motion
$$\mathrm{d}S_t = \mu S_t\, \mathrm{d}t + \sigma S_t\, \mathrm{d}B_t$$
Here $B_t$ is a Brownian motion on the filtered space $(\Omega, \mathcal{F}, \mathbb{F}, P)$. We can construct a new probability measure $Q$ under which the asset price is a martingale. We can write the process above as
$$\mathrm{d}S_t = \sigma S_t \left(\frac{\mu}{\sigma}\, \mathrm{d}t + \mathrm{d}B_t\right)$$
Therefore, if we set $\theta = \mu/\sigma$, then we are looking for the probability measure that makes the process
$$B_t^Q = \frac{\mu}{\sigma}\, t + B_t$$
a $Q$-Brownian motion. Then the asset price process under $Q$ will be given by the SDE
$$\mathrm{d}S_t = \sigma S_t\, \mathrm{d}B_t^Q$$
Girsanov's theorem tells us that such a probability measure on $\Omega$ exists. If we want to take expectations under this equivalent measure we need to construct the exponential martingale
$$M_t = \exp\left\{-\frac{\mu}{\sigma} B_t - \frac{1}{2} \frac{\mu^2}{\sigma^2}\, t\right\}$$
Then, for any $\mathcal{F}_T$-measurable random variable $Y$ we can write $E^Q[Y] = E^P[M_T Y]$.


Example 14. Let us say that we want to verify the above claim that $E^Q[Y] = E^P[M_T Y]$, stated by Girsanov's theorem. For example, let us take the random variable $Y_T = \log S_T$. Under $Q$ the logarithm will be given by Itô's formula as
$$Y_t = \log S_0 - \frac{1}{2}\sigma^2 t + \sigma B_t^Q$$
and since $B_t^Q$ is a $Q$-martingale, the expectation $E^Q[Y_T] = \log S_0 - \frac{1}{2}\sigma^2 T$.

Under $P$ we have to consider the process
$$Z_t = M_t Y_t$$
Itô's formula (applied to the function $f(m, y) = m y$) will give us the dynamics of $Z_t$, namely
$$\mathrm{d}Z_t = M_t\, \mathrm{d}Y_t + Y_t\, \mathrm{d}M_t + \mathrm{d}Y_t\, \mathrm{d}M_t$$
which actually produces
$$\mathrm{d}Z_t = M_t \left[\left(\mu - \frac{1}{2}\sigma^2\right) \mathrm{d}t + \sigma\, \mathrm{d}B_t\right] - \theta M_t \left[\log S_0 + \left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma B_t\right] \mathrm{d}B_t - \theta \sigma M_t\, \mathrm{d}t$$
The solution to the above SDE is written as
$$Z_t = Z_0 + \int_0^t \left(\mu - \frac{1}{2}\sigma^2 - \theta\sigma\right) M_s\, \mathrm{d}s + \int_0^t M_s\, (\sigma - \theta Y_s)\, \mathrm{d}B_s$$
Taking expectations, and using that $Z_T = M_T Y_T$, $Z_0 = M_0 Y_0 = \log S_0$, $E M_s = 1$ and $\theta = \mu/\sigma$, will yield
$$E^P[M_T Y_T] = \log S_0 + \int_0^T \left(-\frac{1}{2}\sigma^2\right) \mathrm{d}t = \log S_0 - \frac{1}{2}\sigma^2 T$$

And the two expectations are indeed the same. Observe, though, how much easier it was to compute the expectation under $Q$. Girsanov's theorem can be a valuable tool when one wants to simplify complex expectations, just by casting them under a different measure.
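The equality of the two expectations can also be verified by brute force. A small Monte Carlo sketch (ours; the parameter values are illustrative) simulates under $P$ only and reweights with the Radon-Nikodym derivative:

% Check E_Q[log S_T] = E_P[M_T log S_T] by simulation under P.
S0 = 1;  mu = 0.08;  sigma = 0.2;  T = 1;  theta = mu/sigma;  M = 1e6;
BT = sqrt(T) * randn(M, 1);                         % B_T under P
Y  = log(S0) + (mu - sigma^2/2)*T + sigma*BT;       % log S_T under P
RN = exp(-theta*BT - theta^2*T/2);                  % Radon-Nikodym weight M_T
fprintf('E_P[M_T Y] = %.4f, E_Q[Y] = %.4f\n', ...
        mean(RN .* Y), log(S0) - sigma^2*T/2);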

2 The Black-Scholes world

In this chapter we will use some of the previous results to establish the Black-Scholes (BS) paradigm. We will assume a frictionless market where asset prices follow geometric Brownian motions, and we will investigate the pricing of derivative contracts. The seminal papers of Black and Scholes (1973) and Merton (1973) (also collected in the excellent volume of Merton, 1992) defined the area and sparked thousands of research articles on the fair pricing and hedging of a variety of contracts.

The original derivation of the BS formula is based on a replicating portfolio that ensures that no arbitrage opportunities are allowed. Say that we are interested in pricing a claim that has payoffs that depend on the value of the underlying asset at some fixed future date $T$. The idea is to construct a portfolio, using the underlying asset and the risk free bond, that replicates the price path of that claim, and therefore its payoffs. If we achieve that, then the claim in question is redundant, in the sense that we can replicate it exactly. In addition, the value of the claim must equal the value of the portfolio, otherwise arbitrage opportunities would arise.

2.1 The market model
In this section we lay down the assumptions for the BS formula. We also give some important definitions on trading strategies, market completeness and arbitrage. We conclude by illustrating that the market under these assumptions is complete, by constructing the corresponding replicating portfolio.

The Black-Scholes assumptions

We fix a filtered space $(\Omega, \mathcal{F}, \mathbb{F}, P)$, and a Brownian motion on that space, say $B_t$. We will maintain the following assumptions:


1. The asset price follows a geometric Brownian motion, that is to say
$$\mathrm{d}S_t = \mu S_t\, \mathrm{d}t + \sigma S_t\, \mathrm{d}B_t \tag{2.1}$$
The parameter $\mu$ gives the expected asset return, while $\sigma$ is the return volatility.
2. There is a risk free asset which grows at a constant rate $r$, which applies for both borrowing and lending. There is no bound to the size of funds that can be invested or borrowed risk-free.
3. Trading is continuous in time, both for the risk free asset, the underlying asset and all derivatives. This means that any portfolio can be dynamically rebalanced continuously.
4. All assets are infinitely divisible and there is an inelastic supply at the spot price, that is to say the assets are infinitely liquid. Therefore, the actions of any investor are not sufficient to cause price moves.
5. There are no taxes or any transaction costs. There are no market makers or bid-ask spreads. The spot price is the single price at which an unlimited number of shares can be bought. Short selling is also allowed.

A derivative security is a contract that offers some payoffs at a future (maturity) time $T$, which depend on the value of the underlying asset at that time, say $\Pi(S_T)$. We are interested in establishing the fair value $P_t$ of such a security at all times before maturity, that is, the process $\{P_t : 0 \leq t \leq T\}$.

Trading strategies
Of course the derivative price at time will depend only on information available at the time, that is P must be F -adapted. Also, the asset price is Markovian, which indicates that P should not depend on the history of the asset price, but only on the latest value S . We can therefore write the price of the derivative as a function P = ( S ). The function is the unknown pricing formula. If we actually had the functional form of ( S), an application of Its formula o would provide us with the derivative price dynamics dP = 2 S 2 2 ( S ) + S ( S )+ ( S) d S 2 S 2 + S ( S )dB S

Although we dont actually know ( S) explicitly yet, we will later use the dynamics above to construct a partial dierential equation that the pricing function has to satisfy. To produce the PDE we will need to introduce some terminology, including trading strategies and arbitrage opportunities. We will construct portfolios that we rebalance in time, and we will keep track of them using a trading strategy H . Since we must make all rebalancing decisions based on the available information, the trading strategy will be an


$\mathcal{F}_t$-adapted process as well. Our investment instruments are the underlying and the risk free asset, therefore the trading strategy is $H_t = \{(H_t^S, H_t^F) : t \geq 0\}$, where $H_t^S$ keeps track of the number of shares held, and $H_t^F$ is the amount invested in the risk free asset (that is, the bank balance) at time $t$. The value of the portfolio that is generated by the trading strategy is denoted by $V_t = V_t(H)$.

A self-financing trading strategy is one where no funds can enter or exit the portfolio. All changes in the value are due to changes in the prices of the assets that compose it. In this case we don't really need to keep track of the holdings of both assets, since they are related via
$$H_t^F = V_t - H_t^S S_t$$
Therefore we will only keep the process of the shares held, $H_t = \{H_t : t \geq 0\}$, as the trading strategy. Also, in this case the dynamics of the portfolio value are given by
$$\mathrm{d}V_t = H_t\, \mathrm{d}S_t + r\, (V_t - H_t S_t)\, \mathrm{d}t = \big(\mu H_t S_t + r V_t - r H_t S_t\big)\, \mathrm{d}t + \sigma H_t S_t\, \mathrm{d}B_t$$
Say for a minute that we knew the pricing formula for the derivative price, $P_t = f(t, S_t)$. We can then define a trading strategy where the number of shares held at each time is given by
$$H_t = \frac{\partial}{\partial S} f(t, S_t)$$

We have selected this particular trading strategy because it sets the volatility of the portfolio value, $V_t$, equal to the volatility of the derivative value, $P_t$. We call this a hedging or replicating strategy, and the portfolio the hedging or replicating portfolio.

Arbitrage
We claim that if the portfolio has the same volatility dynamics it should also offer the same return; otherwise arbitrage opportunities will emerge. An arbitrage opportunity is a trading strategy $J$ that has the following four properties (for a stopping time $\tau > 0$):

1. The strategy $J$ is self-financing, that is, there are no external cash inflows or outflows. We can move funds from one asset to another, but we cannot introduce new funds.
2. $V_0(J) = 0$, that is, we can engage in the portfolio with no initial investment. This means that we can borrow all funds needed to set up the initial strategy at the risk free rate, without investing any funds of our own.
3. $V_\tau(J) \geq 0$: it is impossible to be losing money at time $\tau$. The worst outcome is that we end up with zero funds, but we did not invest any funds in the first place.
4. $P\big(V_\tau(J) > 0\big) > 0$: there is a positive probability that we will actually be making a profit at time $\tau$.


An arbitrage opportunity is a risk free money making device, since with no initial investment we have a probability to make a profit, without running any risk of realizing losses. Finance theory assumes that exploitable arbitrage opportunities do not exist when pricing claims.

Now say that at time $0$ we engage in the following self-financing strategy $\Theta$, where

1. we are short (we have sold) one derivative contract,
2. we hold $H_t$ shares, and
3. we keep an amount $\beta_t$ in the risk free bank account.

Thus, our holdings at any time $t \geq 0$ will have value
$$V_t(\Theta) = -P_t + H_t S_t + \beta_t$$
We want to keep the initial gross investment equal to zero, $V_0 = 0$, and therefore our initial bank balance will be $\beta_0 = P_0 - H_0 S_0$. We also want to maintain a self-financing strategy, and therefore all changes in the value of our portfolio must come through changes in the assets themselves,
$$\mathrm{d}V_t = -\mathrm{d}P_t + H_t\, \mathrm{d}S_t + r \beta_t\, \mathrm{d}t = -\mathrm{d}P_t + H_t\, \mathrm{d}S_t + r\, (V_t + P_t - H_t S_t)\, \mathrm{d}t$$
Using Itô's formula for $P_t = f(t, S_t)$ and the stochastic differential equation for $S_t$, we can write after some algebra (which incidentally cancels the drifts $\mu$)
$$\mathrm{d}V_t = -\left[\frac{\partial}{\partial t} f(t, S_t) + r S_t \frac{\partial}{\partial S} f(t, S_t) + \frac{\sigma^2 S_t^2}{2} \frac{\partial^2}{\partial S^2} f(t, S_t) - r P_t\right] \mathrm{d}t + r V_t\, \mathrm{d}t \tag{2.2}$$

The trading strategy $\Theta$ is self-financing, and its initial value is $V_0(\Theta) = 0$. Therefore it has two of the four requirements that we set for an arbitrage opportunity. In order to avoid such opportunities, we want to verify that there exists no stopping time $\tau$ such that $V_\tau(\Theta) \geq 0$ and $P(V_\tau(\Theta) > 0) > 0$. The value of the trading strategy will evolve in a deterministic way, as illustrated in the above relationship where no stochastic term is present. Therefore, if the term in the brackets is equal to zero for all $t$, then $\mathrm{d}V_t = r V_t\, \mathrm{d}t$, which together with $V_0 = 0$ implies $V_t = 0$ for all $t$. Then, apparently, no arbitrage opportunities are present, since $P(V_\tau(\Theta) > 0) = 0$.

We can also show that this condition is necessary. Say that $t^\star > 0$ is the first time that the term in the brackets of (2.2) becomes non-zero, and say that it is negative, implying a positive $\mathrm{d}V_{t^\star}$. Since $f$ is continuous in both arguments, there will be an interval $(t^\star, t^\star + \delta)$ on which the portfolio value remains positive, and therefore the value of the portfolio $V_{t^\star + \delta/2}(\Theta) > 0$, which indicates an arbitrage opportunity. If at $t^\star$ the value of the portfolio becomes negative, then we can implement the inverse trading strategy $-\Theta$, for which $V_{t^\star + \delta/2}(-\Theta) > 0$, and again reach an arbitrage opportunity.


The Black-Scholes partial differential equation

In the previous subsection we concluded that the value of the composite portfolio must be $V_t(\Theta) = 0$ for all $t \geq 0$, otherwise arbitrage opportunities will be present. Then, equation (2.2) gives the celebrated Black-Scholes partial differential equation (BS-PDE), namely
$$\frac{\partial}{\partial t} f(t, S) + r S \frac{\partial}{\partial S} f(t, S) + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2}{\partial S^2} f(t, S) = r f(t, S) \tag{2.3}$$
which must be satisfied by the derivative pricing function $f(t, S)$. This is one of the fundamental relationships in financial economics, as it has to be obeyed by any derivative contract. It shows that the price of the derivative can be replicated by a dynamically balanced portfolio that consists of the underlying asset and a risk free bank account, and is actually independent of the expected return $\mu$ on the underlying asset.

As we pointed out, in order to derive the BS-PDE we did not make any assumptions on the nature of the contract, meaning that the PDE will be satisfied by all derivatives. The nature of the particular contract will specify the terminal condition of the PDE. Indeed, we know that on the maturity date
$$P_T = f(T, S) = \Pi(S)$$
In their paper BS present the case of a European call option, a contract that gives the holder the right (but not the obligation) to purchase a share at a fixed price $K$ on the maturity date. Then the terminal condition becomes $f(T, S) = \max(S - K, 0) = (S - K)^+$. In this case BS show how the PDE can be solved analytically to produce the Black-Scholes formula, which is the particular pricing function $f(t, S)$ for this contract:
$$f(t, S) = S\, \mathrm{N}(d_+) - K \exp\{-r (T - t)\}\, \mathrm{N}(d_-) \tag{2.4}$$
where $\mathrm{N}(\cdot)$ is the cumulative (standardized) normal distribution function, and $d_\pm$ are given by
$$d_\pm = \frac{\log\frac{S}{K} + \left(r \pm \frac{1}{2}\sigma^2\right)(T - t)}{\sigma \sqrt{T - t}}$$
Typically we prefer to work with the time to maturity, and we use the change of variable $\tau = T - t$ (abusing the notation slightly). It is also convenient to define the log-prices, setting the variable $x = \log S$. Some elementary calculus produces the BS-PDE under these variable changes, a differential equation with constant coefficients:
$$\frac{\partial}{\partial \tau} f(\tau, x) = \left(r - \frac{1}{2}\sigma^2\right) \frac{\partial}{\partial x} f(\tau, x) + \frac{1}{2} \sigma^2 \frac{\partial^2}{\partial x^2} f(\tau, x) - r f(\tau, x) \tag{2.5}$$
Apart from rendering this expression easier for numerical methods to handle (since the coefficients are constant), we now have a PDE with an initial condition rather than a terminal one, namely $f(0, x) = (\exp(x) - K)^+$. In this form the BS-PDE is a standard convection-diffusion partial differential equation, a form that has been studied extensively in classical and quantum physics.
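Formula (2.4) takes only a few lines of Matlab. The sketch below (ours; the function name bscall is an assumption, and the file should be saved as bscall.m) uses erfc from base Matlab for the normal cdf, so no toolboxes are needed.

function P = bscall(S, K, r, sigma, tau)
% Black-Scholes price of a European call; tau = T - t is time to maturity.
Phi = @(x) 0.5 * erfc(-x ./ sqrt(2));      % standard normal cdf via erfc
dp  = (log(S./K) + (r + sigma.^2/2).*tau) ./ (sigma.*sqrt(tau));
dm  = dp - sigma.*sqrt(tau);
P   = S .* Phi(dp) - K .* exp(-r.*tau) .* Phi(dm);

For example, bscall(100, 95, 0.05, 0.20, 0.50) prices a six-month call struck at 95.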


2.2 The fundamental theorem of asset pricing
Since $V_t(\Theta) = 0$ for all times $t \geq 0$, the relationship
$$P_t = H_t S_t + \beta_t$$
will also hold at all times. This means that using the trading strategy $H_t$ we create a portfolio that will track (or mimic) the process $P_t$. Therefore we do not really need to introduce derivatives in the BS world, as their trajectories and payoffs can be replicated by using a carefully selected trading strategy. For that reason we say that in the BS world derivatives are redundant securities. This of course only holds under the strict BS assumptions, and does not generally hold in any market. It certainly does not hold in the real world, where markets are subject to a number of frictions and imperfections.

As we search for markets and models where securities can be hedged, we need to introduce the notion of market completeness. We will say that a market is complete if all claims can be replicated. A market that is complete will of course be arbitrage-free, but the inverse is not true. There are many markets that are arbitrage-free but incomplete. One can speculate that the real world markets fall within this category: claims cannot be perfectly replicated due to market imperfections, and these imperfections also make arbitrage opportunities scarce and short lived.

We set a probability space $(\Omega, \mathcal{F}, P)$, under which the price process is defined. In financial mathematics, an equivalent martingale measure (EMM) is a measure $Q$, equivalent to the objective one $P$, under which all discounted asset prices form martingales. Therefore, for the discount factor $B_t$, any price process $V_t$ will satisfy
$$V_0 = E^Q[B_t V_t] \quad \text{for all } t \geq 0$$

The fundamental theorem of asset pricing states the following two propositions:

- There exists an EMM $\iff$ there are no arbitrage opportunities.
- There exists a unique EMM $\iff$ the market is complete.

Girsanov's theorem is a very useful companion to the fundamental theorem of asset pricing, as it provides us with the link between different equivalent probability measures. A typical approach would be to assume a process for an asset under the true probability measure. This will specify the true dynamics of an asset or a collection of assets, that is to say, the process that we would produce


based on time series of the prices. Then we use Girsanov's theorem and try to specify the Radon-Nikodym derivative that produces discounted asset prices that form martingales. Of course there might be more than one probability measure with that feature, but if we manage to find one then we can conclude that the system is arbitrage-free. If we show that such a measure does not exist, then we know that the system as it stands offers some arbitrage opportunities, and then we can proceed to find them. Unfortunately, the fundamental theorem of asset pricing does not always guide us towards these opportunities, but sometimes it can offer useful insights to identify them.

Suppose that we are facing a market that offers a risk free rate $r$, and a collection of $M$ stocks. There are $N$ sources of uncertainty, represented by $N$ independent Brownian motions $\mathbf{B}(t) = \{B_i(t) : i = 1, \ldots, N\}$. If we collect the asset prices in an $(M \times 1)$ vector $\mathbf{S}(t) = \{S_j(t) : j = 1, \ldots, M\}$, then we can write
$$\frac{\mathrm{d}\mathbf{S}(t)}{\mathbf{S}(t)} = \boldsymbol{\mu}\, \mathrm{d}t + \boldsymbol{\sigma}\, \mathrm{d}\mathbf{B}(t)$$
The $(M \times N)$ matrix $\boldsymbol{\sigma}$ will determine the correlation structure of the assets. In fact, the covariance matrix of the stock returns will be given by the product $\boldsymbol{\sigma} \boldsymbol{\sigma}'$. Essentially, each asset will satisfy the SDE
$$\mathrm{d}S_j(t) = \mu_j S_j(t)\, \mathrm{d}t + \sum_{i=1}^{N} \sigma_{ji} S_j(t)\, \mathrm{d}B_i(t)$$

We want to establish whether or not we can find a trading strategy using these $M$ stocks that will be an arbitrage opportunity. To this end we will examine the probability measures that are equivalent to the true one. In particular, all equivalent measures will have a Radon-Nikodym derivative $M_t$ that satisfies
$$\mathrm{d}M_t = -M_t \sum_{i=1}^{N} \theta_i\, \mathrm{d}B_i(t)$$
We are looking for those equivalent probability measures under which the discounted prices form martingales, which means that under the EMM the dynamics of the assets will be
$$\mathrm{d}S_j(t) = r S_j(t)\, \mathrm{d}t + \sum_{i=1}^{N} \sigma_{ji} S_j(t)\, \mathrm{d}B_i^Q(t)$$
Using Girsanov's theorem we can actually find the instantaneous drift under $Q$, which will be given by
$$E^Q[\mathrm{d}S_j(t)] = E^P\left[\left(1 + \frac{\mathrm{d}M_t}{M_t}\right) \mathrm{d}S_j(t)\right] = \mu_j S_j(t)\, \mathrm{d}t - E^P\left[\left(\sum_{i=1}^{N} \theta_i\, \mathrm{d}B_i(t)\right)\left(\sum_{i=1}^{N} \sigma_{ji} S_j(t)\, \mathrm{d}B_i(t)\right)\right]$$


Since the Brownian motions are mutually independent, we can simplify the above expression to
$$E^Q[\mathrm{d}S_j(t)] = \left(\mu_j - \sum_{i=1}^{N} \sigma_{ji} \theta_i\right) S_j(t)\, \mathrm{d}t = r S_j(t)\, \mathrm{d}t,$$
which has to be satisfied for all $t \geq 0$ and for all $j = 1, \ldots, M$. Therefore the parameters $\boldsymbol{\theta} = \{\theta_i : i = 1, \ldots, N\}$ will be constant, and they must satisfy the system of $M$ equations with $N$ unknowns
$$\boldsymbol{\sigma} \boldsymbol{\theta} = \boldsymbol{\mu} - r \mathbf{1}$$
This system can have no solutions, a unique solution, or an infinite number of solutions, depending on the rank of the matrix $\boldsymbol{\sigma}$. If the system is inconsistent, which is the case when $\operatorname{rank}(\boldsymbol{\sigma})$ is lower than the rank of the augmented matrix $[\boldsymbol{\sigma} \mid \boldsymbol{\mu} - r\mathbf{1}]$, then the system does not admit a solution. This means that there does not exist an equivalent martingale measure, and by the fundamental theorem of asset pricing it is implied that arbitrage trading strategies can be constructed using a portfolio of the $M$ stocks. If the system is consistent and $\operatorname{rank}(\boldsymbol{\sigma}) < N$, then there exists an infinite number of vectors $\boldsymbol{\theta}$ that are solutions to the system. Each one of these solutions will define an equivalent martingale measure, and the market is arbitrage-free but incomplete. Finally, if the system is consistent and $\operatorname{rank}(\boldsymbol{\sigma}) = N$, then the solution to the system is unique. This unique $\boldsymbol{\theta}$ will define a unique EMM and the market will be complete. In that case, any other asset that depends on the Brownian motions $\mathbf{B}(t)$ can be replicated using the $M$ assets in the market.
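The three cases can be distinguished numerically with two rank computations. A toy Matlab sketch (ours; the numbers are made up, and constructed so that a solution exists):

% Solve sigma * theta = mu - r for the market prices of risk.
sigma = [0.20 0.05; 0.10 0.25; 0.12 0.08];   % M = 3 stocks, N = 2 factors
r  = 0.03;
mu = r + sigma * [0.4; 0.2];                 % consistent by construction
excess = mu - r;
if rank(sigma) < rank([sigma, excess])
    disp('inconsistent system: no EMM, arbitrage exists')
elseif rank(sigma) == size(sigma, 2)
    theta = sigma \ excess;                  % the unique solution
    fprintf('unique EMM, theta = [%.2f %.2f]: complete market\n', theta)
else
    disp('infinitely many EMMs: arbitrage-free but incomplete')
end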

The Black-Scholes case

Let us now consider the simple case where there is only one risky stock in the market, with the dynamics given in equation (2.1). Then it follows that the coefficient of Girsanov's transformation will solve
$$\sigma \theta = \mu - r \quad \Longrightarrow \quad \theta = \frac{\mu - r}{\sigma}$$
Therefore the coefficient $\theta$ is the Sharpe ratio of the risky asset. The Sharpe ratio is a measure of the risk premium per unit of volatility risk, and represents the compensation that investors demand for holding the stock, which has uncertain payoffs. In this case, since $\theta$ is unique, the market will be complete. Girsanov's theorem will define the equivalent martingale probability measure $Q$ as the one with Radon-Nikodym derivative the exponential martingale
$$\frac{\mathrm{d}Q}{\mathrm{d}P} = M_T = \exp\left\{-\frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2 T - \frac{\mu - r}{\sigma}\, B_T\right\}$$

It follows that the discounted price process of any other asset must form a $Q$-martingale as well. In particular we can consider a European-style contract that delivers an amount $\Pi(S_T)$ at time $T$, a payoff that depends explicitly on the price of the underlying stock at that time. The value of this claim at all times $0 \leq t \leq T$ will satisfy


$$V_t = \exp\{-r(T - t)\}\, E^Q[\Pi(S_T)] = \exp\{-r(T - t)\}\, E^P[M_T\, \Pi(S_T)]$$
These equalities offer us three options to evaluate the value of the derivative at time $t = 0$.

Expectation under the true measure $P$. Under $P$ the asset price and the Radon-Nikodym derivative at time $T$ are functions of $B_T$, and the price of the derivative at time $t = 0$ can be written, using the second equality, as
$$V_0 = \exp(-rT)\, E^P\left[\exp\left\{-\frac{\theta^2}{2} T - \theta B_T\right\} \Pi\left(S_0 \exp\left\{\left(\mu - \frac{\sigma^2}{2}\right) T + \sigma B_T\right\}\right)\right]$$
For general functions $\Pi(\cdot)$ this expectation can be computed by simply simulating values for $B_T$ from the normal distribution with mean zero and variance $T$.

Expectation under the risk neutral measure $Q$. A much simpler approach is to use the fact that the dynamics of the underlying asset under $Q$ are known, and in fact
$$\mathrm{d}S_t = r S_t\, \mathrm{d}t + \sigma S_t\, \mathrm{d}B_t^Q$$
Therefore, using the first equality, we can express the price of the derivative as
$$V_0 = \exp(-rT)\, E^Q\left[\Pi\left(S_0 \exp\left\{\left(r - \frac{\sigma^2}{2}\right) T + \sigma B_T^Q\right\}\right)\right]$$
Now the process $\{B_t^Q\}_{t \geq 0}$ is a Brownian motion under $Q$, and therefore once again we can draw from the normal distribution with zero mean and variance $T$ to simulate the values of $B_T^Q$. The two expressions will of course yield the same result, but the latter is substantially simpler. In particular, in the case of a standard European call option, the price will satisfy
$$P_0 = \exp(-rT)\, E^Q\left[\left(S_0 \exp\left\{\left(r - \frac{\sigma^2}{2}\right) T + \sigma B_T^Q\right\} - K\right)^+\right]$$
Since $B_T^Q$ is normally distributed, after some algebra the expectation simplifies to
$$P_0 = \frac{S_0 \exp(-\sigma^2 T/2)}{\sqrt{2\pi T}} \int_{-b}^{\infty} \exp\left(\sigma B - \frac{B^2}{2T}\right) \mathrm{d}B - \frac{\exp(-rT)\, K}{\sqrt{2\pi T}} \int_{-b}^{\infty} \exp\left(-\frac{B^2}{2T}\right) \mathrm{d}B$$
In the above expression $b = \left[\log(S_0/K) + (r - \sigma^2/2)\, T\right] / \sigma$. Evaluating the two integrals will eventually lead to the Black-Scholes formula.
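The risk neutral expectation is also the easiest one to simulate; the following sketch (ours, with illustrative parameters) prices the call by Monte Carlo and checks the result against the bscall sketch given earlier:

% European call by simulation under Q versus the closed-form price.
S0 = 100;  K = 95;  r = 0.05;  sigma = 0.2;  T = 0.5;  M = 1e6;
BT = sqrt(T) * randn(M, 1);                            % draws of B_T^Q
ST = S0 * exp((r - sigma^2/2)*T + sigma*BT);
mc = exp(-r*T) * mean(max(ST - K, 0));
fprintf('Monte Carlo %.4f vs Black-Scholes %.4f\n', ...
        mc, bscall(S0, K, r, sigma, T));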


The Feynman-Kac form. A third approach would invoke the Feynman-Kac formula. In particular, we can write the first expectation of the valuation formula as
$$V_0 = E^Q\left[\exp\left(-\int_0^T r\, \mathrm{d}s\right) \Pi(S_T)\right]$$
with $S_t$ following the risk neutral dynamics. We shall also define the function
$$v(t, x) = E^Q\left[\exp\left(-\int_0^t r\, \mathrm{d}s\right) \Pi(S_t)\, \Big|\, S_0 = x\right],$$
implying that in fact we are interested in the value $V_0 = v(T, S_0)$. Following the Feynman-Kac approach (see section 1.7), the function $v(t, x)$ solves the parabolic PDE that depends on the dynamics of the asset price process under $Q$ (since the expectation is taken under $Q$):
$$\frac{\partial}{\partial t} v(t, x) = r x \frac{\partial}{\partial x} v(t, x) + \frac{1}{2} \sigma^2 x^2 \frac{\partial^2}{\partial x^2} v(t, x) - r\, v(t, x)$$
with initial condition $v(0, x) = \Pi(x)$. This is just the Black-Scholes partial differential equation (2.3), after we change the time variable to the time-to-maturity, which transforms the BS-PDE terminal condition into an initial one.

2.3 Exotic options
The power of the fundamental theorem of asset pricing is unleashed when one considers pricing contracts that are more complicated than the simple European calls and puts. There is a very large and fairly liquid market for contracts that are called exotic, in the sense that they exhibit features that are non-standard. In practice, the role of a trader is to create tailor-made contracts for her clients, and the role of the financial engineer is to produce benchmark prices for these contracts that are arbitrage-free, and also to present ways to hedge the exposure of the trading book using available liquid contracts, like the underlying assets and standard calls and puts. The fundamental theorem of asset pricing dictates that no matter how complicated the payoff structure, the no-arbitrage price will be equal to the discounted expected payoffs under the equivalent martingale measure $Q$. Sometimes it is more convenient to simulate these payoffs under $Q$, or to evaluate the expectation in closed form, but in other cases solving the PDE might be more efficient.

Exercise timing

Exotic contracts can be classified with respect to their exercise times and their payoff structure. European-style contracts can be exercised only on the maturity date, while American derivatives can be exercised at any point before the maturity date. That is to say, a three-month American put with strike price 30 gives the holder the right to sell the underlying asset for 30 at any point she


wishes in the next three months. In this case the holder will have to determine the optimal exercise point. Bermudan options are somewhere between the European and the American ones,¹ and allow the holder to exercise at a predefined set of equally spaced points. For example, if the put option described above were a Bermudan one, perhaps it could offer weekly exercising at the closing of each Friday during the next three months. Once again, every Friday the holder must decide if it is optimal to exercise or to wait for the next exercise point. The shout option is slightly more complicated, as the holder has the option to lock in one or more prices up to the maturity date (that is, by shouting to the seller), and use the price she chooses to compute the payoffs at maturity. For example, if the put were a one-shout option, and after six weeks the underlying price is 22, the holder has the opportunity to shout and lock in that price. Therefore, if on the maturity date the price is 26, the holder will choose which value will be used to compute the payoffs, in this case 22, which gives payoffs $(30 - 22)^+ = 8$ per share.

Typically, one computes the prices of contracts with an exotic exercise structure using the partial differential equation. In most cases this PDE has to be solved numerically. In chapter 3 we will give an overview of some methods that are used to numerically solve for the price of the option, the optimal exercise strategy and the hedging parameters.

Payoff structures

Apart from the standard calls and puts there can be a wide range of structures that define the payoffs of the contract. The simplest deviation is the digital option (also called a binary or all-or-nothing option), where the payoff is a fixed amount if the underlying is above or below the strike price. For example, a two month digital call with strike 60 will pay $1 if the value of the underlying is above 60 after two months. In that sense it is a standard bet on the future level of the underlying asset price. Another popular option is the cross option, where the underlying asset is quoted in one currency but the payoffs (and the strike price) are denominated in another. For example, British Airways are traded in the London stock exchange and are priced in British pounds, but a US based investor will want the strike price and the payoff in US dollars. Therefore, if $X_t$ is the USD/GBP exchange rate, and $S_t$ is the BA price in London (quoted in GBP), then a European call will have payoffs of the form $(S_T X_T - K)^+$, where the strike price is quoted in USD. Therefore the writer of this option is also exposed to exchange rate risks and the correlation between the exchange rate and the underlying asset returns. A quanto option will address this dependence by setting the exchange rate that will be used for the conversion beforehand, say $\bar{X}$. Therefore the payoffs will only depend on the fluctuations of the underlying asset, given by $(S_T \bar{X} - K)^+$.

The cross option described above is an example of an option that depends on more than one underlying asset. Other exotics share this feature, like the
1. Just as Bermuda is between Europe and the US.


exchange option that allows the holder to exchange one asset for another, the basket option that uses a portfolio of assets as the underlying, or the rainbow option that depends on the performance of a collection of assets. An example of a rainbow option is a European put where the payoffs are computed using the worst performer of ten stocks. Other contracts have features that involve other derivatives, like the compound option, which is an option to buy or sell another option. In this case you can have a call on a call, a put on a call, etc. The swing option lets the holder decide if she will use the option as a call or as a put, at a pre-specified number of time points. Typically the holder is not allowed to use all options as calls or all as puts, and some provisions are in place to ensure that a mix is actually used. The chooser option is a variant that allows the holder to decide if the option will pay off as a call or a put; this decision must be made at some point before maturity. If the option is of the European type, one can retrieve its price by using either the PDE or by simulating the expectation. When the number of underlying assets is small it is usually faster to numerically solve the PDE, but as the number of assets grows these numerical methods become increasingly slower. It is typically stated that if the number of assets is larger than four, then simulation methods become more efficient.

Path dependence

For options with early exercise features one has to make decisions on the exercise times. This decision will depend on the complete price path of the underlying asset, and not only on its value at maturity. Some other option contracts exhibit more explicit or stronger path dependence. A barrier option has one or more predefined price levels (the barriers). Reaching these barriers can either activate (knock-in barrier) or deactivate (knock-out barrier) the contract. Say, for example, that the current price of the underlying asset is 47, and consider a six-month call option with strike 55 and a knock-in barrier at 35. In order for payoffs to be realized at maturity, not only does the price have to end up higher than the 55 strike price, but the contract must have been activated beforehand, that is, the price needs to have fallen below 35 at some point before maturity. Monitoring of barrier options is not usually continuous, but takes place on some predefined time points that are typically equally spaced. The payoff of a Parisian option will depend on the time that is spent beyond the corresponding barriers, in order to smooth discontinuities. Lookback options have payoffs that depend not on the terminal value of the underlying asset, but on the maximum or the minimum value over a predefined period. Once again, in most cases this maximum or minimum is taken over a discrete set of time points. The special case where the maximum or minimum over the whole price path is considered yields the Russian option. An Asian option will have payoffs that depend on the average (arithmetic or geometric) of the price over a time period, rather than a single value. Therefore an Asian


option could be a call with payoffs that depend on the average daily prices of the underlying over the month prior to maturity. Path dependence is easily accommodated using simulation methods, as sample paths of the underlying can be produced and the payoff can be computed over each path; a minimal sketch is given below. Nevertheless, one would still set up the relevant PDEs if this were possible. Sometimes, to specify the PDE one must define some auxiliary variables, for example the time spent above the knock-out barrier in the case of a Parisian option.
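As an illustration of how naturally simulation handles path dependence, the following Matlab sketch prices an arithmetic-average Asian call by Monte Carlo under the risk-neutral GBM. All parameter values are assumptions made for illustration; this is a sketch, not a listing from the text.

% Monte Carlo pricing of an arithmetic-average Asian call (sketch).
S0 = 100; K = 100; r = 0.02; sgm = 0.20; T = 0.25;   % assumed parameters
M = 20000; n = 63; dt = T/n;                         % paths, monitoring dates
Z = randn(M, n);                                     % Gaussian increments
% simulate log-price paths under the risk neutral measure Q
S = S0*exp(cumsum((r - sgm^2/2)*dt + sgm*sqrt(dt)*Z, 2));
payoff = max(mean(S, 2) - K, 0);                     % average-price call
price  = exp(-r*T)*mean(payoff)                      % discounted expectation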

2.4 The Greeks

So far we have addressed the problem of finding the no-arbitrage price of a derivative contract, under the assumptions that underpin the Black-Scholes paradigm. We showed that in that case the market is complete, and any contract can be replicated, at least in principle. Now we will look at these replicating strategies more closely, and investigate a number of different hedging strategies. We will take two different views, illustrated in the next two settings:
1. A trader at a financial institution ABC wants to give a quote for a derivative, most probably one with exotic features. The trader will investigate the trading strategies that would, in theory at least, replicate the payoffs of this derivative. In theory, following the BS procedure she should hold H_t shares at all times, as discussed in section 2.1. In practice, as trading is not continuous and markets are not frictionless, this replication will not be exact. The quote she will produce will be the replicating costs, plus a premium for the risk she runs due to imperfect hedging, plus a fee for her time and bonus.
2. An investor XYZ is holding a portfolio of assets that depend on one or more risk factors. She wants to enter some options positions that will hedge her position against adverse moves of these factors, perhaps in the form of exotic options purchased from the financial institution above. Of course this insurance will come at a premium, and she wants to investigate the cost of different protection levels. For example, if her portfolio is well diversified, the market will be a natural factor she is exposed to. She will consider enhancing her portfolio with derivative contracts that are written on a market index.
It is important to observe that the value of the derivative that ABC has sold will obey the same partial differential equation that the portfolio of XYZ does. This follows from the absence of the arbitrage opportunities that would otherwise occur. If we assume that there is a single underlying source of risk, summarized by the asset S, then any portfolio or derivative contract with value V can be expressed as a function of S, namely V = V(t, S). This function will satisfy the Black-Scholes PDE

∂V(t,S)/∂t + r S ∂V(t,S)/∂S + (1/2) σ² S² ∂²V(t,S)/∂S² = r V(t,S)


We will use Greek letters for the derivatives involved, namely Δ = ∂V/∂S (the Delta), Γ = ∂²V/∂S² (the Gamma) and Θ = ∂V/∂t (the Theta). Then we can write the BS-PDE as

Θ + r S Δ + (1/2) σ² S² Γ = r V

More importantly, a Taylor expansion of the value function V(t, S) over a small time interval δt and a small price change δS yields

δV = (∂V/∂t) δt + (∂V/∂S) δS + (1/2) (∂²V/∂S²) (δS)² + o((δS)²)
   ≈ Θ δt + Δ δS + (1/2) Γ (δS)²

The Delta of the derivative or the portfolio will therefore represent its sensitivity with respect to changes in the underlying asset. In continuous time trading, holding Δ units of the underlying asset at all times is sufficient to replicate the path and payoffs of the portfolio value. Θ will be the time decay of this value, representing the changes as we move closer to maturity, even if the underlying asset does not move. When trading takes place in discrete time, there is going to be some misalignment between the two values, and higher order derivatives can be used to correct for that. In addition, the Γ controls the size of the hedging error when one uses the wrong volatility for pricing and/or hedging. This is an important feature, as the volatility is the only parameter in the BS PDE that is not directly observed and has to be estimated. In the BS framework there are also some parameters that are considered constant, namely the volatility σ, the risk free rate r, and the dividend yield δ. Therefore one can write the value function as V = V(t, S; σ, r, δ), and practitioners use the derivatives of the value function with respect to these parameters as a proxy of the respective sensitivities. In particular, ν = ∂V/∂σ (the Vega or Kappa2), ρ = ∂V/∂r (the Rho), and Φ = ∂V/∂δ (the Phi). With the increased popularity of exotic contracts that are particularly sensitive to some parameter values, a new set of sensitivities is sometimes used, although very rarely. These sensitivities are implemented via higher order Taylor expansions of the value function. Running out of Greek letters, these sensitivities have taken just odd-sounding names or have borrowed their names from quantum mechanics, like the Speed ∂³V/∂S³, the Charm ∂²V/∂S∂t, the Color ∂³V/∂S²∂t, the Vanna ∂²V/∂S∂σ, and the Volga ∂²V/∂σ². No matter what Greek or non-Greek letters are used, the objective is the same: to enhance the portfolio with a number of contracts that result in a position that is neutral with respect to some Greek. This turns out to be a simple exercise, as portfolios are linear combinations of assets and this carries through to their sensitivities. Say that we are planning to merge two portfolios with values V_1
2. Vega is not a Greek letter, and for that reason this sensitivity is also found in the literature as Kappa.

Listing 2.1: Black-Scholes Greeks.
and V_2 into one with value V_{1+2}, and suppose that we are interested in some sensitivity g (where g could be Δ, Γ, Θ, ν, ρ or Φ). It follows that, since g is actually a derivative,

g^{1+2} = g^1 + g^2

The simplest asset that we can use to enhance our portfolio in order to achieve some immunization is the underlying asset itself, S. Trivially, the valuation function of the asset is V(t, S; σ, r, δ) = S, and therefore the Delta of the asset is Δ^S = ∂S/∂S = 1, while all other sensitivities are equal to zero. The argument above indicates that by augmenting our portfolio with more units of the underlying we will change the Delta of the composite position. In order to immunize other sensitivities we will need to construct a position that incorporates derivative contracts, with the plain vanilla calls and puts being the most readily available candidates. For that reason we will now investigate the Greeks of these simple options and examine how we can use them to achieve Greek-neutrality. Listing 2.1 gives the Matlab function that produces the price and the major Greeks for the Black-Scholes option pricing model, for both calls and puts.
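The code of Listing 2.1 does not survive in this copy; the sketch below shows what such a function might look like. The function name blsgreeks and its exact signature are illustrative assumptions, not the author's original listing.

function [P, delta, gamma, theta, vega, rho] = blsgreeks(cp, S, K, r, sgm, tau)
% BLSGREEKS  Black-Scholes price and major Greeks (no dividends), a sketch.
%   cp = +1 for a call, -1 for a put; S = spot price; K = strike;
%   r = risk-free rate; sgm = volatility; tau = time to maturity.
N   = @(x) 0.5*(1 + erf(x/sqrt(2)));            % standard normal cdf
np  = @(x) exp(-0.5*x.^2)/sqrt(2*pi);           % standard normal pdf
dp  = (log(S./K) + (r + 0.5*sgm.^2).*tau)./(sgm.*sqrt(tau));   % d_plus
dm  = dp - sgm.*sqrt(tau);                                     % d_minus
P     = cp.*(S.*N(cp.*dp) - K.*exp(-r.*tau).*N(cp.*dm));       % price
delta = cp.*N(cp.*dp);
gamma = np(dp)./(S.*sgm.*sqrt(tau));
theta = -S.*np(dp).*sgm./(2*sqrt(tau)) - cp.*r.*K.*exp(-r.*tau).*N(cp.*dm);
vega  = S.*sqrt(tau).*np(dp);
rho   = cp.*K.*tau.*exp(-r.*tau).*N(cp.*dm);
end

For example, [P, D] = blsgreeks(+1, 100, 100, 0.02, 0.20, 0.25) would return the price and Delta of a three-month at-the-money call.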

Say that we start with a portfolio with value V and Delta Δ^V. As we noted above, we can adjust the Delta of a portfolio by adding or removing units of the underlying asset. In particular, if we add θ^S units of the asset, the Delta of the portfolio will become

Δ^{V+θS} = Δ^V + θ^S Δ^S = Δ^V + θ^S

In order to achieve Delta-neutrality, Δ^{V+θS} = 0, we will need to short Δ^V units of the underlying asset. Note that adding or removing funds from the risk-free


Figure 2.1: Behavior of a call option Delta. Part (a) gives the behavior of the Delta of options with specifications {K, r, σ} = {100, 0.02, 0.20}, and three different times to maturity: τ = 0.05 (solid), τ = 0.25 (dashed) and τ = 0.50 (dotted). Part (b) gives the behavior of the Delta as the time to maturity increases, for a contract which is at-the-money (S = 100, solid), in-the-money (S = 95, dashed), and out-of-the-money (S = 105, dotted).

bank account does not have any impact on the Greeks. We can therefore adjust the bank balance with the proceeds of this transaction. A position that is Delta-neutral will not change in value for small asset price changes (but it will change with the passage of time, as Θ^V dictates). Of course, after a small change in the asset price the value of Δ^{V+θS} will change, as ∂Δ^{V+θS}/∂S = Γ^V. In order to maintain a Delta-neutral portfolio, one has to rebalance it in a continuous fashion, employing a dynamic Delta hedging strategy. In the BS framework European calls (and puts) are priced in closed form as in equation (2.4). Taking the derivative with respect to the price S yields the Delta for calls and puts

Calls: Δ^C = N(d_+)        Puts: Δ^P = N(d_+) − 1

The values of Delta for a European call option, across different spot prices and different maturities, are displayed in figure 2.1. The Delta of deep-in-the-money options is equal to one, as exercise appears very likely and the seller of the option will need to hold one unit of the asset in order to deliver. For options that are deep-out-of-the-money exercise is unlikely and the seller of the option will not need to carry the asset, making the Delta equal to zero. As the time to maturity increases, the Deltas of in- and out-of-the-money contracts converge towards the at-the-money Delta.

Dynamic Delta hedging

A seller of an option that maintains a Delta-neutral position at all times is replicating the contract and should end up with a zero bank balance, no matter what

Listing 2.2: Dynamic Delta hedging.
the path of the underlying asset is. Of course, in practice one cannot rebalance continuously (even in the ideal case where markets are frictionless). Figure 2.2 illustrates dynamic Delta hedging in a simulated BS world, while in 2.3 the actual strategy is presented step-by-step. Initially we sell one call option with strike price K = 100 (at-the-money) and four months to maturity for $2.25. In order to hedge it we need to purchase Δ = 0.55 shares, and we will need to borrow $52.72 to carry out this transaction. As the price of the underlying asset drops, the Delta of the call follows suit. We are therefore selling our holdings gradually, recovering some funds for our bank balance. Eventually the price recovers and we build up the asset holdings once more. In discrete time intervals the option price changes are not matched exactly by changes in our portfolio value. In particular, these discrepancies are larger for large moves of the underlying. Overall the hedging portfolio will mimic


the process of the call option to a large extent, but not exactly. In this simulation run we are left with a profit of $0.12. Increasing the frequency of trades will decrease the volatility of this hedging error, and of course in the limit the replicating strategy is exact. If from one transaction to the next the Delta does not move a lot, we would expect the impact of discrete hedging to be small. On the other hand, the impact will be most severe in the areas where the Delta itself changes rapidly. The second order sensitivity with respect to the price, the Gamma, in fact summarizes these effects. A sketch of such a hedging simulation is given below.
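The following Matlab sketch mimics the experiment of figure 2.2: a call is sold and Delta hedged at a discrete set of dates in a simulated BS world. The parameter values and helper functions are assumptions for illustration; the original listing is not reproduced here.

% Dynamic Delta hedging of a short call in a simulated BS world (sketch).
S0 = 100; K = 100; r = 0.02; mu = 0.08; sgm = 0.20; T = 0.25; M = 25;
N  = @(x) 0.5*(1 + erf(x/sqrt(2)));                    % normal cdf
bsDelta = @(S,tau) N((log(S/K) + (r + sgm^2/2)*tau)/(sgm*sqrt(tau)));
bsCall  = @(S,tau) S*bsDelta(S,tau) - K*exp(-r*tau)* ...
          N((log(S/K) + (r - sgm^2/2)*tau)/(sgm*sqrt(tau)));
dt = T/M; S = S0;
h  = bsDelta(S, T);                                    % initial Delta
B  = bsCall(S, T) - h*S;                               % bank after sale and hedge
for i = 1:M-1
    S = S*exp((mu - sgm^2/2)*dt + sgm*sqrt(dt)*randn); % asset moves
    hNew = bsDelta(S, T - i*dt);                       % rebalance the Delta
    B = B*exp(r*dt) - (hNew - h)*S;                    % buy or sell shares
    h = hNew;
end
S = S*exp((mu - sgm^2/2)*dt + sgm*sqrt(dt)*randn);     % final step
PnL = B*exp(r*dt) + h*S - max(S - K, 0)                % hedging error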

Gamma
The Gamma of a portfolio is defined as the second order derivative of the portfolio value with respect to the price, or equivalently as the first order sensitivity of the portfolio Delta with respect to the price. As we already mentioned above, we expect the Delta of a portfolio to change across time, as the price of the asset changes. Gamma will give us a quantitative insight into the magnitude of these changes.3 We have already analyzed how a portfolio can be made Delta-neutral, by taking a position in the underlying asset. In order to achieve Gamma-neutrality, the underlying asset is not sufficient. This is due to the fact that

Γ^S = ∂²S/∂S² = 0

This indicates that we need instruments that are nonlinear with respect to the underlying asset price, in order to achieve Gamma-neutrality. Options are perfect candidates for this job. On the other hand, the fact that Γ^S = 0 has some benefits, as it implies that after we have made the portfolio Gamma-neutral we can turn to achieving Delta-neutrality by taking a position in the underlying asset. The zero value of Gamma will not be affected by this position. We call the strategy where we are neutral with respect to both Delta and Gamma simultaneously dynamic Delta-Gamma hedging. Say that we hold a portfolio with value V and given Delta and Gamma, Δ^V and Γ^V respectively. We follow a two step procedure, where we first achieve Gamma-neutrality using a liquid contract with known sensitivities. For instance, we can employ a European call option with price C and known Greeks Δ^C and Γ^C. In the second step we will use the underlying asset, which has price S, to achieve Delta-neutrality (recall that Δ^S = 1 and Γ^S = 0). The resulting portfolio will be Delta-Gamma neutral.
3. Delta will also change as time passes, even if the asset price remains the same. The Charm ∂²V/∂S∂t would quantify this impact. Generally speaking, the impact of asset price changes captured by the Gamma is more significant than the Delta changes captured by the Charm. This happens because the magnitude of the squared Brownian increment (captured by Gamma) is of order o(δt), while the Charm captures effects of order o(δt^{3/2}).


Figure 2.2: Dynamic Delta hedging of a call option. At time zero we sell a European call with strike price K = 100, and we Delta hedge it 25 times over its life. The underlying asset process at the hedging times is given in (a), and the number of shares that we need to hold is given in (b). Subfigure (c) gives the corresponding call price and (d) our bank balance. As the option expires out-of-the-money we are not asked to deliver at maturity, and the option expires worthless. In (e) changes in the option price and changes in the hedging portfolio are compared. Subfigure (f) illustrates the replication error between the hedging portfolio (solid) and the option (dashed).


Figure 2.3: Sample output of the dynamic Delta hedging procedure. A call option is sold at time t = 0 and is subsequently Delta hedged to maturity.


Figure 2.4: Behavior of a call option Gamma. Part (a) gives the behavior of the Gamma of options with specifications {K, r, σ} = {100, 0.02, 0.20}, and three different times to maturity: τ = 0.05 (solid), τ = 0.25 (dashed) and τ = 0.50 (dotted). Part (b) gives the behavior of the Gamma as the time to maturity increases, for a contract which is at-the-money (S = 100, solid), in-the-money (S = 95, dashed), and out-of-the-money (S = 105, dotted).

We want to buy θ^C units of the option. This makes the value of our composite position equal to V + θ^C C, and most importantly it will have a Gamma equal to Γ^{V+θC} = Γ^V + θ^C Γ^C. Therefore, to achieve Gamma-neutrality we need to hold

θ^C = −Γ^V / Γ^C

units of the option. The Delta of the new portfolio is of course Δ^{V+θC} = Δ^V + θ^C Δ^C. To make the position Delta-neutral we want to also hold θ^S = −Δ^{V+θC} shares of the underlying asset; a small numerical sketch is given below. For European call and put options the value of Gamma is given by

Γ^C = N′(d_+) / (S σ √(T − t))
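As a quick numerical illustration of this two-step procedure, the sketch below computes the option and share holdings from hypothetical portfolio and option Greeks (all numbers are invented for the example):

% Two-step Delta-Gamma neutralization with hypothetical Greeks (sketch).
deltaV = 0.40;  gammaV = 0.030;     % portfolio Delta and Gamma (assumed)
deltaC = 0.55;  gammaC = 0.060;     % Greeks of the hedging call (assumed)
posC = -gammaV/gammaC;              % options held: Gamma-neutrality
posS = -(deltaV + posC*deltaC);     % shares held: Delta-neutrality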

Graphically, figure 2.4 gives Gamma across different moneyness and maturity levels. Apparently the Gamma is most significant for contracts that are at-the-money. In particular, the Gamma of at-the-money options goes to infinity as maturity approaches. This is due to the discontinuity of the derivative of the payoff function.

Dynamic Delta-Gamma hedging

As Gamma is the sensitivity of the Delta with respect to the underlying price S, we can use a Delta-Gamma neutral strategy to construct a replicating portfolio which is second order accurate in S. When we Delta hedge over a discrete time interval we introduce replication errors, since the Delta of our position will not remain equal to zero as the time changes over this rebalancing interval.


Figure 2.5: Dynamic Delta-Gamma hedging of a call option. At time zero we sell a European call with strike price K = 100, and we Delta-Gamma hedge it 25 times over its life. To do so we use the underlying stock and a call option which at all points has a strike price that is 105% of the current spot price. The underlying asset process is the same as in figure 2.2. Subfigure (a) gives the number of options (solid) and shares (dashed) that we need to hold to maintain Delta-Gamma neutrality. The dotted line gives the number of shares that Delta hedge (as in figure 2.2). Subfigure (b) gives the bank balance if we Delta-Gamma hedge (solid) or just Delta hedge (dotted). In (c) changes in the option price and changes in the hedging portfolio are compared. Crosses give the Delta-Gamma hedging deviations, while circles correspond to pure Delta hedging. Finally, subfigure (d) illustrates the replication error between the hedging portfolio (solid) and the option (dashed), which are virtually indistinguishable. The dotted line gives the process of the Delta hedging portfolio.


Figure 2.6: Comparison of the replication errors for Delta and Delta-Gamma neutral positions. The histograms are based on 10,000 simulations of the underlying asset price. For each run a call option with strike price K = 100 was Delta or Delta-Gamma hedged, as in figures 2.2 and 2.5.
Panels: (a) Delta hedging; (b) Delta-Gamma hedging.

These changes of Delta will be proportional to the derivative Γ = ∂Δ/∂S = ∂²V/∂S². Therefore, if we construct a position that has Δ = Γ = 0 we form a portfolio that will maintain a position which is (approximately) neutral for larger price changes, and since the price is diffusive, for longer periods of time.4 Of course, as we mentioned above, we cannot implement such a position using the underlying asset alone, and we will need an instrument that exhibits non-zero Gamma. Typically we use liquid call and put options that are around-the-money to do so. In figure 2.5 we repeat the experiment of figure 2.2, using a Delta-Gamma neutral strategy this time. We sell one call option with strike K = 100 and construct a Delta-Gamma hedge that uses, apart from the underlying asset, a call option. We could use an option with a constant strike price throughout the time to maturity, but there is always the risk that as the underlying price fluctuates this option might become deep-in- or deep-out-of-the-money. Such an option will have Γ^C ≈ 0 (see figure 2.4), and our position in options θ^C = −Γ^V/Γ^C would explode. To get around this problem, at each point in time we use a call option that has a strike price equal to 105% of the value of the underlying asset at this point, K = 1.05 S. This essentially means that when we rebalance we sell the options we might hold and invest in a brand new contract.5 Figure 2.5 gives the processes for this experiment. In subfigure (c) it is easy to see that the Delta-Gamma changes in the portfolio follow the changes of the hedged instrument a lot more closely than the portfolio of figure 2.2, which was only Delta neutral. This improvement in replication accuracy is also illustrated in subfigure (d), where the two processes are virtually indistinguishable.
4. There is also an error associated with Delta changes as time passes, proportional to the Charm, but these effects are typically small and deterministic.
5. Of course, if transaction costs were present this would not be the optimal strategy.


We can also repeat the above experiments to assess the average performance of simple Delta and Delta-Gamma hedging. Here we create 10,000 simulations of the underlying asset and option prices, and implement the two hedging strategies. The table below gives the summary statistics for the hedging errors, when we hedge 10, 25 or 50 times during the three-month interval to expiration. Figure 2.6 presents the corresponding histograms for the two hedging strategies, when we rebalance 25 times.

Hedges:        10              25              50
Strategy    Δ      Δ&Γ      Δ      Δ&Γ      Δ      Δ&Γ
Mean       −0.01   −0.12   +0.00   −0.02    0.00   −0.01
St Dev      0.52    0.35    0.33    0.17    0.24    0.11
Min        −3.34   −3.76   −1.77   −2.11   −1.50   −1.05
Max        +1.76   +1.45   +1.30   +2.22   +1.00   +2.28
Skew       −0.47   −3.17   −0.24   −2.63   −0.27   +0.63
Kurt        4.60    19.1    4.30    28.9    4.77    53.3

In comparison, the Delta-Gamma neutral strategy gives hedging errors that are a lot more concentrated around zero, but with significant outliers. This is illustrated in figure 2.6 and the table above. In particular, for all hedging frequencies Delta-Gamma hedging produces half the standard deviation of the errors. On the other hand, for some paths of the underlying asset, implementing Delta-Gamma hedging produces outliers. This is also confirmed by the table, where the minimum and maximum values and the kurtosis indicate extremely fat tails. Of course, this behavior is dependent on the exact implementation of the hedging strategies, that is to say which instruments are used and how the rebalancing points are selected.

Gamma and uncertain volatility

The BS-PDE (2.3) depends on two parameters: the risk free rate r, which is a quantity that is directly observable, and the volatility σ, which is not. Typically, an options writer will sell contracts based on a conservative estimate of the volatility, and subsequently hedge the position. It is therefore natural to ask what the implications will be if we hedge our position using a wrong value for σ. It turns out that Gamma has another important role to play, as it determines the impact of this misspecification. The approach that we follow here is outlined in Carr (2002) and Gatheral (1997, 2006), among others. To put things concretely, say that the true process for the underlying asset is given by the SDE

dS_t = μ S_t dt + σ^A S_t dB_t

where the superscript A denotes the actual volatility. We take the position of the writer of a European-style option that offers a payoff Π(S_T) at time T, and consider its valuation as a function not only of (t, S), but also of the volatility σ. For that reason we denote the value of this derivative with V(t, S; σ). We therefore consider a family of pricing functions for different values of σ, where all satisfy the BS-PDE


∂V(t,S;σ)/∂t + r S ∂V(t,S;σ)/∂S + (1/2) σ² S² ∂²V(t,S;σ)/∂S² = r V(t,S;σ)

with the appropriate boundary condition V(T, S; σ) = Π(S). Let us assume that we are asked to sell one such contract, and we quote a price that solves the BS-PDE for a volatility parameter σ^I, which we call the implied volatility. We can write our quote as V_0^I = V(0, S_0; σ^I). After selling the contract we proceed with the Delta hedging approach. In particular, we implement a self-financing trading strategy H = {(H_t^S, H_t^F) : t ≥ 0}, where we hold H_t^S = Δ_t^H = ∂V(t, S_t; σ^H)/∂S units of the underlying asset at each time point, and keep a risk-free bank balance of H_t^F. Note that when we compute the Delta of the contract we use a third volatility σ^H, that is to say the hedging volatility. The initial bank account balance will be

H_0^F = V(0, S_0; σ^I) − (∂V(0, S_0; σ^H)/∂S) S_0                         (2.6)

and the bank account dynamics will be affected at each time by the amount needed to purchase (or sell) stocks to maintain Delta-neutrality, and by interest payments. In particular, at time t + dt the Delta has changed to H_t^S + dH_t^S, indicating that we will need to purchase dH_t^S shares. The price of each share is of course S_t + dS_t when we make this purchase. Also, over this period we will gain an amount r H_t^F dt due to the interest on the bank balance. Putting these two together we can write the dynamics of the bank account balance as6

dH_t^F = −dH_t^S (S_t + dS_t) + r H_t^F dt = −d(H_t^S S_t) + H_t^S dS_t + r H_t^F dt

The solution of the above SDE can be written as

exp(−rT) H_T^F − H_0^F = −exp(−rT) H_T^S S_T + H_0^S S_0 + ∫_0^T exp(−rt) (H_t^S dS_t − r H_t^S S_t dt)        (2.7)

Itô's formula will give us the dynamics of the quantity V_t^H = V(t, S_t; σ^H), the pricing function that gives the value of Delta that we wish to maintain. In particular,

dV_t^H = [Θ_t^H + (1/2) (σ^A)² S_t² Γ_t^H] dt + Δ_t^H dS_t

Since V(t, S; σ^H) satisfies the BS-PDE, Θ_t^H + r S_t Δ_t^H + (1/2)(σ^H)² S_t² Γ_t^H = r V_t^H, we can write
6. The second equality is due to the fact that d(X_t Y_t) = X_t dY_t + Y_t dX_t + dX_t dY_t, while the solution that follows is largely based on d(exp(−rt) X_t) = exp(−rt) dX_t − r exp(−rt) X_t dt.


dV_t^H = [r V_t^H + (1/2) ((σ^A)² − (σ^H)²) S_t² Γ_t^H] dt + Δ_t^H dS_t − r Δ_t^H S_t dt


We can solve the above expression for Δ_t^H dS_t − r Δ_t^H S_t dt and substitute into the expression for the bank balance dynamics (2.7). This will produce

exp(−rT) H_T^F − H_0^F = −exp(−rT) Δ_T^H S_T + Δ_0^H S_0 + ∫_0^T exp(−rt) (dV_t^H − r V_t^H dt) − (1/2) ((σ^A)² − (σ^H)²) ∫_0^T exp(−rt) S_t² Γ_t^H dt

where the first integral telescopes to exp(−rT) V_T^H − V_0^H.

Now we can use (2.6) and the facts that V_T^H = Π(S_T) and H_T^S = Δ_T^H to write the final bank balance in a parsimonious way as

H_T^F = exp(rT) [V_0^I − V_0^H] + Π(S_T) − S_T Δ_T^H − (1/2) ((σ^A)² − (σ^H)²) ∫_0^T exp(r(T − t)) S_t² Γ_t^H dt

Also, at time T we are holding Δ_T^H shares that we will sell, and we must also deliver the payoff of the derivative contract, Π(S_T). Overall, our profit (or loss) from the Delta hedging strategy will be

P&L = exp(rT) [V(0, S_0; σ^I) − V(0, S_0; σ^H)] + (1/2) ((σ^H)² − (σ^A)²) ∫_0^T exp(r(T − t)) S_t² Γ_t^H dt          (2.8)

Equation (2.8) is very interesting for a number of reasons. If we happen to know (or are able to estimate fairly accurately) the actual volatility σ^A that will prevail over the life of the contract, then by Delta hedging we can lock in as profit the difference between the quote V(0, S_0; σ^I) and the fair value V(0, S_0; σ^A), irrespective of the path of the underlying asset. To do so we should use the actual volatility to compute the Delta of our strategy, V_t^H = V_t^A for all 0 ≤ t ≤ T. This happens of course because in this case our dynamically rebalanced portfolio replicates the true payoff Π(S_T). It is likely though that we will not know σ^A, and in this case we might choose to hedge using the implied volatility σ^I. Then, the first part of the P&L vanishes, and the final profits will depend on the path of the underlying asset, and will therefore be uncertain. In fact, the sign of the profit will depend on the sign of Γ_t^H. For standard calls and puts Γ ≥ 0, which implies that we will always realize profits if, once again, the implied (and here also hedging) volatility is greater than the realized one, σ^I = σ^H > σ^A. The Gamma for standard calls and puts resembles the underlying probability density, and has its peak around-the-money. This means that the realized profits will be largest if the underlying
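A minimal sketch of the first implication: with the parameter values of the example that follows (assumed here), the profit locked in by hedging at the actual volatility is just the difference of the two BS put values, grown at the risk-free rate.

% Profit locked in by hedging at the actual volatility (eq. 2.8, sketch).
S0 = 100; K = 100; r = 0.02; T = 0.25;                 % assumed parameters
N   = @(x) 0.5*(1 + erf(x/sqrt(2)));                   % normal cdf
dp  = @(sgm) (log(S0/K) + (r + sgm^2/2)*T)/(sgm*sqrt(T));
put = @(sgm) K*exp(-r*T)*N(-(dp(sgm) - sgm*sqrt(T))) - S0*N(-dp(sgm));
VI = put(0.20);                                        % quote at sigma_I = 20%
VA = put(0.15);                                        % fair value, sigma_A = 15%
lockedIn = exp(r*T)*(VI - VA)                          % approximately $0.99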


Figure 2.7: Delta hedging with uncertain volatility. An at-the-money put is sold, and subsequently Delta-hedged using the implied volatility. Different trajectories of the underlying asset will generate different profits, with the highest when the asset does not trend.
Panels: (a) Asset trajectories; (b) Shares long.

price does not trend upwards or downwards (as this would render the option in- or out-of-the-money). Figure 2.7 gives an example. We are asked to quote an at-the-money European put (S_0 = K = $100) with maturity three months.7 The actual volatility over the life of the option is σ^A = 15%, which indicates that the fair value of this contract is V_0^A = $2.74. We agree to sell this option at V_0^I = $3.73, which implies a volatility σ^I = 20%. Essentially the option is overpriced by $0.99. The figure illustrates three possible trajectories, where the underlying asset moves up, down or sideways over the life of the option. We might know the future actual volatility, in which case we can select σ^H = 15%. If we do not, we can hedge at the implied volatility σ^H = 20%. The following table gives the profits realized along each sample path, with 5,000 rebalances over the three month period (about 60 per day). One can observe that in the case where the asset does not trend, using σ^I outperforms σ^A. Also, note that when the asset moves sideways, even such a frequent rehedging strategy is not identical to the continuous one.

                         P&L when asset moves
                         up        down      sideways
σ^H = σ^A = 15%         +$0.99    +$0.99    +$0.92
σ^H = σ^I = 20%         +$0.51    +$0.57    +$1.44

Vega
We have already highlighted the dependence of derivative contracts on the volatility of the underlying asset. The BS methodology makes the assumption
7. The drift of the underlying asset is μ = 8%, and the risk free rate of interest is r = 2%.


that the volatility is constant across time, but practitioners routinely compute the sensitivity of their portfolios with respect to the underlying volatility, and in some cases try to hedge against volatility changes. Of course, in order to be precise one should start with a model that specifies a process for the volatility of the asset, and not the BS framework where the volatility is constant. Then, sensitivities with respect to the spot volatility are in principle computed in a straightforward manner, exactly as we compute the BS Delta. In practice, practitioners use the Black-Scholes Vega instead. It might appear counterintuitive to use the derivative with respect to a constant, but it offers a good (first order) approximation. Unless the rebalancing intervals are too long, or the volatility behaves in an erratic or discontinuous way, the Vega is fairly robust and easy to compute and use. We follow the last subsection and consider the value of the portfolio as a function of the volatility (in addition to (t, S)), V = V(t, S; σ). Then, applying a Taylor expansion yields

δV = Θ δt + Δ δS + (1/2) Γ (δS)² + ν δσ + o((δS)²)

The underlying asset price does not depend explicitly on the volatility, rendering ν^S = 0. Once more we need to rely on nonlinear contracts, such as options, to make a portfolio Vega-neutral. If we want to achieve a joint Gamma-Vega-neutral hedge, we will have to use two different derivative securities. Say that we use two options with prices C_1 and C_2, with known Deltas (Δ^{C1} and Δ^{C2}), Gammas (Γ^{C1} and Γ^{C2}) and Vegas (ν^{C1} and ν^{C2}). We also use the underlying asset to achieve Delta-neutrality (of course Δ^S = 1 and Γ^S = ν^S = 0). We want to buy θ^{C1} and θ^{C2} units of the two derivative securities to achieve Gamma-Vega neutrality. We are therefore faced with the system
V +C1 +C2 V C1 + C2 C2 = 0 C1 C1 + C2 C2 = 0 C1

= +

This will identify the holdings of the two derivatives


C1

V C2 C2 V C1 C2 C2 C1

C2

C1 V V C1 C1 C2 C2 C1

After that we can adjust our holdings of the underlying asset to make our position Delta-neutral as well; a small numerical sketch is given below. For a European call or put option the BS value of Vega is given by

ν^C = S √(T − t) N′(d_+)

Graphically, the Vega across different moneyness and maturity levels is given in figure 2.8. It is straightforward to observe that Vega, like Gamma, is more pronounced for at-the-money options. Unlike Gamma though, the Vega drops as we move closer to maturity. Thus, to achieve Vega-neutrality one should incorporate long dated at-the-money options in her portfolio.
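Numerically, the two holdings are just the solution of a 2-by-2 linear system, which Matlab's backslash operator handles directly. The Greek values below are invented for illustration:

% Gamma-Vega neutrality: solve for option holdings, then Delta-hedge (sketch).
gV = 0.030;  vV = 21.0;             % portfolio Gamma and Vega (assumed)
A  = [0.055  0.020;                 % Gammas of C1 and C2 (assumed)
      18.0   26.0 ];                % Vegas  of C1 and C2 (assumed)
th = A \ [-gV; -vV];                % theta_C1 and theta_C2
dV = 0.40;  dC = [0.55; 0.35];      % Deltas of portfolio and options (assumed)
nS = -(dV + th.'*dC)                % shares for overall Delta-neutrality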


Figure 2.8: Behavior of a call option Vega. Part (a) gives the behavior of the Vega of options with specifications {K, r, σ} = {100, 0.02, 0.20}, and three different times to maturity: τ = 0.05 (solid), τ = 0.25 (dashed) and τ = 0.50 (dotted). Part (b) gives the behavior of the Vega as the time to maturity increases, for a contract which is at-the-money (S = 100, solid), in-the-money (S = 95, dashed), and out-of-the-money (S = 105, dotted).

Dividends
In the above analysis we have ignored the impact of dividends, just to keep things simple. When a stock pays a continuous dividend at a constant rate δ, the process of the underlying asset under Q is given by the GBM

dS_t = (r − δ) S_t dt + σ S_t dB_t^Q

Derivatives will be given once again as expectations, P_0 = exp(−rT) E^Q Π(S_T), and their pricing function P_t = f(t, S_t) will satisfy the PDE

∂f(t,S)/∂t + (r − δ) S ∂f(t,S)/∂S + (1/2) σ² S² ∂²f(t,S)/∂S² = r f(t,S)

The prices and the Greeks can be computed easily following the same steps. In particular, we can summarize the most useful Greeks in the following catalogue, where ω = +1 for calls and ω = −1 for puts, and

d_± = [log(S/K) + (r − δ ± σ²/2)(T − t)] / (σ √(T − t))

Option price:   P = ω [S e^{−δ(T−t)} N(ω d_+) − K e^{−r(T−t)} N(ω d_−)]

Delta:          Δ = ∂P/∂S = ω e^{−δ(T−t)} N(ω d_+)

Theta:          Θ = ∂P/∂t = −S e^{−δ(T−t)} N′(d_+) σ / (2 √(T−t)) + ω δ S e^{−δ(T−t)} N(ω d_+) − ω r K e^{−r(T−t)} N(ω d_−)

Gamma:          Γ = ∂²P/∂S² = e^{−δ(T−t)} N′(d_+) / (S σ √(T−t))

Vega:           ν = ∂P/∂σ = S e^{−δ(T−t)} √(T−t) N′(d_+)

Rho:            ρ = ∂P/∂r = ω K (T−t) e^{−r(T−t)} N(ω d_−)

Dividend-rho:   Φ = ∂P/∂δ = −ω S (T−t) e^{−δ(T−t)} N(ω d_+)

Similar expressions can be derived for foreign exchange rates. In particular, the exchange rate (denominated in the domestic currency) under risk neutrality is assumed to follow the GBM

dS_t = (r − r^f) S_t dt + σ S_t dB_t^Q

In essence, holding the foreign currency will depreciate at the risk free rate differential. Therefore it is straightforward to confirm that the formulas for the option prices and their Greeks above will hold, with the dividend yield δ replaced by the foreign risk free rate r^f.

2.5 Implied volatility
As we have pointed out a few times already, the parameters of the BS formula are all considered to be F_0-measurable by assumption. In reality though, although the current price, the strike price, the maturity and the interest rate are observed at time t = 0, the volatility of the asset price is not. Since an array of call and put options is also available, with prices that are observed at time zero, one will naturally attempt to invert the BS formula numerically and construct a series of implied volatilities {σ^I(T, K)} across different maturities and strike prices. Bajeux and Rochet (1996) show that there is a one-to-one relationship between implied volatilities and option prices. As pointed out in Dupire (1994), these implied volatilities indicate how the underlying asset should vibrate in the BS world, in order for the contract to be priced correctly. Following our discussion in the previous section, these volatilities would be natural candidates to compute the Delta that will hedge the option.


If the assumptions underlying the BS formula were correct, then all options would be priced according to the BS formula, and therefore we should extract the same implied volatility from all options (that is, for any maturity and strike price combination). It turns out though, that these implied volatilities are not constant, and in fact exhibit some very clear and persistent patterns. We will see that these patterns can be attributed to actual volatilities that are time varying, to discontinuities in the asset price process, to hedging demands for specific option contracts, and to liquidity premiums for some specific groups of options. The variation of volatility is a well documented feature of asset returns, and models that incorporate stochastic volatilities, first introduced in Hull and White (1987, HW), give the theoretical background to interpret the implied volatility as the expectation of the average future (realized) volatility over the life of the option. Say that we are considering the price of a European call, that is Π(S_T) = (S_T − K)+. We assume that the future volatility is stochastic but independent of the stock price, and also has a zero price of risk.8 The main idea of HW is to condition on the average variance over the life of the option, namely the random variable

σ̄² = (1/T) ∫_0^T σ_t² dt

Then, using the tower property we can write the option price as

P_0 = exp(−rT) E^Q Π(S_T) = exp(−rT) E^Q [ E^Q [Π(S_T) | σ̄] ]

where the outermost expectation is with respect to all possible realizations of σ̄. It turns out that the conditional option prices are equal to their Black-Scholes counterparts, with σ replaced by σ̄. Thus, we can write the HW prices as a weighted sum of BS prices

f^{HW}(t, S; K, …) = E^Q f^{BS}(t, S; K, σ̄)

In the above expression the dots represent parameters that govern the volatility dynamics, and the f are the corresponding pricing functions. If we now consider an at-the-money option, where the strike is set at the forward price K_{ATM} = S exp(rT), then the HW formula will give

f^{HW}(t, S; K_{ATM}, …) = E^Q S [2 N(σ̄ √T / 2) − 1]

On the other hand, if P_{ATM} is the observed price, the ATM implied volatility σ^{ATM} will solve

P_{ATM} = f^{BS}(t, S; K_{ATM}, σ^{ATM}) = S [2 N(σ^{ATM} √T / 2) − 1]

Assuming that the HW model is the correct model, P_{ATM} = f^{HW}(t, S; K_{ATM}, …), and we have the relationship

8. Intuitively this means that the volatility risk is diversifiable, or that investors are indifferent to the level of volatility risk. We will come back to these issues later.


N(σ^{ATM} √T / 2) = E^Q N(σ̄ √T / 2)


Assuming short maturities, so that σ√T is small, the cumulative normal distribution function is approximately linear around zero, which yields the approximate relationship

σ^{ATM} ≈ E^Q √( (1/T) ∫_0^T σ_t² dt )

Thus, the implied ATM volatility is approximately equal to the expected average volatility over the life of the option; the sketch below illustrates this by simulation.
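A minimal simulation check of this approximation, with an arbitrary toy distribution for the average volatility (all numbers are assumptions):

% ATM implied vol vs expected average vol (Hull-White mixing), a sketch.
T = 0.25; S0 = 100;
N    = @(x) 0.5*(1 + erf(x/sqrt(2)));           % normal cdf
sbar = abs(0.20 + 0.05*randn(1e5,1));           % toy draws of average vol
price  = S0*mean(2*N(sbar*sqrt(T)/2) - 1);      % HW mixture ATM-forward price
sigImp = 2*sqrt(2)*erfinv(price/S0)/sqrt(T);    % invert the BS ATM formula
approx = mean(sbar);                            % should be close to sigImp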

2.6 Stylized facts
The log-normality of the asset price distribution, a result of the GBM that underlies the BS derivation, is not a satisfactory assumption. In fact, it has been documented that equity prices do not follow such a distribution since as far back as the PhD dissertation of Bachelier (1900). Nonetheless, the BS methodology results in a formula that is intuitive and very easy to implement in practice, and therefore it is widely used for both academic and practical purposes. In fact, options in exchanges are actually quoted in terms of their implied volatilities rather than in dollar or sterling terms. In addition, the fact that the volatility of the underlying asset and the risk free rate of return are assumed constant simplifies the exposition, by forcing the markets to be complete. Testing the BS model gives rise to many theoretical and practical problems. If we use actual option prices to carry out such tests, we cannot distinguish between potential mis-specifications of the pricing formula and market inefficiencies. The joint hypothesis that the correct model is used and that the markets are efficient is necessarily tested (for a discussion see for example Hull, 2003). The fact that at any time a parameter of the BS model is actually unobserved further complicates things, as it is not clear which one to use. A third problem arises from the possible asynchronicity of the equity, bond and option markets. If trading does not take place simultaneously, or the markets are very thin, it is questionable if the assumption of completeness is satisfactory. Not having data on synchronous transactions in liquid markets distorts the results. The patterns of the implied volatilities summarize many of the failures of the BS model, and researchers have been looking at them closely since good quality data became available. An early analysis is the seminal paper of Rubinstein (1985), where different patterns of implied volatilities emerge, depending largely on the particular period that was used, with predominantly a U-shaped pattern with the lowest point at-the-money. In the more recent work of Rubinstein (1994) and Jackwerth and Rubinstein (1996) implied volatilities tend to be higher for out-of-the-money puts and lower for out-of-the-money calls. This emerging pattern of implied volatilities with respect to different measures of moneyness is


often encountered in the literature as the implied volatility smile, skew or smirk. If we create a three-dimensional view of the implied volatility with respect to moneyness and time to maturity, we construct the implied volatility surface. Figure XX presents such a surface based on options data on the FTSE100. These implied volatility patterns can be attributed to some of the best documented stylized facts of the distribution and dynamics of asset returns (two excellent surveys are Bollerslev, Engle, and Nelson, 1994, and Ghysels, Harvey, and Renault, 1996). Below we shall give a small overview of these features and discuss how they are reflected in the implied volatility surface.

Leptokurtosis

It has long been observed that asset returns follow a distribution which is far from normal, in particular one that exhibits a substantial degree of excess kurtosis or fat tails (Fama, 1965). These fat tails seem to be more pronounced for short investment horizons (ie intraday, daily or weekly returns), and they tend to gradually die out for longer ones (ie monthly, quarterly or annual returns). A distribution with high kurtosis is consistent with the presence of an implied volatility smile, as it attaches higher probabilities to extreme events, compared to the normal distribution. If the at-the-money implied volatility is used, then the BS formula will underprice out-of-the-money puts and calls. A higher implied volatility is needed for the BS formula to match the market prices. Merton (1976), among others, notes that a mixture of normal distributions can exhibit fat tails relative to the normal, and therefore models that result in such distributions can be used in order to improve on the BS option pricing results. Most (if not all) modern option pricing models to some extent do exactly that: express calendar returns as a mixture of normal distributions.

Skewness

Apart from exhibiting fat tails, some asset return series also exhibit significant skewness. For stocks and indices this skewness is typically negative, highlighting the fact that the speed at which stock prices drop is higher than the speed at which they grow (although they tend to grow for longer periods than they decline). For currencies the skew is not generally one sided, swinging from positive to negative and back over periods of time. The asymmetries of the implied volatility skew can be attributed to the skewness of the underlying asset returns. If prices are more likely to drop by a large amount than rise, one would expect out-of-the-money puts to be relatively more expensive than out-of-the-money calls. Black (1972) suggests that volatilities and asset returns are negatively correlated, naming this phenomenon the leverage effect or Fischer Black effect. Falling stock prices imply an increased leverage on firms, which is presumed by agents to entail more uncertainty, and therefore volatility. This asymmetry can generate skewed returns, but is not always sufficient to explain the very steep implied skews we observe in (especially index) options markets. A second component that is


needed is accommodating market crashes arriving as jumps in the asset price process, or even just fears of such crashes (the crash-o-phobia of Bates, 1998).

Volatility features

The fact that volatility is not constant is well documented, and allowing it to be time varying is perhaps the simplest way to construct models that mix normal distributions. Empirically, it appears that volatility in the market comes in cycles, where low volatility periods are followed by high volatility episodes. This feature is known in the literature as volatility clustering. The Arch, Garch and Egarch families,9 as well as models with stochastic volatility, have been used in the literature to model the time variation of volatility and to capture volatility clustering. The survey of Ghysels et al. (1996) gives a good overview of volatility models from a modeling perspective. Local volatility models take a completely different approach, as they focus solely on the pricing and hedging of derivatives, preferring to keep volatility time-varying but deterministic rather than stochastic (Dupire, 1994). We will discuss these extensions in chapter 6. The variation of volatility can be linked to the arrival of information and to high trading volume (Mandelbrot and Taylor, 1967; Karpoff, 1987, among others). One can argue that trading does not take place in a uniform fashion across time: new information will result in a more dense trading pattern with higher trading volumes, which in turn results in higher volatilities.

Price discontinuities

Even allowing the volatility to be time varying cannot accommodate very sharp changes in the stock price, typically crashes, which although very rare events, have a significant impact on the behavior of the market. On October 19th, 1987, the S&P500 index lost about 20% of its value within a day and without any significant warning. If the market were to follow the Black-Scholes assumption of a GBM with constant volatility, such an event should happen once in 10^87 years.10 Even if we allow the volatility to vary wildly, a model with continuous sample paths that will exhibit such behavior is not plausible. Starting with Merton (1976), researchers have been augmenting the diffusive part of the price process with

9. Arch here stands for autoregressive conditional heteroscedasticity (Engle, 1982), Garch stands for generalized Arch (Bollerslev, 1986), and Egarch for exponential Garch (Nelson, 1991).
10. This is a very long time. For a comparison, the age of our universe is estimated to be about 10^24

3 Finite difference methods

The Black and Scholes (1973, BS) partial differential equation (PDE) is, as we saw, one of the most fundamental relationships in finance. It is as close to a law as we can get in a discipline that deals with human activities. The importance of the expression stems from the fact that it must be satisfied by all derivative contracts, independently of their contractual features. In some special cases, for example when the contract in question is a European-style option, the solution of the PDE can be computed in closed form, but this is not the general case. In many real situations we will have to approximate the solution of the PDE numerically. If t denotes time and S = S(t) is the value of the underlying asset, the BS model assumes that S follows a geometric Brownian motion

dS(t) = μ S(t) dt + σ S(t) dB(t)

It follows that, for any derivative contract, the pricing function f = f(τ, S) will satisfy the BS PDE

∂f(τ,S)/∂τ = r S ∂f(τ,S)/∂S + (1/2) σ² S² ∂²f(τ,S)/∂S² − r f(τ,S)              (3.1)

where S is the price of the asset and τ is the time to maturity. Equation (3.1) is not sufficient to uniquely specify f; initial and perhaps a number of boundary conditions are also needed for (3.1) to admit a unique solution. In fact, different derivative contracts will impose different initial and boundary conditions, but (3.1) must be satisfied by all of them. For example, the standard call option will impose the initial condition

f(0, S) = max(S − K, 0)

Finite difference methods (FDMs) is the generic term for a large number of procedures that can be used for solving a (partial) differential equation, which have as a common denominator some discretization scheme that approximates the


required derivatives. In this chapter we will give an overview of these methods and also examine some examples that illustrate the methodology in financial engineering. Thomas (1995) gives a detailed overview of different approaches, together with an exhaustive analysis of the consistency, convergence and stability issues. Wilmott, Dewynne, and Howison (1993) present FDMs within an option pricing framework.

3.1 Derivative approximations
Before we turn to the fully fledged PDE (3.1), let us assume for a moment that we are given a one-dimensional function f = f(x). Our goal is to provide some estimate of the derivative of f at the point x, namely f′(x) = df(x)/dx. We can express the derivative using three different expressions that involve limits:

df(x)/dx = lim_{h→0} [f(x + h) − f(x)] / h
         = lim_{h→0} [f(x) − f(x − h)] / h
         = lim_{h→0} [f(x + h) − f(x − h)] / (2h)

For a differentiable function all three limits are equal, and they suggest three candidates for discrete approximations of the derivative. In particular we can construct:
1. The right limit yields the forward differences approximation scheme

   df(x)/dx ≈ [f(x + h) − f(x)] / h

2. The left limit yields the backward differences approximation scheme

   df(x)/dx ≈ [f(x) − f(x − h)] / h

3. The central limit yields the central differences approximation scheme

   df(x)/dx ≈ [f(x + h) − f(x − h)] / (2h)

These schemes are illustrated in figure 3.1, where the true derivative is also given for comparison. Of course the approximation quality will depend on the salient features of the particular function, and in fact, it turns out to be closely related to the behaviour of higher order derivatives. Let us now assume that we have discretized the support of f using a uniform grid, {x_j}, with x_j = x_0 + j Δx, and define the values of the function f_j = f(x_j). Then, we can introduce the corresponding difference operators D_+, D_− and D_0, and rewrite the difference approximations in shorthand1 as
1. For us these operators serve as a neat shorthand for the derivative approximations, but there is, in fact, a whole area of difference calculus that investigates and exploits their properties.


Figure 3.1: Finite difference approximation schemes. The forward (green), backward (blue) and central (red) differences approximation schemes, together with the true derivative (dashed).
Forward:   D_+ f_j = [f_{j+1} − f_j] / Δx
Backward:  D_− f_j = [f_j − f_{j−1}] / Δx
Central:   D_0 f_j = [f_{j+1} − f_{j−1}] / (2Δx)
What are the properties of these schemes, and which one more accurately represents the true derivative? A first inspection of figure 3.1 reveals that the central differences approximation is closer to the true derivative, but is this generally true? In order to formally assess the quality of the approximations we will use Taylor expansions of f around the point x_j, that is to say the expansions of the points f_{j±1}:

f_{j±1} = f_j ± Δx df(x_j)/dx + (Δx²/2) d²f(x_j)/dx² ± (Δx³/6) d³f(x_j)/dx³ + ···
Substituting the corresponding values in the approximation schemes will yield the important relationships

D_+ f_j = df(x_j)/dx + O(Δx)
D_− f_j = df(x_j)/dx + O(Δx)
D_0 f_j = df(x_j)/dx + O(Δx²)

In the above expressions we introduce the big-O notation, where O(Δx) includes all terms of order Δx and smaller.2 Now since |Δx²| ≤ |Δx| around zero, it follows that |O(Δx²)| ≤ |O(Δx)|, which means that central differences are more accurate than forward or backward differences. We say that central differences are second order accurate, while forward and backward differences are first order accurate. Therefore, without any further information on the function, we should use central differences where possible. If we have some extra information, using one-sided derivatives might perhaps be beneficial. Such cases could arise when the drift term dominates the PDE, or alternatively when the volatility is very small. In our setting though, we will concentrate on approximations that use central differences as their backbone. The BS PDE also involves second order derivatives, on top of the first order ones. We therefore need to establish an approximation scheme for these second derivatives as well. Once we achieve that, we will be able to proceed to the actual discretization of the BS PDE (3.1). Since we are trying to establish second order accuracy, we are looking for a scheme that approximates the second derivative using central differences. It turns out that an excellent choice is an approximation that takes central differences twice, over a half-step Δx/2:

D² f_j = [df(x_{j+1/2})/dx − df(x_{j−1/2})/dx] / Δx ≈ [f_{j+1} − 2 f_j + f_{j−1}] / Δx²

Using the same substitutions from the Taylor expansions as above yields

D² f_j = d²f(x_j)/dx² + O(Δx²)

Therefore, we conclude that the operator D² is second order accurate. In addition, D² has the advantage that in order to compute it we use the same values that were needed for the first difference D_0, namely f_{j±1}, together with the value f_j. A short numerical check of these convergence orders is given below.

2. Formally, a function g = g(h) is O(hⁿ) if the ratio |g(h)|/|hⁿ| < C < ∞ (meaning that it is bounded) as h → 0. Intuitively, g(h) approaches zero at least as fast as hⁿ. We say that g is of order hⁿ.
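A minimal Matlab check of the three first-derivative schemes and of D², using f(x) = exp(x) at x = 1 (the test function, the point and the step sizes are arbitrary choices):

% Numerical convergence orders of the difference operators (sketch).
f = @exp; x = 1; h = 10.^(-(1:4));                 % shrinking step sizes
errF = abs((f(x+h) - f(x))./h        - exp(x));    % forward:  O(h)
errB = abs((f(x) - f(x-h))./h        - exp(x));    % backward: O(h)
errC = abs((f(x+h) - f(x-h))./(2*h)  - exp(x));    % central:  O(h^2)
err2 = abs((f(x+h) - 2*f(x) + f(x-h))./h.^2 - exp(x));  % D2: O(h^2)
% the log-log slopes reveal the orders; for much smaller h, roundoff
% would eventually pollute the second-difference estimate
loglog(h, [errF; errB; errC; err2]);
legend('D_+', 'D_-', 'D_0', 'D^2');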


3.2 Parabolic PDEs

The BS PDE belongs to a wide and well documented class of PDEs called parabolic partial differential equations. Many important natural phenomena are associated with parabolic PDEs, ranging from Einstein's heat equation to Schrödinger's description of quantum mechanics. In order to simplify the subsequent notation we use the shorthand elliptic operator

L f(x) = a(x) ∂f(x)/∂x + b(x) ∂²f(x)/∂x² + c(x) f(x)

for general functionals a, b and c. Therefore the BS PDE will be of the general form

∂f(τ, x)/∂τ = L f(τ, x)                                  (3.2)

Suppose that we work on a grid X = {x_j}_{j=−∞}^{+∞}, with constant grid spacing x_{j+1} − x_j equal to Δx. We will concentrate on an initial value problem, and therefore assume that the function extends over the whole real line. The problem of boundary conditions will be addressed later in this chapter. We will also define the value function at the grid points, f_j(τ) = f(τ, x_j), for j = −∞, …, +∞. We construct the discretized operator by applying the differences D_0 and D²:

L_Δ f_j(τ) = a(x_j) D_0 f_j(τ) + b(x_j) D² f_j(τ) + c(x_j) f_j(τ)

In the above expression the functionals a_j, b_j and c_j are just the restrictions of a, b and c on the grid points. Substituting the difference operators gives

L_Δ f_j(τ) = a_j [f_{j+1}(τ) − f_{j−1}(τ)] / (2Δx) + b_j [f_{j+1}(τ) − 2 f_j(τ) + f_{j−1}(τ)] / Δx² + c_j f_j(τ)
Our goal is to construct a discretized operator that, in some sense, converges to the actual operator as the discretization becomes finer, or somehow L_Δ → L. Essentially, since we want to establish convergence we will need a measure of distance between the operators. We will discuss these issues in more detail in section 3.2, following the introduction to the explicit method. After establishing this convergence we will move forward and approximate the PDE itself at the point (τ, x_j) with

d f_j(τ)/dτ = L_Δ f_j(τ) = α^+(x_j) f_{j+1}(τ) + α^0(x_j) f_j(τ) + α^−(x_j) f_{j−1}(τ)          (3.3)

The functionals α^±(x) and α^0(x) depend on the structure of the PDE and are given by


α^±(x) = b(x)/Δx² ± a(x)/(2Δx)
α^0(x) = c(x) − 2 b(x)/Δx²

A PDE as a system of ODEs

Since (3.3) will hold for all grid points {x_j}_{j=−∞}^{+∞}, we have represented the discretized problem (3.3) as a system of ODEs, which can be cast in matrix form for F(τ) = {f_j(τ)}_{j=−∞}^{+∞}:

dF(τ)/dτ = Q(τ) F(τ)                                   (3.4)

subject to the initial condition F(0) = {f(0, x_j)}_{j=−∞}^{+∞}. Equation (3.4) describes the evolution of the pricing function as the time to maturity increases. From this point on we will make the additional assumption that the matrix Q(τ) is time-invariant, Q(τ) = Q. Time dependence can be accommodated in a straightforward way in the numerical implementation. The matrix Q is tridiagonal; in particular

    [ ⋱           ⋱           ⋱
      α^−_{j−1}   α^0_{j−1}   α^+_{j−1}   0           0
Q =   0           α^−_j       α^0_j       α^+_j       0
      0           0           α^−_{j+1}   α^0_{j+1}   α^+_{j+1}
                              ⋱           ⋱           ⋱        ]

There is a large number of solvers for such systems. We will consider methods that apply time discretization as well, and therefore work on a two-dimensional grid; a sketch of how Q can be assembled in Matlab is given below.

T
In equation (3.4) we converted the PDE in question into a system of innite ODEs. Apparently it is not feasible in practice to numerically solve systems with an innite number of equations. We will therefore nedd to truncate the grid and consider a subset with N elements = { }N . This means that we will =1 need to take special care on the treatment of the numerical approximations at the articial boundaries 1 and N . We will discuss these issues in detail in section 3.2. Also, to construct a two-dimensional grid we need to discretize across time as well, using N points that dene subintervals of constant width , { }N . =1 Figure 3.2 illustrates such a grid, together with a view of a function surface that we could reconstruct over that grid. It is important to note that neither the space nor the time grid have to be uniform. One can, and in some cases should, consider non-uniform grids based on some qualitative properties of the PDE in hand.

75(3.2) F
120
+1

. : A two-dimensional grid.

100

80

60

40

20

0 -0.5

0.5

1.5

2.5

76(3.2)

. : The Explicit FDM.

100 95
1

+1

+1

90 5 85 1.1 1.2 1.3 1.4 25 15 20 10

As we noted, equation (3.4) describes the dynamic evolution of derivative prices, subject to initial and perhaps boundary conditions. Our time discretization has that objective as well: given the pricing function values at time we should be able to determine the function values at time +1 . Therefore, starting from the initial values at time 0 = 0, we recursively produce the values at 1 , 2 , and so on. At rst glance using central dierences is not feasible, since the central dierence at the time point 0 needs the values at 1 and 1 to be determined, but the values at 1 are unavailable. On the other hand forward dierences in time will do, as we only need the values at 0 and 1 to form them. In order to condense notation we will use to denote the value ( ). We also assume a uniform grid with spacings and , although it is not a lot harder to work over non-uniform grids.3 Then, by approximating explicit nite dierence method
+1 ( )
+1

, we derive the

=L ( )=

+1

(3.5)

We can explicitly solve4 the above expression for cursive relationship


3 4

+1

, which yields the re-

Just a lot more messier. Hence the name!

77(3.2)
+1

=
+

+1

+ 1+

Essentially, the values 1 and determine the next periods value +1 . This is schematically depicted in gure 3.3. In matrix form, the updating takes place as
+1

= (I + Q )

(3.6)

Now we turn to the BS PDE (3.1) and apply this discretization scheme. To simplify the expressions we perform the change of variable = log S. This will transform the PDE into one with constant coecients, namely ( ) + ( ) 1 2 ( + 2 2 )
2

with = 1 2 . The coecients and 0 in the system of ODEs (3.4), which 2 also determine the explicit scheme (3.6), become

2 + 2 2 2 = 2 =

S
By constructing a FDM, like the explicit scheme, we use derivative approximations to reconstruct the true, but unknown, pricing function ( S). The outcome of the FDM is a set of prices at time , namely = { }N , for all dierent =1 =1 N . The natural question is of course how close are the values to the true prices ( )? If we are to use such a scheme in practice we need to be convinced that somehow ( ) as the discretization becomes ner. Also, if we are to put some trust in this approximation we should have an idea about the order of this convergence. One straightforward way would be to examine how the pointwise errors between the true prices and their approximation behave. If we denote the true prices with = { ( )}N , then the errors in question would be the dierences =1 =

We can investigate the convergence by inspecting the -norm, namely5 = max =1 N | ( )|. Apparently, if the maximum (absolute) value converges to zero, then all other values will do as well, and the FDM prices will converge to the true ones. Before we move to the inspection of the global errors, we rst examine the local truncation error, dened as the discrepancy between the true parabolic
5

In some cases it is more convenient to work with the 1 -, 2 - or -norm. The choice largely depends on the problem in hand. See XXXX for details.

78(3.2)

PDE (3.2) and the approximated one (3.5), evaluated at the true pricing function at the point ( ) = ( )L ( ) (
+1

) (

L (

The denitions and the properties of the dierence operators yield that the truncation error = o( 2 ). We therefore say that the explicit method is rst order accurate in time and second order accurate in space. Intuitively, this truncation error would tell us how errors will be created over one step, if we start from the correct function values. Any scheme that oers order of accuracy greater than zero is called consistent. Of course, even if small errors are created over a given time step, they can still accumulate as we move from one time step to the next. It is possible that they produce feedback eects, producing errors that grow exponentially in time, destroying the approximate solutions and creating oscillatory or explosive behaviour. On the other hand we might construct a FDM that has errors that behave in a nice way, without feedback eects. The notion of stability captures these ideas. One intuitive way of looking at stability is through the Courant-FriedrichsLewy (CFL) condition6 , which is based on the notion of the domain of dependence. If we have a function ( S), then the domain of dependence of the point ( S ) is the set of points F(

S ) = {( S) :

and ( S) depends on the value (

S )}

The CFL criterion states that if a numerical scheme is stable, then the true domain of dependence must be smaller than the domain of dependence of the approximating scheme. In parabolic PDEs the domain of dependence of the process is unbounded, since information travels instantaneously across all values. The domain of dependence of the explicit FDM is bounded, since each value at time +1 will only depend on three of its neighbouring values at time . Therefore, according to CFL criterion in order for the scheme to be stable the condition = o( 2 ) must be satised.7 Therefore the explicit scheme will not be unconditionally stable, and will need very small time discretization steps to oer stability. The connection between local errors, global errors and stability is given by the Lax equivalence theorem which states that a FDM which is consistent and stable will be convergent. This means that the explicit method is not (always) convergent.
6

Stated in 1928, long before any stability issues were discussed in this context. Richardson initiated FDM schemes as back as 1922 for weather prediction, but did not discover any stability problems. This means that the time grid must become ner a lot faster than the space grid, for the information to rapidly reach remote values.

79(3.2)

. : The Implicit FDM.

100 95 90 85 1.1 1.2 1.3 1.4


+1 1 +1

+1 +1

5 10 15 20 25

One way to overcome the stability issues is to use a backward time step. Rather than taking a forward time step at time , we take a backward step from time +1 . This is equivalent to computing the space derivatives at time +1 as shown below +1 +1 +1 = + +1 + 0 +1 + 1 This equation relates three quantities at time +1 and one quantity at time , which is schematically given in gure 3.4. Since we are facing one equation with three unknowns we cannot explicitly give a solution, but we can form a system.
+

+1 +1

+ 1

+1

+1 1

Note that the number of system equations will be equal to the number of unknowns. In matrix form, the system can be written as = (I Q )
+1

+1

= (I Q )1

(3.7)

The same line of argument we used for the explicit method will give us the order of accuracy of the implicit scheme, the errors being again o( ). On the other hand, since the value at time +1 depends on the whole set of prices at time , the domain of dependence of the implicit scheme is unbounded. From the CFL criterion it follows that the implicit scheme is unconditionally stable.

80(3.2)

-N

. : The Crank-Nicolson FDM.

+1

100
+1

+1 +1

95
1 +1 1

90 5 85 1.1 1.2 1.3 1.4 15 /2 /2 20 25 10

Although the implicit scheme is unconditionally stable, it still oers convergence of order o( 2 ). The rst order convergence in time is due to the nature of the derivative approximation. We can increase this order to two by setting up a central dierence scheme in time. We will use a time step of , as 2 we did in the approximation of the second order derivative. This is equivalent in taking the space derivatives at the midpoint between and +1 . This yields the Crank-Nicolson scheme
+1

+1

+ 2

+1 +1

+ 2

+1

+ 2

+1 1

Apparently the Crank-Nicolson scheme will relate six points, illustrated in gure 3.5. Another approach is to simply add up one-half times equation (3.6) and one-half times the rst of equations (3.7). This yields again the Crank-Nicolson scheme, in matrix form 1 I Q 2
+1

1 I + Q 2

Since it uses centered dierences to approximate all derivatives, the errors in the Crank-Nicolson scheme are o( 2 2 ). Therefore, the Crank-Nicolson scheme is second order accurate both in time and space. In addition, like the implicit

81(3.2)

scheme, the Crank-Nicolson scheme has unbounded domain of dependence, and is therefore unconditionally stable. Rather than using weights of 1 to balance the explicit and implicit schemes, 2 one can use dierent values. This gives rise to the -method, which encompasses all schemes described so far. In particular, the -method in matrix form will be (I Q )
+1

= (I + (1 )Q )

It is straightforward to verify that = 0 yields the explicit scheme, = 1 yields the implicit scheme, and = 1 yields the Crank-Nicolson scheme. 2

B
The above treatment of parabolic PDEs assumed that the space extends over the real line. Essentially this implies that the matrices involved are of innite dimensions. Of course in practice we will be faced with nite grids. Sometimes, as is the case with barrier options, boundary conditions will be explicitly imposed by the nature of the derivative contract. In other cases, when the derivative has early exercise features, the boundary is not explicitly dened and is free in the sense that it is determined simultaneously with the solution of the PDE. There are two dierent kinds of xed boundary conditions: Dirichlet conditions set ( B B ) = B , that is the value of the function is known on the boundary. B Neumann conditions set ( B ) = B , that is the derivative is known on the boundary. In the second case we will need to devise an approximation scheme that exhibits o( 2 ) accuracy; if we do not achieve that, then contaminated values will diuse and eventually corrupt the function values at all grid points. This means that we must use nite dierence schemes that achieve o( 2 ), like central dierences. Say that we construct a nite space grid { }N , which essentially discretizes =0 the interval [ 0 N ]. Most of the elements of the matrix Q are not aected by the boundary conditions, and the matrix is still tridiagonal. The only parts that are determined by the xed boundary conditions are the rst and last rows. Thus Q will have the form 0 0 + 0 1 1 1 + 0 2 0 2 0 2 .. .. .. . . . Q= + 0 0 0 .. .. .. . . . + 0 0
N 1

N 1

N 1

where the values at are determined by the boundary conditions.

82(3.3)

We start with a Dirichlet condition at N +1 = ( N +1 ) = B . This point is +1 utilized in the explicit scheme when the value N at time +1 is calculated. In particular +1 + B + 1 + 0 N + N 1 N = Therefore, in matrix form, the updating equation for the explicit scheme becomes
+1

= (I + Q )

where the last row of Q is (0 0 +1 0 +1 ), and is an (N + 1) 1 N N + vector of zeros, with the last element equal to N +1 B . Similarly, a Dirichlet boundary condition at 0 will set the rst row of Q is ( 0 + 0 0), and the 0 0 rst element of is B . 0 Within the implicit scheme this boundary would appear in the N + 1 system that determines function values at time
1 N

+ 1

N 1

Therefore when the Crank-Nicolson method is implemented B will aect both pricing formulas at and +1 . Similar formulas can be easily computed for the lower boundary 0 , where the ghost point 1 is introduced. When a Neumann condition is imposed at N , we apply central dierences ( ) at the point N to approximate = B , which yields
N +1

N 1

+2

These values can be use in the approximation schemes to set up the last row of Q, namely (0 0 + +1 + +1 0 +1 ), and the last element of equal to N N N 2 + +1 B . Similarly, a Neumann boundary condition at 0 will set the rst N row of Q to ( 0 + + 0 0), and the rst element of to 2 B . 0 0 0 0

. A PDE
P

In this section we will build a Matlab example that implements the -method. We assume that the dynamics of the underlying asset are the ones that govern the BS paradigm. We will need payo function G. If we also assume Dirichlet boundary conditions, then the same function will determine the Dirichlet boundaries. If Neumann conditions are specied, then the same function should also give the derivatives on the boundaries. Since we have set up the PDE in terms of the log-price, which we denote in the solver with , the derivatives will be equal to = S S = S exp( ). A call option will be implemented by the function in listing 3.1. A put option is implemented in 3.2.8

83(3.3) L

. :

: Payo and boundaries for a call.


. :

: Payo and boundaries for a put.


The initialization part of the PDE solver just decomposes the structure and constructs the log-price and the time grids. The function returns the payo values, the boundary values and the derivatives on the boundaries. Therefore both Dirichlet and Neumann conditions can be accommodated for. The tridiagonal Q matrix will be constructed according to whether we have specied Dirichlet or Neumann boundary conditions. We use the switch that keeps the boundary type as a two-element vector. At this stage we assume that the same boundary applies to all time steps. The Matlab code for the PDE solver is given in listing 3.3. Here we use the Matlab backslash operator A\B = inv(A) B. The snippet in 3.4 illustrates how the function can be called to compute the price of a European put, and plots the pricing function. Setting will implement the PDE solver with Dirichlet boundary conditions.

E
In many cases the derivative in hand has early exercise features, either American (where the option can be exercised at any point prior to maturity), or Bermudan
8

We will implement the solver using Neumann conditions, and therefore we pass the boundary values as . Actually, for the put price the corresponding Dirichlet boundary condition is not time homogeneous, and our solver will need slight modications to accommodate time inhomogeneous boundaries.

84(3.3)

. :

10

15

20

25

30

35

: -method solver for the Black-Scholes PDE.

85(3.3) L . :

10

15

: Implementation of the -method solver.

F . : Early exercise region for an American put. The time-price space is separated into two parts. If the boundary is crossed then exercise becomes optimal.
early exercise region L ( S) > 0 ( S) = (S)

free boundary log-price

no-exercise region L ( S) = 0 ( S) > (S)

time to maturity

86(3.3)

(where the option can be exercised at a predened set of times). With small changes the PDE solver we constructed can take care of these features. Essentially, the holder of the option has to make a decision at these time points: exercise early and receive the intrinsic value, or wait and continue holding the option. In terms of PDE jargon, the problem is now a free-boundary problem. There is a boundary, which is at the point unknown for us, which separates the region of ( S) where early early exercise is optimal and the region of where it is optimal to wait. Figure 3.6 illustrates these regions. Thus, within the waiting optimal region the BS PDE is satised, while outside the boundary ( S) will be equal to the payo function (S). The boundary function is unknown, but it has a known property: it will be the rst point at which ( S) = (S). This follows from a no-arbitrage argument that gives that the pricing function has to be smooth and not exhibit discontinuities. In terms of the pricing function, it will satisfy L ( S) 0 (3.8) (3.9) (3.10)

( S) (S) L ( S) ( ( S) (S)) = 0

The BS PDE is satised within the no-exercise region, while the pricing function is satised within the exercise region. Equation (3.10) reects that. Within the exercise region L ( S) > 0, while within the no-exercise region ( S) > (S). Equations (3.8-3.9) cover these possibilities. This indicates that a strategy to compute the option price whenearly exercise is allowed will be to set = max (S ) where is the price is no exercise takes place. Therefore the option holders strategy is implemented: the holder will compare the value of the option if she did not exercise with the price if she does; the option value will be the maximum of the two. Although the above approach is straightforward in the explicit method case, it is not so in the other methods where a system has to be solved. In these cases we are looking for solutions of a system subject to a set of inequality conditions. In the most general -scheme, the system has the form (I Q ) (I Q )
+1 +1 +1

(I + (1 )Q ) (S)
+1

(I + (1 )Q )

(S) = 0

where S is the vector of the grid prices of the underlying asset, and the inequality is taken element-wise. Such systems can not be explicitly solved, but there are iterative methods, like the projected successive over-relaxation or PSOR method. Given a system

87(3.3) A [A ] [ ] = 0

a starting value updates9


( +1)

(0)

, and a relaxation parameter (0 2), the PSOR method

The PSOR procedure is implemented in the short code given in 3.5. The programme will solve A x b and x c, while one of the two equalities will strictly hold for each element. The initial value is xinit. The function returns the solution vector , and an indicator vector of the elements where the second equality holds; in our case the early exercise points. The solver has to be adjusted to accommodate for the early exercise, and the code is given in 3.6. We and call the PSOR procedure. We demand accuracy of 106 , introduce while we allow for 100 iterations to achieve that. The snippet in listing 3.7 implements the pricing of a European and an American put and examines the results. The strike price is $1.05. To make the dierences clearer the interest rate is set to 10%. The results are given in gure 3.7; the American option prices approach the payo function for small values of the spot price, while European prices cross. Early exercise will be optimal if the spot price is below $0.90, where the American prices touch the payo function.

= max

(1 )

( )

=1

( +1)

= +1

( )

B
European vanilla options (calls and puts) are exercised on maturity, and have payos that depend on the nal value of the underlying asset. Barrier options have an extra feature: the option might not be active to maturity, depending on whether or not the barrier has been triggered. Denote the barrier level with B. The jargon for barrier options species the impact of the barrier as follows Up: there is an upper barrier, or Down: there is a lower barrier In: the contract is not activated before the barrier is triggered, or Out: if the barrier is breached the contract is cancelled

Therefore we can have eight standard combinations Up Down


9

-and-

In Out

Calls Puts

A value 0 < < 1 corresponds to under-relaxation, = 1 is the Gauss-Seidel algorithm, while 1 < < 2 corresponds to over-relaxation. In our case we want to use a value that implements over-relaxation.


L . : : PSOR method.

88(3.3)

10

15

. :

: -method solver with early exercise.


lines 3-37 of

10

15

89(3.3)

F . : European versus American option prices. The American option will reach the payo function, while the price of the European contract can cross below that level.
0.25

0.2

option value

0.15 European 0.1 American

0.05

Payo function

0 0.8

0.85

0.9

0.95

1 asset price

1.05

1.1

1.15

L put.

. :

: Implementation of PSOR for an American

lines 2-12 of

10

 

90(3.3)

Barrier options are examples of path dependent contracts, since the nal payos depend on the price path before maturity. This path dependence is considered mild, since we are not interested on the actual levels, but only on the behavior relative to the barrier, i.e. if the barrier is triggered. For example, consider an up-and-out call, where the spot price of the underlying is S0 = $85, the strike price is K = $105 and the barrier is at B = $120. This contract will pay o only if the price of the underlying remains below $120 for the life of the option. If S B at any , then the payos (and the value of the contract) become zero. One can see that we should expect this contract to have some strange behaviour when the price is around the barrier level. Contrast an up-and-in call with the same specications. For the contract to pay anything, the price has to reach at least S = $120 for some (but might drop in later times). Now suppose that an investor holds both contracts, and observe that (for any sample path) the barrier can either be triggered or not. Thus, when one is active the other one is not. Holding both of them replicates the vanilla call. Therefore, PU&O + PU&I = PCall , and PD&O + PD&I = PPut In the above examples the barrier contract was monitored continuously. For such contracts closed-form solutions exist. In practice though, barrier options are motitored discretely, that is to say one examines where the underlying spot price is with respect to the barrier at a discrete set of points. For example a barrier contract might be monitored on the closing of each Friday. Monitoring can have a substantial impact on the pricing of barrier options. For that reason numerical methods are employed to price barrier options. An up-and-out option will follow the BS PDE, where a boundary will exist at S = B (in fact ( S) = 0 for all S B). This feature can be very easily implemented in the nite dierence schemes that we discussed. In particular, the barrier will be active only on the monitoring dates, and a PDE with no barriers10 will be solved. Essentially, we can compute the updated values f +1 normally, and then impose the condition ( ) = 0 if is a monitoring date and exp( ) B. A Matlab listing that implements pricing of up- and down-and-out calls and put is given in 3.8. The snippet that calls this function is given in 3.9.

Using nite dierences is very useful when one is looking for the hedge parameters, in particular the options Delta and Gamma. Given that they are quantities that are dined as derivatives with respect to the price, one can compute them rapidly over the given grid. Some care has to be taken here, as we have implemented the discretization in log-prices. The Delta and Gamma of the option will
10

Of there will be barriers imposed at extreme values that are necessary to discretize the state space.

91(3.3) L . :

: Solver with barrier features.


10

15

20

25

30

35

40


L . : barrier option.

92(3.4)

: Implementation for a discretely monitored

lines 2-12 of

10

be equal to = exp( ) S 2 2 = = S 2 2 =

exp(2 )

The derivatives with respect to the log-price can be computed using nite dierences on the grid (in fact they have been computed already when solving the PDE). Note that, since we approximate all quantities using central dierences, the rst and last grid points will be lost. The snippet in 3.10 shows how the Greeks can be computed over a grid, while gure 3.8 gives the output. In order to make clear the eect of early exercise we use a relatively high interest rate of 10%. We also implement a relatively dense (100 100) grid over ( S) to ensure that the derivatives are acurrate. Observe that the Deltas of both options approach their minimum values of 1 in a continuous way. The Gammas, on the other hand, show dierent patterns with the American Gamma jumping to zero. Even if we use a stable FDM method, like the Crank-Nicolson, computing the greeks does not always give stable results. For example gure 3.9 presents the Greeks for the same American and European put options as 3.8, but with the time steps decreased to 10. The Delta is apparently computed with errors, which are magnied when the Gamma is numerically approximated. Note that the instability is introduced by reducing the time steps; the log-price grid is still based on 100 subintervals. In other cases explosive Greeks are an outcome of the contract specications. For instance a barrier option will exhibit Deltas that behave very erratically around the barrier, since the pricing function is not dierentiable there.

93(3.4) L . :

: PDE approximations for the Greeks.

10

15

20

25

30

 

. M

PDE

In many cases the problem in hand can only be cast in a PDE form that has more than one space dimensions. This can be the case of a derivative that depends on more than one asset, or a derivative that depends on a single that exhibits stochastic volatility, or even a derivative in a BS world that is strongly pathdependent. Typically the PDE will be still a parabolic one, with a multidimensional elliptic operator. For example in the two-dimensional case the operator on the function = ( ) will be

94(3.4)

F . : Greeks for American and European puts. A European and an American put are priced using the Crank-Nicolson method on a (100 100) grid over ( S), and the Greeks are computed using nite dierences. The Greeks for the European put are given in red and for the American put in blue
0 -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0.8 1 1.2 spot price 1.4 1.5 1 0.5 0 0.8 1 spot price 1.2 1.4 gamma delta 2.5 2 4.5 4 3.5 3

L =

2 2 2 + + + + + 2 2

Apparently we will need to discretise both dimensions to approximate the elliptic operator. The price function at the typical grid point will be now ( )= ( ). The single-variable derivatives pose no real problem, we just need to take some care when computing the cross derivative approximation. For example, one can use the Taylors expansion of the values 1 1 and 1 1
1 1

( ) + ( ) 2 1 2 1 2 + 2+ 2+ ( )( ) + o( 2 2 2 2 + + +1 4
)

3)

The operator D2
+1 +1 1 1 1

1 +1

will approximate the cross derivative

2 (

95(3.4)

F . : Oscillations of the Greeks in FDM. A European and an American put are priced using the Crank-Nicolson method on a (10 100) grid over ( S), and the Greeks are computed using nite dierences. The Greeks for the European put are given in red and for the American put in blue
0 -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0.8 1 1.2 spot price 1.4 0 -5 -10 -15 0.8 1 spot price 1.2 1.4 gamma delta 10 5 30 25 20 15

This uses four points to approximate the cross derivative, but it is not the only way to do so.11 In any case we can write the discretized operator L = D + D + D2 + D2 + D2 + If we consider an (N N )-point grid over ( ), then we can construct the matrix Q which will be (N N N N ). The prices ( ) actually form a matrix F( ) for a given , but we prefer to think of them as a vector = ( ) produced by stacking the columns of this matrix. Therefore, the price ( ) will be mapped to the ( 1)N + element of ( 1 1) ( 2 1) . . . ( N 1) = ( 1 2) . . . ( N N)
11

For example Ikonen and Toivanen (2004) give an alternative.


F . diusion.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

96(3.4)

: The structure of the Q-matrix that approximates a two-dimensional

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Matrix Q will now be a block-tridiagonal matrix, with block elements that are tridiagonal themselves. Also Q is a banded matrix, meaning that all elements can be included within a band around the main diagonal. The structure is given in gure 3.10 for an approximation that uses six points to discretize and four points to discretize . The elements give the elements that reect moves to 1, used for the derivatives with respect to ; the elements reect moves to 1, used for the derivatives with respect to ; and the elements reect moves to ( 1 1), used for the cross derivative.

F
We have managed to represent the solution of the two-dimensional BS PDE as a system of ODEs, which is very similar to the approach we took when we rst discussed the one-dimensional problem. The dierence is of course that now the size of the matrix Q is larger by one order of magnitude. Nevertheless, we can represent the problem as

97(3.4)

() = Q ( ) By using the same arguments we can once again one can construct explicit, implicit and -methods. For example, the Crank-Nicolson scheme will be of the usual form 1 1 I Q +1 = I + Q 2 2 As an example consider an option with payos that depend on two correlated assets that follow geometric Brownian motions. The BS PDE in terms of the logprices and will be ( ) = ( ) ) + ( ) )
2

1 2 2 ( + 2 2

1 2 2 ( + 2

2 (

We also assume that a set of Neumann conditions is specied at each bound ary, namely the corresponding derivatives ( resp.) being equal to 1 and N ( 1 and N resp.). To build Q we essentially consider an (N N ) tridiagonal matrix, where changes in are captured, which has elements that are (N N ) matrices where changes in are captured. We can represent in the following form where all submatrices have dimensions (N N ) DB C D E C D E .. .. .. (3.11) Q= . . . C D E C D E FD

B
The boundary conditions will have an eect on these matrices. In particular, the rst and last rows of all matrices will depend on boundary conditions on . In addition, all elements of the block matrices B and F will depend on the boundary conditions imposed on . The generic sum that Q implements is given by =
( ) 1 1

(0 ) ( 0) 1

(+ ) +1 1 (0 0)

+ +

(+ 0) +1

( +) 1 +1

(0 +)

+1

(+ +) +1 +1

98(3.4)

where the coecients are given by the following quantities (with the elements that correspond to gure 3.10 also indicated) (): (): (): ( ): ( ):
( 0) (0 ) (0 0) (+ +) ( +)

= =

( ) (+ )

= /(2 ) + 2 /(2 2) = /(2 ) + 2 /(2 2) = 2 /( 2 ) 2 /( 2 ) = + /(4 ) = /(4 )

Boundary conditions will inuence the rst and last rows of each block, as this is where the boundaries of are positioned. The whole rst and last blocks will be also aected, since this is where the boundaries of are positioned. The rst and last rows of these particular blocks will correspond to the corner boundaries. Also, the boundaries will specify a matrix of constants G, just like the vector of constants we constructed in the univariate case. For Neumann conditions the elements (1 2) and (N N 1) of each block are given by (+ ) + ( ) . Of course a similar relationship will hold for all elements of the (1 2) and (N N 1) block, which will have elements given by ( +) + ( ) . Apparently the (1 2) and (N N 1) elements of these particular blocks will be dependent on both boundary conditions, and also the boundary condition across the diagonal. The values for these elements will be given by (+ +) + ( +) + (+ ) + ( ) . The elements of the matrix G will be also determined by the Neumann ( ) conditions for = 2 N 1 and = 2 N 1. Say that ( ) = , ( ) and . ( ) = G(1 ) = 2 G( 1) = 2
( 0) (0 ) ( 1) (1 )

, ,

G(N ) = +2 G(1 N ) = +2

(+ 0) (0 +)

(N

(1 N )

The corner elements of G will be determined by both boundary conditions, as well as the boundary across the diagonal. For example, the element (1 1) will be G(1 1) = 2
( 0) (1 1)

(0 )

(1 1) ( )

(1 1)

(1 1)

The other four points have similar expressions. We will vectorize the constrains by stacking the columns of G into the vector . If we include the impact of the boundary conditions (and keep in mind that they might be time varying), the system of ODEs that will give us an approximate solution to the two-dimensional PDE is now given by () =Q ( )+ ( ) (3.12)

If the boundary conditions are homogeneous, ( ) = , then the solution of the system is

99(3.4)

( ) = exp(Q ) (0) + Q1 [exp(Q ) I]

We also cast system (3.12) in the -form, and approximate it as the solution of the updating scheme (I Q )
+1

= (I + (1 ) Q )

+1

+ (1 )

Once again if the boundaries are homogeneous in time the scheme can be written as (I Q ) +1 = (I + (1 ) Q ) + In theory solving this system does not present any dierences, but in practice it might not be feasible since Q is not tridiagonal. For that reason a number of alternating direction implicit (ADI) and local one-dimensional (LOD, also known as Soviet splitting) schemes are typically used. Such schemes do not solve over all dimensions simultaneously, but instead split each time step into substeps, and assume that over each substep the system moves across a single direction. Therefore at each substep one has to solve a system that is indeed tridiagonal.

A
To understand the ADI methods it is intuitive to write down the Crank-Nicolson system in terms of operators 1 D D D2 D2 D2
+1

= 1 + D + D + D2 + D2 + D2 +
2

where we have dened = 2 and = space. Now we can put down the approximation

, to save some 3

1 D D2

= 1 + D + D2 +

1 D2

1 + D2 +

1 D D2

+1

1 + D + D2 +

It is tedious to go through the algebra, but one can show that the approximation of the operators is at least of second order in time and both directions. Therefore the results are not expected to deteriorate due to this operator splitting. In the Peaceman and H. H. Rachford (1955) scheme we implement the following three steps, solving for auxiliary values and 3 2 1 D 3 2 1 D D 3 1 D D2
+1

= 1 + D + D2 + = 1 + D2 + 3

3 3

= 1 + D + D2 +

100(3.5)

For the Dyakonov scheme (see Marchuk, 1990; McKee, Wall, and Wilson, 1996) we use a slightly dierent splitting where at the rst step we produce the complete right-hand-side 1 D D2 3

= 1 + D + D2 +

1 + D2 + 3

1 + D + D2 + 3 2 1 D D 3 1 D2
+1

= =

In both cases the operations are implemented using matrices that can be cast in tridiagonal form by permutations of their elements. In the multidimensional PDE problems, one has to take special care when dealing with the boundary conditions, as it may be confusing. Also, some decisions have to be made on the corners, which are aected by boundary conditions on more than one dimension.

. A

We now turn into the implementation of a PDE solver in two space dimensions, using the -method. Essentially this is more of a book-keeping exercise, where we need to consider the structure of matrix Q, and especially the boundary conditions. Here we will focus on boundary conditions of the Neumann type. We will assume that the payo function returns not only the function values, but also the derivative over the boundaries, together with the derivatives at the corner points across the diagonal directions. Figure 3.11 shows the positions of these boundaries. Each horizontal slice gives the ( )-grid at a dierent time point. The colored points denote the boundary and initial values that are necessary to solve the PDE numerically. In particular the bottom slice, at = 0, gives the set of initial conditions that need to be specied to start the algorithm. In the next time periods the boundaries are illustrated. The blue points show the boundaries at = 1 and = N , while the green points show the boundaries at = 1 and = N . At the black (corner) point both boundary conditions will have an impact. Essentially these point illustrate where matrix G has potentially non-zero elements. The elements of matrix Q that are aected lie just within these points. As an example we will use a correlation option, which is essentially a European call option on the minimum price of two underlying assets. The payo function of this derivative is (S1 S2 ) = max (min(S1 S2 ) K ) We will make the assumption that both assets follow geometric Brownian motions, with correlation parameter . The pricing function will satisfy the two-

101(3.5)

F . : Schematic representation of the evolution of a two-dimensional PDE. The points where initial and boundary conditions have to be specied are also illustrated. red points: initial conditions ( = 0) active; blue points: boundary conditions on active; green points: boundary conditions on active; and black points: both boundary conditions active.

2 1 0 15 10 5 0 0 5 10 20 15

: : Payo and boundaries for a two-asset option.

10

15

102(3.6) = log S1

dimensional Black-Scholes PDE, which in terms of the log-prices and 2 = log S2 can be written as (
1 2)

+ 1

2)

+ 2
1 2 2 2)

2)

1 2 2 ( + 1 2

1 2 1

2)

1 2 2 ( + 2 2

+ 1 2

2 ( 1 2 ) 1 2

2)

=0

1 2 1 2 for 1 = 2 1 and 2 = 2 2 . As this is a European style contract, the Neumann boundaries across 1 and 2 will be such that the derivative at each one of these points is the same through time. Therefore, and since the function is piecewise linear, there is no point in explicitly computing the partial derivative at each point, since we can do that numerically. Listing 3.11 gives the Matlab code that returns the payo function and vectors of derivatives. One can verify that the derivatives are computed numerically rather explicitly in all directions. Listings 3.12-3.13 give the Matlab code that implements the -method to solve the two-dimensional BS PDE. In the rst part we setup the matrices that will serve as the blocks that make up the Q-matrix. Given the small number of nonzero elements, all matrix denitions and manipulations are done using sparse matrix commands. Matrices to corresponds to the blocks B F in equation 3.11. Matrix keeps the constraints, as discussed in section 3.4, while the reshaped (stacked) form corresponds to vector . The Matlab code that actually implements the Crank-Nicolson method to price the correlation is given in listing 3.14. Two assets are considered that exhibit dierent volatilities. The discretization grid across the two dimensions is constructed using (5151) points. The call has half year to maturity, and we use 30 time steps to compute the price. Therefore we will need to solve 30 systems of 2601 equations with 2601 unknowns to arrive to the result: a substantial computational demand.

. E
Apart from the contracts and the techniques we discussed, there is a very large number of exotic options with features that can be implemented within the PDE framework. Sometimes we will need to extend the dimensionality of the problem to accommodate for these special features. For example, in many cases a rebate is oered when the barrier it triggered. This will make sure that breaching the barrier will not leave you empty handed. It is straightforward to handle such rebates in the nite dierences procedure. Other contracts attempt to cushion the barrier eect and the discontinuities it creates. For example, in Parisian options the barrier is triggered only if the barrier remains breached for a given (cumulative) time. To solve for this option we need to introduce an extra variable, namely the cumulative time that barrier has been breached, say .

103(3.6) L . :

: Solver for a two dimensional PDE (part I).

10

15

20

25

30

35

40

45

104(3.6)

: Solver for a two dimensional PDE (part II).

50

55

60

65

70

75

80

Apparently, the derivative price will now be a function (S ). Also, will evolve as an ODE d = d if > log B and d = 0, otherwise. The price will satisfy a dierent PDE within each domain S < B : (S S B : (S = ) + S S (S ) + S S (S (S ) 1 ) + 2 S 2 2 ) +
(S SS (S

) =

(S

) )

1 ) + 2 S 2 2

SS (S

105(3.6) L . :

: Implementation of the two dimensional solver.

10

To solve for this contract we would need a grid over a 3-D region, and of course a more complex set of boundary conditions needs to be specied. Another group of problems that can be attacked using PDEs arises when single asset models with more than one factors are considered. For example one might want to price derivative contracts under the Heston (1993) stochastic volatility model, where dS( ) = S( )d + ( )S( )dB ( ) ( )dB ( )

d ( ) = [ ( )] d +

dB ( )dB ( ) = d

Here a derivative will apparently depend on the current volatility as well as the price, having a pricing function ( S ). Therefore, the PDE that will be satised by such a contract will be a two-dimensional one. Finally, in modeling xed-income or credit related securities (and their derivatives) one might need to resort to multi-factor specications, for example a corporate bond being a function of an M-dimensional state vector ( ) that has dynamics express via a stochastic dierential equation d ( ) = ( ( ))d + ( ( )) dB( )

The PDE approach can also be applied in such a setting, although as the dimensionality increases implementation become infeasible (and simulation-based methods are typically preferred).

4 Transform methods

Following the success of the Black and Scholes (1973) model on pricing and hedging derivative contracts, there has been a surge of research on models that can capture the stylized facts of asset and derivative markets. Although the BS paradigm is elegant and intuitive, it still maintains a number of assumptions that are too restrictive. In particular, the assumption of identically distributed and independent Gaussian innovations clearly contradicts empirical evidence. When developing specications that relax these assumptions, academics and practitioners alike discovered that apart from the BS case, very few models oer European option prices in closed form. Being able to rapidly compute European call and put prices is paramount, since typically a theoretical model will be calibrated on a set of prices that come from options markets. The parameter values retrieved from this calibration will be used to price and devise hedging strategies for more exotic contracts. It turned out that, in many interesting cases, even though derivative prices or the risk-neutral density cannot be explicitly computed, the characteristic function of the log-returns is tractable. Based on this quantity, researchers did indeed link the characteristic function to the European call and put price, via an application of Fourier transforms (see Heston, 1993; Bates, 1998; Madan, Carr, and Chang, 1998; Carr and Madan, 1999; Due, Pan, and Singleton, 2000; Bakshi and Madan, 2000, inter alia for dierent modeling approaches).

. T
Assume that we are interested in an economy where there exists an asset with price process S( ). We also assume that there is a risk-free asset, oering a deterministic rate of return ( ), implying a set of bond prices B( ) = exp 0 ( )d . We start our analysis with the logarithmic return over a maturity T , say X(T ) = log S(T ) . We understand that as a random variable, X(T ) will be disS(0) tributed according to a probability measure P, the true or objective probability

108(4.1)

measure. Also, we assume that there exists an equivalent probability measure Q, under which the discounted price will form a martingale B(T ) EQ S(T ) = S(0) (4.1)

This is called the risk-neutral or risk adjusted probability measure. This measure need not be unique, given the current set of bond and asset prices, unless the market is complete, but all derivative contracts will have a no-arbitrage price that is equal to their discounted expected payos under this measure. That is to say, a European call option will satisfy Pcall = B(T ) EQ max(S(T ) K 0) Under the BS assumptions Q will be unique, and X(T ) will follow a Gaussian distribution under both P and Q. Under more general assumptions this need not be the case. Since we are interested in the pricing of derivatives we are going to ignore the true probability measure from now on, and focus instead on the qualities and characteristics of the risk-neutral measure. Therefore all expectations are assumed to be under Q, unless explicitly stated otherwise.

F
One of the most important tools for solving PDEs is the Fourier transform of a function ( ). In particular, we dene as the Fourier transform of ( ) a new function ( ), such that F[ ]( ) = ( ) =
R

exp(i

) ( )d

where i = 1 is the imaginary unit. It turns out that each function denes a unique transform , and this transform is invertible: if we are given we can retrieve the original function , using the inverse Fourier transform F1 []( ) = ( ) = 1 2 exp(i
R

)( )d

There can be some confusion, as dierent disciplines dene the Fourier transform slightly dierent, setting exp(i ) the other way round, or multiplying both integrals with 1 to result in symmetric expressions. Here we use the denition 2 that Matlab implements, but one has to always verify what a computer language oers. Fourier transforms have some properties that make them invaluable tools for solutions of dierential equations, the most important being that the transform is a linear operator F[
1

2 ](

) = F[ 1 ]( ) + F[ 2 ]( )

and that the Fourier transform of a derivative is given by

109(4.1) F d ( ) ( ) = (i ) F[ ]( ) d

if all derivatives up to order decay to zero for large | |. To illustrate the point, consider the BS PDE in terms of the logarithms = log S, namely ( ) = ( ) 1 2 ( + 2 2 )
2

If we apply the Fourier transform (with respect to ) on both sides, then the left-hand-side becomes F ( ) ( )=
R

exp(i

( =

)d exp(i
R

) (

)d

while the right-hand-side yields F 1 2 + 2 2 2 ( 1 ) ( ) = (i ) + (i )2 2 ( )

Therefore, by applying the Fourier transform we actually transformed a complicated second order PDE into a simple rst order ODE which has a straightforward solution ( ) = i 1 2
2

( = (

) ) = (0 ) exp i 1 2
2

with (0

) the initial condition.

C
If is a probability density function that measures a random variable, say X( ), then its Fourier transform is called the characteristic function of the random variable. It is also convenient to represent the characteristic function as an expectation, namely ( ) = E exp(i X( )) Characteristic functions are typically covered in most statistics textbooks.1 Since functions and their Fourier transforms uniquely dene each other, the
1

A good reference for characteristic functions and their properties is Kendal and Stuart (1977, ch 4).

110(4.1)

characteristic function will have enough information to uniquely dene the probability distribution of the random variable. In particular, the inverse Fourier transform will determine the probability density function. In many cases it is tractable to solve for the characteristic function of a random variable or a process, rather than the probability density itself. A large and very exible class of processes, the Lvy processes, are in fact dened through their characteristic functions. Characteristic functions have more important properties. By taking derivatives at the origin = 0, one can retrieve successive moments of the random variable, as ( ) E [X( )] = i =0 This means that qualitative properties of the distribution, such as the volatility, skewness and kurtosis can be ascertained directly from the characteristic function. In addition, it becomes straightforward to implement calibration methods that are based on the moments. The characteristic function has the property ( ) = ( ), with the complex conjugate. Thus, the real part is an even function over , while the imaginary part is odd. This is in line with the fact that the probability density is a real valued function, since to achieve that when integrating over the real line the imaginary parts must cancel out. One can use this property to write the Fourier inversion that recovers the probability density function as ( )= 1

Re [exp(i
0

)(

)] d

The cumulative density function is of course (the function () is the indicator function) F( ) = P [X( ) ] = E [(X )] =

( )d

It is also possible to recover the cumulative density function, from the characteristic function F( )= 1 1 + 2 2
0

exp(i

)( ) exp(i )( ) d i 1 1 exp(i )( = Re 2 0 i

Computing the cumulative density as above can be very cumbersome, and the approach does not lend itself naturally to the FFT techniques that we will discuss later. One main drawback is the fact that the integrand diverges at zero, rendering the numerical integration unstable in many cases. Here we present a technique which allows us to rewrite the cumulative density as a Fourier

111(4.2)

transform that is well dened. We will use exactly the same trick in the next section, in order to compute a call option price as a single and numerically tractable Fourier transform. We introduce the damping factor > 0, and dene the dampened cumulative probability as F ( ) = exp( )P [X( ) ] It is possible to derive the characteristic function of this function, say ( as follows ( )=
R

exp(i

)F (

)d =
R

exp(i =

) exp( )P [X( ) exp(i


R

]d )d d

) (

The order of integration can be reversed as follows (details on how exactly this is carried out can be found in the next section where the same approach is implemented in an option pricing framework) (

)=
R

exp((i ) ) ( =
R

)d d ( )d = 1 ( + i ) i

exp (i ) i

We can therefore compute the cumulative probability by un-damping this characteristic function, in eect computing F( ) = exp( )F ( ) = exp( )
R

exp(i

)(

)d

The choice of it important, as it will determine how accurate numerical implementations will be. A small value for will eliminate the singularity theoretically, but it might not reduce its impact around zero suciently for numerical purposes. If is too large, then the characteristic function can be pushed towards zero, which will not allow us to accurately reconstruct its shape and integrate it with precision. Typically, a value of in the region of 1 to 5 gives satisfactory results. In gure 4.1 one can see the impact of the damping parameter on the function to be integrated. As we move from 0 to = 1 the function becomes progressively better behaved, and therefore easier to numerically integrate. But if we keep damping we run the risk of pushing the whole integrand close to zero, as it is illustrated when we set = 10.

. O
It is also possible to recover European option prices from the characteristic function of the log-return. We assume that under the risk neutral measure, the logarithm of the price satises

112(4.2)

F . : Damping the Fourier transform to avoid the singularity at the origin. The integrand for the normal inverse Gaussian distribution with parameters { } = {8% 7 00 3 50 0 25} is presented for dierent values of the damping parameter . The real (imaginary) part is given in blue (green). The dashed thick line gives the integrand for = 0 01 0, which diverges at zero. The solid thick line presents the integrand for = 1, while the solid thin line assumes = 10. Two dierent horizons of one day and one month are presented, to illustrate the change in the tail behavior as the maturity is decreased.
2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5 6 7 8 9 10 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

(a)

= 1/365

(b)

= 30/365

log S(T ) = log S(0) + X(T )

(4.2)

We specify that this relationship holds under the risk neutral measure because it is most likely that the market we are working on is incomplete. If this is the case, then we are not able to specify the no-arbitrage prices of options solely on the information embedded in (4.2) that is specied under P. We need a change of measure technique,2 as well as a number of preferences parameters that will allow us to determine the equivalent measure Q. in order to sidestep these issues we can assume that the process in (4.2) is dened under Q. The only constraints that must be imposed is that the expectation B(T ) ES(T ) = S(0) = E exp X(T ) = 1 B(T )

If we assume that the characteristic function (T ) of log S(T ) is given to us, then the above constraint can be expressed as a constraint on the characteristic function, that is to say 1 (T i) = B(T ) There have been two methods that compute European calls and puts through the characteristic function. Following the seminal work of Bakshi and Madan
2

For example Girsanovs theorem (ksendal, 2003), or the Esscher transform (Gerber and Shiu, 1994), can be used to dene equivalent martingale measures.

113(4.2)

(2000) the call/put option price is expressed in a form that resembles the BlackScholes formula, for example Pcall = S(0)1 K B(T )2 where the quantities 1 and 2 depend on the particular characteristic function. This approach oers the same intuition as the Black-Scholes formula, where 1 is the option delta, and 2 is the risk neutral probability of exercise. Although the above expression is elegant and intuitive, it does not lend itself for numerical implementations. More recently, Carr and Madan (1999) develop the Fourier transform of the (modied) European option price directly, by expressing (with = log K the log-strike) Pcall = exp( ) F1 [(T ; )]( ) ), and is an

where ( ) is a function of the characteristic function ( auxiliary dampening parameter.

-P

The Delta-Probability decomposition of the European price has its roots in the work of Heston (1993) on stochastic volatility, although in Hestons original paper the decomposition is not proved in its generality. Bakshi and Madan (2000) provide a general approach where derivative payos are spanned using trigonometric functions. Here we will provide a heuristic proof for the special case of a European call option (see also Heston and Nandi, 2000, for details) Assuming that the probability density function under the risk neutral measure Q for the time log-price is ( ), we can write the European call option price as the expected value of its payos, as Pcall = B(T ) E max(S(T ) K 0)

= B(T ) = B(T )

log K

(exp( ) K ) (T exp( ) (

)d

log K

)d B(T )K

(T
log K

)d

The second integral is just the probability P[log S(T ) log K ], and since ) is the characteristic function of log S(T ) this will be equal to

2 =
log K

(T

)d =

1 1 + 2

Re
0

exp(i i

)(T

To compute the second integral we use the trick of multiplying and dividing the expression as follows

exp( ) (T
log K

)d =

log K exp( exp(

) (T ) (T

)d )d

exp( ) (T

)d

(4.3)

114(4.2)

S(0) Note that the quantity exp( ) (T )d = B(T ) due to the risk-neutrality restriction (4.1). Also, the fraction in the above expression is by construction between zero and one, therefore it can be interpreted as some probability. In particular, if we dene

(T

)=

exp( ) (T ) exp( ) (T )d
log K

then the fraction in (4.3) can be expressed as form of (T ) is given by (T )=


R

(T

)d . The Fourier trans-

exp(i

) (T

)d =

(T i) (T i)

We can now dene the quantity

1 =
log K

(T

)d =

1 1 + 2

Re
0

exp(i )(T i) d i (T i)

Putting everything together will yield the European call option price, which has the same structure as the Black-Scholes formula, where instead of the cumulative normal values we have 1 and 2 . To summarize Pcall = S(0)1 K B(T )2 where 1 1 exp(i )(T i) 1 = + Re d 2 0 i (T i) 1 1 exp(i )(T ) 2 = + Re d 2 0 i

The Delta-probability decomposition gives an intuitive expression for the value of the European call, but is not very ecient operationally since the integrals required are not dened at the point = 0. More recently, Carr and Madan (1999) developed the characteristic function of a modied price itself. In particular, if we introduce a parameter , we can dene the modied call price as3 Pcall ( ) = exp( )Pcall ( ) which we consider a function of the log-strike price = log K . We assume that the maturity of the option is xed at T . The Fourier transform of the modied call, say ( ) = F Pcall ( ), is given by
3

We need the parameter to modify the original call price, since the original call price is not square integrable.

115(4.2)

F . : Finite dierence approximation schemes. The forward (green), backward (blue) and central (red) dierences approximation schemes, together with the true derivative (dashed).
1

0.5

( +) ( +) or ( +) ( )

-0.5

-1 -1

-0.5

0 k

0.5

(T

) = B(T )
R

exp(i =
R

)Pcall ( )d

exp(i

) exp( )

(exp( ) exp( )) (T

)d

We will change the order of integration, and therefore the integration limits will change from ( ) ( +) ( +) to ( ) ( +) ( ), as shown in gure 4.2. Then (T ) = B(T )
R

exp(i

+ + ) (T B(T )
R

)d d + + ) (T )d d

exp(i

Now since = 0 both inner integrals will vanish at (precisely the reason we introduced this parameter), and the Fourier transform becomes (T ) = B(T )
R

1 1 (T i( + 1))d i + i ++1 B(T ) = (T i( + 1)) (i + )(i + + 1)

This is a closed form expression of the Fourier transform of the modied call price, in terms of the characteristic function of the log-price. Therefore, to retrieve


L . :

116(4.3)

: Characteristic function of the normal distribution.

the original call price we just need to apply the inverse Fourier transform on (T ) Pcall ( ) = F 1 [] ( ) = exp( ) 2 exp(i
R

) (T

)d

Option prices are of course real numbers, and that implies that the Fourier transform (T ) must have odd imaginary and even real parts. Therefore we can simplify the pricing formula to Pcall ( ) = exp( )

Re[exp(i
0

) (T

)]d

(4.4)

The choice of the parameter determines how fast the integrand approaches zero. Admissible values for are the ones for which | (T 0)| < , which in turn implies that |E[S(T )]+1 | < , or equivalently that the ( + 1)-th moment exists and is nite. For more information for the choice of see Carr and Madan (1999) and Lee (2004b).

. A

In the following subsections we will give examples of transform methods that are based on the normal and the normal inverse Gaussian (NIG) distribution (see for example Barndor-Nielsen, 1998, for details).

T
We want to set up a model under the risk neutral measure Q of the form S(T ) = S(0) exp{ T + X(T )} where X(T ) is a random variable with a given characteristic function. If we assume that the interest rate is constant, then under Q E [exp{ T + X(T )}|F (0)] = exp{ T } = 1 log E [exp{X(T )}|F (0)] T

117(4.3) L . : distribution.

: Characteristic function of the normal inverse Gaussian

10

The expectation can be cast in terms of the characteristic function of X(T ), giving the constraint 1 = log (T i) T This constraint will ensure that the under risk-neutrality the asset will grow at the same rate as the risk free asset. The characteristic function for the normal distribution, implemented in listing 4.1, is given by 1 ( ) = exp i 2 2 2 for = 1 2 . The characteristic function of the NIG distribution is given in 2 listing 4.2 ( ) = exp i + = 2 2 2 2 + 2 ( + i )2 2 ( + 1)2 .

In this case the parameter

Say that we are interested in inverting the characteristic function to produce the probability density function or to compute option prices. To do so we need the value, at the point , of an integral of the form

)=
0

exp(i

) ( )d

The integral will be approximated with a quadrature, and here we will use the trapezoidal rule.

118(4.3)

. : Numerical Fourier inversion using quadrature. The integral Re [exp(i )(T )] d is approximated where (T ) is the characteristic function of the normal distribution, with = 8%, = 25% and T = 30/365. The upper integration bound is = 50. Results for = 5% and = 15%, as well as = 10 and = 5 are given.
0
1 0.8 0.6 0.4 0.2 0 -0.2 -0.4 1 0.8 0.6 0.4 0.2 0 -0.2 -0.4

10

20

30

40

50

10

20

30

40

50

(a)
1 0.8 0.6 0.4 0.2 0 -0.2 -0.4

= 5%, = 10
1 0.8 0.6 0.4 0.2 0 -0.2 -0.4

(b)

= 5%, = 5

10

20

30

40

50

10

20

30

40

50

(c)

= 15%, = 10

(d)

= 15%, = 5

In particular, we start with a truncating the interval [0 ), over which the characteristic function is integrated. We select a point that is large enough for the contribution of the integral after this point to be negligible. Then we discretize the interval [0 ] into N 1 subintervals with spacing , that is we produce the points = { = : = 0 N}. For a given maturity T we denote the integrand with ( ) = exp(i ) ( ), and produce the values at the grid points ( ) = ( ). Then, the trapezoidal approximation to the integral is given by

(
0

)d

N =0

( )

1 2

0(

)+

N(

Therefore, in order to carry out the numerical integration, one has to make two ad hoc choices, namely the upper integration bound and the grid spacing . Selecting can be guided by the speed of decay of the characteristic

119(4.3) L

5

. :

: Trapezoidal integration of a characteristic function.

10

15

function. A good choice of on the other hand can be a little bit trickier, since the quantity exp{i } = cos( ) + i sin( ) will be oscillatory with frequency that increases with . Figure 4.3 gives the quadrature approximations for the case of the normal distribution. The characteristic function corresponds to a density with mean = 8% p.a. and volatility = 20% p.a., and the maturity is one month, T = 30/365. We have set the upper quadrature bound to = 50, and use two dierent grid sizes, a coarse one = 10 and a ne one = 5. We investigate the integration for = 5% and for = 15%. In the rst case the integrand is not oscillatory and even the coarse approximation captures the integration fairly accurately. When = 15% the integrand oscillates and a ner grid is required. This example illustrates that one must be careful and cautious when setting up numerical integration procedures that automatically select the values for and . In order to reconstruct the probability density function we need to repeat the procedure outlined above for dierent values of . This is carried out in listing 4.3 for the normal distribution. The results are plotted in gure 4.4 in logarithmic scale for dierent values of and . One can verify that if we are interested in the central part of the distribution, then a coarse grid with is sucient while the results are not particular sensitive to the choice of the upper integration bound. One the other hand, if we want to compute the probability density at the tails, then we need to implement a very ne grid over a large support interval. There is a distinct and very important relationship between the fat tails and the decay of the characteristic function. In particular, the higher the kurtosis of the distribution, the slower the characteristic function decays towards zero as increases. This has some implications on the implementation of the nu-

120(4.3)

F . : Probability density function using Fourier inversion. Logarithmic 1 plots of the integral (T ) = 0 Re [exp(i )(T )] d , where is the normal characteristic function, together with the true normal density. The parameters set is { T } = {8% 25% 30/365}. Results for = 50 and = 100, as well as = 10 and = 1 are given.
1 1

7 -0.5

0.5

7 -0.5

0.5

(a) = 50%, = 10
1 1

(b) = 100%, = 10

7 -0.5

0.5

7 -0.5

0.5

(c) = 50%, = 1

(d) = 100%, = 1

merical integration, as we must be ready to integrate over a very long support. On the other hand, the oscillations introduced by the complex exponential do not depend on the characteristic function itself, therefore we will need a ne grid to accurately compute the density around the tails. This indicates that we will potentially have to carry out numerical quadratures with many hundreds of thousands of grid points. This is illustrated with the NIG distribution. We consider a parameter set { T } = {8% 30/365 7 00 3 50 0 25}. Taking derivatives of the characteristic function gives the volatility, skewness and excess kurtosis, which are 23 4%, 1 21 and 3 96 respectively. Therefore the distribution exhibits negative skewness and excess kurtosis of a magnitude that is observed in asset returns. We want to investigate the stability of the Fourier inversion, and gure 4.5 gives the results for dierent combinations of and . Here we can clearly see the eect of dierent choices of these parameters. As we increase the integration interval the oscillations in the probability function are reduced, but the function

121(4.4)

F . : Probability density function using Fourier inversion. Logarith1 mic plots of the integral (T ) = 0 Re [exp(i )(T )] d , where is the normal inverse Gaussian characteristic function. The parameters set is { T } = {8% 7 00 3 50 0 25 30/365}. Results for = 10 and = 100 (green), = 200 (blue) and = 400 (red). The density for = 1 and = 400 (black) is also given.
2

3 -0.4

-0.3

-0.2

-0.1

0.1

0.2

0.3

0.4

can still be inaccurate around the tails. We need to reduce the grid size to increase the overall accuracy. Observe that the right tail is slightly oscillatory even when { } = {1 400}.

. A

In the previous section we implemented a numerical integration method that approximates an integral of the form

( )=
0

exp(i

) ( )d

This integral can then be used to retrieve the probability density function at the point , or a European call option price with log-strike price . Typically, we want to compute the integral for many dierent values of the parameter , in order to reconstruct the probability density function or the implied volatility smile. Using the approach we outline above, we must perform as many numerical integrations as the the number of abscissas over .

122(4.4)

The Fast Fourier Transform (FFT) is a numerical procedure that simultaneously computes N sums of the form
N

=
=1

exp

2 ( 1)( 1) N

(4.5)

for all = 1 N. The number of operations needed for the FFT is of order o(N log N). For comparison, if we wanted to compute the above sums separately and independently it would take o(N 2 ) operations, meaning that in order to double the number of points the number of operations will increase fourfold. With the FFT the computational burden increases a bit more than two times. This substantial speedup is the reason that has made FFT popular in computational nance, since we typically we need to evaluate thousands of Fourier inversions when calibrating models to observed volatility surfaces. The input of the FFT is a vector CN , and the output is a new vector CN . Each element of will keep the sum for the corresponding value of . Our task is therefore to cast the integral approximation in a form that can be computed using FFT. The rst step is of course to discretize the interval [0 ] using N equidistant points, and say that we set = {( 1) : = 1 N}. Therefore the trapezoidal approximation to the integral is given by 1 exp(i 2
1

) ( 1 ) + exp(i + exp(i

) ( 2 ) + exp(i ) ( and )
N1 )

) ( 3 ) +
N

N1 1 2

1 exp(i 2

) (

N )

Thus, if we set = imation as the sum

1 2

1 1
N =1

= ( ), we can write the approx

exp(i

Since the FFT will also return an (N 1) vector, we should set the procedure to produce values for a set = { 1 +( 1) : = 1 N}. We typically want these values to be symmetric around zero,4 and therefore we can set 1 = N . 2 The approximating sum for these values of will therefore be given by
4

When we invert to construct a probability density we typically interested in the density at log-returns symmetric around the peak which will be close to zero. If we invert for option pricing purposes, we can normalize the current price to one. Then each value of will correspond to a log-strike price, and we typically want to retrieve option prices which are in-, at- and in-the-money. The at-the-money level will be around the current log-price which is of course zero.

123(4.4) =
N =1

exp(i
N =1

exp i( 1)( 1)

exp i( 1) 1
N =1

exp i( 1)( 1)

where = exp i 1 . The sum above will be of the FFT form (4.5) only if = 2 . This set a constraint on the relationship between the N characteristic function input grid size, and the output log-return or log-strike grid size. This completes the integral approximation; it is now straightforward to invert for the probability density function or for call prices.

FFT
To summarize, in order to invert the characteristic function we need to take the following steps (with we denote element-by-element vector multiplication): 1. Input the grid sizes and , as well as the number of integration points N. Make sure that they satisfy = 2 . N 2. Construct the vectors = {( 1) : = 1 N} and = { N + 2 ( 1) : = 1 N}. 3. Compute the vector = exp(i 1 ) (T ). N 4. For the trapezoidal rule set 1 = 21 and N = 2 . 5. Run the FFT on , that is = FFT( ). 1 6. Compute the density function values = Re[ ]. 7. Output the pair ( ): the value is the probability density for the logreturn , for = 1 N.

FFT

The inversion of the characteristic function to compute options is very similar. We just need to also compute the Fourier transform of the modied call before invoking the FFT. The steps are the following (with we denote element-byelement vector multiplication and with element-by-element division): 1. Input the grid sizes and , as well as the number of integration points N. Make sure that they satisfy = 2 . Also input the dampening N parameter for the modied call . 2. Construct the vectors = {( 1) : = 1 N} and = { N + 2 ( 1) : = 1 N}. 3. Construct the Fourier transform of the modied call = exp( T ) T i( + 1) (i + ) (i + + 1)


. :

124(4.5)

: Call pricing using the FFT.

10

15

4. 5. 6. 7. 8.

Compute the vector = exp(i 1 ) . N For the trapezoidal rule set 1 = 21 and N = 2 . Run the FFT on , that is = FFT( ). 1 Compute option values = exp( ) Re[ ]. Output the pair ( ): the value is call option that corresponds to an option with log-strike price , for = 1 N.

The inversion of the Fourier transform for the modied call is implemented in listing 4.4. One needs to specify the corresponding characteristic function, for example  in order to retrieve a set of log-strike prices and the corresponding call prices. The parameters for the characteristic function are passed through the structure , while parameters for the FFT inversion are passed through .

. T

FFT

The restriction = 2 that has to be satised when applying the FFT N hampers the exibility of the method. One naturally wants a ne grid when integrating over the characteristic function, but a small can result in very coarse output grids. For example, a 512 point integration over the interval [0 100] would oer a good approximation to invert for the normal distribution of gure 4.4. This implies = 0 1957, and in order for the FFT to be applied we have 2 to set = N = 0 0627, with 1 = 16 05%. This means that only a very

125(4.5) L

. : : Fractional FFT.

10

small number of the 512 output values are actually within the 30% which we might be interested in. One way that can result in smaller output grids is increasing the FFT size N. We have chosen the upper integration bound in a way that the characteristic function is virtually zero outside the interval. Therefore, when we increase N we just pad with zeros the input vector . For example, if we append the 512 vector with 7680 zeros we will implement a 8192-point FFT, which will return a more acceptable output grid of 0 0039. But of course applying an FFT which is 16 times longer will have a serious impact on the speed of the method. The fractional FFT method, outlined in Chourdakis (2005), addresses this issue. The fractional FFT (FRFT) with parameter will compute the more general expression
N

=
=1

exp 2( 1)( 1)

(4.6)

for all = 1 N. In order to implement a N-point FRFT one needs to invoke three times a standard 2N-point FFT,5 but the freedom of selecting and independently can actually improve the speed for a given degree of accuracy. In particular, the following steps implement an N-point FRFT, coded in listing 4.5 1. Create the (N 1) vectors and 1 = exp i( 1)2 : = 1 2 = exp i(N + 1)
2

: =1

N
1

2. Based on these auxiliary vectors create the two (2N 1) vectors


5

and

For proofs and discussion on the FRFT also see Bailey and Swarztrauber (1991, 1994).


. : : Call pricing using the FRFT.

126(4.6)

10

15

1 0

and

1 1 2

3. Apply the FFTs on these vectors


1

= FFT( 1 ) and

= FFT( 2 )

4. The N-point FRFT will be the rst N elements of the inverse FFT

= 1 IFFT(

2)

We can now easily adapt the recipes of the previous section to accommodate the fractional FFT. We can now choose the two grid sizes freely, and set the fractional parameter = 2 . Thus, we need to change the corresponding steps of the recipes to: Run the fractional FFT on FRFT( 2 ). with fractional parameter
2

, that is

Listing 4.6 implement the fractional FFT based option pricing. Chourdakis (2005) gives details on the accuracy of this method for option pricing based on a number of experiments that compares the fractional to the standard FFT. Figure 4.6 gives an example that is based on the normal distribution. One can observe the exceptional accuracy of both methods: a 8192-point FFT is contrasted to a 512-point FRFT.

127(4.6)

F . : Comparison of the FFT and the fractional FFT, based on the BlackScholes model. The gure shows the errors between option prices computed using the transform methods and their closed form values, for dierent strike prices. The blue line gives the errors of the standard FFT method, while the red line gives the errors of the fractional FFT. All values are 1015 .
1015 2

-1 0.8

0.85

0.9

0.95

1.05

1.1

1.15

1.2

1.25

. A

FFT

There is a number of ways in which adaptive integration can be employed within the fractional FFT framework. In particular, as the fractional FFT allows us to integrate over an arbitrary region, it is natural to consider splitting the support of the characteristic function into subintervals and apply the transform sequentially. Also, we might consider improving the accuracy of each integration segment. In the previous sections we worked with the trapezoidal rule, essentially approximating the function
N 1

exp(i

) ( )d

exp(i
=1

1 1 for = ( ), and = 2 1 1 1 2 . What we have done is approximating the whole integrand as a piecewise linear function. This integrand is the product of two terms: the rst, exp(i ), is a combination of trigonometric functions,

128(4.7)

and will be highly oscillatory, especially for large values of | |; the second, ( ), is also oscillatory but typically very mildly and also independent of . It therefore makes sense to approximate only the second component as a piecewise linear function, and leave the rst part intact. We therefore split the integral into N 1 sub-integrals
N 1

N1

+1

exp(i

) ( )d =
=1

exp(i

) ( )d

We then use the linear approximation within each subinterval ( ) +( )


+1 +1

for

+1

Thus, each subintegral can be computed as


+1

exp(i

) =

+ i
+1

d = exp(i exp(i
+1

+ i exp(i
2

+1

(i )2

) +

exp(i

+1

) exp(i

Luckily, the rst square brackets will cancel out sequentially, as we sum over the sub-integrals, which will give us the result after some straightforward algebra
N 1

exp(i

) ( )d i [ exp(i ) exp(i )] + 1
2 =1 N

exp(i

with = (0 1 1 2 2 3 N2 N1 N1 0). Recall that = +1 . The sum in the above expression can now be computed using the fractional FFT procedure. It might seem initially that the above expression will diverge as 0. This is not the case. In fact, by twice applying lHpitals rule as needed, one can show that the expression converges to the trapezoidal rule we obtained in the previous section. Listing 4.7 shows an implementation of this integration using the fractional FFT. This method is directly implemented in listing 4.8, which shows how an adaptive integration technique can be used to invert the characteristic function and recover the cumulative density.

129(4.7) L

. :

: Integration over an integral using the FRFT.

10

15

. S
To summarize, for large classes of models closed form solutions even for European style options are not available but their characteristic function is available in closed form. For example, models where the logarithmic price is Lvy (Madan et al., 1998; Carr, Geman, Madan, and Yor, 2002), Garch models (Heston and Nandi, 2000), ane models (Heston, 1993; Due et al., 2000; Bates, 2000, 1998), regime switching models (Chourdakis, 2002) or stochastic volatility Lvy models (Carr, Geman, Madan, and Yor, 2003; Carr and Wu, 2004) fall within this category. Fourier transform methods can be applied to recover numerically European call and put prices from the Fourier transform of the modied call. Therefore such models can be rapidly calibrated to a set of observed options contracts, as we will investigate in the next chapter on volatility. The FFT method or its fractional variant are well suited to perform this inversion. Also, one can use these methods to invert the characteristic function itself, thus recovering numerically the probability density function. This can in turn be used to set up numerical procedures for pricing American style or other exotic contracts, for example as in Andricopoulos, Widdicks, Duck, and Newton (2003).

130(4.7)

L . : density function.

: Transform a characteristic function into a cumulative

10

15

20

25

30

35

40

45

5 Maximum likelihood estimation

It is typical in many, if not all, nancial application to face models that depend on one or more parameter values, which have to be somehow determined. For example, if we are making the assumption that the stock price we are investigating follows a homogeneous geometric Brownian motion, then we would be interested in estimating the expected return and the corresponding volatility. Then we could produce forecasts, option prices, condence intervals and risk measures for an investment on this asset. At this point we must remind ourselves that not all of the above operations are carried out under the same measure. This fact will largely determine which data will be appropriate to facilitate a calibration method. Some parameters, such as the drift in the Black-Scholes framework, are not the same under the objective and the pricing measure, while some others, such as the volatility, are. In particular, if our ultimate goal is pricing, we must place ourselves under the pricing measure and use instruments that are also determined under the same measure. In this way the prices that we produce will be consistent with the prices that we use as inputs, and we will not leave any room for arbitrage. The dynamics recovered under this data set will not be the real dynamics of the underlying asset: instead, they will be consistent with the attitude of investors against risk, and thus modied accordingly. In general, drifts will be lower, volatilities will be higher, and jumps will be more frequent and more severe. When pricing assets, investors behave as if this is the, precisely because these are the scenarios that they dislike. On the other hand, if our goal is forecasting or risk management, we are interested in the real asset dynamics. We do not want the parameters to be contaminated by risk aversion, and the appropriate data in this case would be actual asset prices. Based on the real historical movements of assets we will base our forecasts for their future behaviour. Nevertheless, there are situations where we might want (or have to) use both probability measures jointly. As derivative prices are forward looking we might want to augment our information set with their prices, in order to produce more accurate forecasts. From an academic point of view, since the distance between

132(5.1)

F . : Examples of density and likelihood functions. A sample of size N is drawn from the N(1 00 2 00) distribution, presented with the green points on the horizontal axis. Three densities are also presented, together with the corresponding sample values. The blue curve gives the true N(1 00 2 00) which gives a log-likelihood L(N = 10) = 19 37 and L(N = 50) = 101 54; the red curve gives the curve for N(0 00 4 00) which far from the true density and has a low log-likelihood L(N = 10) = 23 60 and L(N = 50) = 121 55; nally the black curve is the one that maximizes the likelihood, N(0 08 1 32) with L = 17 00 for N = 10, and N(0 83 1 82) with L = 100 98 for N = 50. [code: ]
0.3 0.3

0.2

0.2

0.1

0.1

0 -8

-6

-4

-2

0 -8

-6

-4

-2

(a) N = 10

(b) N = 50

the two probability measures depend on the risk premiums, we might want to identify these premiums for dierent risk components. For instance, we might want to quantify the price of volatility risk versus the price of jump risk. Finally, in some situations we do not observe the underlying asset directly. This is the case in xed income markets, where we can attempt to identify the true dynamics using time series of bonds which are evaluated under the pricing measure. In this chapter we will focus on the case where calibration is carried out using a time series of historical values. There is a plethora of methods available, but we will focus on the most popular one, the maximum likelihood estimation (MLE) technique. We will not focus on deriving the properties of MLE, but will rather refer to Davidson and MacKinnon (1985) and Hamilton (1994). These books also give a detailed analysis of variants of MLE, as well as alternative methods of moments. For an introduction to Bayesian techniques, a good starting point is Zellner (1995).

. T
Suppose that we have in our disposal a time series of observations, say = { 1 T }, and a model which we assume has produced these observations. We

133(5.1)

will denote with large X the random variables that are produced by the model,1 and with small the realizations that make up our sample. We will collect all parameters of this model in a (K 1) vector . Our objective is twofold: we want to nd an estimator of which is based on our data set, but we also want to produce some condence intervals on , acknowledging the fact that our data set is nite and thus our produced estimators are not equal to the true parameters of the data generating process. The likelihood function is a measure of t that will allow us to fulll these objectives. To implement the likelihood function we scan through the sample, and pretend that we are standing at each point in time. Suppose that we are currently at the -th observation, with 1 T . Given a value of the parameter set, , we produce the conditional density X +1 ( | ), which we abbreviate with 1 +1| ( ). Notice that we only use the information that was available at time . Essentially we are asking the question: when we were at time in the past, how would our forecast for time + 1 look like? Then we go to the next time period + 1 and see how good our forecasting density was: if we forecasted rather well, then the value of the density +1| ( +1 ) will be high; if our forecasting density was poor, then +1| ( +1 ) would be close to zero. The value +1| ( +1 ) is the likelihood of the point +1 , seen as a function of the parameter vector. To make this more explicit we introduce the notation +1| ( ; ) for this likelihood. In order to construct the likelihood of the sample, 1 we take the product T=1 +1| ( ; ). The maximum likelihood estimator will be the one that maximizes the sample likelihood. Since this maximum will be the same under an increasing transformation, and for some other properties that we will discuss shortly, we typically work with the log-likelihood of the sample
T 1

L( ; ) =
=1

log (

+1 |

To select the maximum log-likelihood we need to set the rst order conditions, namely that L( ; ) = 0 The second order conditions will dictate that for the likelihood to be actually maximized, the K K Hessian matrix H= 2

L( ; ) is negative denite

The maximization of the log-likelihood function can be carried out analytically in some special cases, but we typically employ some algorithm to produce
1

To be more precise, X contains the random variables that are conditional on their history. That is to say, the random variable X is conditional on the realizations of all values that preceded it, namely {X 1 X 2 X1 }.

134(5.2)

numerically. The choice of the appropriate algorithm will depend on the nature of the likelihood function: if it is relatively well behaved, then a standard hill climbing algorithm will be sucient. In more complex cases, where the likelihood exhibits local maxima or is even undened for specic parameter sets, one needs to resort to other techniques such as genetic algorithms or other simulation based methods. Figure 5.1 illustrates the intuition behind the likelihood function. Samples are drawn from the blue distribution (for simplicity we assume that the sample elements are independent and identically distributed) of lengths N = 10 and N = 50. To compute the corresponding likelihood values, one has to compute the density value at the sample points as shown. The red curves give a density that is far away from the true one, and we can see that overall the function values are lower. We numerically maximize the log-likelihood and estimate the density that has produced the data, which is given in black. When the sample is small, the estimated density is not close to the true data generating process, but it will converge as the sample size increases.

. P

ML

Maximum likelihood estimators share some very appealing asymptotic properties. Asymptotic in this context means that these properties hold at the limit, when the sample size approaches innity. Therefore, one would tend to consider them more valid for large data samples. Unfortunately, how large a large sample should be is not set in stone, and depends on data generating process. For that reason it is always to verify the validity of any claims that are based on these properties via a small simulation experiment. Here we will go through some fundamental properties and will see how they can used to make inference on the quality of the estimators. We make the fundamental assumption that our model is correctly specied. This means that there is a data generating process which we guess correctly up to the parameter values. If our model is misspecied all properties go out of the window, even asymptotically. By using so-called bootstrap techniques we can take some steps towards testing our hypotheses and constructing condence intervals, while taking into account possible misspecication. We will denote the true value of the parameter set with . This is the set that has actually generated the series we observe. For us though, this is a random variable as we do not observe it directly. In fact, it is the qualities of this random variable that we intend to quantify.

T
As we said, we produce the maximum likelihood estimator by setting L( ; )/ = 0. This derivative is also called the score function, and the rst order condition corresponds to an important property of the score, namely that its expectation,

135(5.2) at the true parameter set is zero E L(

; X)

=0

Note that the random variable in the above expectation is the data sample X . For IID processes, maximum likelihood estimation can be viewed as setting the empirical expectation of this score to zero. In the same light we dene the (Fisher) information matrix as minus the expectation of the the second derivative of the log-likelihood, evaluated again at the true parameter point I ( ) = E 2 L( ; X )

As before, the Hessian matrix produces an estimate of the information matrix which is based on the sample. The information matrix will be by construction positive denite, and therefore invertible. It turns out that we can also say something on the covariance matrix of the score. In fact, it will be equal to the information matrix L( ; X ) V =E L( ; X )
2

= I ()

What is the correct way to view these expectations and variances? Say that we knew the true parameter set, and we constructed a zillion sample paths based on these parameters, each one of length T . If we compute the score vector based on each one of these samples, we would nd that the average of each element is zero and that the covariance matrix is given by the information matrix. The information matrix plays another important role, as its positive denitiveness is a necessary condition for all other asymptotic properties to carry through.

C
Lets say that we have a method of producing estimators, not necessarily maximum likelihood. If we produce dierent samples with the true parameter set, we will obviously end up with a dierent estimated value each time, and let us denote with ( ) the estimated parameter set that is generated by the sample . Of course we cannot carry this experiment out, since we do not know the true parameter set, but we can pose the question: if we produce a zillion alternative samples, do we expect the average of their estimators to be equal to the true one? An estimation method is called unbiased if this is true, namely that E (X) =

The maximum likelihood estimator is not generally unbiased, and this apparently is not a good thing. But the maximum likelihood estimator is consistent,

136(5.3)

which means that as the sample size increases the bias drops to zero. Furthermore, the variance of the estimators distribution also drops to zero, indicating that the maximum likelihood estimator will converge to the true value as the sample size increases, or more formally that plim (X ) =

as the sample size increases

It also turns out that the distribution of the MLE is Gaussian, with covariance matrix equal to the inverse of the Fisher information matrix evaluated at the true parameter value. We can therefore write (X ) N

I(

Furthermore, the variance I ( )1 of the MLE is equal to the so called CramrRao lower bound, which states that no other unbiased estimator will have smaller variance than the MLE. This also makes the maximum likelihood estimator asymptotically ecient. In practice we do not know the value of I ( ) and use an estimate instead, for example one based on the Hessian of the log-likelihood.

H
For large samples we can utilize the eciency and asymptotic normality of the maximum likelihood estimator, and produce condence intervals that are based on the normal distribution. In particular, based on the sample we can test the hypothesis H0 : = against the alternative that = . Under the null the maximum likelihood estimates will be asymptotically distributed as (X ) N

I(

We are therefore naturally led to the statistic Z( ) = ( ) I(

)1

which is distributed as a standardized normal. We would reject the null hypothesis at, say, the 95% condence level if |Z ( )| > 1 96.

. S
L ARMA
Suppose that the data generating process has autoregressive and moving average terms, and for simplicity assume that both eects are of the rst order. Then, we can write the process as

137(5.3) L . : , and mum likelihood estimation of ARMA models

: Simulation and maxi-

10

15

20

25

30

35

138(5.3)

F . : Bias and asymptotic normality of maximum likelihood estimators. Sample paths of an ARMA(1,1) model = + 1 + + 1 are simulated, and the parameters are subsequently estimated. 1,000 short samples (T = 50) and 1,000 longer samples (T = 500) were generated. The graphs give the distribution of the estimators for the short sample size in green, and for the longer sample size in blue. The true parameter values are in red. The table presents some summary statistics that correspond to the distributions. [code: ]
PSfr 35
10 9 8

30

25

7 6 5

20

15

4 3 2

10

5
1

0 -0.2

-0.15

-0.1

(a) constant ( )

-0.05

0.05

0.1

0.15

0.2

0 -0.2

(b) AR parameter ()

0.2

0.4

0.6

0.8

50 45

6 40 5 35 30 25 3 20 15 10 1 5 0 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.22 0.24 0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4

(c) MA parameter () T True Mean 50 500 Volatility 50 500 Skewness 50 500 Kurtosis 50 500

(d) volatility ( ) 030 027 030 0030 0010 0076 0025 29 30

000 0002 0000 0056 0012 0001 012 81 35

075 064 074 020 0044 18 034 88 31

010 0013 0093 024 0063 057 0067 47 30

139(5.3) = +
1

+ + 1 , N(0 2 )

Since our sample is nite, we have to make an extra assumption on the values just before the starting date, namely 0 and 0 . We will set them equal to their expected values, that is to say 0 = 1 and 0 = 0. The parameter set in this case is = { }. Each observation is Gaussian conditional on its past, and in particular |
1 0

N( +

+ 1 2 )

Therefore we can compute the log-likelihood function of the sample as L( ; ) = T log(2 2 ) 2


T

=1

1 1 )2 2 2

If we have enough time in our hands we could in principle maximize this function analytically, but it is more convenient to carry out the maximization numerically. Listing 5.1 shows how this can be done in a very simple way. In fact, listing 5.1 conducts a small experiment that illustrates the bias and nonnormality of maximum likelihood estimators in small samples. We assume a known parameter set, = {0 0 75 0 10 0 30}, and simulate paths of length T = 50 and T = 500. In total we simulate 1000 paths for each length. We then estimate the parameters using maximum likelihood. Consistency will indicate that although in small samples the estimators might be biased, as the sample grows the mean should converge to the true value, while the estimator variance should decrease. Due to the asymptotic normality the distribution should become gradually closer to the Gaussian one. Figure 5.2 gives the results of this experiment, and we can observe that this is indeed the case. The bias is more pronounced for the autoregressive and the moving average parameters, with their means biased towards zero for the smaller sample. The non-Gaussian nature of the estimators is also apparent, with the kurtosis being consistently high. As the sample size increases, the estimator densities become tighter and markedly more symmetric. The volatility, although slightly biased for the smaller sample, is very accurately estimated, exhibiting very small standard deviation. This is a more general feature: drifts are very sensitive to the actual path, as they largely depend on the rst and the last observation. Volatilities on the other hand take more information over the variability of the path, and their estimators converge a lot faster. Note that the fact that asymptotically the parameters are Gaussian does not imply that they are independent. Indeed, the correlation matrices of the parameter estimates are 1 0048 0000 0038 1 0052 0015 0042 0048 1 071 0081 077 0099 , and 0052 1 0000 071 0015 077 1 0044 1 0030 0038 0081 0044 1 0042 0099 0030 1

140(5.5)

for the sample sizes T = 50 and T = 500 respectively. Observe the high negative correlation between the estimator of the autoregressive and the moving average terms. As these two parameters compete to capture the same features of the data,2 the estimates parameters tend to come in high/low pairs.

L
Lvy models can be easily estimated using the MLE approach, by inverting the characteristic function using the FFT or fractional FFT methods of chapter 4. We can invert the characteristic function directly to produce the probability density, or we can invert for the cumulative density and then use numerical dierentiation. Although the second method appears to be more cumbersome, it is often more stable. This happens in the case of Lvy models because the density typically exhibits a very sharp peak, which the direct transform might fail to capture. Irrespective of the method we choose to construct the density, the maximization of the log-likelihood should be straightforward. As Lvy models are time-homogeneous, the returns are identically distributed. If we denote with ( ; ) the probability density of the Lvy model, then the log-likelihood can be easily computed over a series of returns { 1 T } as
T

L( ; ) =
=1

log ( | )

Generally speaking, as the FFT method will produce a dense grid for the probability density function we only have to call the Fourier inversion once at each likelihood evaluation and interpolate between those points. This renders MLE quite an ecient method for the estimation of Lvy processes.3 In our example we will be using the cumulative density function, recovered with the code of listing 4.8. We use data of the S&P500 index.

. L . T
2

An AR(1) process can be written as an MA() one and vice versa. Therefore a series that is generated by an AR(1) data generating process will produce MA(1) estimators as a rst order approximation, if the estimated model is misspecied. Some popular Lvy models admit closed form expressions for the probability density function. In principle this means that one can avoid the FFT step altogether and use the closed form instead. It turns out that in the majority of cases these densities are expressed in terms of special functions, which can be more expensive to compute than a single FFT!

6 Volatility

In this chapter we will investigate the modeling of volatility, and its implications on derivative pricing. We will start with some stylized facts of the historical and implied volatility, which will benchmark any forecasting or pricing methodology. We will then give an overview of Garch-type volatility lters and discuss how the parameters can be estimated using maximum likelihood. We will see that although Garch lters do a very good job in ltering and forecasting volatility, they fall somewhat short in the derivatives pricing arena. These shortcomings stem from the fact that Garch, by construction, is set up in discrete time, while modern pricing theory is set up under continuous time assumptions. Two families of volatility models will be introduced for pricing and hedging. Stochastic volatility models extend the Black-Scholes methodology by introducing an extra diusion that models volatility. Local volatility models, on the other hand, take a dierent point of view, and make volatility a non-linear function of time and the underlying asset. Of course each approach has some benets but also some limitations, and for that reason we contrast and compare these methods. It is important to note that this chapter deals exclusively with equity volatility, and to some extend exchange rate volatility. These processes are typically represented using some variants of random walk models. Fixed income securities models, and their volatility structures, will be covered in a later chapter.

. S
This rst section will cover some stylized features of volatility. We will differentiate between historical and implied volatilities. Although the qualitative properties of these two are similar, their quantitative aspects might dier substantially, as they are specied under two dierent (but nevertheless equivalent) probability measures.

142(6.1)

F . : Dow Jones industrial average (DJIA) weekly returns and yearly historical volatility. The (annualized) volatility is computed over non-overlapping 52 week periods from the beginning of 1930 to 2005.
20% 15% 10% 5% 0 -5% -10% -15% -20% 30 10% 40% 60%

50%

30%

20%

35

40

45

50

55

60

65

70

75

80

85

90

95

00

05

0 30

35

40

45

50

55

60

65

70

75

80

85

90

95

00

05

(a) weekly DJIA returns

(b) annual volatility

H
Volatility in nancial markets varies over time. This is one of the most documented stylized facts of asset prices. For example, gure 6.1(a) gives a very long series of weekly returns1 on the Dow Jones industrial average index (DJIA, or just the Dow). Subgure 6.1(b) presents the (annualized) standard deviation of consecutive and non-overlapping 52-week intervals, a proxy of the realized DJIA volatility over yearly periods. One can readily observe this time variability of the realized volatility, and in fact we can easily associate it with distinct events, like the Great Depression (early 30s), the Second World War (late 30s/early 40s), the Oil Crisis (mid 70s), and the Russian Crisis (late 90s). If we compute the summary statistics of the DJIA returns, we will nd that the unconditional distribution exhibits fat tails (high kurtosis). In particular, the kurtosis of this sample is = 8 61. The variability of volatility can cause fat tails in the unconditional distribution, even if the conditional returns are normally distributed. To illustrate this point, consider a simple example where the volatility can take only two values, = 1 = 10% or = 2 = 40%, and both means are zero. Say that we denote with N ( ; ) the corresponding normal probability density functions. Also, suppose that 1 = 75% of the time returns are drawn from a normal2 N ( ; 0 1 ), and in the other 2 = 25% of the time they are drawn from a second normal N ( ; 0 2 ). If we consider the unconditional distribution, its probability density function will be a mixture of the two normal distributions, and in fact
1

Here by returns we actually mean log-returns, that is if S is the time-series of DJIA values, = log S 1 log S . Here the notation ( ; ) means that is distributed as a random variable that has a probability density function given by ( ; ).

143(6.1)

F . : This gure illustrates the dierent kurtosis and skewness patterns that can be generated by mixing two normal distributions. In both gures 1 = 10% and 2 = 40%. In subgure (a) the two means are equal 1 = 2 = 0, a setting that can generate fat tails but not skewness. In subgure (b) 1 = 5% and 2 = 15%, generating negative skewness in addition to the fat tails.
4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0 -2.0 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0 -2.0

-1.5

-1.0

-0.5

0.5

1.0

1.5

2.0

-1.5

-1.0

-0.5

0.5

1.0

1.5

2.0

(a) 1 = 2

(b) 1 = 2

1 N(

; 0 1 ) +

2 N(

; 0 2 )

Figure 6.2(a) illustrates exactly this point, and gives the two conditional normals and the unconditional distribution. One can easily compute the statistics for the unconditional returns, and in particular the unconditional volatility = 21 7%, and the kurtosis = 8 7 > 3. Inspecting gure 6.1(b), one can also observe that the historical realized volatility does not swing wildly, but exhibits a cyclical pattern. In particular, it appears that volatility exhibits high autocorrelation, with low (high) volatility periods more likely to be followed by more low (high) volatility periods. In the literature these patterns are often described as volatility clusters. Having said that, the volatility process appears to be stationary, in the sense that it remains between bounds, an intuitive feature.3 We can imagine that there is some long run volatility that serves as an attractor, with the spot volatility hovering around this level.

I
In chapter 2 we gave a quick introduction to the notion of the implied volatility (IV), denoted with . In particular, given an observed European call or put option price Pobs , the IV will equate it to the theoretical Black-Scholes value, solving the equation
3

The intuition stems from the fact that, unlike prices themselves, market volatility can not increase without bounds. Even if we are asked to provide some estimate for the volatility of DJIA in 1,000 years, we would probably come up with a value that reects current volatility bounds. If we are asked to estimate the level of DJIA in 1,000 years time, we would produce a very vary large number.

144(6.1)

F . : The S&P500 index (SPX, in blue) and the implied volatility index (VIX, in green) are given in subgure (a). Subgure (b) presents a scatterplot of the corresponding dierences, illustrating the return/volatility correlation.
10
160 140 120 100 log-SPX level 80 60 40 20 0 -20 90 92 95 97 00 02 05 50 45 40

5 VIX changes

35 30 25 20 VIX level

-5
15 10 5

-10 -8

-6

-4

0 2 -2 log-SPX changes

(a) levels

(b) dierences

R+ : Pobs =

BS (S

K T

We also showed that short at-the-money IV reects expected volatility (under the equivalent martingale measure that prices the corresponding option). As expected, implied volatility shows similar patterns to the historical realized volatility. In particular, time series of IV for a particular contract with xed maturity exhibit an autoregressive structure and clusters. Figure 6.3(a) gives the S&P500 index as well as the implied volatility index VIX, released by the Chicago Board Options Exchange (CBOE). The VIX is computed as a weighted average of option prices that bracket 30 days to maturity, with more weight given to options that are at-the-money.4 These options are written on the S&P500 index, and the data span a period from January 1990 (when the VIX index was rst released) up to April 2007. The VIX has been coined as investors fear gauge, and gure 6.3(a) certainly illustrates that. Just like the realized volatility (discussed in the previous subsection), the VIX increases in periods where signicant events cause the market to go into turmoil. We can clearly see the rst Gulf war (8/90-2/91), the East Asian crisis (5-8/97), the Russian crisis and the collapse of Long-Term Capital Management (5-9/98), the 9/11 attacks (11/01), and the buildup to the invasion of Iraq (3/03). In all these episodes the market level declined. Based on our catalogue of high volatility episodes that we devised above (using the VIX or the historical realized volatility), it is apparent that they were accompanied by periods of low or negative returns. Each one of these clusters is a chapter of market turmoil. This suggests that there might be some negative correlation between the market returns and their contemporaneous volatility. Periods of high volatility are accompanied by low returns, while returns are higher when volatility is low. These two market regimes reect the bad and
4

The step-by-step construction of the VIX index is given in the White Paper CBOE (2003).

145(6.1)

good times in the market.5 It is easy to investigate the validity of this claim, by a simple scatterplot of the realized volatility against market returns, as in gure 6.3(b). The negative relationship is apparent, and is veried by a simple regression that indicates a relationship of the form = 0 03% 0 89 . This indicates that a 1% drop in the market index is accompanied (on average) by a 0 89% rise of the market volatility.6 This negative correlation is often coined the leverage eect, as it can be theoretically explained by the degree of leverage that underlies the capital structure of the rm. In particular, as the rm value is the sum of debt and equity, a shock to the value of the rm will have an impact on the stock price that depends on the leverage. If the rm has been nanced by issuing stock alone, then a 1% increase in the rm value will result in a 1% in the stock price; on the other hand, if the rm is levered, the impact on the stock price can be a lot higher than 1%, depending on the leverage.7 Thus, higher leverage will produce higher stock price volatility. In addition, a negative stock price shock will increase leverage, implying that negative returns imply higher volatility, and hence the negative correlation. Early research (for example Christie, 1982) indicate that there is indeed a relationship between this correlation and the balance sheet, but more recent evidence indicates that this eect cannot really explain the magnitude of the asymmetry that is observed or implied from options markets (Figlewski and Wang, 2000). It appears that this negative relationship might be better attributed to the erratic behavior of market participants during market downturns. Whatever its reasons, this negative relationship between asset returns and their volatilities manifests itself as negative skewness in the unconditional distribution. Coming back to our toy model that we used to investigate kurtosis, assume that now returns can come from two normals with parameters (1 1 ) = (5% 10%) (the good times), or (2 2 ) = (15% 40%) (the bad times). Figure 6.2(b) presents the unconditional distribution for this setting. Straightforward calculations can reveal that the skewness of the unconditional returns is = 1 37, while kurtosis is similar, = 8 28.
5

Also found in the literature as bust and boom, bear and bull, or recession and expansion, depending on the journal or publication one is reading. Of course this is a very crude method. Bouchaud and Potters (2001) give a formal empirical investigation based on a large number of stocks and indices, and nd that this negative correlation is more pronounced for indices, but more persistent for individual stocks. We follow here the standard Modigliani and Miller (1958) capital structure approach, where stocks and bonds represent dierent ways of splitting rm ownership. Say that a company is worth $100 , with $10 in stock and the rest ($90 ) in bonds. If the value of the rm increases by 1% up to $101 , the value of the stock will increase to $11 to reect that increase (since the debt value cannot change). This will imply a 10% rise in the stock price.

146(6.1)

F . : Implied volatilities , plotted against maturity T and either strike prices K (subgure a) or against the corresponding Deltas = N log(F /K )/( T ) + T /2 (subgure b).
0.25 0.20 0.15 0.10 0.05 500 0 1.0 0.8 0.6 400 0.4 0.2 0 350 450 0 1 0.8 0.6 0.4 0.2 0 0 0.5 0.25 0.2 0.15 0.1 0.05 1

(a) against strikes

(b) against delta

The implied volatility surface Given an underlying asset, at any point in time there will exist a number of options, spanning a range of strike prices for dierent times to maturity. Each one of these options can be inverted to deliver an implied volatility (K T ). A three-dimensional scatterplot of these implied volatilities gives the implied volatility surface, which has some distinct and very interesting features. Such a scatter plot is given in gure 6.4(a), for options computed on the S&P500 index. One can readily observe an implied volatility skew for each maturity level. IV is higher for small strikes, which correspond to in-the-money calls or outof-the-money puts, and declines as we move towards higher strikes. Another observation is that this skew is more pronounced for options with shorter maturities, and attens out for long dated options. The monotonic relation between the BS price and volatility indicates that out-of-the-money puts appear to carry a higher premium than the corresponding out-of-the-money calls. More recently, the volatility skew is presented against other forms of moneyness that remove some of the maturity eect. Figure 6.4(b) gives such an example, where the same IV surface is re-parameterized with respect to the Delta of the appropriate call. In that case at-the-money contracts will be mapped to a Delta of = 1/2. If the BS model was the correct one, that is to say if log-returns were normally distributed with constant volatility, IV surfaces would be at. The shape of the IV surface can point towards these deviations from normality, and in fact it can reveal the risk-neutral distribution that is consistent with the observed implied volatilities. In particular, across moneyness the skew pattern we outline above is typical of index options, and to some extend stock options. Currency options can also exhibit a U-shaped pattern of implied volatilities, coined the volatility smile. Such a pattern was also encountered in stock and index options before the 1987 stock market crash (see Rubinstein, 1985, 1994; Jackwerth and Rubinstein, 1996, for details). A volatility smile will be consistent with a

147(6.2)

distribution that exhibits fat tails, since in that case it would be more likely to exercise out-of-the-money puts or calls. To reproduce a volatility skew, one will need a distribution that is not only leptokurtic but also skewed. The volatility surface itself is not stable across time. The dynamics of the surface are investigated in Skiadopoulos, Hodges, and Clelow (2000), and more recently in Cont and da Fonseca (2002). Assumptions on these dynamics are going to aect the Delta hedging schemes that can be employed. Derman (1999) discusses such hedging rules, namely the sticky strike, the sticky Delta or the sticky local volatility strategy. One challenge is to construct a theoretical model that can replicate the shape and the dynamics of the IV surface.

T
To model the time varying nature of the asset return volatility, typically one has to choose between a Garch and a SV approach. Each one has its benets, but also some shortfalls and peculiarities. Generally speaking, the Garch family is more suited for historical estimation and risk management purposes, while the stochastic volatility is better adapted towards derivative pricing and hedging. The following table gives a quick comparison of the two families. In the next sections we will give more details. Garch known computable no extra source set internally discrete discrete time very limited maximum likelihood hard SV unknown unknown extra source set externally continuous extra diusions available hard transforms

current volatility conditional volatility volatility randomness volatility price of risk time frame incompleteness option pricing historical calibration calibration to options

. A
In the previous section we pointed out that a mixture of normal distribution has the potential to produce distributions that exhibit skewness and excess kurtosis. Also, by investigating historical realized returns and implied volatilities, we concluded that market volatility is time varying and cyclical. Autoregressive conditional heteroscedasticity models build exactly on these points. The denitive reference is Hamilton (1994).

148(6.2)

Assume a probability space ( F P), and say that we are interested in modeling a series of returns = () for = 1 T and . The information that is gathered up to period is represented by the ltration F = ( : 0 ). The conditional distribution is normal |F 1
N(

; )

but having a dierent volatility , and possibly a dierent mean . This volatility is updated using a mechanism that ensures that at each period 1 we can ascertain the parameters of next periods returns, and , based on past returns alone. In probability jargon we say that both and are F 1 -adapted.

Engle (1982) set up a process which he coined Arch(1), standing for autoregressive conditional heteroscedasticity of order one. In particular =+ N(0 = +

)
2 1

In this model the conditional return is indeed normally distributed, |F 1 ), and the volatility is F 1 -adapted since it is a function of 1 = N( ; which is known at time 1. Also, if the volatility at time 1 is large, 1 then it will be more likely to draw a large (in absolute terms) . Therefore an Arch(1) will exhibit some autocorrelation in the volatility. In order to ensure that the volatility is positive we need to impose the restrictions 0. We can write volatility forecasts + | = E[ 2+ |F ] = E 2+ by backward substitution as
+ |

=E

2 +

= + E

2 + 1

=+
+1

+ 1|

which yields the forecasts (using also


+ |

+1|

which is known at time )

= (1 + + + =

1 1

)+ +

1 1

+1| +1

+ 1

+1

The above expression also indicates that the constraint 1 is needed to ensure that the volatility process is not explosive. In that case, the long run expectation for the volatility is = 1 . The expected integrated variance, 2 H =E + | will be given by =1 + | |F = =1 H = ( 1) + 1 1

+1

149(6.2)

The Arch(1) model can be easily extended to one of order (an Arch( ) model), by allowing the variance to depend on more lagged values of = + 1
2 1

+ 2

2 2

+ +

For this process to avoid explosive volatility we need the constraint 1. The Arch() is the natural extension where the whole history of error terms aects our volatility forecast. Actually, early research on Arch models indicated that a large number of lags are required to capture the dynamics of asset volatility, pointing towards some Arch() structure. This gave eventually rise to the Garch extension.

The Garch model (generalized Arch) of Bollerslev (1986) extends the Arch family by adding dependence on past variances. For example, the popular Garch(1,1) species =+ N(0 ) = + 1 +
2 1

The additional constraint 0 is sucient to keep the variance positive. This seemingly small addition is equivalent to an Arch() structure, which is clear if we back-substitute the conditional variances which yields for lags = 1 + 1

2 1

2 2

+ +

1 2

If 1, then we can let , giving the Arch() form of the Garch(1,1) model = + 21 + 22 + 2 23 + 1 The impact of lagged errors decays exponentially as we move further back in the past of the series. The Garch(1,1) model has been extremely popular amongst econometricians and practitioners that need to either lter of forecast volatility. The natural generalization Garch( , ) includes lags of the squared error terms and lagged variances. Once again we can derive the volatility forecasts using forward substitution, in particular
+ |

=E

2 +

= + E

+ 1

+ E

2 + 1

= + ( + )

+ 1|

which is the same form we encountered in the Arch case for +. Therefore we can compute forecasts for the variance and the integrated variance if we denote = + (the so called persistence parameter)


+ |

150(6.2)

The long run (or unconditional) variance is now given by = 1 . In order for the variance to remain well dened we need to impose the constraint + 0.

+ 1 +1 1 1 = ( 1) + +1 1 1 1 =

In order to use a Garch model we need to know the parameters of the process, namely { }. We can estimate these parameters based on a time series of historical returns = { 1 T }. If we denote with 1 ( ) = P[ d |F 1 ] the conditional density, then the likelihood of the sample is given by the product L ( ) = T=1 1 ( ). We are usually employ the logarithm of this expression, the log-likelihood
T

log L ( ) =
=1

log

1 (

The fact that conditionally the random variables |F 1 are normally distributed, allows one to compute the likelihood for a given set of parameters = { }. Often we set the long run variance equal to the sample variance 2 , and therefore set = 2 (1 ). This makes sense if our sample is fairly long, and can signicantly help the numerical optimization algorithm. In that case the parameter vector to be estimated is = { }. In order to start the recursive algorithm that computes the Garch variance we also need an initial value for 0 . We can also use 0 = 2 , or we can add 0 to the parameter vector and let it be estimated. In the Garch process we dened above, the parameter is not the expected rate of return. In particular, as the asset price is lognormally distributed, S = 1 S 1 exp( ), the expected return is E 1 S = S +1 exp 2 . Therefore, if we want to denote the constant expected return, then we need to set up the Garch equation as = 1 2 + N(0 )

The next steps, implemented in listing 6.1, show how the likelihood can be computed for a given set of parameters and a sample = { }. The popularity of the Garch model stems from the fact that this likelihood is computed rapidly and can be easily and quickly maximized. The ideas behind maximum likelihood estimation were covered in detail in chapter 5. 1. If they are not part of 2 0 = . , we set the parameters = 2 (1 ) and

151(6.2) L

. :

: Garch likelihood function.

10

15

20

2. Based on the parameters { } and the initial value volatility series, applying the Garch(1,1) recursion =
+1

0,

we lter the

= +

1 2 +

3. Now we have the variance series which allows us to compute the loglikelihood of each observation . Since |F 1 N( ) log L ( | ) = ( )2 1 log 2 2 1 log 2 2

4. Finally adding up will give the log-likelihood of the sample


T

log L ( | ) =
=1

log L ( | )

The maximization of the log-likelihood is typically numerically, using a hill climbing algorithm. Press, Flannery, Teukolsky, and Vetterling (1992) describe

152(6.2)

a number of such algorithms. We will denote with the parameter vector that maximizes the sample log-likelihood. Essentially the rst order conditions set the Jacobian equal to zero log L ( | ) =0

The Hessian matrix of second derivatives can help us produce the asymptotic standard errors 2 log L ( | ) H= = The covariance matrix of is given by the inverse of the Hessian (Hamilton, 1994, gives methods to estimate H). Estimation examples As an example, we will estimate two time series using the Garch(1,1) process for the volatility. We start with the long DJIA index sampled weekly from 1930 to 2004 (plotted in gure 6.1(a)), and then move to the shorter SPX index sampled daily from 1990 to mid-2007 (plotted in gure 6.3). Listing 6.2 shows how the log-likelihood can be optimized. The estimation is done using the Optimization Toolbox in Matlab, although any hill climbing algorithm will do in that simple case. We use constrained optimization to ensure that and are bounded between zero and one. Also we want to ensure that = + < 1. The standard errors are produced using the Hessian matrix that is estimated by the toolbox.8 We also use the restriction on the long run variance, and set the initial variance equal to the sample variance. The maximum likelihood parameters are given below (all in percentage terms), with standard errors in parentheses. =+ DJIA 0.19 (0.02) 91.32 (0.94) 7.66 (0.76) 98.98 SPX 0.05 (0.01) 93.93 (0.53) 5.42 (0.45) 99.45
2

Both times give similar estimated 2values. If we write the error term for N(0 1), then 2 = and since E2 = 1 we can write 1
8

= =

The optimization toolbox actually updates estimates of the Hessian and the output is not always reliable. Some care has to taken here, and the standard errors should be taken with a pinch of salt. Hamilton (1994) gives a number of superior methods such as the score, or outer product method, etc.

153(6.2) L . :

: Estimation of a Garch model.

10

15

20

25

(1+ ) where now E = 0 (but of course is not normal). The the Garch(1,1) variance process can be cast in an autoregressive AR(1) form =+
1

The importance of the coecient becomes now apparent, as it will determine the decay of variance shocks. In both time series 1, which indicates that volatility behaves as a near unit root process.9 In such a process shocks to the volatility are near permanent, and the process is reverting very slowly towards the long run variance.10
9 10

In fact, if we trust the standard errors we are not able to reject the hypothesis = 1. A Garch process with + = 1 is called integrated Garch (Igarch), and is equivalent to the exponentially weighted moving average (EWMA) specication, where the variance

154(6.2)

F . : Filtered volatility for the DJIA and the SPX index. In subgure (a) the Garch variance (blue) of weekly DJIA returns is plotted with the historical realized volatility (red). In (b) the Garch variance (blue) of daily SPX returns is plotted with the implied volatility VIX index (red)
60 50 45 50 40 40 35 30 30 25 20 20 15 10 10 0 30 35 40 45 50 55 60 65 70 75 80 85 90 95 00 05 5 90 92 95 97 00 02 05 07

(a) DJIA volatility

(b) SPX volatility

These parameter estimates are typical of Garch estimations, and the near integrated behavior has been the topic of substantial research through the 80s and the 90s. A number of researchers introduced Garch variants that exhibit long memory, such as the fractionally integrated Garch (Figarch) of Baillie, Bollerslev, and Mikkelsen (1993). Others acknowledge that models with structural breaks in the variance process can exhibit spuriously high persistence (Lamourex and Lastrapes, 1990), and produce models that exhibit large swings in the long run variance attractor (Hamilton and Susmel, 1994; Dueker, 1997). Figure 6.5 gives the ltered volatility for both cases. This is a by-product of the likelihood evaluation. For comparison, the historical volatility (of gure 6.1) and the implied volatility VIX index (of gure 6.3) are also presented. The ltered volatilities are computed using the maximum likelihood parameter estimates. On point worth making is that the implied volatility overestimates the true volatility, illustrated in subgure (b), where the VIX index is above the ltered volatility for most of time. This due to the fact that implied volatility can be thought as a volatility forecast under an equivalent martingale measure, rather than a true forecast. There will be dierent risk premiums embedded in the implied volatility, rendering it a biased estimator or forecast of the true volatility.

O
Apart from the simple Garch(1,1) model that we already presented, there have been scores of modications and extensions, tailor made to t the stylized facts of asset prices. We will give here a few useful alternatives.
is updated as 2 = 21 + (1 ) walk.
2 1 .

In this case the volatility behaves as a random

155(6.2)

In the standard Garch model we assumed that conditional returns are nor mally distributed, and write = , with N(0 1). The likelihood function was based on this assumption. It is straightforward to use another distribution for ; if it has a density function that is known in closed form, then it is straightforward to modify the likelihood function appropriately. Of course it might be necessary to normalize the distribution to ensure that E = 0 and E2 = 1. A popular choice is the Student-t distribution which can accommodate conditional fat tails. The density function of the Student-t distribution with degrees of freedom is
t(

( + 1)/2 ; ) = (/2)
2 ,

(+1)/2

1+

As the t distribution has variance

we can set the density of equal to 2 ;

We can augment the parameter vector lihood evaluation will now become

with , and the third step of the like-

3 Now we have the variance series which allows us to compute the loglikelihood of each observation log L ( | ) = log 2 ( + 1)/2 (/2) 1 log 2

+1 ( )2 2 log 1 + 2 2

Garch models based on normal or t distributed errors do not exhibit skewness. Nelson (1991) considers the generalized error distribution (GED) which can potentially capture skewed errors. Having said that, this approach does not model the leverage eect directly. The GJR-Garch model, introduced in Glosten, Jagannathan, and Runkle (1993), uses a dummy variable to assume dierent impact of positive and negative news on the variance process. In particular =+
1

2 1

+ I(

0)

2 1

The function I( ) is the indicator function. Therefore, if > 0 a negative return will increase the conditional variance more than a positive one ( + instead of ).11 But even with the GJR approach we will not have the situation illustrated in gure 6.3(b), where positive returns will actually have a negative impact on the volatility.
11

Other asymmetric extensions include the threshold model of Zakoian (1994) and the quadratic Garch of Sentana (1995).


. :

156(6.2)

: Egarch likelihood function.

10

15

20

25

30

The Egarch model of Nelson (1991) takes a more direct approach, as it uses raw rather than squared returns. This implies that the sign is not lost and will have an impact. In order to get around the non-negativity issue he models the logarithm of the variance log = + log
1

+ (| | + )

In the Egarch approach < 0 will be consistent with gure 6.3(b), as higher returns will lower volatility. Listing 6.3 shows an implementation of the Egarch likelihood function. As there are no constraints in the Egarch maximization, the

157(6.2)

hill climbing algorithm might attempt to compute the likelihood for absurd parameter values as it tries to nd the optimum. There are a couple of tricks in the code that ensure that a likelihood value will be returned. The implementation for the optimization resembles listing 6.2, but we shall use unconstrained optimization. The maximum likelihood parameters are given below for the two time series DJIA SPX 0.14% 0.04% (0.02%) (0.01%) -0.2968 -0.2612 (0.0204) (0.0128) 0.9785 0.9826 (0.0021) (0.0013) 0.1678 0.1249 (0.0109) (0.0075) -0.4093 -0.6145 (0.0402) (0.0587) As expected, the product < 0, supporting the negative returns/volatility relationship. The ltered variances are similar to the ones in gure 6.5. Asset pricing models typically assert that market volatility is a measure of systematic risk, and that the expected return should be adjusted accordingly. If is the risk free rate of return, then popular modications to the Garch equation are the so called Garch-in-mean models = + = + 1 2 1 2 + + N(0 N(0 ) )

The parameter in the above expressions denotes the price of risk. Note that in the rst alternative the asset exhibits constant Sharpe ratio. Garch models can also be extended to more dimensions. In that case the covariance matrix is updated at each time step. In the univariate case we needed to take some care to ensure that the variance remained positive; now, in an analogous fashion, we must make sure that the covariance matrix is positive denite. This is not a trivial task. Also, in the general case a large number of parameters have to be estimated, and we usually estimate restricted versions in order to reduce the dimensionality.12 In general, a multivariate Garch(1,1) will be of the form
12

The most widely used forms are the VEC specication of Bollerslev, Engle, and Wooldridge (1988), and the BEKK specication of Engle and Kroner (1995). A recent survey of dierent approaches and methods is Bauwens, Laurent, and Rombouts (2006).


= + = H1/2 N(0 I)

158(6.2)

The matrix H1/2 can be thought of as the one obtained from the Cholesky factorization of the covariance matrix H . The covariance matrix can be updated in a form that is analogous to the univariate Garch(1,1) H = + B H 1 + (

In this case the ( )-th element of the covariance matrix will depend on its () () lagged value and on the product 1 1 . Of course more general forms are possible, with covariances that depend on dierent lagged covariances or error products. To illustrate the multivariate Garch, we will use an example that is based on the Capital Asset Pricing Model (CAPM). In particular, asset returns will depend on the covariance with the market and the market premium, which in turn will depend on the market variance. If we denote with A , M and F the asset, market and risk free rates of return, then we can write the CAPM relationships as
A M

= =

F F

E 1 ( A M ) E 1 E 1 ( M )2
M 2

+ E 1 (
M 2

) +

Since E 1

= E 1 (
A M

) , the above system simplies to


F F

= E 1 ( = E 1 (

A M M 2

)+

A M

) +

We can estimate the above specication using a multivariate Garch approach, taking into account that the covariance and the variances can be time varying. If we dene
A

F F

H = E 1 (

then we can estimate the process (with 17 parameters)

1 2

1 1 1 2 1 3 2 1 2 2 2 3 +
(1 1)

(H ) + H 1 +

N(0 H ) 1 1 1 2 1 2 2 2 (

H =

1 1 1 2 1 2 2 2

1 1 1 2 1 2 2 2
(2 2) (1 2)

The function (H ) = (H H H ) takes the unique elements of the covariance matrix and puts them in a vector form.

159(6.2)

If the conditional CAPM with time varying risk premiums is sucient to explain the asset and market returns, then the following restrictions should be satised 1 0 1 1 1 2 1 3 001 = = 2 0 2 1 2 2 2 3 010 The restrictions can be tested with a likelihood ratio test.

G
The Garch family of models has been the workhorse of volatility modeling and has had many applications in testing, forecasting, and risk management. Applications within a pricing framework one the other hand have been very limited. The reason is that Garch models are set-up in discrete time, and for that reason the underlying market is incomplete. This means that replicating portfolios do not exist for derivative assets. Intuitively, this is due to the fact that the state-space is too dense compared to the time-space (where rebalancing takes place). Over a time step the asset price can jump to an innite number of values, and it is impossible to construct a position that will hedge against all possibilities. In contrast, when trading takes place continuously the asset price diuses from one level to the next, giving us the opportunity to create a dynamic Delta hedging strategy. This is not a feature of Garch models alone; all models that are set up in discrete time and have continuous support will share the same drawback. Even in the simple model where the asset log-price follows a random walk model in discrete time the market is incomplete. This implies that there is not a unique way to identify the risk adjusted probability measure in discrete time models. For example, there is nothing to stop us from specifying S +1 = S exp + S +1 = S exp +
Q +1 Q +1 +1

N(0 2 ) t()

under P under Q

+1

for Q chosen in a way that makes the discounted price a martingale under Q, S = EQ [exp( )S +1 ]. But not all is lost: we just need to impose some more structure that will eventually constrain our choices for Q. Here we will outline two methods to achieve that, but since derivative pricing typically takes place in a continuous time setting, we will not dwell into details. 1. We might impose assumptions on the utility structure. Assuming a certain utility form will set the family of equivalent measures. In particular, the parameters of the utility function may be recovered from the true stock and risk free expected returns. 2. We can assume that the density structure has to be maintained, that is to say if errors are normally distributed under P, then they must be normally distributed under Q.

160(6.2)

Utility based option pricing In our rst approach will will assume a utility function U( W ), which measures utility of wealth W realized at time . We will also need the relationship between wealth W and the underlying asset price S ((an early source for this approach is Brennan, 1979)). For example, if the underlying asset is a wide index, then one might assume that investors wealth is very correlated with this index. If the underlying asset is a small stock, then the correlation will be smaller. This resembles the impact of the idiosyncratic versus the systematic risk in asset pricing models. Option prices can be computed from the Euler equations, which state that the price at time of a random claim that is realized at time T > , say XT , is given by (see for example Barone-Adesi, Engle, and Mancini, 2004) X =E UW (T WT ) XT UW ( W )

Essentially, the Euler equation weights each outcome with its impact on the marginal rate of substitution, before taking expectations. The price of a European call option would be then equal to P =E UW (T WT ) (ST K )+ UW ( W )

Note that in the above expression there is no talk of equivalent measures. All expectation are taken directly under P. Nevertheless, if we think of the marginal rate of substitution as a Radon-Nikodym derivative, then we can dene the equivalent probability measure. Of course, in general it is not straightforward neither to specify the appropriate utility nor to compute the expectation in closed form, but things are substantially simplied if we consider power utility functions. In fact , we will arrive to the Esscher transform, which has been very successful in actuarial sciences. This is described in detail in Gerber and Shiu (1994). Distribution based The second method takes a more direct approach. Suppose that the log price follows the standard Garch(1,1) model log S = 1 2 = + +
1

+
2 1 1

Rather than trying to derive, we dene the risk neutral probability measure as the one under which the random variable Q =

161(6.3)

is a martingale. Then under risk neutrality the asset log price follows log S = 1 2 +
1

Q +
1

= +

Q + 1

This approach is pretty much described in Duan (1995). Derivative are computed, in the usual way, as the expectation under risk neutrality. The benet of this approach is that standardized errors remain normally distributed even after the probability measure change. Equivalently, we can say that the Black-Scholes formula holds for options with one period to maturity. One major drawback of the standard Garch model is that the expectation that prices derivatives is not generally computable in closed form. Of course simulation based techniques can be employed, but they will be time consuming. An alternative, presented in Duan, Gauthier, and Simonato (1999) can be used. In this approach the state space is discretized, and a Markov chain is used to approximate the Garch dynamics. The Garch variant introduced in Heston and Nandi (2000) circumvents the computability issue, and we present their approach in the following subsection. The Heston and Nandi model Heston and Nandi (2000) propose a similar class of Garch-type processes log S = + 1 + 2 = + 1 + 1

Here the bilinearity in the variance process is broken. That is to say, the prod uct 1 1 ) is not present and 1 , which is a standardized normal series, appears in the variance update alone. We set Q = + , and dene the probability measure Q as one that is equivalent to P, and also Q N(0 1) under this measure. Then, the asset price process under Q will satisfy log S = 1 2 = + +
1

Q + Q Q 1
2

for Q = + . Unlike the standard Garch model, the Heston and Nandi (2000) modication allows one to compute the characteristic function as a closed form recursion. Then, option prices or risk neutral can be easily computed using the methods described in chapter 4.

162(6.3)

. T
As we pointed out in the previous section, for option pricing purposes continuous time stochastic volatility models are immensely more popular than model set-up in discrete time.13 The generic stochastic volatility process is described by a system of two SDEs dS = S d + S dB d = ( )d + ( )dB The leverage eect is accommodated by allowing the asset return and the volatility innovations to be correlated E dB dB = d Derivative prices will have a pricing function that will depend on the volatility, on top of time and the underlying asset price P = ( S )

When we introduce stochastic volatilities we move away from the BlackScholes paradigm, where there was a single Brownian motion that generates uncertainty and markets are complete. Stochastic volatility models are driven by two Brownian motions, and for that reason the market that is constructed by the risk free and the underlying asset is not complete. This means that there is an innite number of derivative prices that rule out arbitrage. Having said that, if we augment the hedging instruments with one derivative contract the market becomes complete. Therefore derivatives will now have to be priced in relation to each other, as well as the underlying asset. This is one of the reasons that models with stochastic volatilities are calibrated to observed derivative prices. Another point of viewing this issue is through the notion of volatility risk, introduced by the second BM B (and in particular the part B of B that is 2 B ). The underlying orthogonal to B , since we can write B = B + 1 asset does not depend on this BM, and therefore the risk generated by this BM is not actually priced within the asset price. On the other hand, of course, the risk of B is embedded in the risk premium . Investors might be risk averse towards this risk, and although this risk aversion is not manifested in the market for the underlying asset, it will be manifested in the options market as these contracts depend on directly. Using one derivative we can identify the risk premium, and then we can price all other derivatives accordingly. We will describe two approaches that reach derivative prices, one that implements Girsanovs theorem and one that constructs a hedging portfolio in the
13

Stochastic volatility models can also be set in discrete time, like the specication described in Harvey, Ruiz, and Shephard (1994). They are used for historical estimation but are not popular for derivative pricing, just like their Garch-type counterparts.

163(6.3)

spirit of BS. Before we do that, we will go through some standard stochastic volatility models that have been proposed, discussing some of their properties and features. We will just present a selected few here, to illustrate the motivation as they try to capture they stylized features of volatility processes.

The rst stochastic volatility model was introduced in Hull and White (1987, HW). HW recognized that if investors are indierent towards volatility risk (that is is not correlated with the consumption that enters their utility function), and volatility is independent of the underlying asset price process, then one can integrate out the volatility path, and write the price of an option as a weighted average. In particular, if we are pricing a European call option, then it is sucient to condition on the average variance over the life of the option

PHW =
0

BS (

S ; ) ()d

and () is the probability density of the average variance process. For example, in the original Hull and White (1987) article the variance is assumed to follow a geometric Brownian motion, which is uncorrelated with the asset price process. dS = S d + S dB d = d + dB

where the average variance over the life of the derivative in question is dened as T 1 = d T

In this case, HW give a series approximation for the option price, which is based on the moments of the average variance. The HW model was the rst approach (together with Wiggins, 1987) towards a pricing formula for SV models, but the model they propose does not capture the desired features of realized volatilities. In particular, under the geometric Brownian motion dynamics, variance will be lognormally distributed. In the long run, the volatility paths will either explode towards innity, or they will fall to zero, depending on the parameter values. Volatility in the HW model does not exhibit mean reversion and is not stationary. As maturities increase, the variance of out volatility forecasts increases without bound.

The Stein and Stein (1991, SS) model remedies the long run behavior of the HW specication. In particular, rather than a geometric Brownian motion, SS use an Ohrnstein-Uhlenbeck (OU) process. The OU process exhibits mean reversion,

164(6.3)

and for that reason has a long run stationary distribution, which is actually normal. SS model the volatility rather than the variance dS = S d + S dB d = ( )d + dB This process was later extended in Schbel and Zhu (1999) by allowing the two BM processes to be correlated. The volatility process follows a normal distribution for each maturity, and therefore can cross zero. This implied that the true correlation (that is E dS d ) changes sign when this happens. This can be an undesirable property of the model. Schbel and Zhu (1999) compute the characteristic function of the log-price (T 1 2 + 2 ) = exp i log(S0 ) + i T i 2 0 1 2 D(T ; 1 3 )0 + B(T ; 1 2 3 )0 + C(T ; exp 2

3)

The functions D, B and C are solutions of a system of ODEs, and are given in a closed (but complicated) form in the appendix of Schbel and Zhu (1999).

By far the most popular model with stochastic volatility is Heston (1993). The variance follows the square root process of (also called a Feller process, developed in Feller, 1951), also used as the building block for the Cox, Ingersoll, and Ross (1985) model for the term structure of interest rates. The dynamics are given by dS = S d + S dB d = ( )d + dB E dB dB = d The Heston model has a number of attractive features and a convenient parameterization. In particular, the variance process is always non-negative, and is actually strictly positive if 2 > 2 . The volatility-of-volatility parameter controls the kurtosis, while the correlation parameter can be used to set the skewness of the density of asset returns. The variance process exhibits mean reversion, having as an attractor the long run variance parameter . The parameter denes the strength of mean reversion, and dictates how quickly the volatility skew attens out. This model belongs to the more general class of ane models of Due et al. (2000), and the characteristic function of the log-price is given is closed form. In particular it has an exponential-ane form14
14

We use the negative square root in , found in Gatheral (2006), unlike the original formulation in Heston (1993). Albrecher, Mayer, Schoutens, and Tistaert (2007) discuss

165(6.3) L . :

: Characteristic function of the Heston model.

10

15

( with C( D(

T ) = exp {C(

T ) + D(

T)

+ i log S0 }

T ) = i T +

1 exp( T ) ( i + )T 2 log 2 1 i + 1 exp( T ) T) = 2 1 exp( T ) i + = i = (i )2 + 2 (i + )

The characteristic function of the Heston model is given in 6.4. This can be used to compute European style vanilla calls and puts using the transform methods outlined in chapter 4. We will be using this approach later in this chapter to calibrate the Heston model to a set of observed option prices.

We will now turn to the problem of option pricing, and discuss the two main methods. We start with an implementation of Girsanovs theorem, and then we
this choice and show that the two are equivalent, but using the negative root oers higher stability for long maturities. The problem arises due to the branch cuts of the complex logarithm in C ( T ). A description of the problem and a dierent approach can be found in Kahl and Jckel (2005).

166(6.3)

will investigate the hedging structure that will give us the corresponding PDE. We set up a ltered space { F F P} and two correlated Brownian motions with respect to P, B and B . Based on these BMs we now consider a general stochastic volatility specication dS = S d + S dB d = ( )d + ( )dB EP dB dB = d In order to apply Girsanovs theorem we dene the process M via the stochastic dierential equation dM = M dB + M dB with initial value M0 = 1. The solution of this SDE is the exponential martingale (with respect to P), which has the form
T

M = exp

dB

1 2

2d +

dB

1 2

2d

The processes and are F -adapted, and therefore they can be functions of ( S ). Based on this exponential martingale we can dene a probability measure Q, which is equivalent to P. In fact, every choice of processes and will produce a dierent equivalent measure. The only constraint we need to impose on these processes is that the discounted underlying asset price must form a martingale under Q, which then becomes an equivalent martingale measure (EMM). The fundamental theorem of asset pricing postulates that if this the case, then there will be no arbitrage opportunities in the market. It turns out that this constraint is not sucient to identify both processes, something that we should anticipate since the market is incomplete and there will not be a unique EMM. The EMM will be dened via its Radon-Nikodym derivative with respect to the true measure, dQ =M dP If T is a FT -measurable random variable, then expectations under the equivalent measure will be given as EQ T = EP MT T M

It is useful to compute the expectations over an innitesimal interval d , as this will help us compute the drifts and volatilities under Q. In particular we will have

167(6.3) EQ d = EP M + dM dT M = EP 1 + dM M

d = EP d + EP ( dB + dB ) d

We can employ the above relationship to compute the drifts and the volatilities of the asset returns under Q EQ dS S = + + d , and EQ dS S
2

The drift and volatility of the variance process are EQ d = ( ) + ( ) + ( ) d , and EQ (d )2 = 2 ( )d

This veries that under equivalent probability measures the drifts are adjusted but volatilities are not. Now an EMM will be one that satises EQ (dS /S ) = d This constraint yields a relationship between and + = = S ( S )

The function S ( S ) is the market price of risk, the Sharpe ratio of the underlying asset. In order to construct a system we need a second equation, and essentially we have the freedom to choose the market price of volatility risk. Thus if we select a function EQ d = Q ( S ), which will be the variance drift under risk neutrality, we can set up a second equation + = ( ) Q ( S ( ) ) = ( S )

where ( S ) will be the price of volatility risk. The market risk premium S will be typically positive, as the underlying asset will oer expected returns that are higher than the risk free rate. This reects the fact that investors prefer higher returns, but are risk averse against declining prices. When it comes to volatility, we would expect investors to prefer lower volatility, and be risk averse against volatility increases. This indicates that it would make sense to select Q in a way that implies a negative risk premium , and one that does not increase with volatility. Essentially this means that Q . In practice we will have to nd a convenient parameterization for Q or that leads to expressions that admit solutions, and at the same time restrict the family of admissible EMMs. The parameter values cannot be determined from the dynamics of the underlying asset, but they can be recovered from observed derivative prices.

168(6.3)

If we solve the above system we can nd the processes and , and through them the appropriate EMM, as follows 1 S + 1 2 1 + S = 1 2 =

Finally, derivative prices can be written as expectations under Q, where the asset dynamics are dS = S d + S dBQ d where Q ( S = Q( S )d + ( )dB
Q

EQ dBQ dBQ = d ) = ( ) ( S )( ).

Example: The Heston model In Hestons model the variance drift and volatility are given by ( ) = ( ) ( ) = The price of risk is determined by the risk free rate and the asset price dynamics S( S ) = We are free to select the price of volatility risk. Say we set it equal to ( S )= for a parameter 0 (to conform with agents that are averse towards higher volatility). Then, the risk premium will be positive and increasing with volatility. In addition, such a risk premium will lead to risk neutral dynamics that have the same form as the dynamics under P. Girsanovs theorem will give the process under Q dS = S d + S dBQ d = Q ( )d + dB Q . Then we The risk neutral variance drift Q ( ) = ( ) ( S ) can rewrite the dynamics dS = S d + S dBQ d = Q (Q )d + dBQ

for the parameters Q = + and Q = + . Due to their risk aversion, manifested through the parameter 0, investors behave as if the long run volatility is higher than it really is, and as if volatility exhibits higher persistence.

169(6.3)

PDE

Alternatively, we can take a route that follows the BS methodology, where a derivative is shorted and subsequently hedged. This will give rise to the PDE representation of the price. In the BS world with constant volatility, it was sucient to use the underlying asset and the money market account to achieve the hedge. Here, as we have one more source of risk, these two instruments will not be sucient to eliminate volatility risk. To hedge our short derivative we will use the money market account, the underlying asset and one extra derivative. Consider a derivative X, and denote its pricing function with ( S ). The functional form of will depend on the particulars of the contract, such as maturity, payo structure, optionality, etc. Therefore, the process for this derivative will be given by X = ( S ). Following the BS argument, if we knew the functional form of , we could compute the dynamics of the derivative price using Its formula (for two dimensions) o dX = where X = + ( ) 1 + 2( 2 = S S = ( ) S 1 2 S2 2 2 S 2 2 S ( ) ) 2 + S + X + S S d + X S dB + X dB

X S X

From this expression it is apparent that if we construct a portfolio using only the underlying asset and the bank account, we will not be able to replicate the price process X , since the risk source B cannot be reproduced. The market that is based only one these instruments is incomplete, since the derivative cannot be replicated. But we can dynamically complete the market using another derivative X , with pricing function ( S ). This will work of course if X actually depends on the BM B , which is typically the case.15 In practice we would perhaps replicate X (say a barrier option), using the risk free asset, the underlying asset and a liquid derivative X (say a vanilla at the money option). Following BS, we short X and construct a portfolio of the underlying stock and the other derivative. We want to select the weights of this portfolio in a way that makes it risk free. Then it should grow at the risk free rate.
15 It is sucient that the pricing function depends on , in order for the derivative = 0, as we will see below. For example a forward contract is a derivative but it would not depend on B , since its pricing function ( S ) = S exp( (T )) does not depend on .

170(6.3)

Say that at each point in time we hold units of the underlying asset and units of the derivative. Then the change in our portfolio value will be d = dX dS dX Substituting for the dynamics of dX , dS and dX will give the portfolio dynamics d = ( )d + S S S dW S S + ( ) ( )

dZ

If we select the portfolio weights that make the parentheses equal to zero, then we have constructed a risk free portfolio. The solution is obviously16 = = S S
1

And since the portfolio will be risk now free, it will also have to grow at the risk free rate of return d = = (X S X ) We should expect that the drifts will give the PDE that we are looking for, but at the moment we have a medley of partial derivatives of both pricing functions and . Nevertheless, we can carry on setting the portfolio drifts equal, which yields X + S S X S = ( S ) S S

Since + = S the drift of the underlying asset will cancel out, S resembling the BS scenario. Furthermore, if we substitute the hedging weights and and rearrange to separate the starred from the non-starred elements X + S S

X + S S

The following line of argument is the most important part of the derivation, and the most tricky to understand at rst reading: In the above expression the RHS ratio (which depends only on ) is equal to the LHS ratio (which
16

Apparently, for the solution to exist we need = 0. This corresponds to our previous remark that a forward contract cannot serve as the hedging instrument.

171(6.3)

depends only on ). Recall that and are the pricing functions of two arbitrary derivatives, which means that the above ratio will be the same for all derivative contracts. If we selected another derivative contract X , then for its pricing function = , which implies = = , etc. This means that although can depend on ( S ), it cannot depend on the particular features of each derivative contract (since if it did, it wouldnt be equal for all of them). We therefore conclude that = = = = ( S )

That is very important, because it means that all derivatives will satisfy the same ratio, which can be rewritten as a single PDE17
X + S S

= ( S

) S ) 2 2 )} 1 2 + S2 2 2 S 2 + S ( ) + S = S S

+ {( ) ( 1 + 2 ( 2

As always, the boundary conditions of this PDE will dene which contract is actually priced. In particular, the terminal condition (T S ) = (S), with (S) the payo of the derivative. The Feynman-Kac link It is very instructive to pause at this point and verify the links that connect the two approaches. Using Girsanovs theorem we built the EMM and we concluded that a derivative, say with payo XT = (T ST T ) = (ST ), will be priced as the expectation under the EMM X0 = exp( T )EQ XT = exp( T )EQ (ST ) 0 0 where the dynamics of the underlying asset and its volatility are given by the SDEs dS = S d + S dBQ d = Q( S )d + ( )dBQ EQ dBQ dBQ = d with the drift of the variance process given by Q ( S ) = ( ) ( S )( ). The price of volatility risk is . Using the PDE approach we concluded that the pricing function ( S ) will solve the PDE
17

An identical line of argument is used in xed income securities, which we will follow in chapter XX.


+ ( S

172(6.3)

1 2 + S2 2 2 S 2 2 1 + S = + 2 ( ) 2 + S ( ) 2 S S

with ( S ) = ( ) ( S ) boundary condition (T S ) = (S). The Feynman-Kac formula links the two approaches, as it casts the solution of the PDE as an expectation under the dynamics dictated by Girsanovs theorem. In fact, it follows that Q ( S ) = ( S ), which implies that ( S )= ( S )( ) = ( ) Q ( S )

The free functional that we introduced in the PDE approach can be interpreted as the total volatility risk premium. For investors that are averse towards high volatility 0. Example: The Heston model If we implement the PDE using the Heston (1993) dynamics, the derivative pricing function will satisfy + {( ) ( S )} 1 2 + S2 2 2 S 2 1 2 2 + + S + S = 2 2 S S ) to be proportional to the

In his original paper, Heston assumes ( S variance ( S ) =

Essentially, following our previous discussion, this indicates that the equivalent function in the EMM approach will be ( S )= ( S ) = ( )

This means that the parameter of the PDE approach has exactly the same interpretation as . This choice for sets the PDE 1 2 + Q (Q ) + S 2 2 2 S 1 2 2 + 2 + S + S = 2 2 S S The boundary conditions are also need to specied. Following Heston (1993), for a European call option

173(6.3) (S ( S (0

T ) = (S K )+ )=1 )=0

(S ) = S (S 0 ) + Q Q (S 0 ) + S (S 0 ) = (S 0 ) S

E
Since in stochastic volatility models the volatility is unobserved, it is generally very hard to estimate the parameters based on historical asset returns, and lter the unobserved volatility process. People have used a number of approaches, for some of which we give references below. For more details see the surveys of Ghysels et al. (1996) and Javaheri (2005). 1. Indirect inference: Estimating a deterministic model, for example via Arch or Egarch, and then studying the dynamics of the ltered volatility (Nelson, 1990, 1991). 2. Simulation based methods: Although the conditional moments or the likelihood are not available in closed form, they can be simulated. Of course, a simulation has to be run between all time steps, which makes these procedures computationally intensive and very time consuming. Examples include Ecient Method of Moments (EMM), e.g. Gallant and Tauchen (1993) Simulated Maximum Likelihood (SMM), e.g. Sandmann and Koopman (1998) Markov Chain Monte Carlo (MCMC), e.g. (Eraker, Johannes, and Polson, 2001) (Unscented-) Particle Filter, (PF, UPF), e.g. van der Merwe, de Freitas, Doucet, and Wan (2001) 3. Kalman lter methods: The classical Kalman lter is not directly applicable, but it can be used in some cases after a transformation. Versions of the extended Kalman lter have also been employed. 4. Likelihood Approximation Methods: The likelihood can be approximated for the ane class of models, constructing an updating procedure for the characteristic function (Bates, 2005). Alternatively, the volatility process itself can be approximated using a Markov chain, as in Chourdakis (2002).

C
Even if we estimate the parameters of a stochastic volatility models using historical time series of asset returns, not all parameters would be useful for the purpose of derivative pricing. This happens because the estimated parameters

174(6.3)

would be the ones under the true probability measure, while investors will use some adjusted parameters to price derivatives. In particular, for stochastic volatility models the drift of the variance will be a modication of the true one, which is done by setting the price of volatility risk. To recover this price of risk, one should consult some existing derivative prices. For that reason, practitioners and (to some extend) academics prefer to use only derivative prices, and calibrate the model based on a set of liquid options. A standard setting is where a derivatives desk wants to sell an exotic option, and then hedge its exposure, and say that a stochastic volatility model is employed. The desk would look at the market prices of liquid European calls and puts, and would calibrate the pricing function to these prices. Such parameters are the risk neutral ones, and therefore can be used unmodied to price and hedge the exotic option. In a sense, they are a generalization of the BS implied volatilities. In a way, practitioners want to price the exotic contract in a way that is consistent with the observed vanillas. If the calibrated model was the one that actually generated the data, then these implied parameters should be stationary through time, and their variability should be due to measurement errors alone. In practice of course this is not the case, and practitioners tend to recalibrate some parameters every day (and sometimes more often). To implement this calibration we will need to minimize some measure of distance between the theoretical model prices and the prices of observed options. Say that we have a pricing function P( K ; ) = P( K ; S0 ; ), where denotes the set of unobserved parameters that we need to extract. Also denote with ( K ; S0 ; ) the implied volatility of that theoretical price, and with P ( K ) and ( K ) the observed market price and implied volatility. For example, in Hestons case = { 0 }. There are many objective functions that one can use for the minimization, the most popular having a weighted sum of squares form 2 G( ) = P( K ; ) P ( K ) The weights can be used to dierent ends. Sometimes the choice of reect the liquidity of dierent options using a measure such as the bid-ask spread. In other cases one wants to give more weight to options that are nearthe-money (using for example the Gamma), or to options with shorter maturities. In other cases one might want to implement a weighting scheme based on the options Vega, in order to mimic an objective function that is cast in the implied volatility space. Recovering the parameter set is not a trivial problem, as the objective function can (and in many cases does) exhibit multiple local minima. This is a common feature of inverse problems like this calibration exercise. Typically some regularization is implemented, in order to make the problem well posed for standard hill climbing algorithms. A popular example is Tikhonov-Phillips regularization (see Lagnado and Osher, 1997; Crpey, 2003, for an illustration), where the objective function is replaced by

175(6.3) G( ) = G( ) +
2

(
0)

for a regularization parameter . The role of the penalty function ( 0 ) is to keep the parameter vector as close as possible to a vector that is based on some prior information 0 . Depending on the particular pricing form, sometimes non-smoothness penalties are also sometimes imposed.18 From a nance point of view, the issue of multiple optima highlights the existence of model risk (Ben Hamida and Cont, 2005). Based on the nite set of option prices, dierent model parameters are indistinguishable. Using one set over another to price an exotic contract which might be sensitive to them can introduce losses. More generally, given the increasing arsenal of theoretical pricing models, dierent model classes can give identical t for vanilla contracts (for details see Schoutens, Simons, and Tistaert, 2004). Calibration example As an example we will t Hestons stochastic volatility model to a set of observed option prices. In particular, we are going to use contracts on the SP500 index written on April 24, 2007. The objective function that we will use is just the sum of squared dierences between model and observed prices. Listing 6.5 gives the code that computes the objective function. The prices are computed using the fractional FFT (see chapter 4), and the integration bounds are automatically selected to reect the decay of the integrand ( T ).19 The snippet 6.6 shows how this objective function can be implemented to calibrate Hestons model using a set of observed put prices on the SP500 index. There are eight dierent maturities in the data set, ranging from 13 to 594 days. The sum of squared dierences between the theoretical and observed prices is minimized, and for this example we did not use any weighting scheme. Figure 6.6 shows the observed option prices and the corresponding tted values. The table below gives the calibrated parameters = { 0 }
0


18

0.0219 5.5292 0.0229 1.0895 -0.6459

19

This is particularly true for calibrating local volatility models which have a large number of parameters. We will discuss this family of models in the next section. As Kahl and Jckel (2005) show, the characteristic function of the Heston model for large arguments decays as A exp( C )/ 2 times a cosine (where A = (0 T ) and C = 1 2 ( 0 + T )/). We can therefore bound the integral |( T )|d by |A| exp(C )/ . The solution of exp( ) = is Lamberts W function which is implemented in Matlab through the Symbolic Math Toolbox. If this toolbox is not available we have to devise a dierent strategy to set the upper integration bound, for example using the moments expansion for the characteristic function. If everything else fails we can just set a large value for the upper integration bound, or set up an adaptive integration scheme.


. :

176(6.3)

: Sum of squares for the Heston model.

10

15

20

25

30

35

177(6.3) L . :

: Calibration of the Heston model.

10

15

F . : Calibrated option prices for Hestons model. The red circles give the observed put prices, while the blue dots are the theoretical prices based on Hestons model that minimize the squared errors.
0.10

0.08

0.06

0.04

0.02

0.90

0.95

1.00

1.05

1.10

These parameters are typical of a calibration procedure, and give an objective function value of G( ) = 0 0090. The question is of course whether or not these parameters are unique, well dened and stable. In gure 6.7(a) we show the function G( ) for dierent combinations of ( ), keeping the rest of the parameters at their estimated values. There appears to be a valley across a

178(6.3)

F . : The ill-posed inverse problem in Hestons case. Subgure (a) gives the objective function that is minimized to calibrate the parameters. Subgure (b) presents the isoquants of this function, together with the minimum point attained using numerical optimization. Observe that all points that are roughly across the red line are indistinguishable. The regularized function is given in (c), while (d) shows its isoquants. Observe that the regularized function is better behaved than the original one. [code: ]
40
10 8 6 G( )

35 30 25 20 15 10 5
3 2 1 0

2 0 0 20 40 4

(a) the function G( )


40
10 8 6 G( )

(b) isoquants of G( )
35 30 25 20 15 10 5

4 2 0 0 20 40 4 3 2 1 0

(c) the function G( )

(d) isoquants of G( )

set of values, indicating that it is very hard to precisely identify the optimal parameter combination. It is apparent that combinations of values across the red line in 6.7.b will give values for the objective function that are very close. This means that based on this set of vanilla options the combinations ( ) = (5 0 1 0), (10 0 1 7) or (15 0 2 5) are pretty much indistinguishable. One way around this problem would be to enhance the information, by including more contracts such as forward starting options or cliquet options.20 These are contracts that depend on the dynamics of the transition densities for
20

A forward starting option is an option that has some features that are not determined until a future time. For example, one could buy (and pay today for) a put option with three years maturity, but where the strike price will be determined as the level of SP500 after one year. Essentially one buys today what is going to be an ATM put in a years time. A cliquet or rachet option is somewhat similar, resembling a basket of forward starting options. For example I could have a contract where every year the

179(6.4)

the volatility, and not only on the densities themselves as vanillas do. For example, a forward starting option would depend on the joint distribution of the volatilities at the starting and the maturity times. Alternatively, if such exotic contracts do not exist or they are not liquid enough to oer uncontaminated values, one could stick with vanilla options and use a regularization technique. This demands some prior view on some parameter values, which could be based on historical evidence or analysts forecasts. As an example, in Hestons model the parameter is the same under both the objective and the risk neutral measure. Based on an estimate 0 using historical series of returns and/or option values, one can set up the objective function G( ) = G( ) + ( 0 )2 In that way estimates will be biased towards combination where the prior value is 0 . For example, the estimation results of Bakshi, Cao, and Chen (1997) based on option prices and the joint estimation of returns and volatility in Pan (1997), indicate a value of 0 40. Therefore, if we set 0 = 0 40 and = 0 005, the objective function to be minimized is the one given in gure 6.7(c,d). The optimal values are now given in the following table
0

0.0200 3.5260 0.0232 0.7310 -0.7048

The new objective function at the optimal is G( ) = 0 0099 which implies a ) = 0 0094, which is not far from the unconditional sum of squares value G( optimization result.

. T
Stochastic volatility models take the view that there is an extra Brownian motion that is responsible for volatility changes. This extra source of randomness creates a market that is incomplete, where options are not redundant securities. Practically, this means that in order to hedge a position one needs to hedge against volatility risk as well as market risk. Local volatility models take a completely dierent view. No extra source of randomness is introduced, and the markets remain complete. In order to account for the implied volatility skew there is a nonlinear (but deterministic) volatility structure dS = S d + ( S )S dB
payo is determined and paid, and the strike price is readjusted according to the new SP500 level.


. :

180(6.4)

: Nadaraya-Watson smoother.

10

15

As vanilla options are expressed via the risk neutral expectation of the random variable ST , local volatility models attempt to construct the function ( S) that is consistent with the implied risk neutral densities for dierent maturities. The methodology of local volatility models follows the one on implied risk neutral densities, originating in the pioneering work of Breeden and Litzenberger (1978). These methods are inherently nonparametric, and rely on a large number of option contracts that span dierent strikes and maturities. In reality there is only a relatively small set of observed option prices that is traded, and for that reason some interpolation or smoothing techniques must be employed to articially reconstruct the true pricing function or the volatility surface. Of course this implies that the results will be sensitive to the particular method that is used. Also, care has to be taken to ensure that the resulting prices are arbitrage free.

I
There are many interpolation methods that one can use on the implied volatility surface. As second order derivatives of the corresponding pricing function are required, it is paramount that the surface is suciently smooth. In fact, it is common practice to sacrice the perfect t in order to ensure smoothness, which suggests that we are actually implementing an implied volatility smoother rather than an interpolator. Within this obvious tradeo we have to selecting the degree of t versus smoothness, which is more of an art than a science. One popular approach is to use a family of known functions, and reconstruct the volatility surface as a weighted sum of them. As an example we can use the radial basis function (RBF) interpolation, where we reconstruct an unknown

181(6.4) L

. : : Implied volatility surface smoothing.

10

15

20

25

30

function using the form


N

( )=

0+

+
=1

(||

||)

The points that we observe are given at the nodes , for = 1 N. The radial function ( ) will determine how the impact of the value at each node behaves. Common radial functions include the Gaussian ( ) = exp 2 /(2 2 ) and the

182(6.4)

Figure 6.8: Implied volatilities smoothed with the radial basis function (RBF, left) and the Nadaraya-Watson (NW, right) methods. The corresponding local volatility surfaces and the implied probability density functions for different horizons are also presented. Panels: (a) implied volatility (RBF); (b) implied volatility (NW); (c) local volatility (RBF); (d) local volatility (NW); (e) implied density (RBF); (f) implied density (NW).

multiquadratic function φ(r) = √(1 + (r/σ)²), among others.²¹ The values of the parameters β₀, β and w_i are determined using the observed values at the nodes and the required degree of smoothness. Figure 6.8(a) presents a set
²¹ The parameter σ is user defined. In Matlab the RBF interpolation is implemented in the package of Alex Chirokov.
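As an illustration of the idea, the following is a minimal Matlab sketch of a Gaussian-RBF smoother for scattered implied volatilities. It is not the Chirokov package used in the text; the data points, kernel widths and ridge penalty are all hypothetical choices.

  % Gaussian-RBF smoother for scattered implied vols (toy data).
  % Nodes: x(i,:) = [strike, maturity], y(i) = implied vol at that node.
  x = [1300 0.5; 1400 0.5; 1500 0.5; 1350 1.0; 1450 1.0; 1550 1.0];
  y = [0.25; 0.22; 0.20; 0.24; 0.21; 0.19];
  N = size(x,1);
  s   = [100 0.5];                      % kernel widths per dimension (user defined)
  lam = 1e-4;                           % ridge term: trades fit for smoothness
  xs  = x ./ repmat(s, N, 1);           % scale coordinates by the widths
  D2  = zeros(N);                       % squared distances between nodes
  for i = 1:N
      D2(:,i) = sum((xs - repmat(xs(i,:), N, 1)).^2, 2);
  end
  Phi = exp(-D2/2);                     % Gaussian radial function
  A   = [Phi, ones(N,1), x];            % basis: RBFs plus the affine part b0 + b'x
  w   = (A'*A + lam*eye(N+3)) \ (A'*y); % regularized least squares for the weights
  % evaluate the smoothed surface at a query point
  q   = [1425 0.75];
  d2  = sum((xs - repmat(q./s, N, 1)).^2, 2);
  vol = exp(-d2/2)'*w(1:N) + w(N+1) + q*w(N+2:end);

Increasing lam produces a smoother (but less exact) surface, which is precisely the fit-versus-smoothness tradeoff discussed above.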


Figure 6.9: Static arbitrage tests for the smoothed implied volatility functions of figure 6.8. Vertical, butterfly and calendar spreads are constructed and their prices are examined. Green dots represent spreads that have admissible prices, while red dots indicate spreads that offer arbitrage opportunities as they are violating the corresponding bounds. Panels: (a) vertical spreads (RBF); (b) vertical spreads (NW); (c) butterfly spreads (RBF); (d) butterfly spreads (NW); (e) calendar spreads (RBF); (f) calendar spreads (NW).

of observed implied volatilities together with the smoothed surface constructed using the RBF interpolation method. The implementation is given in listing 6.8. The Nadaraya-Watson (NW) smoother is another popular choice. Here the approximating function takes the form


Listing 6.9: Tests for static arbitrage.

$$\hat f(\mathbf{x}) = \frac{\sum_{i=1}^{N} y_i\,\exp\big(-(\mathbf{x}-\mathbf{x}_i)^\top \mathbf{H}\,(\mathbf{x}-\mathbf{x}_i)\big)}{\sum_{i=1}^{N} \exp\big(-(\mathbf{x}-\mathbf{x}_i)^\top \mathbf{H}\,(\mathbf{x}-\mathbf{x}_i)\big)}$$
where y_i is the observed value at the point x_i, and the matrix H = diag(h₁, …, h_d) is user defined. This is implemented for the two-dimensional case in listing 6.7. Figure 6.8(b) gives the implied volatility surface smoothed using the Nadaraya-Watson method. Of course the smoothed or interpolated volatility surface can be mapped to call and put prices using the Black-Scholes formula. There is also a number of restrictions that one needs to take into account when constructing the volatility surface. In particular, it is important to verify that the resulting prices do not permit arbitrage opportunities. As shown in Carr and Madan (2005) it is straightforward to rule out static arbitrage by checking the prices of simple vertical spreads, butterflies and calendar spreads. More precisely, having constructed a grid of call prices for different strikes 0 = K₀ < K₁ < K₂ < ⋯ and maturities 0 = T₀ < T₁ < T₂ < ⋯, with C_{i,j} = f_BS(t, S_t; K_j, T_i, Σ(K_j, T_i)), we need to construct the following quantities
1. Vertical spreads: VS_{i,j} = (C_{i,j-1} - C_{i,j})/(K_j - K_{j-1}). There should be 0 ≤ VS_{i,j} ≤ 1 for all i, j.
2. Butterfly spreads: BS_{i,j} = C_{i,j-1} - [(K_{j+1} - K_{j-1})/(K_{j+1} - K_j)] C_{i,j} + [(K_j - K_{j-1})/(K_{j+1} - K_j)] C_{i,j+1}. There should be BS_{i,j} ≥ 0 for all i, j.
3. Calendar spreads: CS_{i,j} = C_{i+1,j} - C_{i,j}. There should be CS_{i,j} ≥ 0 for all i, j.
In figure 6.9 we construct these tests for the resulting volatility surfaces based on the two smoothing methods, implemented in listing 6.9. With green dots we denote the points where no arbitrage opportunities exist, while red dots represent arbitrage opportunities. Both RBF and NW methods yield prices that pass the vertical spread tests. The NW smoother produces a very small number of very short away-from-the-money prices that allow the setup of butterfly spreads

with negative value. Both methods fail the calendar spread test for far out-of-the-money calls with very short maturities. Nevertheless, the bid-ask spreads in these areas are wide enough to ensure that these opportunities are not actually exploitable. Overall the results are very good, but if needed one can incorporate these tests within the fitting procedures, and thus find smoothed volatility surfaces that by construction pass all three arbitrage tests. Another important feature of the implied volatility is that it should behave in a linear fashion for extreme log-strikes (Lee, 2004a; Gatheral, 2004). This indicates that it makes sense to extrapolate the implied volatility linearly to extend outside the region of observed prices. Apart from these nonparametric methods one can set up parametric curves to fit the implied volatility skew. A parametric form might be less accurate, but it can offer a more robust fit where the resulting prices are by construction free of arbitrage. Gatheral (2004) proposes an implied variance function for each maturity horizon, coined the stochastic volatility inspired (SVI) parameterization, of the form
$$\sigma^2(k;\, T) = a + b\Big(\rho\,(k - m) + \sqrt{(k - m)^2 + s^2}\Big)$$

where k = log(K/F). This form always remains positive and ensures that it grows in a linear fashion for extreme log-strikes. In particular Gatheral (2004) shows that a controls the variance level, b controls the angle between the two asymptotes, s controls the smoothness around the turning point, ρ controls the orientation of the skew, and m shifts the skew across the moneyness level.
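A minimal Matlab sketch of an SVI slice, following the parameterization above, might look as follows; all parameter values are purely illustrative.

  % SVI total-variance slice; parameter roles follow the text:
  % a: level, b: angle of the asymptotes, s: smoothness at the vertex,
  % rho: orientation of the skew, m: horizontal shift; k = log(K/F).
  svi = @(k, a, b, rho, m, s) a + b*(rho*(k - m) + sqrt((k - m).^2 + s^2));
  k = linspace(-0.5, 0.5, 101);              % log-moneyness grid
  w = svi(k, 0.04, 0.1, -0.4, 0.0, 0.1);     % implied variance for this maturity
  implied_vol = sqrt(w);                     % corresponding implied volatility

Such a slice can be fitted to each maturity separately, for example with lsqnonlin, and by construction remains positive with linear wings.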

Implied risk neutral densities
Based on the implied volatility function Σ(T, K) the empirical pricing function is easily determined via the Black-Scholes formula
$$P(T, K) = f_{BS}\big(t, S_0;\; T, K, \Sigma(T, K)\big)$$
It has been recognized, since Breeden and Litzenberger (1978), that the empirical pricing function can reveal information on the risk neutral probability density that is implied by the market. In particular, if Q_T(S) is the risk neutral probability measure of the underlying asset with horizon T, then the call price can be written as the expectation
$$P(T, K) = \exp(-rT) \int_K^{\infty} (S - K)\,\mathrm{d}Q_T(S)$$

If we differentiate twice with respect to the strike price, using the Leibniz rule
$$\frac{\partial}{\partial x}\int_{a(x)}^{b(x)} f(x, s)\,\mathrm{d}s = f\big(x, b(x)\big)\frac{\mathrm{d}b(x)}{\mathrm{d}x} - f\big(x, a(x)\big)\frac{\mathrm{d}a(x)}{\mathrm{d}x} + \int_{a(x)}^{b(x)} \frac{\partial f(x, s)}{\partial x}\,\mathrm{d}s$$

Listing 6.10: Construction of implied densities and the local volatility surface.

we obtain the Breeden and Litzenberger (1978) expression for the implied probability density function
$$\mathrm{d}Q_T(S) = \exp(rT)\,\frac{\partial^2 P(T, K)}{\partial K^2}\bigg|_{K = S} \qquad (6.1)$$
It is easy to compute this derivative numerically, and therefore approximate the implied density using central differences. In particular
$$\mathrm{d}Q_T(S) \approx \exp(rT)\,\frac{P(T, S - \Delta K) - 2P(T, S) + P(T, S + \Delta K)}{(\Delta K)^2}$$

One can recognize that the above expression is the price of 1/(ΔK)² units of a very tight butterfly spread around S, like the one used in the static arbitrage tests above. The relation between the butterfly spread and the risk neutral probability density is well known amongst practitioners, and can be used to isolate the exposure to specific ranges of the underlying. We carry out this approximation in listing 6.10, and the resulting densities are presented in figures 6.8(e,f).
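A minimal sketch of this central-difference approximation, here applied to plain Black-Scholes prices (for which the density is known to be lognormal), could read as follows; the parameter values are illustrative, and normcdf is assumed available (Statistics Toolbox).

  % Breeden-Litzenberger density from call prices by central differences,
  % i.e. the prices of 1/(dK)^2 tight butterflies around each strike.
  S0 = 1450; r = 0.05; T = 1; sigma = 0.20;      % illustrative parameters
  K  = linspace(1000, 2000, 201); dK = K(2) - K(1);
  d1 = (log(S0./K) + (r + 0.5*sigma^2)*T) ./ (sigma*sqrt(T));
  d2 = d1 - sigma*sqrt(T);
  C  = S0*normcdf(d1) - K.*exp(-r*T).*normcdf(d2);          % BS call prices
  q  = exp(r*T)*(C(1:end-2) - 2*C(2:end-1) + C(3:end))/dK^2; % density at K(2:end-1)
  % sanity check: trapz(K(2:end-1), q) should be close to one

With market prices replacing the Black-Scholes ones, the same three-point butterfly recovers the market implied density.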

Local volatility
A natural question that follows is whether or not a process exists that is consistent with the sequence of implied risk neutral densities. After all, Kolmogorov's extension theorem 1.3 postulates that given a collection of transition densities such a process might exist. Dupire (1994) recognized that one might be able to


find a diffusion which is consistent with the observed option prices, constructing the so called local volatility model, where the return volatility is a deterministic function of time and the underlying asset. In a series of papers Derman and Kani (1994), Derman, Kani, and Chriss (1996), and Derman, Kani, and Zou (1996) outline the use of the local volatility function for pricing and hedging, while Barle and Cakici (1998) present a method of constructing an implied trinomial tree that is consistent with observed option prices. The dynamics of the underlying asset (under the risk neutral measure) are given by
$$\mathrm{d}S_t = r S_t\,\mathrm{d}t + \sigma(t, S_t)\, S_t\,\mathrm{d}B_t \qquad (6.2)$$
The popularity of the local volatility approach stems from the fact that the steps taken in the derivation of the Black-Scholes PDE can be replicated, since the local volatility function σ(t, S) is deterministic. In particular, the markets remain complete as there is only one source of uncertainty that can be hedged out using the underlying asset and the risk free bank account. The pricing function for any derivative under the local volatility dynamics will therefore satisfy a PDE that resembles the Black-Scholes one
$$\frac{\partial f}{\partial t}(t, S) + r S\,\frac{\partial f}{\partial S}(t, S) + \frac{1}{2}\sigma^2(t, S)\, S^2\,\frac{\partial^2 f}{\partial S^2}(t, S) = r\, f(t, S)$$

Of course, having a functional form for the volatility will mean that closed form expressions are unattainable even for plain vanilla contracts. Nevertheless, it is straightforward to modify the finite difference methods that we outlined in chapter 4 (for example the θ-method in listing 3.3) to account for the local volatility structure. Dupire (1993) notes that if the diffusion (6.2) is consistent with the risk neutral densities (6.1), then the risk neutral densities must satisfy the forward Kolmogorov equation (see section 1.6). In particular, if we denote the transition density by q(t, S; T, K), with q(t, S; T, K) dK = Q(S_T ∈ dK | S_t = S), then the forward Kolmogorov equation will take the form
$$\frac{\partial q}{\partial T}(t, S; T, K) = -\frac{\partial}{\partial K}\Big[r K\, q(t, S; T, K)\Big] + \frac{1}{2}\frac{\partial^2}{\partial K^2}\Big[\sigma^2(T, K)\, K^2\, q(t, S; T, K)\Big]$$

Given the Breeden-Litzenberger representation of the densities (6.1), we can write
$$q(t, S; T, K) = \exp(rT)\,\frac{\partial^2 P(T, K)}{\partial K^2}$$
By taking the derivative with respect to T, and substituting in the forward equation we have, after some simplifications, the following
$$r\frac{\partial^2 P(T,K)}{\partial K^2} + \frac{\partial^3 P(T,K)}{\partial T\,\partial K^2} + r\frac{\partial}{\partial K}\Big(K\,\frac{\partial^2 P(T,K)}{\partial K^2}\Big) - \frac{1}{2}\frac{\partial^2}{\partial K^2}\Big(\sigma^2(T,K)\,K^2\,\frac{\partial^2 P(T,K)}{\partial K^2}\Big) = 0$$
We can integrate the above expression twice with respect to K, which will eventually yield the PDE²²
$$\frac{\partial P(T,K)}{\partial T} + rK\,\frac{\partial P(T,K)}{\partial K} - \frac{1}{2}\sigma^2(T,K)\,K^2\,\frac{\partial^2 P(T,K)}{\partial K^2} = c_0(T)\,K + c_1(T)$$
The functionals c₀(T) and c₁(T) appear as integration constants with respect to K, and need to be identified using some boundary behavior. In particular, we can use that as the strike price increases, K → ∞, the call prices and all their derivatives decay to zero. This happens because the risk neutral probability density q(t, S; T, K) decays as K → ∞. In that case the left hand side that involves the derivatives will equal zero, which implies that c₀(T) = c₁(T) = 0 for all maturities T. The Dupire PDE is therefore
$$\frac{\partial P(T,K)}{\partial T} + rK\,\frac{\partial P(T,K)}{\partial K} - \frac{1}{2}\sigma^2(T,K)\,K^2\,\frac{\partial^2 P(T,K)}{\partial K^2} = 0$$
This partial differential equation resembles the Black-Scholes PDE, and is actually its adjoint in the sense that Kolmogorov's backward and forward equations are. The Black-Scholes PDE gives the evolution of the call price as we approach maturity and as the underlying asset changes, keeping the strike and maturity constant. The Dupire PDE is satisfied by a call option as the maturity and the strike price change, keeping the current time and current spot price constant. We can solve the above expression for the local volatility function
$$\sigma^2(T, K) = \frac{\dfrac{\partial P(T,K)}{\partial T} + rK\,\dfrac{\partial P(T,K)}{\partial K}}{\dfrac{1}{2}K^2\,\dfrac{\partial^2 P(T,K)}{\partial K^2}}$$

The above links the local volatility model with prices of observed call options²³ and in principle it could be used to extract the local volatility function σ(t, S) from a set of observed contracts. Unfortunately, there is a number of practical problems with this approach, which stem from the fact that the local volatility is a function of first and second derivatives of the pricing function P(T, K). For a start, there is only a relatively small number of calls and puts available at any point in time, which means that we will need to set up some interpolation before we carry out the necessary numerical differentiation using finite differences.
²² During the second integration we use the identity ∫ K (∂²P(T,K)/∂K²) dK = K ∂P(T,K)/∂K − P(T,K).
²³ And also put options through the put-call parity.


Therefore, our results will be dependent on the interpolation scheme that we use. In addition, the observed option prices are noisy, and interpolating through their values will cause its own problems. Numerical differentiation is unstable at the interpolating nodes, and attempting to take the second derivative is a guarantee for disaster, with the resulting local volatility surfaces varying wildly. For these reasons practitioners prefer to use a smoothing method, like the Nadaraya-Watson or the radial basis function that we outlined above. The construction of the local volatility is given in listing 6.10, together with the density extraction. Figures 6.8(c,d) give the local volatility surface for the two smoothing procedures. One can observe that the two surfaces look very similar, with the RBF producing somewhat smoother time derivatives. Of course this will depend largely on the parameters that define the smoothing procedure, which are chosen ad hoc. The local volatility function can also play the role of the risk neutral estimator of the instantaneous future volatility at time T, if the underlying asset level at this future time is equal to K. As shown in Derman and Kani (1998)
$$\sigma^2(T, K)\,\mathrm{d}T = \mathbb{E}^Q\Big[\big(\mathrm{d}S_T / S_T\big)^2 \,\Big|\, S_T = K\Big]$$

If one assumes a form for the implied volatility function, either using an interpolator or a smoother, it is possible to express the local volatility σ(t, S) in terms of the implied volatility Σ(T, K). This is of course feasible since the pricing function is
$$P(T, K) = f_{BS}\big(t, S_0;\; T, K, \Sigma(T, K)\big)$$
which can be differentiated analytically with respect to the strike K and the maturity T. It is actually more convenient to work with the moneyness y = log(K/F) = log(K/S) − r(T − t), and also consider the implied total variance as a function of the maturity and the moneyness, w(T, y) = (T − t) Σ²(T, K). Then, as shown in Gatheral (2006) the local variance can be easily computed as
$$\sigma^2(T, y) = \frac{\dfrac{\partial w}{\partial T}}{1 - \dfrac{y}{w}\dfrac{\partial w}{\partial y} + \dfrac{1}{4}\Big(\dfrac{y^2}{w^2} - \dfrac{1}{w} - \dfrac{1}{4}\Big)\Big(\dfrac{\partial w}{\partial y}\Big)^2 + \dfrac{1}{2}\dfrac{\partial^2 w}{\partial y^2}}$$
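A short Matlab sketch of this computation on a grid, using central finite differences for the derivatives of w, might look as follows. The flat total-variance surface used here is a toy input chosen so that the answer is known (a flat 20% volatility must return a constant local variance of 0.04).

  % Local variance from an implied total-variance surface w(T, y)
  % via Gatheral's formula, with finite-difference derivatives.
  T = linspace(0.1, 2, 20)'; y = linspace(-0.5, 0.5, 41);
  w = (0.2^2) * T * ones(1, numel(y));          % toy surface: flat 20% vol
  [wy, wT]  = gradient(w,  y(2)-y(1), T(2)-T(1)); % dw/dy and dw/dT
  [wyy, ~]  = gradient(wy, y(2)-y(1), T(2)-T(1)); % d2w/dy2
  Y = ones(numel(T), 1) * y;                    % y replicated on the grid
  denom = 1 - (Y./w).*wy ...
          + 0.25*((Y.^2)./(w.^2) - 1./w - 0.25).*wy.^2 + 0.5*wyy;
  vloc = wT ./ denom;                           % local variance, here = 0.04

On real smoothed surfaces the same code applies directly, with w built from the fitted implied volatilities.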

7 Fixed income securities

Fixed income securities¹ promise to pay a stream of fixed amounts at predefined points in time. Typically, the issuers of these securities are either governments (sovereign bonds) or corporations (corporate bonds). Bonds are debt instruments, used by governments or corporations to borrow money from investors. Zero-coupon bonds offer a single fixed payment, called the face value of the bond, on the maturity date T. Coupon bearing bonds also promise to pay a stream of cash flows, the coupons, in addition to the face value. In particular, a c%-coupon bond will pay an amount equal to c/100 of the face value per year. Typically payments are made in two semi-annual installments of c/200 of the face value each. Instruments with short maturities, like the US Treasury bills, are typically zero coupon. Longer maturity instruments, like the US Treasury bonds, are typically coupon bearing. Corporate bonds also typically bear coupons. In most cases, when a new coupon bearing instrument is introduced, the coupons are chosen as for the instrument to sell at par. This means that its initial price is approximately equal to its face value. Then, the coupon reflects the rate of interest: for example if a sovereign 6% 30 year bond with face value $100 is issued and sells at par, then the buyer will lend the government today $100, and will receive (1/2)·(6/100)·$100 = $3 every six months for the next 30 years, plus $100 on the bond maturity.

7.1 Yields
Of course, the bond will almost never sell at exactly its par value. The yield of the bond is the equivalent constant rate of interest that is able to replicate all cash flows to maturity, when investing an amount equal to the current bond price. It is obvious that the frequency at which one reinvests the proceeds will be a
¹ In this chapter we will call all fixed income securities bonds, although in reality the word bond is reserved for instruments with relatively long maturities. Shorter instruments in the US are called bills or notes.


factor that will affect the bond yield. When the yield of an instrument is quoted, it is important to know what compounding method has been used, in order to truly compare bonds. In particular, let P_t denote the price of an instrument at time t (measured in years). The simple yield y₁(t₁, t₂), between two dates t₁ and t₂, satisfies
$$\frac{P_{t_2}}{P_{t_1}} = 1 + y_1(t_1, t_2)\,(t_2 - t_1) \quad\Longrightarrow\quad y_1(t_1, t_2) = \frac{1}{t_2 - t_1}\Big(\frac{P_{t_2}}{P_{t_1}} - 1\Big)$$

The simple yield is the return of an investment equal to P_{t₁} that is initiated at time t₁, and is then liquidated at time t₂ for a price P_{t₂}. There is no intermediate reinvestment of any possible proceedings. Of course one could sell the instrument at the intermediate time t⋆ = (t₁ + t₂)/2 for a price P_{t⋆}, and reinvest this amount for the remaining time to t₂. Say that the yield of this strategy is denoted y₂(t₁, t₂). In that case the two simple investments will satisfy
$$\frac{P_{t^\star}}{P_{t_1}} = 1 + y_2(t_1, t_2)\,\frac{t_2 - t_1}{2}, \qquad \frac{P_{t_2}}{P_{t^\star}} = 1 + y_2(t_1, t_2)\,\frac{t_2 - t_1}{2}$$
Multiplying the two will give the yield if we compound twice during the life of the bond, namely
$$\frac{P_{t_2}}{P_{t_1}} = \Big(1 + y_2(t_1, t_2)\,\frac{t_2 - t_1}{2}\Big)^2 \quad\Longrightarrow\quad y_2(t_1, t_2) = \frac{2}{t_2 - t_1}\bigg[\Big(\frac{P_{t_2}}{P_{t_1}}\Big)^{1/2} - 1\bigg]$$

More generally, if we compound n times over the life of the bond we can follow the same procedure to deduce the yield y_n(t₁, t₂)
$$\frac{P_{t_2}}{P_{t_1}} = \Big(1 + y_n(t_1, t_2)\,\frac{t_2 - t_1}{n}\Big)^n \quad\Longrightarrow\quad y_n(t_1, t_2) = \frac{n}{t_2 - t_1}\bigg[\Big(\frac{P_{t_2}}{P_{t_1}}\Big)^{1/n} - 1\bigg]$$
If we pass to the limit we can recover the continuously compounded return y_∞(t₁, t₂), using the fact that lim_{n→∞}(1 + x/n)ⁿ = exp(x)
$$\frac{P_{t_2}}{P_{t_1}} = \lim_{n\to\infty}\Big(1 + y_\infty(t_1, t_2)\,\frac{t_2 - t_1}{n}\Big)^n = \exp\big\{y_\infty(t_1, t_2)\,(t_2 - t_1)\big\} \quad\Longrightarrow\quad y_\infty(t_1, t_2) = \frac{1}{t_2 - t_1}\log\frac{P_{t_2}}{P_{t_1}}$$
We will work with the continuously compounded return from now on, and we will drop the subscript ∞, writing instead y(t, T) = y_∞(t, T) = y_t(T). We will also denote with P(t, T) = P_t(T) the price at time t of the bond that matures at time T.


Different instruments are quoted using different market conventions. In addition to the compounding, there are also conventions in the way time intervals are computed. For example, some instruments are quoted assuming that the month has 30 days and the year 360 (the 30/360 convention). This means that in order to convert months to days we assume that each month has 30 days, and to convert days to years we assume that each year has 360 days. Other instruments might be quoted using the ACT/365, ACT/ACT or, more rarely, other conventions.² If we are dealing with coupon bearing bonds, then we would need to decompose the coupon payments and take their present values individually. As an example, say that we are interested in a zero-coupon bond that has 22 months and 12 days to maturity, and we are quoted a yield of 5.5%. What is the value of this bond today, assuming that the face value is $100? If this bond is compounded semiannually and the 30/360 convention is used, then we would compute the time interval as
22m + 12d = 1y + 10m + 12d = 1y + 300d + 12d = 1y + 312d = 1.866y
Compounding will take place at the points τ₀ = 0.366y, τ₁ = 0.866y, τ₂ = 1.366y and τ₃ = 1.866y. Therefore, if we denote the bond price today with P_t, it will satisfy
$$P_t = \frac{100}{(1 + 0.055\cdot 0.366)\,(1 + 0.055\cdot 0.5)^3}$$
The financial toolbox of Matlab has a number of functions that convert between different day count conventions, and compute the appropriate discount factors. Here, since the markets are set up in continuous time, we will use continuous compounding.
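The arithmetic of this example is easily reproduced; the following lines are a minimal sketch of the computation just described.

  % Worked example: 22 months and 12 days to maturity under the 30/360
  % convention, semiannual compounding, 5.5% quoted yield.
  y   = 0.055;
  tau = (22*30 + 12) / 360;                 % 1.8667 years under 30/360
  n   = floor(tau / 0.5);                   % three full semiannual periods
  t0  = tau - n*0.5;                        % initial stub of 0.3667 years
  P   = 100 / ((1 + y*t0) * (1 + y*0.5)^n); % present value of the face value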

7.2 The yield curve
At each point in time t we have the opportunity to invest in instruments of different maturities τ, each offering a particular yield y_t(t + τ). The mapping τ ↦ y_t(t + τ) is called the yield curve. Essentially it represents the annualized return that is guaranteed by a zero-coupon bond with maturity τ. Observed yield curves are typically upward sloping, with the yields for long maturities being higher than the short ones. Such yield curves are called normal to illustrate that this pattern is the most common. Having said that, flat or inverted (downward sloping) yield curves are also occasionally observed. A humped yield curve pattern is more rarely encountered.

² More information can be found at the International Swaps and Derivatives Association website, and in particular in ISDA (1998).

The Nelson-Siegel-Svensson parametrization

Figure 7.1: Examples of yield curves using the Nelson-Siegel-Svensson parametrization. The parametric form is able to produce curves that exhibit the basic yield curve shapes (flat, normal, inverted, humped).

Nelson and Siegel (1987) and Svensson (1994), collectively denoted with NSS, discuss various parametric forms of the yield curve, summarized in the form
$$y(\tau) = \beta_0 + \beta_1\,\frac{1 - e^{-\tau/\tau_1}}{\tau/\tau_1} + \beta_2\left(\frac{1 - e^{-\tau/\tau_2}}{\tau/\tau_2} - e^{-\tau/\tau_2}\right)$$
There are five parameters in the NSS expression, which control different aspects of the yield curve shape. In particular, β₀ can be used to shift the yield curve up and down, therefore defining the yields for long maturities. β₁ controls the amount of curvature that the curve exhibits, while β₂ is responsible for a potential hump. The parameters τ₁ and τ₂ will determine at which maturities the curvature and the hump are most pronounced. Figure 7.1 shows some basic yield curve patterns that can be produced using the NSS approach. The original paper of Nelson and Siegel (1987) used only the first two components, allowing for level and curvature effects. Svensson (1994) augmented the formula with the hump component. In reality we only observe yields for a relatively small number of maturities, and the quotes can be contaminated with noise. This can be due to non synchronous trading, illiquidity and other microstructure issues. The NSS functionals can be used to smooth the observed yield curve, and to interpolate for maturities that are not directly traded. Also, generalized versions of the NSS approach can be used to study the dynamics of the yield curve in time.


Listing 7.1: Yields based on the Nelson-Siegel-Svensson parametrization.
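Listing 7.1 itself is not reproduced here; a minimal sketch of what such a function might look like is the following, where the function name and argument layout are hypothetical.

  % Nelson-Siegel-Svensson yields: tau is a vector of maturities,
  % b = [b0 b1 b2] and t = [t1 t2] collect the NSS parameters.
  function y = nss_yield(tau, b, t)
  h1 = (1 - exp(-tau/t(1))) ./ (tau/t(1));                  % curvature term
  h2 = (1 - exp(-tau/t(2))) ./ (tau/t(2)) - exp(-tau/t(2)); % hump term
  y  = b(1) + b(2)*h1 + b(3)*h2;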


Figure 7.2: Historical yield curve dynamics.
Listing 7.1 gives a simple Matlab code that implements the NSS formula. Individual yield curves can be used to calibrate this formula and retrieve the corresponding parameters. In this chapter we are interested in the construction of mathematical models that have two desirable and very significant features. On the one hand, they should have the potential to reproduce observed yield curves. In addition, they must be able to capture the evolution of the yield curve through time, in order to offer reliable prices for derivative contracts that are based on future yields or bond prices.


Listing 7.2: Calibration of the Nelson-Siegel formula to a yield curve.

The dynamics of the yield curve

Figure 7.2 gives historical yield curves over the period 2001-07. A few casual observations can be made, which can offer valuable insight on the stylized facts that a fixed income model should adhere to. It appears that the yield curve is indeed typically upward sloping, with a few instances where it is flat or slightly inverted. There is no significant hump in this particular dataset. The short end of the yield curve appears to be a lot more volatile than the long end, which is relatively stable. Also, yields of different maturities do not tend to move in opposite directions; on the contrary, they seem to be quite strongly correlated. We can use these yields to recover the parameters of the NSS formula. In this particular instance we assume that β₂ = τ₂ = 0, as the yields in the dataset do not exhibit a humped pattern. An example of how one can calibrate these parameters is given in listing 7.2. Figure 7.3 shows these parameter estimates through time. The parameter β₀ corresponds to the maximum yield across different maturities. As the long term bond yields gradually dropped throughout the sample period, β₀ also decreases to reflect that. As illustrated by the time path of the parameter β₁, the yield curve became slightly more convex in the first half of the period, flattening quite rapidly afterwards. Parameter τ₁ shows that the short end of the yield curve rose steeply between mid-2002 and mid-2003.

The forward curve

Different points on the yield curve provide us with the risk free rates of return, for investments that commence now (that is, at time t in our notation), and mature


Figure 7.3: Historical Nelson-Siegel parameters. The Nelson and Siegel (1987) formula β₀ + β₁ (1 − exp(−τ/τ₁))/(τ/τ₁) is calibrated to the yields of figure 7.2 and the parameters are presented below. The level parameter β₀, the convexity magnitude parameter β₁ and the convexity steepness parameter τ₁ are given.
at different times in the future. The yield curve also defines the forward rates, which are the fixed rates of return that are set and reserved at time t, but will be applicable over a future time period. In particular, say that we select two points on the yield curve, for bonds that mature at times T and T′, with T′ > T > t. The prices of these bonds will be P_t(T) and P_t(T′), respectively. Now assume that we are interested in setting the forward (continuously compounded) rate of interest for an investment that will commence at time T and will mature at T′, which we will denote with f_t(T, T′). Consider the following two investments over the period [t, T′]:
1. Buy one risk free bond that matures at time T′. This will cost P_t(T′) today, and will deliver one pound at time T′.
2. Buy P_t(T′)/P_t(T) units of the risk free bond that matures at time T. Also enter a forward contract to invest risk-free over the period [T, T′], at the rate f_t(T, T′). This strategy will also cost P_t(T′) today, as it is free to enter a forward contract. The first leg will deliver P_t(T′)/P_t(T) pounds at time T, which will be invested at the forward rate. Therefore at time T′ this strategy will deliver exp{f_t(T, T′)(T′ − T)}·P_t(T′)/P_t(T).
These two strategies have the same initial cost to set up, the same maturity, and are both risk free. Therefore they should deliver the same amount on the maturity date T′, otherwise arbitrage opportunities would arise. For example, if the second strategy was delivering more than one pound at time T′, then one would borrow P_t(T′) at the risk free rate to enter the second strategy with zero cost at time zero. Therefore, the arbitrage free forward rate will satisfy
$$\exp\big\{f_t(T, T')\,(T' - T)\big\}\,\frac{P_t(T')}{P_t(T)} = 1 \quad\Longrightarrow\quad f_t(T, T') = \frac{1}{T' - T}\,\log\frac{P_t(T)}{P_t(T')}$$
If we let the time between the two maturities shrink down to zero, by letting for example T′ → T, we define the (instantaneous) forward rate. This is essentially the short rate that we can reserve today, but which will apply at time T
$$f_t(T) = \lim_{T' \downarrow T}\frac{\log P_t(T) - \log P_t(T')}{T' - T} = -\frac{\partial \log P_t(T)}{\partial T} = y_t(T) + (T - t)\,\frac{\partial y_t(T)}{\partial T}$$
Forward rates for different maturities define the forward curve. There is a correspondence between the yield and forward curves, and knowing one leads to the other.
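The last relationship lends itself to a simple numerical implementation. The following is a minimal sketch that recovers instantaneous forward rates from a discretely sampled yield curve; the curve used is purely illustrative.

  % Instantaneous forward rates from yields: f(T) = y(T) + (T-t)*dy/dT,
  % with the derivative taken by central finite differences (t = 0 here).
  tau = (0.5:0.5:10)';                  % maturities
  y   = 0.05 - 0.02*exp(-tau/2);        % an illustrative upward sloping curve
  dy  = gradient(y, tau(2) - tau(1));   % numerical dy/dT
  f   = y + tau .* dy;                  % the forward curve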

7.3 The short rate models
Historically, the first family of models introduced in the fixed income literature were the so called short rate or one-factor models. The main underlying assumption is that there is a unique Brownian motion that is responsible for the uncertainty in the economy. More formally, we start with a filtered probability space (Ω, F, {F_t}_{t≥0}, P), and consider a Brownian motion B_t with respect to this probability measure. The main ingredient of the one-factor model is the short rate process, that is to say the process r_t of the instantaneous risk free rate. Essentially, this is the rate offered by the bank account or current account, which is not fixed for any period of time, but is nevertheless risk free during the infinitesimal period (t, t + dt). The investor is not bound for any maturity and can withdraw funds from (or add funds to) this account freely, without incurring any penalties. This is in contrast to other financial assets, like the ones introduced in the Black-Scholes paradigm, where the return over this infinitesimal period is random.


Of course, investing in the bank account over a longer period of time is not a risk free investment, since the short rate will change. Having said that, one can show that if the short rate is the process that drives the economy, then bonds with different maturities can be priced in a consistent way that does not permit arbitrage opportunities. This means that eventually we will show that all bonds can be priced relative to each other in a unique way. Intuitively, one can think of bonds as derivatives which are contingent on the future realizations of the short rate. To put things more concretely, the current account will satisfy the ordinary differential equation
$$\mathrm{d}B_t = r_t B_t\,\mathrm{d}t \quad\Longrightarrow\quad B_t = B_0\exp\Big(\int_0^t r_s\,\mathrm{d}s\Big)$$

Short rate dynamics
As we argued above, the short rate process evolves in a stochastic way, and the uncertainty is described by a Brownian motion B_t. We can therefore cast the short rate process as an SDE
$$\mathrm{d}r_t = \mu(t, r_t)\,\mathrm{d}t + \sigma(t, r_t)\,\mathrm{d}B_t$$
Our objective is to establish prices for bonds with different maturities. The only constraint that we need to take into account is that the prices of these bonds must rule out any arbitrage opportunities. In all generality, the price (at time t) of a bond with maturity T can depend at most on the time and the short rate level, that is to say
$$P_t(T) = f(t, r(t);\, T)$$
This formalizes the statement we made above, that the bond is a derivative on the short rate. It appears that the setting is similar to the one in equity derivative pricing, if we consider the short rate as the underlying asset. In particular we can see the analogy
equity price: dS_t = μ(t, S_t) dt + σ(t, S_t) dB_t
interest rate: dr_t = μ(t, r_t) dt + σ(t, r_t) dB_t
In both cases we want to establish a derivative pricing relationship
equity derivative: P_t = f(t, S_t)
bond price: P_t(T) = f(t, r_t; T)
Although the two settings appear to be very similar, there is a very significant difference: unlike equities, the short rate is not a traded asset. This means that we cannot buy or sell the short rate, and therefore we cannot construct the necessary risk free positions that produced the Black-Scholes PDE. The market, as we constructed it, is incomplete.


In fact, the pricing of bonds has more common features with the pricing of options under stochastic volatility, where again we introduced a non-traded factor (the volatility of the equity returns). There (section 6.3) we constructed a portfolio of two options, in order to solve for the price of volatility risk. Here we will use the same trick, namely to construct a portfolio of two bonds with different maturities, and investigate the conditions that would make it (instantaneously) risk free. This will naturally introduce the price of short rate risk, which is unknown; we will be able to determine this price of risk by calibrating the model on the observed yield curve, in the same spirit as the calibration of SV models on the implied volatility surface. These are summarized in the following table

                       equity SV        fixed income
  non-traded asset:    volatility       short rate
  used to hedge:       2 options        2 bonds
  calibrate on:        IV surface       yield curve

The term structure PDE
Let us consider two bonds with maturities T₁ and T₂, and say that their prices are given by the functions P_t(T_i) = f_i(t, r_t; T_i) = f_i(t, r_t), for i = 1, 2. Applying Itô's formula to the pricing function will give the dynamics of the bond prices, namely
$$\mathrm{d}P_t(T_i) = \mu_t(T_i)\,\mathrm{d}t + \sigma_t(T_i)\,\mathrm{d}B_t$$
with
$$\mu_t(T_i) = \frac{\partial f_i}{\partial t}(t, r_t) + \mu(t, r_t)\,\frac{\partial f_i}{\partial r}(t, r_t) + \frac{1}{2}\sigma^2(t, r_t)\,\frac{\partial^2 f_i}{\partial r^2}(t, r_t)$$
$$\sigma_t(T_i) = \sigma(t, r_t)\,\frac{\partial f_i}{\partial r}(t, r_t)$$
Note that both bonds will depend on the same Brownian motion B_t, as this is the only source of uncertainty that affects the bond dynamics through the short rate. Say that we sell the first bond and buy Δ units of the second one. The portfolio will have value Π_t = −P_t(T₁) + Δ P_t(T₂), and will obey the SDE
$$\mathrm{d}\Pi_t = -\mathrm{d}P_t(T_1) + \Delta\,\mathrm{d}P_t(T_2)$$
Our aim is to construct a risk free portfolio; therefore, to eliminate dependence on dB_t we choose
$$\Delta = \frac{\sigma_t(T_1)}{\sigma_t(T_2)} = \frac{\partial f_1(t, r_t)/\partial r}{\partial f_2(t, r_t)/\partial r}$$
Then the portfolio will evolve according to the ordinary differential equation
$$\mathrm{d}\Pi_t = \big[-\mu_t(T_1) + \Delta\,\mu_t(T_2)\big]\,\mathrm{d}t \qquad (7.1)$$


Since the portfolio is now risk free it must grow as the current account, at rate r_t. If that were not the case, arbitrage opportunities would appear. This means that
$$\mathrm{d}\Pi_t = r_t\,\Pi_t\,\mathrm{d}t = r_t\big[-f_1(t, r_t) + \Delta\,f_2(t, r_t)\big]\,\mathrm{d}t \qquad (7.2)$$
Equating (7.1) and (7.2) yields the consistency relationship
$$\frac{\mu_t(T_1) - r_t\,f(t, r_t; T_1)}{\sigma_t(T_1)} = \frac{\mu_t(T_2) - r_t\,f(t, r_t; T_2)}{\sigma_t(T_2)}$$
Now we invoke the same line of argument that we used in section 6.3. In order to set up the above relationship we did not explicitly specify a particular pair of bonds, and it will therefore hold for any pair of maturities. Thus, for any set of maturities T₁, T₂, T₃, T₄, … we can write
$$\frac{\mu_t(T_1) - r_t\,f(t, r_t; T_1)}{\sigma_t(T_1)} = \frac{\mu_t(T_2) - r_t\,f(t, r_t; T_2)}{\sigma_t(T_2)} = \frac{\mu_t(T_3) - r_t\,f(t, r_t; T_3)}{\sigma_t(T_3)} = \cdots$$
Therefore the ratio cannot depend on the particular bond maturities; it can at most depend on (t, r_t), say that it is equal to λ(t, r_t). This means that we can write
$$\frac{\mu_t(T) - r_t\,f(t, r_t; T)}{\sigma_t(T)} = \lambda(t, r_t)$$
for any maturity T. Essentially we have managed to derive the PDE that the bond pricing formula has to satisfy, in order to rule out arbitrage opportunities. We can thus drop the maturity T, as it is not affecting the PDE in any way, and write
$$\frac{\partial f}{\partial t}(t, r) + \big\{\mu(t, r) - \lambda(t, r)\,\sigma(t, r)\big\}\,\frac{\partial f}{\partial r}(t, r) + \frac{1}{2}\sigma^2(t, r)\,\frac{\partial^2 f}{\partial r^2}(t, r) = r\,f(t, r)$$
This PDE is called the term structure PDE, and a boundary condition is needed in order to solve it analytically or numerically. For a zero-coupon bond that matures at time T the boundary condition for this PDE will be f(T, r; T) = 1. Although the PDE is called the term structure PDE, we never used the fact that the instruments are actually bonds. The quantities T can be thought of as indices for different interest rate sensitive instruments: bond options, caps, floors or swaptions will all satisfy the term structure PDE. In general, any contingent claim that promises to pay Φ(r(T)) at time T will satisfy the same PDE, with boundary condition f(T, r) = Φ(r).


The market price of risk
The price of risk functional λ(t, r) can be freely selected, as long as it does not permit arbitrage opportunities. Intuitively, it seems to be a good idea to ensure that the function of the spot rate λ(t, r_t) remains bounded for all times t. This would ensure that the coefficients of the PDE will not explode at any finite time, and therefore a solution will exist. Another way of viewing this kind of restriction is by considering the equivalent probability measure, under which pricing takes place. Essentially, if we denote with μ^Q(t, r) = μ(t, r) − λ(t, r)σ(t, r), then the PDE becomes
$$\frac{\partial f}{\partial t}(t, r) + \mu^Q(t, r)\,\frac{\partial f}{\partial r}(t, r) + \frac{1}{2}\sigma^2(t, r)\,\frac{\partial^2 f}{\partial r^2}(t, r) = r\,f(t, r)$$
This PDE will be solved subject to the boundary condition f(T, r) = Φ(r). For example, for zero-coupon bonds that mature at T we will have Φ(r) = 1. The Feynman-Kac theorem postulates that the solution of this PDE can be expressed as an expectation
$$f(t, r) = \mathbb{E}^Q\Big[\exp\Big(-\int_t^T r_s\,\mathrm{d}s\Big)\,\Phi(r_T)\,\Big|\,r_t = r\Big]$$
If B_t^Q is a Brownian motion under Q, then the process for r_t is given by
$$\mathrm{d}r_t = \mu^Q(t, r_t)\,\mathrm{d}t + \sigma(t, r_t)\,\mathrm{d}B_t^Q = \mu(t, r_t)\,\mathrm{d}t + \sigma(t, r_t)\,\big\{\mathrm{d}B_t^Q - \lambda(t, r_t)\,\mathrm{d}t\big\}$$
The probability measure Q should be equivalent to the true measure P, otherwise arbitrage opportunities would be possible (this is due to the fundamental theorem of asset pricing). We can also write B_t^Q = B_t + ∫₀ᵗ λ(s, r_s) ds, which suggests that in fact
$$\frac{\mathrm{d}Q}{\mathrm{d}P}\bigg|_{\mathcal{F}_t} = \xi_t(\lambda)$$
That is to say the process ξ_t = ξ_t(λ) is the Radon-Nikodym derivative of the risk adjusted measure with respect to the true one. In order for this to be a valid measure, the Novikov condition must be satisfied, namely that the following expectation is finite for all t
$$\mathbb{E}\,\exp\Big(\frac{1}{2}\int_0^t \lambda^2(s, r_s)\,\mathrm{d}s\Big) < \infty$$

It is apparent that if we require λ(t, r) to be bounded for all t, then the above expectation will also be bounded. This is a feature that is shared by most models for the short rate. Since we are observing bonds which are priced under Q, it is impossible to explicitly decompose λ from the true short rate drift μ. The best we can do,


given this information, is calibrating the short rate model under risk neutrality, and therefore recovering μ^Q. If our purpose is to price interest rate sensitive securities this does not pose a problem, as pricing will also take place under Q. Having said that, we might be interested in the true short rate process, perhaps for risk management which takes place under the objective probability measure. In that case we can recover the price of risk and the true drift using filtering methods, for example a version of the Kalman filter. There is a very extensive literature that investigates the determinants of the yield curve, trying to explain why it takes its various shapes and what makes it evolve, for example from a normal to an inverted one. The same factors will of course also influence the risk premium λ(t, r). Some of the standard term structure theories include the following:
1. The pure expectations hypothesis assumes that bonds are perfect substitutes. Bond prices are determined from the expectations of future short rates. As the short rate evolves and these expectations vary, the yield curve will shift to accommodate them. Very high spot rates could therefore imply an inverted yield curve.
2. Market segmentation takes an opposite view. Short and long bonds are not substitutes, due to taxation and different investor objectives. For example pension funds might only be interested in the long end of the curve, while hedge funds could be willing to invest in short maturity instruments. The prices for bonds of different yield ranges are determined independently.
3. Somewhat between the above two extreme points lies the theory of preferred habitat. Investors forecast future rates, but also have a set investment horizon, demanding an extra premium to invest in bonds outside their preferred maturity ranges. As short term investors outnumber long term ones, prices of long maturity bonds will be relatively lower, rendering a normal term structure. This will be inverted if expectations change sufficiently.
4. The liquidity preferences theory goes one step further and states that investors will demand an extra premium for having their money tied up for a longer period. Long maturity bonds will therefore have to offer higher yields to reflect this premium.
As is naturally expected, all factors will influence the term structure behavior to some extent at each point in time.

7.4 One-factor models

Following the discussion of the previous section, we are looking for specifications for the short rate under Q. From now on we will be working only under the risk neutral measure; therefore, unless otherwise stated, we will drop the superscript Q. Our objective is to consider parametric forms μ(t, r) and σ(t, r) that define the dynamics of the SDE for the short rate
$$\mathrm{d}r_t = \mu(t, r_t)\,\mathrm{d}t + \sigma(t, r_t)\,\mathrm{d}B_t$$


In selecting μ and σ we need to keep in mind some stylized facts of interest rates, and some desirable properties of interest rate models:
- The short rate and yields for all maturities are always positive.
- The process is stationary, in the sense that there is a long run distribution for the short rate. This indicates that the short rate should not be allowed to increase without bound, and some sort of mean reversion should be present.
- As interest rates increase they become more volatile.
- The term structure of interest rates can be upward sloping, downward sloping or humped. The model should be capable of producing different yield curve shapes.
- The short end of the yield curve is substantially more volatile than the long end. The long end appears to evolve in a much smoother way.
- Yields for different maturities are correlated (and ones for adjacent maturities very strongly correlated), but not perfectly so.
- Finally, for an interest rate model to be operational, it should offer bond and bond derivative prices in closed form (or at least in a form that is readily computable).

The first generation of one-factor models assumed a time-homogeneous structure that leads to tractable expressions for bond prices. The Vasicek (1977) model casts the short rate as an Ornstein-Uhlenbeck process, namely
$$\mathrm{d}r_t = \alpha(\theta - r_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$$
In the Vasicek framework the short rate is Gaussian, a feature that leads to closed form solutions for a number of instruments. For that reason the Vasicek specification is still used by some practitioners today. In particular
$$r_t\,|\,r_0 \;\sim\; N\Big(\theta + (r_0 - \theta)\,e^{-\alpha t},\; \frac{\sigma^2}{2\alpha}\big(1 - e^{-2\alpha t}\big)\Big)$$

If we assume a constant price of risk λ(t, r) = λ, then under risk neutrality the dynamics of the short rate are
$$\mathrm{d}r_t = \alpha(\theta^Q - r_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t^Q$$
for θ^Q = θ − σλ/α. This indicates that as investors are risk averse, they behave as if the long run attractor of the short rate is higher than what it actually is. The pricing functions of interest rate sensitive securities will satisfy the PDE
$$\frac{\partial f}{\partial t} + \alpha(\theta^Q - r)\,\frac{\partial f}{\partial r} + \frac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial r^2} = r\,f$$

For example, in the case of a bond that matures at time T the terminal condition will be f(T, r; T) = 1, and we guess the solution of the PDE to be of the exponential affine form
$$f(t, r;\, T) = \exp\big(C(t; T) + D(t; T)\,r\big)$$
If we substitute this expression in the PDE we can write
$$\Big[C'(t;T) + \alpha\theta^Q D(t;T) + \frac{1}{2}\sigma^2 D^2(t;T)\Big] + \Big[D'(t;T) - \alpha D(t;T) - 1\Big]\,r = 0$$
As the PDE has to be satisfied for all initial spot rates r, we conclude that both square brackets must be equal to zero, and that C(T;T) = D(T;T) = 0. Therefore we recover a system of ODEs for the functionals C and D, namely³
$$C'(t;T) + \alpha\theta^Q D(t;T) + \frac{1}{2}\sigma^2 D^2(t;T) = 0, \qquad D'(t;T) - \alpha D(t;T) = 1$$
$$C(T;T) = 0, \qquad D(T;T) = 0$$
The solution of the above system will give the Vasicek bond pricing formula, namely
$$D(t;T) = \frac{e^{-\alpha(T-t)} - 1}{\alpha}$$
$$C(t;T) = \frac{\big[D(t;T) + (T-t)\big]\,\big[\sigma^2/2 - \alpha^2\theta^Q\big]}{\alpha^2} - \frac{\sigma^2 D^2(t;T)}{4\alpha}$$
One important feature of the Vasicek model is the mean reversion it exhibits. In particular, the short rate of interest is attracted towards a long run value θ. The strength of this mean reversion is controlled by the parameter α. Intuitively, the half life of the conditional expectation is log 2/α, which means that if the short rate is at level r at time t, then it is expected to cover half its distance from the long run value within log 2/α years. The main shortfall of the Vasicek model is that it permits the short rate to take negative values. This happens because the short rate is normally distributed, and therefore can take values over the whole real line. As bond prices are exponentially affine in the short rate, and future short rates are normally distributed, it is easy to infer that future bond prices will follow the lognormal distribution. Therefore bond options will be priced with formulas similar to the Black-Scholes one for equity options. In particular, the price of a call option with strike price K that matures at time τ, written on a zero coupon bond that pays one pound at time T > τ, will be equal to
³ Here we follow the approach outlined in Duffie and Kan (1996) for general affine structures. Such systems of ODEs that are linear-quadratic are known as Riccati equations.


$$C_t(K, \tau;\, T) = P_t(T)\,N(d + \sigma_P) - K\,P_t(\tau)\,N(d)$$
where
$$d = \frac{1}{\sigma_P}\log\frac{P_t(T)}{K\,P_t(\tau)} - \frac{\sigma_P}{2}, \qquad \sigma_P = \frac{\sigma}{\alpha}\big[1 - e^{-\alpha(T - \tau)}\big]\sqrt{\frac{1 - e^{-2\alpha(\tau - t)}}{2\alpha}}$$
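The bond pricing formula above is straightforward to implement; the following minimal Matlab sketch evaluates it for a purely illustrative parameter set, using the C and D functionals just derived.

  % Vasicek zero-coupon bond price P_t(T) = exp(C + D*r) under the
  % risk neutral parameters (alpha, thetaQ, sigma); toy values below.
  alpha = 0.5; thetaQ = 0.06; sigma = 0.02; r = 0.05;
  tau = 10;                                        % time to maturity T - t
  D = (exp(-alpha*tau) - 1) / alpha;
  C = (D + tau)*(sigma^2/2 - alpha^2*thetaQ)/alpha^2 ...
      - sigma^2 * D^2 / (4*alpha);
  P = exp(C + D*r);                                % price per unit face value

The same two functionals feed directly into the option formula, through the ratio P_t(T)/(K P_t(τ)) and the volatility σ_P.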

Lognormal models
The main shortcoming of the Vasicek model is that it permits negative nominal interest rates. One straightforward way around this problem is to cast the problem in terms of the logarithm of the short rate. The first application of this idea can be found in the Dothan (1978) model, which specifies
$$\mathrm{d}r_t = \mu\,r_t\,\mathrm{d}t + \sigma\,r_t\,\mathrm{d}B_t, \quad\text{or}\quad \mathrm{d}\log r_t = \Big(\mu - \frac{1}{2}\sigma^2\Big)\mathrm{d}t + \sigma\,\mathrm{d}B_t$$
Here the short rate follows the geometric Brownian motion, just like the underlying stock in the Black-Scholes paradigm. The short rate is log-normally distributed, and therefore takes only positive values. On the other hand, there is no mean reversion present, and the long run forecast for the short rate will either be explosive (if μ > σ²/2) or zero (if μ < σ²/2). For that reason the Dothan model is not popular for modeling purposes. Another approach is casting the logarithm of the short rate to follow the Ornstein-Uhlenbeck process, giving rise to the exponential Vasicek model
$$\mathrm{d}\log r_t = \alpha(\log\theta - \log r_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$$
In section 7.5 we will discuss the numerical implementation of a popular extension of this model, due to Black and Karasinski (1991). An important feature of all lognormal models is the so-called explosive behavior of the bank account (see for example the discussion in Brigo and Mercurio, 2001; Sandmann and Sondermann, 1997). Loosely speaking, if the yield is lognormally distributed, then the expected bank account is given by an expression of the form
$$\mathbb{E}\,B_t = \mathbb{E}\,\exp\{\exp\{Z\}\}, \quad\text{with } Z \sim N(\mu_Z, \sigma_Z^2)$$
It turns out that this expectation is infinite for all values of μ_Z and σ_Z. That means that, according to lognormal models, even investing for a very short horizon (where the yield is approximately normal) offers infinite expected returns. Technically speaking, the right tail of the lognormal distribution does not decay fast enough, and this is the reason for the infinite expectation.

The CIR model

The most popular member across the one factor model family is without doubt the one proposed in Cox et al. (1985, CIR). The short rate follows the square

root or Feller process⁴
$$\mathrm{d}r_t = \alpha(\theta - r_t)\,\mathrm{d}t + \sigma\sqrt{r_t}\,\mathrm{d}B_t$$
The CIR model is able to capture most of the desired properties of short rate models. The process is mean reverting, with the long run attractor equal to θ. The speed of mean reversion is controlled by the parameter α. As the short rate increases, its volatility also increases, at a degree which is dictated by σ. CIR show that the transition density of the process is a non-central chi-square. In particular, conditional on r_t,
$$2c\,r_T \;\sim\; \chi^2\Big(\frac{4\alpha\theta}{\sigma^2},\; 2c\,r_t\,e^{-\alpha(T-t)}\Big), \quad\text{with}\quad c = \frac{2\alpha}{\sigma^2\big(1 - e^{-\alpha(T-t)}\big)}$$
where χ²(ν, λ) denotes the non-central chi-square distribution with ν degrees of freedom and non-centrality parameter λ.

Having the transition density in closed form allows us to calibrate the parameters to a set of historical data. Unfortunately, the short rate is not directly observed, but practitioners use yields of bonds with short maturities as a proxy for the dynamics. More elaborate methods involve (Kalman) filtering and are discussed later. One can readily compute the expected value and the variance of the short rate process, in particular
$$\mathbb{E}[r_T\,|\,r_t] = \theta + e^{-\alpha(T-t)}\,(r_t - \theta)$$
$$\mathbb{V}[r_T\,|\,r_t] = \frac{\sigma^2}{2\alpha}\big(1 - e^{-\alpha(T-t)}\big)\Big[\theta + (2r_t - \theta)\,e^{-\alpha(T-t)}\Big]$$
Also, as the forecasting horizon increases, the stationary (unconditional) distribution of the short rate is Gamma
$$r_\infty \;\sim\; \Gamma\Big(\frac{2\alpha\theta}{\sigma^2},\; \frac{\sigma^2}{2\alpha}\Big)$$

The instantaneous variance of the square root process is proportional to its level. For that reason, if the short rate reaches zero the stochastic component disappears, and the process will revert towards its positive long run mean. Therefore the CIR model rules out negative short rates, r_t ≥ 0 for all t. In particular, Feller (1951) shows that if the condition 2αθ > σ² is satisfied, then the mean reversion is strong enough for the process never to reach zero. In that case the inequality is strict, r_t > 0 for all t. CIR provide a bond pricing formula which also takes the exponentially affine form
⁴ Discussed in detail in Feller (1951).


Figure 7.4: Simulation of CIR yield curves. Panels: (a) short rate; (b) time series of yield curves.

$$P_t(T) = \exp\big(C(t;T) + D(t;T)\,r_t\big), \quad\text{with}$$
$$C(t;T) = \frac{2\alpha\theta}{\sigma^2}\,\log\frac{2\gamma\,e^{(\alpha+\gamma)(T-t)/2}}{(\alpha+\gamma)\big(e^{\gamma(T-t)} - 1\big) + 2\gamma}$$
$$D(t;T) = \frac{-2\big(e^{\gamma(T-t)} - 1\big)}{(\alpha+\gamma)\big(e^{\gamma(T-t)} - 1\big) + 2\gamma}, \qquad \gamma = \sqrt{\alpha^2 + 2\sigma^2}$$

Option prices also take a (relatively) simple form, being dependent on the cumulative distribution functions of non-central chi-square distributions
$$C_t(K, \tau;\, T) = P_t(T)\,\chi^2\big(2\bar r\,[\phi + \psi - D(\tau;T)];\; \nu,\; \lambda_1\big) - K\,P_t(\tau)\,\chi^2\big(2\bar r\,[\phi + \psi];\; \nu,\; \lambda_2\big)$$
with
$$\nu = \frac{4\alpha\theta}{\sigma^2}, \qquad \phi = \frac{2\gamma}{\sigma^2\big(e^{\gamma(\tau - t)} - 1\big)}, \qquad \psi = \frac{\alpha + \gamma}{\sigma^2}, \qquad \bar r = \frac{\log K - C(\tau;T)}{D(\tau;T)}$$
$$\lambda_1 = \frac{2\phi^2\,r_t\,e^{\gamma(\tau - t)}}{\phi + \psi - D(\tau;T)}, \qquad \lambda_2 = \frac{2\phi^2\,r_t\,e^{\gamma(\tau - t)}}{\phi + \psi}$$
Here r̄ is the critical short rate at time τ below which the bond price exceeds the strike and the option is exercised.
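As an illustration, the affine bond formula can be coded in a few lines; the parameter values below are purely illustrative.

  % CIR zero-coupon bond price via the exponential affine form.
  alpha = 0.3; theta = 0.05; sigma = 0.1; r = 0.04; tau = 5;
  gam = sqrt(alpha^2 + 2*sigma^2);
  den = (alpha + gam)*(exp(gam*tau) - 1) + 2*gam;
  C = (2*alpha*theta/sigma^2) * log(2*gam*exp((alpha+gam)*tau/2) / den);
  D = -2*(exp(gam*tau) - 1) / den;
  P = exp(C + D*r);                    % bond price per unit face value

For the option formula one would additionally evaluate the non-central chi-square distribution function, available in Matlab as ncx2cdf (Statistics Toolbox).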

7.5 Models with time varying parameters
The one factor models we described above have a finite number of parameters. Although some models can give flexible yield curve shapes, and conform to the stylized facts (for example CIR), they cannot match the observed yield curve exactly. The problem is of course that a large (or infinite, if we decide to interpolate) number of bonds have to be matched using a finite number of parameters and a given parametric form: the distance between model and observed prices can be minimized but not set to zero. This means that for (practically) all maturities the bonds will be mispriced. This might not appear to be a critical drawback, as one is not required to trade at the model price. It becomes a more serious flaw when one considers derivatives, where small discrepancies will be magnified, and in fact arbitrage opportunities will emerge. Models with time varying parameters were set up to capture any initial yield curve, and therefore are at least arbitrage-free when it comes to pricing fixed income derivatives. Such models will assume that one or more parameters are deterministic functions of time, carefully chosen in a way that ensures that a given term structure is perfectly replicated. Therefore a standard input for such models would be the current observed yield curve. State-of-the-art variants can also be calibrated on implied volatility curves (from caplets, caps or swaption prices).

The Ho-Lee model

The first one-factor model with time varying parameters proposed in the literature was Ho and Lee (1986). The underlying assumption is that the short rate of interest is a simple random walk with drift
$$\mathrm{d}r_t = \theta_t\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$$
The drift θ_t is a deterministic function of time. In particular, the bond prices will satisfy
$$P_t(T) = \mathbb{E}\,\exp\Big(-\int_t^T r_s\,\mathrm{d}s\Big) = \mathbb{E}\,\exp\Big(-r_t(T-t) - \int_t^T\!\!\int_t^s \theta_u\,\mathrm{d}u\,\mathrm{d}s - \sigma\!\int_t^T (B_s - B_t)\,\mathrm{d}s\Big)$$

Changing the order of integration we conclude that


T T

P (T ) = E exp (T )
T

d d
T T

dB d
T

= E exp (T ) = E exp (T )

d d
T

d dB
T

(T ) d

(T )dB

The last integral is actually a normally distributed random variable, following Its isometry o

$$\sigma\!\int_t^T (T-s)\,\mathrm{d}B_s \;\sim\; N\Big(0,\; \sigma^2\!\int_t^T (T-s)^2\,\mathrm{d}s\Big) = N\Big(0,\; \frac{\sigma^2(T-t)^3}{3}\Big)$$

Therefore the expectation can be computed in closed form, implying a yield to maturity
T

P (T ) = exp (T )

(T ) d (T ) = +

2 (T )3 6
T

T 2 d + (T )2 T 6

The above expression is in fact the functional form of the yield curve, if the time varying drift functional were known. It turns out that it is more convenient to use the forward curve instead. In particular, applying the Leibniz rule for differentiation yields
$$f_t(T) = -\frac{\partial \log P_t(T)}{\partial T} = r_t + \int_t^T \theta_u\,\mathrm{d}u - \frac{\sigma^2(T-t)^2}{2}$$
If we differentiate the forward curve with respect to the maturity we obtain an expression for θ_T
$$\theta_T = \frac{\partial f_t(T)}{\partial T} + \sigma^2\,(T - t)$$
T

P (T ) = exp (T )

( )d

2 (T )3 3

As bond prices are lognormally distributed, options on these bond can be priced using a formula that is analogous to the Black-Scholes one
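A minimal sketch of the drift calibration, on a discrete grid and with an illustrative forward curve, could read as follows (with t = 0).

  % Ho-Lee drift calibrated to an observed forward curve:
  % theta(T) = df/dT + sigma^2*(T - t), derivatives by finite differences.
  sigma = 0.01;
  T  = (0.25:0.25:10)';
  f  = 0.05 - 0.02*exp(-T);            % illustrative forward curve
  df = gradient(f, T(2) - T(1));       % numerical df/dT
  theta = df + sigma^2 * T;            % drift that reproduces the curve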

The Hull-White model

The breakthrough of the Ho-Lee model was that it provided a structure where the observed yield curve is perfectly matched, not allowing for arbitrage opportunities between model and observed prices. Having said that, it has two significant drawbacks, as there is no mean reversion present and the normality assumption allows negative nominal interest rates. In particular, not exhibiting mean reversion means that the distribution of the short rate widens with the time horizon, and the probability of negative rates increases. Hull and White (1990) take the Ho-Lee model one step further, and construct a model that exhibits mean reversion in the spirit of the Vasicek framework. For that reason the Hull-White model is also known as the extended Vasicek model. The short rate is given by
$$\mathrm{d}r_t = (\theta_t - \alpha r_t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$$

Now the short rate will revert towards θ_t/α, with θ_t a deterministic function of time. Although negative rates are permitted, in many cases the presence of mean reversion ensures that their probabilities are fairly small. Using exactly the same arguments as the ones in the Ho-Lee case, we can solve for the functional θ_T in terms of the forward curve f_t(T).

Implementation using trees

Models with time varying parameters give bond and bond option prices that are expressed as integrals of the forward curve. In practice, such models and their extensions are implemented through trees. In particular, the seminal papers of Hull and White (1994, 1996, henceforth HW) show how one can produce trinomial trees that will approximate a generic model of the form
$$\mathrm{d}x_t = (\theta_t - \alpha_t x_t)\,\mathrm{d}t + \sigma_t\,\mathrm{d}B_t, \qquad r_t = g(x_t)$$
The state variable x_t follows a generalized Ornstein-Uhlenbeck process. The mean reversion level, the speed of mean reversion and the volatility are allowed to be deterministic functions of time. The short rate process is given as a transformation of this state variable. Typical transformations are the identity g(x) = x and the exponential g(x) = exp{x}.

Constructing the tree
The calibration of an interest rate tree using the HW methodology is carried out in two stages, first building an auxiliary tree and then adjusting it to match the observed yield curve.

The first stage. In the first stage a trinomial tree is built that reverts to zero, approximating the diffusion
$$\mathrm{d}x_t = -\alpha x_t\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$$
As an example we will assume that the mean reversion parameter and the volatility are constant, but extensions are straightforward, if one wishes to render them time-varying. The tree that approximates the process is constructed recursively. Let us assume that the tree has been constructed up to time t_k, and denote its discretized values with x_{k,j}, for j = −m_k, …, m_k. We will show how to select the nodes and the transition probabilities that will grow this tree to time t_k + Δt. The first step is to select the grid spacing across the state space, for which HW suggest
$$\Delta x = \sigma\sqrt{3\,\Delta t}$$

Listing 7.3: Create Hull-White trees for the short rate.
We will also assume that the discretization across time is done in equal time steps, and therefore the space step is also the same through time. The implementation in listing 7.3 relaxes all these assumptions and constructs a tree with time varying α and σ, and also allows for variable time steps. We then construct the grid at time t_k, which extends across 2m_k + 1 elements (the choice of m_k will be discussed shortly)
$$\mathcal{X}_k = \big\{\, j\,\Delta x : j = -m_k, \ldots, 0, \ldots, m_k \,\big\}$$
Typically, from the point x_{k,j} = jΔx the process can move to the nodes {(j+1)Δx, jΔx, (j−1)Δx}. Then, one can solve a system that matches the instantaneous drift and volatility for the probabilities {p₊, p₀, p₋}
$$p_+ + p_0 + p_- = 1, \qquad (p_+ - p_-)\,\Delta x = -\alpha j\,\Delta x\,\Delta t, \qquad (p_+ + p_-)\,(\Delta x)^2 = \sigma^2\Delta t + (\alpha j\,\Delta x\,\Delta t)^2$$
The solution sets
$$p_+ = \frac{1}{6} + \frac{\varepsilon_j^2 - \varepsilon_j}{2}, \qquad p_0 = \frac{2}{3} - \varepsilon_j^2, \qquad p_- = \frac{1}{6} + \frac{\varepsilon_j^2 + \varepsilon_j}{2}, \qquad \varepsilon_j = \alpha j\,\Delta t$$
for all j = −m_k, …, m_k. If all these probabilities are positive, then the tree will grow and will have 2(m_k + 1) + 1 elements in the next time period. Encountering negative probabilities indicates that the mean reversion of the tree is too strong at these nodes for this particular transition structure. The geometry of the tree will then change, and the tree will stop growing. For example, having p₊ < 0 at the top of the grid indicates that the mean reversion is pushing the process towards zero quite strongly, and we therefore have to change the geometry of the tree and consider transitions towards the nodes {jΔx, (j−1)Δx, (j−2)Δx}. Of course, due to symmetry, we will encounter negative probabilities on the other end of the grid, suggesting transitions towards the nodes {(j+2)Δx, (j+1)Δx, jΔx}. Solving for these alternative transition geometries yields
$$p_0 = \frac{7}{6} + \frac{\varepsilon_j^2 - 3\varepsilon_j}{2}, \qquad p_- = -\frac{1}{3} - \varepsilon_j^2 + 2\varepsilon_j, \qquad p_{--} = \frac{1}{6} + \frac{\varepsilon_j^2 - \varepsilon_j}{2}$$
for the downward branching at the top of the grid, and
$$p_{++} = \frac{1}{6} + \frac{\varepsilon_j^2 + \varepsilon_j}{2}, \qquad p_+ = -\frac{1}{3} - \varepsilon_j^2 - 2\varepsilon_j, \qquad p_0 = \frac{7}{6} + \frac{\varepsilon_j^2 + 3\varepsilon_j}{2}$$
for the upward branching at the bottom. As we noted, in such cases the tree will not grow, and the next set of nodes will also have 2m_k + 1 elements. The top half of listing 7.3 implements this method for a more general setting, where the time steps, the mean reversion and the volatility are all time varying. It makes sense to select the value of m_k as the first level at which the branching geometry changes from the standard one to the ones that force mean reversion, that is the smallest j for which one of the standard probabilities would turn negative (for constant parameters Hull and White suggest the smallest integer greater than 0.184/(αΔt)).

The second stage. In the second stage the nodes of the tree that replicates x_t are shifted up or down in order to match the dynamics that price bonds exactly. This creates the trinomial tree that will be approximating r_t. To this end, when calibrating a HW tree we make use of the so called Arrow-Debreu (AD) state prices, which we define now. An Arrow-Debreu security is a generic contingent claim that will pay one monetary unit if a certain event is realized at a particular point in time. Otherwise the AD security pays nothing. The AD state price is the price of this security. It is easy to see that AD securities can be used as building blocks to construct more complex payoffs. In models with a continuous state space, the Arrow-Debreu security is a European style contract that pays off the Dirac delta function on its maturity. In the context of HW trees the state space is discretized, as at time T the short rate can take one out of K_T possible values. Then, an AD security will pay one pound if the short rate is at its j-th value at time T, and zero otherwise. We denote with Q(j, T) the price of this AD security at time zero, the AD state price. One can readily observe that if we purchase all Arrow-Debreu securities that mature at time T, then we are sure to receive one pound on that date. Effectively we have constructed the payoff of the risk free bond. Arbitrage arguments will then indicate that the sum of all AD state prices across states will equal the price of a zero coupon bond
$$\sum_{j=1}^{K_T} Q(j, T) = P_0(T)$$
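The branching probabilities derived above are simple to compute and to check for validity; the following minimal sketch (with illustrative parameter values) evaluates the standard geometry across a range of node indices and flags where the geometry must switch.

  % Standard trinomial branching probabilities at nodes j of a HW tree,
  % with e = alpha*j*dt; negative values signal a geometry change.
  alpha = 0.1; dt = 0.25; j = -5:5;
  e  = alpha*j*dt;
  pu = 1/6 + (e.^2 - e)/2;                % up move
  pm = 2/3 - e.^2;                        % middle
  pd = 1/6 + (e.^2 + e)/2;                % down move
  ok = (pu >= 0) & (pm >= 0) & (pd >= 0); % standard geometry valid where true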

We can also construct an inductive relationship that links AD securities with successive maturities T and T +1. In particular, like any other security, under the


risk neutral probability measure, discounted AD securities will form martingales. Suppose that the tree can take one of K_t different values at time t, and denote with x_t the actual level of the tree at that time, with x_t ∈ {1, 2, …, K_t}. Then we can write
$$Q(j, T) = \mathbb{E}^Q\Big[\frac{\mathbb{I}(x_T = j)}{B_T}\Big] = \mathbb{E}^Q\Big[\frac{1}{B_T}\,\Big|\,x_T = j\Big]\;\mathbb{P}^Q[x_T = j]$$
Conditioning on the state at time T − 1, and using the definition of the bank account process, allows us to expand the conditional expectation as
$$Q(j, T) = \sum_{i=1}^{K_{T-1}} e^{-r_{T-1,i}\,\Delta t}\;\mathbb{E}^Q\Big[\frac{1}{B_{T-1}}\,\Big|\,x_{T-1} = i\Big]\;\mathbb{P}^Q[x_{T-1} = i\,|\,x_T = j]\;\mathbb{P}^Q[x_T = j]$$
where r_{T−1,i} is the short rate at node i. Bayes' rule will provide us with
$$\mathbb{P}^Q[x_{T-1} = i\,|\,x_T = j] = \mathbb{P}^Q[x_T = j\,|\,x_{T-1} = i]\,\frac{\mathbb{P}^Q[x_{T-1} = i]}{\mathbb{P}^Q[x_T = j]}$$
The quantity p_T(i, j) = P^Q[x_T = j | x_{T−1} = i] is just the (risk neutral) transition probability of moving from state i to state j at time T. The AD state price is then simplified to
$$Q(j, T) = \sum_{i=1}^{K_{T-1}} p_T(i, j)\;e^{-r_{T-1,i}\,\Delta t}\;\mathbb{E}^Q\Big[\frac{1}{B_{T-1}}\,\Big|\,x_{T-1} = i\Big]\,\mathbb{P}^Q[x_{T-1} = i] = \sum_{i=1}^{K_{T-1}} p_T(i, j)\;e^{-r_{T-1,i}\,\Delta t}\;Q(i, T-1)$$
In the above expression r_{t,i} = g(x_{t,i} + δ_t), where δ_t is the shift applied to the nodes at time t. Essentially, in order to fit the observed yield curve one has to solve numerically for the value of δ_t at each maturity horizon T. If one also renders the volatility and/or the speed of mean reversion time varying, then the parameters have to be calibrated on a richer set of data that will identify these parameter values. Typically, deterministic volatility functions are chosen so as to match implied volatilities that are derived from caps or swaptions.
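For the identity transformation g(x) = x the shift admits a closed form at each step, since the discount factor condition can be solved for δ directly; the following one-step sketch uses toy values throughout. For the exponential transformation of the Black-Karasinski model the same equation must be solved numerically.

  % One step of the second-stage fitting for g(x) = x: choose delta so that
  % sum_j Q(j) * exp(-(x_j + delta)*dt) equals the observed discount factor.
  Q  = [0.2; 0.55; 0.2];               % AD prices at the current step (toy values)
  x  = [-0.01; 0; 0.01];               % unshifted tree nodes
  dt = 0.25; Pmkt = 0.94;              % observed discount factor to the next step
  delta = log(sum(Q .* exp(-x*dt)) / Pmkt) / dt;  % closed-form shift
  r = x + delta;                       % short rates on the shifted nodes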


Listing 7.4: Compute the price path of a payoff based on a Hull-White tree for the short rate.

Overall, the construction of an interest rate tree resembles the local volatility models for equity derivatives. In both frameworks we attempt to exactly replicate a market implied curve or surface. One has to keep in mind the dangers of overfitting, which would introduce spurious qualities into the model. In many cases market quotes of illiquid instruments can severely distort the model behavior.

Pricing and price paths

After the tree has been fitted to the yield curve, we can proceed to pricing various interest rate sensitive instruments, such as bond options, interest rate caps, floors, swaps or swaptions. Essentially we can find the fair value of a given stream of contingent cashflows, in a way that is consistent with the prices of risk free bonds. If there are no early exercise features, prices of contingent claims can be computed by summing up the corresponding AD security prices. In many cases we are not only interested in the fair value of the contract, but also in its price path. For example, in order to find the fair value of a put option with three year maturity, which is written on a ten-year bond, we need to consider the price paths of the ten-year bond, in order to ascertain the option payoffs. Price paths can be easily computed by iterating backwards through the tree, starting from the terminal date. Listing 7.4 shows how this can be easily implemented. To allow for early exercise one just has to check whether early exercise is optimal at each tree node (implemented in listing 7.5); a schematic sketch of this backward iteration is given after the listing.

Listing 7.5: The price path of a payoff based on the Hull-White tree when American features are present.
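The skeleton of the backward iteration is the following (a hypothetical sketch, not the book's listing; payoff and exercise stand for the terminal and intrinsic payoff functions of the contract at hand):

% Backward induction for a price path on the tree (hypothetical sketch).
% r(i,t) = short rate at node i and date t, p(i,k,t) = transition
% probabilities, dt = time step, K = number of nodes, N = number of dates.
V = zeros(K, N);
V(:,N) = payoff(r(:,N));                       % terminal payoff
for t = N-1:-1:1
    for i = 1:K
        cont = exp(-r(i,t)*dt)*(squeeze(p(i,:,t))*V(:,t+1));
        V(i,t) = max(cont, exercise(r(i,t)));  % early exercise check;
    end                                        % drop max(...) for European
end
price = V((K+1)/2, 1);                         % value at the root node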

The most popular special case of this very general specification is the Black and Karasinski (1991) model, which is in spirit similar to the exponential Vasicek model with time varying parameters. Here

dx(t) = θ(t)[μ(t) − x(t)] dt + σ(t) dB(t),   r(t) = exp{x(t)}

This specification exhibits mean reversion, and through the exponential transformation ensures that the short rate remains positive. As with all lognormal models, the Black-Karasinski model implies an explosive expectation for the bank account, but since in practice the implementation is done over a finite tree, this drawback is not severe. Listing 7.6 shows how the HW tree building methodology is applied in the Black-Karasinski case. The yield curve of figure (7.5.a) is assumed, and a HW tree is constructed. We use Δt = 1/16 over the first three years, Δt = 1/8 from the third to the tenth, and Δt = 1/2 for the remaining twenty years. A view of the resulting interest rate tree is given in figure (7.5.b), where this uneven time discretization is apparent. All models with time varying parameters can be cast in a binomial/trinomial form that approximates the short rate movements.


Listing 7.6: Implementation of the Black-Karasinski model using a Hull-White interest rate tree.

Figure 7.5: Calibration of a tree for the short rate that implements the Black-Karasinski model. The Hull-White framework is implemented. (Panels: (a) the yield curve, yield against maturity; (b) the short rate tree, short rate against time.)

For increasing maturities one adjusts the direction and probabilities of these trees to match the observed yields, and possibly the volatilities implied by caps and/or swaptions.


As with the local volatility models in equity derivative modeling, there is always the danger of over-fitting the observed volatility curves.

Of course, to value such a simple bond we do not need to construct the complete price path, and in fact we do not need to construct a HW tree at all. The fair price can be determined by using the yield curve alone, just by discounting all cashflows. The price path is needed though if we want to value an option on this ten year bond. As an example we consider a two year put, with strike price K = $80. To price this option we need the distribution of the bond price after two years. Figure (7.6.b) gives the possible bond prices and the corresponding price paths for the two year period. Essentially, the put option gives us the right to sell the bond at the strike price if the interest rates after two years are too high. The price paths for a European and an American version are illustrated in figure (7.7); the corresponding prices are PE = $3.08 and PA = $3.46, indicating an early exercise premium of $0.38, which is actually more than 10% of the option price. The red points (in figures 7.6.b and 7.7.b) indicate the scenarios where early exercise is optimal. We can observe how the coupon payments affect the exercise boundary, as we would prefer to exercise immediately after a coupon payment is realized.

7.6 Multi-factor models

Short rate models are also known as one-factor models, as there is only one source of uncertainty in the economy, which is represented by the short rate. Although short rate models are very useful for some applications, they are not sufficient for others. In particular, the presence of a single factor implies that yields of all maturities are perfectly correlated (albeit with different volatilities). This is easily illustrated in the case of the exponentially affine models, like Vasicek or CIR, where the yield is a linear function of the current short rate. Yields for two different maturities τ₁ and τ₂, and their dynamics, are given by

y(t, t+τ) = A(t; t+τ) + B(t; t+τ) r(t)
dy(t, t+τ) = B(t; t+τ) dr(t)

Therefore, the correlation of the two yields is

E[dy(t, t+τ₁) dy(t, t+τ₂)] / √( E[(dy(t, t+τ₁))²] E[(dy(t, t+τ₂))²] ) = B(t; t+τ₁) B(t; t+τ₂) / √( B²(t; t+τ₁) B²(t; t+τ₂) ) = 1

In practice, this correlation might be high, but it is not perfect. For example, table 7.1 presents the historical correlation of various bonds with different maturities. Although the correlation is positive across the board, its magnitude varies substantially at different horizons. In particular, the long end of the yield curve is much more strongly correlated than the short end: the ten and twenty year bonds move pretty much in unison, with correlation over 95%, while the one- and three-month yields


T . : Correlations of yields for dierent maturities. Bonds with longer maturities exhibit relatively higher correlation.
1m 3m 6m 1.00 0.56 0.40 1.00 0.76 1.00 1y 0.28 0.55 0.85 1.00 2y 0.21 0.42 0.68 0.87 1.00 3y 0.19 0.40 0.64 0.83 0.97 1.00 5y 0.18 0.36 0.59 0.78 0.92 0.96 1.00 7y 0.16 0.32 0.55 0.74 0.88 0.93 0.98 1.00 10y 0.15 0.30 0.52 0.71 0.85 0.89 0.96 0.98 1.00 20y 0.10 0.24 0.44 0.64 0.77 0.83 0.90 0.94 0.96 1.00 1m 3m 6m 1y 2y 3y 5y 7y 10y 20y

exhibit about half of this dependence. Also, each maturity exhibits correlations that decay as we consider bonds with increasing maturity differences. For example, the two-year bond is more strongly correlated with the three-year than with the seven-year instrument.

One way of increasing the number of free parameters is by considering multi-factor models. For example, we can consider the interest rate to be the sum of two simple Vasicek processes, by setting r(t) = x⁽¹⁾(t) + x⁽²⁾(t), where each factor follows

dx⁽ⁱ⁾(t) = κ_i [θ_i − x⁽ⁱ⁾(t)] dt + σ_i dB⁽ⁱ⁾(t),   i = 1, 2

The two processes x⁽¹⁾ and x⁽²⁾ are called factors, and are in principle unobserved. In the general specification we can also assume the factors to exhibit some correlation ρ.

Postulating the short rate to be the sum of factors is not the only way to construct multi-factor models. For example, in an early article Brennan and Schwartz (1982) consider a model where the long run interest rate is also stochastic, serving as the second factor. Intuitively, long swings in the short rate are determined by the latter process, which exhibits weak mean reversion and is largely a proxy for the business cycle. This slowly mean reverting process determines the behavior of the long end of the yield curve, while the short end is determined by the process that reverts faster towards the long run short rate process. More recently, Longstaff and Schwartz (1992) propose a model where the second factor is the volatility of the short rate. The mean reverting nature of the short rate


implies that the effect of stochastic volatility will have a higher impact on the short end of the curve.

In order to achieve a perfect match to a given yield curve, Brigo and Mercurio (2001) describe a method that enhances the multi-factor specification by adding a deterministic function of time φ(t),

r(t) = x₁(t) + x₂(t) + φ(t)

Brigo and Mercurio (2001) show how one can retrieve φ(t) for a variety of processes, including multi-factor Vasicek and CIR.
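To get a feel for such specifications, the following sketch simulates a path of a two-factor Vasicek short rate. The parameter values are hypothetical and purely illustrative.

% Euler simulation of r = x1 + x2, each factor a Vasicek process
% (hypothetical parameter values; correlated factor shocks).
kappa = [0.80; 0.05];  theta = [0.02; 0.03];  sigma = [0.010; 0.008];
rho = 0.3;  dt = 1/252;  N = 2520;
C = chol([1 rho; rho 1], 'lower');   % correlates the Brownian increments
x = theta;  r = zeros(N,1);
for t = 1:N
    dB = C*randn(2,1)*sqrt(dt);
    x  = x + kappa.*(theta - x)*dt + sigma.*dB;
    r(t) = sum(x);                   % the short rate is the sum of factors
end

With a fast-reverting first factor and a slowly-reverting second factor, the simulated yields of different maturities are highly but not perfectly correlated, in line with table 7.1.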

Factor analysis of the yield curve
The historical relative moves of yields for various maturities also provide motivation for using multi-factor specifications. In particular, although yields are strongly correlated, they don't always move in the same direction. In fact, there are periods when the yield curve remains relatively flat, and episodes of steeply rising yield curves. A single-factor model is not adequate to reproduce such patterns, as it will always generate yields that move together across all maturities.

Principal Component Analysis (PCA) techniques can be employed to explore the variability and correlations of various yields. PCA has as input a set of N correlated series, and decomposes them into N uncorrelated components, which are called the factors. In order to do so, the covariance structure is computed and its eigen-structure is produced. The eigenvectors with the largest eigenvalues point towards the most important factors, and can be utilized to investigate which proportion of the variability of the original series is explained by individual factors. Typically, one looks for a set of factors that will explain 90-95% of the total variability. In our setting the input series are the yields for the various maturities. Therefore, each yield y_i = y(τ_i) is written as the sum

y_i = a_{i1} f_1 + a_{i2} f_2 + ··· + a_{iN} f_N

The coefficients a_{ij} are called factor loadings, and essentially determine the sensitivity of yield i to factor j. They are determined by applying an eigenvector decomposition on the covariance matrix Σ. In particular, if Σ has a Cholesky decomposition Σ = C′C, then the yields can be written as y = C′z, where z is a vector of uncorrelated random variables. If we also denote with Λ and V the matrix of eigenvalues and eigenvectors of Σ, then the loadings follow from the eigenvectors, scaled by the square roots of the corresponding eigenvalues. The importance of the j-th factor and the cumulative importance of the first factors are shown in table 7.2 (in %). The first three factors explain more than 95% of the interest rate variability and co-movements. These factors are interpreted as the level, spread and convexity factors.

Table 7.2: Relative magnitude of the eigenvalues for the decomposition of the correlation matrix. The first three factors are responsible for over 95% of the yield variability.

factor           1     2     3     4     5     6     7     8     9    10
individual (%)  81.7   8.8   4.6   2.2   1.1   0.6   0.3   0.3   0.2   0.2
cumulative (%)  81.7  90.6  95.2  97.3  98.4  98.9  99.3  99.6  99.8 100.0
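A compact sketch of this decomposition in Matlab follows; Y is a hypothetical T-by-N panel of yields, one column per maturity.

% PCA on yield changes (hypothetical data matrix Y).
dY = diff(Y);                        % yield changes
S  = cov(dY);                        % covariance matrix
[V, D] = eig(S);                     % eigenvectors and eigenvalues
[lam, idx] = sort(diag(D), 'descend');
V = V(:, idx);                       % loadings, most important factor first
expl    = 100*lam/sum(lam);          % importance of each factor (in %)
cumexpl = cumsum(expl);              % cumulative importance, cf. table 7.2
F = dY*V;                            % the uncorrelated factor series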

7.7 Forward rate models
The models described so far take the short rate as the single modeling primitive. Forward rate models instead work with the whole curve of forward rates, which we define first. Investing over the period (t, T) can be seen as similar to investing first over (t, S) and then reinvesting over (S, T). Of course we don't know at time t what interest rate will prevail at time S for a bond that matures at T; therefore the second strategy is not risk free. But we can lock at time t an interest rate which will be applied over the interval (S, T). This will be the forward rate F(t; S, T). As S → T we have the instantaneous forward rate F(t; T). No arbitrage indicates that the (continuously compounded) forward rate is

F(t; S, T) = − [log P(t; T) − log P(t; S)] / (T − S)

Forward rates and bond prices

The instantaneous forward rates are closely linked to bonds. In the limit, as we split the interval (S, T) into subintervals, we reach

log P(t; T) − log P(t; S) = − ∫_S^T F(t; u) du

which yields, for S = t,

log P(t; T) = − ∫_t^T F(t; u) du
P(t; T) = exp( − ∫_t^T F(t; u) du )
F(t; T) = − ∂ log P(t; T)/∂T
Even though multi-factor models consider a larger set of parameters to be calibrated, they are still finite; therefore a perfect fit to the initial yield curve cannot be ensured. Also, we are increasingly interested in matching volatility structures implied from cap/swaption prices.
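The mapping between discount factors and forward rates is easy to verify numerically; the sketch below uses a made-up smooth yield curve (all inputs hypothetical).

% Forward rates from a discount curve by finite differences.
tau = (0.5:0.5:10)';                 % maturities in years
y   = 0.05 - 0.01*exp(-0.5*tau);     % a hypothetical smooth yield curve
P   = exp(-y.*tau);                  % discount factors
% simple forward rate over (tau(i), tau(i+1)):
%   F = -(log P(t;T) - log P(t;S))/(T - S)
F = -diff(log(P))./diff(tau);
% as the subinterval shrinks, F tends to the instantaneous forward
% F(t;T) = -d log P(t;T)/dT, here approximated at the grid midpoints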


Forward rate models have the whole yield curve as an input (and possibly a volatility curve as well). These models were introduced in Heath, Jarrow, and Morton (1992, HJM): we exploit the link between bond prices and the forward curve. In essence we model each bond maturity with a separate SDE; therefore we are facing a system of infinitely many SDEs, with the initial forward curve as a boundary condition. Of course, some relationships will ensure that no arbitrage is permitted. In particular, if the forward rate dynamics are given by

dF(t; T) = μ(t, T) dt + σ(t, T) dW(t)

Itô's formula gives the bond dynamics

dP(t; T) = P(t; T) { [ r(t) + A(t, T) + ½ S²(t, T) ] dt + S(t, T) dW(t) }
The functions A and S are given by

A(t, T) = − ∫_t^T μ(t, u) du,   S(t, T) = − ∫_t^T σ(t, u) du
If we use the current account for discounting (as the numéraire), we expect the discounted bonds to form martingales,

P(t; T)/B(t) = E[ P(T; T)/B(T) ] = E[ 1/B(T) ]

HJM show that the resulting no-arbitrage condition is

μ(t, T) = σ(t, T) ∫_t^T σ(t, u) du
HJM

HJM models need two inputs: the initial forward curve and the volatility structure. The forward curve follows from the yield curve and is specified under risk neutrality. Since the forwards are given as derivatives of the yield curve, one has to be careful when constructing the yield curve: many instruments on top of bonds are also used for that (e.g. swaps, futures). The volatility structure can be specified using PCA on past yield curves; volatility structures are the same under all measures. Volatilities implied from derivatives can also be used.
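As a crude illustration (not the book's implementation), one Euler step of a one-factor HJM simulation under the risk neutral measure could read as follows; f and sig are hypothetical columns holding the instantaneous forwards and their volatilities on a maturity grid.

% One Euler step of a one-factor HJM simulation (hypothetical sketch).
% f(i)   = instantaneous forward for the i-th maturity grid point,
% sig(i) = the corresponding volatility sigma(t,T), grid spacing dtau.
dt = 1/52;  dtau = 0.25;
% no-arbitrage drift: mu(t,T) = sig(t,T) * int_t^T sig(t,u) du
mu = sig.*(cumsum(sig)*dtau);
dW = sqrt(dt)*randn;                 % a single Brownian shock
f  = f + mu*dt + sig*dW;             % evolve the forwards of all maturities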


Short rate versus forward rate models

Short rate models are Markovian in nature and easy to work with: many derivative prices are available in closed form, and tree building, finite differences and simulation are all easy. On the other hand, arbitrage relative to the observed curve is not ruled out, and the volatility structure can be hard to model.

Forward rate models are arbitrage-free by nature, allow very flexible volatility structures, and make it easy to include many factors. On the other hand, the short rate is generally non-Markovian, there are no Feynman-Kac representations, and hence no trees or finite differences; only simulations are available.

7.9 Bond options

There is a large number of bond and interest rate derivatives with a liquid market: forwards, swaps, bond options, caplets, floorlets, caps, floors and swaptions are some examples. The pricing of bond options is most important, since prices of caplets and floorlets can be expressed as options on a zero-coupon bond, while swaptions can be expressed as options on a coupon paying bond. Unlike equity options, bond options have some distinctive features that arise from the nature of interest rates. For example, bond prices are known both at the current time and on maturity T, a feature known as pull to par.

Black (1976, B76) in an influential paper considered the pricing of options on commodities. Commodities have specific cycles, storage and availability costs that are not captured in the standard BS methodology. With some modifications the B76 formula is used in the market to quote bonds, caps, swaptions, etc. The B76 formula assumes that bonds or interest rates (depending on the instrument) are log-normally distributed, with σ being the measure of volatility. Implied volatility curves are constructed for all these instruments. See Brigo and Mercurio (2001) for all relevant formulas.
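For concreteness, a sketch of the B76 formulas for a call and a put on a forward/bond price F (hypothetical inputs; P0T denotes the discount factor to expiry):

% Black (1976) prices for options on a forward price F (hypothetical inputs).
F = 95;  K = 90;  sig = 0.10;  T = 2;  P0T = 0.91;
d1 = (log(F/K) + 0.5*sig^2*T)/(sig*sqrt(T));
d2 = d1 - sig*sqrt(T);
call = P0T*(F*normcdf(d1) - K*normcdf(d2));
put  = P0T*(K*normcdf(-d2) - F*normcdf(-d1));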


7.10 Changes of numeraire

In general, if the payoffs are given as Φ(r(T)), then the price at time t is given by

Π(t) = E^Q[ exp( − ∫_t^T r(u) du ) Φ(r(T)) ] = E^Q[ (B(t)/B(T)) Φ(r(T)) ]

If the discounting factor B(t)/B(T) was independent of the payoffs we would be able to split the expectation; but since we are dealing with stochastic rates, it is not. Implicitly we are using the bank account as the numéraire, but there is nothing special with this choice. As shown in Geman, el Karoui, and Rochet (1995), one can choose any positive asset as the numéraire. For each numéraire there exists an equivalent measure, under which every asset expressed in units of the numéraire is a martingale. That is to say, if N(t) is the process of the numéraire, then there exists a measure N induced by this numéraire, such that

X(t)/N(t) = E^N[ X(T)/N(T) ]

for any asset process X(t). Then, we can express the value at t as

X(t) = N(t) E^N[ X(T)/N(T) ]

Given a problem, a good numéraire choice can simplify things enormously. For example, we can use the bond that matures at time T as the numéraire; then all asset prices are given in terms of this asset (rather than currency units). If T is the measure induced by this bond, we can write the price of the payoffs as

Π(t) = P(t; T) E^T[ Φ(r(T)) / P(T; T) ] = P(t; T) E^T[ Φ(r(T)) ]

To make this approach operational we need to find under which measure T all bonds discounted with P(t; T) form martingales.

A variant of the HJM model which constructs lognormal rates was proposed in Brace, Gatarek, and Musiela (1995) and Miltersen, Sandmann, and Sondermann (1997).


Since it produces prices that agree with the B76 market quotes, the model has been coined the market model. It uses fixed-maturity forward rates, rather than the instantaneous forward rate, since prices become explosive in the latter case. Typically the 3-month Libor rate is used as the underlying. This is the model of choice for many practitioners.


Figure 7.6: Price path for a ten year 5.50% coupon bearing bond. The price paths are consistent with the yield curve of figure (7.5), modeled using the Black-Karasinski process. (Panels: (a) the ten year bond over its full life; (b) the initial two year period; both plot bond price against time.)
Figure 7.7: Price path for a two year put option, written on the ten year coupon bearing bond of figure (7.6). The strike price is set at $80. (Panels: (a) European style; (b) American style; both plot option price against time.)

Figure: Yield curve factor loadings (first, second and third factors, plotted against maturity).


Figure: Yield and one-year forward curves (short/forward rate plotted against time).


Figure: Pull-to-par and bond options. (Panels: the CIR distribution of the one year zero price after 0.5 years, and the corresponding interest rate distribution; both histograms of frequency.)

Figure: Cash flows for interest rate caplets and caps. (Panels: a rate realization against a 9×1 year ATM cap; the corresponding cap cash flows.)

Figure: Typical Black volatilities for caplets and caps (implied volatility in %, plotted against maturity).

8 Credit risk

A Using Matlab with Microsoft Excel

In many practical situations one needs to export Matlab functionality to a spreadsheet programme like Microsoft Excel. Fortunately, Microsoft Windows provides the functionality via COM component objects. Using the Matlab compiler one can build a standalone COM component in the form of a dynamically linked library (DLL) which can be invoked from Visual Basic for Applications (VBA), the programming language used throughout Excel. By using VBA one can construct an Excel add-in which can be exported to any computer running Excel. One of the main benefits is that all required Matlab functions are exported, and therefore the host computer need not have Matlab installed.¹ Also, Graphical User Interfaces (GUIs) can be constructed easily using VBA. The main reference of this appendix is MathWorks (2005, chapter 4.18). In this appendix we will describe the procedure, and we will produce functions that implement the Black and Scholes (1973) pricing model for calls and puts. There are four+one steps in creating the Excel-Matlab link:
0. Set up the C/C++ compiler to work with Matlab.
1. Write the functions in Matlab and create the COM component.
2. Write the VBA code that communicates with the DLL and performs the operations.
3. Create the GUI in Excel.
4. Put everything in a package that can be readily installed on any computer with Excel.
¹ Computers that don't run Matlab will need the Matlab Component Runtime (MCR) set of libraries, which is freely available.


A.1 Setting up the C/C++ compiler

Before starting the procedure we have to ensure that the Matlab C compiler is properly set up. Running mbuild -setup at the Matlab command prompt will allow us to select the compiler we want to use. Since we need to compile COM components we will need the Microsoft Visual C/C++ compiler on our system. It is truly unbelievable, but Microsoft is giving away the compiler for free as part of the Visual C++ 2005 Express Edition (VC). This is recorded as VC version 8.0, and it is only compatible with Matlab 7.3.² You can download VC from Microsoft; download and run the installer, and install it to the default directory.

The second step is to get the Windows Platform SDK (Windows Server 2003 R2) from the web. You should download the installer; it is important to select a custom install and set the target directory to the location where Matlab looks for some necessary files. You don't need to install all components; only a small subset is required.

After everything is installed we are ready to set up the Matlab compiler. At the Matlab prompt just input mbuild -setup and then select the appropriate compiler.
² If you run an earlier version of Matlab, like 7.0.4, you will need VC 6.0, 7.0 or 7.1. VC 7.1 is shipped with the Visual C++ Toolbox 2003, which is not officially supported but is out there on the net.

Listing A.1: Matlab file.

Listing A.2: Matlab file.

A.2 Writing the Matlab functions

We will now write the Matlab functions that implement the standard BS prices and some hedging parameters. We are interested in passing whole arrays as arguments, and also want arrays to be returned. There are many ways of doing this, but to illustrate how different arrays are passed we will create two functions, shown in listings A.1 and A.2. Both functions should be straightforward: the set of parameters is input (perhaps some in vector form) and the prices and deltas are returned. The second step is to create the COM component using the Matlab Excel builder, which is invoked from the Matlab command prompt. A window for the Matlab builder will then open, where we start a new Builder project and set the component name, the class name and the project version, as shown in figure A.1.
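As a rough indication, such a pricing function could look as follows (a hypothetical sketch, not the book's listing A.1 or A.2; the name and signature are illustrative):

% Hypothetical sketch of an exported pricing function (cf. listings A.1-A.2).
function [P, D] = bscall(S, K, r, sig, T)
% Black-Scholes call prices and deltas; all inputs can be arrays of the
% same size, so whole ranges can be passed from Excel.
d1 = (log(S./K) + (r + 0.5*sig.^2).*T)./(sig.*sqrt(T));
d2 = d1 - sig.*sqrt(T);
P  = S.*normcdf(d1) - K.*exp(-r.*T).*normcdf(d2);   % call prices
D  = normcdf(d1);                                   % call deltas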

The next step is to add the two files to the project. We must now save the project, and then build the component. Two subfolders are created in the project folder, as shown in figure A.2.


Figure A.1: Screenshots of the Matlab Excel Builder.


Figure A.2: The folders created by the builder. (Panels (a) and (b) show the contents of the two subfolders.)

A.3 Writing the VBA code

Open Excel and go to the Visual Basic editor. We will now need to tell Excel that the Matlab libraries shall become available: within VBA open the references dialog and select the necessary libraries which are now available, namely the one we wrote and one that contains general Matlab utilities.


Now we will need to write some VBA code that initializes the add-in and also define some global variables that will be kept between function calls in Excel. To do that we need a module: right-click in the project tree to insert one, change the name of this module (at its properties), and insert the code given in listing A.3.

Listing A.3: VBA module.
Listing A.4: VBA activation handlers.

We now need to turn to the GUI, which can be as in the screenshot of figure A.3. The components need some event handlers that will respond to the activation of the form and to user input. All this code must reside within the form code. When the form is activated some initial values must be set, which is done in listing A.4. The user can then click either of the two buttons (listings A.5 and A.6).

A.4 Creating the add-in

The last part is to create the code that puts the add-in in Excel. Right-click in the project tree and add the code of listing A.7. This code will install and uninstall the add-in in the menu of Excel: a menu item is added which invokes the subroutine when clicked. Now we can save the add-in, ready for packaging. Note that the Excel file has to be saved as an Excel add-in file (extension .xla).

A.5 Installing and packaging the add-in

To check that everything is all right, we can close Excel and reopen it. Within the list of add-ins we should be able to locate the one we created.

Listing A.5: VBA user input handlers I.

Listing A.6: VBA user input handlers II.

Listing A.7: VBA add-in installation.

A menu item should now be present; invoking this item should allow us to run the DLL and compute option prices. A screenshot of the add-in in action is given in figure A.3.

Figure A.3: Screenshot of the add-in.

Going back to Matlab and the Excel Builder tool, we can package the component. Matlab will put together the DLL and the add-in file, and will create an executable that registers the dynamic library with Windows. We can now ship the add-in and use it on a computer that does not have Matlab installed, but the host computer must have the freely available Matlab Component Runtime (MCR) libraries. The MCR must be the same version as the Matlab that created our file. We can build the MCR with our Matlab installation, and then ship it together with our add-in. Note that we only need to do this once: after the host computer has the MCR properly set up, we can add more add-ins, given that they have been created using the same Matlab version.

References

Albrecher, H., P. Mayer, W. Schoutens, and J. Tistaert (2007, January). The little Heston trap. Wilmott Magazine, 83–92.
Andricopoulos, A. D., M. Widdicks, P. W. Duck, and D. P. Newton (2003). Universal option valuation using quadrature methods. Journal of Financial Economics 67, 447–471.
Bachelier, L. (1900). Théorie de la Spéculation. Gauthier-Villars.
Bailey, D. H. and P. N. Swarztrauber (1991). The fractional Fourier transform and applications. SIAM Review 33(3), 389–404.
Bailey, D. H. and P. N. Swarztrauber (1994). A fast method for the numerical evaluation of continuous Fourier and Laplace transforms. SIAM Journal on Scientific Computing 15(5), 1105–1110.
Baillie, R. T., T. Bollerslev, and H. O. Mikkelsen (1993). Fractionally integrated generalized autoregressive conditional heteroscedasticity. Journal of Econometrics.
Bajeux, I. and J. C. Rochet (1996). Dynamic spanning: Are options an appropriate instrument? Mathematical Finance 6, 1–16.
Bakshi, G., C. Cao, and Z. Chen (1997). Empirical performance of alternative option pricing models. The Journal of Finance 5, 2003–2049.
Bakshi, G. and D. Madan (2000). Spanning and derivative-security valuation. Journal of Financial Economics 55, 205–238.
Barle, S. and N. Cakici (1998). How to grow a smiling tree. Journal of Financial Engineering 7(2), 127–146.
Barndorff-Nielsen, O. E. (1998). Processes of normal inverse Gaussian type. Finance and Stochastics 2, 41–68.
Barone-Adesi, G., R. Engle, and L. Mancini (2004). GARCH options in incomplete markets. Working Paper.
Bates, D. S. (1998). Pricing options under jump-diffusion processes. Technical Report 37/88, The Wharton School, University of Pennsylvania.
Bates, D. S. (2000). Post-'87 crash fears in S&P500 futures options. Journal of Econometrics 94, 181–238.


Bates, D. S. (2005). Maximum likelihood estimation of latent affine processes. Review of Financial Studies, forthcoming.
Bauwens, L., S. Laurent, and J. Rombouts (2006). Multivariate GARCH models: a survey. Journal of Applied Econometrics 21(1), 79–109.
Ben Hamida, S. and R. Cont (2005). Recovering volatility from option prices by evolutionary computation. Journal of Computational Finance 8(4), XXXX.
Bingham, N. H. and R. Kiesel (2000). Risk-Neutral Valuation. London, UK: Springer-Verlag.
Black, F. (1972). Capital market equilibrium with restricted borrowing. Journal of Business 45, 444–455.
Black, F. (1976). The pricing of commodity contracts. Journal of Financial Economics 3(1), 167–179.
Black, F. and P. Karasinski (1991). Bond and option prices when short rates are lognormal. Financial Analysts Journal (Jul–Aug), 52–59.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–659.
Bollerslev, T., R. Engle, and D. Nelson (1994). ARCH models. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, IV. Amsterdam: North Holland.
Bollerslev, T., R. F. Engle, and J. M. Wooldridge (1988, February). A capital asset pricing model with time varying covariances. Journal of Political Economy 96(1), 116–131.
Bollerslev, T. R. (1986). Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics 31, 307–327.
Bouchaud, J.-P. and M. Potters (2001). More stylized facts of financial markets: Leverage effect and downside correlation. Physica A 299, 60–70.
Brace, A., D. Gatarek, and M. Musiela (1995). The market model of interest rate dynamics. Working paper, University of New South Wales, Australia.
Breeden, D. T. and R. Litzenberger (1978). Prices of state contingent claims implicit in option prices. Journal of Business 51, 621–651.
Brennan, M. J. (1979). The pricing of contingent claims in discrete time models. The Journal of Finance 34, 53–68.
Brennan, M. J. and E. Schwartz (1982). An equilibrium model of bond pricing and a test of market efficiency. Journal of Financial and Quantitative Analysis 3, 301–329.
Brigo, D. and F. Mercurio (2001). Interest Rate Models: Theory and Practice. New York, NY: Springer Verlag.
Carr, P. (2002). Frequently asked questions in option pricing theory. Technical report, forthcoming, Journal of Derivatives.
Carr, P., H. Geman, D. Madan, and M. Yor (2002). The fine structure of asset returns: An empirical investigation. Journal of Business 75(2), 305–332.
Carr, P., H. Geman, D. Madan, and M. Yor (2003). Stochastic volatility for Lévy processes. Mathematical Finance 13(3), 345–382.
Carr, P. and D. Madan (1999). Option valuation using the Fast Fourier Transform. Journal of Computational Finance 3, 463–520.


Carr, P. and D. Madan (2005). A note on sufficient conditions for no arbitrage. Finance Research Letters 2, 125–130.
Carr, P. and L. Wu (2004). Time-changed Lévy processes and option pricing. Journal of Financial Economics 71(1), 113–141.
CBOE (2003). VIX CBOE volatility index. White Paper, Chicago Board Options Exchange.
Chourdakis, K. (2002). Continuous time regime switching models and applications in estimating processes with stochastic volatility and jumps. Technical Report 464, Queen Mary, University of London.
Chourdakis, K. (2005). Option pricing using the Fractional FFT. Journal of Computational Finance 8(2), 1–18.
Christie, A. (1982). The stochastic behavior of common stock variances: Value, leverage and interest rate effects. Journal of Financial Economics 3, 407–432.
Cont, R. and J. da Fonseca (2002). Dynamics of implied volatility surfaces. Quantitative Finance 2(1), 45–60.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53, 385–407.
Crépey, S. (2003). Calibration of the local volatility function in a generalized Black-Scholes model using Tikhonov regularization. SIAM Journal of Mathematical Analysis 34, 1183–1206.
Davidson, R. and J. G. MacKinnon (1985, February). The interpretation of test statistics. Canadian Journal of Economics 18(1), 38–57.
Derman, E. (1999). Regimes of volatility. RISK 12(4), 55–59.
Derman, E. and I. Kani (1994). Riding on a smile. RISK 7(2), 32–39.
Derman, E. and I. Kani (1998). Stochastic implied trees: Arbitrage pricing with stochastic term and strike structure of volatility. International Journal of Theoretical and Applied Finance 1(1), 61–110.
Derman, E., I. Kani, and N. Chriss (1996). Implied trinomial trees of the volatility smile. Journal of Derivatives 3(4), 7–22.
Derman, E., I. Kani, and J. Z. Zou (1996, July). The local volatility surface: Unlocking the information in index options pricing. Financial Analysts Journal, 25–36.
Dothan, U. (1978). On the term structure of interest rates. Journal of Financial Economics 6(1), 59–69.
Duan, J.-C. (1995). The Garch option pricing model. Mathematical Finance 5(1), 13–32.
Duan, J.-C., G. Gauthier, and J.-G. Simonato (1999). An analytical approximation for the Garch option pricing model. Journal of Computational Finance 2, 75–116.
Dueker, M. (1997). Markov switching in GARCH processes and mean-reverting stock-market volatility. Journal of Business and Economic Statistics 15, 26–34.
Duffie, D. and R. Kan (1996). A yield-factor model of interest rates. Mathematical Finance 6(4), 379–406.
Duffie, D., J. Pan, and K. Singleton (2000). Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.


Dupire, B. (1993). Pricing and hedging with smiles. In Proceedings of the AFFI Conference, La Baule.
Dupire, B. (1994). Pricing with a smile. RISK 7(1), 18–20.
Engle, R. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987–1008.
Engle, R. and F. K. Kroner (1995). Multivariate simultaneous generalized ARCH. Econometric Theory 11, 122–150.
Eraker, B., M. Johannes, and N. Polson (2001). MCMC analysis of diffusion models with application to finance. Journal of Business and Economic Statistics 19(2), 177–191.
Fama, E. F. (1965). The behavior of stock market prices. Journal of Business 38, 34–105.
Feller, W. E. (1951). Two singular diffusion problems. Annals of Mathematics 54, 173–182.
Figlewski, S. and X. Wang (2000). Is the leverage effect a leverage effect? Working Paper, SSRN 256109.
Gallant, A. R. and G. Tauchen (1993). SNP: A program for nonparametric time series analysis. Version 8.3 user's guide. Working Paper, University of North Carolina.
Gatheral, J. (1997). Delta hedging with uncertain volatility. In I. Nelken (Ed.), Volatility in the Capital Markets: State-of-the-Art Techniques for Modeling, Managing, and Trading Volatility. Glenlake Publishing Company.
Gatheral, J. (2004). A parsimonious arbitrage-free implied volatility parameterization with application to the valuation of volatility derivatives. In Global Derivatives and Risk Management.
Gatheral, J. (2006). The Volatility Surface: A Practitioner's Guide. New York, NY: Wiley Finance.
Geman, H., N. el Karoui, and J.-C. Rochet (1995). Changes of numéraire, changes of probability measure and option pricing. Journal of Applied Probability 32, 443–458.
Gerber, H. U. and E. S. W. Shiu (1994). Option pricing by Esscher transforms. Transactions of the Society of Actuaries XLVI, 99–191.
Ghysels, E., A. Harvey, and E. Renault (1996). Stochastic volatility. In G. Maddala and C. Rao (Eds.), Handbook of Statistics, 14, Statistical Methods in Finance. North Holland.
Glosten, L. R., R. Jagannathan, and D. Runkle (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance 48(5), 1779–1801.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University Press.
Hamilton, J. D. and R. Susmel (1994). Autoregressive conditional heteroscedasticity and changes in regime. Journal of Econometrics 64, 307–333.
Harvey, A., E. Ruiz, and N. Shephard (1994). Multivariate stochastic variance models. Review of Economic Studies 61, 247–264.


Heath, D., R. Jarrow, and A. Morton (1992). Bond pricing and the term structure of interest rates: A new methodology. Econometrica 60(1), 77–105.
Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6, 327–344.
Heston, S. L. and S. Nandi (2000). A closed-form GARCH option pricing model. Review of Financial Studies, forthcoming.
Ho, T. S. Y. and S.-B. Lee (1986). Term structure movements and pricing interest rate contingent claims. Journal of Finance 41, 1011–1029.
Hull, J. C. (2003). Options, Futures and Other Derivatives (5th ed.). New Jersey, NJ: Prentice Hall.
Hull, J. C. and A. White (1987). The pricing of options with stochastic volatilities. The Journal of Finance 42, 281–300.
Hull, J. C. and A. White (1990). Pricing interest rate derivative securities. Review of Financial Studies 3(4), 573–592.
Hull, J. C. and A. White (1994). Numerical procedures for implementing term structure models I. Journal of Derivatives 2, 7–16.
Hull, J. C. and A. White (1996). Using Hull-White interest rate trees. Journal of Derivatives, 26–36.
Ikonen, S. and J. Toivanen (2004). Operator splitting methods for American option pricing. Applied Mathematics Letters 17, 809–814.
ISDA (1998). EMU and market conventions: Recent developments. International Swaps and Derivatives Association document BS:9951.1.
Jackwerth, J. C. and M. Rubinstein (1996). Recovering probability distributions from options prices. The Journal of Finance 51, 1611–1631.
Javaheri, A. (2005). Inside Volatility Arbitrage: The Secrets of Skewness. Hoboken, NJ: Wiley.
Kahl, C. and P. Jäckel (2005, September). Not-so-complex logarithms in the Heston model. Wilmott Magazine, 94–103.
Karpoff, J. (1987). The relation between price changes and trading volume: A survey. Journal of Financial and Quantitative Analysis 22, 109–126.
Kendall, M. and A. Stuart (1977). The Advanced Theory of Statistics (4th ed.), Volume I. London, UK: Charles Griffin and Co.
Lagnado, R. and S. Osher (1997). A technique for calibrating derivative security pricing models: Numerical solutions of an inverse problem. Journal of Computational Finance 1(1), 13–25.
Lamoureux, C. and W. Lastrapes (1990). Persistence in variance, structural change, and the GARCH model. Journal of Business and Economic Statistics 23, 225–234.
Lee, R. (2004a). The moment formula for implied volatility at extreme strikes. Journal of Mathematical Finance 14(3), 469–480.
Lee, R. (2004b). Option pricing by transform methods: extensions, unification and error control. Journal of Computational Finance 7(3), 51–86.


Longstaff, F. A. and E. S. Schwartz (1992). Interest rate volatility and the term structure: A two factor general equilibrium model. Journal of Finance 47(4), 1259–1282.
Madan, D., P. Carr, and E. Chang (1998). The variance gamma process and option pricing. European Finance Review 2, 79–105.
Mandelbrot, B. and H. Taylor (1967). On the distribution of stock price differences. Operations Research 15, 1057–1062.
Marchuk, G. I. (1990). Splitting and alternating direction methods. In Handbook of Numerical Analysis, Volume 1, pp. 197–462. Amsterdam: North Holland.
MathWorks (2005). Matlab Builder for Excel 1.2.5 (User's Guide). The MathWorks.
McKee, S., D. P. Wall, and S. K. Wilson (1996). An alternating direction implicit scheme for parabolic equations with mixed derivative and convective terms. Journal of Computational Physics 126(1), 64–76.
Merton, R. (1976). Option pricing when the underlying stock returns are discontinuous. Journal of Financial Economics 4, 125–144.
Merton, R. C. (1973). Theory of rational option pricing. Bell Journal of Economics and Management Sciences 4, 141–183.
Merton, R. C. (1992). Continuous Time Finance (2nd ed.). Blackwell Publishing.
Miltersen, K., K. Sandmann, and D. Sondermann (1997). Closed form solutions for term structure derivatives with lognormal interest rates. Journal of Finance 52(1), 409–430.
Modigliani, F. and M. Miller (1958). The cost of capital, corporation finance, and the theory of investment. American Economic Review 48, 261–297.
Neftci, S. N. (2000). Introduction to the Mathematics of Financial Derivatives (2nd ed.). Academic Press.
Nelson, C. R. and A. F. Siegel (1987). Parsimonious modeling of yield curves. Journal of Business 60(4), 473–489.
Nelson, D. B. (1990). ARCH models as diffusion approximations. Journal of Econometrics 45, 7–39.
Nelson, D. B. (1991). Conditional heteroscedasticity in asset returns: A new approach. Econometrica 59, 347–370.
Øksendal, B. (2003). Stochastic Differential Equations (6th ed.). New York, NY: Springer-Verlag.
Pan, J. (1997). Stochastic volatility with reset at jumps. Working paper.
Peaceman, D. W. and J. H. H. Rachford (1955). The numerical solution of parabolic and elliptic differential equations. Journal of the Society for Industrial and Applied Mathematics 3, 28–45.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). Cambridge University Press.
Protter, P. E. (2004). Stochastic Integration and Differential Equations (2nd ed.). New York, NY: Springer-Verlag.


Rogers, L. C. G. and D. Williams (1994a). Diffusions, Markov Processes and Martingales. Volume 1: Foundations (2nd ed.). Cambridge, UK: Cambridge University Press.
Rogers, L. C. G. and D. Williams (1994b). Diffusions, Markov Processes and Martingales. Volume 2: Itô Calculus (2nd ed.). Cambridge, UK: Cambridge University Press.
Rubinstein, M. (1985). Nonparametric tests of alternative option pricing models using all reported trades and quotes on the 30 most active CBOE option classes from August 23, 1976 through August 31, 1978. The Journal of Finance 40, 455–480.
Rubinstein, M. (1994). Implied binomial trees. Journal of Finance 49, 771–818.
Sandmann, G. and S. J. Koopman (1998). Estimation of stochastic volatility models via Monte Carlo maximum likelihood. Journal of Econometrics 87(2), 271–301.
Sandmann, K. and D. Sondermann (1997). A note on the stability of lognormal interest rate models and the pricing of Eurodollar futures. Mathematical Finance 7(2), 119–125.
Schöbel, R. and J. Zhu (1999). Stochastic volatility using an Ornstein-Uhlenbeck process: An extension. European Finance Review 3, 23–46.
Schoutens, W., E. Simons, and J. Tistaert (2004, March). A perfect calibration! Now what? Wilmott Magazine, XXXX.
Sentana, E. (1995). Quadratic Garch models. Review of Economic Studies 62, 639–661.
Shreve, S. (2004a). Stochastic Methods in Finance v1: The Binomial Asset Pricing Model. New York, NY: Springer-Verlag.
Shreve, S. (2004b). Stochastic Methods in Finance v2: Continuous Time Models. New York, NY: Springer-Verlag.
Skiadopoulos, G., S. Hodges, and L. Clewlow (2000). Dynamics of the S&P500 implied volatility surface. Review of Derivatives Research 3, 263–282.
Stein, E. M. and J. C. Stein (1991). Stock price distributions with stochastic volatility: An analytic approach. Review of Financial Studies 4, 727–752.
Svensson, L. (1994). Estimating and interpreting forward interest rates: Sweden 1992–4. Discussion Paper 1051, Centre for Economic Policy Research.
Thomas, J. W. (1995). Numerical Partial Differential Equations. Number 22 in Texts in Applied Mathematics. New York, NY: Springer.
van der Merwe, R., N. de Freitas, A. Doucet, and E. Wan (2001). The unscented particle filter. In Advances in Neural Information Processing Systems 13.
Vasicek, O. A. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–188.
Wiggins, J. B. (1987). Option values under stochastic volatility: Theory and empirical estimates. Journal of Financial Economics 19, 351–372.
Wilmott, P., J. Dewynne, and S. Howison (1993). Option Pricing: Mathematical Models and Computation. Oxford, UK: Oxford Financial Press.
Zakoian, M. (1994). Threshold heteroscedastic models. Journal of Economic Dynamics and Control 18, 931–955.


Zellner, A. (1995). Introduction to Bayesian Inference in Econometrics. Chichester, UK: John Wiley and Sons.

Index

affine, 190, 193
affine models, 152
arbitrage, 35, 150
  static, 169
Arch model, 136
Asymmetric Garch model, 143
Black-Scholes model, 149
Black-Scholes PDE, 175
bond
  coupon, 177
  face value, 177
  par, 177
  yield, 177
bond option, 187
Borel algebra, 3
Brownian motion, 186
butterfly spread, 169
calendar spread, 169
cap, 187
Capital Asset Pricing Model (CAPM), 145
capital structure, 133
CBOE, 132
characteristic function, 149
CIR model, 192
compounding, 178
  continuously, 178
corporate bond, 177
current account, 184
Dothan model, 191
Dow Jones index, 130, 140
Efficient Method of Moments, 160
Egarch model, 143, 160
equivalent martingale measure, 142, 154
equivalent probability measure, 153, 188
Esscher transform, 148
Euler equation, 147
event, 3
expectation hypothesis, 188
explosive bank account, 192
exponential martingale, 153
exponential Vasicek model, 191
exponentially weighted moving average model, 141
fat tails, 130
Feller process, 192
Feynman-Kac formula, 159
Figarch model, 141
fixed income security, 177
floor, 187
forward curve, 184
forward rate, 182
  instantaneous, 184
fundamental theorem of asset pricing, 188
Garch and stochastic volatility, 135
Garch model, 137
  and Arch(∞), 137
  and incomplete markets, 146
  and persistent volatility, 137, 141
  fractionally integrated, 141
  Garch(1,1), 137
  GED, 143
  in-mean, 145
  maximum likelihood estimation, 138
  multidimensional, 145
  non Gaussian, 142
  skewness, 143
  standard errors, 139
  volatility forecasts, 137
generalized error distribution, 143
Girsanov theorem, 150, 152
Igarch model, 141
implied density, 167, 173
  Breeden-Litzenberger method, 175
implied tree, 174
implied volatility, 131
  and Delta, 134
  and expected volatility, 132
  and moneyness, 134
  and realized volatility, 142
  skew, 134
  smile, 134
  sticky Delta, 135
  sticky strike, 135
  surface, 133
  surface dynamics, 134
  SVI parameterization, 172
inverse problem, 162
Itô formula, 186
Kalman filter, 161, 192
Kolmogorov backward equation, 175
Kolmogorov extension theorem, 174
Kolmogorov forward equation, 175
Leibniz rule, 173
leverage effect, 133, 143, 149
liquidity preferences theory, 189
local volatility, 167, 174
  function, 174
  PDE representation, 175
long memory, 141
marginal rate of substitution, 147
  as Radon-Nikodym derivative, 148
market segmentation, 189
Markov chain, 148, 161
Markov Chain Monte Carlo, 160
maturity date, 35
maximum likelihood estimation, 127
  standard errors, 139
mean reversion, 192
measurable space, 3
measure, 3
mixture of distributions, 130
model risk, 162
no-arbitrage tests, 169
Novikov condition, 188
one-factor model, 184
Ornstein-Uhlenbeck process, 151, 190
overfitting, 201
Particle filter, 161
penalty function, 162
preferred habitat, 189
price of interest rate risk, 185
price of risk, 145
prior information, 162
Radon-Nikodym derivative, 154, 188
  marginal rate of substitution, 148
random variable, 1
redundant claim, 35
regimes of volatility, 132
regularization, 162
  Tikhonov-Phillips, 162
risk aversion, 150
risk premium, 150
  time varying, 146
sample path, 1
sample point, see sample path
sample space, see state space
Sharpe ratio, 145, 154
short rate
  stylized facts, 189
short rate model, see one-factor model
σ-algebra, 3
  generated, 4
Simulated Method of Moments, 160
smoothing, 167, 176
  Nadaraya-Watson, 168
  radial basis function, 167
sovereign bond, 177
SP500 index, 132, 140
square root process, 151, 192
state space, 1
stochastic volatility, 149, 185
  and Garch, 135
  calibration, 161, 165
  estimation, 160
  PDE, 156
  replicating portfolio, 157
Student-t, 142
swaption, 187
term structure PDE, 187
transform methods, 152
underlying asset, 35
utility function, 147
Vasicek model, 190
Vega, 162
vertical spread, 169
Vitali set, 2
VIX index, 132, 141, 142
  and financial crises, 132
  and realized volatility, 132
volatility
  and correlation with returns, 132, 143, 149
  and financial crises, 130
  attractor, 131
  clusters, 131
  cyclical, 131
  long memory, 141
  persistence, 141
  time varying, 130
volatility risk, 150
yield curve, 179
  historical, 181
  parametric forms, 179
  shapes, 179
  theories of, 188
zero-coupon bond, 177
