Computational Finance
Degree project in mathematics, 30 credits
Supervisor and examiner: Maciej Klimek
January 2020
Department of Mathematics
Uppsala University
Abstract
Machine learning can be deployed not only to solve non-trivial problems, but also to solve them faster than traditional implementations. We illustrate several classical problems in the field of computational finance where Gaussian process regression can fit complex functions, under and beyond the Black-Scholes model, with high accuracy. In our examples, the regressions show speed-ups of several orders of magnitude compared to the classical implementations while keeping the precision well within the acceptable limits for practical use. The concrete examples consist of fitting financial Greeks, summarizing implied volatility surfaces, and pricing vanilla and exotic options at a reduced computation time.
Acknowledgements
I would like to thank my supervisor, Professor Maciej Klimek, for his precious time and
guidance during this project. I would also like to thank my family for their relentless
support through all these years of studying.
Contents
1 Introduction 1
1.1 Multivariate Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 The linear approach . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Linear to non-linear . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 The kernel trick . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4 The choice of kernel . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Gaussian Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Regression with noisy observations . . . . . . . . . . . . . . . . . . . . . 7
1.5 Choice of the hyper-parameters . . . . . . . . . . . . . . . . . . . . . 8
1.6 Incorporating a non-zero mean function . . . . . . . . . . . . . . . . . 9
1.7 Comparison with Kernel Ridge Regression . . . . . . . . . . . . . . . . . 10
4 Derivative Pricing 14
4.1 European Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1 Heston Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.2 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.3 GPR Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 American Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.1 Binomial Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.2 GPR Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Barrier Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3.1 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3.2 Monte Carlo paths simulation . . . . . . . . . . . . . . . . . . . . 22
4.3.3 GPR Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 Conclusion 28
References 29
List of Figures
1.1 Bi-variate Gaussian Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Bi-variate Gaussian marginal and conditional distributions . . . . . . . . 3
1.3 Exponential Quadratic example . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Prior and posterior GP samples . . . . . . . . . . . . . . . . . . . . . 7
1.5 With and without noise samples comparison . . . . . . . . . . . . . . . . 8
1.6 Hyper-parameter values examples . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Regression models comparison . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Gamma surface fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1 Implied volatility surface fit . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1 Binomial tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
List of Tables
2.1 Gamma surface fit performance for different grid sizes . . . . . . . . . . . 13
4.1 Training and test set for the Heston model . . . . . . . . . . . . . . . . . 19
4.2 Fit Performance for vanilla European call . . . . . . . . . . . . . . . . . . 19
4.3 Training and test set for the binomial tree model . . . . . . . . . . . . . 21
4.4 Fit Performance for American put . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Training and test set for the Monte Carlo simulations . . . . . . . . . 26
4.6 Fit Performance for the Barrier option . . . . . . . . . . . . . . . . . . . 27
1 Introduction
The concept of Gaussian processes leads to a supervised learning method aimed at solving regression and probabilistic classification problems, and as such they are extensively used for modeling dependent data. As statistical models, they interpolate the observations and produce probabilistic predictions with empirical confidence intervals, which are very practical for refining and adapting the fit in the areas of interest. They are also extremely versatile, as a diversity of kernels can be specified in order to reach desired function characteristics such as smoothness.
The appeal of these processes resides in the closure properties of the Gaussian distribution: marginals, conditionals and affine transformations of Gaussian vectors remain Gaussian. Thanks to this, they can be used on small samples, and the regression is practical to implement since one does not need to define moments of higher than second order. Furthermore, this regression approach is non-parametric, as it is theoretically possible to draw from an infinite set of functions, which is very convenient when adjusting kernels. Nevertheless, the method is not exempt from potential under- or over-fitting problems and has to be adjusted when the output is non-Gaussian (e.g. binary). Gaussian processes can also suffer from a lack of accuracy in high-dimensional spaces, typically when the number of features exceeds several dozen, but these limitations are not reached in our area of interest, since the Heston model and the binomial tree involve fewer than ten parameters.
In this thesis, I present the theory and numerical tools needed to implement such a regression in the field of computational finance. We discuss popular models and implementations for pricing several financial derivatives and compare their performance against a well-fitted Gaussian process. The theoretical content follows the notation and logic of Rasmussen [see 12], and the various implementations follow the logic of De Spiegeleer [see 14]. The implementations were written in Python and Matlab, making use of the sklearn library and the financial toolbox, respectively.
1.1 Multivariate Gaussian
Like the normal distribution, the multivariate normal is defined by a set of parameters:
the mean vector, which is the expected value of the distribution; and the covariance
matrix, which measures how dependent two random variables are.
The joint probability density function for a distribution of dimensionality d is defined as:

$$f(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma|}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right), \tag{1.1}$$

where x is a random vector of size d, µ is the mean vector, Σ is the symmetric and positive semi-definite covariance matrix of size d × d, and |Σ| its determinant. Such a distribution is often denoted N(µ, Σ).
This type of distribution has properties that are essential for Gaussian processes: marginal and conditional distributions of a multivariate Gaussian are again Gaussian, and affine transformations of Gaussian vectors remain Gaussian (illustrated in Figure 1.2).

Figure 1.2: Bi-variate Gaussian marginal (a) and conditional (b) distributions
1.2 Bayesian Regression

In the context of Bayesian regression, we assume that we can describe the observed data y = (y₁, ..., yₙ) from the input x = (x₁, ..., xₙ) as:

$$y = f(x) + \varepsilon, \tag{1.2}$$

where f is a Gaussian process and ε = (ε₁, ..., εₙ) are i.i.d. random variables representing the noise in the data, such that εᵢ ∼ N(0, σₙ²).

1.2.1 The linear approach

In the linear approach, we assume that f takes the form:

$$f(x) = x^\top w, \tag{1.3}$$

with w ∼ N(0, Σ_p) an n × 1 vector of prior weights. Then, using Bayes' rule, with X the matrix of inputs, the posterior over the weights is:
$$p(w \mid y, X) = \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)}, \tag{1.4}$$

with

$$p(y \mid X, w) = \frac{1}{(2\pi\sigma_n^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_n^2}\,|y - X^\top w|^2\right) = N(X^\top w, \sigma_n^2 I), \qquad p(y \mid X) = \int p(y \mid X, w)\, p(w)\, dw. \tag{1.5}$$

This gives:

$$p(w \mid y, X) \sim N\!\left(\frac{1}{\sigma_n^2} A^{-1} X y,\; A^{-1}\right), \qquad A = \sigma_n^{-2} X X^\top + \Sigma_p^{-1}. \tag{1.6}$$

Then, after obtaining the posterior weights, we can use them on non-observed data x∗ to get the predictive distribution of the output f∗, which is also Gaussian:

$$p(f_* \mid x_*, X, y) = N\!\left(\frac{1}{\sigma_n^2}\, x_*^\top A^{-1} X y,\; x_*^\top A^{-1} x_*\right). \tag{1.7}$$
1.2.2 Linear to non-linear

To move beyond linear models, the inputs can be projected into a feature space by a set of basis functions φ. With very similar steps as before, and by noting Φ(X) the aggregation of φ(x) for all cases in the training set, this leads to the following predictive distribution:

$$p(f_* \mid x_*, X, y) \sim N\!\left(\frac{1}{\sigma_n^2}\, \phi(x_*)^\top A^{-1} \Phi(X)\, y,\; \phi(x_*)^\top A^{-1} \phi(x_*)\right), \qquad A = \sigma_n^{-2}\, \Phi\Phi^\top + \Sigma_p^{-1}, \tag{1.9}$$

which, for later implementation performance, can be re-arranged using the Woodbury matrix identity [15] and written as:

$$p(f_* \mid x_*, X, y) \sim N\!\left(\phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} y,\; \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_*\right), \tag{1.10}$$

with φ∗ = φ(x∗) and K = Φᵀ Σ_p Φ.
1.2.3 The kernel trick

Having defined Σ_p as positive definite, we can rewrite the inner product appearing above, using its singular value decomposition, as:

$$\phi(x)^\top \Sigma_p\, \phi(x') = \left(\Sigma_p^{1/2} \phi(x)\right) \cdot \left(\Sigma_p^{1/2} \phi(x')\right). \tag{1.11}$$

By defining ψ(x) = Σ_p^{1/2} φ(x), we can then define a covariance function, or rather kernel function k, such that:

$$k(x, x') = \psi(x) \cdot \psi(x'). \tag{1.12}$$

This procedure drastically reduces the computation required for the inner product and enables us to work directly with inputs in the function space.

1.2.4 The choice of kernel

For k to be a valid (Mercer) kernel [see 10], the resulting covariance matrix must be positive semi-definite, i.e. for any vector v:

$$v^\top K(x, x')\, v = \sum_i \sum_j k(x_i, x_j)\, v_i v_j \ge 0. \tag{1.13}$$
A very commonly used kernel is the exponential quadratic kernel:

$$k(x, x', l) = \sigma^2 \exp\left(-\frac{|x - x'|^2}{2l^2}\right), \tag{1.14}$$

with σ and l being hyper-parameters. This can be equivalently described as a Bayesian regression with prior weights w ∼ N(0, σ_p² I) and an infinite set of basis functions of the form:

$$\phi(x, l) = \exp\left(-\frac{(x - c)^2}{2l^2}\right). \tag{1.15}$$

An example of a covariance matrix generated by the exponential quadratic kernel (also known as the RBF kernel) is shown in Figure 1.3. We see that the covariance decays exponentially the further apart the inputs are from each other.
1.3 Gaussian Process

Figure 1.3: Example of exponential quadratic covariance matrix (a) and covariance k(x, 0) (b)

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is fully specified by a mean function m(x) and a covariance function k(x, x'), so that for any finite set of inputs X:

$$\big(f(x_1), \ldots, f(x_n)\big) \sim N\big(m(X),\, K(X, X)\big), \tag{1.16}$$
where K(X, X) is the covariance matrix with entries K_{i,j} = k(xᵢ, xⱼ). In other terms, a Gaussian process can be understood as a multivariate Gaussian with an uncountably infinite number of random variables.
The mean function m can be any real-valued function and is very often set to zero by subtracting the mean from the data. The kernel function k can be any valid Mercer kernel, such as the exponential quadratic kernel shown before. This way, a Gaussian process is often written as:

$$f(x) \sim GP\big(m(x), k(x, x')\big). \tag{1.17}$$
To sample functions from the Gaussian process we just need to define the mean and
covariance functions. The covariance function k models the joint variability of the Gaussian
process random variables. It returns the modelled covariance between each pair of inputs.
That is to say, with f and f∗ being respectively the training and test outputs, we have
the following joint distribution:
" # " #!
f K(X, X) K(X, X∗ )
∼N 0, (1.18)
f∗ K(X∗ , X) K(X∗ , X∗ )
1.4 Regression with noisy observations

Figure 1.4: Example of GP function samples from the prior (a) and posterior (b) distributions
As we have seen, the specification of the covariance function, i.e. the kernel, implies a distribution over functions, and by choosing a specific kernel function it is possible to place prior information on this distribution. We can sample function evaluations of a function drawn from a Gaussian process at a finite but arbitrary set of points with the following procedure:

• Compute the covariance matrix Σ = K(X, X) and its Cholesky factor A, so that AAᵀ = Σ.

• Draw Z ∼ N(0, I).

• Let X = µ + AZ, which has the desired distribution thanks to the affine transformation property.

Thanks to the properties of Gaussian processes, we can also evaluate the posterior by conditioning the joint Gaussian prior distribution (1.18) on the observations:

$$f_* \mid X, y, X_* \sim N\big(K(X_*, X) K(X, X)^{-1} y,\; K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)\big). \tag{1.19}$$
In Figure 1.4 we can see 10 samples from the prior and posterior distributions, with the grey area and the black line being respectively the standard deviation and the mean. We see that the posterior mean of a Gaussian process is a weighted average of the observed values.
Figure 1.5: Example of noise-free (a) and noisy (b) regression on a sinusoid

In Figure 1.5 we can see the effect of a noise assumption on the data. This is done by adding the noise variance σₙ² I to the covariance of the observations, so that the joint distribution (1.18) uses K(X, X) + σₙ² I in its upper-left block.
Figure 1.6: Example of different values of hyper-parameters for the exponential quadratic kernel

1.5 Choice of the hyper-parameters

As illustrated in Figure 1.6, the choice of hyper-parameters strongly affects the fit. It is possible to obtain optimal parameters by maximizing the marginal likelihood, which is defined as:

$$p(y \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{1}{2}(y - \mu)^\top \Sigma^{-1} (y - \mu)\right), \tag{1.23}$$
with d being the dimension of the marginal. Taking the logarithm gives:
$$\log p(y \mid \mu, \Sigma) = -\frac{1}{2}(y - \mu)^\top \Sigma^{-1} (y - \mu) - \frac{1}{2}\log |\Sigma| - \frac{d}{2}\log 2\pi. \tag{1.24}$$
We see that the first term corresponds to the fit to the data, whereas the remaining terms correspond to a complexity penalty. The optimal parameters can then be found by maximizing the log marginal likelihood, typically by gradient-based optimization of its negative.
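In practice this optimization is handled by standard libraries. A minimal sketch with sklearn (the library used for the implementations; the data here is synthetic) fits the kernel hyper-parameters by maximizing the log marginal likelihood (1.24) with several optimizer restarts:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

# sigma^2 * RBF(l) plus a noise term; all hyper-parameters are fitted
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10).fit(X, y)

print(gpr.kernel_)                          # optimized hyper-parameters
print(gpr.log_marginal_likelihood_value_)   # value of (1.24) at the optimum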
1.6 Incorporating a non-zero mean function

If one were to implement a fixed mean function m, the predictive mean would become:

$$\bar{f}_* = m(X_*) + K(X_*, X)\, K_y^{-1} \big(y - m(X)\big), \tag{1.25}$$

with K_y = K + σₙ² I, the predictive variance remaining unchanged from the one previously observed. More generally, one can specify the mean as a linear combination of basis functions:

$$g(x) = f(x) + h(x)^\top \beta, \tag{1.26}$$
where f(x) ∼ GP(0, k(x, x')) is a zero-mean Gaussian process and h(x) is the column vector containing the basis functions. This way of expressing the regression shows that the process amounts to fitting the data with a linear model whose residuals are captured by the Gaussian process. Then, when fitting the model, one should jointly optimize the parameters of the basis mean and the hyper-parameters of the covariance.

Additionally, if we assume a Gaussian prior on the parameters, β ∼ N(b, B), we can describe the whole regression as the following Gaussian process:

$$g(x) \sim GP\big(h(x)^\top b,\; k(x, x') + h(x)^\top B\, h(x')\big). \tag{1.27}$$

We see that the probability distribution of the parameters of the mean affects the covariance. We can then obtain the predictive mean and covariance:

$$\bar{g}(X_*) = \bar{f}(X_*) + R^\top \bar{\beta}, \qquad \operatorname{cov}(g_*) = \operatorname{cov}(f_*) + R^\top \big(B^{-1} + H K_y^{-1} H^\top\big)^{-1} R, \tag{1.28}$$

with:

$$\bar{\beta} = \big(B^{-1} + H K_y^{-1} H^\top\big)^{-1} \big(H K_y^{-1} y + B^{-1} b\big), \qquad R = H_* - H K_y^{-1} K_*, \tag{1.29}$$

where H aggregates the vectors h(x) over the training cases and H∗ over the test cases.
We see that the mean is the combination of the linear output, which depends on the data input and the prior, and the Gaussian process prediction of the residuals. The covariance ends up being the sum of the zero-mean process covariance and the contribution of the newly introduced parameters.
1.7 Comparison with Kernel Ridge Regression

Both Gaussian process regression (GPR) and kernel ridge regression (KRR) rely on the kernel trick. GPR places a prior distribution over the target functions and uses the observed training data to define a likelihood function, whereas KRR learns a linear function, chosen based on the mean-squared error loss with ridge regularization, in the space induced by the respective kernel, which corresponds to a non-linear function in the original space. GPR additionally uses Bayes' theorem to define a Gaussian posterior distribution over target functions, whose mean is used for prediction.
2.1 Curve Fitting

Under the Black-Scholes model and for a non-dividend-paying asset, the price of a call option [see 2] is:

$$C_T = N(d_1)\, S_t - N(d_2)\, K e^{-r(T-t)}, \tag{2.2}$$

with N the cumulative distribution function of the standard normal distribution and:

$$d_1 = \frac{\log\frac{S_0}{K} + T\left(r + \frac{\sigma^2}{2}\right)}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}, \tag{2.3}$$
with S₀ the initial value of the underlying asset, K the strike price, r the risk-free rate, σ the implied volatility, and T the time to maturity.
For hedging purposes, one may want to measure the corresponding Gamma of a portfolio containing such options. Gamma is defined as the second derivative of the value function with respect to the underlying price:

$$\Gamma = \frac{\partial^2 C_T}{\partial S^2}, \tag{2.4}$$

which under the Black-Scholes model has the closed form:

$$\Gamma = \frac{1}{S_0\, \sigma \sqrt{T}} \cdot \frac{e^{-d_1^2/2}}{\sqrt{2\pi}}. \tag{2.5}$$
We can see that such a function is not trivial to estimate, and that obtaining an accurate non-linear regression over the five parameters is very time-consuming.
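A minimal sketch of such a fit with sklearn (the training ranges below are illustrative assumptions, not the grids used in Table 2.1) trains a GPR on closed-form Gamma values and then predicts on unseen parameter combinations:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bs_gamma(S0, K, r, sigma, T):
    # closed-form Gamma, eqs. (2.3) and (2.5)
    d1 = (np.log(S0 / K) + T * (r + sigma**2 / 2)) / (sigma * np.sqrt(T))
    return np.exp(-d1**2 / 2) / (S0 * sigma * np.sqrt(T) * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([rng.uniform(80, 120, n),    # S0
                     rng.uniform(80, 120, n),    # K
                     rng.uniform(0.0, 0.05, n),  # r
                     rng.uniform(0.1, 0.5, n),   # sigma
                     rng.uniform(0.1, 2.0, n)])  # T
y = bs_gamma(*X.T)

# one length-scale per input dimension (anisotropic RBF)
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(5)),
                               normalize_y=True).fit(X, y)

X_new = np.array([[100.0, 105.0, 0.02, 0.25, 1.0]])
print(gpr.predict(X_new), bs_gamma(*X_new.T))   # prediction vs. closed form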
Figure 2.1: Prediction curves (a) and prediction errors (b) for GPR, grid and polynomial interpolation

Table 2.1: Gamma surface fit performance for different grid sizes

The accuracy of the fit depends on the density of the training grid within the scope of the testing grid. In Table 2.1 we can see the corresponding maximum (MAE) and average (AAE) absolute errors for different sizes of input grids.

Figure 2.2: Gamma surface fit (a) and prediction errors (b) for GPR
3 Implied Volatility Surface
In simpler pricing models, such as the standard Black-Scholes model, the implied volatility across strike prices and times to maturity is presumed constant, which would lead to a flat surface. In practice this is hardly the case. Indeed, we observe that out-of-the-money strike prices tend to have higher implied volatilities than at-the-money strike prices, giving rise to a volatility smirk, hence the need for a graphical representation to illustrate the market tendencies. Additionally, as the time to maturity increases, volatilities across strike prices tend to converge to a relatively constant level. Still, we usually observe a volatility smirk across the range of tenors; options with shorter time to maturity often have a higher volatility than options with longer maturities, an observation that is even more pronounced in periods of high market stress. This being said, surfaces can differ drastically between options and option chains, and therefore the representation of the volatility surface can lead to a slightly wavy rather than strictly convex graph.
With a relatively large number of strike prices but fewer tenors, the volatility surface can then help to price options: with interpolation, we can infer the market's implied volatilities for a larger range of strikes and tenors than the ones actively traded.

Figure 3.1 shows a volatility surface fit from freely available market data of an S&P 500 call option chain. Since the points are last traded values and the liquidity of the asset is far from that of other derivatives such as interest rate swaps, optimizing the noise parameter of the input data is crucial. One can also note that the option chain evolves on two time scales, moving from a weekly basis to monthly and then yearly times to maturity. The non-uniform sparsity of the data therefore makes manual adjustment necessary when optimizing parameters, in order to deal with the 'shaky' or wavy appearance of the input data.
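A minimal sketch of such a noisy surface fit (the quotes below are synthetic stand-ins for market data, and the kernel length-scales are illustrative starting values) combines an anisotropic RBF kernel with a WhiteKernel whose noise level is optimized together with the other hyper-parameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
strikes = np.linspace(2800, 3200, 9)
tenors = np.array([0.02, 0.05, 0.1, 0.25, 0.5, 1.0, 2.0])
KK, TT = np.meshgrid(strikes, tenors)
# stylized smirk: higher vols for low strikes and short tenors, plus quote noise
iv = (0.18 + 0.10 * np.exp(-TT) * np.maximum(0.0, (3000 - KK) / 3000)
      + 0.01 * rng.standard_normal(KK.shape))

X = np.column_stack([KK.ravel(), TT.ravel()])
kernel = RBF(length_scale=[200.0, 0.5]) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               n_restarts_optimizer=5).fit(X, iv.ravel())

# interpolate on a denser grid of strikes and tenors than is actively traded
Kq, Tq = np.meshgrid(np.linspace(2750, 3250, 50), np.linspace(0.02, 2.0, 50))
surface = gpr.predict(np.column_stack([Kq.ravel(), Tq.ravel()])).reshape(Kq.shape)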
4 Derivative Pricing
In the derivatives market, a huge variety of models is used, since specific models can be more explanatory regarding the characteristics and behavior of the underlying asset. Here we delve into a frequently used model more advanced than Black-Scholes, the Heston model, and two different pricing techniques: the binomial tree and path simulation with Monte Carlo.
4.1 European Options

Following Carr and Madan [see 4], the price of a European call with log-strike k can be written as:

$$C_T(k) = \int_k^{\infty} e^{-rT} \left(e^{x} - e^{k}\right) f(x)\, dx, \tag{4.1}$$

with f the risk-neutral density corresponding to the risk-neutral measure under the fundamental theorem of asset pricing, x_t = log(S_t) and k = log(K).
4.1.1 Heston Model

The Heston model describes the dynamics of the underlying asset under the risk-neutral measure as:

$$dS_t = r S_t\, dt + \sqrt{V_t}\, S_t\, dW_t, \qquad dV_t = \kappa(\theta - V_t)\, dt + \sigma \sqrt{V_t}\, dZ_t, \tag{4.2}$$

where S_t and V_t are the price and volatility processes, W_t and Z_t are Wiener processes with correlation ρ, θ is the long-run mean of the volatility, κ the rate of mean reversion, r the risk-free rate and σ the volatility of volatility.
It is similar in several aspects to the more common Black-Scholes model, which in contrast assumes a constant volatility. There is empirical evidence, as well as mathematical arguments [see 5], that such a model is a more realistic representation of the market's behavior. For example, studies have shown that asset log-return distributions have heavy tails and high peaks, i.e. are leptokurtic. Such issues are addressed by different parameters of the model, such as ρ, which affects the skewness of the distribution, and σ, which affects its kurtosis [see 11].
Under the Heston model, a vanilla European option on a non-dividend-paying asset has a closed-form solution:

$$C(S_t, V_t, t, T) = S_t P_1 - K e^{-r(T-t)} P_2, \tag{4.3}$$

with, for j = 1, 2 and τ = T − t:

$$P_j(x, V_t, T, K) = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \operatorname{Re}\left[\frac{e^{-i\phi \log(K)}\, f_j(x, V_t, T, \phi)}{i\phi}\right] d\phi$$

$$f_j = \exp\big(C(\tau, \phi) + D(\tau, \phi)\, V_t + i\phi x\big)$$

$$C(\tau, \phi) = r\phi i \tau + \frac{a}{\sigma^2}\left[(b_j - \rho\sigma\phi i + d)\,\tau - 2\log\left(\frac{1 - g e^{d\tau}}{1 - g}\right)\right] \tag{4.4}$$

$$D(\tau, \phi) = \frac{b_j - \rho\sigma\phi i + d}{\sigma^2} \cdot \frac{1 - e^{d\tau}}{1 - g e^{d\tau}}$$

$$g = \frac{b_j - \rho\sigma\phi i + d}{b_j - \rho\sigma\phi i - d}, \qquad d = \sqrt{(\rho\sigma\phi i - b_j)^2 - \sigma^2 (2 u_j \phi i - \phi^2)},$$

with a = κθ, u₁ = 1/2, u₂ = −1/2, b₁ = κ − ρσ and b₂ = κ (taking the market price of volatility risk to be zero).
While numerically evaluating this solution is extremely accurate, it is also very time-consuming. Therefore it is mostly used as a baseline against which other computation methods are compared.
4.1.2 Fast Fourier Transform

A commonly used method is the Fast Fourier Transform (FFT) [see 4], which is a discrete approximation of a Fourier transform. With the Fourier transform F defined as:

$$F[f](v) = \int_{-\infty}^{\infty} e^{ivx} f(x)\, dx, \tag{4.5}$$

we can show that the corresponding discrete approximation under the FFT is:

$$\omega(k) = \sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(k-1)}\, f(x_j). \tag{4.6}$$

The aim of this algorithm is to reduce the number of multiplications in the required N summations, from an order of N² to an order of N log₂(N). As mentioned by Carr and Madan [see 4], such a transformation requires the function to be square integrable, but C_t(k) tends to S₀ as k tends to −∞; therefore one needs to introduce a parameter α > 0, a damping factor, such that:

$$c_T(k) = e^{\alpha k}\, C_T(k). \tag{4.7}$$

It is then possible to express the Fourier transform of the scaled call as:

$$\psi_T(v) = \int_{-\infty}^{\infty} e^{ivk}\, c_T(k)\, dk. \tag{4.8}$$

How to obtain this transform in closed form is detailed by Carr and Madan [see 4]:

$$\psi_T(v) = \frac{e^{-rT}\, \phi_T\big(v - (\alpha + 1)i\big)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v}, \tag{4.9}$$

where φ_T is the characteristic function of the log-price, which under the Heston model corresponds to:

$$\phi_T(u) = f_2(x, V_t, T, u), \tag{4.10}$$

with f₂ as given in (4.4). The call price is then recovered by inverting the transform and undoing the damping:

$$C_T(k) = \frac{e^{-\alpha k}}{\pi} \int_0^{\infty} e^{-ivk}\, \psi_T(v)\, dv. \tag{4.11}$$

This integral can then be evaluated approximately with the discrete Fourier transform (4.6), using Simpson's rule weights:
$$C(k_u) = \frac{e^{-\alpha k_u}}{\pi} \sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(u-1)}\, e^{i b v_j}\, \psi_T(v_j)\, \frac{\eta}{3}\left(3 + (-1)^j - \delta_{j-1}\right), \tag{4.12}$$

where δ is the Kronecker delta and the following parameters have been chosen to reach a compromise between accuracy and computational time:

$$v_j = \eta(j-1), \qquad \eta = \frac{c}{N}, \qquad c = 600, \qquad N = 4096, \qquad b = \frac{\pi}{\eta}, \qquad k_u = -b + \frac{2b}{N}(u - 1), \quad u = 1, 2, \ldots, N + 1. \tag{4.13}$$
One strength of this numerical method is that it evaluates call prices for a whole vector of strikes at once, with many terms shared between the calculations.
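Indeed, the sum in (4.12) has exactly the form of the discrete transform (4.6), so it can be evaluated for all N strikes with a single FFT call. A minimal sketch (the vector x below is a placeholder standing in for e^{ibv_j} ψ_T(v_j) times the Simpson weights):

import numpy as np

N = 4096
x = np.ones(N, dtype=complex)   # placeholder for the weighted integrand of (4.12)
omega = np.fft.fft(x)           # omega[u] = sum_j e^{-i 2*pi*j*u/N} x[j], matching (4.6)
# call prices across all strikes then follow as exp(-alpha * k) / pi * omega.real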
4.1.3 GPR Implementation

Table 4.1: Training and test set for the Heston model

Table 4.2: Fit performance for vanilla European call for different numbers of points
The empirical results are summarized in Table 4.2. We see that the accuracy increases as the training set becomes larger, but this also increases the computational load. Nevertheless, there are other factors that can be used to improve the accuracy without increasing the number of data points. One can decrease the number of parameters by fixing a few of them, thereby reducing the dimensionality, or reduce the width of the training set, since a higher density gives better accuracy. Another interesting result from the regression is the speed of the predictions compared to the FFT calculations: for an input of 10000 points we observe a speedup of 16 times.
4.2 American Options

4.2.1 Binomial Tree

A tree method discretizes the lifetime of the option into steps and lets the underlying price evolve along the nodes of a lattice. At each node the security movements are modelled by chosen probabilities of going up or down; the option price is then evaluated at the terminal nodes and discounted back to obtain the price at the first node. The advantage of tree methods is not only that they can be used to value just about any type of option, but also that they are easy to implement. Through this method the price of a European option converges to the Black-Scholes price. Here, the valuation of American options is done by assessing the profitability of early exercise at every node.
One popular tree method is the binomial model, often called the CRR tree [see 8]. In this tree, at each step, the price of the underlying instrument moves up or down by a specific factor, respectively u and d. The value of these factors depends on the underlying volatility σ and the duration of a step, Δt = T/n:

$$u = e^{\sigma\sqrt{\Delta t}}, \qquad d = e^{-\sigma\sqrt{\Delta t}} = \frac{1}{u}, \tag{4.14}$$

with the corresponding probabilities of up and down moves being p = \frac{e^{(r-q)\Delta t} - d}{u - d} and 1 − p.
We can see that the CRR tree is recombinant, that is to say, any path with the same number of up and down moves ends up at the same node. Such a tree is shown in Figure 4.1.
In the case of an American option, the exercise value is evaluated at each node, which for a put option is:

$$\max[K - S, 0]. \tag{4.15}$$

Then, starting an iterative procedure from the final nodes, the binomial value is calculated as:

$$C_{t-\Delta t,\, i} = e^{-r\Delta t}\left(p\, C_{t,\, i} + (1 - p)\, C_{t,\, i+1}\right), \tag{4.16}$$

where C_{t,i} is the option value for the i-th node at time t. Since early exercise is possible, the value retained at each node is the maximum of this binomial value and the exercise value (4.15).
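A compact sketch of the CRR valuation of an American put (a direct implementation of eqs. (4.14)-(4.16); the parameters in the final line are illustrative):

import numpy as np

def crr_american_put(S0, K, r, sigma, T, n, q=0.0):
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp((r - q) * dt) - d) / (u - d)
    disc = np.exp(-r * dt)
    # terminal asset prices S0 * u^j * d^(n-j) for j = 0..n up moves
    S = S0 * u ** np.arange(n + 1) * d ** np.arange(n, -1, -1)
    C = np.maximum(K - S, 0.0)                     # exercise value, eq. (4.15)
    for _ in range(n):
        S = S[1:] * d                              # asset prices one step earlier
        C = disc * (p * C[1:] + (1 - p) * C[:-1])  # binomial value, eq. (4.16)
        C = np.maximum(C, K - S)                   # early exercise check
    return C[0]

print(crr_american_put(S0=100, K=105, r=0.03, sigma=0.2, T=1.0, n=500))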
4.2.2 GPR Implementation

Table 4.3: Training and test set for the binomial tree model

Table 4.4: Fit performance for American put for different numbers of points
4.3 Barrier Options

Two general types of barrier options exist: knock-out options, where the payoff is null if the barrier is crossed, and knock-in options, where on the contrary the contract becomes valid once the barrier is crossed. Here we will focus on a down-and-in barrier call option, which means the contract, the right to buy, is granted once a barrier lower than the initial underlying price is crossed.
For this type of option, increasing the absolute difference between the initial price and the barrier level lowers the option price, since with all other parameters unchanged the probability of hitting the barrier decreases. As the difference decreases, the price converges to that of the vanilla equivalent. The volatility is also an important factor in the pricing: a higher volatility leads to a higher probability of hitting the barrier and is therefore positively correlated with the price of the knock-in option.
4.3.1 Discretization
The price of the barrier option is influenced by the frequency of the monitoring. There are closed formulas under continuous monitoring for several barrier option types. For a down-and-in call option with strike price K, barrier S_b and maturity T, under the Black-Scholes model, we have [see 13]:

$$C_{\text{down-and-in}} = S_0\, e^{-qT} \left(\frac{S_b}{S_0}\right)^{2\lambda} N(y) - K e^{-rT} \left(\frac{S_b}{S_0}\right)^{2\lambda - 2} N\big(y - \sigma\sqrt{T}\big), \tag{4.18}$$

with:

$$\lambda = \frac{r - q + \sigma^2/2}{\sigma^2}, \qquad y = \frac{\log\frac{S_b^2}{S_0 K}}{\sigma\sqrt{T}} + \lambda\sigma\sqrt{T}. \tag{4.19}$$
In reality, however, prices are monitored discretely and there exists a discrepancy between the previous analytical solution and the actual price. This shift is determined by a set of factors: the monitoring frequency, the asset volatility and a constant β ≈ 0.5826 [see 3]. The shift can be explained by the fact that decreasing the number of observation events decreases the probability of hitting the barrier, so we can expect the price of a down-and-in option to decrease as the number of observations decreases.
4.3.2 Monte Carlo paths simulation

The main concept behind Monte Carlo simulation is to estimate some parameter:

$$\theta = \mathbb{E}[g(X)] \tag{4.20}$$

by the sample average of n independent draws:

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} g(X_i). \tag{4.21}$$

By the law of large numbers, when n tends to infinity the estimator converges in probability to the actual value of the estimated parameter. If we denote the sample variance by:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(g(X_i) - \hat{\theta}\right)^2, \tag{4.22}$$

then, by the central limit theorem, for large n we have approximately:

$$\frac{\hat{\theta} - \theta}{s/\sqrt{n}} \sim N(0, 1). \tag{4.23}$$

Therefore, by using the corresponding quantiles, we can assess the accuracy:

$$P\left(\hat{\theta} - z_{1-\alpha/2}\frac{s}{\sqrt{n}} < \theta < \hat{\theta} + z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right) \approx 1 - \alpha. \tag{4.24}$$
The precision of the estimator is therefore directly linked to the standard error s/√n. In order to increase the speed of convergence for a given number of draws, variance reduction techniques such as antithetic variates or control variates may be used. Very often in Monte Carlo simulations, the price of the underlying asset is assumed to follow a geometric Brownian motion:

$$d\log S_t = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\, dW_t, \tag{4.26}$$
which integrates to:

$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma \int_0^t dW(\tau)\right). \tag{4.27}$$

We can then discretize over the time interval with small time steps and, using the properties of standard Wiener processes, obtain:

$$S_{t+dt} = S_t \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\sqrt{dt}\;\varepsilon\right), \tag{4.28}$$

where ε ∼ N(0, 1) is a standard normal random variable. This last equation can be used for generating paths to price the barrier option.
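A minimal sketch of pricing the down-and-in call this way under Black-Scholes dynamics (risk-neutral drift µ = r; the parameters in the final line are illustrative, and the barrier is monitored only at the simulation dates, consistent with the discrete-monitoring discussion above):

import numpy as np

def mc_down_and_in_call(S0, K, Sb, r, sigma, T, n_steps=250, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    Z = rng.standard_normal((n_paths, n_steps))
    # log-increments per eq. (4.28)
    inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
    paths = S0 * np.exp(np.cumsum(inc, axis=1))
    knocked_in = paths.min(axis=1) <= Sb
    payoff = np.where(knocked_in, np.maximum(paths[:, -1] - K, 0.0), 0.0)
    disc = np.exp(-r * T)
    estimate = disc * payoff.mean()
    std_error = disc * payoff.std(ddof=1) / np.sqrt(n_paths)  # s / sqrt(n), eq. (4.23)
    return estimate, std_error

print(mc_down_and_in_call(S0=100, K=100, Sb=85, r=0.03, sigma=0.25, T=1.0))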
The problem becomes harder when we allow the volatility to be stochastic. Different discretization schemes can be used in this regard. One of them is the quadratic exponential (QE) method, introduced by Andersen [see 1], which is very accurate when approximating a stochastic volatility, typically in the case of the Heston model. To represent a non-central chi-squared distribution, it uses a combination of a quadratic function of a standard Gaussian variable for the larger values and an exponential function for the lower segment of the distribution. The segment of large values can be approximated by the following formula:

$$V(t) = a(b + Z_v)^2, \tag{4.29}$$

where Z_v is a standard normal random variable and:

$$a = \frac{m}{1 + b^2}, \qquad b^2 = 2\psi^{-1} - 1 + \sqrt{2\psi^{-1}}\sqrt{2\psi^{-1} - 1}, \qquad \psi = \frac{s^2}{m^2},$$

$$m = \theta + (V_0 - \theta)\, e^{-\kappa\, dt}, \qquad s^2 = \frac{V_0\, \eta^2\, e^{-\kappa\, dt}}{\kappa}\left(1 - e^{-\kappa\, dt}\right) + \frac{\theta\eta^2}{2\kappa}\left(1 - e^{-\kappa\, dt}\right)^2. \tag{4.30}$$
The segment of lower values can be approximated by using the inverse transform sampling method, such that:

$$V(t) = L^{-1}(U_v), \tag{4.31}$$

where U_v is a uniform random variable. In our case, the inverse transform is defined by:

$$L^{-1}(U_v) = \begin{cases} \beta^{-1}\log\left(\dfrac{1 - p}{1 - U_v}\right) & p < U_v \le 1 \\[4pt] 0 & 0 \le U_v \le p, \end{cases} \tag{4.32}$$

where:

$$p = \frac{\psi - 1}{\psi + 1}, \qquad \beta = \frac{1 - p}{m}. \tag{4.33}$$
Applying Ito’s formula to the Heston model dynamics, the underlying stock price, for
s < t, can be expressed as:
Z t Z tp
1
St = Ss exp (r − V (u)du + V (u)dW ) (4.34)
s 2 s
which is equivalent to :
Z tp Rt
Vt –Vs + κθ∆t − κ s
V (u)du
V (u)dW1 = , (4.37)
s η
Table 4.5: Training and test set for the Monte Carlo simulations

Approximating the time integrals by the trapezoidal rule then leads to the log-price update:

$$\log(S_t) = \log(S_s) + r\Delta t - \frac{\kappa\rho}{2\eta}(V_s + V_t)\Delta t - \frac{1}{4}(V_s + V_t)\Delta t + \frac{\rho}{\eta}(V_t - V_s + \kappa\theta\Delta t) + \sqrt{1 - \rho^2}\,\sqrt{(V_s + V_t)\Delta t / 2}\; Z, \tag{4.39}$$
where Z is a standard normal random variable. Then, in order to generate a sample path, the procedure at each time step is the following: given the current variance V_s, compute m, s² and ψ from (4.30); if ψ lies below a switching threshold, draw the new variance V_t from the quadratic scheme (4.29), otherwise from the exponential scheme (4.31)-(4.33); finally, draw Z and update the log-price through (4.39).
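A minimal sketch of the variance update (a direct transcription of eqs. (4.29)-(4.33); the switching threshold psi_c = 1.5 is the value suggested by Andersen [see 1], and the parameters in the final loop are illustrative):

import numpy as np

def qe_variance_step(V0, kappa, theta, eta, dt, rng, psi_c=1.5):
    e = np.exp(-kappa * dt)
    m = theta + (V0 - theta) * e                                # eq. (4.30)
    s2 = (V0 * eta**2 * e / kappa) * (1 - e) \
         + (theta * eta**2 / (2 * kappa)) * (1 - e)**2
    psi = s2 / m**2
    if psi <= psi_c:
        # quadratic branch for the larger variance values, eq. (4.29)
        b2 = 2 / psi - 1 + np.sqrt(2 / psi) * np.sqrt(2 / psi - 1)
        a = m / (1 + b2)
        return a * (np.sqrt(b2) + rng.standard_normal())**2
    # exponential branch via inverse transform sampling, eqs. (4.31)-(4.33)
    p = (psi - 1) / (psi + 1)
    beta = (1 - p) / m
    U = rng.uniform()
    return 0.0 if U <= p else np.log((1 - p) / (1 - U)) / beta

rng = np.random.default_rng(5)
V = 0.04
for _ in range(250):
    V = qe_variance_step(V, kappa=1.5, theta=0.04, eta=0.3, dt=1/250, rng=rng)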
4.3.3 GPR Implementation

Table 4.6: Fit performance for the barrier option for different numbers of points
5 Conclusion
We have shown how Gaussian process regression is a logical extension of Bayesian linear regression, bringing more alternatives and flexibility. We presented several applications in which this type of regression performs well, such as curve fitting, summarizing surfaces and price estimation under different models. An important factor in the procedure is the choice of kernel function and the optimization of its hyper-parameters. Since the prediction process depends on the inversion of large covariance matrices, efficient algorithms are required to carry out these calculations. As we have seen, due to the exponentially increasing number of points required in higher dimensions, a clever choice of input data is necessary. However, since the learning process, or fitting, needs to be completed only once, the prediction process is much less affected by these limitations, and we showed that it can even outperform several classical pricing techniques in speed. These methods and techniques can be applied to a much broader range of problems, which makes Gaussian process regression a promising tool.
References
[1] Andersen, L. (2007). Efficient simulation of the Heston stochastic volatility model.
Journal of Computational Finance, 11.
[2] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities.
Journal of Political Economy, 81(3):637–654.
[3] Broadie, M., Glasserman, P., and Kou, S. (1997). A continuity correction for discrete
barrier options. Mathematical Finance, 7(4):325–349.
[4] Carr, P. and Madan, D. B. (1999). Option valuation using the Fast Fourier
Transform. Journal of Computational Finance, 2(4):61–73.
[5] Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical
issues. Quantitative Finance, 1(2):223–236.
[6] Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985). A theory of the term structure of
interest rates. Econometrica, 53(2):385–407.
[7] Cox, J. C. and Ross, S. (1976). The valuation of options for alternative stochastic
processes. Journal of Financial Economics, 3(1-2):145–166.
[8] Cox, J. C., Ross, S. A., and Rubinstein, M. (1979). Option pricing: A simplified
approach. Journal of Financial Economics, pages 229–264.
[9] Heston, S. L. (1993). A Closed-Form Solution for Options with Stochastic Volatility
with Applications to Bond and Currency Options. The Review of Financial Studies,
6(2):327–343.
[10] Mercer, J. (1909). Functions of positive and negative type, and their connection
with the theory of integral equations. Philosophical Transactions of the Royal Society,
London, 209:415–446.
[11] Mondal, M., Alim, M., Rahman, M., and Biswas, M. H. A. (2017). Mathematical
analysis of financial model on market price with stochastic volatility. Journal of
Mathematical Finance, 07:351–365.
[12] Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning.
Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, USA.
[13] Rubinstein, M. and Reiner, E. S. (1991). Breaking down the barriers. Risk Magazine,
4(8):28–35.
[14] Spiegeleer, J. D., Madan, D. B., Reyners, S., and Schoutens, W. (2018). Machine learn-
ing for quantitative finance: fast derivative pricing, hedging and fitting. Quantitative
Finance, 18(10):1635–1643.