Computational Finance
Degree project in mathematics, 30 credits
Supervisor and examiner: Maciej Klimek
January 2020
Department of Mathematics
Uppsala University
Abstract
Machine learning can be deployed not only to solve non-trivial problems, but also to solve them faster than traditional implementations. We illustrate several classical problems in the field of computational finance where Gaussian process regression can fit complex functions, under and beyond the Black-Scholes model, with high accuracy. In our examples, the regressions show speed-ups of several orders of magnitude compared to the classical implementations while keeping the precision well within the acceptable limits for practical use. The concrete examples consist of fitting financial Greeks, summarizing implied volatility surfaces, and pricing vanilla and exotic options at a reduced computation time.
Acknowledgements
I would like to thank my supervisor, Professor Maciej Klimek, for his precious time and
guidance during this project. I would also like to thank my family for their relentless
support through all these years of studying.
Contents
1 Introduction 1
1.1 Multivariate Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 The linear approach . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Linear to non-linear . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 The kernel trick . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4 The choice of kernel . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Gaussian Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Regression with noisy observations . . . . . . . . . . . . . . . . . . . . . 7
1.5 Choice of the hyper-parameters . . . . . . . . . . . . . . . . . . . . . 8
1.6 Incorporating a non-zero mean function . . . . . . . . . . . . . . . . . 9
1.7 Comparison with Kernel Ridge Regression . . . . . . . . . . . . . . . . . 10
4 Derivative Pricing 14
4.1 European Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1 Heston Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.2 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.3 GPR Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 American Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.1 Binomial Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.2 GPR Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Barrier Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3.1 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3.2 Monte Carlo paths simulation . . . . . . . . . . . . . . . . . . . . 22
4.3.3 GPR Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 Conclusion 28
References 29
List of Figures
1.1 Bi-variate Gaussian Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Bi-variate Gaussian marginal and conditional distributions . . . . . . . . 3
1.3 Exponential Quadratic example . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Prior and posterior GP samples . . . . . . . . . . . . . . . . . . . . . 7
1.5 With and without noise samples comparison . . . . . . . . . . . . . . . . 8
1.6 Hyper-parameter values examples . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Regression models comparison . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Gamma surface fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1 Implied volatility surface fit . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1 Binomial tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
List of Tables
2.1 Gamma surface fit performance for different grid sizes . . . . . . . . . . . 13
4.1 Training and test set for the Heston model . . . . . . . . . . . . . . . . . 19
4.2 Fit Performance for vanilla European call . . . . . . . . . . . . . . . . . . 19
4.3 Training and test set for the binomial tree model . . . . . . . . . . . . . 21
4.4 Fit Performance for American put . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Training and test set for the Monte Carlo simulations . . . . . . . . . 26
4.6 Fit Performance for the Barrier option . . . . . . . . . . . . . . . . . . . 27
1 Introduction
The concept of Gaussian processes leads to a supervised learning method aimed at solving regression and probabilistic classification problems, and as such they are extensively used for modeling dependent data. As statistical models, they interpolate the observations and produce probabilistic predictions with empirical confidence intervals, which are very practical for refining and adapting the fit in the areas of interest. They are also extremely versatile, as a diversity of kernels can be specified in order to reach desired function characteristics such as smoothness.
The appeal of these processes resides in the closure properties of the Gaussian distribution: marginals, conditionals and affine transformations of Gaussian vectors remain Gaussian. Thanks to this, they can be used on small samples, and the regression is practical to implement since one does not need to define moments of higher than second order. Furthermore, this regression approach is non-parametric, as it is theoretically possible to draw from an infinite set of functions, which is very convenient when adjusting kernels. Nevertheless, the method is not exempt from potential under- or over-fitting problems and has to be adjusted when the output is non-Gaussian (e.g. binary). Gaussian processes can also suffer from a lack of accuracy in high-dimensional spaces, typically when the number of features exceeds several dozen, but these limitations are not reached in our area of interest, since the Heston model and the binomial tree involve fewer than ten parameters.
In this thesis, I present the theory and numerical tools needed to implement such a regression in the field of computational finance. We discuss popular models and implementations for pricing several financial derivatives and compare their performance against a well-fitted Gaussian process. The theoretical content follows the notation and logic of Rasmussen [see 12], and the various implementations follow the logic of De Spiegeleer [see 14]. The implementations were written in Python and Matlab, making use of the sklearn library and the financial toolbox, respectively.
1.1 Multivariate Gaussian
Like the normal distribution, the multivariate normal is defined by a set of parameters:
the mean vector, which is the expected value of the distribution; and the covariance
matrix, which measures how dependent two random variables are.
The joint probability density function for a distribution of dimensionality d is defined as:

$$f(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma|}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right), \tag{1.1}$$

where x is a random vector of size d, µ is the mean vector, Σ is the symmetric and positive semi-definite covariance matrix of size d × d, and |Σ| its determinant. Such a distribution is often denoted N(µ, Σ).
This type of distribution has properties that are essential for Gaussian processes: marginal and conditional distributions of a multivariate Gaussian are again Gaussian, and affine transformations of Gaussian vectors remain Gaussian (illustrated in Figure 1.2).

Figure 1.2: Bi-variate Gaussian marginal (a) and conditional (b) distributions
1.2 Bayesian Regression

In the context of Bayesian regression, we assume that we can describe the observed data y = (y₁, ..., yₙ) from the input x = (x₁, ..., xₙ) as:

$$y = f(x) + \varepsilon, \tag{1.2}$$

where f is a Gaussian process and ε = (ε₁, ..., εₙ) are i.i.d. random variables representing the noise in the data, such that εᵢ ∼ N(0, σₙ²).

1.2.1 The linear approach

In the linear approach, we assume that f takes the form:

$$f(x) = x^\top w, \tag{1.3}$$

with w ∼ N(0, Σ_p) an n × 1 vector of prior weights. Then, using Bayes' rule, with X the matrix of inputs, the posterior over the weights is:
$$p(w \mid y, X) = \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)}, \tag{1.4}$$

with

$$p(y \mid X, w) = \frac{1}{(2\pi\sigma_n^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_n^2}\,|y - X^\top w|^2\right) = N(X^\top w, \sigma_n^2 I), \qquad p(y \mid X) = \int p(y \mid X, w)\, p(w)\, dw. \tag{1.5}$$

This gives:

$$p(w \mid y, X) \sim N\!\left(\frac{1}{\sigma_n^2} A^{-1} X y,\; A^{-1}\right), \qquad A = \sigma_n^{-2} X X^\top + \Sigma_p^{-1}. \tag{1.6}$$

Then, after obtaining the posterior weights, we can use them on non-observed data x∗ to get the predictive distribution of the output f∗, which is also Gaussian:

$$p(f_* \mid x_*, X, y) = N\!\left(\frac{1}{\sigma_n^2}\, x_*^\top A^{-1} X y,\; x_*^\top A^{-1} x_*\right). \tag{1.7}$$
1.2.2 Linear to non-linear

To move beyond linear models, the inputs can be projected into a feature space by a set of basis functions φ. With very similar steps as before, and by noting Φ(X) the aggregation of φ(x) for all cases in the training set, this leads to the following predictive distribution:

$$p(f_* \mid x_*, X, y) \sim N\!\left(\frac{1}{\sigma_n^2}\, \phi(x_*)^\top A^{-1} \Phi(X)\, y,\; \phi(x_*)^\top A^{-1} \phi(x_*)\right), \qquad A = \sigma_n^{-2}\, \Phi\Phi^\top + \Sigma_p^{-1}, \tag{1.9}$$

which, for later implementation performance, can be re-arranged using the Woodbury matrix identity [15] and written as:

$$p(f_* \mid x_*, X, y) \sim N\!\left(\phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} y,\; \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_*\right), \tag{1.10}$$

with φ∗ = φ(x∗) and K = Φᵀ Σ_p Φ.
1.2.3 The kernel trick

Having defined Σ_p as positive definite, we can rewrite the inner product appearing above, using its singular value decomposition, as:

$$\phi(x)^\top \Sigma_p\, \phi(x') = \left(\Sigma_p^{1/2} \phi(x)\right) \cdot \left(\Sigma_p^{1/2} \phi(x')\right). \tag{1.11}$$

By defining ψ(x) = Σ_p^{1/2} φ(x), we can then define a covariance function, or rather kernel function k, such that:

$$k(x, x') = \psi(x) \cdot \psi(x'). \tag{1.12}$$

This procedure drastically reduces the computation required for the inner product and enables us to work directly with inputs in the function space.

1.2.4 The choice of kernel

For k to be a valid (Mercer) kernel [see 10], the resulting covariance matrix must be positive semi-definite, i.e. for any vector v:

$$v^\top K(x, x')\, v = \sum_i \sum_j k(x_i, x_j)\, v_i v_j \ge 0. \tag{1.13}$$
A very commonly used kernel is the exponential quadratic kernel:

$$k(x, x', l) = \sigma^2 \exp\left(-\frac{|x - x'|^2}{2l^2}\right), \tag{1.14}$$

with σ and l being hyper-parameters. This can be equivalently described as a Bayesian regression with prior weights w ∼ N(0, σ_p² I) and an infinite set of basis functions of the form:

$$\phi(x, l) = \exp\left(-\frac{(x - c)^2}{2l^2}\right). \tag{1.15}$$

An example of a covariance matrix generated by the exponential quadratic kernel (also known as the RBF kernel) is shown in Figure 1.3. We see that the covariance decays exponentially the further apart the inputs are from each other.
1.3 Gaussian Process

Figure 1.3: Example of exponential quadratic covariance matrix (a) and covariance k(x, 0) (b)

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is fully specified by a mean function m(x) and a covariance function k(x, x'), so that for any finite set of inputs X:

$$\big(f(x_1), \ldots, f(x_n)\big) \sim N\big(m(X),\, K(X, X)\big), \tag{1.16}$$
where K(X, X) is the covariance matrix with entries K_{i,j} = k(xᵢ, xⱼ). In other terms, a Gaussian process can be understood as a multivariate Gaussian with an uncountably infinite number of random variables.
The mean function m can be any real-valued function and is very often set to zero by subtracting the mean from the data. The kernel function k can be any valid Mercer kernel, such as the exponential quadratic kernel shown before. This way, a Gaussian process is often written as:

$$f(x) \sim GP\big(m(x), k(x, x')\big). \tag{1.17}$$
To sample functions from the Gaussian process we just need to define the mean and
covariance functions. The covariance function k models the joint variability of the Gaussian
process random variables. It returns the modelled covariance between each pair of inputs.
That is to say, with f and f∗ being respectively the training and test outputs, we have
the following joint distribution:
" # " #!
f K(X, X) K(X, X∗ )
∼N 0, (1.18)
f∗ K(X∗ , X) K(X∗ , X∗ )
1.4 Regression with noisy observations

Figure 1.4: Example of GP function samples from the prior (a) and posterior (b) distributions
As we have seen, the specification of the covariance function, i.e. the kernel, implies a distribution over functions, and by choosing a specific kernel function it is possible to place prior information on this distribution. We can sample function evaluations of a function drawn from a Gaussian process at a finite but arbitrary set of points with the following procedure:

• Compute the covariance matrix Σ = K(X, X) and its Cholesky factor A, so that AAᵀ = Σ.

• Draw Z ∼ N(0, I).

• Let X = µ + AZ, which has the desired distribution thanks to the affine transformation property.

Thanks to the properties of Gaussian processes, we can also evaluate the posterior by conditioning the joint Gaussian prior distribution (1.18) on the observations:

$$f_* \mid X, y, X_* \sim N\big(K(X_*, X) K(X, X)^{-1} y,\; K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)\big). \tag{1.19}$$
In Figure 1.4 we can see 10 samples from the prior and posterior distributions, with the grey area and the black line being respectively the standard deviation and the mean. We see that the posterior mean of a Gaussian process is a weighted average of the observed values.
Figure 1.5: Example of noise-free (a) and noisy (b) regression on a sinusoid

In Figure 1.5 we can see the effect of a noise assumption on the data. This is done by adding the noise variance σₙ² I to the covariance of the observations, so that the joint distribution (1.18) uses K(X, X) + σₙ² I in its upper-left block.
Figure 1.6: Example of different values of hyper-parameters for the exponential quadratic kernel

1.5 Choice of the hyper-parameters

As illustrated in Figure 1.6, the choice of hyper-parameters strongly affects the fit. It is possible to obtain optimal parameters by maximizing the marginal likelihood, which is defined as:

$$p(y \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{1}{2}(y - \mu)^\top \Sigma^{-1} (y - \mu)\right), \tag{1.23}$$
with d being the dimension of the marginal. Taking the logarithm gives:
$$\log p(y \mid \mu, \Sigma) = -\frac{1}{2}(y - \mu)^\top \Sigma^{-1} (y - \mu) - \frac{1}{2}\log |\Sigma| - \frac{d}{2}\log 2\pi. \tag{1.24}$$
We see that the first term corresponds to the fit to the data, whereas the remaining terms correspond to a complexity penalty. The optimal parameters can then be found by maximizing the log marginal likelihood, typically by gradient-based optimization of its negative.
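In practice this optimization is handled by standard libraries. A minimal sketch with sklearn (the library used for the implementations; the data here is synthetic) fits the kernel hyper-parameters by maximizing the log marginal likelihood (1.24) with several optimizer restarts:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

# sigma^2 * RBF(l) plus a noise term; all hyper-parameters are fitted
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10).fit(X, y)

print(gpr.kernel_)                          # optimized hyper-parameters
print(gpr.log_marginal_likelihood_value_)   # value of (1.24) at the optimum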
1.6 Incorporating a non-zero mean function

If one were to implement a fixed mean function m, the predictive mean would become:

$$\bar{f}_* = m(X_*) + K(X_*, X)\, K_y^{-1} \big(y - m(X)\big), \tag{1.25}$$

with K_y = K + σₙ² I, the predictive variance remaining unchanged from the one previously observed. More generally, one can specify the mean as a linear combination of basis functions:

$$g(x) = f(x) + h(x)^\top \beta, \tag{1.26}$$
where f(x) ∼ GP(0, k(x, x')) is a zero-mean Gaussian process and h(x) is the column vector containing the basis functions. This way of expressing the regression shows that the process amounts to fitting the data with a linear model whose residuals are captured by the Gaussian process. Then, when fitting the model, one should jointly optimize the parameters of the basis mean and the hyper-parameters of the covariance.

Additionally, if we assume a Gaussian prior on the parameters, β ∼ N(b, B), we can describe the whole regression as the following Gaussian process:

$$g(x) \sim GP\big(h(x)^\top b,\; k(x, x') + h(x)^\top B\, h(x')\big). \tag{1.27}$$

We see that the probability distribution of the parameters of the mean affects the covariance. We can then obtain the predictive mean and covariance:

$$\bar{g}(X_*) = \bar{f}(X_*) + R^\top \bar{\beta}, \qquad \operatorname{cov}(g_*) = \operatorname{cov}(f_*) + R^\top \big(B^{-1} + H K_y^{-1} H^\top\big)^{-1} R, \tag{1.28}$$

with:

$$\bar{\beta} = \big(B^{-1} + H K_y^{-1} H^\top\big)^{-1} \big(H K_y^{-1} y + B^{-1} b\big), \qquad R = H_* - H K_y^{-1} K_*, \tag{1.29}$$

where H aggregates the vectors h(x) over the training cases and H∗ over the test cases.
We see that the mean is the combination of the linear output, which depends on the data input and the prior, and the Gaussian process prediction of the residuals. The covariance ends up being the sum of the zero-mean process covariance and the contribution of the newly introduced parameters.
1.7 Comparison with Kernel Ridge Regression

Both Gaussian process regression (GPR) and kernel ridge regression (KRR) rely on the kernel trick. GPR places a prior distribution over the target functions and uses the observed training data to define a likelihood function, whereas KRR learns a linear function, chosen based on the mean-squared error loss with ridge regularization, in the space induced by the respective kernel, which corresponds to a non-linear function in the original space. GPR additionally uses Bayes' theorem to define a Gaussian posterior distribution over target functions, whose mean is used for prediction.
2.1 Curve Fitting

Under the Black-Scholes model and for a non-dividend-paying asset, the price of a call option [see 2] is:

$$C_T = N(d_1)\, S_t - N(d_2)\, K e^{-r(T-t)}, \tag{2.2}$$

with N the cumulative distribution function of the standard normal distribution and:

$$d_1 = \frac{\log\frac{S_0}{K} + T\left(r + \frac{\sigma^2}{2}\right)}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}, \tag{2.3}$$
with S₀ the initial value of the underlying asset, K the strike price, r the risk-free rate, σ the implied volatility, and T the time to maturity.
For hedging purposes, one may want to measure the corresponding Gamma of a portfolio containing such options. Gamma is defined as the second derivative of the value function with respect to the underlying price:

$$\Gamma = \frac{\partial^2 C_T}{\partial S^2}, \tag{2.4}$$

which under the Black-Scholes model has the closed form:

$$\Gamma = \frac{1}{S_0\, \sigma \sqrt{T}} \cdot \frac{e^{-d_1^2/2}}{\sqrt{2\pi}}. \tag{2.5}$$
We can see that such a function is not trivial to estimate, and that obtaining an accurate non-linear regression over the five parameters is very time-consuming.
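A minimal sketch of such a fit with sklearn (the training ranges below are illustrative assumptions, not the grids used in Table 2.1) trains a GPR on closed-form Gamma values and then predicts on unseen parameter combinations:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bs_gamma(S0, K, r, sigma, T):
    # closed-form Gamma, eqs. (2.3) and (2.5)
    d1 = (np.log(S0 / K) + T * (r + sigma**2 / 2)) / (sigma * np.sqrt(T))
    return np.exp(-d1**2 / 2) / (S0 * sigma * np.sqrt(T) * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([rng.uniform(80, 120, n),    # S0
                     rng.uniform(80, 120, n),    # K
                     rng.uniform(0.0, 0.05, n),  # r
                     rng.uniform(0.1, 0.5, n),   # sigma
                     rng.uniform(0.1, 2.0, n)])  # T
y = bs_gamma(*X.T)

# one length-scale per input dimension (anisotropic RBF)
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(5)),
                               normalize_y=True).fit(X, y)

X_new = np.array([[100.0, 105.0, 0.02, 0.25, 1.0]])
print(gpr.predict(X_new), bs_gamma(*X_new.T))   # prediction vs. closed form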
Figure 2.1: Prediction curves (a) and prediction errors (b) for GPR, grid and polynomial interpolation

Table 2.1: Gamma surface fit performance for different grid sizes

The accuracy of the fit depends on the density of the training grid within the scope of the testing grid. In Table 2.1 we can see the corresponding maximum (MAE) and average (AAE) absolute errors for different sizes of input grids.

Figure 2.2: Gamma surface fit (a) and prediction errors (b) for GPR
3 Implied Volatility Surface
In simpler pricing models, such as the standard Black-Scholes model, the implied volatility across strike prices and times to maturity is presumed constant, which would lead to a flat surface. In practice this is hardly the case. Indeed, we observe that out-of-the-money strike prices tend to have higher implied volatilities than at-the-money strike prices, giving rise to a volatility smirk, hence the need for a graphical representation to illustrate the market tendencies. Additionally, as the time to maturity increases, volatilities across strike prices tend to converge to a relatively constant level. Still, we usually observe a volatility smirk across the range of tenors; options with shorter time to maturity often have a higher volatility than options with longer maturities, an observation that is even more pronounced in periods of high market stress. This being said, surfaces can differ drastically between options and option chains, and therefore the representation of the volatility surface can lead to a slightly wavy rather than strictly convex graph.
With a relatively large number of strike prices but fewer tenors, the volatility surface can then help to price options: with interpolation, we can infer the market's implied volatilities for a larger range of strikes and tenors than the ones actively traded.

Figure 3.1 shows a volatility surface fit from freely available market data of an S&P 500 call option chain. Since the points are last traded values and the liquidity of the asset is far from that of other derivatives such as interest rate swaps, optimizing the noise parameter of the input data is crucial. One can also note that the option chain evolves on two time scales, moving from a weekly basis to monthly and then yearly times to maturity. The non-uniform sparsity of the data therefore makes manual adjustment necessary when optimizing parameters, in order to deal with the 'shaky' or wavy appearance of the input data.
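A minimal sketch of such a noisy surface fit (the quotes below are synthetic stand-ins for market data, and the kernel length-scales are illustrative starting values) combines an anisotropic RBF kernel with a WhiteKernel whose noise level is optimized together with the other hyper-parameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
strikes = np.linspace(2800, 3200, 9)
tenors = np.array([0.02, 0.05, 0.1, 0.25, 0.5, 1.0, 2.0])
KK, TT = np.meshgrid(strikes, tenors)
# stylized smirk: higher vols for low strikes and short tenors, plus quote noise
iv = (0.18 + 0.10 * np.exp(-TT) * np.maximum(0.0, (3000 - KK) / 3000)
      + 0.01 * rng.standard_normal(KK.shape))

X = np.column_stack([KK.ravel(), TT.ravel()])
kernel = RBF(length_scale=[200.0, 0.5]) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               n_restarts_optimizer=5).fit(X, iv.ravel())

# interpolate on a denser grid of strikes and tenors than is actively traded
Kq, Tq = np.meshgrid(np.linspace(2750, 3250, 50), np.linspace(0.02, 2.0, 50))
surface = gpr.predict(np.column_stack([Kq.ravel(), Tq.ravel()])).reshape(Kq.shape)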
4 Derivative Pricing
In the derivatives market, a huge variety of models is used, since specific models can be more explanatory regarding the characteristics and behavior of the underlying asset. Here we delve into a frequently used model more advanced than Black-Scholes, the Heston model, and two different pricing techniques: the binomial tree and path simulation with Monte Carlo.
4.1 European Options

Following Carr and Madan [see 4], the price of a European call with log-strike k can be written as:

$$C_T(k) = \int_k^{\infty} e^{-rT} \left(e^{x} - e^{k}\right) f(x)\, dx, \tag{4.1}$$

with f the risk-neutral density corresponding to the risk-neutral measure under the fundamental theorem of asset pricing, x_t = log(S_t) and k = log(K).
4.1.1 Heston Model

The Heston model describes the dynamics of the underlying asset under the risk-neutral measure as:

$$dS_t = r S_t\, dt + \sqrt{V_t}\, S_t\, dW_t, \qquad dV_t = \kappa(\theta - V_t)\, dt + \sigma \sqrt{V_t}\, dZ_t, \tag{4.2}$$

where S_t and V_t are the price and volatility processes, W_t and Z_t are Wiener processes with correlation ρ, θ is the long-run mean of the volatility, κ the rate of mean reversion, r the risk-free rate and σ the volatility of volatility.
It is similar in several aspects to the more common Black-Scholes model, which in contrast assumes a constant volatility. There is empirical evidence, as well as mathematical arguments [see 5], that such a model is a more realistic representation of the market's behavior. For example, studies have shown that asset log-return distributions have heavy tails and high peaks, i.e. are leptokurtic. Such issues are addressed by different parameters of the model, such as ρ, which affects the skewness of the distribution, and σ, which affects its kurtosis [see 11].
Under the Heston model, a vanilla European option on a non-dividend-paying asset has a closed-form solution:

$$C(S_t, V_t, t, T) = S_t P_1 - K e^{-r(T-t)} P_2, \tag{4.3}$$

with, for j = 1, 2 and τ = T − t:

$$P_j(x, V_t, T, K) = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \operatorname{Re}\left[\frac{e^{-i\phi \log(K)}\, f_j(x, V_t, T, \phi)}{i\phi}\right] d\phi$$

$$f_j = \exp\big(C(\tau, \phi) + D(\tau, \phi)\, V_t + i\phi x\big)$$

$$C(\tau, \phi) = r\phi i \tau + \frac{a}{\sigma^2}\left[(b_j - \rho\sigma\phi i + d)\,\tau - 2\log\left(\frac{1 - g e^{d\tau}}{1 - g}\right)\right] \tag{4.4}$$

$$D(\tau, \phi) = \frac{b_j - \rho\sigma\phi i + d}{\sigma^2} \cdot \frac{1 - e^{d\tau}}{1 - g e^{d\tau}}$$

$$g = \frac{b_j - \rho\sigma\phi i + d}{b_j - \rho\sigma\phi i - d}, \qquad d = \sqrt{(\rho\sigma\phi i - b_j)^2 - \sigma^2 (2 u_j \phi i - \phi^2)},$$

with a = κθ, u₁ = 1/2, u₂ = −1/2, b₁ = κ − ρσ and b₂ = κ (taking the market price of volatility risk to be zero).
While numerically evaluating this solution is extremely accurate, it is also very time-consuming. Therefore it is mostly used as a baseline against which other computation methods are compared.
4.1.2 Fast Fourier Transform

A commonly used method is the Fast Fourier Transform (FFT) [see 4], which is a discrete approximation of a Fourier transform. With the Fourier transform F defined as:

$$F[f](v) = \int_{-\infty}^{\infty} e^{ivx} f(x)\, dx, \tag{4.5}$$

we can show that the corresponding discrete approximation under the FFT is:

$$\omega(k) = \sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(k-1)}\, f(x_j). \tag{4.6}$$

The aim of this algorithm is to reduce the number of multiplications in the required N summations, from an order of N² to an order of N log₂(N). As mentioned by Carr and Madan [see 4], such a transformation requires the function to be square integrable, but C_t(k) tends to S₀ as k tends to −∞; therefore one needs to introduce a parameter α > 0, a damping factor, such that:

$$c_T(k) = e^{\alpha k}\, C_T(k). \tag{4.7}$$

It is then possible to express the Fourier transform of the scaled call as:

$$\psi_T(v) = \int_{-\infty}^{\infty} e^{ivk}\, c_T(k)\, dk. \tag{4.8}$$

How to obtain this transform in closed form is detailed by Carr and Madan [see 4]:

$$\psi_T(v) = \frac{e^{-rT}\, \phi_T\big(v - (\alpha + 1)i\big)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v}, \tag{4.9}$$

where φ_T is the characteristic function of the log-price, which under the Heston model corresponds to:

$$\phi_T(u) = f_2(x, V_t, T, u), \tag{4.10}$$

with f₂ as given in (4.4). The call price is then recovered by inverting the transform and undoing the damping:

$$C_T(k) = \frac{e^{-\alpha k}}{\pi} \int_0^{\infty} e^{-ivk}\, \psi_T(v)\, dv. \tag{4.11}$$

This integral can then be evaluated approximately with the discrete Fourier transform (4.6), using Simpson's rule weights:
$$C(k_u) = \frac{e^{-\alpha k_u}}{\pi} \sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(u-1)}\, e^{i b v_j}\, \psi_T(v_j)\, \frac{\eta}{3}\left(3 + (-1)^j - \delta_{j-1}\right), \tag{4.12}$$

where δ is the Kronecker delta and the following parameters have been chosen to reach a compromise between accuracy and computational time:

$$v_j = \eta(j-1), \qquad \eta = \frac{c}{N}, \qquad c = 600, \qquad N = 4096, \qquad b = \frac{\pi}{\eta}, \qquad k_u = -b + \frac{2b}{N}(u - 1), \quad u = 1, 2, \ldots, N + 1. \tag{4.13}$$
One strength of this numerical method is that it evaluates call prices for a whole vector of strikes at once, with many terms shared between the calculations.
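Indeed, the sum in (4.12) has exactly the form of the discrete transform (4.6), so it can be evaluated for all N strikes with a single FFT call. A minimal sketch (the vector x below is a placeholder standing in for e^{ibv_j} ψ_T(v_j) times the Simpson weights):

import numpy as np

N = 4096
x = np.ones(N, dtype=complex)   # placeholder for the weighted integrand of (4.12)
omega = np.fft.fft(x)           # omega[u] = sum_j e^{-i 2*pi*j*u/N} x[j], matching (4.6)
# call prices across all strikes then follow as exp(-alpha * k) / pi * omega.real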
4.1.3 GPR Implementation

Table 4.1: Training and test set for the Heston model

Table 4.2: Fit performance for vanilla European call for different numbers of points
The empirical results are summarized in Table 4.2. We see that the accuracy increases as the training set becomes larger, but this also increases the computational load. Nevertheless, there are other factors that can be used to improve the accuracy without increasing the number of data points. One can decrease the number of parameters by fixing a few of them, thereby reducing the dimensionality, or reduce the width of the training set, since a higher density gives better accuracy. Another interesting result from the regression is the speed of the predictions compared to the FFT calculations: for an input of 10000 points we observe a speedup of 16 times.
4.2 American Options

4.2.1 Binomial Tree

A tree method discretizes the lifetime of the option into steps and lets the underlying price evolve along the nodes of a lattice. At each node the security movements are modelled by chosen probabilities of going up or down; the option price is then evaluated at the terminal nodes and discounted back to obtain the price at the first node. The advantage of tree methods is not only that they can be used to value just about any type of option, but also that they are easy to implement. Through this method the price of a European option converges to the Black-Scholes price. Here, the valuation of American options is done by assessing the profitability of early exercise at every node.
One popular tree method is the binomial model, often called the CRR tree [see 8]. In this tree, at each step, the price of the underlying instrument moves up or down by a specific factor, respectively u and d. The value of these factors depends on the underlying volatility σ and the duration of a step, Δt = T/n:

$$u = e^{\sigma\sqrt{\Delta t}}, \qquad d = e^{-\sigma\sqrt{\Delta t}} = \frac{1}{u}, \tag{4.14}$$

with the corresponding probabilities of up and down moves being p = \frac{e^{(r-q)\Delta t} - d}{u - d} and 1 − p.
We can see that the CRR tree is recombinant, that is to say, any path with the same number of up and down moves ends up at the same node. Such a tree is shown in Figure 4.1.
In the case of an American option, the exercise value is evaluated at each node, which for a put option is:

$$\max[K - S, 0]. \tag{4.15}$$

Then, starting an iterative procedure from the final nodes, the binomial value is calculated as:

$$C_{t-\Delta t,\, i} = e^{-r\Delta t}\left(p\, C_{t,\, i} + (1 - p)\, C_{t,\, i+1}\right), \tag{4.16}$$

where C_{t,i} is the option value for the i-th node at time t. Since early exercise is possible, the value retained at each node is the maximum of this binomial value and the exercise value (4.15).
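A compact sketch of the CRR valuation of an American put (a direct implementation of eqs. (4.14)-(4.16); the parameters in the final line are illustrative):

import numpy as np

def crr_american_put(S0, K, r, sigma, T, n, q=0.0):
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp((r - q) * dt) - d) / (u - d)
    disc = np.exp(-r * dt)
    # terminal asset prices S0 * u^j * d^(n-j) for j = 0..n up moves
    S = S0 * u ** np.arange(n + 1) * d ** np.arange(n, -1, -1)
    C = np.maximum(K - S, 0.0)                     # exercise value, eq. (4.15)
    for _ in range(n):
        S = S[1:] * d                              # asset prices one step earlier
        C = disc * (p * C[1:] + (1 - p) * C[:-1])  # binomial value, eq. (4.16)
        C = np.maximum(C, K - S)                   # early exercise check
    return C[0]

print(crr_american_put(S0=100, K=105, r=0.03, sigma=0.2, T=1.0, n=500))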
4.2.2 GPR Implementation

Table 4.3: Training and test set for the binomial tree model

Table 4.4: Fit performance for American put for different numbers of points
4.3 Barrier Options

Two general types of barrier options exist: knock-out options, where the payoff is null if the barrier is crossed, and knock-in options, where on the contrary the contract becomes valid once the barrier is crossed. Here we will focus on a down-and-in barrier call option, which means the contract, the right to buy, is granted once a barrier lower than the initial underlying price is crossed.
For this type of option, increasing the absolute difference between the initial price and the barrier level lowers the option price, since with all other parameters unchanged the probability of hitting the barrier decreases. As the difference decreases, the price converges to that of the vanilla equivalent. The volatility is also an important factor in the pricing: a higher volatility leads to a higher probability of hitting the barrier and is therefore positively correlated with the price of the knock-in option.
4.3.1 Discretization
The price of the barrier option is influenced by the frequency of the monitoring. There are closed formulas under continuous monitoring for several barrier option types. For a down-and-in call option with strike price K, barrier S_b and maturity T, under the Black-Scholes model, we have [see 13]:

$$C_{\text{down-and-in}} = S_0\, e^{-qT} \left(\frac{S_b}{S_0}\right)^{2\lambda} N(y) - K e^{-rT} \left(\frac{S_b}{S_0}\right)^{2\lambda - 2} N\big(y - \sigma\sqrt{T}\big), \tag{4.18}$$

with:

$$\lambda = \frac{r - q + \sigma^2/2}{\sigma^2}, \qquad y = \frac{\log\frac{S_b^2}{S_0 K}}{\sigma\sqrt{T}} + \lambda\sigma\sqrt{T}. \tag{4.19}$$
In reality, however, prices are monitored discretely and there exists a discrepancy between the previous analytical solution and the actual price. This shift is determined by a set of factors: the monitoring frequency, the asset volatility and a constant β ≈ 0.5826 [see 3]. The shift can be explained by the fact that decreasing the number of observation events decreases the probability of hitting the barrier, so we can expect the price of a down-and-in option to decrease as the number of observations decreases.
4.3.2 Monte Carlo paths simulation

The main concept behind Monte Carlo simulation is to estimate some parameter:

$$\theta = \mathbb{E}[g(X)] \tag{4.20}$$

by the sample average of n independent draws:

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} g(X_i). \tag{4.21}$$

By the law of large numbers, when n tends to infinity the estimator converges in probability to the actual value of the estimated parameter. If we denote the sample variance by:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(g(X_i) - \hat{\theta}\right)^2, \tag{4.22}$$

then, by the central limit theorem, for large n we have approximately:

$$\frac{\hat{\theta} - \theta}{s/\sqrt{n}} \sim N(0, 1). \tag{4.23}$$

Therefore, by using the corresponding quantiles, we can assess the accuracy:

$$P\left(\hat{\theta} - z_{1-\alpha/2}\frac{s}{\sqrt{n}} < \theta < \hat{\theta} + z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right) \approx 1 - \alpha. \tag{4.24}$$
The precision of the estimator is therefore directly linked to the standard error s/√n. In order to increase the speed of convergence for a given number of draws, variance reduction techniques such as antithetic variates or control variates may be used. Very often in Monte Carlo simulations, the price of the underlying asset is assumed to follow a geometric Brownian motion:

$$d\log S_t = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\, dW_t, \tag{4.26}$$
which integrates to:

$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma \int_0^t dW(\tau)\right). \tag{4.27}$$

We can then discretize over the time interval with small time steps and, using the properties of standard Wiener processes, obtain:

$$S_{t+dt} = S_t \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\sqrt{dt}\;\varepsilon\right), \tag{4.28}$$

where ε ∼ N(0, 1) is a standard normal random variable. This last equation can be used for generating paths to price the barrier option.
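A minimal sketch of pricing the down-and-in call this way under Black-Scholes dynamics (risk-neutral drift µ = r; the parameters in the final line are illustrative, and the barrier is monitored only at the simulation dates, consistent with the discrete-monitoring discussion above):

import numpy as np

def mc_down_and_in_call(S0, K, Sb, r, sigma, T, n_steps=250, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    Z = rng.standard_normal((n_paths, n_steps))
    # log-increments per eq. (4.28)
    inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
    paths = S0 * np.exp(np.cumsum(inc, axis=1))
    knocked_in = paths.min(axis=1) <= Sb
    payoff = np.where(knocked_in, np.maximum(paths[:, -1] - K, 0.0), 0.0)
    disc = np.exp(-r * T)
    estimate = disc * payoff.mean()
    std_error = disc * payoff.std(ddof=1) / np.sqrt(n_paths)  # s / sqrt(n), eq. (4.23)
    return estimate, std_error

print(mc_down_and_in_call(S0=100, K=100, Sb=85, r=0.03, sigma=0.25, T=1.0))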
The problem becomes harder when we allow the volatility to be stochastic. Different discretization schemes can be used in this regard. One of them is the quadratic exponential (QE) method, introduced by Andersen [see 1], which is very accurate when approximating a stochastic volatility, typically in the case of the Heston model. To represent a non-central chi-squared distribution, it uses a combination of a quadratic function of a standard Gaussian variable for the larger values and an exponential function for the lower segment of the distribution. The segment of large values can be approximated by the following formula:

$$V(t) = a(b + Z_v)^2, \tag{4.29}$$

where Z_v is a standard normal random variable and:

$$a = \frac{m}{1 + b^2}, \qquad b^2 = 2\psi^{-1} - 1 + \sqrt{2\psi^{-1}}\sqrt{2\psi^{-1} - 1}, \qquad \psi = \frac{s^2}{m^2},$$

$$m = \theta + (V_0 - \theta)\, e^{-\kappa\, dt}, \qquad s^2 = \frac{V_0\, \eta^2\, e^{-\kappa\, dt}}{\kappa}\left(1 - e^{-\kappa\, dt}\right) + \frac{\theta\eta^2}{2\kappa}\left(1 - e^{-\kappa\, dt}\right)^2. \tag{4.30}$$
The segment of lower values can be approximated by using the inverse transform sampling method, such that:

$$V(t) = L^{-1}(U_v), \tag{4.31}$$

where U_v is a uniform random variable. In our case, the inverse transform is defined by:

$$L^{-1}(U_v) = \begin{cases} \beta^{-1}\log\left(\dfrac{1 - p}{1 - U_v}\right) & p < U_v \le 1 \\[4pt] 0 & 0 \le U_v \le p, \end{cases} \tag{4.32}$$

where:

$$p = \frac{\psi - 1}{\psi + 1}, \qquad \beta = \frac{1 - p}{m}. \tag{4.33}$$
Applying Ito’s formula to the Heston model dynamics, the underlying stock price, for
s < t, can be expressed as:
Z t Z tp
1
St = Ss exp (r − V (u)du + V (u)dW ) (4.34)
s 2 s
which is equivalent to :
Z tp Rt
Vt –Vs + κθ∆t − κ s
V (u)du
V (u)dW1 = , (4.37)
s η
Table 4.5: Training and test set for the Monte Carlo simulations

Approximating the time integrals by the trapezoidal rule then leads to the log-price update:

$$\log(S_t) = \log(S_s) + r\Delta t - \frac{\kappa\rho}{2\eta}(V_s + V_t)\Delta t - \frac{1}{4}(V_s + V_t)\Delta t + \frac{\rho}{\eta}(V_t - V_s + \kappa\theta\Delta t) + \sqrt{1 - \rho^2}\,\sqrt{(V_s + V_t)\Delta t / 2}\; Z, \tag{4.39}$$
where Z is a standard normal random variable. Then, in order to generate a sample path, the procedure at each time step is the following: given the current variance V_s, compute m, s² and ψ from (4.30); if ψ lies below a switching threshold, draw the new variance V_t from the quadratic scheme (4.29), otherwise from the exponential scheme (4.31)-(4.33); finally, draw Z and update the log-price through (4.39).
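A minimal sketch of the variance update (a direct transcription of eqs. (4.29)-(4.33); the switching threshold psi_c = 1.5 is the value suggested by Andersen [see 1], and the parameters in the final loop are illustrative):

import numpy as np

def qe_variance_step(V0, kappa, theta, eta, dt, rng, psi_c=1.5):
    e = np.exp(-kappa * dt)
    m = theta + (V0 - theta) * e                                # eq. (4.30)
    s2 = (V0 * eta**2 * e / kappa) * (1 - e) \
         + (theta * eta**2 / (2 * kappa)) * (1 - e)**2
    psi = s2 / m**2
    if psi <= psi_c:
        # quadratic branch for the larger variance values, eq. (4.29)
        b2 = 2 / psi - 1 + np.sqrt(2 / psi) * np.sqrt(2 / psi - 1)
        a = m / (1 + b2)
        return a * (np.sqrt(b2) + rng.standard_normal())**2
    # exponential branch via inverse transform sampling, eqs. (4.31)-(4.33)
    p = (psi - 1) / (psi + 1)
    beta = (1 - p) / m
    U = rng.uniform()
    return 0.0 if U <= p else np.log((1 - p) / (1 - U)) / beta

rng = np.random.default_rng(5)
V = 0.04
for _ in range(250):
    V = qe_variance_step(V, kappa=1.5, theta=0.04, eta=0.3, dt=1/250, rng=rng)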
4.3.3 GPR Implementation

Table 4.6: Fit performance for the barrier option for different numbers of points
5 Conclusion
We have shown how Gaussian process regression is a logical extension of Bayesian linear regression, bringing more alternatives and flexibility. We presented several applications in which this type of regression performs well, such as curve fitting, summarizing surfaces and price estimation under different models. An important factor in the procedure is the choice of kernel function and the optimization of its hyper-parameters. Since the prediction process depends on the inversion of large covariance matrices, efficient algorithms are required to carry out these calculations. As we have seen, due to the exponentially increasing number of points required in higher dimensions, a clever choice of input data is necessary. However, since the learning process, or fitting, needs to be completed only once, the prediction process is much less affected by these limitations, and we showed that it can even outperform several classical pricing techniques in speed. These methods and techniques can be applied to a much broader range of problems, which makes Gaussian process regression a promising tool.
References
[1] Andersen, L. (2007). Efficient simulation of the Heston stochastic volatility model.
Journal of Computational Finance, 11.
[2] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities.
Journal of Political Economy, 81(3):637–654.
[3] Broadie, M., Glasserman, P., and Kou, S. (1997). A continuity correction for discrete
barrier options. Mathematical Finance, 7(4):325–349.
[4] Carr, P. and Madan, D. B. (1999). Option valuation using the Fast Fourier
Transform. Journal of Computational Finance, 2(4):61–73.
[5] Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical
issues. Quantitative Finance, 1(2):223–236.
[6] Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985). A theory of the term structure of
interest rates. Econometrica, 53(2):385–407.
[7] Cox, J. C. and Ross, S. (1976). The valuation of options for alternative stochastic
processes. Journal of Financial Economics, 3(1-2):145–166.
[8] Cox, J. C., Ross, S. A., and Rubinstein, M. (1979). Option pricing: A simplified
approach. Journal of Financial Economics, pages 229–264.
[9] Heston, S. L. (1993). A Closed-Form Solution for Options with Stochastic Volatility
with Applications to Bond and Currency Options. The Review of Financial Studies,
6(2):327–343.
[10] Mercer, J. (1909). Functions of positive and negative type, and their connection
with the theory of integral equations. Philosophical Transactions of the Royal Society,
London, 209:415–446.
[11] Mondal, M., Alim, M., Rahman, M., and Biswas, M. H. A. (2017). Mathematical
analysis of financial model on market price with stochastic volatility. Journal of
Mathematical Finance, 07:351–365.
[12] Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning.
Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, USA.
[13] Rubinstein, M. and Reiner, E. S. (1991). Breaking down the barriers. Risk Magazine,
4(8):28–35.
[14] Spiegeleer, J. D., Madan, D. B., Reyners, S., and Schoutens, W. (2018). Machine learn-
ing for quantitative finance: fast derivative pricing, hedging and fitting. Quantitative
Finance, 18(10):1635–1643.