Bayesian Estimators
Alok Gupta∗ Christoph Reisinger†
February 7, 2012
Abstract
We consider a general calibration problem for derivative pricing mod-
els, which we reformulate into a Bayesian framework to attain posterior
distributions for model parameters. It is then shown how the posterior
distribution can be used to estimate prices for exotic options. We apply
the procedure to a discrete local volatility model and work in great de-
tail through numerical examples to clarify the construction of Bayesian
estimators and their robustness to the model specification, number of cal-
ibration products, noisy data and misspecification of the prior.
1 Introduction
Since the model proposed by Black and Scholes in their seminal 1973 paper
[10], the variety and complexity of financial models has grown dramatically.
Typically, agents will want to use a model to price or hedge an instrument in
the market. But before they can do this, they must first mark the model to
market — that is, calibrate the model to observable prices. Most commonly,
vanilla instruments such as European calls or puts are used. This calibration is
necessary to avoid introducing arbitrage into the market by making the agent
vulnerable to other agents creating riskless profits from the first agent’s incorrect
prices.
In the original Black-Scholes model, there is one scalar volatility parame-
ter to be estimated. In contrast, in some of the commonly used models today,
entire functions have to be calibrated, which raises questions not just of nu-
merical complexity but of identifiability of the model from a restricted set of
observations (market data) and their robust and stable estimation.
Calibration can be classed as an inverse problem: a parametrised model
has been specified, we then observe market prices and try to find the model
parameter which gives those prices. Put abstractly, the calibration problem
often fails one or more of Hadamard’s criteria for well-posedness (see for example
[17]), which are:
∗ Mathematical Institute, University of Oxford, [email protected]
† Mathematical Institute, University of Oxford, [email protected]
A. Gupta acknowledges financial support from the UK Engineering and Physical Sciences
Research Council (EPSRC) and Nomura International plc under CASE and PhD Plus
Awards. The work was completed while C. Reisinger was a Visiting Fellow at the Isaac
Newton Institute for Mathematical Sciences, University of Cambridge, during the programme
Inverse Problems.
i. For all admissible data, a solution exists.
ii. For all admissible data, the solution is unique.
iii. The solution depends continuously on the data.
We assume the first criterion holds, i.e. there exists a parameter for which the
model reproduces market prices, for otherwise our model is poorly designed and
introduces arbitrage into the market. In reality, the calibration instruments, to
which we try to mark our model, are only observable in the market up to some
bid-ask spread, which is the interval of values between what an agent is willing
to pay for the instrument (at the lower end) and what an agent is willing to sell
the instrument for (at the upper end).
The latter two conditions are not certain to be satisfied by the calibration
problem. It is clear that picking the wrong solution (by condition ii.) or choosing
a solution that is not stable (by condition iii.) can have disastrous effects on
the pricing and hedging of an instrument.
For illustration, we consider here the local volatility model, in which the
underlying asset price S is assumed to follow
dSt /St = µ dt + σ(St , t) dZt , (1)
where the drift µ is the expected growth rate and the volatility σ a function
of both the asset price and time, and Z is a standard Brownian motion. The
function σ(·, ·) is a priori unknown and must be inferred from observed prices.
Dupire [15] derives an explicit formula, expressing this function in terms of
European call prices and their sensitivities with respect to strike and maturity.
Direct practical application of the formula is hindered by the fact that it requires
prices for a continuum of strikes and maturities, which are not quoted in reality;
additional assumptions therefore become necessary, in practice via interpolation and discretisation.
The formula illustrates two fundamental facts: that a discrete set of observation
prices is not sufficient to pin down the functional parameter, and that even if a
continuum of prices were available, the solution of the inverse problem would be unstable.
To address the difficulties of calibrating the local volatility model, authors
such as Jackson et al. [26], Lagnado & Osher [28] and Coleman et al. [12]
have developed minimisation techniques and penalty functions for finding the
‘best-fit’ local volatility surface with a certain regularity. Further analysis of
these methods and their improvement has been detailed by Chiarella et al. [11].
Crepey [14] and Egger & Engl [16] show that a carefully (Tikhonov) regularised
problem is well-posed in the above sense, and rates for the convergence of the
regularised solution can be derived for vanishing data noise and regularisation
parameter. This approach has been extended to American calibration options
by Achdou and Pironneau [1, 2].
Choosing a regularising functional restricts the solution to a more well-
behaved class, but the resulting solution does not contain information on its
uncertainty.
Given that financial models typically make very specific assumptions on the
processes they are describing, we would like to add robustness as signifying
iv. insensitivity to small deviations from the assumptions
to our list of desiderata. In the context of this study on financial model cal-
ibration, the main assumptions made are: the assumed class of models which
are being calibrated; approximation of these models, e.g. via discretisation of a
local volatility function; any regularising assumptions, e.g. smoothing penalty
terms; the choice of calibration instruments. As any financial market model will
only be an approximation to the true data generating process, the main require-
ment of a robust method is that the predictions made from the estimated model
chosen out of the ‘wrong’ model class are a sufficiently accurate approximation.
We will assess this property by numerical tests in synthetic (i.e. the calibration
data are generated by a model) and real markets later on.
A different viewpoint is taken by Avellaneda et al. [5, 6] and Lyons [29].
Rather than imposing a detailed description of the term-structure and leverage
of instantaneous variance, only an upper and lower bound is assumed. Upper
and lower price bounds for derivatives (the sub- and superhedging prices) are
obtained by solving a stochastic control problem, where quoted market prices
form constraints and can narrow the no-arbitrage price bands for other contracts
considerably. Using relative entropy regularisation introduced by Avellaneda et
al. [4], Samperi [32] shows that the infimum of a regularised error function is
continuous (in fact, differentiable) with respect to calibration prices. Extending
this to uncertain volatility function bounds, He et al. [25] obtain more realistic
bid and ask prices than for constant bounds.
The Bayesian approach of this paper demonstrates a shift in philosophy of
the aforementioned approaches. Acknowledging that the calibration problem is
ill-posed, we no longer focus on finding a best-fit solution, but we are interested
in finding a distribution of solutions. The essential idea behind the Bayesian
approach is to begin with some prior distribution for the unknown parameter
and update this distribution using the observable market prices to give a poste-
rior distribution for the model parameter. So instead of finding a model which,
in some measure, best replicates prices, we seek all models which sufficiently
replicate prices to within a pre-decided tolerable level of error. This is not
dissimilar to uncertain parameter models, and is related to an approach that
Hamida & Cont [24] have more recently adopted as part of their investigation
into model risk, to obtain a spread of possible prices of exotic options which
are all consistent with those of calibration options. Where this paper differs
from [24] is by recasting the problem into a Bayesian framework, while [24] use
a prior distribution only to generate initial populations for an evolutionary opti-
misation algorithm. The Bayesian approach to calibration has been used before
by authors such as Jacquier & Jarrow [27] in Black-Scholes models, Bhar et
al. [9] for the calibration of instantaneous spot and forward interest rates, and
Monoyios [30] in the context of drift parameter uncertainty. The specification of
a prior corresponds to the regularising penalty in Tikhonov regularisation and
opens the possibility of incorporating prior information in a rigorous framework,
although this does mean that the impact of prior assumptions has to be assessed
critically.
In this paper we concentrate on providing a practical method for constructing
prior and likelihood functions and on exploring the robustness of the Bayesian
posteriors, especially with the view towards pricing exotic options using Bayesian
estimates. This is to be seen as “proof of concept” for a challenging example (of a
high-dimensional parameter), and improvements to the computational strategy
would aid the practical application in this setting. A marked advantage of the
Bayesian approach, which this work highlights, is that the posterior distribution
can be translated into price spreads for derivatives, in the spirit of the model
uncertainty measures in [13, 24]. We also demonstrate by case studies that the
Bayesian mean yields reliable predictions of exotic derivative prices, which are
more robust than those based on parameters obtained by Tikhonov regularisa-
tion, and much more accurate also than suggested by model uncertainty bounds
not taking into account information of the posterior distribution.
The paper is divided as follows. In Section 2 we formalise the calibration
problem and review relevant results from Bayesian theory. In Section 3, we
discuss the construction of the prior and likelihood function as applied to the
local volatility model. Section 4 provides details of its discretisation and of
the tailoring of Metropolis sampling to this application; calibration examples
are given in Section 5. Finally, in Section 6 we present some case studies which
demonstrate the robustness of the proposed calibration method for pricing other
contracts. Section 7 concludes.
and write their relation to the ‘true’ prices (identical to the model prices with
‘true’ parameter θ∗ ) by
V_t^{(i)} = f_t^{(i)}(θ*) + e_t^{(i)},   (2)

with additive noise {e_t^{(i)} : i ∈ I_t}. A possible interpretation of (2) is that there
is an underlying true model, unknown to the observer, under which the market
is complete and arbitrage-free (i.e. derivatives can be hedged perfectly knowing
the model); the bid-ask spread reflects the model uncertainty which causes the
buyer/seller to demand a risk premium. The existence of a true model is not
necessary for the definition of the calibration procedure in this paper as such,
as long as the observed market prices are attainable within the class of assumed
models subject to the assumed noise. It would clearly become relevant if we
were to address questions of consistency of Bayesian estimators and hedging
based on those parameters.
Then p(V |θ), the probability of observing the data V given θ, is determined
by the distribution of the noise e and is called the likelihood function. We will
discuss a complete specification of the noise in Section 3.2.
An application of Bayes' rule gives the posterior density of θ as

p(θ|V) = p(V|θ) p(θ) / ∫_Θ p(V|θ′) p(θ′) dθ′,

where p(θ) is the prior density.
If the noise is modelled such that observations only have positive likelihood if
the model price lies within the bid-ask spread, we can turn this around to say
that any parameter with positive posterior density gives model prices for the
calibration options within the bid-ask spread.
The estimator

θ_MAP(V) = arg max_{θ∈Θ} p(θ|V),

the maximum a posteriori (MAP) estimator, is the value which maximises the
posterior density. A family of estimators θ_L(V) can be defined as

θ_L(V) = arg min_{θ′} ∫_Θ L(θ, θ′) p(θ|V) dθ,

where the loss function L satisfies

L(θ, θ′) = 0 if θ′ = θ,
L(θ, θ′) > 0 if θ′ ≠ θ.

The minimiser θ_L(V) is not necessarily unique. The quadratic loss L₁(θ, θ′) = ‖θ − θ′‖₂² gives the
Bayes estimator θ_{L₁}(V) = E[θ|V], which is the mean value of θ with respect to
the Bayesian posterior density p(θ|V). The MAP estimator does not correspond
to a non-negative bounded loss function.
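Given Monte-Carlo samples from the posterior (as generated later in Section 4), these estimators can be approximated directly. A minimal sketch, assuming `log_posterior` is a user-supplied function evaluating the (unnormalised) log posterior density:

```python
import numpy as np

def bayes_estimator(samples):
    """Monte-Carlo approximation of the Bayes estimator E[theta|V]
    (minimiser of the squared-error loss) from posterior samples."""
    return np.mean(samples, axis=0)

def map_estimator(samples, log_posterior):
    """Approximate MAP estimator: the sample with the largest
    (unnormalised) posterior density among those drawn."""
    scores = np.array([log_posterior(t) for t in samples])
    return samples[np.argmax(scores)]
```

Note that the MAP approximation is only as good as the sample set: it picks the best sampled point, not the exact maximiser of the posterior density.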
We now discuss possible interpretations of the result of such a calibration.
In an ideal world, the following is given: even if we do not a priori know
the true model (data generating process), we know that the model belongs to
a certain class of models, parametrised by θ, and the observations allow us to
differentiate between the true model and any other model from this class.
In the reality of financial markets, a class of candidate models can only be
assumed. Most certainly, the observed data are generated by a process outside
the assumed model class. It can therefore not be expected that the estimator
reproduces the values of all financial instruments outside the class of calibration
instruments exactly (or within bid and ask). A robust estimator will have the
property that if the true model is in some sense close to the assumed class of
models, predictions made from the estimated parameter, say values of exotic
derivatives or hedge parameters, will be close to the value of those derivatives
under the true model and can be hedged accurately by a trading strategy based
on the estimated parameter.
In the example of the local volatility model, θ is an infinite dimensional
(functional) parameter. This means on the one hand that any finite number of
observations will be insufficient to identify the parameter exactly. Moreover, one
will only be able to compute a finite-dimensional (discretised) approximation.
So even if the market is governed by a local volatility model, this function will
almost certainly lie outside, albeit close to, the assumed (computable) class of
local volatility models, which is necessarily part of a finite-dimensional space.
The relation of the number and type of parameters to the number and type
of calibration instruments then becomes relevant. If the number of data is
smaller than the number of parameters, the parameter will generally be under-
determined and regularisation, here via the Bayesian prior, favours particular
parameters over others. As Wasserman [33] remarks, we should desire that the
Bayesian posterior is not dominated and led astray by the priors, so care has to
be taken with its construction.
There are two routes to increasing the available information in this context:
to observe the value of different financial derivatives, e.g. vanilla options with
different strikes and maturities, and/or to observe the prices of the same finan-
cial derivatives at different times, in which case past calibrations provide prior
information for re-calibrations. We will later give examples for both. As the
number of calibration products, and/or the re-calibration frequency, increase,
a relevant property of Bayesian estimators is consistency. Ghosal [21] points
out that consistency is crucially important for parameter estimation since the
violation of consistency puts serious doubts against inferences based on the
inconsistent posterior distribution. We note that there is a vast body of liter-
ature on Bayesian consistency in a more general and more advanced setting,
see e.g. [22] and the references therein. It ensures that the estimator converges
to the ‘true’ parameter, if such a ‘true’ parameter exists, i.e. on the assump-
tion that the data are indeed generated by the assumed model. This cannot be
assumed to be the case in financial applications.
We discuss these points by numerical illustrations in Sections 5 and 6, after
introducing the construction of Bayesian posteriors and their numerical realisa-
tion in the following two sections.
previously introduced serves as a useful rigorous framework.
where λ̃ is a constant which quantifies how strong our prior assumptions are,
with a higher value of λ̃ indicating greater confidence in our assumptions; 1/λ̃
can be thought of as the prior variance of θ. From (3) we see that those θ which
better satisfy prior beliefs have greater density.
To illustrate a possible choice of norm k · k, we continue our example of the
local volatility model. In light of the assumptions presented earlier, we choose
volatility functions from the set
{σ ∈ H¹(ℝ²₊ ∩ K), σ > 0 a.e., ‖log σ‖₁,K < ∞ for all compactly contained K},
where
‖u‖²_κ = (1 − κ)‖u‖²₀ + κ‖ |∇u| ‖²₀,   (5)
and here ∇ = (∂/∂t, ∂/∂S) is the gradient operator, ‖·‖₀ is the standard L² norm on
a suitable domain, and κ ∈ (0, 1) is a pre-specified constant.
Using the logarithm penalises σ approaching zero. The first part of the
norm is to ensure greater prior density is attached to σ that are closer to the
ATM volatility. The second part ensures that the volatility is locally linear
in its arguments — once discretised on a grid, this will ensure that the prior
covariance of volatilities at neighbouring grid nodes approaches one if the grid
points are close.
where the w_i are pre-specified weights summing to one, chosen depending on
the relative volumes likely to be traded (so, for instance, at-the-money calls
or puts are weighted more heavily).
Then we shall only attach positive Bayesian posterior weight to parameters
θ which on average reproduce prices to within the average basis point bid-ask
spread. In other words, we will attach positive likelihood under θ only if
G(θ) ≤ δ²,   (8)

where δ² = Σ_{i∈I} w_i δ_i² is the pre-specified average basis-point squared-error
tolerance. Additionally, the smaller the value of G(θ), the more likely the
observation. Hence, for the Bayesian likelihood we will take
where λ = δ² λ̃. From this, estimates for θ and other predictions can be derived
as discussed in Section 2.
Observe that maximising the posterior (10) is equivalent to minimising the
expression
λ‖θ − θ₀‖²_κ + G(θ)   (11)

over the set {θ : G(θ) < δ²}, and (11) is precisely the functional that authors
such as Lagnado & Osher [28] and Jackson et al. [26] seek to minimise to find
their optimal calibration parameter. This shows how the Bayesian approach
reformats and generalises traditional Tikhonov regularisation methods into a
unified framework, as is already noted by Fitzpatrick [18].
The posterior density, however, contains more information than the MAP
estimator and we will use this in Section 6 for the robust pricing of further
options.
4 Numerical Method
This section outlines the numerical approach leading to samples from the pos-
terior distribution.
4.1 Parameter Discretisation and the Value Function
We first restrict θ to a finite-dimensional space, and represent the local volatility
surface σ(S, t) by a grid of nodes whose positions are given by S_min = s₁ <
… < s_j < … < s_J = S_max in the spatial direction and 0 = t₁ < … < t_l <
… < t_L = t_max in the temporal direction. Following the ordering convention
σ_{j+(l−1)J} = σ(s_j, t_l), the discrete representation of σ(S, t) is defined by the
parameter vector

θ = (log σ₁, …, log σ_m, …, log σ_M),

where M = JL, and a spline interpolant Θ(·, ·) of θ. We emphasise this dependence
by writing σ(·, ·; θ).
For each time tl we construct the unique natural cubic spline through the
nodes (s1 , tl ), . . . , (sJ , tl ) to give all values Θ(S, tl ). Then for (S, t) ∈ [sj , sj+1 ] ×
[tl , tl+1 ] the value of Θ(S, t) is found by linear interpolation of the two values
Θ(S, tl ) and Θ(S, tl+1 ). Then σ(S, t) = exp(Θ(S, t)). By interpolating the
logarithm of the volatility and then exponentiating we ensure that σ > 0.
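A minimal sketch of this interpolation, assuming SciPy's `CubicSpline` with natural boundary conditions for the splines in S, with `theta` supplied as an L×J array of nodal log-volatilities (the function names and array layout are illustrative, not from the paper):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_local_vol(s_nodes, t_nodes, theta):
    """theta: (L, J) array of log-volatilities at the grid nodes.
    Returns sigma(S, t) > 0: natural cubic spline in S at each time
    level, linear interpolation in t, both applied to log sigma."""
    splines = [CubicSpline(s_nodes, row, bc_type='natural') for row in theta]
    def sigma(S, t):
        t = np.clip(t, t_nodes[0], t_nodes[-1])
        l = np.searchsorted(t_nodes, t, side='right') - 1
        l = min(l, len(t_nodes) - 2)              # stay within the last interval
        w = (t - t_nodes[l]) / (t_nodes[l + 1] - t_nodes[l])
        log_sig = (1 - w) * splines[l](S) + w * splines[l + 1](S)
        return np.exp(log_sig)                    # exponentiate: sigma > 0
    return sigma
```

Interpolating the logarithm and exponentiating at the end guarantees positivity, exactly as described above.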
With this discretisation, the norm function in (4) can be written as

‖Θ − θ₀‖²_κ = (θ − θ₀)ᵀ C (θ − θ₀),

where θ₀ = log(σ_atm) and C is the inverse covariance matrix induced by the
norm. This follows because the spline basis coefficients are linear in the nodal
values θ, and the squared Sobolev norms of the splines are quadratic in these
coefficients. With the norm given by (5), C is non-singular, so by convention
we write A⁻¹ ≡ C.
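The following sketch illustrates, in one spatial dimension for simplicity, how a precision matrix C of this quadratic form can be assembled. It uses a lumped L² term and finite-difference gradients rather than the exact spline integrals of the paper, so it is an approximation for illustration only:

```python
import numpy as np

def prior_precision_1d(n, h, kappa):
    """Sketch of an inverse covariance C induced by a norm of type (5)
    on a uniform 1-d grid with n nodes and spacing h: a lumped L2 term
    plus a finite-difference gradient (stiffness) term."""
    C = (1 - kappa) * h * np.eye(n)                      # (1-kappa)*||u||_0^2
    D = (np.eye(n - 1, n, 1) - np.eye(n - 1, n)) / h     # forward differences
    C += kappa * h * D.T @ D                             # kappa*||u'||_0^2
    return C
```

The quadratic form (θ − θ₀)ᵀ C (θ − θ₀) then approximates ‖u‖²_κ for the piecewise representation of u − u₀ on the grid.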
This is similar to the approach taken in [26].
Finally, to calculate the likelihood value (9) for each θ, using (7), we must
price all calibration options, say European call options, f_t^{(i)}(θ) for i ∈ I_t, using
the model parameter θ. For the local volatility model, we follow the method
of [24] of solving the Dupire PDE [15] with appropriate boundary conditions:

∂f/∂T + K(r − d) ∂f/∂K − (K² σ²(K, T; θ)/2) ∂²f/∂K² = 0   for all K, T ≥ 0,

f(S, K, 0) = (S − K)⁺   for all K ≥ 0.
To solve this PDE numerically, we use a Crank-Nicolson finite difference scheme
to give all the prices for the range of K and T simultaneously. This is compu-
tationally more efficient than solving Black-Scholes PDEs (in S and t) for all
combinations of K and T separately.
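A simplified Crank-Nicolson sketch for the forward PDE above, under illustrative assumptions not fixed by the text (uniform grid, Dirichlet boundaries f(0, T) = S₀ and f(K_max, T) = 0, dense linear algebra; a production solver would use a tridiagonal solve and possibly Rannacher start-up steps):

```python
import numpy as np

def dupire_cn(S0, sigma_fn, r, d, K_max, nK=200, nT=100, T_max=1.0):
    """Crank-Nicolson solve of the forward PDE, returning call prices
    f(K, T) on a uniform grid; sigma_fn(K, T) must accept a strike vector."""
    K = np.linspace(0.0, K_max, nK + 1)
    dK, dT = K[1] - K[0], T_max / nT
    f = np.maximum(S0 - K, 0.0)                    # payoff at T = 0
    out = [f.copy()]
    for n in range(nT):
        sig = sigma_fn(K[1:-1], (n + 0.5) * dT)    # coefficients at mid-step
        a = 0.5 * sig**2 * K[1:-1]**2 / dK**2      # diffusion coefficient
        b = (r - d) * K[1:-1] / (2 * dK)           # convection coefficient
        lo, mid, up = a + b, -2 * a, a - b         # L f_j = lo f_{j-1} + mid f_j + up f_{j+1}
        rhs = f[1:-1] + 0.5 * dT * (lo * f[:-2] + mid * f[1:-1] + up * f[2:])
        rhs[0] += 0.5 * dT * lo[0] * S0            # left boundary in implicit half-step
        A = np.diag(1.0 - 0.5 * dT * mid) \
            - 0.5 * dT * (np.diag(lo[1:], -1) + np.diag(up[:-1], 1))
        f[1:-1] = np.linalg.solve(A, rhs)          # (I - dT/2 L) f_new = (I + dT/2 L) f_old
        out.append(f.copy())
    return K, np.array(out)
```

One PDE solve indeed yields prices for all (K, T) pairs simultaneously, which is the efficiency gain over per-option Black-Scholes solves noted above.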
say {θ1 , . . . , θn }. Then {θi : G(θi ) ≤ δ 2 } is a set of samples from p(θ|V ) given
by (10).
We now concentrate on generating samples from g(θ|V ). To do this we
will use the Markov Chain Monte-Carlo (MCMC) Metropolis algorithm which
proceeds as follows (see [19] for further detail):
1. Given θ_{r−1}, propose θ* = θ_{r−1} + du · ξ, where ξ ∼ N(0, I_M) and du is the step size of our random walk.

2. Accept θ_r = θ* with probability min{1, g(θ*|V)/g(θ_{r−1}|V)}, and set θ_r = θ_{r−1} otherwise.

The value of du is chosen so that the acceptance rate of jumps is close to the optimum value
of 23% found by Gelman et al. [19].
The values for n, m, b, k, du for each numerical example are given in Ap-
pendix A.
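A generic random-walk Metropolis sampler of this type can be sketched as follows, assuming `log_g` evaluates the logarithm of the unnormalised target density g(θ|V):

```python
import numpy as np

def metropolis(log_g, theta0, n_steps, du, rng=None):
    """Random-walk Metropolis sampler for an unnormalised log-density
    log_g; du is the proposal step size, tuned in practice towards the
    acceptance rate recommended by Gelman et al."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float)
    lg = log_g(theta)
    chain, accepted = [], 0
    for _ in range(n_steps):
        prop = theta + du * rng.standard_normal(theta.shape)
        lg_prop = log_g(prop)
        if np.log(rng.random()) < lg_prop - lg:    # accept w.p. min(1, g'/g)
            theta, lg = prop, lg_prop
            accepted += 1
        chain.append(theta.copy())
    return np.array(chain), accepted / n_steps
```

In the calibration setting, `log_g` would combine the log-prior quadratic form with the log-likelihood built from G(θ); burn-in and thinning (the b and k of Appendix A) are then applied to the returned chain.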
W, as follows:

B = n/(m − 1) · Σ_{j=1}^m (v̄_{·j} − v̄_{··})²   and   W = (1/m) Σ_{j=1}^m s_j²,

where

v̄_{·j} = (1/n) Σ_{i=1}^n v_{ij},   v̄_{··} = (1/m) Σ_{j=1}^m v̄_{·j},   s_j² = (1/(n − 1)) Σ_{i=1}^n (v_{ij} − v̄_{·j})².
which tends to 1 as n → ∞. If the PSRF is high, much greater than 1.1 for
example, then it is likely that continuing the simulation will improve inferences
based on the target distribution.
The estimate works because B usually overestimates the posterior variance,
assuming the starting distribution is overdispersed, whereas W usually
underestimates the posterior variance, because the within-chain samples have
not had sufficient time to range over the whole target distribution. The longer
we run the chain, the closer B gets to W, and so the closer the ratio B/W gets
to 1; the PSRF estimate for v given by (13) then also goes to 1.
See [20] or [19] for further references on PSRF values.
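Using the definitions of B and W above, the PSRF can be computed as follows; the precise form of (13) is not reproduced here, so this sketch uses the common Gelman-Rubin variant built from the pooled variance estimate:

```python
import numpy as np

def psrf(v):
    """Gelman-Rubin potential scale reduction factor for a scalar
    quantity v, given as an (n, m) array: n samples from each of
    m parallel chains."""
    n, m = v.shape
    chain_means = v.mean(axis=0)
    B = n / (m - 1) * np.sum((chain_means - chain_means.mean())**2)
    W = v.var(axis=0, ddof=1).mean()
    var_hat = (n - 1) / n * W + B / n      # pooled posterior-variance estimate
    return np.sqrt(var_hat / W)
```

For chains that have mixed (all m chains sampling the same distribution), the returned value is close to 1, consistent with the discussion above.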
5 Calibration Examples
Using raw data cited by other papers, we attempt to calibrate the local volatility
model. We use a Markov Chain Monte-Carlo (MCMC) Metropolis algorithm to
sample the posterior distribution of calibrated parameters.
2. We take real S&P 500 implied volatility data with 10 strikes and 7 matu-
rities used in [12] to determine the prices of 70 corresponding European
call options. The spot price of the underlying at time 0 is S0 = 590, the
interest rate is r = 0.060 and dividend rate d = 0.026. The prices are
given in Appendix A.
s = 2500, 4000, 4500, 4750, 5000, 5250, 5500, 7000, 10000, (14)
t = 0.0, 0.5, 1.0,
Maturity \ Strike (units of S0)
      0.80  0.90  0.94  0.98  1.00  1.02  1.06  1.10  1.20
0.083 1.000 1.000 1.000 1.000 1.000 1.000 1.003 1.039 1.140
0.167 1.000 1.000 1.000 1.000 1.000 1.000 1.001 1.022 1.156
0.250 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.012 1.160
0.500 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.001 1.071
0.750 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.009
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.001
Table 1: For the simulated dataset: PSRF values for the calibration call prices
(using [26]).
Figure 1: For the simulated dataset, using Metropolis Sampling, 479 surfaces
from the posterior distribution were sampled and are plotted with the same
degree of transparency. The true surface is plotted in opaque black.
Figure 2: For the simulated dataset: using the 479 sampled surfaces (where
λ = 1), 95% and 68% confidence intervals are found pointwise. The true surface
is plotted in opaque black.
surface is almost captured within the 68% confidence interval and completely
captured by the 95% confidence interval. This result could, for example, be
used to find lower and upper volatility function bounds for implementation of
the uncertain volatility model studied by He et al. [25]. Similarly, one could
deduce confidence intervals for the integrated variance and use conservative
hedging as proposed by Mykland [31]. Using only marginal distributions of
the volatility values at specific times and asset levels, however, loses the
dependence structure of the values on the surface, which is encoded in the
posterior distribution.
the Bayesian mean calculation (15) at the first calibration time, the ‘Bayesian
weights’ are y_0^{(i)} = 1/N for each surface θ_i, i = 1, …, N — hence in Figure 1
all plotted surfaces have the same degree of transparency. However, after the
first recalibration, the new Bayesian mean calculation for a function f will be

Σ_{i=1}^N y_1^{(i)} f(θ_i)

for some Bayesian weights y_1^{(1)}, …, y_1^{(N)} summing to 1, which are no longer
equal. To indicate this in Figure 3, we have varied the transparency of the
plotted surfaces according to their weight: a surface with greater Bayesian
weight is plotted more opaquely.
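One plausible implementation of such a weight update — an assumption on our part, since the text does not give the formula — is a sequential Bayes step on the fixed sample set: multiply each surface's current weight by the likelihood of the newly observed prices under that surface, then renormalise:

```python
import numpy as np

def update_weights(weights, log_lik_new):
    """Sequential reweighting sketch: weights are the current Bayesian
    weights of the sampled surfaces, log_lik_new[i] the log-likelihood
    of the newly observed prices under surface theta_i. Shifting by the
    maximum avoids underflow before exponentiating."""
    w = np.asarray(weights) * np.exp(log_lik_new - np.max(log_lik_new))
    return w / w.sum()
```

Starting from equal weights 1/N, repeated application concentrates the weight on the few surfaces most consistent with the incoming prices, matching the behaviour seen in Figure 3.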
Figure 3 shows that after about 5 weeks, the Bayesian posterior has settled
and only a handful of surfaces have significant weight. Moreover, these surfaces
are close to the true surface (plotted in opaque black). At recalibration time
t_k, say, the section of the local volatility surface corresponding to
{σ(S, t) : 0 ≤ t < t_k} no longer contributes to the observed calibration prices,
so this section is very different from the true surface for some of the heavily
weighted surfaces. This is especially noticeable in the wings, for very small
and very large S. We must remember that we have only sampled the
Bayesian posterior and hence if none of our samples is the true surface (which
it is not) then we will never settle on this true surface, but settle on the closest
few, as Figure 3 shows. Nevertheless, using the proxy updating procedure, we
still see a clear sense of convergence to the true surface.
Table 2: For S&P 500 dataset: PSRF values for the calibration call prices
(using [12]).
Figure 4 gives a plot of 600 samples from the posterior (i.e. satisfying
G(θ) ≤ δ², this time for δ = 4.5 basis points). Again we see that, especially in
the wings and for short times, the spread of volatilities is enormous.
(a) week 1 (b) week 2 (c) week 3
Figure 3: For the simulated dataset: a path is simulated on the true local
volatility surface and the Bayesian posterior is updated using the newly observed
prices each week for 12 weeks. The transparency of each surface reflects the
Bayesian weight (see main text) of the surface. The true surface is plotted in
opaque black.
Figure 4: For S&P 500 dataset: using Metropolis Sampling, 600 surfaces from
the posterior distribution were found and are plotted (where λ = 1).
where f is the pricing function and θ1 , . . . , θN are the surfaces found by Metropo-
lis sampling. Note that, because the parameters θ1 , . . . , θN are samples, the
Bayesian weighting of each in the sum (15) is 1/N rather than p(θi |V ).
We assess this price estimate, where possible, against the true value f(θ*) as
priced on the correct (assumed) surface θ*, where the bid-ask spread is estimated
by [f(θ*) − δ S0/10⁴, f(θ*) + δ S0/10⁴].
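In code, the equally weighted Bayes price and the spread check are immediate; `pricer` is a hypothetical function (not from the paper) that returns the exotic's price under a sampled parameter:

```python
import numpy as np

def bayes_price(pricer, thetas):
    """Bayes price as the equally weighted average of the exotic's price
    over the N posterior samples (each sample carries weight 1/N)."""
    prices = np.array([pricer(t) for t in thetas])
    return prices.mean(), prices

def within_spread(price, true_price, delta_bp, S0):
    """Check an estimate against the estimated bid-ask band
    [true - delta*S0/1e4, true + delta*S0/1e4]."""
    half = delta_bp * S0 / 1e4
    return true_price - half <= price <= true_price + half
```

The returned array of per-sample prices is what underlies the posterior price densities plotted in the figures below.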
A short remark on arbitrage is due here. If the market under the model with
parameter θ∗ is assumed complete, any price of the new contract different from
the model price under θ∗ could potentially be arbitraged (i.e. someone could
make a risk-free profit by dynamically trading the underlying and a bond) by
someone who knew the true parameter. Based on the information contained
in the (noisy) calibration prices alone (bid-ask spreads), however, prices with
positive posterior weight cannot normally be arbitraged.
The MAP price is taken to be f(θ_MAP), where θ_MAP is the sample that has the
greatest posterior density, i.e. p(θ_MAP|V) ≥ p(θ_i|V) for all i = 1, …, N. Note
that the MAP price does not correspond to the maximum density of f (θi ), but
that of θi . We noted earlier that the MAP parameter corresponds to the value
calculated on the surface which gives the smallest regularised calibration error
and is therefore identical to the classical Tikhonov solution.
The MAP estimator we use will not maximise the posterior density (10)
precisely, due to the finite sample size. However, we observe for the example
of Section 5.2 that our MAP estimator gives a weighted average basis point
calibration error (7) of 1.84 for 66 prices, compared with the 2.65 for 10 prices
achieved by Jackson et al. [26]. Hence, our density sampling has found a MAP
surface whose calibration fit is comparable to that obtained in papers using
dedicated optimisation routines.
In Figure 5 we illustrate this pricing method with the example of an up-and-
out barrier call option. The barrier option is path dependent and, as such, much
more sensitive to changes in the local volatility surface. In the graph we plot
the Bayesian posterior probability density of prices, the true, MAP and Bayes
prices, and the estimated bid-ask spread around the true price.
Figure 5: For simulated dataset: prices for up-and-out barrier call option strike
5000 (S0 = 5000), barrier 5500 and maturity 3 months. Included are the true
price (found on the true surface) with an assumed bid-ask spread of 6 basis
points as per 5.1, the MAP/Tikhonov price, and the Bayes price with its asso-
ciated posterior pdf of prices.
Figure 5 shows that the barrier price obtained with the surface with the
smallest regularised calibration error for European options can lie many basis
points away from the true price. The Bayesian price on the other hand reflects
the entire distribution and the incorporated prior information (i.e. regularisa-
tion) to give a much closer price, which lies well within the bid-ask spread.
Recall that the Bayesian posterior we calculate depends on, besides the data
and model used, the form of the (subjective) prior and the level of the data
noise. In the following subsections, we conduct some robustness tests on the
datasets studied in the previous sections to quantify how much the solution
varies with respect to model assumptions and changes in both the form of the
prior and the observed prices.
(a) s = 4700, 4800, 4900, 5000, 5100, 5200, 5300 (b) s = 3500, 4000, 4500, 5000, 5500, 6000, 8000
Figure 6: Same plots as Fig. 5, for different placement of the spline knots, with
the lowest (s = 2500) and highest (s = 10000) as in (14), but with higher (a)
and lower (b) density around the spot.
Figure 7: For the simulated data set: Prices for an up-and-out barrier call option
with strike 5000 (S0 = 5000), barrier 5500 and maturity 3 months for different
number of calibration options and different number of spline knots.
the prior.
Figure 8: Prices for an up-and-out barrier call option with strike 5000
(S0 = 5000), barrier 5500 and maturity 3 months for different parameters
κ ∈ {10−2.00 , 10−1.75 , 10−1.50 , 10−1.25 , 10−1.00 , 10−0.75 , 10−0.50 , 10−0.25 , 10−0.1 },
corresponding to different priors. Included is the true price with its bid-ask
spread, MAP prices, and Bayes prices with Bayesian posterior pdfs of prices.
The thickness of the lines increases with κ, and their length corresponds to the maximum of the corresponding posterior, so that they can be visually distinguished.
test quantitatively, next we plot the graphs for the same experiments but for
the case where we hold κ = 10−1 fixed and instead add Gaussian noise e with
different standard deviation ε to the observed market prices quoted in Appendix
A. We consider noise levels of
and run the calibration procedure for 100 independent noise additions for each
ε. These levels compare to assumed bid-ask data of ±δ = ±3 · 10−4 ≈ ±10−3.5
as per Section 5.1. For each value of ε we plot in Figure 9 the 100 MAP
and Bayes estimates of the price and posteriors for the same barrier options
previously used. First of all, we see that as the noise increases, the price distributions become less concentrated around the true price, and for ε = 10−2.5 few surfaces have been calibrated, so the distributions become non-smooth and irregular. The MAP prices are even more sensitive to the noise and can misestimate the price by up to 10-15%. In contrast, the Bayes prices prove very robust: for the barrier option with ε = 10−3.5, only one out of a hundred Bayes estimates lies slightly outside the bid-ask spread, and for ε = 10−2.5 only a handful of Bayes estimates fall slightly below the bid price.
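The noise-robustness experiment can be sketched in a few lines. The pricing maps below are hypothetical one-parameter stand-ins (not the spline local-volatility model of this paper), and the posterior is evaluated on a grid with a flat prior rather than by MCMC, so this is an illustrative toy, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-parameter stand-in for the calibration map: a single
# "volatility" parameter theta produces a vector of vanilla prices.
strikes = np.linspace(0.8, 1.2, 9)

def vanilla_prices(theta):
    # smooth toy pricing map (an assumption, not the paper's model)
    return theta * np.exp(-0.5 * ((strikes - 1.0) / theta) ** 2)

def exotic_price(theta):
    # toy exotic price as a function of the parameter
    return 100.0 * theta

theta_true = 0.25
V_true = vanilla_prices(theta_true)

# grid approximation of the posterior, precomputed model prices on the grid
grid = np.linspace(0.15, 0.35, 2001)
V_grid = np.array([vanilla_prices(t) for t in grid])   # (2001, 9)
P_grid = exotic_price(grid)

def one_noise_addition(eps):
    """Perturb the data, recompute the posterior, return (MAP, Bayes) prices."""
    V_obs = V_true + eps * rng.standard_normal(V_true.shape)
    misfit = np.sum((V_grid - V_obs) ** 2, axis=1)
    log_post = -misfit / (2.0 * eps ** 2)      # Gaussian likelihood, flat prior
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return P_grid[np.argmax(w)], np.sum(w * P_grid)

# 100 independent noise additions for one noise level, as in the text
results = np.array([one_noise_addition(1e-3) for _ in range(100)])
map_prices, bayes_prices = results[:, 0], results[:, 1]
```

In this smooth one-parameter toy both estimators behave well; the contrast reported in the text arises in the high-dimensional, ill-posed setting, where the posterior averaging of the Bayes price damps the noise that the MAP surface fits.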
We also conducted tests varying the weights of calibration options in the er-
ror functional, and again the Bayes average was very insensitive to these changes
while the MAP estimator moved by an amount equalling several bid-ask spreads.
[Four panels, one per noise level ε: posterior probability against price, each showing the posterior pdfs together with the Bayes, MAP and true prices and the bid and ask quotes.]
Figure 9: Prices for up-and-out barrier call option with strike 5000 (S0 = 5000),
barrier 5500 and maturity 3 months. Each graph corresponds to a different
value of ε and shows the estimators for 100 different noise additions. Included
is the true price (found on the true surface) with its bid-ask spread, the MAP
price, and the Bayes price with its associated posterior pdf of prices.
[Plot: posterior probability against price, showing the posterior pdfs with the Bayes and MAP prices for spline bases with 32, 66 and 128 knots.]
Figure 10: For the S&P 500 dataset: prices for an American put option with strike $590 (S0 = $590) and maturity 1 year. Included are the MAP prices and the Bayes average prices with the associated posterior pdfs of prices, for three different spline bases for the local volatility with 32, 66 and 128 knots, respectively.
[Plot: price against the number of calibration options for different numbers of spline knots.]
Figure 11: For the S&P 500 dataset: prices for an American put option with strike $590 (S0 = $590) and maturity 1 year, for different numbers of calibration options and spline knots.
for formulating the prior and likelihood functions necessary for the Bayes procedure, and applied it to the case of the local volatility model. Numerical examples were presented, demonstrating the improvement in pricing accuracy of the Bayesian procedure over common maximum a posteriori (MAP) methods. Moreover, we highlighted the robustness of the pricing method to inaccuracies in the model and prior, and to mispricings in observed market data.
The Bayes average is more robust than the MAP in the examples consid-
ered, often significantly so. This comes at the expense of high computational
cost. While the MAP estimator is equivalent to a Tikhonov regularised solution
and could be found by any of a number of efficient deterministic optimisation
algorithms (see, e.g., [12]), the Bayes average requires information about the whole high-dimensional posterior distribution.
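Schematically (our notation; the misfit G, regulariser R and weight α stand for whatever error functional and prior are in force), with a Gaussian likelihood p(V|θ) ∝ exp(−G(θ)/(2ε²)) and a prior p(θ) ∝ exp(−αR(θ)), the MAP estimator maximises the posterior and hence minimises a Tikhonov functional, whereas the Bayes average integrates against the whole posterior:

```latex
\theta_{\mathrm{MAP}}
 = \operatorname*{arg\,max}_{\theta}\, p(\theta \mid V)
 = \operatorname*{arg\,min}_{\theta}\,
   \Bigl\{ \tfrac{1}{2\varepsilon^{2}}\, G(\theta) + \alpha R(\theta) \Bigr\},
\qquad
\bar{\theta}_{\mathrm{Bayes}} = \int \theta\, p(\theta \mid V)\, \mathrm{d}\theta .
```

The first expression is a finite-dimensional optimisation; the second is an integral over the full parameter space, which is why sampling methods such as MCMC are needed.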
The flip side of this is that the Bayesian posterior density p(θ|V ) can be used as the basis for a variety of further useful analyses. A natural thing to do
would be to use the posterior to derive a measure for the model uncertainty of a
contract. For any payoff, a distribution of prices can be found (as we showed in
Section 6 for American and barrier options), and this distribution can be used
to assign a model uncertainty value to the contract in the spirit of [13] and [23].
Such measures would be important for a risk manager and for an agent trying
to decide between different products.
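As a sketch of such a measure (all names here are hypothetical; in practice the θ-samples would come from the MCMC run targeting p(θ|V) and the contract would be priced with the calibrated model), one can take the width of a central posterior credible interval of the contract price:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples of the model parameter (in practice these
# would be draws from the MCMC chain targeting p(theta | V)).
theta_samples = rng.normal(0.25, 0.01, size=5000)

def exotic_price(theta):
    # hypothetical pricing map for the contract under scrutiny
    return 100.0 * theta

prices = exotic_price(theta_samples)

# Model-uncertainty measure: width of the central 95% credible interval of
# the posterior price distribution, in the spirit of [13] and [23].
lo, hi = np.quantile(prices, [0.025, 0.975])
model_uncertainty = hi - lo
```

A risk manager could then compare contracts by this interval width, or report it alongside the Bayes price as a model-risk reserve.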
A second, perhaps more important, use of the Bayesian posterior would be
to use it to develop better hedging strategies. This is more fundamental than
pricing as typically a trader will be more interested in the hedging strategy (the
price of which will then correspond to the trader’s price for the contract) than
a stand-alone price. The technique described in this article gives accurate and
robust prices, but it is not immediately clear which model (parameter) should
be used for hedging. One possible way to use the posterior density for hedging
would be to deduce prediction sets for the spot volatility or integrated vari-
ance, and then hedge conservatively within these sets. This corresponds to the
approaches proposed in [5] and [31] respectively. In particular, Mykland finds
conservative bid and ask prices as the superreplication cost under the assump-
tion that the prediction set is realised. In an alternative approach, motivated by
the analysis of this paper, the Bayesian loss functions introduced in Section 2
could be designed to correspond to hedging losses so that the Bayes estimator
is that parameter θ which minimises the expected hedging loss.
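Schematically (our notation, with a hypothetical hedging-loss function L_hedge), such an estimator would take the form

```latex
\hat{\theta}
 = \operatorname*{arg\,min}_{\theta'}\,
   \mathbb{E}\bigl[ L_{\mathrm{hedge}}(\theta, \theta') \mid V \bigr]
 = \operatorname*{arg\,min}_{\theta'} \int
   L_{\mathrm{hedge}}(\theta, \theta')\, p(\theta \mid V)\, \mathrm{d}\theta ,
```

where L_hedge(θ, θ′) measures the loss incurred by hedging with parameter θ′ when the data were generated by θ.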
References
[1] Y. Achdou. An inverse problem for parabolic variational inequalities in the
calibration of American options. SIAM Journal on Control and Optimization,
43(5):1583–1615, 2005.
[3] A. Apte, M. Hairer, A.M. Stuart, and J. Voss. Sampling the posterior:
An approach to non-Gaussian data assimilation. Physica D: Nonlinear
Phenomena, 230(1-2):50–64, 2007.
[5] M. Avellaneda, A. Lévy, and A. Paras. Pricing and hedging derivative secu-
rities in markets with uncertain volatilities. Applied Mathematical Finance,
2:73–88, 1995.
[8] A. Beskos and A. Stuart. MCMC methods for sampling function space.
In Sixth International Congress on Industrial and Applied Mathematics:
Zurich, Switzerland, July 16-20, 2007, page 337. European Mathematical
Society, 2009.
[9] R. Bhar, C. Chiarella, H. Hung, and W.J. Runggaldier. The volatility of the
instantaneous spot interest rate implied by arbitrage pricing: A dynamic
Bayesian approach. Automatica, 42(8):1381–1393, 2006.
[10] F. Black and M. Scholes. The pricing of options and corporate liabilities.
Journal of Political Economy, 81(3):637–654, 1973.
[12] T.F. Coleman, Y. Li, and A. Verma. Reconstructing the unknown local
volatility function. In Quantitative analysis in financial markets: collected
papers of the New York University Mathematical Finance Seminar, page
192, 2001.
[13] R. Cont. Model uncertainty and its impact on the pricing of derivative
instruments. Mathematical Finance, 16(3):519–547, 2006.
[16] H. Egger and H.W. Engl. Tikhonov regularization applied to the inverse
problem of option pricing: convergence analysis and rates. Inverse Prob-
lems, 21(3):1027–1045, 2005.
[19] A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin. Bayesian Data Anal-
ysis. Chapman & Hall/CRC, 2nd edition, 2004.
[20] A. Gelman and D.B. Rubin. Inference from iterative simulation using mul-
tiple sequences. Statistical Science, 7(4):457–472, 1992.
[22] S. Ghosal and A. van der Vaart. Convergence rates of posterior distribu-
tions for noniid observations. Annals of Statistics, 35(1):192–223, 2007.
[23] A. Gupta, C. Reisinger, and A. Whitely. Model uncertainty and its im-
pact on derivative pricing. In K. Böcker, editor, Rethinking Risk Manage-
ment and Reporting: Uncertainty, Bayesian Analysis and Expert Judge-
ment. Risk Books, 2010.
[24] S.B. Hamida and R. Cont. Recovering volatility from option prices by
evolutionary optimization. Journal of Computational Finance, 8(4), 2005.
[25] C. He, T.F. Coleman, and Y. Li. Calibrating volatility function bounds for
an uncertain volatility model. Journal of Computational Finance, 13(4),
2010.
[26] N. Jackson, E. Süli, and S. Howison. Computation of deterministic volatil-
ity surfaces. Journal of Computational Finance, 2(2):5–32, 1999.
[28] R. Lagnado and S. Osher. A technique for calibrating derivative security
pricing models: numerical solution of an inverse problem. Journal of Com-
putational Finance, 1(1):13–25, 1997.
[29] T.J. Lyons. Uncertain volatility and the risk-free synthesis of derivatives.
Applied Mathematical Finance, 2:117–133, 1995.
[30] M. Monoyios. Optimal hedging and parameter uncertainty. IMA Journal
of Management Mathematics, 18(4):331, 2007.
[31] P.A. Mykland. Financial options and statistical prediction intervals. Annals
of Statistics, 31:1413–1438, 2003.
Maturity Strike (units of S0 )
0.80 0.90 0.94 0.96 0.98 1.00 1.02 1.04 1.06 1.10 1.20
0.083 1.003 0.507 0.312 0.219 0.136 0.070 0.030 0.010 0.003 0.000 0.000
0.167 1.010 0.518 0.332 0.246 0.168 0.104 0.058 0.029 0.012 0.001 0.000
0.250 1.012 0.531 0.352 0.270 0.196 0.132 0.083 0.048 0.025 0.004 0.000
0.500 1.029 0.577 0.414 0.337 0.265 0.200 0.146 0.102 0.068 0.024 0.000
0.750 1.052 0.623 0.469 0.396 0.327 0.264 0.208 0.160 0.119 0.059 0.004
1.000 1.079 0.671 0.525 0.457 0.390 0.329 0.274 0.224 0.180 0.110 0.021
Table 4: For the simulated dataset: European call prices (units of 10³) (using
[26]).
Table 6: For the S&P 500 dataset: European call prices ($) (using [12]).