Derivatives Pricing Via Machine Learning
Department of Accounting, Questrom School of Business, Boston University, Boston, MA, USA
1. Introduction
Theoretical and empirical finance research involves the evaluation of conditional
expectations, which, in a continuous time jump-diffusion setting, can be related
to second-order partial integro-differential equations (PIDEs) of parabolic type
by the Feynman-Kac theorem, and other types of equations such as backward
stochastic differential equations with jumps (BSDEJs) or quasi-linear PIDEs in
more complicated settings. In theoretical continuous-time finance, many prob-
lems, such as asset pricing with market frictions, dynamic hedging or dynamic
portfolio-consumption choice problems, can be related to Hamil-
ton-Jacobi-Bellman (HJB) equations via dynamic programming techniques. The
HJB equations, from another perspective, are equivalent to BSDEs derived from
a probabilistic approach. The nonlinear BSDEs, studied in [2], can be decom-
posed into a sequence of linear equations, which can be solved by taking condi-
tional expectations, via Picard iteration. For empirical studies, the focus of the
literature has been the evaluation of the cross sectional conditional risk-adjusted
expected returns and the explanation of them using factors. See [3] [4] and [5] as
good illustrations. It is easily seen that, regardless of whether the underlying models are in continuous or discrete time, evaluating conditional expectations is unavoidable in the finance literature. Moreover, in order to perform XVA
computations for the measurement of counterparty credit risk, we need to eva-
luate the conditional expectations, i.e., the derivative prices, on a future simula-
tion grid, as outlined in [6]. These facts call for efficient methods to compute the
quantities aforementioned.
In this paper, we extend the basis function expansion approach proposed in [1]
with machine learning techniques. Specifically, we propose new efficient me-
thods to evaluate conditional expectations, regardless of the dynamics of the
underlying stochastic process, as long as they can be simulated. Rigorous con-
vergence proofs are given using Hilbert space theory. The methodologies can be
applied to time zero pricing as well as pricing on a future simulation grid, with
the advantage of ANN approximation most prominent in high dimensional
problems. In the sequel, we show applications of our methodologies to the pricing of European derivatives; the extension to contracts with an optimal stopping feature is straightforward through either the approach of [1] or reflected BSDEs.
Compared to the literature on traditional stochastic analysis, our methodolo-
gies are able to handle large data sets and high-dimensional problems, therefore
suffering much less from the curse of dimensionality due to the nature of ANN
methods. Moreover, our methodologies are very efficient when evaluating solu-
tions of BSDEJs and PIDEs on a future simulation grid, where none of the tradi-
tional methodologies applies. With respect to recent machine learning literature
on numerical solutions to BSDEs and PDEs, our methodologies enjoy the theo-
retical advantage of being able to handle equations with jump-diffusion and
convergence results are provided. When applied to the solutions of BSDEJs and
PIDEs, our methodologies require far fewer parameters than the current machine learning based methods mentioned below. At any step in the solution process, only one ANN is needed and we do not require nested optimization. In terms of applications, not all OTC derivative prices can be easily translated into BSDEJs and PIDEs; consider, for example, a range accrual with both American and barrier (knock-out, say) features. However, our methodologies are naturally suited to those situations. To conclude, our
methods enjoy many theoretical and empirical advantages, which makes them
attractive and novel.
There has been a huge literature on applications of machine learning tech-
niques to financial research. Classical applications focus on the prediction of
market variables such as equity indexes or FX rates and the detection of market
anomalies, for example, [7] and [8]. Option pricing via brute-force curve fitting with ANNs dates back to [9]. More applications of machine learning in
finance, especially option pricing prediction, are surveyed in [10]. See references
therein. Pricing of American options in high dimensions can be found in [11], which is closest to our Method 1. However, our methods improve on this reference in several ways. First, we enable deep neural net-
work (DNN) approximation and show convergence. Second, we can incorporate
constraints in DNN approximation estimation and prove the mathematical va-
lidity of this approach. Third, we propose two more efficient methods to com-
plement the first method of ours. Our treatment of constraints in the estimation
of DNNs extends the work of [12] in that we can deal with a larger class of con-
straints by specifying a general Hilbert subspace as the constrained set. Risk
measure computation using machine learning can be found in [13]. Applications
of machine learning function approximation on financial econometrics can be
found in [14], [15], [16] and [17]. Recent applications include empirical and
theoretical asset pricing, reinforcement learning and Q-learning in solving dy-
namic programming problems such as optimal investment-consumption choice,
option pricing and optimal trading strategies construction, e.g., [18], [19], [20],
[21], [22], [23], [24], [25], [26], [27], [28] and references therein. Numerical
methods to solve PDEs and BSDEs or the related inverse problems can be found
in [29], [30], [31], [32], [33], [34], [35], [36], [37], [38] and [39]. Machine
learning based methods enjoy the advantage of being fast, able to handle large
data sets and high dimensional problems.
Our methodologies are combinations of traditional statistical learning theory
and stochastic analysis with advanced machine learning techniques, introduc-
ing powerful function approximation method via the universal approximation
theorem and artificial neural networks (ANNs), while preserving the regres-
sion-type analysis documented in [1]. The methods are easy to use, time efficient and, as illustrated by our numerical experiments, effective and accurate. They differ from the convergent expansion method, e.g., [40], simulation methods such as [41], [42], [43] and [44], or the asymptotic expansion methods proposed in [45], [46], [47], [48], [49], [50], [51] and [52], in that we no longer resort to polynomial basis function expansion or small-diffusion type analysis.
Our methods are also different from the pure machine learning based ones
documented in [29], [30], [31], [32], [33], [34] and [35], in that we utilize the
lead-lag regression formula to evaluate the conditional expectations, preserving the time-dependent structure, and our methods are able to handle jump-diffusion processes easily.
The organization of this paper is as follows. Section 2 documents the metho-
dologies. Section 3 illustrates the usefulness of our methods by considering Eu-
ropean and American derivatives pricing. Section 4 considers numerical experi-
ments and Section 5 concludes. An outline of the proofs and other applications
can be found in the appendices.
2. The Methodology
Mathematical Setup
We use a Markov process modeled by a jump-diffusion as an illustration. Suppose that we have a stochastic differential equation with jumps
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t + \int_E \gamma(t, X_{t-}, e)\,N(dt, de), \qquad X_0 = x_0, \qquad (1)$$
and a payoff function $\psi$ satisfying the polynomial growth condition
$$|\psi(x)| \le C\,|x|^{P}. \qquad (2)$$
Throughout, we also work with a sequence of Hilbert subspaces $\{\mathcal{H}_n\}_{n=1}^{\infty}$ of a Hilbert space $\mathcal{H}$ satisfying $\mathcal{H}_n \subset \mathcal{H}_{n+1}$ for any $n \ge 1$ and $\overline{\cup_{n=1}^{\infty}\mathcal{H}_n} = \mathcal{H}$. We then have $\lim_{n\to\infty}\| h - \mathrm{PROJ}_{\mathcal{H}_n} h \| = 0$ for any $h \in \mathcal{H}$, i.e., with $h_n := \mathrm{PROJ}_{\mathcal{H}_n} h$, $\lim_{n\to\infty} h_n = h$.
The next two theorems are well known in the literature.
Theorem 7 (Hilbert Projection Theorem). Let $\mathcal{G} \subset \mathcal{H}$ be two Hilbert spaces and let $x \in \mathcal{H}$. Then $\mathrm{PROJ}_{\mathcal{G}}\,x$ exists and is unique. Moreover, it is characterized uniquely by $x - \mathrm{PROJ}_{\mathcal{G}}\,x \in \mathrm{ORTH}_{\mathcal{G}}$, the orthogonal complement of $\mathcal{G}$.
Theorem 8 (Repeated Projection Theorem). Let $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \mathcal{H}$ be three Hilbert spaces. Then, for any $x \in \mathcal{H}$, $\mathrm{PROJ}_{\mathcal{G}_1}\,x = \mathrm{PROJ}_{\mathcal{G}_1}\big(\mathrm{PROJ}_{\mathcal{G}_2}\,x\big)$.
Remark 9. The conditions of Theorems 7 and 8 on $\mathcal{G}$, $\mathcal{G}_1$ and $\mathcal{G}_2$ can be relaxed to convexity and completeness instead of being Hilbert subspaces.
Finally, we have the result below.
Theorem 10. Suppose $\mathcal{H}$ is a Hilbert space and $\{\mathcal{H}_n\}_{n=1}^{\infty}$ are Hilbert subspaces of $\mathcal{H}$ satisfying $\mathcal{H}_n \subset \mathcal{H}_{n+1}$ and $\overline{\cup_{n=1}^{\infty}\mathcal{H}_n} = \mathcal{H}_{\infty} \subset \mathcal{H}$. For $x \in \mathcal{H}$, define $h_n := \mathrm{PROJ}_{\mathcal{H}_n}\,x$ and $h := \mathrm{PROJ}_{\mathcal{H}_{\infty}}\,x$. Then $\lim_{n\to\infty} h_n = h$.
The following theorem handles the constrained approximation and its convergence.
Theorem 12 (On Constrained Approximation). Under Assumptions 6 and 11, for $x \in \mathcal{H}$, if $h = \mathrm{PROJ}_{\Psi}\,x \in \Psi$, then we have $\lim_{n\to\infty} \mathrm{PROJ}_{\Psi_n}\,x = h$.
Remark 13 (On $\Psi$). In Theorem 12, the set $\Psi$ represents prior knowledge on constraints that $h$ satisfies. It can be represented by a set of nonlinear inequalities or equalities on functionals of $h$. Common constraints for option pricing include the non-negativity constraint and the positiveness constraint on the second-order derivatives. The verification that $\{\Psi_n\}_{n=1}^{\infty}$ satisfies Assumption 11 has to be carried out for each specific choice of constraints.
Assumption 14 (On the Structure of $\mathcal{H}_t$).
$$\left\{\mathbb{E}_t[\xi_T] \;\middle|\; \xi_T \in L^2(\mathcal{F}_T),\ \mathbb{E}_t[\xi_T] \in L^2(\mathcal{F}_t)\right\} \subset \mathcal{H}_t \subset L^2(\mathcal{F}_t) \subset L^2(\mathcal{F}_T) = \mathcal{H}_T.$$
Assumption 15 (On the Structure of $\mathcal{H}_t^J$). $\{e_t^j\}_{j\in\Lambda_J}$ is a set of elements of $\mathcal{H}_t$ and $\mathcal{H}_t^J := \mathrm{Span}\{e_t^j\}_{j\in\Lambda_J}$, with $\cup_{J=1}^{\infty}\Lambda_J = \Lambda$, satisfies Assumption 14.
Then, we have the following results.
Lemma 1. For any adapted stochastic process $\xi$ such that $\xi_T \in L^2(\mathcal{F}_T)$, if $\mathbb{E}_t[\xi_T] \in L^2(\mathcal{F}_t)$, we have
$$\mathbb{E}_t[\xi_T] = \arg\min_{\eta_t \in L^2(\mathcal{F}_t)} \mathbb{E}\big[(\xi_T - \eta_t)^2\big]. \qquad (3)$$
In particular, for any measurable function $\psi$ and stochastic process $X$ such that $\psi(X_T) \in L^2(\mathcal{F}_T)$ and $\mathbb{E}_t[\psi(X_T)] \in L^2(\mathcal{F}_t)$,
$$\mathbb{E}_t[\psi(X_T)] = \arg\min_{\xi_t \in L^2(\mathcal{F}_t)} \mathbb{E}\big[(\psi(X_T) - \xi_t)^2\big]. \qquad (4)$$
Proposition 16. Since $X$ is a Markov process, there exists a measurable function $\varphi$ such that $\mathbb{E}_t[\psi(X_T)] = \varphi(t, X_t)$; this is the function approximated throughout this paper.
Theorem 17. Under Assumptions 1, 2, 6, 14 and 15, for any adapted stochastic process $\xi$ such that $\xi_T \in L^2(\mathcal{F}_T)$ and $\mathbb{E}_t[\xi_T] \in L^2(\mathcal{F}_t)$, we have
$$\lim_{J\to\infty}\ \arg\min_{\eta_t \in \mathcal{H}_t^J} \mathbb{E}\big[(\xi_T - \eta_t)^2\big] = \mathbb{E}_t[\xi_T]. \qquad (5)$$
Further, for any measurable function $\psi$ and stochastic process $X$ such that $\psi(X_T) \in L^2(\mathcal{F}_T)$ and $\mathbb{E}_t[\psi(X_T)] \in L^2(\mathcal{F}_t)$, we have
$$\lim_{J\to\infty}\ \arg\min_{\xi_t \in \mathcal{H}_t^J} \mathbb{E}\big[(\psi(X_T) - \xi_t)^2\big] = \mathbb{E}_t[\psi(X_T)], \qquad (6)$$
where both limits are taken in $L^2(\mathcal{F}_t)$.
The following results justify the universal approximation and ANN approximation approaches proposed in this paper.
Proposition 19 (On Universal Approximation Theory). Let $\sigma$ denote the activation function in the universal approximation theorem mentioned in [55], [56] and [57]. Define $\{e_t^j\}_{j=1}^{m_n} := \{\sigma(\alpha_j + \beta_j X_t)\}_{j=1}^{m_n}$, where $X$ satisfies Equation (1) and Assumption 2, $\alpha_j$ and $\beta_j$ have at most $n$ significant digits in total, $n \in \mathbb{N}$ (the set of natural numbers), $j$ runs from $1$ to $m_n$, and $m_n$ is the number of all such $e_t^j$, i.e., $\{e_t^j\}_{j=1}^{m_n} = \{\sigma(\alpha + \beta X_t) \mid \alpha \text{ and } \beta \text{ have at most } n \text{ total significant digits}\}$. Then $\mathcal{H} := \overline{\mathrm{Span}}\big(\{e_t^j\}_{j=1}^{m_n},\ n \in \mathbb{N}\big)$ satisfies Assumptions 6, 14 and 15. Therefore, Theorems 17 and 18 apply.
Proposition 20 (On Deep Neural Network Approximation). For the DNN defined in ([58], Definition 1.1), observe that $W_l(x) = \alpha_l + \beta_l x$. Define
$$e_t^j := W_{L,j} \circ \rho \circ W_{L-1,j} \circ \rho \circ \cdots \circ W_{1,j} \circ \rho\,(X_t), \qquad (8)$$
where $W_{l,j}(x) = \alpha_{l,j} + \beta_{l,j} x$, $l = 1, 2, \ldots, L$, $(\alpha_{l,j}, \beta_{l,j})$ have at most $n$ total significant digits and $n \in \mathbb{N}$. Then $\mathcal{H} := \overline{\mathrm{Span}}\big(\{1, e_t^j\}_{j=1}^{m_n},\ n \in \mathbb{N}\big)$, where $1$ means the constant function, satisfies Assumptions 6, 14 and 15, so that Theorems 17 and 18 apply. In particular, we show convergence when the number of connections goes to infinity, which can be achieved via enlarging the number of neurons in each layer with the total number of layers remaining fixed.
Remark 22 (On Euler Time Discretization). [59] proposes an exact simulation method for multi-dimensional stochastic differential equations. Discussing the discretization error of the regression approach proposed in this paper under an Euler scheme is not hard if $\psi$ satisfies Assumption 1, in which case the dominated convergence theorem and the $L^2$ convergence of the Euler method can be applied to show convergence.
The proofs of the above results can be found in Appendix A. In what follows,
we will propose three methods to compute, approximately, the function φ in
Proposition 16.
Method 1
In general, $\varphi$, defined in Proposition 16 and Theorem 17, cannot be found in closed form. A natural thought would be to resort to function expansion representations, i.e., to find the solution to the following problem
$$\mathbb{E}_t[\psi(X_T)] = \arg\min_{\{a_j,\theta_j\}_{j=0}^{\infty} \in \mathcal{A}} \mathbb{E}\Bigg[\bigg(\psi(X_T) - \sum_{j=0}^{\infty} a_j\, e_j(t, X_t \mid \theta_j)\bigg)^{2}\Bigg] \qquad (9)$$
where $\mathcal{A}$ is an appropriate space for the coefficients $\{a_j, \theta_j\}_{j=0}^{\infty}$ and $\{e_j(\theta_j)\}_{j=0}^{\infty}$ is a set of functions, with $\mathrm{Span}\{e_j(\theta_j)\}_{j=0}^{\infty}$² dense in an appropriate function space $\Phi$³. To further proceed, we seek a truncation of the function representation formula as follows
$$\mathbb{E}_t[\psi(X_T)] \cong \arg\min_{\{a_j,\theta_j\}_{j=0}^{J} \in \mathcal{A}_J} \mathbb{E}\Bigg[\bigg(\psi(X_T) - \sum_{j=0}^{J} a_j\, e_j(t, X_t \mid \theta_j)\bigg)^{2}\Bigg] \qquad (10)$$
for $J$ sufficiently large, where $\mathcal{A}_J$ is a compact set in the Euclidean space in which $\{a_j, \theta_j\}_{j=0}^{J}$ take values. The last step would be to use Monte Carlo simulation to approximate the unconditional expectations appearing in Equations (9) and (10), therefore turning the conditional expectation computation problem into a least-squares function regression problem, similar to [1]. An obvious choice of $\{e_j(\theta_j)\}_{j=0}^{\infty}$ is a polynomial basis, for example, the set of Fourier-Hermite basis functions. For expansions using Fourier-Hermite basis functions in high dimensions, see [60].
In fact, artificial neural networks (ANNs) prove to be an efficient and convergent function approximation tool that we can utilize in the above expressions. Write
$$\mathbb{E}_t[\psi(X_T)] \cong \arg\min_{\{a_j,\theta_j\}_{j=0}^{J} \in \mathcal{A}_J} \mathbb{E}\Big[\big(\psi(X_T) - \mathrm{ANN}_J\big(\{a_j, \theta_j\}_{j=0}^{J} \mid t, X_t\big)\big)^{2}\Big] \qquad (11)$$
where $\mathrm{ANN}_J$ denotes an ANN with parameters $\{a_j, \theta_j\}_{j=0}^{J}$.
²It is the linear space spanned by the set $\{e_j(\theta_j)\}_{j=0}^{\infty}$.
³We should understand that a distance can be defined on the function space $\Phi$.
Note that, via proper time discretization and fixed point iteration, solving a
BSDE with jumps can be decomposed into a series of evaluations of conditional
expectations. The machine learning based method outlined above can be applied
there. We will write down the algorithm to solve a general Coupled For-
ward-Backward Stochastic Differential Equation with Jumps (CFBSDEJs) in the
appendix. Extensions to other types of BSDEJs are possible.
Here we assume that X is a Markov process. To handle path dependency or
non-Markov processes, we can apply the backward induction method outlined
in [1]. With the machine learning approach, it is easy to see that this method
enables us to get the values of conditional expectations on a future simulation
grid.
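To make Method 1 concrete, the following is a minimal sketch under an assumed Black-Scholes toy model; the parameter values and the polynomial basis are illustrative assumptions that stand in for the basis functions or the ANN in (10)-(11). The simulated payoff $\psi(S_T)$ is regressed on functions of $S_t$, and the fit is compared with the closed-form conditional expectation available in this toy case.

```python
# Minimal sketch of Method 1: approximate E_t[psi(S_T)] by least-squares regression
# of the simulated payoff on basis functions of S_t (toy Black-Scholes model,
# illustrative parameters).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
r, sigma, S0, K, t, T = 0.03, 0.2, 100.0, 100.0, 0.5, 1.0
M = 200_000                                    # number of simulated paths

# Exact simulation of S_t and S_T under geometric Brownian motion.
Z1, Z2 = rng.standard_normal(M), rng.standard_normal(M)
St = S0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * Z1)
ST = St * np.exp((r - 0.5 * sigma**2) * (T - t) + sigma * np.sqrt(T - t) * Z2)
payoff = np.maximum(ST - K, 0.0)               # psi(S_T) for a call

# Lead-lag regression: project psi(S_T) on a polynomial basis of S_t.
basis = np.vander(St / S0, N=6, increasing=True)
coef, *_ = np.linalg.lstsq(basis, payoff, rcond=None)
fitted = basis @ coef                          # approximation of E_t[psi(S_T)]

# Benchmark: the exact (undiscounted) conditional expectation under GBM.
tau = T - t
d1 = (np.log(St / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
d2 = d1 - sigma * np.sqrt(tau)
exact = St * np.exp(r * tau) * norm.cdf(d1) - K * norm.cdf(d2)
print("mean absolute error:", np.mean(np.abs(fitted - exact)))
```

Replacing the polynomial basis with a parameterized network and minimizing the same sample criterion with a stochastic optimizer yields the ANN variant in (11).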
Method 2
Another method that utilizes the idea of [1] is inspired by the boosting random tree method (BRT); see [61], for example. Partition the domain space $\mathbb{R}^r = \cup_{k=1}^{K} U_t^k$, where $\{U_t^k\}_{k=1}^{K}$ is a set of disjoint sets in $\mathbb{R}^r$, and consider approximating $\varphi(t, x)$ by $\sum_{k=1}^{K} \varphi_k(t, x)\,\mathbf{1}_{x \in U_t^k}$. The choice of $\{U_t^k\}_{k=1}^{K}$ is important and we can use machine learning classification techniques (or any classification rule), such as the kmeans function in the R programming language, in the Monte Carlo simulation and related computations. Denote $d_U := \sup_{x, y \in U} |x - y|$. It is possible to show that, as long as $\lim_{K\to\infty}\max_{1\le k\le K} d_{U^k} = 0$, we only need a finite number of functions, for example $\{e_j(\theta_j)\}_{j=0}^{J}$, to approximate each $\varphi_k$ and obtain convergence for $\{\varphi_k\}_{k=1}^{K}$. In practice, although the domain of $X_t$ is $\mathbb{R}^r$, it might be concentrated around a small subset, therefore facilitating the partition process. Note also that this method might require us to mollify the function $\psi$ if it is not smooth. We adopt a finite-order Taylor expansion as the function expansion representation. The following theorems provide the convergence analysis for this method; a numerical sketch follows them.
Theorem 23. For an appropriate function space $\Phi$, we have
$$\varphi = \arg\min_{\varphi \in \Phi} \mathbb{E}\big[(\psi(X_T) - \varphi(t, X_t))^{2}\big] = \arg\min_{\sum_{k=1}^{K}\varphi_k(t,\cdot)\mathbf{1}_{\cdot \in U_t^k} \in \Phi} \mathbb{E}\Bigg[\sum_{k=1}^{K}\big(\psi(X_T) - \varphi_k(t, X_t)\big)^{2}\,\mathbf{1}_{X_t \in U_t^k}\Bigg]. \qquad (13)$$
Theorem 24. With $J$ large enough, fixed and finite,
$$\lim_{\max_{1\le k\le K} d_{U_t^k} \to 0}\ \Bigg\|\sum_{k=1}^{K}\hat\varphi_k(t, X_t)\,\mathbf{1}_{X_t \in U_t^k} - \varphi(t, X_t)\Bigg\|_{L^2(\mathcal{F}_t)} = 0, \qquad (14)$$
where $\hat\varphi_k$ is an approximation to $\varphi_k$ satisfying
$$\mathbb{E}\Big[\big(\varphi_k(t, X_t) - \hat\varphi_k(t, X_t)\big)^{2}\,\mathbf{1}_{X_t \in U_t^k}\Big] \le \epsilon_K \qquad (15)$$
for any $k = 1, 2, \ldots, K$, $K \in \mathbb{N}$, with $\lim_{K\to\infty} K\,\epsilon_K = 0$ and $\epsilon_K$ independent of $k$ when $K$ is sufficiently large.
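A minimal sketch of Method 2 under the same assumed toy model: the k-means partition and the local quadratic fits below are illustrative stand-ins for the cells $U_t^k$ and the finite-order Taylor-type approximations $\hat\varphi_k$; all parameter values are assumptions.

```python
# Minimal sketch of Method 2: partition simulated values of S_t into cells and fit a
# separate local regression of psi(S_T) on S_t inside each cell (toy GBM model).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
r, sigma, S0, K, t, T = 0.03, 0.2, 100.0, 100.0, 0.5, 1.0
M, n_cells = 100_000, 20

Z1, Z2 = rng.standard_normal(M), rng.standard_normal(M)
St = S0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * Z1)
ST = St * np.exp((r - 0.5 * sigma**2) * (T - t) + sigma * np.sqrt(T - t) * Z2)
payoff = np.maximum(ST - K, 0.0)

# Step 1: partition the sampled state space of S_t into disjoint cells U^k.
labels = KMeans(n_clusters=n_cells, n_init=10, random_state=0).fit_predict(St.reshape(-1, 1))

# Step 2: within each cell, approximate E_t[psi(S_T)] by a local quadratic fit.
fitted = np.empty(M)
for k in range(n_cells):
    idx = labels == k
    basis = np.vander(St[idx] / S0, N=3, increasing=True)
    coef, *_ = np.linalg.lstsq(basis, payoff[idx], rcond=None)
    fitted[idx] = basis @ coef
print("example local estimates:", fitted[:5])
```

Refining the partition (a larger n_cells) plays the role of letting $\max_k d_{U_t^k} \to 0$ in Theorem 24.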
Method 3
Next, we propose an algorithm combining ANNs and the universal approximation theorem (UAT). Suppose that $L^2(\mathcal{F}_t)$ is the space in which we are performing the approximation. Also assume that $\mathcal{F}_t^{W,N} = \mathcal{F}_t^{X}$, i.e., the information filtration is equivalently generated by $X$. Define an ANN with $N$ connections by $\mathrm{ANN}(x, N, \theta_j, j)$, where $x$ is the vector of state variables that the ANN depends on, $\theta_j$ is the vector of parameters and $j$ is its label. We define the following nested regression approximation
$$\psi(X_T) = \mathrm{ANN}(X_t, N, \theta_1, 1) + \epsilon_{t,T}^{1}, \qquad (16)$$
with each subsequent ANN regressed on the residual of the previous stage, so that $\big\{\sum_{j=1}^{J+1}\mathrm{ANN}(X_t, N, \theta_j, j)\big\}_{J=0}^{\infty}$ is the approximating sequence of $\mathbb{E}_t[\psi(X_T)]$.
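A minimal sketch of these nested residual regressions under the same assumed toy model; the network sizes, the number of stages and the use of scikit-learn's MLPRegressor are illustrative assumptions.

```python
# Minimal sketch of Method 3: fit a small ANN to the payoff, then fit further ANNs
# to the remaining residuals; the running sum approximates E_t[psi(S_T)].
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
r, sigma, S0, K, t, T = 0.03, 0.2, 100.0, 100.0, 0.5, 1.0
M = 50_000
St = S0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * rng.standard_normal(M))
ST = St * np.exp((r - 0.5 * sigma**2) * (T - t) + sigma * np.sqrt(T - t) * rng.standard_normal(M))
payoff = np.maximum(ST - K, 0.0)

X = (St / S0).reshape(-1, 1)
residual = payoff.copy()
approx = np.zeros(M)
for j in range(3):                       # three nested regression stages
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=j)
    net.fit(X, residual)
    stage = net.predict(X)
    approx += stage                      # running sum approximates E_t[psi(S_T)]
    residual -= stage                    # the next stage regresses the new residual
print("root-mean-square residual after 3 stages:", np.sqrt(np.mean(residual**2)))
```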
In this paper, we will test and compare the performance of all of the proposed methods. A general discussion and rigorous proofs can be found in Appendix A (we only show convergence of Methods 1 and 2).
3. European and American Derivatives Pricing
The time-$t$ value of a European claim paying a flow $f_u$ and a terminal payoff $\psi_T$ is
$$V_t^{e} := \mathbb{E}_t\bigg[\int_t^{T} D_{t,u}\, f_u\, du + D_{t,T}\,\psi_T\bigg], \qquad (22)$$
where $D_{t,u} := e^{-\int_t^u r_v\,dv}$ is the stochastic discount factor. If we assume a Markov structure $f_t = f(t, X_t)$ and $\psi_T = \psi(X_T)$, then $V_t^{e} := v^{e}(t, X_t)$, i.e., $V_t^{e}$ is a deterministic function of $(t, X_t)$.
For a claim with a barrier feature, we can similarly write
$$V_t := \mathbb{E}_t\bigg[\int_{\tau}^{T} D_{t,u}\, f_u\, du + D_{\tau,T}\,\psi_T\bigg], \qquad (23)$$
where $\tau := \inf\{v \in [t, T] : X_v \in \mathcal{B}\}$ with $X_t \notin \mathcal{B}$ and $\mathcal{B} \subset \mathbb{R}^r$. In our setting, the dynamics of $X$ can be arbitrary, possibly stochastic differential equations with jumps, Markov chains, or even non-Markov processes. Monte Carlo based methods for option pricing can be found in [64] and [65], among others. For an American claim, the time-$t$ value is
$$V_t^{a} := \sup_{\tau \in \mathcal{T}_{[t,T]}} \mathbb{E}_t\bigg[\int_t^{\tau} D_{t,u}\, f_u\, du + D_{t,\tau}\,\psi_{\tau}\bigg], \qquad (24)$$
where $\mathcal{T}_{[t,T]}$ is the space of all stopping times taking values in $[t, T]$. We refer the interested readers to [62] and [66] for a general derivation and explanation of Equation (24). It is also possible to derive the general BSDE that an American claim
price satisfies, for example [67]. Moreover, in [27] and [1], the authors utilize a
backward induction approach to solve optimal stopping problems. The idea can
be carried out using the methodologies documented in Section 2. American
claims with barrier features can be incorporated and priced in a similar way. It is
also known that American option prices can be related to reflected BSDEs
(RBSDEs); a rigorous discussion of the existence and uniqueness of such equations
can be found in [68] and references therein.
4. Numerical Experiments
4.1. European Option Pricing
In this section, we consider a Heston model
$$\frac{dS_t}{S_t} = r\,dt + \sqrt{\nu_t}\,dW_t, \qquad S_0 = s_0, \qquad (25)$$
$$d\nu_t = \kappa(\theta - \nu_t)\,dt + \sigma\sqrt{\nu_t}\Big(\rho\,dW_t + \sqrt{1-\rho^{2}}\,dB_t\Big), \qquad \nu_0 = v_0. \qquad (26)$$
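As a reference for the experiments that follow, here is a minimal full-truncation Euler simulation of (25)-(26); the parameter values are illustrative assumptions, not those used to produce the figures below.

```python
# Minimal Euler (full-truncation) Monte Carlo simulation of the Heston model (25)-(26).
import numpy as np

rng = np.random.default_rng(3)
s0, v0, r = 100.0, 0.04, 0.03
kappa, theta, sigma, rho = 1.5, 0.04, 0.5, -0.7      # illustrative Heston parameters
T, n_steps, M = 1.0, 200, 100_000
dt = T / n_steps

S, v = np.full(M, s0), np.full(M, v0)
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal(M)
    dB = np.sqrt(dt) * rng.standard_normal(M)
    v_pos = np.maximum(v, 0.0)                        # full truncation of the variance
    S *= np.exp((r - 0.5 * v_pos) * dt + np.sqrt(v_pos) * dW)
    v += kappa * (theta - v_pos) * dt + sigma * np.sqrt(v_pos) * (rho * dW + np.sqrt(1 - rho**2) * dB)

K = 100.0
print("European call price:", np.exp(-r * T) * np.mean(np.maximum(S - K, 0.0)))
```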
Figure 1. QQ-plot for Method 1, τ = 0.05 and relative pricing error is 1.20%.
Figure 2. QQ-plot for Method 1, τ = 0.25 and relative pricing error is 1.50%.
Figure 3. QQ-plot for Method 1, τ = 0.45 and relative pricing error is 1.20%.
Figure 4. QQ-plot for Method 1, τ = 0.05 and relative pricing error is 1.66%.
Figure 5. QQ-plot for Method 1, τ = 0.20 and relative pricing error is 1.75%.
Figure 6. QQ-plot for Method 1, τ = 0.30 and relative pricing error is 3.00%.
Figure 7. QQ-plot for Method 2, τ = 0.05 and relative pricing error is 1.80%.
Figure 8. QQ-plot for Method 2, τ = 0.20 and relative pricing error is 3.50%.
Figure 9. QQ-plot for Method 2, τ = 0.30 and relative pricing error is 3.53%.
Figure 10. QQ-plot for Method 1, τ = 0.02 and relative pricing error is 0.40%.
Figure 11. QQ-plot for Method 1, τ = 0.05 and relative pricing error is 0.80%.
Figure 12. QQ-plot for Method 1, τ = 0.08 and relative pricing error is 0.60%.
5. Conclusion
Pricing problems in the context of American game options, equity swaps, and the related McKean-Vlasov type FBSDEJs (mean-field FBSDEJs, see [70]) are important topics in mathematical finance. They are also related to the theoretical analysis of high-frequency trading. Finding machine-learning based numerical methods to solve these equations is of great interest to us. Last, but not least, machine learning methods in asset pricing and portfolio optimization, which can be found in [71], [72], [73], [28], [74] and [75], admit an elegant way to price financial derivatives under the physical measure. For example, we can use the method in [72] to calibrate the SDF process and use [75] to generate market scenarios. These methodologies, combined with the methods documented in this paper and [1], have the potential to solve for any derivative price. We leave all these developments to future research.
Acknowledgements
We thank the Editor and the referee for their comments. Moreover, we are
grateful to Professor Jérôme Detemple, Professor Marcel Rindisbacher and Pro-
fessor Weidong Tian for their useful suggestions.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this pa-
per.
References
[1] Longstaff, F. and Schwartz, E. (2001) Valuing American Options by Simulation: A
Simple Least—Square Approach. The Review of Financial Studies, 14, 113-147.
https://doi.org/10.1093/rfs/14.1.113
[2] El Karoui, N., Peng, S. and Quenez, M.C. (1997) Backward Stochastic Differential
Equations in Finance. Mathematical Finance, 7, 1-71.
https://doi.org/10.1111/1467-9965.00022
[3] Adrian, T., Crump, R. and Vogt, E. (2019) Nonlinearity and Flight-to-Safety in the Risk-Return Trade-Off for Stocks and Bonds. Journal of Finance, 74, 1931-1973.
[4] Fama, E. and French, K. (1993) Common Risk Factors in the Returns on Stocks and
Bonds. Journal of Financial Economics, 33, 3-56.
https://doi.org/10.1016/0304-405X(93)90023-5
[5] Fama, E. and French, K. (2015) A Five-Factor Asset Pricing Model. Journal of Fi-
nancial Economics, 116, 1-22.
[6] Zhu, S. and Pykhtin, M. (2008) A Guide to Modeling Counterparty Credit Risk.
Working Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1032522
[7] Aydogdu, M. (2018) Predicting Stock Returns Using Neural Networks. Working
Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3141492
https://doi.org/10.2139/ssrn.3141492
[8] Voshgha, H. (2008) Early Detection of Defaulting Firms: Artificial Neural Network
Application; Australian Context. Working Paper.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2130505
[9] Hutchinson, J., Lo, A. and Poggio, T. (1994) A Nonparametric Approach to Pricing
and Hedging Derivative Securities via Learning Networks. Journal of Finance, 49,
851-889. https://doi.org/10.1111/j.1540-6261.1994.tb00081.x
[10] Hahn, J.T. (2013) Option Pricing Using Artificial Neural Networks: The Australian
Perspective. Ph.D. Thesis, Bond University, Queensland.
[11] Kohler, M., Krzyzak, M. and Todorovic, N. (2010) Pricing of High-Dimensional
American Options by Neural Networks. Mathematical Finance, 20, 383-410.
https://doi.org/10.1111/j.1467-9965.2010.00404.x
[12] Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C. and Garcia, R. (2009) Incorporating
Functional Knowledge in Neural Networks. Journal of Machine Learning Research,
10, 1239-1262.
[13] Eckstein, S., Kupper, M. and Pohl, M. (2018) Robust Risk Aggregation with Neural
Networks. Quantitative Finance, 1-40. https://arxiv.org/abs/1811.00304
[14] Giovanis, E. (2010) Applications of Neural Network Radial Basis Function in Eco-
nomics and Financial time Series. SSRN Electronic Journal.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1667442
https://doi.org/10.2139/ssrn.1667442
[15] Kopitkov, D. and Indelman, V. (2018) Deep PDF: Probabilistic Surface Optimiza-
tion and Density Estimation. Computer Science, 1-18.
https://arxiv.org/abs/1807.10728
[16] Luo, R., Zhang, W., Xu, X. and Wang, J. (2017) A Neural Stochastic Volatility Mod-
el. Computer Science, 1-11. https://arxiv.org/pdf/1712.00504.pdf
[17] Sasaki, H. and Hyvärinen, A. (2018) Neural-Kernelized Conditional Density Esti-
mation. Statistics, 1-12. https://arxiv.org/abs/1806.01754
[18] Weissensteiner, A. (2009) AQ-Learning Approach to Derive Optimal Consumption
and Investment Strategies. IEEE Transactions on Neural Networks, 20, 1234-1243.
https://doi.org/10.1109/TNN.2009.2020850
[19] Casgrain, P. and Jaimungal, S. (2016) Trading Algorithms with Learning in Latent
Alpha Models. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2871403
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2871403
[20] Heaton, J., Polson, N. and Witte, J. (2016) Deep Learning for Finance: Deep Portfo-
lios. Applied Stochastic Models in Business and Industry, 33, 3-12.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2838013
https://doi.org/10.2139/ssrn.2838013
[21] Samo, Y. and Vervuurt, A. (2016) Stochastic Portfolio Theory: A Machine Learning
Perspective. Quantitative Finance, 1-9. https://arxiv.org/pdf/1605.02654.pdf
[22] Jiang, Z., Xu, D. and Liang, J. (2017) A Deep Reinforcement Learning Framework
for the Financial Portfolio Management Problem. Computational Finance, 1-31.
https://arxiv.org/pdf/1706.10059.pdf
[23] Deng, Y., Bao, F., Kong, Y., Ren, Z. and Dai, Q. (2017) Deep Direct Reinforcement
Learning for Financial Signal Representation and Trading. IEEE Transactions on
Neural Networks and Learning Systems, 28, 653-664.
https://doi.org/10.1109/TNNLS.2016.2522401
[24] Halperin, I. (2017) QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds. Quan-
titative Finance, 1-34. https://arxiv.org/abs/1712.04609v2
https://doi.org/10.2139/ssrn.3087076
[25] Ritter, G. (2017) Machine Learning for Trading. Working Paper.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3015609
https://doi.org/10.2139/ssrn.3015609
[26] Xing, F., Cambria, E., Malandri, L. and Vercellis, C. (2018) Discovering Bayesian Market Views for Intelligent Asset Allocation. https://arxiv.org/pdf/1802.09911.pdf
[27] Becker, S., Cheridito, P. and Jentzen, A. (2018) Deep Optimal Stopping. Mathemat-
ics, arXiv: 1804. 05394. https://arxiv.org/abs/1804.05394
[28] Gu, S., Kelly, B. and Xiu, D. (2018) Empirical Asset Pricing via Machine Learning.
31st Australasian Finance and Banking Conference 2018, Sydney, 13-15 December
2018. https://doi.org/10.3386/w25398
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3159577
[29] Weinan, E., Han, J. and Jentzen, A. (2017) Deep Learning-Based Numerical Me-
thods for High-Dimensional Parabolic Partial Differential Equations and Backward
Stochastic Differential Equations. Mathematics, 1-39.
https://arxiv.org/pdf/1706.04702.pdf
[30] Weinan, E., Hutzenthaler, M., Jentzen, A. and Kruse, T. (2017) On Multilevel Pi-
card Numerical Approximations for High-Dimensional Nonlinear Parabolic Partial
Differential Equations and High-Dimensional Nonlinear Backward Stochastic Dif-
ferential Equations. Mathematics, 1-25. https://arxiv.org/pdf/1708.03223.pdf
[31] Han, J., Jentzen, A. and Weinan, E. (2017) Overcoming the Curse of Dimensionali-
ty: Solving High-Dimensional Partial Differential Equations Using Deep Learning.
Mathematics, 1-14. https://arxiv.org/pdf/1707.02568.pdf
[32] Khoo, Y., Lu, J. and Ying, L. (2017) Solving Parametric PDE Problems with Artifi-
cial Neural Networks. Mathematics, 1-17. https://arxiv.org/pdf/1707.03351.pdf
[33] Beck, C., Weinan, E. and Jentzen, A. (2017) Machine Learning Approximation Al-
gorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and
Second-Order Backward Stochastic Differential Equations. Mathematics, 1-56.
https://arxiv.org/pdf/1709.05963.pdf
[34] Sirignano, J. and Spiliopoulos, K. (2017) DGM: A Deep Learning Algorithm for
Solving Partial Differential Equations. Mathematics, 1-31.
https://arxiv.org/pdf/1708.07469.pdf
[35] Long, Z., Lu, Y. and Ma, X. (2018) PDE-Net: Learning PDEs from Data. Mathemat-
ics, 1-17. https://arxiv.org/pdf/1710.09668.pdf
[36] Long, Z. and Lu, Y. (2018) PDE-Net 2.0: Learning PDEs from Data with a Numeric
Symbolic Hybrid Deep Network. Computer Science, 1-16.
https://arxiv.org/pdf/1812.04426.pdf
[37] Haehnel, P., Marecek, J. and Monteil, J. (2018) Scaling up Deep Learning for
PDE-Based Models. Computer Science, 1-39. https://arxiv.org/pdf/1810.09425.pdf
[38] Berg, J. and Nyström, K. (2018) Data-Driven Discovery of PDEs in Complex Data-
sets. Statistics, 1-22. https://arxiv.org/pdf/1808.10788.pdf
[39] Rudy, S., Alla, A., Brunton, S. and Nathan Kutz, J. (2018) Data-Driven Identifica-
tion of Parametric Partial Differential Equations. Mathematics, 1-17.
https://arxiv.org/pdf/1806.00732.pdf
[40] Detemple, J., Lorig, M., Rindisbacher, M. and Zhang, L. (2018) An Analytical Expansion Method for Forward Backward Stochastic Differential Equations with Jumps.
[41] Briand, P. and Labart, C. (2012) Simulation of BSDEs by Wiener Chaos Expansion.
The Annals of Applied Probability, 24, 1129-1171.
https://doi.org/10.1214/13-AAP943
[42] Geiss, C. and Labart, C. (2015) Simulation of BSDEs with Jumps by Wiener Chaos Expansion.
[60] Prater, A. (2012) Discrete Sparse Fourier Hermite Approximations in High Dimen-
sions. Doctoral Thesis, Syracuse University, New York.
[61] Fonseca, Y., Medeiros, M., Vasconcelos, G. and Veiga, A. (2018) Boost: Boosting
Smooth Trees for Partial Effect Estimation in Nonlinear Regressions. Statistics,
1-30. https://arxiv.org/pdf/1808.03698.pdf
[62] Detemple, J. (2006) American-Style Derivatives: Valuation and Computation.
Chapman and Hall/CRC, New York. https://doi.org/10.1201/9781420034868
[63] Guyon, J. and Henry-Labordère, P. (2014) Nonlinear Option Pricing. Chapman and
Hall, New York. https://doi.org/10.1201/b16332
[64] Detemple, J., Garcia, R. and Rindisbacher, M. (2005) Representation Formulas for
Malliavin Derivatives of Diffusion Processes. Finance and Stochastics, 9, 349-367.
https://doi.org/10.1007/s00780-004-0151-6
[65] Detemple, J. and Rindisbacher, M. (2005) Asymptotic Properties of Monte Carlo
Estimators of Derivatives. Management Science, 51, 1657-1675.
https://doi.org/10.1287/mnsc.1050.0398
[66] Detemple, J. (2014) Optimal Exercise for Derivative Securities. Annual Review of
Financial Economics, 6, 459-487.
https://doi.org/10.1146/annurev-financial-110613-034241
[67] Fujii, M., Sato, S. and Takahashi, A. (2012) An FBSDE Approach to American Op-
tion Pricing with an Interacting Particle Method. Quantitative Finance, 1-18.
https://arxiv.org/abs/1211.5867
https://doi.org/10.2139/ssrn.2180696
[68] Chassagneux, J., Elie, R. and Kharroubi, I. (2010) A Note on Existence and Unique-
ness for Solutions of Multidimensional Reflected BSDEs. Electronic Communica-
tions in Probability, 16, 120-128. https://doi.org/10.1214/ECP.v16-1614
[69] Collin-Dufresne, P. and Goldstein, R. (2003) Generalizing the Affine Framework to
HJM and Random Field Models. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.410421
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=410421
[70] Carmona, R. and Delarue, F. (2015) Forward-Backward Stochastic Differential Eq-
uations and Controlled McKean-Vlasov Dynamics. Annals of Probability, 43,
2647-2700. https://doi.org/10.1214/14-AOP946
[71] Bianchi, D., Büchner, M. and Tamoni, A. (2019) Bond Risk Premia with Machine
Learning. USC-INET Research Paper No. 19-11.
https://doi.org/10.2139/ssrn.3400941
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3232721
[72] Chen, L., Pelger, M. and Zhu, J. (2019) Deep Learning in Asset Pricing. Quantitative
Finance, 1-89. https://arxiv.org/abs/1904.00745
https://doi.org/10.2139/ssrn.3350138
[73] Feng, G., Polson, N. and Xu, J. (2019) Deep Learning in Asset Pricing. Statistics,
1-33. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3350138
[74] Yang, Q., Ye, T. and Zhang, L. (2018) A General Framework of Optimal Invest-
ment. Working Paper.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3136708
[75] Yu, P., Lee, J., Kulyatin, I., Shi, Z. and Dasgupta, S. (2019) Model-Based Deep Rein-
forcement Learning for Dynamic Portfolio Optimization. Computer Science, 1-21.
https://arxiv.org/abs/1901.08740
[76] Kingma, D. and Ba, J.L. (2014) Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980
Appendix
A. Convergence of the Proposed Methodologies
Proof of Theorem 10. It is known from the projection theorem for Hilbert spaces that $\{h_n\}_{n=1}^{\infty}$ and $h$ exist and are unique. Moreover, $\mathrm{PROJ}_{\mathcal{H}_n}\,h = h_n$ by the repeated projection theorem, so that the density of $\cup_{n=1}^{\infty}\mathcal{H}_n$ in $\mathcal{H}_{\infty}$ yields $\lim_{n\to\infty} h_n = h$.
Proof of Theorem 12. Since $h = \mathrm{PROJ}_{\Psi}\,x$, the repeated projection theorem and Theorem 10 give
$$\lim_{n\to\infty}\mathrm{PROJ}_{\Psi_n}\,x = \lim_{n\to\infty}\mathrm{PROJ}_{\Psi_n}\,h = \mathrm{PROJ}_{\Psi}\,h = h.$$
This concludes the proof.
Proof of Lemma 1. For any $\lambda_t \in L^2(\mathcal{F}_t)$, the tower property gives $\mathbb{E}\big[(\xi_T - \mathbb{E}_t[\xi_T])(\mathbb{E}_t[\xi_T] - \lambda_t)\big] = 0$, so that
$$\mathbb{E}\big[(\xi_T - \lambda_t)^{2}\big] = \mathbb{E}\big[(\xi_T - \mathbb{E}_t[\xi_T])^{2}\big] + \mathbb{E}\big[(\mathbb{E}_t[\xi_T] - \lambda_t)^{2}\big] \ge \mathbb{E}\big[(\xi_T - \mathbb{E}_t[\xi_T])^{2}\big].$$
Therefore we have the claim announced.
Proof of Theorem 17. The proof of this theorem follows from Assumptions 1, 2, 6, 14, 15 and Theorem 10, by choosing
$$\left\{\mathbb{E}_t[\xi_T] \;\middle|\; \xi_T \in L^2(\mathcal{F}_T),\ \mathbb{E}_t[\xi_T] \in L^2(\mathcal{F}_t)\right\} \subset \mathcal{H}_t \subset L^2(\mathcal{F}_t) \subset L^2(\mathcal{F}_T) = \mathcal{H}_T.$$
Proof of Theorem 18. Essentially, Equation (7) is a consequence of the Gauss-Markov theorem and the consistency property of the OLS estimator.
Proof of Proposition 19. This is a direct consequence of the discussion in ([57], Section 3) (see Equation (5)) and Theorem 10. To elaborate, consider $\mathcal{H}_T = L^2(\mathcal{F}_T)$, $x = \psi(X_T)$, and its projections $h$ and $\{h_n\}_{n=1}^{\infty}$ on $\mathcal{H}_t$ and on the subspaces $\mathcal{H}_t^{n} \subset L^2(\mathcal{F}_t)$ defined in this proposition. Suppose that
$$h = \sum_{j=1}^{\infty} \lambda_j\, e_t^j, \qquad h_n = \sum_{j=1}^{m_n} \mu_j^{n}\, e_t^j,$$
where $m_n < m_{n+1}$ and $\{e_t^j\}_{j=1}^{\infty}$ is a set of orthonormal basis elements of $\mathcal{H}_t$. From the repeated projection theorem, we know that $\mu_j^{n+1} = \mu_j^{n} = \lambda_j$ for any $1 \le j \le m_n$⁶ and $n \in \mathbb{N}$. From the $L^2$ property of $h$, we know that $\sum_{j=1}^{\infty}\lambda_j^{2} < \infty$. Therefore,
$$\|h - h_n\|_{L^2(\mathcal{F}_T)}^{2} = \sum_{j=m_n+1}^{\infty}\lambda_j^{2} \to 0 \quad \text{as } n \to \infty.$$
⁶Here we only consider the case where $|\Lambda_n| = m_n < \infty$ for any $n \in \mathbb{N}$; the case with $|\Lambda_n| = \infty$ is analogous.
Proof of Proposition 20. This is a direct consequence of the discussion in ([58],
Theorem 2.2), localization arguments, Theorem 10 and the proof of Proposition
19.
Proof of Theorem 23. The first, second and third equalities are obvious given an appropriate choice of $\Phi$ depending on the Markov property of $X$ and its moment conditions in Assumption 2. Actually, because of the existence and uniqueness of $\varphi \in \Phi$ such that the RHS of the first equality achieves its minimum, we know that
$$\mathbb{E}\big[(\psi(X_T) - \varphi(t, X_t))^{2}\big] \le \min_{\sum_{k=1}^{K}\varphi_k(t,\cdot)\mathbf{1}_{\cdot \in U_t^k} \in \Phi} \mathbb{E}\Bigg[\sum_{k=1}^{K}\big(\psi(X_T) - \varphi_k(t, X_t)\big)^{2}\,\mathbf{1}_{X_t \in U_t^k}\Bigg]. \qquad (38)$$
On the other hand, since $\varphi$ itself belongs to the restricted class (take $\varphi_k = \varphi$ for every $k$), we also have
$$\mathbb{E}\big[(\psi(X_T) - \varphi(t, X_t))^{2}\big] \ge \min_{\sum_{k=1}^{K}\varphi_k(t,\cdot)\mathbf{1}_{\cdot \in U_t^k} \in \Phi} \mathbb{E}\Bigg[\sum_{k=1}^{K}\big(\psi(X_T) - \varphi_k(t, X_t)\big)^{2}\,\mathbf{1}_{X_t \in U_t^k}\Bigg]. \qquad (40)$$
Combining (38) and (40) yields the claimed equality.
B. Other Applications
In this section, we document other applications of our methodologies in finance.
B.1. Joint Valuation and Calibration
Suppose that there are $N$ derivatives contracts whose prices at time $t_0$ can be expressed as $\{V_{t_0}^{n}\}_{n=1}^{N}$. Their payoffs are $\{\varphi_n(X_\cdot)\}_{n=1}^{N}$, where $X$ is an $\mathbb{R}^r$-valued process with parameterized dynamics
$$dX_t^{\theta} = \mu\big(t, X_t^{\theta} \mid \theta\big)\,dt + \sigma\big(t, X_t^{\theta} \mid \theta\big)\,dW_t + \int_E \gamma\big(t, X_t^{\theta}, e \mid \theta\big)\,N(dt, de). \qquad (41)$$
The main idea is that $\{V_{t_0}^{n}\}_{n=1}^{N}$ might contain derivatives contracts from different asset classes or hybrid ones. Therefore, we need to model $X$ as a joint high-dimensional cross-asset system. One potential problem is that $\theta$ is in general a high-dimensional vector, which will be hard to estimate using the usual optimization routines in R or MATLAB. However, we can apply the ADAM algorithm [76] to the resulting high-dimensional calibration problem. The existence and uniqueness of the solution to the SDEJ system (42) can be obtained under necessary regularity conditions on the coefficients.
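A minimal sketch of such a calibration with the ADAM optimizer [76], where a one-dimensional Black-Scholes model replaces the cross-asset system (41) and the "observed" quotes are hypothetical (roughly consistent with a volatility of 0.2): model prices are computed by Monte Carlo with common random numbers and the parameter is updated by gradient descent on the squared pricing errors.

```python
# Minimal sketch of parameter calibration with ADAM on simulated model prices
# (toy one-parameter Black-Scholes model; quotes and parameters are hypothetical).
import torch

torch.manual_seed(0)
S0, r, T = 100.0, 0.03, 1.0
strikes = torch.tensor([90.0, 100.0, 110.0])
observed = torch.tensor([15.42, 9.41, 5.29])            # hypothetical market quotes

Z = torch.randn(100_000)                                 # common random numbers
log_sigma = torch.tensor(-1.0, requires_grad=True)       # theta, parameterized for positivity
opt = torch.optim.Adam([log_sigma], lr=0.05)

for step in range(300):
    opt.zero_grad()
    sigma = torch.exp(log_sigma)
    ST = S0 * torch.exp((r - 0.5 * sigma**2) * T + sigma * T**0.5 * Z)
    model = torch.exp(torch.tensor(-r * T)) * torch.relu(ST.unsqueeze(1) - strikes).mean(dim=0)
    loss = ((model - observed) ** 2).sum()               # squared pricing errors
    loss.backward()
    opt.step()

print("calibrated sigma:", torch.exp(log_sigma).item())
```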
B.2. Option Surface Fitting
There is a strand of literature that strives to fit option panels using different
dynamics for the underlying assets, for example, [77] on stochastic volatility
models, [78] on local volatility models and [79] on local-stochastic volatility
models. Models that incorporate jumps can be found in [80], [81] and references
therein.
Consider the following stochastic differential equation system
$$\frac{dS_t}{S_t} = r(t, X_t)\,dt + \sigma(t, S_t, X_t)\,dW_t, \qquad S_0 = s_0,$$
$$dX_t = \alpha(t, X_t)\,dt + \beta(t, X_t)\,dW_t, \qquad X_0 = x_0. \qquad (43)$$
Here we model $\sigma$ by a DNN. The advantage of doing so is that it might fully capture the market volatility surface while ensuring a good dynamic fit, and it still preserves the existence and uniqueness result for the related stochastic differential equation system (43).
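As a sketch of this specification, the snippet below parameterizes $\sigma(t, S, X)$ by a small untrained network with a positive output and simulates (43) with an Euler scheme; the architecture, the toy dynamics chosen for $X$ and all parameters are illustrative assumptions. In a calibration exercise the network weights would be trained against observed option prices, exactly as in the ADAM sketch of Appendix B.1.

```python
# Minimal sketch: a DNN-parameterized volatility function sigma(t, S, X) inside an
# Euler simulation of system (43); the network is untrained and purely illustrative.
import torch

torch.manual_seed(1)
sigma_net = torch.nn.Sequential(
    torch.nn.Linear(3, 16), torch.nn.Tanh(),
    torch.nn.Linear(16, 1), torch.nn.Softplus(),   # Softplus keeps sigma positive
)

M, n, T = 10_000, 100, 1.0
h = T / n
r, a, b = 0.03, 0.5, 0.1                           # toy coefficients for r, alpha, beta
S = torch.full((M,), 100.0)
X = torch.full((M,), 0.04)

with torch.no_grad():
    for i in range(n):
        t = torch.full((M,), i * h)
        vol = sigma_net(torch.stack([t, S / 100.0, X], dim=1)).squeeze(1)
        dW = h**0.5 * torch.randn(M)
        S = S * (1.0 + r * h + vol * dW)           # dS_t / S_t in (43)
        X = X + a * (0.04 - X) * h + b * dW        # toy mean-reverting factor dynamics
print("mean terminal price:", S.mean().item())
```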
B.3. Credit Risk Management: Evaluation on a Future Simulation Grid
We refer the problem definition to [6]. It is easy to illustrate that the problem
is equivalent to the evaluation of conditional expectations on a future simulation
grid and our methods are suitable for this type of problem. Note that some XVA quantities, such as KVA, require the evaluation of CVA on a future simulation grid. Our methodologies, such as the ones proposed in Sections 2 and B.7, can be applied to the evaluation of KVA, once we obtain future present values of
financial claims.
B.4. Dynamic Hedging
There are references that utilize machine learning (mainly Reinforcement
Learning, or RL) to solve dynamic hedging problems, e.g., [82], [83] and [84].
However, here in this paper we will not follow this route. Instead, we use the
BSDE formulation of the problem in [2] and try to solve the BSDE that cha-
racterizes the hedging problem. The methodology is outlined in Appendix
B.11.
B.5. Dynamic Portfolio-Consumption Choice
We use [85] as an example and try to solve the related coupled FBSDE with
jumps. The methodology is outlined in Appendix B.11. Other examples of dy-
namic portfolio optimization can be found in [53], [86], [87], [88], [89], [90],
[91], [92], [93] and [94]. Essentially, dynamic portfolio-consumption choice
problems are stochastic programming in nature and can be related to HJB equa-
tions or BSDEs. An example of using HJB representation of the problem can be
found in [95]. The equations can be solved using the methodologies outlined in
Section 2 and Appendix B.11.
B.6. Transition Density Approximation
We can generalize the theory in [96] and [97] to approximate the transition
density of a multivariate time-inhomogeneous stochastic differential equation
with jumps. According to [96] and [97], the transition density of a multivariate
time-inhomogeneous stochastic differential equation with or without jumps can
be approximated by polynomials in a weighted-Hilbert space. See ([97], Equa-
tion (2.1)), for example. The key is to evaluate the coefficients {cα }α , which is,
again, the evaluation of conditional expectations. The resulting transition density can be used in option pricing, MLE estimation for MSDEJs, and prediction, fil-
tering and smoothing problems for hidden Markov models, see [98].
B.7. Evaluating Conditional Expectations via a Measure Change
Consider the following equations
$$\mathbb{E}_t\big[\psi(X_\tau)\big] = \int_{\mathbb{R}^r} \Gamma(t, x; \tau, y)\,\psi(y)\,dy \qquad (44)$$
$$= \int_{\mathbb{R}^r} \Gamma^{0}(t_0, x; \tau, y)\,\frac{\Gamma(t, x; \tau, y)}{\Gamma^{0}(t_0, x; \tau, y)}\,\psi(y)\,dy. \qquad (45)$$
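A minimal numerical sketch of the reweighting in (44)-(45), in a setting where both transition densities are one-dimensional Gaussians known in closed form (an illustrative assumption): draws from the reference density $\Gamma^0$ are reweighted by the ratio $\Gamma/\Gamma^0$.

```python
# Minimal sketch of evaluating a conditional expectation by a change of measure:
# sample from a reference Gaussian density and reweight by the density ratio.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
mu, s = 0.05, 0.20            # "true" transition density Gamma: N(mu, s^2)
mu0, s0 = 0.00, 0.25          # reference density Gamma^0 used for sampling
psi = lambda y: np.maximum(np.exp(y) - 1.0, 0.0)

y = rng.normal(mu0, s0, size=500_000)                    # draws from Gamma^0
weights = norm.pdf(y, mu, s) / norm.pdf(y, mu0, s0)      # ratio Gamma / Gamma^0
print("measure-change estimate:", np.mean(weights * psi(y)))
print("direct-sampling check:  ", np.mean(psi(rng.normal(mu, s, size=500_000))))
```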
B.11. Solving CFBSDEJs
Consider a coupled FBSDEJ system, Equation (47), whose backward component includes the conditions
$$V_t = \int_E U_t(e)\,\nu(de), \qquad Y_T = \varphi(X_T),$$
where $\tilde N(dt, de) := N(dt, de) - \nu(de)\,dt$ is a compensated Poisson random measure. We take the following steps to solve Equation (47) numerically.
Time Discretization
Discretize the time interval $[t, T]$ into $n$ equal-length sub-intervals $\pi = \{[t_i, t_{i+1})\}_{i=0}^{n-1}$ with $h = t_{i+1} - t_i = \frac{T-t}{n}$, $t_0 = t$ and $t_n = T$. Consider the following Euler discretized equation:
$$\begin{aligned}
dX_{t_i} &= \mu\big(t_i, X_{t_i}, Y_{t_i}, Z_{t_i}, V_{t_i}\big)\,h + \sigma\big(t_i, X_{t_i}, Y_{t_i}, Z_{t_i}, V_{t_i}\big)\,dW_{t_i} + \int_E \gamma\big(t_i, X_{t_i}, Y_{t_i}, Z_{t_i}, V_{t_i}, e\big)\,N(dt_i, de), \qquad X_0 = x_0,\\
dY_{t_i} &= f\big(t_i, X_{t_i}, Y_{t_i}, Z_{t_i}, V_{t_i}\big)\,h + Z_{t_i}\,dW_{t_i} + \int_E U_{t_i}(e)\,\tilde N(dt_i, de),\\
V_{t_i} &= \int_E U_{t_i}(e)\,\nu(de), \qquad Y_T = \varphi(X_T).
\end{aligned} \qquad (48)$$
Under suitable regularity conditions, the discretized solution converges, i.e.,
$$\big\|(X, Y, Z, U) - (X^{\pi}, Y^{\pi}, Z^{\pi}, U^{\pi})\big\|_{[t,T]}^{2} \to 0 \qquad (49)$$
as $n \to \infty$.
Mollification
Define a sequence of functions $(\mu^{m}, \sigma^{m}, \gamma^{m}, f^{m}, \varphi^{m})$, which are bounded, have bounded derivatives of all orders and satisfy
$$\lim_{m\to\infty}\big(\mu^{m}, \sigma^{m}, \gamma^{m}, f^{m}, \varphi^{m}\big) = (\mu, \sigma, \gamma, f, \varphi) \qquad (50)$$
in a point-wise sense. Also denote the solution to the CFBSDEJ with coefficients $(\mu^{m}, \sigma^{m}, \gamma^{m}, f^{m}, \varphi^{m})$ as $(X^{m}, Y^{m}, Z^{m}, U^{m})$. Then, we have the following theorem.
Theorem 26. Under Assumption 25,
$$\mathbb{E}_t\big[g\big(X_u^{\pi,m}, Y_u^{\pi,m}, Z_u^{\pi,m}, V_u^{\pi,m}\big)\big] \to \mathbb{E}_t\big[g\big(X_u, Y_u, Z_u, V_u\big)\big] \qquad (51)$$
as $m, n \to \infty$.
Picard Iteration
For $k = 1$, define
$$\begin{aligned}
dX_{t_i}^{\pi,m,1} &= \mu^{m}\big(t_i, X_{t_i}^{\pi,m,1}, 0, 0, 0\big)\,h + \sigma^{m}\big(t_i, X_{t_i}^{\pi,m,1}, 0, 0, 0\big)\,dW_{t_i} + \int_E \gamma^{m}\big(t_i, X_{t_i}^{\pi,m,1}, 0, 0, 0, e\big)\,N(dt_i, de), \qquad X_0^{\pi,m,1} = x_0,\\
dY_{t_i}^{\pi,m,1} &= f^{m}\big(t_i, X_{t_i}^{\pi,m,1}, Y_{t_i}^{\pi,m,1}, Z_{t_i}^{\pi,m,1}, V_{t_i}^{\pi,m,1}\big)\,h + Z_{t_i}^{\pi,m,1}\,dW_{t_i} + \int_E U_{t_i}^{\pi,m,1}(e)\,\tilde N(dt_i, de),\\
V_{t_i}^{\pi,m,1} &= \int_E U_{t_i}^{\pi,m,1}(e)\,\nu(de), \qquad Y_T^{\pi,m,1} = \varphi\big(X_T^{\pi,m,1}\big).
\end{aligned} \qquad (52)$$
For $k \ge 2$, define
$$\begin{aligned}
dX_{t_i}^{\pi,m,k} &= \mu^{m}\big(t_i, X_{t_i}^{\pi,m,k}, Y_{t_i}^{\pi,m,k-1}, Z_{t_i}^{\pi,m,k-1}, V_{t_i}^{\pi,m,k-1}\big)\,h + \sigma^{m}\big(t_i, X_{t_i}^{\pi,m,k}, Y_{t_i}^{\pi,m,k-1}, Z_{t_i}^{\pi,m,k-1}, V_{t_i}^{\pi,m,k-1}\big)\,dW_{t_i}\\
&\quad + \int_E \gamma^{m}\big(t_i, X_{t_i}^{\pi,m,k}, Y_{t_i}^{\pi,m,k-1}, Z_{t_i}^{\pi,m,k-1}, V_{t_i}^{\pi,m,k-1}, e\big)\,N(dt_i, de), \qquad X_0^{\pi,m,k} = x_0,\\
dY_{t_i}^{\pi,m,k} &= f^{m}\big(t_i, X_{t_i}^{\pi,m,k}, Y_{t_i}^{\pi,m,k}, Z_{t_i}^{\pi,m,k}, V_{t_i}^{\pi,m,k}\big)\,h + Z_{t_i}^{\pi,m,k}\,dW_{t_i} + \int_E U_{t_i}^{\pi,m,k}(e)\,\tilde N(dt_i, de),\\
V_{t_i}^{\pi,m,k} &= \int_E U_{t_i}^{\pi,m,k}(e)\,\nu(de), \qquad Y_T^{\pi,m,k} = \varphi\big(X_T^{\pi,m,k}\big).
\end{aligned} \qquad (53)$$
Evaluation of Conditional Expectations
For the equation system (53), we can start from the last time interval and work backwards. The problem is transformed into the evaluation of $\mathbb{E}_{t_i}\big[u\big(t_{i+1}, X_{t_{i+1}}^{\pi,m,k}\big)\big]$, where $u$ is the intermediate solution and satisfies $u(T, \cdot) = \varphi(\cdot)$.
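A minimal sketch of this backward evaluation on simulated paths, assuming a one-dimensional geometric Brownian forward process and a zero driver (both illustrative assumptions): each backward step computes $\mathbb{E}_{t_i}[u(t_{i+1}, X_{t_{i+1}})]$ by least-squares regression on a polynomial basis of $X_{t_i}$, which is exactly the role played by Methods 1-3 in the general algorithm.

```python
# Minimal sketch of the backward evaluation of conditional expectations on a time grid.
import numpy as np

rng = np.random.default_rng(4)
r, sigma, x0, K, T = 0.03, 0.2, 100.0, 100.0, 1.0
M, n = 100_000, 50
h = T / n

# Forward Euler simulation of the paths X_{t_i}.
X = np.empty((n + 1, M))
X[0] = x0
for i in range(n):
    X[i + 1] = X[i] + r * X[i] * h + sigma * X[i] * np.sqrt(h) * rng.standard_normal(M)

# Backward induction: terminal condition, then repeated conditional expectations.
Y = np.maximum(X[-1] - K, 0.0)                       # u(T, .) = phi(.)
for i in range(n - 1, 0, -1):
    basis = np.vander(X[i] / x0, N=5, increasing=True)
    coef, *_ = np.linalg.lstsq(basis, Y, rcond=None)
    Y = basis @ coef                                 # E_{t_i}[ u(t_{i+1}, X_{t_{i+1}}) ]
    # with a nonzero driver f, one would add f(t_i, X[i], Y, ...) * h at this step

print("time-0 estimate of E[phi(X_T)]:", Y.mean())
```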
B.12. Pricing Kernel Approximation
A pricing kernel $\eta_t$ is an $L^2(\mathcal{F}_t)$ stochastic process, adapted to the information filtration $\{\mathcal{F}_t\}_{0\le t\le T}$, such that
$$V_t = \mathbb{E}_t\big[D_{t,T}\,\eta_{t,T}\,V_T\big], \qquad (54)$$
where $V_T$ is an $\mathcal{F}_T$-measurable payoff, $D_{t,T} = \frac{D_T}{D_t} = e^{-\int_t^T r_v\,dv}$ and $\eta_{t,T} = \frac{\eta_T}{\eta_t}$. It is obvious that $\eta_t = \mathbb{E}_t[\eta_T]$, i.e., $\eta$ is a $\mathbb{P}$-martingale. Represent
$$D_T\,\eta_T = \sum_{j=0}^{\infty} a_j\, e_T^{j}(\theta_j)$$
and the payoff of the $k$-th derivative as $V_T^{k} = \sum_{j=0}^{\infty} b_j^{k}\, e_T^{j}(\theta_j)$; then
$$V_{t_0}^{k} = \mathbb{E}_{t_0}\Bigg[\sum_{j=0}^{\infty} a_j\, e_T^{j}(\theta_j)\,\sum_{j=0}^{\infty} b_j^{k}\, e_T^{j}(\theta_j)\Bigg] = \sum_{j=0}^{\infty} a_j\, b_j^{k}. \qquad (55)$$
In practice, the expansions are truncated for numerical computations. The basis can also be represented by ANNs.
Remark 28. For a specific representation via universal approximation theo-
rem, see [55].
Remark 29. It is possible to allow shape constraints in the estimation (55) and
formulate a constrained optimization problem, see [105], for example.
We can also directly utilize the method proposed in Section 2, combined with time discretization and Monte Carlo simulation. Denote by $M$ the number of sample paths and by $\{V_T^{m,k}\}_{m=1,k=1}^{M,K}$ the $M$ simulated final payoffs for each of the $K$ derivatives. Define $\{a_m\}_{m=1}^{M}$ as $M$ real numbers and let $\{V_0^{k}\}_{k=1}^{K}$ be the $K$ derivative prices at time $0$. Set
$$\{a_m\}_{m=1}^{M} = \arg\min_{\{\varphi_m\}_{m=1}^{M}} \sum_{k=1}^{K}\Bigg(V_0^{k} - \frac{1}{M}\sum_{m=1}^{M}\varphi_m\, V_T^{m,k}\Bigg)^{2}, \qquad (56)$$
where $\{X_T^{m}\}_{m=1}^{M}$ is a set of simulated state variables at time $T$. When fitting $g$, we can add some shape or no-arbitrage constraints, or other regularization conditions, to the optimization problem and formulate a constrained ANN (ACNN). We always assume that the matrix $\{V_T^{m,k}\}_{m=1,k=1}^{M,K}\,\big(\{V_T^{m,k}\}_{m=1,k=1}^{M,K}\big)^{\top}$ is of full rank.
is a