Financial Risk Management - Part 2
Agenda
Random variate generation: uniform random numbers, non-uniform random numbers, random vectors, random matrices
Simulation of stochastic processes
Monte Carlo methods

Uniform random numbers
The linear congruential generator (LCG) is defined by the recurrence:
$$x_n = (a \cdot x_{n-1} + c) \bmod m$$
$$u_n = x_n / m$$
where:
$a$ is the multiplicative constant
$c$ is the additive constant
$m$ is the modulus (or the order of the congruence)
The initial value $x_0$ is called the seed
$\{x_1, x_2, \ldots, x_n\}$ is a sequence of pseudorandom integers with $0 \le x_n < m$
$\{u_1, u_2, \ldots, u_n\}$ is a sequence of uniform random variates
The maximum period is $m$
Example #1
If we consider that $a = 3$, $c = 0$, $m = 11$ and $x_0 = 1$, we obtain the following sequence:
$$\{1, 3, 9, 5, 4, 1, 3, 9, 5, 4, 1, 3, 9, 5, 4, \ldots\}$$
The period length is only five, meaning that only five uniform random variates can be generated: 0.09091, 0.27273, 0.81818, 0.45455 and 0.36364
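A minimal Python sketch of this generator, with the parameters of Example #1:

```python
def lcg(a, c, m, seed, n):
    """Generate n uniform variates with a linear congruential generator."""
    x = seed
    u = []
    for _ in range(n):
        x = (a * x + c) % m
        u.append(x / m)
    return u

# Example #1: a = 3, c = 0, m = 11, x0 = 1 -> period of length 5
print(lcg(3, 0, 11, 1, 10))
# [0.2727..., 0.8181..., 0.4545..., 0.3636..., 0.0909..., 0.2727..., ...]
```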
The following table reports the first pseudorandom numbers generated with $a = 16807$, $c = 0$ and $m = 2^{31} - 1$ for two seeds, $x_0 = 1$ and $x_0 = 123456$:

| $n$ | $x_n$ | $u_n$ | $x_n$ | $u_n$ |
|---|---|---|---|---|
| 0 | 1 | 0.000000 | 123 456 | 0.000057 |
| 1 | 16 807 | 0.000008 | 2 074 924 992 | 0.966212 |
| 2 | 282 475 249 | 0.131538 | 277 396 911 | 0.129173 |
| 3 | 1 622 650 073 | 0.755605 | 22 885 540 | 0.010657 |
| 4 | 984 943 658 | 0.458650 | 237 697 967 | 0.110687 |
| 5 | 1 144 108 930 | 0.532767 | 670 147 949 | 0.312062 |
| 6 | 470 211 272 | 0.218959 | 1 772 333 975 | 0.825307 |
| 7 | 101 027 544 | 0.047045 | 2 018 933 935 | 0.940139 |
| 8 | 1 457 850 878 | 0.678865 | 1 981 022 945 | 0.922486 |
| 9 | 1 458 777 923 | 0.679296 | 466 173 527 | 0.217079 |
| 10 | 2 007 237 709 | 0.934693 | 958 124 033 | 0.446161 |
For multiple recursive generators, we have:
$$x_n = \left( \sum_{i=1}^{k} a_i \cdot x_{n-i} + c \right) \bmod m$$
For a combined generator based on two sequences $x_n$ and $y_n$ with moduli $m_1$ and $m_2$, the uniform variate is:
$$u_n = \frac{x_n - y_n + \mathbb{1}\{x_n \le y_n\} \cdot m_1}{m_1 + 1}$$
Method of inversion
Continuous random variables
Let $Y = F(X)$. The distribution of $Y$ is:
$$G(y) = \Pr\{Y \le y\} = \Pr\{F(X) \le y\} = \Pr\left\{X \le F^{-1}(y)\right\} = F\left(F^{-1}(y)\right) = y$$
Hence $F(X) \sim U_{[0,1]}$, and $X = F^{-1}(U)$ with $U \sim U_{[0,1]}$: the method of inversion simulates $X$ by applying the quantile function $F^{-1}$ to uniform random variates
Example #2
If we consider the generalized uniform distribution $U_{[a,b]}$, we have $F(x) = (x-a)/(b-a)$ and $F^{-1}(u) = a + (b-a)u$. The simulation of random variates $x_i$ is deduced from the uniform random variates $u_i$ by using the following transform:
$$x_i \leftarrow a + (b-a)\,u_i$$
Example #3
In the case of the exponential distribution $\mathcal{E}(\lambda)$, we have $F(x) = 1 - \exp(-\lambda x)$. We deduce that:
$$x_i \leftarrow -\frac{\ln(1-u_i)}{\lambda}$$
Since $1-U$ is also a uniformly distributed random variable, we have:
$$x_i \leftarrow -\frac{\ln u_i}{\lambda}$$
Example #4
In the case of the Pareto distribution $\mathcal{P}(\alpha, x_-)$, we have $F(x) = 1 - (x/x_-)^{-\alpha}$ and $F^{-1}(u) = x_-(1-u)^{-1/\alpha}$. We deduce that:
$$x_i \leftarrow \frac{x_-}{(1-u_i)^{1/\alpha}}$$
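A short sketch of the inversion method for Examples #3 and #4:

```python
import numpy as np

rng = np.random.default_rng(42)

def rexp(lam, n):
    """Exponential variates by inversion: x = -ln(u) / lambda."""
    return -np.log(rng.uniform(size=n)) / lam

def rpareto(alpha, xm, n):
    """Pareto variates by inversion: x = x_- / (1 - u)^(1/alpha)."""
    return xm / (1.0 - rng.uniform(size=n)) ** (1.0 / alpha)

print(rexp(0.5, 5))
print(rpareto(3.0, 1.0, 5))
```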
When $F^{-1}$ has no analytical expression, we can solve $F(x_i) = u_i$ numerically, for example with the Newton-Raphson iteration:
$$x_i^{m+1} = x_i^m + \frac{u_i - F(x_i^m)}{f(x_i^m)}$$
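A minimal sketch of this Newton-Raphson inversion, here applied to the standard Gaussian (the starting value 0 and the tolerance are assumptions for illustration):

```python
from scipy.stats import norm

def newton_inverse(u, cdf=norm.cdf, pdf=norm.pdf, x0=0.0, tol=1e-10):
    """Solve F(x) = u by Newton-Raphson iterations."""
    x = x0
    for _ in range(100):
        step = (u - cdf(x)) / pdf(x)
        x += step
        if abs(step) < tol:
            break
    return x

print(newton_inverse(0.975))  # close to 1.959964 = norm.ppf(0.975)
```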
Method of inversion
Discrete random variables
We assume that:

| $x_i$ | 1 | 2 | 4 | 6 | 7 | 9 | 10 |
|---|---|---|---|---|---|---|---|
| $p_i$ | 10% | 20% | 10% | 5% | 20% | 30% | 5% |
| $F(x_i)$ | 10% | 30% | 40% | 45% | 65% | 95% | 100% |
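A sketch of discrete inversion for this example: we return the first $x_i$ whose cumulated probability reaches the uniform variate.

```python
import numpy as np

x = np.array([1, 2, 4, 6, 7, 9, 10])
F = np.array([0.10, 0.30, 0.40, 0.45, 0.65, 0.95, 1.00])

def rdiscrete(n, rng=np.random.default_rng(0)):
    """Discrete inversion: x_i such that F(x_{i-1}) < u <= F(x_i)."""
    u = rng.uniform(size=n)
    return x[np.searchsorted(F, u)]

print(rdiscrete(10))
```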
Example #5
If we apply the method of inversion to the Bernoulli distribution $\mathcal{B}(p)$, we have:
$$x \leftarrow \begin{cases} 0 & \text{if } 0 \le u \le 1-p \\ 1 & \text{if } 1-p < u \le 1 \end{cases}$$
or:
$$x \leftarrow \begin{cases} 1 & \text{if } u \le p \\ 0 & \text{if } u > p \end{cases}$$
Method of inversion
Piecewise distribution functions
We know that $S(\tau) \sim U$
It follows that:
$$t_i \leftarrow t_{m-1}^\star + \frac{1}{\lambda_m} \ln \frac{S\left(t_{m-1}^\star\right)}{u_i} \qquad \text{if } S\left(t_m^\star\right) < u_i \le S\left(t_{m-1}^\star\right)$$
Example #6
We model the default time $\tau$ with the piecewise exponential model and the following parameters:
$$\lambda = \begin{cases} 5\% & \text{if } t \le 1 \text{ year} \\ 8\% & \text{if } 1 < t \le 5 \text{ years} \\ 12\% & \text{if } t > 5 \text{ years} \end{cases}$$
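A sketch of the corresponding simulation (a hypothetical helper: knots at 1 and 5 years, survival values $S(t_m^\star)$ obtained by chaining the exponential pieces):

```python
import numpy as np

knots = np.array([1.0, 5.0])            # t*_1, t*_2
lambdas = np.array([0.05, 0.08, 0.12])  # hazard on each piece

# survival function at the knots: S(t*_m)
S_knots = np.exp(-np.cumsum(lambdas[:-1] * np.diff(np.r_[0.0, knots])))

def rdefault(n, rng=np.random.default_rng(1)):
    """Simulate default times tau by inversion of S(tau) = u."""
    u = rng.uniform(size=n)
    # locate the piece m such that S(t*_m) < u <= S(t*_{m-1})
    m = np.searchsorted(-S_knots, -u)
    t_prev = np.r_[0.0, knots][m]
    S_prev = np.r_[1.0, S_knots][m]
    return t_prev + np.log(S_prev / u) / lambdas[m]

print(rdefault(5))
```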
Method of transformation
Box-Muller algorithm
If $U_1$ and $U_2$ are two independent uniform random variables, then $X_1$ and $X_2$ defined by:
$$X_1 = \sqrt{-2\ln U_1} \cdot \cos(2\pi U_2)$$
$$X_2 = \sqrt{-2\ln U_1} \cdot \sin(2\pi U_2)$$
are independent and follow the Gaussian distribution $\mathcal{N}(0,1)$
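A direct Python translation of the Box-Muller algorithm:

```python
import numpy as np

def box_muller(n, rng=np.random.default_rng(2)):
    """Two independent N(0,1) samples from two independent uniforms."""
    u1 = rng.uniform(size=n)
    u2 = rng.uniform(size=n)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)

x1, x2 = box_muller(100000)
print(x1.mean(), x1.std(), np.corrcoef(x1, x2)[0, 1])  # ~0, ~1, ~0
```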
Method of transformation
If $N_t$ is a Poisson process with intensity $\lambda$, the duration $T$ between two consecutive events is exponential:
$$\Pr(T \le t) = 1 - e^{-\lambda t}$$
Because the Poisson random variable is the number of events that occur in the unit interval of time, we also have:
$$X = \max\{n : T_1 + T_2 + \ldots + T_n \le 1\} = \max\left\{ n : \sum_{i=1}^{n} E_i \le 1 \right\}$$
where $E_i \sim \mathcal{E}(\lambda)$ are iid random variables
Method of transformation
We notice that:
$$\sum_{i=1}^{n} E_i = -\frac{1}{\lambda}\sum_{i=1}^{n} \ln U_i = -\frac{1}{\lambda}\ln \prod_{i=1}^{n} U_i$$
Method of transformation
We can then simulate the Poisson random variable with the following algorithm:
1. set $n = 0$ and $p = 1$;
2. calculate $n = n + 1$ and $p = p \cdot u_i$ where $u_i$ is a uniform random variate;
3. if $p \ge e^{-\lambda}$, go back to step 2; otherwise, return $X = n - 1$
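This algorithm in Python:

```python
import numpy as np

def rpoisson(lam, rng=np.random.default_rng(3)):
    """Poisson variate via products of uniforms (Knuth's algorithm)."""
    n, p = 0, 1.0
    threshold = np.exp(-lam)
    while p >= threshold:
        n += 1
        p *= rng.uniform()
    return n - 1

print([rpoisson(4.0) for _ in range(10)])
```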
Rejection sampling
Theorem
Let $F(x)$ and $G(x)$ be two distribution functions with densities $f(x)$ and $g(x)$ such that $f(x) \le c\,g(x)$ for all $x$, with $c > 1$
We note $X \sim G$ and consider an independent uniform random variable $U \sim U_{[0,1]}$
Then, the conditional distribution function of $X$ given that $U \le f(X)/(c\,g(X))$ is $F(x)$
Rejection sampling
Proof
Let us introduce the random variables $B$ and $Z$:
$$B = \mathbb{1}\left\{ U \le \frac{f(X)}{c\,g(X)} \right\} \qquad \text{and} \qquad Z = X \,\Big|\, U \le \frac{f(X)}{c\,g(X)}$$
We have:
$$\Pr\{B = 1\} = \Pr\left\{ U \le \frac{f(X)}{c\,g(X)} \right\} = \mathbb{E}\left[\frac{f(X)}{c\,g(X)}\right] = \int_{-\infty}^{+\infty} \frac{f(x)}{c\,g(x)}\, g(x)\, dx = \frac{1}{c}\int_{-\infty}^{+\infty} f(x)\, dx = \frac{1}{c}$$
Rejection sampling
Proof
The distribution function of $Z$ is defined by:
$$\Pr\{Z \le x\} = \Pr\left\{ X \le x \,\Big|\, U \le \frac{f(X)}{c\,g(X)} \right\}$$
We deduce that:
$$\Pr\{Z \le x\} = \frac{\Pr\left\{ X \le x,\, U \le \frac{f(X)}{c\,g(X)} \right\}}{\Pr\left\{ U \le \frac{f(X)}{c\,g(X)} \right\}} = c \int_{-\infty}^{x} \int_{0}^{f(x)/(c\,g(x))} g(x)\, du\, dx = c \int_{-\infty}^{x} \frac{f(x)}{c\,g(x)}\, g(x)\, dx = \int_{-\infty}^{x} f(x)\, dx = F(x)$$
This proves that $Z \sim F$
Rejection sampling
Acceptance-rejection algorithm
1. generate two independent random variates $x$ and $u$ from $G$ and $U_{[0,1]}$;
2. calculate $v$ as follows:
$$v = \frac{f(x)}{c\,g(x)}$$
3. if $u \le v$, return $x$ ('accept'); otherwise, go back to step 1 ('reject')
Remark
The underlying idea of this algorithm is to simulate the distribution function F by assuming that it is easier to generate random numbers from G, which is called the proposal distribution. However, some of these random numbers must be 'rejected', because the function $c \cdot g(x)$ 'dominates' the density function $f(x)$
Rejection sampling
The number of trials needed to accept a draw is geometric with success probability $p = 1/c$, so the average number of iterations is $\mathbb{E}[N] = 1/p = c$
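A sketch of the acceptance-rejection algorithm for the example reported in the table below: the target is $\mathcal{N}(0,1)$, the proposal is the standard Cauchy distribution (simulated by inversion), and $c = \sqrt{2\pi/e} \approx 1.5203$ (assumptions inferred from the reported figures):

```python
import numpy as np

C = np.sqrt(2 * np.pi / np.e)  # max of phi(x)/g(x), attained at x = +/-1

def phi(x):
    return np.exp(-0.5 * x * x) / np.sqrt(2 * np.pi)

def cauchy_pdf(x):
    return 1.0 / (np.pi * (1.0 + x * x))

def rnorm_rejection(n, rng=np.random.default_rng(4)):
    """Sample N(0,1) by rejection from a Cauchy proposal."""
    out = []
    while len(out) < n:
        u1, u2 = rng.uniform(), rng.uniform()
        x = np.tan(np.pi * (u1 - 0.5))     # Cauchy variate by inversion
        v = phi(x) / (C * cauchy_pdf(x))   # acceptance ratio
        if u2 <= v:
            out.append(x)
    return np.array(out)

print(rnorm_rejection(5))
```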
Rejection sampling
| $u_1$ | $u_2$ | $x$ | $v$ | test | $z$ |
|---|---|---|---|---|---|
| 0.9662 | 0.1291 | 9.3820 | 0.0000 | reject | |
| 0.0106 | 0.1106 | −30.0181 | 0.0000 | reject | |
| 0.3120 | 0.8253 | −0.6705 | 0.9544 | accept | −0.6705 |
| 0.9401 | 0.9224 | 5.2511 | 0.0000 | reject | |
| 0.2170 | 0.4461 | −1.2323 | 0.9717 | accept | −1.2323 |
| 0.6324 | 0.0676 | 0.4417 | 0.8936 | accept | 0.4417 |
| 0.6577 | 0.1344 | 0.5404 | 0.9204 | accept | 0.5404 |
| 0.1596 | 0.6670 | −1.8244 | 0.6756 | accept | −1.8244 |
| 0.4183 | 0.3872 | −0.2625 | 0.8513 | accept | −0.2625 |
| 0.9625 | 0.0752 | 8.4490 | 0.0000 | reject | |
Method of mixtures
A finite mixture can be decomposed as a weighted sum of distribution functions:
$$F(x) = \sum_{k=1}^{n} \pi_k \cdot G_k(x)$$
where $\pi_k \ge 0$ and $\sum_{k=1}^{n} \pi_k = 1$
The probability density function is:
$$f(x) = \sum_{k=1}^{n} \pi_k \cdot g_k(x)$$
To simulate the probability distribution $F$, we introduce the random variable $B$, whose probability mass function is defined by:
$$p(k) = \Pr\{B = k\} = \pi_k$$
It follows that:
$$F(x) = \sum_{k=1}^{n} \Pr\{B = k\} \cdot G_k(x)$$
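A sketch of the mixture algorithm: draw the component index $B$ with probabilities $\pi_k$, then draw from the selected component (a hypothetical two-component Gaussian mixture for illustration):

```python
import numpy as np

def rmixture(n, pi, mu, sigma, rng=np.random.default_rng(5)):
    """Simulate a Gaussian mixture: draw component B, then X | B."""
    b = rng.choice(len(pi), size=n, p=pi)   # component index
    return rng.normal(mu[b], sigma[b])      # draw from component b

x = rmixture(100000, pi=[0.7, 0.3], mu=np.array([0.0, 3.0]),
             sigma=np.array([1.0, 0.5]))
print(x.mean())  # ~ 0.7*0 + 0.3*3 = 0.9
```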
Random vectors
The method of conditional distributions simulates the components sequentially:
$$x_i \leftarrow F^{-1}_{i|1,\ldots,i-1}(u_i \mid x_1, \ldots, x_{i-1})$$
Example #7
We consider the bivariate logistic distribution defined as:
$$F(x_1, x_2) = \left(1 + e^{-x_1} + e^{-x_2}\right)^{-1}$$
We have $F_1(x_1) = F(x_1, +\infty) = (1 + e^{-x_1})^{-1}$. We deduce that the conditional distribution of $X_2$ given $X_1 = x_1$ is:
$$F_{2|1}(x_2 \mid x_1) = \frac{F(x_1, x_2)}{F_1(x_1)} = \frac{1 + e^{-x_1}}{1 + e^{-x_1} + e^{-x_2}}$$
We obtain:
$$F_1^{-1}(u) = \ln u - \ln(1-u)$$
and:
$$F_{2|1}^{-1}(u \mid x_1) = \ln u - \ln(1-u) - \ln\left(1 + e^{-x_1}\right)$$
The simulation algorithm first sets:
$$x_1 \leftarrow \ln u_1 - \ln(1 - u_1)$$
then applies the conditional quantile to $u_2$, sets $i = i + 1$ and repeats the conditional inversion step until $i = n$ in the general case.
For some copula functions, there exists an analytical expression of the inverse of the conditional copula. In this case, the third step is replaced by:
3. generate $u_i$ by the inversion method:
$$u_i \leftarrow C^{-1}_{i|1,\ldots,i-1}(v_i \mid u_1, \ldots, u_{i-1})$$
In particular, we have:
$$\partial_1 F(x_1, x_2) = f_1(x_1) \cdot F_{2|1}(x_2 \mid x_1)$$
We can generalize this result and show that the conditional copula given some random variables $U_i$ for $i \in \Omega$ is equal to the cross-derivative of the copula function $C$ with respect to the arguments $u_i$ for $i \in \Omega$
Example #8
We consider the Clayton copula:
$$C(u_1, u_2) = \left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-1/\theta}$$
Its generator is $\varphi(u) = u^{-\theta} - 1$
We deduce that:
$$\varphi^{-1}(u) = (1+u)^{-1/\theta}$$
$$\varphi'(u) = -\theta u^{-(\theta+1)}$$
$$\varphi'^{-1}(u) = (-u/\theta)^{-1/(\theta+1)}$$
We obtain:
$$C^{-1}_{2|1}(v \mid u_1) = \left( 1 + u_1^{-\theta}\left(v^{-\theta/(\theta+1)} - 1\right) \right)^{-1/\theta}$$
Method of transformation
In the multivariate Gaussian case $\mathcal{N}(\mu, \Sigma)$, we simulate:
$$X = \mu + A \cdot N$$
where $N \sim \mathcal{N}_n(0, I_n)$ and $A$ is the Cholesky decomposition of $\Sigma$
For the multivariate Student's $t$ distribution with $\nu$ degrees of freedom, we set:
$$Y_i = \frac{X_i}{\sqrt{Z/\nu}}$$
where $Z \sim \chi^2(\nu)$ is independent of $X$
Method of transformation
For the copula approach, we set $U_i = F_i(X_i)$; in the Gaussian copula case:
$$U = \Phi(P \cdot N)$$
where $P$ is the Cholesky decomposition of the correlation matrix $\rho$ and $\Phi$ is applied componentwise
Method of transformation
The Clayton copula is a frailty copula where $\psi(x) = (1+x)^{-1/\theta}$ is the Laplace transform of the gamma random variable $\mathcal{G}(1/\theta, 1)$
The algorithm to simulate the Clayton copula is:
$$x \leftarrow \mathcal{G}(1/\theta, 1)$$
$$(u_1, \ldots, u_n) \leftarrow \left( \left(1 - \frac{\ln u_1}{x}\right)^{-1/\theta}, \ldots, \left(1 - \frac{\ln u_n}{x}\right)^{-1/\theta} \right)$$
where the $u_i$ on the right-hand side are independent uniform variates
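A sketch of this frailty algorithm in Python (gamma frailty $\mathcal{G}(1/\theta, 1)$, then the Laplace-transform map):

```python
import numpy as np

def rclayton(n_sims, n_dim, theta, rng=np.random.default_rng(6)):
    """Simulate the Clayton copula by the frailty (gamma) method."""
    x = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n_sims, 1))  # frailty
    v = rng.uniform(size=(n_sims, n_dim))                          # iid uniforms
    return (1.0 - np.log(v) / x) ** (-1.0 / theta)                 # psi(-ln v / x)

u = rclayton(5, 2, theta=3.0)
print(u)  # each row is a draw (u1, u2) with Clayton dependence
```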
Method of transformation
Remark
The previous algorithms suppose that we know the analytical expression $F_i$ of the univariate probability distributions in order to calculate the quantile $F_i^{-1}$. This is not always the case. For instance, in operational risk, the loss of the bank is equal to the sum of aggregate losses:
$$L = \sum_{k=1}^{K} S_k$$
where $S_k$ is the sum of individual losses for the $k$-th cell of the mapping matrix. In practice, the probability distribution of $S_k$ is estimated by the method of simulations
Method of transformation
In this case, we can invert the empirical distribution obtained from $m_1$ simulated values $x_{i,m}$:
$$x_i \leftarrow \inf\left\{ x : \frac{1}{m_1}\sum_{m=1}^{m_1} \mathbb{1}\{x_{i,m} \le x\} \ge u_i \right\}$$
Method of transformation
For example, we can simulate the random vector $(X_1, X_2)$ such that:
$X_1 \sim \mathcal{N}(0,1)$
$X_2 \sim \mathcal{N}(0,1)$
The dependence function of $(X_1, X_2)$ is the Clayton copula with parameter $\theta = 3$
Simulation of stochastic processes
Brownian motion
A Brownian motion (or a Wiener process) is a stochastic process $W(t)$, whose increments are stationary and independent:
$$W(t) - W(s) \sim \mathcal{N}(0, t-s)$$
We have $W(0) = 0$ and:
$$W(t) = W(s) + \varepsilon(s,t)$$
where $\varepsilon(s,t) \sim \mathcal{N}(0, t-s)$ are iid random variables
To simulate $W(t)$ at different dates $t_1, t_2, \ldots$, we have:
$$W_{m+1} = W_m + \sqrt{t_{m+1} - t_m} \cdot \varepsilon_m$$
where $W_m$ is the numerical realization of $W(t_m)$ and $\varepsilon_m \sim \mathcal{N}(0,1)$ are iid random variables
In the case of fixed-interval times $t_{m+1} - t_m = h$, we obtain the recursion:
$$W_{m+1} = W_m + \sqrt{h} \cdot \varepsilon_m$$
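A sketch of this fixed-interval scheme:

```python
import numpy as np

def brownian_path(T, n_steps, rng=np.random.default_rng(7)):
    """Simulate W(t) on [0, T] with W_{m+1} = W_m + sqrt(h) * eps_m."""
    h = T / n_steps
    eps = rng.standard_normal(n_steps)
    return np.r_[0.0, np.cumsum(np.sqrt(h) * eps)]  # W(0) = 0

W = brownian_path(T=1.0, n_steps=252)
print(W[-1])  # terminal value, distributed as N(0, 1)
```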
Ornstein-Uhlenbeck process
The stochastic differential equation of the Ornstein-Uhlenbeck process is:
$$dX(t) = a(b - X(t))\,dt + \sigma\,dW(t), \qquad X(0) = x_0$$
We also have:
$$X(t) = X(s)\,e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + \sigma \int_s^t e^{a(\theta - t)}\,dW(\theta)$$
where:
$$\int_s^t e^{a(\theta - t)}\,dW(\theta) \sim \mathcal{N}\left(0, \frac{1 - e^{-2a(t-s)}}{2a}\right)$$
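The exact simulation scheme follows directly (a sketch; $h$ is a fixed time step):

```python
import numpy as np

def ou_path(a, b, sigma, x0, h, n_steps, rng=np.random.default_rng(8)):
    """Exact simulation of the Ornstein-Uhlenbeck process on a fixed grid."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    decay = np.exp(-a * h)
    std = sigma * np.sqrt((1.0 - np.exp(-2.0 * a * h)) / (2.0 * a))
    for m in range(n_steps):
        x[m + 1] = x[m] * decay + b * (1.0 - decay) + std * rng.standard_normal()
    return x

print(ou_path(a=2.0, b=0.05, sigma=0.1, x0=0.02, h=1/252, n_steps=252)[-1])
```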
For the geometric Brownian motion, it follows that:
$$\ln X_{m+1} = \ln X_m + \left(\mu - \frac{1}{2}\sigma^2\right) h + \sigma\sqrt{h}\cdot\varepsilon_m$$
Poisson process
Let $t_m$ be the time when the $m$-th event occurs. The numerical algorithm is then:
1. we set $t_0 = 0$ and $N(t_0) = 0$
2. we generate a uniform random variate $u$ and calculate the random variate $e \sim \mathcal{E}(\lambda)$ with the formula:
$$e = -\frac{\ln u}{\lambda}$$
3. we update the Poisson process with $t_m = t_{m-1} + e$ and $N(t_m) = N(t_{m-1}) + 1$
4. we go back to step 2
We have:
$$\mathbb{E}[W_i(t)\, W_j(s)] = \min(t, s) \cdot \rho_{i,j}$$
where $\rho_{i,j}$ is the correlation between the two Brownian motions $W_i$ and $W_j$
We deduce that $W(0) = 0$ and:
$$W(t) = W(s) + \varepsilon(s,t)$$
so that the simulation scheme uses $\varepsilon_m \sim \mathcal{N}_p(0, \rho)$
In the case of a diagonal system, we retrieve the one-dimensional scheme:
$$X_{j,m+1} = X_{j,m} + \mu_j(t_m, X_{j,m})\cdot(t_{m+1} - t_m) + \sigma_{j,j}(t_m, X_{j,m})\cdot\sqrt{t_{m+1} - t_m}\,\varepsilon_{j,m}$$
and:
$$v_{m+1} = v_m + a(b - v_m)\,h + \sigma\sqrt{v_m h}\cdot\varepsilon_{2,m}$$
Here, $\varepsilon_{1,m}$ and $\varepsilon_{2,m}$ are two standard Gaussian random variables with correlation $\rho$
and:
$$I_{(k,k')} = \int_{t_m}^{t_{m+1}} \int_{t_m}^{s} dW_k(t)\, dW_{k'}(s)$$
where:
$$I_{(j,j)} = \int_{t_m}^{t_{m+1}} \int_{t_m}^{s} dW_j(t)\, dW_j(s) = \int_{t_m}^{t_{m+1}} \left(W_j(s) - W_j(t_m)\right) dW_j(s) = \frac{1}{2}\left((\Delta W_{j,m})^2 - (t_{m+1} - t_m)\right)$$
Remark
The multidimensional Milstein scheme is generally not used, because the terms $L^{(k)}\sigma_{j,k'}(t_m, X_m)\, I_{(k,k')}$ are complicated to simulate. For the Heston model, we obtain a very simple scheme, because we only apply the Milstein scheme to the process $v(t)$ and not to the vector process $(\ln X(t), v(t))$
where:
$$A_m = \sum_{k=1}^{2}\sum_{k'=1}^{2}\sum_{k''=1}^{2} \frac{\partial\,\sigma_{1,k'}(t_m, X_m)}{\partial\, x_{k''}}\,\sigma_{k'',k}(t_m, X_m)\, I_{(k,k')} = \sigma\sqrt{v(t)}\cdot\frac{1}{2\sqrt{v(t)}}\cdot I_{(2,1)} = \frac{\sigma}{2}\cdot I_{(2,1)}$$
and:
$$v_{m+1} = v_m + a(b - v_m)\,h + \sigma\sqrt{v_m h}\cdot\varepsilon_{2,m} + \frac{1}{4}\sigma^2 h\left(\varepsilon_{2,m}^2 - 1\right)$$
where $B_m$ is a correction term defined by:
$$B_m = \sqrt{1 - \rho^2} \int_{t_m}^{t_{m+1}} \left(W^\star(s) - W^\star(t_m)\right) dW_1(s)$$
A basic example
Suppose we have a circle with radius $r$ and a $2r \times 2r$ square with the same center. Since the area of the circle is equal to $\pi r^2$, the numerical calculation of $\pi$ is equivalent to computing the area of the circle with $r = 1$
In this case, the area of the square is 4, and we have:
$$\pi = 4\,\frac{\mathcal{A}(\text{circle})}{\mathcal{A}(\text{square})}$$
To determine $\pi$, we simulate $n_S$ random vectors $(u_s, v_s)$ of uniform random variables $U_{[-1,1]}$ and we obtain:
$$\pi = \lim_{n_S \to \infty} 4\,\frac{n_c}{n_S}$$
where $n_c$ is the number of points falling inside the circle, i.e. such that $u_s^2 + v_s^2 \le 1$
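A sketch of this estimator:

```python
import numpy as np

def mc_pi(n_sims, rng=np.random.default_rng(9)):
    """Estimate pi by the proportion of uniform points inside the unit circle."""
    u = rng.uniform(-1.0, 1.0, size=(n_sims, 2))
    n_c = np.sum(u[:, 0] ** 2 + u[:, 1] ** 2 <= 1.0)
    return 4.0 * n_c / n_sims

print(mc_pi(1_000_000))  # ~ 3.1416
```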
Theoretical framework
We consider the multiple integral:
$$I = \int \cdots \int_{\Omega} \varphi(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
Let $\hat{I}_{n_S}$ be the random variable defined by:
$$\hat{I}_{n_S} = \frac{1}{n_S} \sum_{s=1}^{n_S} h(X_{1,s}, \ldots, X_{n,s})$$
The confidence interval of $I$ is:
$$\left[\hat{I}_{n_S} - c_\alpha\,\frac{\hat{S}_{n_S}}{\sqrt{n_S}},\; \hat{I}_{n_S} + c_\alpha\,\frac{\hat{S}_{n_S}}{\sqrt{n_S}}\right]$$
where $\alpha$ is the confidence level, $c_\alpha = \Phi^{-1}((1+\alpha)/2)$ and $\hat{S}_{n_S}$ is the usual estimate of the standard deviation:
$$\hat{S}_{n_S} = \sqrt{\frac{1}{n_S - 1}\sum_{s=1}^{n_S}\left( h(X_{1,s}, \ldots, X_{n,s}) - \hat{I}_{n_S} \right)^2}$$
The price $S(t)$ of the underlying asset is given by the following SDE:
$$dS(t) = r S(t)\,dt + \sigma S(t)\,dW(t)$$
where $r$ is the interest rate and $\sigma$ is the volatility of the asset
For a given simulation $s$, we have:
$$S^{(s)}_{m+1} = S^{(s)}_m \cdot \exp\left( \left(r - \frac{1}{2}\sigma^2\right)(t_{m+1} - t_m) + \sigma\sqrt{t_{m+1} - t_m}\cdot\varepsilon^{(s)}_m \right)$$
where $\varepsilon^{(s)}_m \sim \mathcal{N}(0,1)$ and $T = t_M$
The Monte Carlo estimator of the option price is then equal to:
$$\hat{C} = \frac{e^{-rT}}{n_S} \sum_{s=1}^{n_S} \left( S^{(s)}_M - \min_m S^{(s)}_m \right)^+$$
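A sketch of this pricer for the floating-strike lookback payoff above (illustrative parameter values):

```python
import numpy as np

def lookback_price(S0, r, sigma, T, n_steps, n_sims,
                   rng=np.random.default_rng(10)):
    """MC price of the lookback option (S_T - min_m S_m)^+."""
    h = T / n_steps
    eps = rng.standard_normal((n_sims, n_steps))
    log_inc = (r - 0.5 * sigma**2) * h + sigma * np.sqrt(h) * eps
    S = S0 * np.exp(np.cumsum(log_inc, axis=1))
    payoff = np.maximum(S[:, -1] - np.minimum(S0, S.min(axis=1)), 0.0)
    return np.exp(-r * T) * payoff.mean()

print(lookback_price(S0=100.0, r=0.05, sigma=0.2, T=1.0,
                     n_steps=252, n_sims=100000))
```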
Variance reduction
Antithetic variates
We have:
$$I = \mathbb{E}[\varphi(X_1, \ldots, X_n)] = \mathbb{E}[Y]$$
where $Y = \varphi(X_1, \ldots, X_n)$ is a one-dimensional random variable
It follows that:
$$\hat{I}_{n_S} = \bar{Y}_{n_S} = \frac{1}{n_S}\sum_{s=1}^{n_S} Y_s$$
We now consider the estimators $\bar{Y}_{n_S}$ and $\bar{Y}'_{n_S}$ based on two different samples and define $\bar{Y}^\star$ as follows:
$$\bar{Y}^\star = \frac{\bar{Y}_{n_S} + \bar{Y}'_{n_S}}{2}$$
Antithetic variates
We have:
$$\mathbb{E}\left[\bar{Y}^\star\right] = \mathbb{E}\left[\frac{\bar{Y}_{n_S} + \bar{Y}'_{n_S}}{2}\right] = \mathbb{E}\left[\bar{Y}_{n_S}\right] = I$$
and:
$$\mathrm{var}\left(\bar{Y}^\star\right) = \mathrm{var}\left(\frac{\bar{Y}_{n_S} + \bar{Y}'_{n_S}}{2}\right) = \frac{1}{4}\mathrm{var}\left(\bar{Y}_{n_S}\right) + \frac{1}{4}\mathrm{var}\left(\bar{Y}'_{n_S}\right) + \frac{1}{2}\mathrm{cov}\left(\bar{Y}_{n_S}, \bar{Y}'_{n_S}\right) = \frac{1 + \rho\langle Y_s, Y'_s\rangle}{2}\,\mathrm{var}\left(\bar{Y}_{n_S}\right)$$
where $\rho\langle Y_s, Y'_s\rangle$ is the correlation between $Y_s$ and $Y'_s$
Antithetic variates
The antithetic variate is built as a transformation of the original one:
$$Y'_s = \psi(Y_s)$$
so that:
$$\rho\left(\bar{Y}_{n_S}, \bar{Y}'_{n_S}\right) = \rho\langle Y, Y'\rangle = \rho\langle Y, \psi(Y)\rangle$$
Antithetic variates
Minimizing the variance $\mathrm{var}(\bar{Y}^\star)$ is then equivalent to minimizing the correlation $\rho\langle Y, \psi(Y)\rangle$
We also know that the correlation reaches its lower bound if the dependence function between $Y$ and $\psi(Y)$ is equal to the lower Fréchet copula:
$$\mathbf{C}\langle Y, \psi(Y)\rangle = \mathbf{C}^-$$
However, $\rho\langle Y, \psi(Y)\rangle$ is not necessarily equal to $-1$ except in some special cases
Since $Y = \varphi(X)$, we have:
$$\mathbf{C}\langle Y, \psi(Y)\rangle = \mathbf{C}\langle \varphi(X), \psi(\varphi(X))\rangle = \mathbf{C}\langle X, \psi(X)\rangle$$
and the antithetic transform of $X$ is:
$$\psi(X) = F^{-1}(1 - F(X))$$
Antithetic variates
Example #9
We consider the following functions:
1. $\varphi_1(x) = x^3 + x + 1$
2. $\varphi_2(x) = x^4 + x^2 + 1$
3. $\varphi_3(x) = x^4 + x^3 + x^2 + x + 1$
For each function, we want to estimate $I = \mathbb{E}[\varphi(\mathcal{N}(0,1))]$ using the antithetic estimator:
$$\bar{Y}^\star_{n_S} = \frac{1}{n_S}\sum_{s=1}^{n_S} \frac{\varphi(X_s) + \varphi(-X_s)}{2}$$
where $X_s \sim \mathcal{N}(0,1)$
Let $X \sim \mathcal{N}(0,1)$. We have $\mathbb{E}[X^2] = 1$, $\mathbb{E}[X^{2m}] = (2m-1)\,\mathbb{E}[X^{2m-2}]$ and $\mathbb{E}[X^{2m+1}] = 0$ for $m \in \mathbb{N}$
We obtain the following results:
Antithetic variates
$$\mathbf{C}\langle X, X'\rangle = \mathbf{C}^- \Rightarrow \mathbf{C}\langle Y, Y'\rangle = \mathbf{C}^- \Leftrightarrow \varphi'(x) \ge 0$$
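A sketch of the antithetic estimator of Example #9 for $\varphi_1$ (true value $I = \mathbb{E}[X^3 + X + 1] = 1$):

```python
import numpy as np

def phi1(x):
    return x ** 3 + x + 1

rng = np.random.default_rng(11)
x = rng.standard_normal(100000)

i_mc = phi1(x).mean()                     # standard MC estimator
i_av = 0.5 * (phi1(x) + phi1(-x)).mean()  # antithetic estimator

print(i_mc, i_av)  # i_av = 1 exactly: the odd terms of phi1 cancel
```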
Antithetic variates
In the Gaussian case $Y \sim \mathcal{N}(\mu, \sigma^2)$ with $Y = \mu + \sigma X$, the antithetic variate is:
$$Y' = \mu - \sigma X = \mu - \sigma\,\frac{Y - \mu}{\sigma} = 2\mu - Y$$
If we consider the geometric Brownian motion, the fixed-interval scheme is:
$$X_{m+1} = X_m \cdot \exp\left( \left(\mu - \frac{1}{2}\sigma^2\right) h + \sigma\sqrt{h}\cdot\varepsilon_m \right)$$
whereas the antithetic path is given by:
$$X'_{m+1} = X'_m \cdot \exp\left( \left(\mu - \frac{1}{2}\sigma^2\right) h - \sigma\sqrt{h}\cdot\varepsilon_m \right)$$
In the multivariate case, the antithetic vector is:
$$\varepsilon'_m = -P \cdot \eta_m = -\varepsilon_m$$
We verify that $\varepsilon'_m = (\varepsilon'_{1,m}, \ldots, \varepsilon'_{n,m}) \sim \mathcal{N}_n(0, \rho)$
In the Black-Scholes model, the price of the spread option with maturity $T$ and strike $K$ is given by:
$$C = e^{-rT}\,\mathbb{E}\left[ (S_1(T) - S_2(T) - K)^+ \right]$$
where the prices $S_1(t)$ and $S_2(t)$ of the underlying assets are given by the following SDE:
$$dS_1(t) = r S_1(t)\,dt + \sigma_1 S_1(t)\,dW_1(t)$$
$$dS_2(t) = r S_2(t)\,dt + \sigma_2 S_2(t)\,dW_2(t)$$
The MC estimator is:
$$\hat{C}_{\mathrm{MC}} = \frac{e^{-rT}}{n_S}\sum_{s=1}^{n_S}\left(S_1^{(s)}(T) - S_2^{(s)}(T) - K\right)^+$$
where $S_j^{(s)}(T)$ is the $s$-th simulation of the terminal value $S_j(T)$
For the AV estimator, we obtain:
$$\hat{C}_{\mathrm{AV}} = \frac{e^{-rT}}{n_S}\sum_{s=1}^{n_S} \frac{\left(S_1^{(s)}(T) - S_2^{(s)}(T) - K\right)^+ + \left(S_1'^{(s)}(T) - S_2'^{(s)}(T) - K\right)^+}{2}$$
where $S_j'^{(s)}(T)$ is the antithetic variate of $S_j^{(s)}(T)$
Control variates
We consider $Z = Y + c\cdot(V - \mathbb{E}[V])$, where $V$ is the control variate. We have:
$$\mathbb{E}[Z] = \mathbb{E}[Y + c\cdot(V - \mathbb{E}[V])] = \mathbb{E}[Y] + c\cdot\mathbb{E}[V - \mathbb{E}[V]] = \mathbb{E}[\varphi(X_1, \ldots, X_n)]$$
and:
$$\mathrm{var}(Z) = \mathrm{var}(Y) + c^2\,\mathrm{var}(V) + 2c\cdot\mathrm{cov}(Y, V)$$
It follows that the variance-minimizing coefficient is:
$$c^\star = -\frac{\mathrm{cov}(Y, V)}{\mathrm{var}(V)} = -\beta$$
We deduce that:
$$Z = Y - \frac{\mathrm{cov}(Y, V)}{\mathrm{var}(V)}\cdot(V - \mathbb{E}[V])$$
and:
$$\mathrm{var}(Z) = \mathrm{var}(Y) - \frac{\mathrm{cov}^2(Y, V)}{\mathrm{var}(V)} = \left(1 - \rho\langle Y, V\rangle^2\right)\cdot\mathrm{var}(Y)$$
Control variates
Example
We consider that $X \sim U_{[0,1]}$ and $\varphi(x) = e^x$. We would like to estimate:
$$I = \mathbb{E}[\varphi(X)] = \int_0^1 e^x\,dx$$
Control variates
We set $Y = e^X$ and $V = X$
We know that $\mathbb{E}[V] = 1/2$ and $\mathrm{var}(V) = 1/12$
It follows that:
$$\mathrm{var}(Y) = \mathbb{E}[Y^2] - \mathbb{E}^2[Y] = \int_0^1 e^{2x}\,dx - \left(\int_0^1 e^x\,dx\right)^2 = \left[\frac{e^{2x}}{2}\right]_0^1 - \left(e^1 - e^0\right)^2 = \frac{4e - e^2 - 3}{2} \approx 0.2420$$
Control variates
We have:
$$\mathrm{cov}(Y, V) = \mathbb{E}[VY] - \mathbb{E}[V]\,\mathbb{E}[Y] = \int_0^1 x e^x\,dx - \frac{1}{2}\left(e^1 - e^0\right) = \left[x e^x\right]_0^1 - \int_0^1 e^x\,dx - \frac{1}{2}\left(e^1 - e^0\right) = \frac{3 - e}{2} \approx 0.1409$$
If we consider the CV estimator $Z$ defined by:
$$Z = Y - \frac{\mathrm{cov}(Y, V)}{\mathrm{var}(V)}\cdot(V - \mathbb{E}[V]) = Y - (18 - 6e)\cdot\left(V - \frac{1}{2}\right)$$
Control variates
We have $\beta \approx 1.6903$
We obtain:
$$\mathrm{var}(Z) = \mathrm{var}(Y) - \frac{\mathrm{cov}^2(Y, V)}{\mathrm{var}(V)} = \frac{4e - e^2 - 3}{2} - 3\cdot(3 - e)^2 \approx 0.0039$$
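A sketch of this control variate estimator, comparing the raw MC and CV variances (true value $I = e - 1$):

```python
import numpy as np

rng = np.random.default_rng(12)
x = rng.uniform(size=100000)

y = np.exp(x)                 # Y = e^X
beta = 18 - 6 * np.e          # cov(Y, V) / var(V) = 6 (3 - e) ~ 1.6903
z = y - beta * (x - 0.5)      # CV estimator Z = Y - beta (V - E[V])

print(y.mean(), z.mean())     # both ~ e - 1 = 1.7183
print(y.var(), z.var())       # ~ 0.2420 vs ~ 0.0039
```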
Control variates
$\hat{Y}$ is the conditional expectation of $Y$ with respect to $V$:
$$\mathbb{E}[Y \mid V] = \mathbb{E}[Y] + \beta(V - \mathbb{E}[V])$$
$$U = Y - \hat{Y} = (Y - \mathbb{E}[Y]) - \beta(V - \mathbb{E}[V])$$
$$Z = \mathbb{E}[Y] + U = Y - \beta(V - \mathbb{E}[V])$$
Control variates
where $K$ is the strike of the option and $\bar{S}$ denotes the average of $S(t)$ on a given number of fixing dates $\{t_1, \ldots, t_{n_F}\}$, with $t_{n_F} = T$:
$$\bar{S} = \frac{1}{n_F}\sum_{m=1}^{n_F} S(t_m)$$
Control variates
The control variate is the geometric average:
$$\tilde{S} = \prod_{m=1}^{n_F} S(t_m)^{1/n_F}$$
Control variates
$$\mathbb{E}[S(T)] = S_0\,e^{rT}$$
where:
$$\bar{t} = \frac{1}{n_F}\sum_{m=1}^{n_F} t_m \qquad \text{and} \qquad \bar{W} = \frac{1}{n_F}\sum_{m=1}^{n_F} W(t_m)$$
Control variates
The previous approach can be extended to the case of several control variates:
$$Z = Y + \sum_{i=1}^{n_{CV}} c_i\cdot(V_i - \mathbb{E}[V_i]) = Y + c^\top(V - \mathbb{E}[V])$$
Table: Linear regression between the Asian call option and the control variates
Importance sampling
$$I = \mathbb{E}[\varphi(X_1, \ldots, X_n) \mid \mathcal{F}] = \int \cdots \int \varphi(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
It follows that:
$$I = \int \cdots \int \varphi(x_1, \ldots, x_n)\,\frac{f(x_1, \ldots, x_n)}{g(x_1, \ldots, x_n)}\, g(x_1, \ldots, x_n)\, dx_1 \cdots dx_n = \mathbb{E}\left[ \varphi(X_1, \ldots, X_n)\,\frac{f(X_1, \ldots, X_n)}{g(X_1, \ldots, X_n)} \,\Big|\, \mathcal{G} \right] = \mathbb{E}[\varphi(X_1, \ldots, X_n)\, L(X_1, \ldots, X_n) \mid \mathcal{G}]$$
Importance sampling
E [ϕ (X ) | F] = E [ϕ (X ) L (X ) | G]
It follows that: h i h i
E IˆMC = E IˆIS = I
where IˆMC and IˆIS are the Monte Carlo and importance sampling
estimators of I
We also deduce that:
var IˆIS = var (ϕ (X ) L (X ) | G)
Importance sampling
It follows that:
$$\mathrm{var}\left(\hat{I}_{\mathrm{IS}}\right) = \mathbb{E}\left[\varphi^2(X)\, L^2(X) \mid \mathcal{G}\right] - \mathbb{E}^2[\varphi(X)\, L(X) \mid \mathcal{G}] = \int \varphi^2(x)\, L^2(x)\, g(x)\, dx - I^2 = \int \varphi^2(x)\,\frac{f^2(x)}{g^2(x)}\, g(x)\, dx - I^2 = \int \varphi^2(x)\,\frac{f^2(x)}{g(x)}\, dx - I^2$$
Importance sampling
The first-order condition is:
$$-\varphi^2(x)\cdot\frac{f^2(x)}{g^2(x)} = \lambda$$
where $\lambda$ is a constant
We have:
$$g^\star(x) = \arg\min \mathrm{var}\left(\hat{I}_{\mathrm{IS}}\right) = \arg\min \int \varphi^2(x)\,\frac{f^2(x)}{g(x)}\, dx = c\cdot|\varphi(x)|\cdot f(x)$$
where $c$ is the normalizing constant such that $\int g^\star(x)\,dx = 1$
A good choice of the IS density $g(x)$ is then an approximation of $|\varphi(x)|\cdot f(x)$ such that $g(x)$ can easily be simulated
Importance sampling
Remark
In order to simplify the notation and avoid confusion, we consider that $X \sim F$ and $Z \sim G$ in the sequel. This means that $\hat{I}_{\mathrm{MC}} = \varphi(X)$ and $\hat{I}_{\mathrm{IS}} = \varphi(Z)\, L(Z)$
Importance sampling
We can estimate the option price by using the Monte Carlo method with:
$$\varphi(x) = e^{-rT}(K - x)^+$$
In the case where $K \ll S(0)$, the probability of exercise $\Pr\{S(T) \le K\}$ is very small
Therefore, we have to increase the probability of exercise in order to obtain a more efficient estimator
Importance sampling
The IS proposal replaces $\ln S(T) \sim \mathcal{N}(\mu_x, \sigma_x^2)$ by $\ln S'(T) \sim \mathcal{N}(\mu_z, \sigma_z^2)$, where $\mu_z = \theta + \mu_x$ and $\sigma_z = \sigma_x$
Importance sampling
We deduce that:
$$P = \mathbb{E}[\varphi(S(T))] = \mathbb{E}[\varphi(S'(T))\cdot L(S'(T))]$$
where:
$$L(x) = \frac{\dfrac{1}{x\sigma_x}\,\phi\left(\dfrac{\ln x - \mu_x}{\sigma_x}\right)}{\dfrac{1}{x\sigma_z}\,\phi\left(\dfrac{\ln x - \mu_z}{\sigma_z}\right)} = \exp\left( \frac{\theta^2}{2\sigma_x^2} - \frac{\theta}{\sigma_x}\cdot\frac{\ln x - \mu_x}{\sigma_x} \right)$$
Importance sampling
Example #10
We assume that $S_0 = 100$, $K = 60$, $r = 5\%$, $\sigma = 20\%$ and $T = 2$. If we consider the previous method, the IS process is simulated using the initial value $S'(0) = K e^{-(r - \sigma^2/2)T} = 56.506$, whereas the value of $\theta$ is equal to $-0.5708$
Figure: Density function of the estimators $\hat{P}_{\mathrm{MC}}$ and $\hat{P}_{\mathrm{IS}}$ ($n_S = 1\,000$)
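A sketch of this importance sampling scheme for the deep out-of-the-money put of Example #10 (lognormal model; the proposal density shifts the log-mean by $\theta$, values taken from the example):

```python
import numpy as np

S0, K, r, sigma, T = 100.0, 60.0, 0.05, 0.20, 2.0
mu_x = np.log(S0) + (r - 0.5 * sigma**2) * T       # log-mean under f
sx = sigma * np.sqrt(T)
theta = np.log(K / S0) - (r - 0.5 * sigma**2) * T  # = -0.5708

rng = np.random.default_rng(13)
eps = rng.standard_normal(100000)

# standard MC: ln S(T) ~ N(mu_x, sx^2)
s_mc = np.exp(mu_x + sx * eps)
p_mc = np.exp(-r * T) * np.maximum(K - s_mc, 0.0)

# IS: simulate under g with mu_z = mu_x + theta, reweight by L(x)
s_is = np.exp(mu_x + theta + sx * eps)
L = np.exp(theta**2 / (2 * sx**2)
           - (theta / sx) * (np.log(s_is) - mu_x) / sx)
p_is = np.exp(-r * T) * np.maximum(K - s_is, 0.0) * L

print(p_mc.mean(), p_mc.std())
print(p_is.mean(), p_is.std())  # same mean, much smaller std
```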
Example #11
We consider a spread option whose payoff is equal to $(S_1(T) - S_2(T) - K)^+$. The price is calculated using the Black-Scholes model and the following parameters: $S_1(0) = S_2(0) = 100$, $\sigma_1 = \sigma_2 = 20\%$, $\rho = 50\%$ and $r = 5\%$. The maturity $T$ of the option is set to one year, whereas the strike $K$ is equal to 5. The true price of the spread option is equal to 5.8198.
Exercises
References
Devroye, L. (1986), Non-Uniform Random Variate Generation, Springer-Verlag.
Roncalli, T. (2020), Handbook of Financial Risk Management, Chapman and Hall/CRC Financial Mathematics Series, Chapter 13.
Agenda
Credit scoring
The score is based on the following financial ratios (those of Altman's Z-score):

| $X_j$ | Ratio |
|---|---|
| $X_1$ | Working capital / Total assets |
| $X_2$ | Retained earnings / Total assets |
| $X_3$ | Earnings before interest and tax / Total assets |
| $X_4$ | Market value of equity / Total liabilities |
| $X_5$ | Sales / Total assets |
If we note $Z_i$ the score of the firm $i$, we can calculate the normalized score:
$$Z_i^\star = \frac{Z_i - m_z}{\sigma_z}$$
where $m_z$ and $\sigma_z$ are the mean and standard deviation of the observed scores
A low value of $Z_i^\star$ (for instance $Z_i^\star < 2.5$) indicates that the firm has a high probability of default
Scores are developed by banks and financial institutions, but they can also be developed by consultancy companies
This is the case of the FICO® scores, which are the most widely used credit scoring systems in the world
Data preparation
Variable selection
Many candidate variables $X = (X_1, \ldots, X_m)$ for explaining the variable $Y$
The variable selection problem consists in finding the best set of optimal variables
We assume the following statistical model:
$$Y = f(X) + u$$
where $u \sim \mathcal{N}(0, \sigma^2)$
We denote the prediction by $\hat{Y} = \hat{f}(X)$. We have the bias-variance decomposition:
$$\mathbb{E}\left[\left(Y - \hat{Y}\right)^2\right] = \mathbb{E}\left[\left(f(X) + u - \hat{f}(X)\right)^2\right] = \left(\mathbb{E}\left[\hat{f}(X)\right] - f(X)\right)^2 + \mathbb{E}\left[\left(\hat{f}(X) - \mathbb{E}\left[\hat{f}(X)\right]\right)^2\right] + \sigma^2$$
Variable selection
Stepwise approach:
$$F = \frac{\mathrm{RSS}\left(\hat{\theta}^{(k)}\right) - \mathrm{RSS}\left(\hat{\theta}^{(k+1)}\right)}{\mathrm{RSS}\left(\hat{\theta}^{(k+1)}\right) / \mathrm{df}^{(k+1)}}$$
Lasso approach (see the sketch below):
$$y_i = \sum_{k=1}^{K} \beta_k x_{i,k} + u_i \quad \text{s.t.} \quad \sum_{k=1}^{K} |\beta_k| \le \tau$$
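A sketch of lasso-based variable selection with scikit-learn (hypothetical simulated data; as the constraint $\tau$ tightens, that is as the penalty `alpha` grows, more coefficients are set exactly to zero):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(14)
X = rng.standard_normal((500, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(500)

for alpha in [0.01, 0.1, 0.5]:
    model = Lasso(alpha=alpha).fit(X, y)
    kept = np.flatnonzero(model.coef_)
    print(alpha, kept)  # the selected variables shrink towards {0, 3}
```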
Statistical methods
Clustering: K-means clustering, hierarchical clustering
Dimension reduction
Discriminant analysis
The two-dimensional case
$$\Pr\{A \cap B\} = \Pr\{A \mid B\}\cdot\Pr\{B\} = \Pr\{B \mid A\}\cdot\Pr\{A\}$$
It follows that:
$$\Pr\{A \mid B\} = \Pr\{B \mid A\}\cdot\frac{\Pr\{A\}}{\Pr\{B\}}$$
and:
$$\Pr\{i \in \mathcal{C}_1 \mid X = x\} = \Pr\{X = x \mid i \in \mathcal{C}_1\}\cdot\frac{\Pr\{i \in \mathcal{C}_1\}}{\Pr\{X = x\}}$$
Discriminant analysis
Quadratic discriminant analysis (QDA)
Since $X \mid i \in \mathcal{C}_j \sim \mathcal{N}(\mu_j, \Sigma_j)$, we have:
$$f_j(x) = \frac{1}{(2\pi)^{K/2}\,|\Sigma_j|^{1/2}}\exp\left( -\frac{1}{2}(x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j) \right)$$
We deduce that:
$$\ln\frac{f_1(x)}{f_2(x)} = \frac{1}{2}\ln\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}(x - \mu_1)^\top \Sigma_1^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_2)^\top \Sigma_2^{-1}(x - \mu_2)$$
The decision boundary is then given by:
$$\frac{1}{2}\ln\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}(x - \mu_1)^\top \Sigma_1^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_2)^\top \Sigma_2^{-1}(x - \mu_2) + \ln\frac{\pi_1}{\pi_2} = 0$$
Discriminant analysis
Linear discriminant analysis (LDA)
Example #1
We consider two classes and two explanatory variables $X = (X_1, X_2)$ where $\pi_1 = 50\%$, $\pi_2 = 1 - \pi_1 = 50\%$, $\mu_1 = (1,3)$, $\mu_2 = (4,1)$, $\Sigma_1 = I_2$ and $\Sigma_2 = \gamma I_2$ where $\gamma = 1.5$.
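A sketch of the QDA decision rule for Example #1 (classify $x$ into $\mathcal{C}_1$ when the boundary expression is positive):

```python
import numpy as np

mu1, mu2 = np.array([1.0, 3.0]), np.array([4.0, 1.0])
S1, S2 = np.eye(2), 1.5 * np.eye(2)
pi1, pi2 = 0.5, 0.5

def qda_score(x):
    """Log-ratio ln(pi1 f1 / pi2 f2); positive => class C1."""
    d1, d2 = x - mu1, x - mu2
    return (0.5 * np.log(np.linalg.det(S2) / np.linalg.det(S1))
            - 0.5 * d1 @ np.linalg.solve(S1, d1)
            + 0.5 * d2 @ np.linalg.solve(S2, d2)
            + np.log(pi1 / pi2))

print(qda_score(np.array([1.0, 3.0])))  # > 0: assigned to C1
print(qda_score(np.array([4.0, 1.0])))  # < 0: assigned to C2
```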
Discriminant analysis
The general case
$$\Pr\{i \in \mathcal{C}_j \mid X = x\} = \Pr\{X = x \mid i \in \mathcal{C}_j\}\cdot\frac{\Pr\{i \in \mathcal{C}_j\}}{\Pr\{X = x\}} = c\cdot f_j(x)\cdot\pi_j$$
Discriminant analysis
The general case
Remark
In practice, the parameters $\pi_j$, $\mu_j$ and $\Sigma_j$ are unknown. We replace them by the corresponding estimates $\hat{\pi}_j$, $\hat{\mu}_j$ and $\hat{\Sigma}_j$. For the linear discriminant analysis, $\hat{\Sigma}$ is estimated by pooling all the classes.
Discriminant analysis
The general case
Example #2
We consider the classification problem of 33 observations with two explanatory variables $X_1$ and $X_2$, and three classes $\mathcal{C}_1$, $\mathcal{C}_2$ and $\mathcal{C}_3$:

| $i$ | $\mathcal{C}_j$ | $X_1$ | $X_2$ | $i$ | $\mathcal{C}_j$ | $X_1$ | $X_2$ | $i$ | $\mathcal{C}_j$ | $X_1$ | $X_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.03 | 2.85 | 12 | 2 | 3.70 | 5.08 | 23 | 3 | 3.55 | 0.58 |
| 2 | 1 | 0.20 | 3.30 | 13 | 2 | 2.81 | 1.99 | 24 | 3 | 3.86 | 1.83 |
| 3 | 1 | 1.69 | 3.73 | 14 | 2 | 3.66 | 2.61 | 25 | 3 | 5.39 | 0.47 |
| 4 | 1 | 0.98 | 3.52 | 15 | 2 | 5.63 | 4.19 | 26 | 3 | 3.15 | −0.18 |
| 5 | 1 | 0.98 | 5.15 | 16 | 2 | 3.35 | 3.64 | 27 | 3 | 4.93 | 1.91 |
| 6 | 1 | 3.47 | 6.56 | 17 | 2 | 2.97 | 3.55 | 28 | 3 | 3.87 | 2.61 |
| 7 | 1 | 3.94 | 4.68 | 18 | 2 | 3.16 | 2.92 | 29 | 3 | 4.09 | 1.43 |
| 8 | 1 | 1.55 | 5.99 | 19 | 3 | 3.00 | 0.98 | 30 | 3 | 3.80 | 2.11 |
| 9 | 1 | 1.15 | 3.60 | 20 | 3 | 3.09 | 1.99 | 31 | 3 | 2.79 | 2.10 |
| 10 | 2 | 1.20 | 2.27 | 21 | 3 | 5.45 | 0.60 | 32 | 3 | 4.49 | 2.71 |
| 11 | 2 | 3.66 | 5.49 | 22 | 3 | 3.59 | −0.46 | 33 | 3 | 3.51 | 1.82 |
Discriminant analysis
The general case
The estimated parameters are:

| Class | $\mathcal{C}_1$ | $\mathcal{C}_2$ | $\mathcal{C}_3$ |
|---|---|---|---|
| $\hat{\pi}_j$ | 0.273 | 0.273 | 0.455 |
| $\hat{\mu}_j$ | (1.666, 4.376) | (3.349, 3.527) | (3.904, 1.367) |
| $\hat{\Sigma}_j$ | $\begin{pmatrix} 1.525 & 0.929 \\ 0.929 & 1.663 \end{pmatrix}$ | $\begin{pmatrix} 1.326 & 0.752 \\ 0.752 & 1.484 \end{pmatrix}$ | $\begin{pmatrix} 0.694 & -0.031 \\ -0.031 & 0.960 \end{pmatrix}$ |
Discriminant analysis
Class separation maximization
We notice that:
$$\hat{\mu} = \frac{1}{n}\sum_{j=1}^{J} n_j\,\hat{\mu}_j$$
We define the between-class variance matrix as:
$$S_B = \sum_{j=1}^{J} n_j\,(\hat{\mu}_j - \hat{\mu})(\hat{\mu}_j - \hat{\mu})^\top$$
We can show that the total variance matrix can be decomposed into the sum of the within-class and between-class variance matrices:
$$S = S_W + S_B$$
Discriminant analysis
Class separation maximization
The Fisher criterion is:
$$J(\beta) = \frac{\beta^\top S_B \beta}{\beta^\top S_W \beta}$$
Since the objective function is invariant if we rescale the vector $\beta$ ($J(\beta') = J(\beta)$ if $\beta' = c\beta$), we can impose that $\beta^\top S_W \beta = 1$. It follows that:
$$\hat{\beta} = \arg\max\, \beta^\top S_B \beta \quad \text{s.t.} \quad \beta^\top S_W \beta = 1$$
Discriminant analysis
Class separation maximization
The first-order condition of the Lagrangian is:
$$\frac{\partial\,\mathcal{L}(\beta; \lambda)}{\partial\,\beta} = 2 S_B \beta - 2\lambda S_W \beta = 0$$
It is remarkable that we obtain a generalized eigenvalue problem $S_B \beta = \lambda S_W \beta$, or equivalently:
$$S_W^{-1} S_B \beta = \lambda\beta$$
Discriminant analysis
Class separation maximization
With the parametrization $\alpha = S_B^{1/2}\beta$, the first-order condition becomes:
$$S_B^{1/2}\, S_W^{-1}\, S_B^{1/2}\,\alpha = \lambda\alpha$$
because $\beta = S_B^{-1/2}\alpha$
We have a right regular eigenvalue problem
Let $\lambda_k$ and $v_k$ be the $k$-th eigenvalue and eigenvector of the symmetric matrix $S_B^{1/2}\, S_W^{-1}\, S_B^{1/2}$
It is obvious that the optimal solution $\alpha^\star$ is the first eigenvector $v_1$ corresponding to the largest eigenvalue $\lambda_1$
We conclude that the estimator is $\hat{\beta} = S_B^{-1/2} v_1$ and the discriminant linear relationship is $\hat{Y}_c = v_1^\top S_B^{-1/2} X$
Moreover, we have:
$$\lambda_1 = J\left(\hat{\beta}\right) = \frac{\hat{\beta}^\top S_B \hat{\beta}}{\hat{\beta}^\top S_W \hat{\beta}}$$
Discriminant analysis
Class separation maximization
Example #3
We consider a problem with two classes $\mathcal{C}_1$ and $\mathcal{C}_2$, and two explanatory variables $(X_1, X_2)$. Class $\mathcal{C}_1$ is composed of 7 observations: (1,2), (1,4), (3,6), (3,3), (4,2), (5,6), (5,5), whereas class $\mathcal{C}_2$ is composed of 6 observations: (1,0), (2,1), (4,1), (3,2), (6,4) and (6,5).
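A sketch of Fisher's discriminant for Example #3, solving the generalized eigenproblem $S_W^{-1} S_B \beta = \lambda\beta$ directly:

```python
import numpy as np

X1 = np.array([[1, 2], [1, 4], [3, 6], [3, 3], [4, 2], [5, 6], [5, 5]], float)
X2 = np.array([[1, 0], [2, 1], [4, 1], [3, 2], [6, 4], [6, 5]], float)

mu1, mu2 = X1.mean(0), X2.mean(0)
mu = np.vstack([X1, X2]).mean(0)

# within-class and between-class variance matrices
SW = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
SB = (len(X1) * np.outer(mu1 - mu, mu1 - mu)
      + len(X2) * np.outer(mu2 - mu, mu2 - mu))

# the leading eigenvector of SW^{-1} SB maximizes J(beta)
eigval, eigvec = np.linalg.eig(np.linalg.solve(SW, SB))
beta = np.real(eigvec[:, np.argmax(np.real(eigval))])
print(beta / np.linalg.norm(beta))  # discriminant direction
```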
Discriminant analysis
Class separation maximization
where $\bar{\mu} = (\bar{\mu}_1 + \bar{\mu}_2)/2$, $\bar{\mu}_1 = \beta^\top\hat{\mu}_1$ and $\bar{\mu}_2 = \beta^\top\hat{\mu}_2$
Logistic regression
We have:
$$\Pr\{Y_i = y_i\} = p_i^{y_i}\cdot(1 - p_i)^{1 - y_i}$$
where $p_i = \Pr\{Y_i = 1 \mid X_i = x_i\}$
We deduce the log-likelihood:
$$\ell(\theta) = \sum_{i=1}^{n} y_i \ln p_i + (1 - y_i)\ln(1 - p_i) = \sum_{i=1}^{n} y_i \ln F(x_i^\top\beta) + (1 - y_i)\ln\left(1 - F(x_i^\top\beta)\right)$$
The score vector is:
$$S(\beta) = \frac{\partial\,\ell(\beta)}{\partial\,\beta} = \sum_{i=1}^{n} \frac{f(x_i^\top\beta)}{F(x_i^\top\beta)\,F(-x_i^\top\beta)}\left(y_i - F(x_i^\top\beta)\right) x_i$$
The Hessian is $H(\beta) = -\sum_{i=1}^{n} H_i\, x_i x_i^\top$ where:
$$H_i = \frac{f^2(x_i^\top\beta)}{F(x_i^\top\beta)\,F(-x_i^\top\beta)} - \left(y_i - F(x_i^\top\beta)\right)\cdot\left( \frac{f'(x_i^\top\beta)}{F(x_i^\top\beta)\,F(-x_i^\top\beta)} - \frac{f^2(x_i^\top\beta)\left(1 - 2F(x_i^\top\beta)\right)}{F^2(x_i^\top\beta)\,F^2(-x_i^\top\beta)} \right)$$
For the logit model, since $f(x) = F(x)\,F(-x)$, these expressions simplify. We also have:
$$S(\beta) = \sum_{i=1}^{n} \left(y_i - F(x_i^\top\beta)\right) x_i$$
and:
$$H(\beta) = -\sum_{i=1}^{n} f(x_i^\top\beta)\cdot x_i x_i^\top$$
The conditional entropy is:
$$H(Y \mid X) = \mathbb{E}_X[H(Y \mid X = x)] = -\sum_{i=1}^{n}\sum_{j=1}^{n} p_{i,j}\ln\frac{p_{i,j}}{p_i} = H(X, Y) - H(X)$$
The mutual information is:
$$I(X, Y) = H(Y) + H(X) - H(X, Y) = \sum_{i=1}^{n}\sum_{j=1}^{n} p_{i,j}\ln\frac{p_{i,j}}{p_i\, p_j}$$
| | $H(X)$ | $H(Y)$ | $H(X,Y)$ | $I(X,Y)$ |
|---|---|---|---|---|
| Independence | 1.792 | 1.792 | 3.584 | 0.000 |
| Perfect dependence | 1.792 | 1.792 | 1.792 | 1.792 |
| Third case | 1.683 | 1.683 | 2.774 | 0.593 |
| Fourth case | 1.658 | 1.328 | | 0.750 |
Application to scoring
The score is transformed into a binary decision:
$$S \le 0 \Rightarrow S^\star = 0 \qquad S > 0 \Rightarrow S^\star = 1$$
Application to scoring

| | $Y = 0$ | $Y = 1$ |
|---|---|---|
| $S^\star = 0$ | $n_{0,0}$ | $n_{0,1}$ |
| $S^\star = 1$ | $n_{1,0}$ | $n_{1,1}$ |
Application to scoring
y1 y2 y3 y4 y5 y1 y2 y3 y4 y5
s1 10 9 s1 7 10
s2 7 9 s2 10 8
s3 3 7 2 s3 5 4 3
s4 2 10 4 5 s4 3 10 6 4
s5 10 2 s5 2 5 8
s6 3 4 13 s6 5 5 5
Graphical methods
Performance curve
Selection curve
We have:
$$\Pr\{S \ge s \mid Y = 0\} = \frac{\Pr\{S \ge s, Y = 0\}}{\Pr\{Y = 0\}} = \Pr\{S \ge s\}\cdot\frac{\Pr\{S \ge s, Y = 0\}}{\Pr\{S \ge s\}\,\Pr\{Y = 0\}} = \Pr\{S \ge s\}\cdot\frac{\Pr\{Y = 0 \mid S \ge s\}}{\Pr\{Y = 0\}}$$
The selection and performance curves are thus related by:
$$S(x) = x\,P(x)$$
Discriminant curve
where:
$$g_y(s) = \Pr\{S \ge s \mid Y = y\}$$
It represents the proportion of good risks in the selected population with respect to the proportion of bad risks in the selected population
The score is said to be discriminant if the curve $y = D(x)$ is located above the bisecting line $y = x$
Some properties
$$\mathrm{cov}(f(Y), g(S) \mid S \ge s) \ge 0$$
Kolmogorov-Smirnov test
$$F_0(s) = \Pr\{S \le s \mid Y = 0\}$$
and:
$$F_1(s) = \Pr\{S \le s \mid Y = 1\}$$
The score $S$ is relevant if we have the stochastic dominance order $F_0 \succeq F_1$
In this case, the score quality is measured by the Kolmogorov-Smirnov statistic:
$$\mathrm{KS} = \max_s |F_0(s) - F_1(s)|$$
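A sketch of the KS statistic computed from simulated scores of bad ($Y=0$) and good ($Y=1$) risks (the Gaussian score distributions are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(16)
s0 = rng.normal(-0.5, 1.0, size=5000)  # scores of bad risks (Y = 0)
s1 = rng.normal(+0.5, 1.0, size=5000)  # scores of good risks (Y = 1)

grid = np.sort(np.r_[s0, s1])
F0 = np.searchsorted(np.sort(s0), grid, side="right") / len(s0)
F1 = np.searchsorted(np.sort(s1), grid, side="right") / len(s1)

print(np.max(np.abs(F0 - F1)))  # KS statistic, ~0.38 for a unit mean gap
```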
Gini coefficient
The Lorenz curve
Application to credit scoring
The selection curve measures the capacity of the score for not selecting bad risks
We could also build the Lorenz curve that measures the capacity of the score for selecting good risks:
$$x(s) = \Pr\{S \ge s\} = 1 - F(s)$$
$$y(s) = \Pr\{S \ge s \mid Y = 1\} = 1 - F_1(s)$$
Gini coefficient
Application to credit scoring

| | $Y = 0$ | $Y = 1$ |
|---|---|---|
| $S < s$ | $n_{0,0}$ | $n_{0,1}$ |
| $S \ge s$ | $n_{1,0}$ | $n_{1,1}$ |
| | $n_0 = n_{0,0} + n_{1,0}$ | $n_1 = n_{0,1} + n_{1,1}$ |
We have:
$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad \text{(true positive rate)}$$
$$\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}} = 1 - \mathrm{TPR} \qquad \text{(false negative rate)}$$
$$\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \qquad \text{(true negative rate)}$$
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = 1 - \mathrm{TNR} \qquad \text{(false positive rate)}$$
The true positive rate (TPR) is also known as the sensitivity or the recall
It measures the proportion of real good risks that are correctly predicted as good risks
The precision (or positive predictive value) is:
$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$
It measures the proportion of predicted good risks that are real good risks
The accuracy considers the classification of both negatives and positives:
$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{P} + \mathrm{N}} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{TN} + \mathrm{FP}}$$
The $F_1$ score is the harmonic mean of precision and sensitivity:
$$F_1 = \frac{2}{1/\mathrm{PPV} + 1/\mathrm{TPR}} = \frac{2\cdot\mathrm{PPV}\cdot\mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}$$
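A sketch computing these metrics from the counts of a confusion matrix:

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard rates derived from the confusion matrix counts."""
    tpr = tp / (tp + fn)               # sensitivity / recall
    tnr = tn / (tn + fp)               # specificity
    ppv = tp / (tp + fp)               # precision
    acc = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * ppv * tpr / (ppv + tpr)   # harmonic mean of PPV and TPR
    return {"TPR": tpr, "FNR": 1 - tpr, "TNR": tnr, "FPR": 1 - tnr,
            "PPV": ppv, "ACC": acc, "F1": f1}

print(classification_metrics(tp=80, fn=20, tn=70, fp=30))
```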
Table: Confusion matrix of three scoring systems and three cut-off values s
Exercises
References