QRM Course
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Risk Measurement
Scenario Analysis and Stress Testing
Value-at-Risk
Expected Shortfall (ES)
Other Considerations
2 (Section 0)
Risk Factors and Loss Distributions
Notation (to be used throughout the course):
Let ∆ denote a fixed period of time such as 1 day or 1 week.
Let Vt be the value of a portfolio at time t∆.
So portfolio loss between t∆ and (t + 1)∆ is given by
Lt+1 := − (Vt+1 − Vt )
We assume Vt is a function of time and a d-dimensional vector of risk factors, Zt := (Zt,1 , . . . , Zt,d ), so that
Vt = f (t, Zt )
for some function f : R+ × Rd → R.
3 (Section 1)
Risk Factors and Loss Distributions
e.g. In a stock portfolio might take the stock prices or some function of the
stock prices as our risk factors.
e.g. In an options portfolio Zt might contain stock factors together with implied
volatility and interest rate factors.
Let Xt := Zt − Zt−1 denote the change in the risk factors between times (t − 1)∆ and t∆.
Then we have
Lt+1 = − ( f (t + 1, Zt + Xt+1 ) − f (t, Zt ) ).
Given the value of Zt , the distribution of Lt+1 depends only on the distribution of Xt+1 .
4 (Section 1)
Linear Approximations to the Loss Function
Assuming f (·, ·) is differentiable, can use a first order Taylor expansion to
approximate Lt+1 :
L̂t+1 (Xt+1 ) := − ( ft (t, Zt ) ∆ + Σ_{i=1}^d fzi (t, Zt ) Xt+1,i )   (1)
Important to note, however, that if Xt+1 is likely to be very large then Taylor
approximations can fail.
5 (Section 1)
Conditional and Unconditional Loss Distributions
Important to distinguish between the conditional and unconditional loss
distributions.
Consider the series Xt of risk factor changes and assume that they form a
stationary time series with stationary distribution FX .
6 (Section 1)
Conditional and Unconditional Loss Distributions
If the Xt ’s are IID then the conditional and unconditional distributions coincide.
For long time horizons, e.g. ∆ = 6 months, we might be more inclined to use the
unconditional loss distribution.
However, for short horizons, e.g. 1 day or 10 days, then the conditional loss
distribution is clearly the appropriate distribution
- true in particular in times of high market volatility when the unconditional
distribution would bear little resemblance to the true conditional distribution.
7 (Section 1)
Example: A Stock Portfolio
Consider a portfolio of d stocks with St,i denoting the time t price of the i th stock and λi denoting the number of units of the i th stock held, so that Vt = Σ_{i=1}^d λi St,i .
Recall the Black-Scholes price of a European call option with strike K and maturity T :
C (St , σ, . . .) = St Φ(d1 ) − K e^{−r (T −t)} Φ(d2 )
where d1 = ( log(St /K ) + (r + σ 2 /2)(T − t) ) / ( σ √(T − t) )
and d2 = d1 − σ √(T − t)
and where:
Φ(·) is the standard normal distribution CDF
St = time t price of underlying security
r = continuously compounded risk-free interest rate.
In practice use an implied volatility, σ(K , T , t), that depends on strike, maturity
and current time, t.
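As a quick illustration, here is a minimal Matlab sketch of the Black-Scholes call price built from d1 and d2 above; all numerical inputs are illustrative placeholders rather than values used in the course.

% Black-Scholes price of a European call; tau = T - t is the time to maturity.
S = 100; K = 100; r = 0.02; sigma = 0.25; tau = 0.5;   % illustrative inputs
d1 = (log(S/K) + (r + 0.5*sigma^2)*tau) / (sigma*sqrt(tau));
d2 = d1 - sigma*sqrt(tau);
C  = S*normcdf(d1) - K*exp(-r*tau)*normcdf(d2)          % call price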
9 (Section 1)
Example: An Options Portfolio
Consider a portfolio of European options all on the same underlying security.
Note that by put-call parity we can assume that all options are call options.
For derivatives portfolios, the linear approximation based on 1st order Greeks is
often inadequate
- 2nd order approximations involving gamma, volga and vanna might then be
used – but see earlier warning regarding use of Taylor approximations.
10 (Section 1)
Risk Factors in the Options Portfolio
Can again take log stock prices as risk factors but not clear how to handle the
implied volatilities.
2. Let each σ(K , T , t) be a separate factor. Not good for two reasons:
(a) It introduces a large number of factors.
(b) Implied volatilities are not free to move independently since no-arbitrage
assumption imposes strong restrictions on how volatility surface may move.
11 (Section 1)
Risk Factors in the Options Portfolio
3. In light of previous point, it may be a good idea to parameterize the
volatility surface with just a few parameters
- and assume that only those parameters can move from one period to the next
- parameterization should be so that no-arbitrage restrictions are easy to
enforce.
12 (Section 1)
Example: A Bond Portfolio
Consider a portfolio containing quantities of d different default-free zero-coupon
bonds.
st,Ti is the continuously compounded spot interest rate for maturity Ti so that the time t price of the zero-coupon bond maturing at Ti is exp(−st,Ti (Ti − t)).
There are λi units of the i th bond in the portfolio so the total portfolio value is given by
Vt = Σ_{i=1}^d λi exp(−st,Ti (Ti − t)).
14 (Section 1)
Example: A Bond Portfolio
Assume now only parallel changes in the spot rate curve are possible
- while unrealistic, a common assumption in practice
- this is the assumption behind the use of duration and convexity.
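As an illustration, a minimal Matlab sketch of the portfolio loss under a parallel shift of the spot rate curve; the maturities, rates, holdings and shift size below are illustrative placeholders.

% Zero-coupon bond portfolio: value now and after a parallel shift of the spot curve.
T      = [1 2 5 10];                % maturities in years (taking t = 0) - illustrative
s      = [0.02 0.025 0.03 0.035];   % spot rates s_{t,T_i}
lambda = [100 50 50 25];            % units held of each bond
delta  = 0.01;                      % +100bp parallel shift - illustrative stress
V0   = sum(lambda .* exp(-s .* T));             % current portfolio value
V1   = sum(lambda .* exp(-(s + delta) .* T));   % value after the shift
loss = -(V1 - V0)                               % portfolio loss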
15 (Section 1)
Approaches to Risk Measurement
1. Notional Amount Approach.
2. Factor Sensitivity Measures.
3. Scenario Approach.
4. Measures based on loss distribution, e.g. Value-at-Risk (VaR) or Conditional
Value-at-Risk (CVaR).
16 (Section 2)
An Example of Factor Sensitivity Measures: the Greeks
Scenario analysis for derivatives portfolios is often combined with the Greeks to
understand the riskiness of a portfolio
- and sometimes to perform a P&L attribution.
Consider now a single option in the portfolio with price C (S, σ, . . .).
Note approximation only holds for “small” moves in underlying risk factors
- a very important observation that is lost on many people!
17 (Section 2)
Delta-Gamma-Vega Approximations to Option Prices
A simple application of Taylor’s Theorem yields
C (S + ∆S, σ + ∆σ) ≈ C (S, σ) + ∆S ∂C /∂S + (1/2)(∆S)2 ∂ 2 C /∂S 2 + ∆σ ∂C /∂σ
= C (S, σ) + δ ∆S + (Γ/2)(∆S)2 + vega ∆σ.
Therefore obtain
P&L ≈ δ ∆S + (Γ/2)(∆S)2 + vega ∆σ
= delta P&L + gamma P&L + vega P&L.
When ∆σ = 0, obtain the well-known delta-gamma approximation
- often used, for example, in historical Value-at-Risk (VaR) calculations.
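As a small illustration, a Matlab sketch of the delta-gamma-vega P&L approximation; the Greeks and the stressed moves are illustrative placeholders.

% Delta-gamma-vega P&L approximation for a single option position.
delta = 0.55; gamma = 0.03; vega = 20;   % illustrative per-unit Greeks
dS = -4; dsig = 0.02;                    % illustrative moves in underlying and implied vol
pnl_delta = delta*dS;
pnl_gamma = 0.5*gamma*dS^2;
pnl_vega  = vega*dsig;
pnl_approx = pnl_delta + pnl_gamma + pnl_vega   % total approximate P&L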
19 (Section 2)
Scenario Analysis and Stress Testing
In general we want to stress the risk factors in our portfolio.
20 (Section 2)
Value-at-Risk
Value-at-Risk (VaR) is the most widely (mis-)used risk measure in the financial industry.
Despite the many weaknesses of VaR, financial institutions are required to use it
under the Basel II capital-adequacy framework.
Will assume that horizon ∆ has been fixed so that L represents portfolio loss
over time interval ∆.
22 (Section 2)
Value-at-Risk
Definition: Let F : R → [0, 1] be an arbitrary CDF. Then for α ∈ (0, 1) the
α-quantile of F is defined by
qα (F ) := inf{x ∈ R : F (x) ≥ α}.
Definition: Let α ∈ (0, 1) be some fixed confidence level. Then the VaR of the portfolio loss, L, at the confidence level α is given by VaRα := qα (L), the α-quantile of the loss distribution.
23 (Section 2)
VaR for the Normal Distributions
Because the normal CDF is both continuous and strictly increasing, it is straightforward to calculate VaRα : if L ∼ N(µ, σ 2 ) then VaRα = µ + σ Φ−1 (α).
24 (Section 2)
VaR for the t Distributions
The t CDF is also continuous and strictly increasing so it is again straightforward to calculate VaRα : if L has a t distribution with location µ, scale σ and ν dof then VaRα = µ + σ tν−1 (α).
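A minimal Matlab sketch computing these two quantiles; the parameters are illustrative.

% VaR_alpha for normal and (location-scale) t loss distributions.
mu = 0; sigma = 1; nu = 5; alpha = 0.99;   % illustrative parameters
VaR_normal = mu + sigma*norminv(alpha)     % mu + sigma*Phi^{-1}(alpha)
VaR_t      = mu + sigma*tinv(alpha, nu)    % mu + sigma*t_nu^{-1}(alpha)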
25 (Section 2)
Weaknesses of VaR
1. VaR attempts to describe the entire loss distribution with just a single
number!
- so significant information is lost
- this criticism applies to all scalar risk measures
- one way around it is to report VaRα for several values of α.
26 (Section 2)
(Non-) Sub-Additivity of VaR
e.g. Let L = L1 + L2 be the total loss associated with two portfolios, each with
respective losses, L1 and L2 .
Then
qα (FL ) > qα (FL1 ) + qα (FL2 ) is possible!
Will discuss sub-additivity property when we study coherent risk measures later in
course.
27 (Section 2)
Advantages of VaR
VaR is generally “easier” to estimate:
True of quantile estimation in general since quantiles are not very sensitive
to outliers.
- not true of other risk measures such as Expected Shortfall / CVaR
Even then, it becomes progressively more difficult to estimate VaRα as
α→1
- may be able to use Extreme Value Theory (EVT) in these circumstances.
But VaR easier to estimate only if we have correctly specified the appropriate
probability model
- often an unjustifiable assumption!
28 (Section 2)
Expected Shortfall (ES)
Definition: For a portfolio loss, L, satisfying E[|L|] < ∞ the expected shortfall
at confidence level α ∈ (0, 1) is given by
ESα := ( 1/(1 − α) ) ∫_α^1 qu (FL ) du.   (4)
29 (Section 2)
Expected Shortfall (ES)
A more well known representation of ESα (L) holds when FL is continuous:
ESα := E[ L; L ≥ qα (L) ] / (1 − α) = E[ L | L ≥ VaRα ].   (5)
30 (Section 2)
Example: Expected Shortfall for a Normal Distribution
Can use (5) to compute expected shortfall of an N(µ, σ 2 ) random variable.
We find
ESα = µ + σ φ(Φ−1 (α)) / (1 − α)   (6)
where φ(·) is the PDF of the standard normal distribution.
31 (Section 2)
Example: Expected Shortfall for a t Distribution
Let L ∼ t(ν, µ, σ 2 ) so that L̃ := (L − µ)/σ has a standard t distribution with ν > 2 dof. Then
ESα = µ + σ ( gν (tν−1 (α)) / (1 − α) ) ( (ν + (tν−1 (α))2 ) / (ν − 1) )
where tν (·) and gν (·) are the CDF and PDF, respectively, of the standard t distribution with ν dof.
Remark: The t distribution is a much better model of stock (and other asset)
returns than the normal model. In empirical studies, values of ν around 5 or 6 are
often found to fit best.
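A minimal Matlab sketch of (6) and its t analogue; the parameters are illustrative placeholders.

% ES_alpha for normal and (location-scale) t losses.
mu = 0; sigma = 1; nu = 5; alpha = 0.99;                  % illustrative parameters
ES_normal = mu + sigma*normpdf(norminv(alpha))/(1-alpha);
x = tinv(alpha, nu);                                      % standard t quantile
ES_t = mu + sigma*(tpdf(x,nu)/(1-alpha))*((nu + x^2)/(nu-1));
[ES_normal ES_t]      % the t-based ES is noticeably larger in the tail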
32 (Section 2)
The Shortfall-to-Quantile Ratio
Can compare VaRα and ESα by considering their ratio as α → 1.
Not too difficult to see that in the case of the normal distribution
ESα / VaRα → 1 as α → 1.
However, in the case of the t distribution with ν > 1 dof we have
ESα / VaRα → ν/(ν − 1) > 1 as α → 1.
33 (Section 2)
Standard Techniques for Risk Measurement
1. Historical simulation.
2. Monte-Carlo simulation.
3. Variance-covariance approach.
34 (Section 3)
Historical Simulation
Instead of using a probabilistic model to estimate distribution of Lt+1 (Xt+1 ), we
could estimate the distribution using a historical simulation.
In particular, if we know the values of Xt−i+1 for i = 1, . . . , n, then can use this data to create a set of historical losses
L̃i := Lt+1 (Xt−i+1 ),   i = 1, . . . , n
- so L̃i is the portfolio loss that would occur if the risk factor returns on date t − i + 1 were to recur.
35 (Section 3)
Historical Simulation
Suppose the L̃i ’s are ordered by
L̃n,n ≤ · · · ≤ L̃1,n .
Then an estimator of VaRα (Lt+1 ) is L̃[n(1−α)],n where [n(1 − α)] is the largest
integer not exceeding n(1 − α).
ÊSα = ( L̃[n(1−α)],n + · · · + L̃1,n ) / [n(1 − α)].
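A minimal Matlab sketch of these estimators; the loss sample below is simulated purely for illustration - in practice the L̃i 's come from applying historical risk-factor changes to today's portfolio.

% Historical-simulation estimates of VaR and ES from a sample of losses.
n = 1000; alpha = 0.99;
Ltilde  = trnd(5, n, 1);               % illustrative fat-tailed loss sample
Lsorted = sort(Ltilde, 'descend');     % L_{1,n} >= ... >= L_{n,n}
k = floor(n*(1-alpha));                % [n(1-alpha)]
VaR_hat = Lsorted(k);                  % L_{[n(1-alpha)],n}
ES_hat  = mean(Lsorted(1:k));          % average of the k largest losses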
36 (Section 3)
Monte-Carlo Simulation
Monte-Carlo approach similar to historical simulation approach.
But now use some parametric distribution for the change in risk factors to
generate sample portfolio losses.
37 (Section 3)
The Variance-Covariance Approach
In the variance-covariance approach assume that Xt+1 has a multivariate normal
distribution so that
Xt+1 ∼ MVN (µ, Σ) .
Also assume the linear approximation
L̂t+1 (Xt+1 ) := − ( ft (t, Zt )∆ + Σ_{i=1}^d fzi (t, Zt ) Xt+1,i ) = −( ct + bt⊤ Xt+1 )
where ct := ft (t, Zt )∆ and bt := (fz1 (t, Zt ), . . . , fzd (t, Zt ))⊤ .
Therefore obtain
L̂t+1 (Xt+1 ) ∼ N( −ct − bt⊤ µ, bt⊤ Σ bt ).
But it has several weaknesses: risk factor distributions are often fat- or
heavy-tailed but the normal distribution is light-tailed
- this is easy to overcome as there are other multivariate distributions that are
also closed under linear operations.
e.g. If Xt+1 has a multivariate t distribution so that
Xt+1 ∼ t(ν, µ, Σ)
then
L̂t+1 (Xt+1 ) ∼ t( ν, −ct − bt⊤ µ, bt⊤ Σ bt ).
A more serious problem is that the linear approximation will often not work well
- particularly true for portfolios of derivative securities.
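A minimal Matlab sketch of the variance-covariance calculation for an illustrative two-factor portfolio; the sensitivities b, the term c and the factor moments are placeholders.

% Variance-covariance VaR and ES with the linear loss Lhat = -(c + b'*X), X ~ MVN(mu, Sigma).
b     = [100; 50];                        % factor sensitivities f_{z_i}(t,Z_t)
c     = 0;                                % time-decay term f_t(t,Z_t)*Delta
mu    = [0; 0];
Sigma = [0.0004 0.0001; 0.0001 0.0009];
alpha = 0.99;
mL  = -c - b'*mu;                         % mean of the approximate loss
sL  = sqrt(b'*Sigma*b);                   % std dev of the approximate loss
VaR = mL + sL*norminv(alpha)
ES  = mL + sL*normpdf(norminv(alpha))/(1-alpha)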
39 (Section 3)
Evaluating Risk Measurement Techniques
Important for any risk manager to constantly evaluate the reported risk measures.
e.g. If a daily 95% VaR is reported then we should see daily losses exceeding the reported VaR approximately 5% of the time, i.e.
(1/n) Σ_{i=1}^n 1{Li > VaRi } ≈ 5%
where VaRi and Li are the reported VaR and realized loss for period i.
Can use standard statistical tests to see if this is indeed the case.
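For example, a minimal Matlab sketch of a simple exceedance count and a one-sided binomial test; the reported VaRs and realized losses below are placeholders.

% Simple VaR backtest: count exceedances and compare with the expected number.
alpha = 0.95; m = 500;
VaR_reported = 1.65*ones(m,1);                % placeholder reported daily 95% VaRs
L_realized   = randn(m,1);                    % placeholder realized daily losses
I = (L_realized > VaR_reported);              % exceedance indicators
numExceed = sum(I)                            % should be close to m*(1-alpha)
pval = 1 - binocdf(numExceed-1, m, 1-alpha)   % prob of seeing this many or more exceedances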
41 (Section 4)
IEOR E4602: Quantitative Risk Management
Multivariate Distributions
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Joint and Marginal CDFs
Let X = (X1 , . . . , Xn ) be an n-dimensional vector of random variables.
Definition (Joint CDF): For all x = (x1 , . . . , xn )⊤ ∈ Rn , the joint cumulative distribution function (CDF) of X satisfies
FX (x) = FX (x1 , . . . , xn ) = P(X1 ≤ x1 , . . . , Xn ≤ xn ).
2 (Section 1)
Conditional CDFs
If X has a probability density function (PDF) then
FX (x1 , . . . , xn ) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f (u1 , . . . , un ) du1 . . . dun .
The covariance matrix, Σ := Cov(X), has (i, j)th element Σi,j := Cov(Xi , Xj ).
Important properties of Σ:
1. It is symmetric so that Σ> = Σ
2. Diagonal elements satisfy Σi,i ≥ 0
3. It is positive semi-definite so that x > Σx ≥ 0 for all x ∈ Rn .
The correlation matrix, ρ(X), has (i, j)th element ρij := Corr(Xi , Xj )
- also symmetric, positive semi-definite
- has 1’s along the diagonal.
4 (Section 1)
Linear Combinations and Characteristic Functions
For any matrix A ∈ Rk×n and vector a ∈ Rk we have
E[AX + a] = A E[X] + a and Cov(AX + a) = A Cov(X) A⊤ .
The characteristic function of X is φX (s) := E[ e^{i s⊤ X} ] for s ∈ Rn .
5 (Section 1)
The Multivariate Normal Distribution
If X is multivariate normal with mean vector µ and covariance matrix Σ then we write X ∼ MNn (µ, Σ).
The PDF of X is given by
f (x) = ( 1 / ( (2π)n/2 |Σ|1/2 ) ) exp( −(1/2)(x − µ)⊤ Σ−1 (x − µ) )   (4)
6 (Section 2)
The Multivariate Normal Distribution
Let X1 = (X1 , . . . , Xk )⊤ and X2 = (Xk+1 , . . . , Xn )⊤ be a partition of X with
µ = ( µ1 ; µ2 )  and  Σ = [ Σ11 Σ12 ; Σ21 Σ22 ].
Then
X2 | X1 = x1 ∼ MN( µ2.1 , Σ2.1 )
where µ2.1 = µ2 + Σ21 Σ11−1 (x1 − µ1 ) and Σ2.1 = Σ22 − Σ21 Σ11−1 Σ12 .
7 (Section 2)
Generating MN Distributed Random Vectors
Suppose we wish to generate X = (X1 , . . . , Xn ) where X ∼ MNn (0, Σ)
- it is then easy to handle the case where E[X] 6= 0.
If Z ∼ MNm (0, Im ) and C is an (n × m) matrix then CZ ∼ MNn (0, CC⊤ ), so it suffices to find a matrix C satisfying CC⊤ = Σ.
8 (Section 2)
The Cholesky Decomposition of a Symmetric PD Matrix
Any symmetric positive-definite matrix, M, may be written as
M = U> DU
where:
U is an upper triangular matrix
D is a diagonal matrix with positive diagonal elements.
In particular, taking M = Σ we can write
Σ = U⊤ D U
= (U⊤ √D)(√D U)
= (√D U)⊤ (√D U).
C = √D U therefore satisfies C⊤ C = Σ
- C is called the Cholesky Decomposition of Σ.
9 (Section 2)
The Cholesky Decomposition in Matlab
Easy to compute the Cholesky decomposition of a symmetric positive-definite
matrix in Matlab using the chol command
- so also easy to simulate multivariate normal random vectors in Matlab.
>> C = chol(Sigma);       % upper triangular C with C'*C = Sigma (Sigma defined previously)
>> Z = randn(3,1000000);  % standard normal draws, one column per sample
>> X = C'*Z;              % X ~ MN(0, Sigma)
>> cov(X')                % sample covariance - should be close to Sigma
ans =
0.9972 0.4969 0.4988
0.4969 1.9999 0.2998
0.4988 0.2998 1.4971
10 (Section 2)
The Cholesky Decomposition in Matlab and R
Must be very careful in Matlab and R to pre-multiply Z by C⊤ and not C.
11 (Section 2)
Normal-Mixture Models
Normal-mixture models are a class of models generated by introducing
randomness into the covariance matrix and / or the mean vector. In particular, a normal variance mixture is a random vector X with representation
X = µ + √W A Z
where
(i) Z ∼ MNk (0, Ik )
(ii) W ≥ 0 is a scalar random variable independent of Z and
(iii) A ∈ Rn×k and µ ∈ Rn are a matrix and vector of constants, respectively.
12 (Section 3)
Normal-Mixture Models
If we condition on W , then X is multivariate normally distributed
- this observation also leads to an obvious simulation algorithm for generating
samples of X.
We call µ and Σ := AA⊤ the location vector and dispersion matrix of the distribution.
Can create a two regime model by setting w2 large relative to w1 and choosing p
large
- then W = w1 can correspond to an ordinary regime
- and W = w2 corresponds to a stress regime.
15 (Section 3)
E.G. The Multivariate t Distribution
The multivariate t distribution with ν degrees-of-freedom (dof) is obtained when we take W to have an inverse gamma distribution, W ∼ Ig(ν/2, ν/2) - equivalently, ν/W ∼ χ2ν .
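As an illustration, a minimal Matlab sketch that simulates a multivariate t via this mixture representation, i.e. X = µ + √W A Z with W = ν/ξ and ξ ∼ χ2ν ; the parameters are illustrative.

% Simulating multivariate t_nu(mu, Sigma) as a normal variance mixture.
nu = 5; mu = [0; 0]; Sigma = [1 0.5; 0.5 2]; nSim = 100000;   % illustrative
A = chol(Sigma)';                        % lower-triangular A with A*A' = Sigma
Z = randn(2, nSim);                      % MN(0, I) draws, one column per sample
W = nu ./ chi2rnd(nu, 1, nSim);          % inverse-gamma mixing variable W = nu/xi
X = repmat(mu,1,nSim) + repmat(sqrt(W),2,1).*(A*Z);   % columns are t samples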
16 (Section 3)
Characteristic Function of a Normal Variance Mixture
We have
φX (s) = E[ e^{i s⊤ X} ] = E[ E[ e^{i s⊤ X} | W ] ]
= E[ e^{i s⊤ µ − (1/2) W s⊤ Σ s} ]
= e^{i s⊤ µ} Ŵ( (1/2) s⊤ Σ s )
where Ŵ(θ) := E[ e^{−θ W} ] is the Laplace transform of W . We therefore write X ∼ Mn (µ, Σ, Ŵ).
17 (Section 3)
Affine Transformations of Normal Variance Mixtures
Proposition: If X ∼ Mn (µ, Σ, Ŵ) and Y = BX + b for B ∈ Rk×n and b ∈ Rk then Y ∼ Mk (Bµ + b, BΣB⊤ , Ŵ).
18 (Section 3)
Normal Mean-Variance Mixtures
Could also define normal mixture distributions where µ = m(W ).
19 (Section 3)
Spherical Distributions
Recall that a linear transformation U ∈ Rn×n is orthogonal if UU⊤ = U⊤ U = In .
Definition: A random vector X has a spherical distribution if for every orthogonal U
UX ∼ X.   (5)
20 (Section 4)
Spherical Distributions
Theorem: The following are equivalent:
1. X is spherical.
2. There exists a function ψ(·) such that for all s ∈ Rn , φX (s) = ψ(s⊤ s).
3. For all a ∈ Rn
a⊤ X ∼ ||a|| X1
where ||a||2 = a⊤ a = a12 + · · · + an2 .
21 (Section 4)
Example: Multivariate Normal
Let X ∼ MNn (0, In ). Then
φX (s) = e^{−(1/2) s⊤ s}
so X is spherical with ψ(t) = e^{−t/2} .
22 (Section 4)
Example: Normal Variance Mixtures
Suppose X ∼ Mn (0, In , Ŵ)
- so X has a standardized, uncorrelated normal variance mixture distribution. Then φX (s) = Ŵ( (1/2) s⊤ s ), so X is spherical with ψ(t) = Ŵ(t/2).
Note there are spherical distributions that are not normal variance mixture
distributions.
23 (Section 4)
Spherical Distributions
Theorem: The random vector X = (X1 , . . . , Xn ) has a spherical distribution if
and only if it has the representation
X ∼ RS
where:
1. S is uniformly distributed on the unit sphere: S n−1 := {s ∈ Rn : s> s = 1}
and
2. R ≥ 0 is a random variable independent of S.
24 (Section 4)
Elliptical Distributions
Definition: The random vector X = (X1 , . . . Xn ) has an elliptical distribution if
X ∼ µ + AY
where Y is a spherical random vector in Rk , A ∈ Rn×k is a constant matrix and µ ∈ Rn is a constant vector.
25 (Section 4)
Characteristic Function of Elliptical Distributions
Easy to calculate the characteristic function of an elliptical distribution:
φX (s) = E[ e^{i s⊤ (µ + AY)} ]
= e^{i s⊤ µ} E[ e^{i (A⊤ s)⊤ Y} ]
= e^{i s⊤ µ} ψ( s⊤ Σ s )
where Σ := AA⊤ .
26 (Section 4)
IEOR E4602: Quantitative Risk Management
Dimension Reduction Techniques
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Factor Models
Calibration Approaches
Factor Models in Risk Management
2 (Section 0)
Principal Components Analysis
Let Y = (Y1 . . . Yn )⊤ denote an n-dimensional random vector with variance-covariance matrix, Σ.
The goal of PCA is to construct uncorrelated linear combinations P1 , . . . , Pn of the Yi 's - the principal components - such that
(i) P1 explains the largest percentage of the total variability in the system
and
(ii) each Pi explains the largest percentage of the total variability in the system that has not already been explained by P1 , . . . , Pi−1 .
4 (Section 1)
Principal Components Analysis
In practice common to apply PCA to normalized random variables so that
E[Yi ] = 0 and Var(Yi ) = 1
- can normalize by subtracting the means from the original random variables
and then dividing by their standard deviations.
5 (Section 1)
Spectral Decomposition
The spectral decomposition states that any symmetric matrix, A ∈ Rn×n , can be
written as
A = Γ ∆ Γ⊤   (1)
where:
(i) ∆ is a diagonal matrix, diag(λ1 , . . . , λn ), of the eigenvalues of A
- without loss of generality ordered so that λ1 ≥ λ2 ≥ · · · ≥ λn
(ii) Γ is an orthogonal matrix whose i th column, γi , is an eigenvector of A corresponding to λi .
6 (Section 1)
Spectral Decomposition
Since Σ is symmetric we can take A = Σ in (1) and define the vector of principal components
P = Γ⊤ Y.   (2)
Note that:
(a) E[P] = 0 since E[Y] = 0
and
(b) Cov(P) = Γ⊤ Σ Γ = Γ⊤ (Γ ∆ Γ⊤ ) Γ = ∆
7 (Section 1)
Factor Loadings
The matrix Γ⊤ is called the matrix of factor loadings. Inverting (2) we also obtain
Y = Γ P.   (3)
8 (Section 1)
Explaining The Total Variance
Can measure the ability of the first few principal components to explain the total
variability in the system:
Σ_{i=1}^n Var(Pi ) = Σ_{i=1}^n λi = trace(Σ) = Σ_{i=1}^n Var(Yi ).   (4)
If we take Σ_{i=1}^n Var(Pi ) = Σ_{i=1}^n Var(Yi ) to measure the total variability, then by (4) we can interpret
( Σ_{i=1}^k λi ) / ( Σ_{i=1}^n λi )
as the percentage of total variability explained by first k principal components.
9 (Section 1)
Explaining The Total Variance
Can also show that the first principal component, P1 = γ1⊤ Y, satisfies
Var(P1 ) = max{ Var(a⊤ Y) : a⊤ a = 1 }
and that each successive principal component, Pi = γi⊤ Y, satisfies the same optimization problem but with the added constraint that it be orthogonal, i.e. uncorrelated, to P1 , . . . , Pi−1 .
10 (Section 1)
Financial Applications of PCA
In financial applications, often the case that just two or three principal
components are sufficient to explain anywhere from 60% to 95% or more of the
total variability
- and often possible to interpret the first two or three components.
e.g. If Y represents (normalized) changes in the spot interest rate for n different
maturities, then:
1. 1st principal component can usually be interpreted as the (approximate)
change in overall level of the yield curve
2. 2nd component represents change in slope of the curve
3. 3rd component represents change in curvature of the curve.
Important (why?) that these observations are from a stationary time series, e.g. asset returns or yield changes, but not price levels, which are generally non-stationary.
Given observations Xt = (Xt1 , . . . , Xtn ) for t = 1, . . . , m, we therefore work with the normalized data
Ytj = ( Xtj − µj ) / σj   for t = 1, . . . , m and j = 1, . . . , n
where µj and σj are the mean and standard deviation of the j th risk factor.
12 (Section 1)
Empirical PCA
Let Σ be the sample variance-covariance matrix so that
Σ = (1/m) Σ_{t=1}^m Yt Yt⊤ .
From (3), see that original data obtained from principal components as
Xt = diag(σ1 , . . . , σn ) Yt + µ
= diag(σ1 , . . . , σn ) Γ Pt + µ (5)
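A minimal Matlab sketch of empirical PCA via the spectral decomposition; the data below are simulated purely for illustration.

% Empirical PCA of normalized risk-factor changes.
m = 1000; n = 5;
B = 0.5*ones(n) + 0.5*eye(n);            % illustrative mixing matrix to induce correlation
Xdata = randn(m, n)*B;                   % illustrative m x n matrix of factor changes
mu_hat  = mean(Xdata);
sig_hat = std(Xdata);
Y = (Xdata - repmat(mu_hat,m,1)) ./ repmat(sig_hat,m,1);   % normalized data
SigmaHat = (Y'*Y)/m;                     % sample covariance of the Y_t's
[Gamma, Delta] = eig(SigmaHat);          % SigmaHat = Gamma*Delta*Gamma'
[lambda, idx] = sort(diag(Delta), 'descend');
Gamma = Gamma(:, idx);                   % eigenvectors ordered by eigenvalue
P = Y*Gamma;                             % rows are P_t' = (Gamma'*Y_t)'
explained = cumsum(lambda)/sum(lambda)   % fraction of variance explained by first k PCs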
13 (Section 1)
Applications of PCA: Building Factor Models
If first k principal components explain sufficiently large amount of total variability
then may partition the n × n matrix Γ according to Γ = [Γ1 Γ2 ] where Γ1 is
n × k and Γ2 is n × (n − k).
We can then write
Xt+1 = µ + diag(σ1 , . . . , σn ) Γ1 P(1)t+1 + εt+1   (6)
where P(1)t+1 and P(2)t+1 contain the first k and remaining n − k principal components, respectively, and
εt+1 := diag(σ1 , . . . , σn ) Γ2 P(2)t+1   (7)
represents an error term.
Can interpret (6) as a k-factor model for the changes in risk factors, X
- but then take εt+1 as an uncorrelated noise vector and ignore (7).
14 (Section 1)
Applications of PCA: Scenario Generation
Easy to generate scenarios using PCA.
Suppose today is date t and we want to generate scenarios over the period
[t, t + 1].
Can then use (6) to apply stresses to first few principal components, either singly
or jointly, to generate loss scenarios.
Moreover, know that Var(Pi ) = λi so can easily control severity of the stresses.
15 (Section 1)
Applications of PCA: Estimating VaR and CVaR
Can use the k-factor model and Monte-Carlo to simulate portfolio returns.
16 (Section 1)
E.G. An Analysis of (Risk-Free) Yield Curves
We use daily yields on U.S. Treasury bonds at 11 maturities: T = 1, 3, and
6 months and 1, 2, 3, 5, 7, 10, 20, and 30 years.
Time period is January 2, 1990, to October 31, 2008.
We use PCA to study how the curves change from day to day.
To analyze daily changes in yields, all 11 time series were differenced.
Daily yields were missing from some values of T for various reasons
- e.g. the 20-year constant maturity series was discontinued at the end of 1986
and reinstated on October 1, 1993.
All days with missing values of the differenced data were omitted.
- this left 819 days of data starting on July 31, 2001, when the one-month
series started and ending on October 31, 2008, with the exclusion of the
period February 19, 2002 to February 2, 2006 when the 30-year Treasury was
discontinued.
The covariance matrix rather than the correlation matrix was used
- which is fine here because the variables are comparable and in the same units.
17 (Section 1)
Figure 18.1 from Ruppert and Matteson: (a) Treasury yields on three dates. (b) Scree plot for the changes
in Treasury yields. Note that the first three principal components have most of the variation, and the first five
have virtually all of it. (c) The first three eigenvectors for changes in the Treasury yields. (d) The first three
eigenvectors for changes in the Treasury yields in the range 0 ≤ T ≤ 3.
Figure 18.2 from Ruppert and Matteson: (a) The mean yield curve plus and minus the first eigenvector. (b)
The mean yield curve plus and minus the second eigenvector. (c) The mean yield curve plus and minus the
third eigenvector. (d) The fourth and fifth eigenvectors for changes in the Treasury yields.
E.G. An Analysis of (Risk-Free) Yield Curves
Would actually be interested in the behavior of the yield changes over time.
But time series analysis based on the changes in the 11 yields would be
problematic.
- better approach would be to use first three principal components.
Their time series and auto- and cross-correlation plots are shown in Figs.
18.3 and 18.4, respectively.
Notice that lag-0 cross-correlations are zero; this is not a coincidence! Why?
Cross-correlations at nonzero lags are not zero, but in this example they are
small
- practical implication is that parallel shifts, changes in slopes, and changes in
convexity are nearly uncorrelated and could be analyzed separately.
The time series plots show substantial volatility clustering which could be
modeled using GARCH models.
20 (Section 1)
Figure 18.3 from Ruppert and Matteson: Time series plots of the first three principal
components of the Treasury yields. There are 819 days of data, but they are not consecutive
because of missing data; see text.
Figure 18.4 from Ruppert and Matteson: Sample auto- and cross-correlations of the first three principal
components of the Treasury yields.
Applications of PCA: Portfolio Immunization
Also possible to hedge or immunize a portfolio against moves in the principal
components.
Let Vt = time t value of portfolio and assume our hedge will consist of positions,
φti , in the securities with time t prices, Sti , for i = 1, . . . , k.
23 (Section 1)
Applications of PCA: Portfolio Immunization
∆V∗t+1 ≈ Σ_{j=1}^n ( ∂Vt /∂Ztj + Σ_{i=1}^k φti ∂Sti /∂Ztj ) ∆Z(t+1)j
= Σ_{j=1}^n ( ∂Vt /∂Ztj + Σ_{i=1}^k φti ∂Sti /∂Ztj ) X(t+1)j
≈ Σ_{j=1}^n ( ∂Vt /∂Ztj + Σ_{i=1}^k φti ∂Sti /∂Ztj ) ( µj + σj Σ_{l=1}^k Γjl Pl )
= Σ_{j=1}^n ( ∂Vt /∂Ztj + Σ_{i=1}^k φti ∂Sti /∂Ztj ) µj
+ Σ_{l=1}^k [ Σ_{j=1}^n ( ∂Vt /∂Ztj + Σ_{i=1}^k φti ∂Sti /∂Ztj ) σj Γjl ] Pl   (8)
where ∆V∗t+1 denotes the change in value of the hedged portfolio.
24 (Section 1)
Applications of PCA: Portfolio Immunization
Can now use (8) to hedge the risk associated with the first k principal
components.
In particular, we solve for the φtl ’s so that the coefficients of the Pl ’s in (8) are
zero
- a system of k linear equations in k unknowns so it is easily solved.
If we include an additional hedging asset then could also ensure that total value
of hedged portfolio is equal to value of original un-hedged portfolio.
25 (Section 1)
Factor Models
Definition: We say the random vector X = (X1 . . . Xn )> follows a linear
k-factor model if it satisfies
X = a + BF + ε   (9)
where
(i) F = (F1 . . . Fk )> is a random vector of common factors with k < n and
with a positive-definite covariance matrix;
(ii) ε = (ε1 , . . . , εn ) is a random vector of idiosyncratic error terms which are
uncorrelated and have mean zero;
(iii) B is an n × k constant matrix of factor loadings, and a is an n × 1 vector of
constants;
(iv) Cov(Fi , j ) = 0 for all i, j.
26 (Section 2)
Factor Models
If X ∼ MN(·, ·) and follows (9) then possible to find a version of the model
where F ∼ MN(·, ·) and ∼ MN(·, ·).
In this case the error terms, i , are independent.
27 (Section 2)
Exercise
Show that if (9) holds then there is also a representation
X = µ + B∗ F∗ + ε   (10)
where
E[X] = µ and
Cov(F∗ ) = Ik
28 (Section 2)
Example: Factor Models Based on Principal Components
Factor model of (6) may be interpreted as a k-factor model with
F = P(1) and
B = diag(σ1 , . . . , σn ) Γ1 .
29 (Section 2)
Calibration Approaches
Three different types of factor models:
1. Observable Factor Models
Factors, Ft , have been identified in advance and are observable.
They typically have a fundamental economic interpretation e.g. a 1-factor
model where market index plays role of the single factor.
These models are usually calibrated and tested for goodness-of-fit using
multivariate or time-series regression techniques.
e.g. A model with factors constructed from change in rate of inflation,
equity index return, growth in GDP, interest rate spreads etc.
2. Cross-Sectional Factor Models
Factors are unobserved and therefore need to be estimated.
The factor loadings, Bt , are observed, however.
e.g. A model with dividend yield, oil and tech factors. We assume the factor
returns are unobserved but the loadings are known. Why?
e.g. BARRA’s factor models are generally cross-sectional factor models.
3. Statistical Factor Models
Both factors and loadings need to be estimated.
Two standard methods for doing this: factor analysis and PCA.
30 (Section 2)
Factor Models in Risk Management
Straightforward to use a factor model to manage risk.
For a given portfolio composition and fixed matrix, B, of factor loadings, the
sensitivity of the total portfolio value to each factor, Fi for i = 1, . . . , k, is easily
computed.
Can then adjust portfolio composition to achieve desired overall factor sensitivity.
Process easier to understand and justify when the factors are easy to interpret.
When this is not the case then the model is purely statistical.
Tends to occur when statistical methods such as factor analysis or PCA are
employed
But still possible even then for identified factors to have an economic
interpretation.
31 (Section 2)
E.G. Scenario Analysis for an Options Portfolio
Mr. Smith has a portfolio consisting of various stock positions as well as a
number of equity options.
He would like to perform a basic scenario analysis with just two factors:
1. An equity factor, Feq , representing the equity market.
2. A volatility factor, Fvol , representing some general implied volatility factor.
Can perform the scenario analysis by stressing combinations of the factors and
computing the P&L resulting from each scenario.
Of course using just two factors for such a portfolio will result in a scenario
analysis that is quite coarse but in many circumstances this may be sufficient.
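As a sketch of what such a coarse analysis might look like, the Matlab fragment below evaluates an (assumed) delta-gamma-vega style P&L approximation over a grid of equity and volatility stresses; the net sensitivities and stress ranges are illustrative placeholders, not Mr. Smith's actual portfolio.

% Two-factor scenario analysis: approximate P&L over a grid of stresses.
netDelta = 5000; netGamma = -200; netVega = 15000;   % illustrative net Greeks
S0       = 100;                                      % illustrative index level
eqMoves  = -0.20:0.05:0.20;      % stresses to F_eq (index returns)
volMoves = -0.10:0.05:0.10;      % stresses to F_vol (absolute vol moves)
PnL = zeros(length(volMoves), length(eqMoves));
for i = 1:length(volMoves)
    for j = 1:length(eqMoves)
        dS = S0*eqMoves(j);
        PnL(i,j) = netDelta*dS + 0.5*netGamma*dS^2 + netVega*volMoves(i);
    end
end
PnL      % rows index the vol stresses, columns the equity stresses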
32 (Section 2)
IEOR E4602: Quantitative Risk Management
Introduction to Copulas
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
References: Chapter 7 of 2nd ed. of QRM by McNeil, Frey and Embrechts, Chapter 8 of SDAFA by
Ruppert and Matteson, and the book chapter “Coping With Copulas” by Thorsten Schmidt.
Outline
Introduction
Main Results
Sklar’s Theorem
Copula Invariance Under Monotonic Transformations
The Fréchet-Hoeffding Bounds
Other Examples of Copulas
Measures of Dependence
Spearman’s Rho
Kendall’s Tau
Tail Dependence
Simulating the Gaussian and t Copulas
Estimating Copulas
A Financial Application: Pricing CDOs
A Single-Period CDO
Multi-period CDO’s
Synthetic CDO’s
Calibrating the Gaussian Copula Model
2 (Section 0)
Why Study Copulas?
Copulas separate the marginal distributions from the dependency structure
in a given multivariate distribution.
Copulas help expose the various fallacies associated with correlation.
Copulas play an important role in pricing securities that depend on many
underlying securities
- e.g. equity basket options, collateralized debt obligations (CDO’s),
n th -to-default options.
3 (Section 1)
The Definition of a Copula
Definition: A d-dimensional copula, C : [0, 1]d → [0, 1], is a cumulative distribution function (CDF) with uniform marginals.
Recall the definition of the quantile function or generalized inverse: for a CDF, F , the generalized inverse, F ← , is defined as
F ← (u) := inf{x ∈ R : F (x) ≥ u}.
If U ∼ U [0, 1] then P(FX← (U ) ≤ x) = FX (x) and, when FX is continuous,
FX (X ) ∼ U [0, 1].
5 (Section 1)
Sklar’s Theorem (1959)
Let X = (X1 , . . . , Xd ) be a multivariate random vector with CDF FX and with
continuous marginals.
Then (why?) the joint distribution of FX1 (X1 ), . . . , FXd (Xd ) is a copula, CX .
6 (Section 2)
Sklar’s Theorem (1959)
Consider a d-dimensional CDF, F , with marginals F1 , . . . , Fd . Then there exists a copula, C , such that
F (x1 , . . . , xd ) = C ( F1 (x1 ), . . . , Fd (xd ) )   for all x1 , . . . , xd .
7 (Section 2)
An Example
Let Y and Z be two IID random variables each with CDF, F (·).
Let X1 := min(Y , Z ) and X2 := max(Y , Z ).
We then have
P(X1 ≤ x1 , X2 ≤ x2 ) = 2 F (min{x1 , x2 }) F (x2 ) − F (min{x1 , x2 })2
8 (Section 2)
An Example
First note the two marginals satisfy FX1 (x) = 2F (x) − F (x)2 and FX2 (x) = F (x)2 .
9 (Section 2)
When the Marginals Are Continuous
Suppose the marginal distributions, F1 , . . . , Fn , are continuous. Then can show that the copula of F is unique and is given by
C (u1 , . . . , un ) = F ( F1−1 (u1 ), . . . , Fn−1 (un ) ).   (4)
10 (Section 2)
Invariance of the Copula Under Monotonic Transformations
Proposition: Suppose the random variables X1 , . . . , Xd have continuous
marginals and copula, CX . Let Ti : R → R, for i = 1, . . . , d be strictly
increasing functions.
Then the dependence structure of the random variables
Y1 := T1 (X1 ), . . . , Yd := Td (Xd )
is also given by the copula CX .
Sketch of proof when the Tj ’s are continuous and the FXj−1 ’s exist:
First note that
FY (y1 , . . . , yd ) = P(T1 (X1 ) ≤ y1 , . . . , Td (Xd ) ≤ yd )
= P(X1 ≤ T1−1 (y1 ), . . . , Xd ≤ Td−1 (yd ))
= FX (T1−1 (y1 ), . . . , Td−1 (yd )) (5)
so that (why?) FYj (yj ) = FXj (Tj−1 (yj )).
11 (Section 2)
Invariance of the Copula Under Monotonic Transformations
Now to the proof:
CY (u1 , . . . , ud ) = FY ( FY1−1 (u1 ), . . . , FYd−1 (ud ) )   by (4)
= FX ( T1−1 (FY1−1 (u1 )), . . . , Td−1 (FYd−1 (ud )) )   by (5)
= FX ( FX1−1 (u1 ), . . . , FXd−1 (ud ) )   by (6)
= CX (u1 , . . . , ud )
and so CX = CY .
12 (Section 2)
The Fréchet-Hoeffding Bounds
Theorem: Consider a copula C (u) = C (u1 , . . . , ud ). Then
max{ 1 − d + Σ_{i=1}^d ui , 0 } ≤ C (u) ≤ min{u1 , . . . , ud }.
13 (Section 2)
Tightness of the Fréchet-Hoeffding Bounds
The upper Fréchet-Hoeffding bound is tight for all d.
Fréchet and Hoeffding showed independently that copulas always lie between
these bounds
- these correspond to the extremes of dependency, i.e. comonotonicity and countermonotonicity.
14 (Section 2)
Comonotonicity and The Perfect Dependence Copula
The comonotonic copula is given by
M (u) := min{u1 , . . . , ud }
15 (Section 2)
Countermonotonic Random Variables
The countermonotonic copula is the 2-dimensional copula that is the
Fréchet-Hoeffding lower bound.
It satisfies
W (u1 , u2 ) = max{u1 + u2 − 1, 0} (7)
and corresponds to the case of perfect negative dependence.
Can check that (7) is the joint distribution of (U , 1 − U ) where U ∼ U (0, 1).
Question: Why is the Fréchet-Hoeffding lower bound not a copula for d > 2?
16 (Section 2)
The Independence Copula
The independence copula satisfies
Π(u) = ∏_{i=1}^d ui .
And random variables are independent if and only if their copula is the
independence copula
- follows immediately from Sklar’s Theorem.
17 (Section 2)
The Gaussian Copula
Recall when the marginals are continuous we have from (4) that the copula of F is C (u) = F ( F1−1 (u1 ), . . . , Fd−1 (ud ) ).
The Gaussian copula, CPGauss , is the copula of X ∼ MNd (0, P), where P is a correlation matrix, i.e.
CPGauss (u) = ΦP ( Φ−1 (u1 ), . . . , Φ−1 (ud ) )
with ΦP the joint CDF of X and Φ the standard normal CDF.
If Y ∼ MNd (µ, Σ) with Corr(Y) = P, then Y has (why?) the same copula as X
- hence a Gaussian copula is fully specified by a correlation matrix, P.
18 (Section 3)
The t Copula
Recall X = (X1 , . . . , Xd ) has a multivariate t distribution with ν dof if
X = Z / √(ξ/ν)
where Z ∼ MNd (0, Σ) and ξ ∼ χ2ν is independent of Z.
The t copula, Cν,Pt , is then the copula of X when Σ = P is a correlation matrix.
19 (Section 3)
The Bivariate Gumbel Copula
The bivariate Gumbel copula is defined as
CθGu (u1 , u2 ) := exp( −[ (− ln u1 )θ + (− ln u2 )θ ]1/θ )   for θ ≥ 1.
20 (Section 3)
[Scatter plots of bivariate samples for θ = 1.1, 1.5, 2, 4, 8 and 50 appear here.]
Figure 8.4 from Ruppert and Matteson: Bivariate random samples of size 200 from various
Gumbel copulas.
The Bivariate Clayton Copula
The bivariate Clayton copula is defined as
CθCl (u1 , u2 ) := ( u1−θ + u2−θ − 1 )−1/θ   for θ ∈ [−1, ∞) \ {0}.
23 (Section 3)
[Scatter plots of bivariate samples for θ = −0.98, −0.7, −0.3, 5, 15 and 100, among other values, appear here.]
Figure 8.3 from Ruppert and Matteson: Bivariate random samples of size 200 from various
Clayton copulas.
Measures of Dependence
There are three principal measures of dependence:
1. The usual Pearson, i.e. linear, correlation coefficient
- invariant under positive linear transformations, but not under general strictly
increasing transformations
- there are many fallacies associated with the Pearson correlation
- not defined unless second moments exist.
2. Rank correlations
- only depend on the unique copula of the joint distribution
- therefore (why?) invariant to strictly increasing transformations
- also very useful for calibrating copulas to data.
3. Coefficients of tail dependence
- discussed later in this section.
25 (Section 4)
Fallacies of The Correlation Coefficient
Each of the following statements is false!
1. The marginal distributions and correlation matrix are enough to determine
the joint distribution
- how would we find a counterexample?
2. Given two marginal distributions, every correlation value in [−1, 1] is attainable by a suitable joint distribution.
3. The VaR of the sum of two risks is largest when the two risks have maximal correlation.
Definition: We say two random variables, X1 and X2 , are of the same type if
there exist constants a > 0 and b ∈ R such that
X1 ∼ aX2 + b.
26 (Section 4)
On Fallacy #2
Theorem: Let (X1 , X2 ) be a random vector with finite-variance marginal CDF’s
F1 and F2 , respectively, and an unspecified joint CDF.
Assuming Var(X1 ) > 0 and Var(X2 ) > 0, then the following statements hold:
1. The set of attainable correlations is a closed interval [ρmin , ρmax ] with ρmin < 0 < ρmax .
2. ρmin is attained if and only if X1 and X2 are countermonotonic; ρmax is attained if and only if X1 and X2 are comonotonic.
3. ρmin = −1 if and only if X1 and −X2 are of the same type. ρmax = 1 if and only if X1 and X2 are of the same type.
The proof is not very difficult; see Section 7.2 of MFE for details.
27 (Section 4)
Spearman’s Rho
Definition: For random variables X1 and X2 with marginal CDFs F1 and F2 , Spearman’s rho is defined as ρS (X1 , X2 ) := ρ( F1 (X1 ), F2 (X2 ) ), where ρ denotes the usual Pearson correlation.
The Spearman’s rho matrix is simply the matrix of pairwise Spearman’s rho
correlations, ρ(Fi (Xi ), Fj (Xj )) – a positive-definite matrix. Why?
where (X̃1 , X̃2 ) is independent of (X1 , X2 ) but has same joint distribution as
(X1 , X2 ).
29 (Section 4)
Kendall’s Tau
Definition: Kendall’s tau is defined as ρτ (X1 , X2 ) := E[ sign( (X1 − X̃1 )(X2 − X̃2 ) ) ], with (X̃1 , X̃2 ) as above.
If X1 and X2 have continuous marginals then can show
ρτ (X1 , X2 ) = 4 ∫_0^1 ∫_0^1 C (u1 , u2 ) dC (u1 , u2 ) − 1.
Can also show that for a bivariate Gaussian copula, or more generally, if
X ∼ E2 (µ, P, ψ) and P(X = µ) = 0
ρτ (X1 , X2 ) = (2/π) arcsin(ρ)   (10)
where ρ = P12 = P21 is the Pearson correlation coefficient.
Note (10) very useful for estimating ρ with fat-tailed elliptical distributions
- generally provides much more robust estimates of ρ than usual Pearson
estimator
- see figure on next slide where each estimate was constructed from a sample
of n = 60 (simulated) data-points.
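A minimal Matlab sketch of this comparison for a single simulated sample; the sample size, dof and true correlation match the figure below, everything else is illustrative.

% Estimate the correlation of a bivariate t_3 via Kendall's tau and (10) vs the Pearson estimator.
nu = 3; rho = 0.5; n = 60;
P = [1 rho; rho 1];
Z = chol(P)'*randn(2,n);                             % MN(0,P) draws
X = Z ./ repmat(sqrt(chi2rnd(nu,1,n)/nu), 2, 1);     % bivariate t_3 sample (columns)
tau_hat = corr(X', 'type', 'Kendall');               % sample Kendall's tau
rho_kendall = sin(pi/2*tau_hat(1,2))                 % invert (10)
Rp = corr(X'); rho_pearson = Rp(1,2)                 % usual, less robust, estimator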
30 (Section 4)
Estimating Pearson’s correlation using the usual Pearson estimator versus using Kendall’s τ .
Underlying distribution was bivariate t with ν = 3 degrees-of-freedom and true Pearson
correlation ρ = 0.5.
Properties of Spearman’s Rho and Kendall’s Tau
Spearman’s rho and Kendall’s tau are examples of rank correlations in that, when
the marginals are continuous, they depend only on the bivariate copula and not
on the marginals
- they are invariant (why?) in this case under strictly increasing
transformations.
32 (Section 4)
Tail Dependence
Definition: Let X1 and X2 denote two random variables with CDF’s F1 and F2 ,
respectively. Then the coefficient of upper tail dependence, λu , is given by
λu := lim_{q→1} P( X2 > F2← (q) | X1 > F1← (q) )
provided the limit exists.
If λu > 0, then we say that X1 and X2 have upper tail dependence while if
λu = 0 we say they are asymptotically independent in the upper tail.
Lower tail dependence and asymptotically independent in the lower tail are
similarly defined using λl .
33 (Section 4)
[Figure: coefficients of tail dependence, λl = λu , plotted for ν = 1, 4, 25 and 250.]
35 (Section 5)
Simulating the t Copula
1. For an arbitrary covariance matrix, Σ, let P be its corresponding correlation
matrix.
2. Generate X ∼ MNd (0, P).
3. Generate ξ ∼ χ2ν independent of X.
4. Return U = ( tν (X1 /√(ξ/ν)), . . . , tν (Xd /√(ξ/ν)) ) where tν is the CDF of a univariate t distribution with ν degrees-of-freedom.
The distribution of U is the t copula Cν,Pt (u)
- this is also (why?) the copula of X.
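A minimal Matlab sketch simulating both the Gaussian and the t copulas for a given correlation matrix P; the inputs are illustrative.

% Simulating from the Gaussian and t copulas with correlation matrix P.
P = [1 0.7; 0.7 1]; nu = 4; nSim = 5000;     % illustrative inputs
A = chol(P)';                                % A*A' = P
Z = A*randn(2, nSim);                        % MN(0,P) draws
Ugauss = normcdf(Z);                         % Gaussian copula samples (columns)
xi = chi2rnd(nu, 1, nSim);
T  = Z ./ repmat(sqrt(xi/nu), 2, 1);         % multivariate t_nu(0,P) draws
Ut = tcdf(T, nu);                            % t copula samples (columns)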
36 (Section 5)
Estimating / Calibrating Copulas
There are several related methods that can be used for estimating copulas:
1. Maximum likelihood estimation (MLE)
2. Pseudo-MLE of which there are two types:
- parametric pseudo-MLE
- semiparametric pseudo-MLE
3. Moment-matching methods are also sometimes used
- they can also be used for finding starting points for (pseudo) MLE.
37 (Section 6)
Maximum Likelihood Estimation
Let Y = (Y1 . . . Yd )> be a random vector and suppose we have parametric
models FY1 (· | θ 1 ), . . . , FYd (· | θ d ) for the marginal CDFs, with densities fYj (· | θ j ), and a parametric copula density cY (· | θ C ). Given IID data y1 , . . . , yn , the log-likelihood is then
Σ_{i=1}^n [ log cY ( FY1 (yi,1 | θ 1 ), . . . , FYd (yi,d | θ d ) | θ C ) + Σ_{j=1}^d log fYj (yi,j | θ j ) ]   (12)
38 (Section 6)
Maximum Likelihood Estimation
The ML estimators θ̂ 1 , . . . , θ̂ d , θ̂ C are obtained by maximizing (12).
39 (Section 6)
Pseudo-Maximum Likelihood Estimation
The pseudo-MLE approach has two steps:
1. First estimate the marginal CDFs to obtain F̂Yj for j = 1, . . . , d. Can do
this using either:
The empirical CDF of y1,j , . . . , yn,j so that
F̂Yj (y) = ( 1/(n + 1) ) Σ_{i=1}^n 1{yi,j ≤ y}
A parametric model with θ̂ j obtained using usual MLE approach.
2. Then estimate the copula parameters θ C by maximizing
Σ_{i=1}^n log cY ( F̂Y1 (yi,1 ), . . . , F̂Yd (yi,d ) | θ C )   (13)
Proposition: Suppose Y has continuous marginals and a Gaussian copula CΩGauss , where Ω is a correlation matrix. Then
ρτ (Yi , Yj ) = (2/π) arcsin(Ωi,j )   (14)
and
ρS (Yi , Yj ) = (6/π) arcsin(Ωi,j /2) ≈ Ωi,j .   (15)
If instead Y has a meta-t distribution with continuous univariate marginal distributions and copula Cν,Ωt then (14) still holds but (15) does not.
Question: There are several ways to use this Proposition to fit meta Gaussian
and t copulas. What are some of them?
41 (Section 6)
Collateralized Debt Obligations (CDO’s)
Want to find the expected losses in a simple 1-period CDO with the following
characteristics:
The maturity is 1 year.
There are N = 125 bonds in the reference portfolio.
Each bond pays a coupon of one unit after 1 year if it has not defaulted.
The recovery rate on each defaulted bond is zero.
There are 3 tranches of interest: the equity tranche, which absorbs the first 3 units of loss, the mezzanine tranche, which absorbs the next 3 units, and the senior tranche, which absorbs the remaining 119 units.
42 (Section 7)
Collateralized Debt Obligations (CDO’s)
Xi is the normalized asset value of the i th credit and we assume
Xi = √ρ M + √(1 − ρ) Zi   (16)
where M , Z1 , . . . , ZN are IID standard normal random variables
- note the correlation between each pair of asset values is identical.
The i th credit defaults if Xi ≤ x̄i . Since the probability, q, of default is identical across all bonds we must therefore have
x̄1 = · · · = x̄N = Φ−1 (q).   (17)
It now follows from (16) and (17) that
P(i defaults | M ) = P(Xi ≤ x̄i | M )
= P( √ρ M + √(1 − ρ) Zi ≤ Φ−1 (q) | M )
= P( Zi ≤ ( Φ−1 (q) − √ρ M ) / √(1 − ρ) | M ).
43 (Section 7)
Collateralized Debt Obligations (CDO’s)
Therefore conditional on M , the total number of defaults is Bin(N , qM ) where
qM := Φ( ( Φ−1 (q) − √ρ M ) / √(1 − ρ) ).
That is,
p(k | M ) = C(N , k) qM^k (1 − qM )^(N −k)
where C(N , k) denotes the binomial coefficient.
Can now compute expected (risk-neutral) loss on each of the three tranches:
44 (Section 7)
Collateralized Debt Obligations (CDO’s)
E0Q [Equity tranche loss] = 3 × P(3 or more defaults) + Σ_{k=1}^2 k P(k defaults)
E0Q [Mezz tranche loss] = 3 × P(6 or more defaults) + Σ_{k=1}^2 k P(k + 3 defaults)
E0Q [Senior tranche loss] = Σ_{k=1}^{119} k P(k + 6 defaults).
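A minimal Matlab sketch that evaluates these expected tranche losses by numerically integrating the conditional binomial distribution over M ∼ N(0, 1); the default probability q and correlation ρ below are illustrative placeholders.

% Expected tranche losses in the one-period Gaussian-copula model.
N = 125; q = 0.01; rho = 0.3;                   % illustrative q and rho
Mgrid = -6:0.01:6; w = normpdf(Mgrid)*0.01;     % quadrature grid and weights for M
qM = normcdf((norminv(q) - sqrt(rho)*Mgrid) ./ sqrt(1-rho));
pk = zeros(1, N+1);                             % unconditional P(k defaults), k = 0..N
for k = 0:N
    pk(k+1) = sum(binopdf(k, N, qM) .* w);
end
EL_equity = 3*sum(pk(4:end)) + sum((1:2).*pk(2:3));   % first 3 units of loss
EL_mezz   = 3*sum(pk(7:end)) + sum((1:2).*pk(5:6));   % losses 4 to 6
EL_senior = sum((1:119).*pk(8:end));                  % losses 7 to 125
[EL_equity EL_mezz EL_senior]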
45 (Section 7)
Collateralized Debt Obligations (CDO’s)
Regardless of the individual default probability, q, and correlation, ρ, we have:
E0Q [% Equity tranche loss] ≥ E0Q [% Mezz tranche loss] ≥ E0Q [% Senior tranche loss].
Expected senior tranche loss (with upper attachment point of 100%) always
increasing in ρ.
46 (Section 7)
Expected Tranche Losses As a Function of ρ
Collateralized Debt Obligations (CDO’s)
Question: How does the total expected loss in the portfolio vary with ρ?
The dependence structure we used in (16) to link the default events of the various bonds is the famous Gaussian-copula model.
48 (Section 7)
Multi-period CDO’s
Will now assume:
There are N credits in the reference portfolio.
Each credit has a notional amount of Ai .
If the i th credit defaults, then the portfolio incurs a loss of Ai × (1 − Ri )
- Ri is the recovery rate
- we assume Ri is fixed and known.
The default time of i th credit is Exp(λi ) with CDF Fi
- λi easily estimated from either CDS spreads or prices of corporate bonds
- so can compute Fi (t), the risk-neutral probability that i th credit defaults
before time t.
49 (Section 7)
Multi-period CDO’s
Again Xi is the normalized asset value of the i th credit and we assume
Xi = ai M + √(1 − ai2 ) Zi .   (18)
Let F (t1 , . . . , tN ) denote the joint distribution of the default times of the N credits. Then assume
F (t1 , . . . , tN ) = ΦP ( Φ−1 (F1 (t1 )), . . . , Φ−1 (FN (tN )) )
where ΦP (·) is the multivariate normal CDF with mean 0 and correlation matrix P
- so the distribution of default times has a Gaussian copula!
50 (Section 7)
Computing the Portfolio Loss Distribution
In order to price credit derivatives, we need to compute the portfolio loss
distribution.
51 (Section 7)
Computing the Portfolio Loss Distribution
Straightforward to calculate pN (l, t|M ) using a simple iterative procedure.
If we assume that the notional, Ai , and the recovery rate, Ri , are constant across
all credits, then the loss on any given credit will be either 0 or A(1 − R).
This then implies that knowing probability distribution of the number of defaults
is equivalent to knowing probability distribution of the total loss in the reference
portfolio.
52 (Section 7)
The Mechanics and Pricing of a CDO Tranche
A tranche is defined by the lower and upper attachment points, L and U ,
respectively.
The tranche loss function, TLL,U (l), for a fixed time, t, is a function of the number of defaults, l, and is given by
TLL,U (l) := max{ min( l A(1 − R), U ) − L, 0 }
with L and U expressed in the same units as the portfolio loss l A(1 − R).
For example:
Suppose L = 3% and U = 7%
Suppose also that total portfolio loss is lA(1 − R) = 5%
Then tranche loss is 2% of total portfolio notional
- or 50% of tranche notional.
53 (Section 7)
The Mechanics and Pricing of a CDO Tranche
When an investor sells protection on the tranche she is guaranteeing to reimburse
any realized losses on the tranche to the protection buyer.
The fair value of the CDO tranche is that value of the premium for which the
expected value of the premium leg equals the expected value of the default leg.
54 (Section 7)
The Mechanics and Pricing of a CDO Tranche
Clearly then the fair value of the CDO tranche depends on the expected value of
the tranche loss function.
E[ TLtL,U ] = Σ_{l=0}^N TLL,U (l) p(l, t)
We now compute the fair value of the premium and default legs ...
55 (Section 7)
Fair Value of the Premium Leg
Premium leg represents the premium payments that are paid periodically by the
protection buyer to the protection seller.
They are paid at the end of each time interval and they are based upon the remaining notional in the tranche, so that the time 0 value of the premium leg is
PL0L,U = s Σ_{t=1}^n dt ∆t ( (U − L) − E0 [ TLtL,U ] )   (21)
where s is the tranche spread, dt the discount factor for time t and ∆t the accrual period.
Note that (21) is consistent with the statement that the premium paid at any time t is based only on the remaining notional in the tranche.
56 (Section 7)
Fair Value of the Default Leg
Default leg represents the cash flows paid to the protection buyer upon losses
occurring in the tranche.
57 (Section 7)
Fair Value of the Tranche
The fair premium, s ∗ say, is the value of s that equates the value of the default
leg with the value of the premium leg:
s∗ := DL0L,U / ( Σ_{t=1}^n dt ∆t ( (U − L) − E0 [ TLtL,U ] ) ).
As is the case with swaps and forwards, the fair value of the tranche to the
protection buyer and seller at initiation is therefore zero.
Easy to incorporate any possible upfront payments that the protection buyer
must pay at time t = 0 in addition to the regular premium payments.
Can also incorporate recovery values and notional values that vary with each
credit in the portfolio.
58 (Section 7)
Cash CDO’s
First CDOs to be traded were all cash CDOs
- the reference portfolio actually existed and consisted of corporate bonds that the CDO issuer usually kept on its balance sheet, against which it had to hold regulatory capital.
To reduce these capital requirements, banks converted the portfolio into a series of tranches and sold most of these tranches to investors.
Banks usually kept the equity tranche for themselves. This meant:
They kept most of the economic risk and rewards of the portfolio
But they also succeeded in dramatically reducing the amount of capital they
needed to set aside
Hence first CDO deals were motivated by regulatory arbitrage considerations.
59 (Section 7)
Synthetic CDO’s
It soon became clear there was an appetite in the market-place for these products.
e.g. Hedge funds were keen to buy the riskier tranches while insurance companies
and others sought the AAA-rated senior and super-senior tranches.
This appetite and explosion in the CDS market gave rise to synthetic tranches
where:
the underlying reference portfolio is no longer a physical portfolio of
corporate bonds or loans
it is instead a fictitious portfolio consisting of a number of credits with an
associated notional amount for each credit.
60 (Section 7)
Synthetic CDO’s
Mechanics of a synthetic tranche are precisely as described earlier.
But they have at least two features that distinguish them from cash CDOs:
(i) With a synthetic CDO it is no longer necessary to tranche the entire
portfolio and sell the entire “deal"
e.g. A bank could sell protection on a 3%-7% tranche and never have to
worry about selling the other pieces of the reference portfolio. This is not
the case with cash CDOs.
(ii) Because the issuer no longer owns the underlying bond portfolio, it is no
longer hedged against adverse price movements
- it therefore needs to dynamically hedge its synthetic tranche position and
would have typically done so using the CDS markets.
61 (Section 7)
Calibrating the Gaussian Copula Model
In practice, very common to calibrate synthetic tranches as follows:
1. Assume all pairwise correlations, Corr(Xi , Xj ), are identical
- equivalent to taking a1 = · · · = aN = a in (18) so that Corr(Xi , Xj ) = a2 =: ρ for all i ≠ j.
2. In the case of the liquid CDO tranches whose prices are observable in the
market-place, we then choose ρ so that the fair tranche spread in the model
is equal to the quoted spread in the market place.
62 (Section 7)
Calibrating the Gaussian Copula Model
If the model is correct, then every tranche should have the same implied correlation, ρimp .
Just as equity derivatives markets have an implied volatility surface, the CDO
market has implied base correlation curves.
63 (Section 7)
Calibrating the Gaussian Copula Model
The implied base correlation curve is generally an increasing function of the
upper attachment point.
64 (Section 7)
IEOR E4602: Quantitative Risk Management
Risk Measures
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
2 (Section 1)
Axioms of Coherent Risk Measures
Translation Invariance For all L ∈ M and every constant a ∈ R, we have
ϱ(L + a) = ϱ(L) + a.
3 (Section 1)
Axioms of Coherent Risk Measures
Positive Homogeneity For all L ∈ M and every λ > 0 we have
ϱ(λL) = λϱ(L).
- also controversial: has been criticized for not penalizing concentration of risk
- e.g. if λ > 0 is very large, then perhaps we should require ϱ(λL) > λϱ(L)
- but this would be inconsistent with subadditivity: for integer λ = n,
subadditivity implies ϱ(nL) = ϱ(L + · · · + L) ≤ nϱ(L).
5 (Section 1)
Convex Risk Measures
Criticisms of subadditivity and positive homogeneity axioms led to the study
of convex risk measures.
A convex risk measure satisfies the same axioms as a coherent risk measure
except that the subadditivity and positive homogeneity axioms are replaced by
the convexity axiom:
Convexity: ϱ(λL_1 + (1 − λ)L_2) ≤ λϱ(L_1) + (1 − λ)ϱ(L_2)
for all L_1, L_2 ∈ M and λ ∈ [0, 1].
It is possible to find risk measures within the convex class that satisfy
ϱ(λL) > λϱ(L) for λ > 1.
6 (Section 1)
Value-at-Risk
Recall ...
Definition: Let α ∈ (0, 1) be some fixed confidence level. Then the VaR of the
portfolio loss, L, at the confidence level, α, is given by
VaRα := q_α(F_L) = inf{x ∈ R : F_L(x) ≥ α},
the α-quantile of the loss distribution.
7 (Section 1)
Example 1
Consider two IID assets, X and Y, where
X = ε + η with ε ∼ N(0, 1) and, independent of ε,
η = 0 with prob. .991 and η = −10 with prob. .009.
Then
VaR.99(X + Y) = 9.8 > VaR.99(X) + VaR.99(Y) = 3.1 + 3.1 = 6.2
- so VaR fails to be subadditive here.
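A minimal Monte-Carlo sketch of this example; the losses are taken to be the
negatives of X and Y, which reproduces the quoted figures of roughly 3.1 and 9.8:

```python
# Monte-Carlo check that VaR is not subadditive for the two-asset example.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

def sample_losses(n):
    eps = rng.standard_normal(n)
    eta = np.where(rng.random(n) < 0.009, -10.0, 0.0)
    return -(eps + eta)          # loss = negative of the asset value X = eps + eta

LX, LY = sample_losses(n), sample_losses(n)
var_X = np.quantile(LX, 0.99)            # approx 3.1
var_sum = np.quantile(LX + LY, 0.99)     # approx 9.8 > 2 * var_X
print(var_X, var_sum)
```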
8 (Section 1)
Example 2: Defaultable Bonds
Consider a portfolio of n = 100 defaultable corporate bonds
Probability of default over next year identical for all bonds and equal to 2%.
Default events of different bonds are independent.
Current price of each bond is 100.
If bond does not default then will pay 105 one year from now
- otherwise there is no repayment.
The loss on the i th bond is therefore
Li := 105Yi − 5
where Yi = 1 if the bond defaults over the next year and Yi = 0 otherwise.
By assumption also see that P(Li = −5) = .98 and P(Li = 100) = .02.
9 (Section 1)
Example 2: Defaultable Bonds
Consider now the following two portfolios:
A: A fully concentrated portfolio consisting of 100 units of bond 1.
B: A completely diversified portfolio consisting of 1 unit of each of the 100
bonds.
We want to compute the 95% VaR for each portfolio.
Obtain VaR.95 (LA ) = −500, representing a gain(!) and VaR.95 (LB ) = 25.
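A minimal sketch of these VaR calculations using the exact binomial distribution
of the number of defaults:

```python
# Verify VaR_.95 for the concentrated (A) and diversified (B) bond portfolios.
from scipy.stats import binom

n, p, alpha = 100, 0.02, 0.95

# Portfolio A: 100 units of bond 1, loss = 100 * (105 * Y1 - 5).
y_alpha = binom.ppf(alpha, 1, p)     # 0.95-quantile of a single default indicator = 0
var_A = 100 * (105 * y_alpha - 5)    # = -500, i.e. a gain of 500

# Portfolio B: one unit of each bond, loss = 105 * M - 500 with M ~ Bin(100, 0.02).
m_alpha = binom.ppf(alpha, n, p)     # = 5 defaults
var_B = 105 * m_alpha - 500          # = 25

# VaR_.95(L_B) = 25 > sum of the individual bond VaRs = 100 * (-5) = -500,
# so VaR penalizes the diversified portfolio and fails subadditivity here.
print(var_A, var_B)
```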
10 (Section 1)
Example 2: Defaultable Bonds
Now let ϱ be any coherent risk measure depending only on the distribution of L.
11 (Section 1)
Subadditivity of VaR for Elliptical Risk Factors
Theorem
Suppose that X ∼ En(µ, Σ, ψ) and let M be the set of linearized portfolio losses
of the form
M := {L : L = λ_0 + Σ_{i=1}^n λ_i X_i, λ_i ∈ R}.
Then for any two losses L_1, L_2 ∈ M and 0.5 ≤ α < 1,
VaRα(L_1 + L_2) ≤ VaRα(L_1) + VaRα(L_2).
12 (Section 1)
Proof of Subadditivity of VaR for Elliptical Risk Factors
Without (why?) loss of generality assume that λ_0 = 0.
Using the stochastic representation X = AY + µ with Y spherical, we have
L = λ^T X = λ^T AY + λ^T µ ∼ ||λ^T A|| Y_1 + λ^T µ.    (2)
VaR can also fail to be subadditive when the individual loss distributions have
heavy tails.
14 (Section 1)
Expected Shortfall
Recall ...
Definition: For a portfolio loss, L, satisfying E[|L|] < ∞ the expected shortfall
(ES) at confidence level α ∈ (0, 1) is given by
ESα := (1 / (1 − α)) ∫_α^1 q_u(F_L) du.
When the CDF, F_L, is continuous, a better-known representation is given by
ESα = E [L | L ≥ VaRα ] .
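A minimal sketch of how VaRα and ESα are typically estimated from simulated (or
historical) losses; the Student-t losses below are purely illustrative:

```python
# Empirical VaR and ES estimates from a sample of losses.
import numpy as np

def var_es(losses, alpha=0.99):
    losses = np.sort(np.asarray(losses, float))
    var = np.quantile(losses, alpha)
    es = losses[losses >= var].mean()    # empirical version of E[L | L >= VaR_alpha]
    return var, es

rng = np.random.default_rng(1)
L = rng.standard_t(df=4, size=10**6)     # heavy-tailed illustrative losses
print(var_es(L, 0.99))
```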
15 (Section 1)
Expected Shortfall
Theorem: Expected shortfall is a coherent risk measure.
There are many other examples of risk measures that are coherent
- e.g. risk measures based on generalized scenarios
- e.g. spectral risk measures
- of which expected shortfall is an example.
16 (Section 1)
Risk Aggregation
Let L = (L1 , . . . , Ln ) denote a vector of random variables
- perhaps representing losses on different trading desks, portfolios or operating
units within a firm.
Sometimes need to aggregate these losses into a random variable, ψ(L), say.
17 (Section 2)
Risk Aggregation
Want to understand the risk of the aggregate loss function, ϱ(ψ(L))
- but first need the distribution of ψ(L).
Often only the marginal distributions, Fi, of the Li's are known, not their joint
distribution. In this case can try to compute lower and upper bounds on ϱ(ψ(L)):
ϱ_min := inf{ϱ(ψ(L)) : Li ∼ Fi, i = 1, . . . , n}
ϱ_max := sup{ϱ(ψ(L)) : Li ∼ Fi, i = 1, . . . , n}
The capital allocation problem seeks a decomposition, AC_1, . . . , AC_n, such that
ϱ(L) = Σ_{i=1}^n AC_i.    (4)
19 (Section 3)
Capital Allocation
More formally, let L(λ) := Σ_{i=1}^n λi Li be the loss associated with the portfolio
consisting of λi units of the loss, Li, for i = 1, . . . , n.
Let ϱ(·) be a risk measure on a space M that contains L(λ) for all λ ∈ Λ, an
open set containing 1.
The associated risk-measure function is then defined by r_ϱ(λ) := ϱ(L(λ)).
20 (Section 3)
Capital Allocation Principles
Definition: Let r_ϱ be a risk-measure function on some set Λ ⊂ R^n \ {0} such that
1 ∈ Λ.
Then a mapping, f^{r_ϱ} : Λ → R^n, is called a per-unit capital allocation principle
associated with r_ϱ if, for all λ ∈ Λ, we have
Σ_{i=1}^n λi f_i^{r_ϱ}(λ) = r_ϱ(λ).    (5)
We then interpret f_i^{r_ϱ} as the amount of capital allocated to one unit of Li
when the overall portfolio loss is L(λ).
The amount of capital allocated to a position of λi Li is therefore λi f_i^{r_ϱ} and
so by (5), the total risk capital is fully allocated.
21 (Section 3)
The Euler Allocation Principle
Definition: If r_ϱ is a positive-homogeneous risk-measure function which is
differentiable on the set Λ, then the per-unit Euler capital allocation principle
associated with r_ϱ is the mapping
f^{r_ϱ} : Λ → R^n : f_i^{r_ϱ}(λ) = ∂r_ϱ(λ) / ∂λi.
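As a concrete illustration (not the only possible choice), the standard-deviation
risk measure r(λ) = sqrt(λ'Σλ) is positive-homogeneous and differentiable, so its
Euler allocations are available in closed form:

```python
# Euler allocations for the standard-deviation risk measure r(lam) = sqrt(lam' Sigma lam).
import numpy as np

def euler_std_allocations(lam, Sigma):
    lam, Sigma = np.asarray(lam, float), np.asarray(Sigma, float)
    total_risk = np.sqrt(lam @ Sigma @ lam)
    per_unit = Sigma @ lam / total_risk          # f_i(lam) = d r / d lam_i
    AC = lam * per_unit                          # capital allocated to lam_i * L_i
    assert np.isclose(AC.sum(), total_risk)      # full-allocation property (5)
    return AC

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
print(euler_std_allocations([1.0, 1.0], Sigma))
```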
22 (Section 3)
Value-at-Risk and Value-at-Risk Contributions
Let r_VaR^α(λ) := VaRα(L(λ)) be our risk-measure function.
Will now use (6) and Monte-Carlo to estimate the VaR contributions from each
security in a portfolio.
- Monte-Carlo is a general approach that can be used for complex portfolios
where (6) cannot be calculated analytically.
23 (Section 3)
An Application: Estimating Value-at-Risk Contributions
Recall total portfolio loss is L = Σ_{i=1}^n Li.
The VaR contribution of the i th security is then
∂VaRα(λ) / ∂λi |_{λ=1} = wi ∂VaRα / ∂wi    (8)
where wi denotes the size of the position in the i th security.
Could then set (why?) ACi = Li^{(m)} where m denotes the VaRα scenario, i.e.
L^{(m)} is the ⌈N(1 − α)⌉ th largest of the N simulated portfolio losses.
Question: Will this estimator satisfy the additivity property, i.e. will
Σ_i ACi = VaRα?
Question: What is the problem with this approach? Will this problem disappear
if we let N → ∞?
26 (Section 3)
A Third Approach: Kernel Smoothing Monte-Carlo
An alternative approach that resolves the problem with the second approach is to
take a weighted average of the losses in the i th security around the VaRα
scenario.
In particular, say K(x; h) := K(x / h) is a kernel function if it is:
1. Symmetric about zero
2. Takes a maximum at x = 0
3. And is non-negative for all x.
27 (Section 3)
A Third Approach: Kernel Smoothing Monte-Carlo
The kernel estimate of ACi is then given by
ACi^ker := [ Σ_{j=1}^N K(L^{(j)} − VaRα; h) Li^{(j)} ] / [ Σ_{j=1}^N K(L^{(j)} − VaRα; h) ]    (10)
where VaRα here denotes its Monte-Carlo estimate.
One minor problem with (10) is that the additivity property doesn't hold. Can
easily correct this by instead setting
ACi^ker := VaRα × [ Σ_{j=1}^N K(L^{(j)} − VaRα; h) Li^{(j)} ] / [ Σ_{j=1}^N K(L^{(j)} − VaRα; h) L^{(j)} ].    (11)
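A minimal sketch of estimator (11), assuming we already have a matrix of simulated
losses with one column per security; the Gaussian kernel and the rule-of-thumb
bandwidth below are assumptions rather than prescriptions:

```python
# Kernel-smoothed Monte-Carlo estimates of the VaR contributions, as in (11).
import numpy as np

def kernel_var_contributions(loss_matrix, alpha=0.99, h=None):
    L = loss_matrix.sum(axis=1)                    # total loss in each scenario
    var_hat = np.quantile(L, alpha)                # Monte-Carlo estimate of VaR_alpha
    if h is None:
        h = 1.06 * L.std() * len(L) ** (-0.2)      # rule-of-thumb bandwidth
    w = np.exp(-0.5 * ((L - var_hat) / h) ** 2)    # Gaussian kernel weights
    AC = var_hat * (w @ loss_matrix) / (w @ L)     # estimator (11); AC.sum() == var_hat
    return var_hat, AC

rng = np.random.default_rng(2)
losses = rng.multivariate_normal(np.zeros(3), np.eye(3) + 0.5, size=100_000)
var_hat, AC = kernel_var_contributions(losses, alpha=0.99)
print(var_hat, AC, AC.sum())
```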
Figure: estimated VaR contributions for securities 1-10 (x-axis: Security, y-axis: Contribution).
30 (Section 3)
IEOR E4602: Quantitative Risk Management
Model Risk
Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Outline
Model Transparency
Local Volatility Models
Stochastic Volatility Models
Jump-Diffusion Models and Levy Processes
2 (Section 0)
A Simple Example: Pricing Barrier Options
Barrier options are options whose payoff depends in some way on whether or not
a particular barrier has been crossed before the option expires.
e.g. A knockout put option with strike K, barrier B and maturity T has payoff
Knock-Out Put Payoff = max(0, K − S_T) × 1_{ S_t ≥ B for all t ∈ [0,T] }.
e.g. A digital down-and-in call option with strike K, barrier B and maturity T
has payoff
Digital Down-and-In Call Payoff = 1_{ min_{t∈[0,T]} S_t ≤ B } × 1_{ S_T ≥ K }.
3 (Section 1)
Barrier Options
Knock-in options can be priced from knock-out options and vice versa since a
knock-in plus a knock-out – each with the same strike – is equal to the vanilla
option or digital with the same strike.
Will not bother to derive or present these solutions here, however, since they are
of little use in practice
- this is because the Black-Scholes model is a terrible(!) model for pricing
barrier options.
But can still use GBM to develop intuition and as an example of model risk.
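A minimal Monte-Carlo sketch of the knockout put under GBM; the barrier is only
monitored at the simulation dates, which is a (biased) approximation to continuous
monitoring, and the parameter values are illustrative:

```python
# Monte-Carlo price of a (discretely monitored) knockout put under GBM.
import numpy as np

def knockout_put_gbm(S0, K, B, T, r, sigma, q=0.0,
                     n_steps=250, n_paths=200_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(S0))
    alive = np.ones(n_paths, dtype=bool)           # not yet knocked out
    for _ in range(n_steps):
        Z = rng.standard_normal(n_paths)
        S *= np.exp((r - q - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
        alive &= (S >= B)                          # knocked out once S falls below B
    payoff = np.maximum(K - S, 0.0) * alive
    return np.exp(-r * T) * payoff.mean()

# Setting B = 0 recovers the corresponding vanilla put for comparison.
print(knockout_put_gbm(S0=100, K=95, B=85, T=1.0, r=0.02, sigma=0.25))
```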
4 (Section 1)
Value of a Knockout Put Option Using Black-Scholes GBM
Figure: knockout put option price as a function of the Black-Scholes implied volatility (%).
Figure: knockout put and vanilla put option prices as a function of the Black-Scholes implied volatility (%).
6 (Section 1)
Barrier Options
So knock-out put option always cheaper than corresponding vanilla put option.
While the vanilla option is unambiguously increasing in σ the same is not true for
the knock-out option. Why?
Question: What do you think would happen to the value of the knock-out put
option as σ → ∞?
Consider the following self-financing (s.f.) delta-hedging strategy with value process Pt :
P_0 := C_0    (1)
P_{t_{i+1}} = P_{t_i} + (P_{t_i} − δ_{t_i} S_{t_i}) r ∆t + δ_{t_i} (S_{t_{i+1}} − S_{t_i} + q S_{t_i} ∆t)    (2)
where:
∆t := t_{i+1} − t_i
r = risk-free interest rate
q is the dividend yield
δ_{t_i} is the Black-Scholes delta at time t_i
– a function of S_{t_i} and some assumed implied volatility, σ_imp.
9 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Stock prices are simulated assuming St ∼ GBM(µ, σ) so that
S_{t+∆t} = S_t exp( (µ − σ²/2)∆t + σ√∆t Z ),   Z ∼ N(0, 1).
In the case of a short position in a call option with strike K and maturity T, the
final trading P&L is then defined as
P&L := P_T − max(0, S_T − K),    (3)
the terminal value of the hedging strategy less the option payoff.
In the Black-Scholes world we have σ = σimp and the P&L = 0 along every price
path in the limit as ∆t → 0.
In practice, however, we cannot know σ and so the market (and hence the option
hedger) has no way to ensure a value of σimp such that σ = σimp .
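A minimal simulation sketch of the strategy (1)-(2) and the P&L (3), hedging a
short call at an assumed implied volatility σ_imp while the true volatility is σ;
all parameter values are illustrative:

```python
# Delta-hedging P&L when hedging at sigma_imp while the true volatility is sigma.
import numpy as np
from scipy.stats import norm

def bs_call_delta_price(S, K, r, q, sigma, tau):
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    delta = np.exp(-q * tau) * norm.cdf(d1)
    price = S * np.exp(-q * tau) * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)
    return delta, price

def hedging_pnl(S0=100.0, K=100.0, T=0.5, r=0.02, q=0.0, mu=0.10,
                sigma=0.30, sigma_imp=0.20, n_steps=126, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0)
    _, C0 = bs_call_delta_price(S0, K, r, q, sigma_imp, T)
    P = np.full(n_paths, C0)                                   # (1): P_0 := C_0
    for i in range(n_steps):
        tau = T - i * dt
        delta, _ = bs_call_delta_price(S, K, r, q, sigma_imp, tau)
        Z = rng.standard_normal(n_paths)
        S_next = S * np.exp((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
        # (2): cash earns r, delta units of stock earn price changes plus dividends
        P = P + (P - delta * S) * r * dt + delta * (S_next - S + q * S * dt)
        S = S_next
    return P - np.maximum(S - K, 0.0)                          # (3)

pnl = hedging_pnl()
print(pnl.mean(), pnl.std())
```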
10 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
This has interesting implications for the trading P&L: it means we cannot exactly
replicate the option even if all of the assumptions of Black-Scholes are correct!
In figures on next two slides we display histograms of the P&L in (3) that results
from simulating 100k sample paths of the underlying price process with
S0 = K = $100.
11 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Histogram of delta-hedging P&L with true vol. = 30% and implied vol. = 20%.
Histogram of delta-hedging P&L with true vol. = 30% and implied vol. = 40%.
This example is intended to highlight the importance of not just having a good
model but also having the correct model parameters.
Can be shown that the payoff from continuously delta-hedging an option satisfies
P&L = ∫_0^T (S_t² / 2) (∂²V_t / ∂S²) (σ_imp² − σ_t²) dt
where Vt is the time t value of the option and σt is the realized instantaneous
volatility at time t.
We recognize the term (S_t² / 2) ∂²V_t / ∂S² as the dollar gamma
- always positive for a vanilla call or put option.
14 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Returning to s.f. trading strategy of (1) and (2), note that we can choose any
model we like for the security price dynamics
- e.g. other diffusions or jump-diffusion models.
Goal then is to understand how robust the hedging strategy (based on the given
model) is to alternative price dynamics that might prevail in practice.
Given the appropriate data, one can also back-test the performance of a model
on realized historical price data to assess its hedging performance.
15 (Section 1)
E.G: Calibration and Extrapolation in Short-Rate Models
Figure: a generic binomial short-rate lattice with nodes r_{t,j} (t = period, j = state)
and risk-neutral branching probabilities q_u (up) and q_d (down).
Lattice above shows a generic binomial model for the short-rate, rt which is a
1-period risk-free rate.
Securities then priced in the lattice using risk-neutral pricing in the usual
backwards evaluation manner.
16 (Section 1)
E.G: Calibration and Extrapolation in Short-Rate Models
e.g. Can compute time t price, ZtT , of a zero-coupon bond (ZCB) maturing at
time T by setting ZTT ≡ 1 and then calculating
Z_{t,j} = E^Q_t [ (B_t / B_{t+1}) Z_{t+1} ] = (1 / (1 + r_{t,j})) [ q_u × Z_{t+1,j+1} + q_d × Z_{t+1,j} ]
for t = T − 1, . . . , 0 and j = 0, . . . , t.
More generally, risk-neutral pricing for a “coupon” paying security takes the form
Z_{t,j} = E^Q_t [ (Z_{t+1} + C_{t+1}) / (1 + r_{t,j}) ]
        = (1 / (1 + r_{t,j})) [ q_u (Z_{t+1,j+1} + C_{t+1,j+1}) + q_d (Z_{t+1,j} + C_{t+1,j}) ]    (4)
where Ct,j is the coupon paid at time t and state j, and Zt,j is the “ex-coupon”
value of the security at time t and state j.
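A minimal sketch of backward induction as in (4), pricing a zero-coupon bond on an
illustrative 4-period lattice with q_u = q_d = 0.5:

```python
# Backward induction (4) on a short-rate lattice for a zero-coupon bond.
def price_zcb(rates, T, q_u=0.5, q_d=0.5):
    """rates[t][j] is the short rate at period t in state j, for j = 0, ..., t."""
    Z = [1.0] * (T + 1)                        # boundary condition: Z_{T,j} = 1
    for t in range(T - 1, -1, -1):
        Z = [(q_u * Z[j + 1] + q_d * Z[j]) / (1.0 + rates[t][j])
             for j in range(t + 1)]
    return Z[0]                                # time-0 price

# Illustrative 4-period lattice of short rates.
rates = [[0.060],
         [0.054, 0.075],
         [0.050, 0.070, 0.090],
         [0.045, 0.065, 0.085, 0.110]]
print(price_zcb(rates, T=4))
```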
17 (Section 1)
E.G: Calibration and Extrapolation in Short-Rate Models
Can iterate (4) to obtain
" t+s
#
Zt X Cj Zt+s
= EQ
t + (5)
Bt B
j=t+1 j
Bt+s
Other securities including caps, floors, swaps and swaptions can all be priced
using (4) and appropriate boundary conditions.
Moreover, when we price securities in this manner the model is guaranteed to
be arbitrage-free. Why?
18 (Section 1)
The Black-Derman-Toy (BDT) Model
The Black-Derman-Toy (BDT) model assumes the interest rate at node N_{i,j} is
given by
r_{i,j} = a_i e^{b_i j}
where log(a_i) and b_i are drift and volatility parameters for log(r), respectively.
Once the parameters have been calibrated we can now consider using the model
to price less liquid or more exotic securities.
19 (Section 1)
Using BDT to Price a 2 − 8 Payer Swaption
Consider pricing a 2 − 8 payer swaption with fixed rate = 11.65%.
This is an option, exercisable in year 2, to enter an 8-year interest-rate swap in
which we pay the fixed rate of 11.65% and receive the floating rate.
20 (Section 1)
Using BDT to Price a 2 − 8 Payer Swaption
We use a 10-period BDT lattice so that 1 period corresponds to 1 year.
Lattice was calibrated to the term structure of interest rates in the market
- there are therefore 20 free parameters, a_i and b_i for i = 0, . . . , 9, to choose
- and want to match only 10 spot interest rates, s_t for t = 1, . . . , 10
- so the calibration problem is under-determined.
We can (and do) resolve this issue by simply setting b_i = b = .005 for all i
- which leaves 10 unknown parameters.
It would be naive(!) in the extreme to assume that the resulting price is a fair
price for the swaption
- after all, what was so special about the choice of b = .005?
If we instead choose a different value of b and recalibrate to the same term
structure of interest rates, we find a swaption price of $1,962
- a price increase of approximately 50%!
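A minimal sketch of the calibration step described above, solving sequentially for
the a_i (with b fixed) so that model zero-coupon bond prices match a given set of
spot rates; annual compounding and q_u = q_d = 0.5 are assumed, and the spot
rates below are illustrative rather than those used in the lecture:

```python
# BDT calibration of the drift parameters a_i with the volatility parameter b fixed.
import numpy as np
from scipy.optimize import brentq

def calibrate_bdt(spot_rates, b=0.005):
    n = len(spot_rates)
    a = np.zeros(n)
    elem = np.array([1.0])                          # Arrow-Debreu prices at t = 0
    for i in range(n):
        target = 1.0 / (1.0 + spot_rates[i]) ** (i + 1)   # ZCB price maturing at i+1
        j = np.arange(i + 1)
        f = lambda ai: np.sum(elem / (1.0 + ai * np.exp(b * j))) - target
        a[i] = brentq(f, 1e-8, 1.0)
        disc = 1.0 / (1.0 + a[i] * np.exp(b * j))   # one-period discount in each state
        new_elem = np.zeros(i + 2)
        new_elem[:-1] += 0.5 * elem * disc          # contribution to the "down" successor
        new_elem[1:] += 0.5 * elem * disc           # contribution to the "up" successor
        elem = new_elem                             # Arrow-Debreu prices at t = i + 1
    return a

spot = [0.050 + 0.003 * t for t in range(1, 11)]    # illustrative spot-rate curve
print(calibrate_bdt(spot, b=0.005))
```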
It was understood there was some (very small) credit risk associated with the
banks in the LIBOR panel
- so LIBOR would therefore be higher than corresponding rates on
government treasuries.
Since the banks that were polled always had strong credit ratings (prior to the
crisis), the spread between LIBOR and treasury rates was generally quite small.
Moreover, the pre-defined list of banks was regularly updated so that banks whose
credit ratings had deteriorated were replaced with banks with superior credit
ratings.
This had the practical impact of ensuring (or so market participants believed up
until the crisis(!)) that forward LIBOR rates would only have a very modest
degree of credit risk associated with them.
24 (Section 1)
E.G: Regime Shifts: LIBOR Pre- and Post-Crisis
LIBOR extremely important since it’s a benchmark interest rate and many of the
most liquid fixed-income derivative securities are based upon it.
Before 2008 financial crisis, however, these LIBOR rates were viewed as being
essentially (default) risk-free
- led to many simplifications when it came to the pricing of the
aforementioned securities.
25 (Section 1)
Derivatives Pricing Pre-Crisis
e.g.1. Consider a floating rate note (FRN) with face value 100.
Note expires at time T = TM and pays 100 (in addition to the interest) at that
time.
A well known and important result is that the fair value of the FRN at any reset
point just after the interest has been paid, is 100
- follows by a simple induction argument.
26 (Section 1)
Derivatives Pricing Pre-Crisis
e.g.2. The forward LIBOR rate at time t based on simple interest for lending in
the interval [T1 , T2 ] is given by
L(t, T_1, T_2) = (1 / (T_2 − T_1)) ( P(t, T_1) / P(t, T_2) − 1 )    (6)
where P(t, T) denotes the time-t price of a zero-coupon bond maturing at time T.
LIBOR rates are quoted as simply-compounded interest rates, and are quoted on
an annual basis.
27 (Section 1)
Derivatives Pricing Pre-Crisis
With a fixed value of δ in mind can define the δ-year forward rate at time t with
maturity T as
L(t, T, T + δ) = (1 / δ) ( P(t, T) / P(t, T + δ) − 1 ).    (7)
The δ-year spot LIBOR rate at time t then given by L(t, t + δ) := L(t, t, t + δ).
Also note that L(t, T , T + δ) is the FRA rate at time t for the swap(let)
maturing at time T + δ.
That is, L(t, T, T + δ) is the unique value of K for which the swaplet that pays
±(L(T, T + δ) − K) at time T + δ is worth zero at time t < T.
28 (Section 1)
Derivatives Pricing Pre-Crisis
e.g.3. We can compound two consecutive 3-month forward LIBOR rates to
obtain corresponding 6-month forward LIBOR rate.
In particular, we have
(1 + F_1^{3m} / 4)(1 + F_2^{3m} / 4) = 1 + F^{6m} / 2    (8)
where:
F_1^{3m} := L(0, 3m, 6m)
F_2^{3m} := L(0, 6m, 9m)
F^{6m} := L(0, 3m, 9m).
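A minimal sketch of (6)-(8) using illustrative discount factors; with a single
(pre-crisis) curve the compounding relation (8) holds by construction:

```python
# Forward LIBOR rates from discount factors, and a check of the compounding relation (8).
def forward_rate(P1, P2, tau):
    """Simple forward rate for an accrual period of length tau = T2 - T1, as in (6)."""
    return (P1 / P2 - 1.0) / tau

P_3m, P_6m, P_9m = 0.9950, 0.9895, 0.9838   # illustrative zero-coupon bond prices
F1 = forward_rate(P_3m, P_6m, 0.25)         # L(0, 3m, 6m)
F2 = forward_rate(P_6m, P_9m, 0.25)         # L(0, 6m, 9m)
F_6m = forward_rate(P_3m, P_9m, 0.50)       # L(0, 3m, 9m)

lhs = (1 + F1 / 4) * (1 + F2 / 4)
rhs = 1 + F_6m / 2
print(lhs, rhs)                             # equal, as in (8)
```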
29 (Section 1)
During the Crisis
All three pricing results broke down during the 2008 financial crisis.
Because these results were also required for the pricing of swaps and swaptions,
caps and floors, etc., the entire approach to the pricing of fixed-income derivative
securities broke down.
Cause of this breakdown was the loss of trust in the banking system and the loss
of trust between banks.
This meant that LIBOR rates were no longer viewed as being risk-free.
Easiest way to demonstrate this loss of trust is via the spread between LIBOR
and OIS rates.
30 (Section 1)
Overnight Indexed Swaps (OIS)
An overnight indexed swap (OIS) is an interest-rate swap where the periodic
floating payment is based on a return calculated from the daily compounding of
an overnight rate (or index).
e.g. Fed Funds rate in the U.S., EONIA in the Eurozone and SONIA in the
U.K.
The fixed rate in the swap is the fixed rate that makes the swap worth zero at
inception.
Note there is essentially no credit / default risk premium included in OIS rates
- due to the fact that the floating payments in the swap are based on overnight
lending rates.
31 (Section 1)
Overnight Indexed Swaps (OIS)
We see that the LIBOR-OIS spreads were essentially zero leading up to the
financial crisis
- because market viewed LIBOR rates as being essentially risk-free with no
associated credit risk.
This changed drastically during the crisis when entire banking system nearly
collapsed and market participants realized there were substantial credit risks in
the interbank lending market.
Since the crisis the spreads have not returned to zero and must now be
accounted for in all fixed-income pricing models.
This regime switch constituted an extreme form of model risk where the entire
market made an assumption that resulted in all pricing models being hopelessly
inadequate!
33 (Section 1)
Derivatives Pricing Post-Crisis
Since the financial crisis we no longer take LIBOR rates to be risk-free.
OIS forward rates (but not LIBOR rates!) are calculated as in (6) or (7) so that
F_d(t, T_1, T_2) = (1 / (T_2 − T_1)) ( P_d(t, T_1) / P_d(t, T_2) − 1 )
where Pd denotes the discount factor computed from the OIS curve and Fd
denotes forward rates implied by these OIS discount factors.
34 (Section 1)
Derivatives Pricing Post-Crisis
Forward LIBOR rates are now defined as risk-neutral expectations (under the
forward measure) of the spot LIBOR rate, L(T , T + δ).
Relationships such as (8) no longer hold and we now live in a multi-curve world
- with a different LIBOR curve for different tenors
- e.g. we have a 3-month LIBOR curve, a 6-month LIBOR curve etc.
Question: Which if any of the 3-month and 6-month LIBOR curves will be
lower? Why?
35 (Section 1)
Model Transparency
Will now consider some well known models in the equity derivatives space
- these models (or slight variations of them) are also used in foreign exchange
and commodity derivatives markets.
Can be useful when trading in exotic derivative securities and not just the
“vanilla" securities for which prices are readily available.
If these models can be calibrated to observable vanilla security prices, they can
then be used to construct a range of plausible prices for more exotic securities
- so they can help counter extrapolation risk
- and provide alternative estimates of the Greeks or hedge ratios.
Will not focus on stochastic calculus or various numerical pricing techniques here.
Instead simply want to emphasize that a broad class of tractable models exist and
that they should be employed when necessary.
36 (Section 2)
Model Transparency
Very important for the users of these models to fully understand their various
strengths and weaknesses
- and the implications of these strengths and weaknesses when they are used
to price and risk manage a given security.
Will see later how these models and others can be used together to infer prices of
exotic securities as well as their Greeks or hedge ratios.
In particular will emphasize how they can be used to avoid the pitfalls associated
with price extrapolation.
37 (Section 2)
Recalling the Implied Volatility Surface
Recall the GBM model:
dS_t = (r − q) S_t dt + σ S_t dW_t.
Therefore have a single free parameter, σ, which we can fit to option prices or,
equivalently, the volatility surface.
Not at all surprising then that this exercise fails: the volatility surface is never
flat so that a constant σ fails to re-produce market prices.
This became particularly apparent after the stock market crash of October 1987
when market participants began to correctly identify that lower strike options
should be priced with a higher volatility, i.e. there should be a volatility skew.
38 (Section 2)
The Implied Volatility Surface
39 (Section 2)
Local Volatility Models
The local volatility framework assumes risk-neutral dynamics satisfy
dSt = (r − q)St dt + σl (t, St )St dWt (9)
– so σl (t, St ) is now a function of time and stock price.
Key result is the Dupire formula that links the local volatilities, σl (t, St ), to the
implied volatility surface:
σ_l²(T, K) = [ ∂C/∂T + (r − q) K ∂C/∂K + q C ] / [ (K²/2) ∂²C/∂K² ]    (10)
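A minimal finite-difference sketch of (10), assuming a grid of call prices C(T, K)
obtained, e.g., from a smoothed implied volatility surface; differentiating raw
market quotes directly would be numerically unstable in practice:

```python
# Local volatilities from a call-price surface via Dupire's formula (10).
import numpy as np

def dupire_local_vol(C, T, K, r=0.0, q=0.0):
    """C[i, j] = call price for maturity T[i] and strike K[j]."""
    C, T, K = np.asarray(C, float), np.asarray(T, float), np.asarray(K, float)
    dC_dT = np.gradient(C, T, axis=0)
    dC_dK = np.gradient(C, K, axis=1)
    d2C_dK2 = np.gradient(dC_dK, K, axis=1)
    num = dC_dT + (r - q) * K[None, :] * dC_dK + q * C
    den = np.maximum(0.5 * K[None, :] ** 2 * d2C_dK2, 1e-12)   # floor to avoid /0
    return np.sqrt(np.maximum(num / den, 0.0))                 # sigma_l(T, K)
```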
Moreover the Greeks that are calculated from a local volatility model are
generally not consistent with what is observed empirically.
They are known to be particularly unsuitable for pricing derivatives that depend
on the forward skew such as forward-start options and cliquets.
41 (Section 2)
Local Volatility
Implied and local volatility surfaces: local volatility surface is constructed from
implied volatility surface using Dupire’s formula.
42 (Section 2)
Stochastic Volatility Models
Most well-known stochastic volatility model is due to Heston (1993).
It’s a two-factor model and assumes separate dynamics for both the stock price
and instantaneous volatility so that
dS_t = (r − q) S_t dt + √σ_t S_t dW_t^{(s)}    (11)
dσ_t = κ (θ − σ_t) dt + γ √σ_t dW_t^{(vol)}    (12)
where W_t^{(s)} and W_t^{(vol)} are standard Q-Brownian motions with constant
correlation coefficient, ρ.
The particular EMM that we choose to work with would be determined by some
calibration algorithm
- the typical method of choosing an EMM in incomplete market models.
The price, C(t, S, σ), of any derivative security then satisfies the PDE
∂C/∂t + (1/2) σ S² ∂²C/∂S² + ρ σ γ S ∂²C/∂S∂σ + (1/2) γ² σ ∂²C/∂σ²
+ (r − q) S ∂C/∂S + κ (θ − σ) ∂C/∂σ = r C.    (13)
Price then obtained by solving (13) subject to the relevant boundary conditions.
Once parameters of the model have been identified via some calibration
algorithm, pricing can be done by either solving (13) numerically or alternatively,
using Monte-Carlo.
Heston generally captures long-dated skew quite well but struggles with
short-dated skew, particularly when it is steep.
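A minimal Monte-Carlo sketch of (11)-(12) using a full-truncation Euler scheme
(σ_t is treated as the instantaneous variance, as in (11)); all parameter values are
illustrative:

```python
# Monte-Carlo pricing of a European call under the Heston dynamics (11)-(12).
import numpy as np

def heston_call_mc(S0, K, T, r, q, v0, kappa, theta, gamma, rho,
                   n_steps=200, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(S0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        Z1 = rng.standard_normal(n_paths)
        Z2 = rho * Z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)                    # full truncation of the variance
        S *= np.exp((r - q - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * Z1)
        v += kappa * (theta - v_pos) * dt + gamma * np.sqrt(v_pos * dt) * Z2
    return np.exp(-r * T) * np.maximum(S - K, 0.0).mean()

print(heston_call_mc(S0=100, K=100, T=1.0, r=0.02, q=0.0,
                     v0=0.04, kappa=1.5, theta=0.04, gamma=0.5, rho=-0.7))
```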
44 (Section 2)
Stochastic Volatility Models
Figure: surface plotted against Strike and Time-to-Maturity for the Heston model
(vertical-axis values of roughly 15 to 40, consistent with implied volatilities in %).
45 (Section 2)
A Detour: Forward Skew
Forward skew is the implied volatility skew that prevails at some future date and
that is consistent with some model that has been calibrated to today’s market
data.
e.g. Suppose we simulate some model that has been calibrated to today’s
volatility surface forward to some date T > 0.
On any simulated path can then compute the implied volatility surface as of that
date T .
46 (Section 2)
Forward Skew and Local Volatility
Well known that forward skew in local volatility models tends to be very flat.
This is a significant feature of local volatility models that is not at all obvious
until one explores and works with the model in some detail.
47 (Section 2)
Example: The Locally Capped, Globally Floored Cliquet
The locally capped, globally floored cliquet is structured like a bond:
Investor pays the bond price upfront at t = 0.
In return he is guaranteed to receive the principal at maturity as well as an
annual coupon.
The coupon is based on monthly returns of underlying security over previous
year. It is calculated as
Payoff = max( Σ_{t=1}^{12} min( max(r_t, −.01), .01 ), MinCoupon )    (14)
where r_t denotes the return of the underlying security in month t.
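A minimal sketch of the coupon payoff (14) for a given vector of monthly returns,
with MinCoupon taken to be zero for illustration:

```python
# Locally capped, globally floored annual coupon, as in (14).
import numpy as np

def cliquet_coupon(monthly_returns, local_floor=-0.01, local_cap=0.01, min_coupon=0.0):
    clipped = np.clip(monthly_returns, local_floor, local_cap)   # min(max(r_t, -.01), .01)
    return max(clipped.sum(), min_coupon)

r = np.array([0.03, -0.02, 0.005, 0.01, -0.004, 0.02,
              -0.03, 0.007, 0.012, -0.001, 0.000, 0.015])        # illustrative returns
print(cliquet_coupon(r))
```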
Would therefore expect coupon value to be very sensitive to the forward skew in
the model.
In particular, would expect coupon value to increase as the skew becomes more
negative.
So important to evaluate exotic securities with which we are not familiar under
different modeling assumptions.
50 (Section 2)
Jump-Diffusion Models
Merton’s jump diffusion model assumes that the time t stock price satisfies
S_t = S_0 exp( (µ − σ²/2) t + σ W_t ) ∏_{i=1}^{N_t} Y_i    (15)
where N_t is a Poisson process with intensity λ, independent of the Brownian
motion W_t, and the Y_i's are IID log-normal random variables.
Each Yi represents the magnitude of the i th jump and stock price behaves like a
regular GBM between jumps.
If the dynamics in (15) are under an EMM, Q, then µ, λ and the mean jump size
are constrained in such a way that Q-expected rate of return must equal r − q.
Question: Can you see how using (15) to price European options might lead to
an infinitely weighted sum of Black-Scholes options prices?
Other tractable jump-diffusion models are due to Duffie, Pan and Singleton
(1998), Kou (2002), Bates (1996) etc.
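A minimal Monte-Carlo sketch of (15) under Q with log-normal jump sizes; the
drift adjustment below enforces a Q-expected rate of return of r − q, and the
parameter values are illustrative:

```python
# Monte-Carlo pricing of a European call under Merton's jump-diffusion model (15).
import numpy as np

def merton_call_mc(S0, K, T, r, q, sigma, lam, mu_J, sig_J,
                   n_paths=500_000, seed=0):
    rng = np.random.default_rng(seed)
    k = np.exp(mu_J + 0.5 * sig_J**2) - 1.0      # E[Y_i - 1] for log-normal jumps
    mu = r - q - lam * k                         # drift so that E[S_T] = S0 * exp((r - q) T)
    N = rng.poisson(lam * T, n_paths)            # number of jumps on each path
    W = np.sqrt(T) * rng.standard_normal(n_paths)
    jumps = mu_J * N + sig_J * np.sqrt(N) * rng.standard_normal(n_paths)
    ST = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W + jumps)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

print(merton_call_mc(S0=100, K=100, T=1.0, r=0.02, q=0.0,
                     sigma=0.20, lam=0.5, mu_J=-0.10, sig_J=0.15))
```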
51 (Section 2)
Merton’s Jump-Diffusion Model
Figure: surface plotted against Strike and Time-to-Maturity for Merton's jump-diffusion model.
52 (Section 2)
Levy Processes
Definition: A Levy process is any continuous-time process with stationary and
independent increments.
The two most common examples of Levy processes are Brownian motion and
the Poisson process.
A Levy process with jumps can be of infinite activity so that it jumps
infinitely often in any finite time interval, or of finite activity so that it
makes only finitely many jumps in any finite time interval.
The Merton jump-diffusion model is an example of an exponential Levy
process of finite activity.
The most important result for Levy processes is the Levy-Khintchine formula
which describes the characteristic function of a Levy process.
53 (Section 2)
Time-Changed Exponential Levy Processes
Levy processes cannot capture “volatility clustering", the tendency of high
volatility periods to be followed by periods of high volatility, and low volatility
periods to be followed by periods of low volatility.
Levy models with stochastic time, however, can capture volatility clustering.
Note that if the subordinator, Yt , is a jump process, then St can jump even if the
process Xt cannot jump itself.
Note that in (17) the dynamics of St are Q-dynamics corresponding to the cash
account as numeraire. Why is this the case?
- typical of how incomplete markets are often modeled: we directly specify
Q-dynamics so that martingale pricing holds by construction
- other free parameters are then chosen by a calibration algorithm.
55 (Section 2)
Normal-Inverse-Gaussian Process with CIR Clock
Figure: a simulated price path (Price against Time, over roughly three years) of a
Normal-Inverse-Gaussian process with a CIR stochastic clock.
56 (Section 2)
Variance-Gamma Process with OU-Gamma Clock
Figure: a simulated price path (Price against Time, over roughly three years) of a
Variance-Gamma process with an OU-Gamma stochastic clock.
57 (Section 2)
Time-Changed Exponential Levy Processes
Many Levy processes are quite straightforward to simulate.
However, the characteristic functions of the log-stock price are often available in
closed form, even when the Levy process has a stochastic clock.
Indeed, in the early 2000's there was a well-known debate about precisely this
issue: does getting the model dynamics right actually matter for pricing?
Context for the debate was the general approach in the market of using simple
one-factor models to price Bermudan swaptions.
Longstaff, Santa-Clara and Schwartz (2001) argued that since the term-structure
of interest rates was driven by several factors, using a one-factor model to price
Bermudan swaptions was fundamentally flawed.
This argument was sound ... but they neglected to account for the calibration
process.
59 (Section 3)
Calibration is an Integral part of the Pricing Process
Their analysis relied on the fact that when pricing Bermudan swaptions it was
common to calibrate the one-factor models to the prices of European
swaptions.
On the basis that Bermudan swaptions are actually quite “close” to European
swaptions, they argued the extrapolation risk was small.
This debate clearly tackled the issue of model transparency and highlighted
that model dynamics may not be important at all if the exotic security being
priced is within or close to the span of the securities used to calibrate the model.
Essentially two wrongs, i.e. a bad model and bad parameters, can together make
a right, i.e. an accurate price!
60 (Section 3)
Using Many Models to Manage Model Risk
The models we have described are quite representative of the models that are
used for pricing many equity and FX derivative securities in practice.
Therefore very important for users of these models to fully understand their
strengths and weaknesses and the implications of these strengths and
weaknesses when used to price and risk manage a given security.
Will see how these models and others can be used together to infer the prices of
exotic securities as well as their Greeks or hedge ratios.
In particular we will emphasize how they can be used to avoid the pitfalls
associated with price extrapolation.
61 (Section 3)
Another Example: Back to Barrier Options
Well known that the price of a barrier option is not solely determined by the
marginal distributions of the underlying stock price.
62 (Section 3)
Down-and-Out Barrier Call Prices Under Different Models
(Models shown: Hest, Hest-J, BNS, VG-OUG, VG-CIR, NIG-CIR and NIG-OUG; x-axis: Barrier (% of Spot).)
Down-and-out (DOB) barrier call option prices for different models, all of which
have been calibrated to the same implied volatility surface.
See “A Perfect Calibration! Now What?” by Schoutens, Simons and Tistaert
(2003) for further details.
63 (Section 3)
Up-and-Out Barrier Call Prices Under Different Models
Up-and-out (UOB) barrier call option prices for different models, all of which
have been calibrated to the same implied volatility surface.
See “A Perfect Calibration! Now What?” by Schoutens, Simons and Tistaert
(2003) for further details.
64 (Section 3)
Barrier Options and Extrapolation Risk
Clear that the different models result in very different barrier prices.
Perhaps best solution is to price the barrier using several different models that:
(a) have been calibrated to the market prices of liquid securities and
(b) have reasonable dynamics that make sense from a modeling viewpoint.
The minimum and maximum of these prices could then be taken as the bid-offer
prices if they are not too far apart.
If they are far apart, then they simply provide guidance on where the fair price
might be.
Using many plausible models is perhaps the best way to avoid extrapolation risk.
65 (Section 3)
Model Calibration Risk
Have to be very careful when calibrating models to market prices!
In general (and this is certainly the case with equity derivatives markets) the
most liquid instruments are vanilla call and put options.
Assuming then that we have a volatility surface available to us, we can at the
very least calibrate our model to this surface, e.g. by solving
min_γ Σ_i ω_i ( ModelPrice_i(γ) − MarketPrice_i )²
where ModelPrice_i and MarketPrice_i are model and market prices, respectively,
of the i th option used in the calibration.
The ω_i's are fixed weights that we choose to reflect either the importance or
accuracy of the i th observation and γ is the vector of model parameters.
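A minimal sketch of this calibration, assuming a hypothetical function
model_price(gamma, option) (not shown) that returns the model price of a given
option for parameter vector gamma:

```python
# Weighted least-squares calibration of model parameters to market option prices.
import numpy as np
from scipy.optimize import minimize

def calibrate(model_price, options, market_prices, weights, gamma0):
    market = np.asarray(market_prices, float)
    w = np.asarray(weights, float)

    def objective(gamma):
        model = np.array([model_price(gamma, opt) for opt in options])
        return float(np.sum(w * (model - market) ** 2))

    result = minimize(objective, gamma0, method="Nelder-Mead")
    return result.x      # may only be a local minimum - see the issues discussed below
```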
There are at least two potential problems with this calibration approach:
1. The objective function may have many local minima, so the calibrated
parameters can depend on the starting point of the optimization.
2. Even if there is only one local minimum, there may be “valleys" containing
the local minimum in which the objective function is more or less flat.
Then possible that the optimization routine will terminate at different points
in the valley even when given the same input, i.e. market option prices
- this is clearly unsatisfactory!
Recall, moreover, that calibrating to the volatility surface only pins down the
marginal distributions of the stock price at each fixed maturity.
It tells you nothing(!) about the joint distributions of the stock price at different
times.
All of the parameter combinations in the “valley" might result in very similar
marginal distributions, but they will often result in very different joint
distributions.
This was demonstrated earlier when we saw how different models that had been
calibrated to the same volatility surface gave very different prices for
down-and-out barrier call options.
68 (Section 3)