
IEOR E4602: Quantitative Risk Management

Basic Concepts and Techniques of Risk Management

Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]

References: Chapter 2 of 2nd ed. of MFE by McNeil, Frey and Embrechts.


Outline

Risk Factors and Loss Distributions


Linear Approximations to the Loss Function
Conditional and Unconditional Loss Distributions

Risk Measurement
Scenario Analysis and Stress Testing
Value-at-Risk
Expected Shortfall (ES)

Standard Techniques for Risk Measurement


Evaluating Risk Measurement Techniques

Other Considerations

2 (Section 0)
Risk Factors and Loss Distributions
Notation (to be used throughout the course):
∆ a fixed period of time such as 1 day or 1 week.
Let Vt be the value of a portfolio at time t∆.
So portfolio loss between t∆ and (t + 1)∆ is given by

Lt+1 := − (Vt+1 − Vt )

- note that a loss is a positive quantity


- it (of course) depends on change in values of the securities.

More generally, may wish to define a set of d risk factors

Zt := (Zt,1 , . . . , Zt,d )

so that
Vt = f (t, Zt ).
for some function f : R+ × Rd → R.
3 (Section 1)
Risk Factors and Loss Distributions
e.g. In a stock portfolio might take the stock prices or some function of the
stock prices as our risk factors.

e.g. In an options portfolio Zt might contain stock factors together with implied
volatility and interest rate factors.

Let Xt := Zt − Zt−1 denote the change in values of the risk factors between
times t − 1 and t.

Then have

Lt+1 (Xt+1 ) = − (f (t + 1, Zt + Xt+1 ) − f (t, Zt ))

Given the value of Zt , the distribution of Lt+1 depends only on the distribution
of Xt+1 .

Estimating the (conditional) distribution of Xt+1 is then a very important goal in


much of risk management.

4 (Section 1)
Linear Approximations to the Loss Function
Assuming f (·, ·) is differentiable, can use a first order Taylor expansion to
approximate Lt+1 :
L̂t+1(Xt+1) := − ( ft(t, Zt)∆ + Σ_{i=1}^{d} fzi(t, Zt) Xt+1,i )     (1)

where f -subscripts denote partial derivatives.

First order approximation commonly used when Xt+1 is likely to be small


- often the case when ∆ is small, e.g. ∆ = 1/365 ≡ 1 day, and market not
too volatile.
Second and higher order approximations also based on Taylor’s Theorem can also
be used.

Important to note, however, that if Xt+1 is likely to be very large then Taylor
approximations can fail.

5 (Section 1)
Conditional and Unconditional Loss Distributions
Important to distinguish between the conditional and unconditional loss
distributions.

Consider the series Xt of risk factor changes and assume that they form a
stationary time series with stationary distribution FX .

Let Ft denote all information available in the system at time t including in


particular {Xs : s ≤ t}.

Definition: The unconditional loss distribution is the distribution of Lt+1 given


the time t composition of the portfolio and assuming the CDF of Xt+1 is given
by FX .

Definition: The conditional loss distribution is the distribution of Lt+1 given


the time t composition of the portfolio and conditional on the information in Ft .

6 (Section 1)
Conditional and Unconditional Loss Distributions
If the Xt ’s are IID then the conditional and unconditional distributions coincide.

For long time horizons, e.g. ∆ = 6 months, we might be more inclined to use the
unconditional loss distribution.

However, for short horizons, e.g. 1 day or 10 days, then the conditional loss
distribution is clearly the appropriate distribution
- true in particular in times of high market volatility when the unconditional
distribution would bear little resemblance to the true conditional distribution.

7 (Section 1)
Example: A Stock Portfolio
Consider a portfolio of d stocks with St,i denoting time t price of the i th stock
and λi denoting number of units of i th stock.

Take log stock prices as risk factors so


Xt+1,i = ln St+1,i − ln St,i
and

Lt+1 = − Σ_{i=1}^{d} λi St,i ( e^{Xt+1,i} − 1 ).
Linear approximation satisfies
L̂t+1 = − Σ_{i=1}^{d} λi St,i Xt+1,i = −Vt Σ_{i=1}^{d} ωt,i Xt+1,i

where ωt,i := λi St,i /Vt is the i th portfolio weight.


If E[Xt+1] = µ and Cov(Xt+1) = Σ, then Et[L̂t+1] = −Vt ω⊤µ and
Vart(L̂t+1) = Vt² ω⊤Σω, where ω := (ωt,1, . . . , ωt,d)⊤.
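
As a quick illustration (not part of the original notes), the following Python sketch compares the exact loss with its linear approximation for a small stock portfolio; all positions, prices and moment assumptions below are made up.

import numpy as np

# Illustrative sketch: exact vs. linearized loss for a stock portfolio with
# log-price risk factors. All numbers are hypothetical.
np.random.seed(0)

lam = np.array([100.0, 50.0, 200.0])          # units held of each stock
S_t = np.array([50.0, 120.0, 30.0])           # time-t stock prices
V_t = lam @ S_t                               # portfolio value
w_t = lam * S_t / V_t                         # portfolio weights

mu = np.array([0.0005, 0.0003, 0.0004])       # assumed mean of X_{t+1} (log-returns)
Sigma = np.array([[4e-4, 1e-4, 5e-5],
                  [1e-4, 9e-4, 2e-4],
                  [5e-5, 2e-4, 6e-4]])        # assumed covariance of X_{t+1}

X = np.random.multivariate_normal(mu, Sigma)  # one sample of risk-factor changes
L_exact  = -np.sum(lam * S_t * (np.exp(X) - 1.0))   # L_{t+1}
L_approx = -V_t * (w_t @ X)                         # linearized loss

# Moments of the linearized loss
mean_Lhat = -V_t * (w_t @ mu)
var_Lhat  = V_t**2 * (w_t @ Sigma @ w_t)
print(L_exact, L_approx, mean_Lhat, np.sqrt(var_Lhat))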
8 (Section 1)
Example: An Options Portfolio
Recall the Black-Scholes formula for time t price of a European call option with
strike K and maturity T on a non-dividend paying stock satisfies

C (St , t, σ) = St Φ(d1 ) − e −r(T−t) K Φ(d2 )

where d1 = ( log(St/K) + (r + σ²/2)(T − t) ) / ( σ √(T − t) )

d2 = d1 − σ √(T − t)

and where:
Φ(·) is the standard normal distribution CDF
St = time t price of underlying security
r = continuously compounded risk-free interest rate.

In practice use an implied volatility, σ(K , T , t), that depends on strike, maturity
and current time, t.
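
A minimal Python sketch of the Black-Scholes call price above may be useful; in practice σ would be the implied volatility σ(K, T, t) read off the surface. The inputs in the example are hypothetical.

import numpy as np
from scipy.stats import norm

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call, tau = T - t (time to maturity)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - np.exp(-r * tau) * K * norm.cdf(d2)

# Example (made-up inputs): S_t = 100, K = 105, r = 2%, implied vol 25%, 6 months
print(bs_call(100.0, 105.0, 0.02, 0.25, 0.5))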

9 (Section 1)
Example: An Options Portfolio
Consider a portfolio of European options all on the same underlying security.

If the portfolio contains d different options with a position of λi in the i th option,


then
Lt+1 = −λ0 (St+1 − St) − Σ_{i=1}^{d} λi ( C(St+1, t + 1, σ(Ki, Ti, t + 1)) − C(St, t, σ(Ki, Ti, t)) )

where λ0 is the position in the underlying security.

Note that by put-call parity we can assume that all options are call options.

Can also use linear approximation technique to approximate Lt+1


- would result in a delta-vega-theta approximation.

For derivatives portfolios, the linear approximation based on 1st order Greeks is
often inadequate
- 2nd order approximations involving gamma, volga and vanna might then be
used – but see earlier warning regarding use of Taylor approximations.
10 (Section 1)
Risk Factors in the Options Portfolio
Can again take log stock prices as risk factors but not clear how to handle the
implied volatilities.

There are several possibilities:


1. Assume the σ(K , T , t)’s simply do not change
- not very satisfactory but commonly assumed when historical simulation is
used to approximate the loss distribution and historical data on the changes
in implied volatilities are not available.

2. Let each σ(K , T , t) be a separate factor. Not good for two reasons:
(a) It introduces a large number of factors.
(b) Implied volatilities are not free to move independently since no-arbitrage
assumption imposes strong restrictions on how volatility surface may move.

Therefore important to choose factors in such a way that no-arbitrage restrictions


are easily imposed when we estimate the loss distribution.

11 (Section 1)
Risk Factors in the Options Portfolio
3. In light of previous point, it may be a good idea to parameterize the
volatility surface with just a few parameters
- and assume that only those parameters can move from one period to the next
- parameterization should be so that no-arbitrage restrictions are easy to
enforce.

4. Use dimension reduction techniques such as principal components analysis


(PCA) to identify just two or three factors that explain most of the
movements in the volatility surface.

12 (Section 1)
Example: A Bond Portfolio
Consider a portfolio containing quantities of d different default-free zero-coupon
bonds.

The i th bond has price Pt,i , maturity Ti and face value 1.

st,Ti is the continuously compounded spot interest rate for maturity Ti so that

Pt,i = exp(−st,Ti (Ti − t)).

There are λi units of i th bond in the portfolio so total portfolio value given by
Vt = Σ_{i=1}^{d} λi exp(−st,Ti (Ti − t)).

14 (Section 1)
Example: A Bond Portfolio
Assume now only parallel changes in the spot rate curve are possible
- while unrealistic, a common assumption in practice
- this is the assumption behind the use of duration and convexity.

Then if spot curve moves by δ the portfolio loss satisfies


Lt+1 = − Σ_{i=1}^{d} λi ( e^{−(st+∆,Ti + δ)(Ti − t − ∆)} − e^{−st,Ti (Ti − t)} )
     ≈ − Σ_{i=1}^{d} λi ( st,Ti (Ti − t) − (st+∆,Ti + δ)(Ti − t − ∆) ).

Therefore have a single risk factor, δ.
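
The following Python sketch (with made-up bond data) computes the exact loss under a parallel shift δ and compares it with the linear approximation above; it assumes the curve is otherwise unchanged so that st+∆,Ti = st,Ti.

import numpy as np

# Illustrative sketch (hypothetical numbers): zero-coupon bond portfolio loss
# under a parallel shift delta, exact vs. the linear approximation above.
lam   = np.array([1000.0, 2000.0, 1500.0])     # units of each zero-coupon bond
T     = np.array([1.0, 5.0, 10.0])             # maturities T_i (years)
s_t   = np.array([0.02, 0.025, 0.03])          # spot rates s_{t,T_i}
t, dt = 0.0, 1.0 / 52                          # current time and horizon Delta
delta = 0.01                                   # parallel shift (100bp)

s_next = s_t                                   # assume curve otherwise unchanged
L_exact  = -np.sum(lam * (np.exp(-(s_next + delta) * (T - t - dt))
                          - np.exp(-s_t * (T - t))))
L_approx = -np.sum(lam * (s_t * (T - t) - (s_next + delta) * (T - t - dt)))
print(L_exact, L_approx)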

15 (Section 1)
Approaches to Risk Measurement
1. Notional Amount Approach.
2. Factor Sensitivity Measures.
3. Scenario Approach.
4. Measures based on loss distribution, e.g. Value-at-Risk (VaR) or Conditional
Value-at-Risk (CVaR).

16 (Section 2)
An Example of Factor Sensitivity Measures: the Greeks
Scenario analysis for derivatives portfolios is often combined with the Greeks to
understand the riskiness of a portfolio
- and sometimes to perform a P&L attribution.

Suppose then we have a portfolio of options and futures


- all written on the same underlying security.

Portfolio value is the sum of values of individual security positions


- and the same is true for the portfolio Greeks, e.g. the portfolio delta,
portfolio gamma and portfolio vega. Why?

Consider now a single option in the portfolio with price C (S, σ, . . .).

Will use a delta-gamma-vega approximation to estimate risk of the position


- but approximation also applies (why?) to the entire portfolio.

Note approximation only holds for “small” moves in underlying risk factors
- a very important observation that is lost on many people!
17 (Section 2)
Delta-Gamma-Vega Approximations to Option Prices
A simple application of Taylor’s Theorem yields
C(S + ∆S, σ + ∆σ) ≈ C(S, σ) + (∂C/∂S) ∆S + (1/2)(∂²C/∂S²)(∆S)² + (∂C/∂σ) ∆σ
                  = C(S, σ) + δ ∆S + (1/2) Γ (∆S)² + vega ∆σ.

Therefore obtain

P&L ≈ δ ∆S + (Γ/2)(∆S)² + vega ∆σ
    = delta P&L + gamma P&L + vega P&L.
When ∆σ = 0, obtain the well-known delta-gamma approximation
- often used, for example, in historical Value-at-Risk (VaR) calculations.

Can also write


P&L ≈ δS (∆S/S) + (ΓS²/2)(∆S/S)² + vega ∆σ
    = ESP × Return + $ Gamma × Return² + vega ∆σ    (2)
where ESP denotes the equivalent stock position or “dollar” delta.
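
A small Python sketch of the approximation in (2), using hypothetical position Greeks; it simply evaluates the delta, gamma and vega P&L terms for a given stress (∆S, ∆σ).

import numpy as np

# Illustrative sketch with made-up Greeks; dS and dsigma are the stressed moves.
S, delta, gamma, vega = 100.0, 0.55, 0.04, 20.0   # hypothetical option Greeks
dS, dsigma = -5.0, 0.03                           # e.g. -5 spot move, +3 vol points

ret = dS / S
esp = delta * S                  # equivalent stock position ("dollar" delta)
dollar_gamma = gamma * S**2 / 2

pnl_delta = esp * ret            # = delta * dS
pnl_gamma = dollar_gamma * ret**2
pnl_vega  = vega * dsigma
print(pnl_delta, pnl_gamma, pnl_vega, pnl_delta + pnl_gamma + pnl_vega)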
18 (Section 2)
Scenario Analysis and Stress Testing

– Stress testing an options portfolio written on the Eurostoxx 50.

19 (Section 2)
Scenario Analysis and Stress Testing
In general we want to stress the risk factors in our portfolio.

Therefore very important to understand the dynamics of the risk factors.


e.g. The implied volatility surface almost never experiences parallel shifts. Why?
- In fact changes in volatility surface tend to follow a square root of time
rule.

When stressing a portfolio, it is also important to understand what risk factors


the portfolio is exposed to.
e.g. A portfolio may be neutral with respect to the two most “important" risk
factors but have very significant exposure to a third risk factor
- Important then to conduct stresses of that third risk factor
- Especially if the trader or portfolio manager knows what stresses are applied!

20 (Section 2)
Value-at-Risk
Value-at-Risk (VaR) is the most widely (mis-)used risk measure in the financial
industry.

Despite the many weaknesses of VaR, financial institutions are required to use it
under the Basel II capital-adequacy framework.

And many institutions routinely report VaR numbers to shareholders, investors or


regulatory authorities.

VaR is calculated from the loss distribution


- could be conditional or unconditional
- could be a true loss distribution or some approximation to it.

Will assume that horizon ∆ has been fixed so that L represents portfolio loss
over time interval ∆.

Will use FL (·) to denote the CDF of L.

22 (Section 2)
Value-at-Risk
Definition: Let F : R → [0, 1] be an arbitrary CDF. Then for α ∈ (0, 1) the
α-quantile of F is defined by
qα (F ) := inf{x ∈ R : F (x) ≥ α}.

If F is continuous and strictly increasing, then qα(F) = F⁻¹(α).


For a random variable L with CDF FL (·), will often write qα (L) instead of
qα (FL ).
Since any CDF is by definition right-continuous, immediately obtain the
following result:
Lemma: A point x0 ∈ R is the α-quantile of FL if and only if
(i) FL (x0 ) ≥ α and
(ii) FL (x) < α for all x < x0 .

Definition: Let α ∈ (0, 1) be some fixed confidence level. Then the VaR of the
portfolio loss at the confidence level, α, is given by VaRα := qα (L), the
α-quantile of the loss distribution.
23 (Section 2)
VaR for the Normal Distributions
Because the normal CDF is both continuous and strictly increasing, it is
straightforward to calculate VaRα .

So suppose L ∼ N(µ, σ 2 ). Then

VaRα = µ + σ Φ⁻¹(α)    (3)

where Φ(·) is the standard normal CDF.

This follows from previous lemma if we can show FL (VaRα ) = α


- but this follows immediately from (3).

24 (Section 2)
VaR for the t Distributions
The t CDF also continuous and strictly increasing so again straightforward to
calculate VaRα .

So let L ∼ t(ν, µ, σ 2 ), i.e. (L − µ)/σ has a standard t distribution with ν > 2


degrees-of-freedom (dof). Then

VaRα = µ + σ tν⁻¹(α)

where tν is the CDF for the t distribution with ν dof.

Note that now E[L] = µ and Var(L) = νσ²/(ν − 2).
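
Both formulas are easy to check numerically, e.g. with the following Python sketch (µ, σ and ν are made-up inputs); norm.ppf and t.ppf are the quantile functions Φ⁻¹ and tν⁻¹.

from scipy.stats import norm, t

mu, sigma, nu, alpha = 0.0, 1000.0, 5, 0.99

var_normal = mu + sigma * norm.ppf(alpha)        # VaR_alpha = mu + sigma * Phi^{-1}(alpha)
var_t      = mu + sigma * t.ppf(alpha, df=nu)    # VaR_alpha = mu + sigma * t_nu^{-1}(alpha)
print(var_normal, var_t)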

25 (Section 2)
Weaknesses of VaR
1. VaR attempts to describe the entire loss distribution with just a single
number!
- so significant information is lost
- this criticism applies to all scalar risk measures
- one way around it is to report VaRα for several values of α.

2. Significant model risk attached to VaR


- e.g. if loss distribution is heavy-tailed but a light-tailed, e.g. normal,
distribution is assumed, then VaRα will be severely underestimated as α → 1.

3. A fundamental problem with VaR is that it can be very difficult to estimate


the loss distribution
- true of all risk measures based on the loss distribution.

4. VaR is not a sub-additive risk measure so that it doesn’t lend itself to


aggregation.

26 (Section 2)
(Non-) Sub-Additivity of VaR
e.g. Let L = L1 + L2 be the total loss associated with two portfolios, each with
respective losses, L1 and L2 .

Then
qα (FL ) > qα (FL1 ) + qα (FL2 ) is possible!

An undesirable property as we would expect some diversification benefits


when we combine two portfolios together.
Such a benefit would be reflected by the combined portfolio having a
smaller risk measure than the sum of the two individual risk measures.

Will discuss sub-additivity property when we study coherent risk measures later in
course.

27 (Section 2)
Advantages of VaR
VaR is generally “easier” to estimate:
True of quantile estimation in general since quantiles are not very sensitive
to outliers.
- not true of other risk measures such as Expected Shortfall / CVaR
Even then, it becomes progressively more difficult to estimate VaRα as
α→1
- may be able to use Extreme Value Theory (EVT) in these circumstances.

But VaR easier to estimate only if we have correctly specified the appropriate
probability model
- often an unjustifiable assumption!

Value of ∆ that is used in practice generally depends on the application:


For credit, operational and insurance risk ∆ often on the order of 1 year.
For financial risks typical values of ∆ are on the order of days.

28 (Section 2)
Expected Shortfall (ES)
Definition: For a portfolio loss, L, satisfying E[|L|] < ∞ the expected shortfall
at confidence level α ∈ (0, 1) is given by
ESα := (1/(1 − α)) ∫_α^1 qu(FL) du.    (4)

Relationship between ESα and VaRα is therefore given by


ESα := (1/(1 − α)) ∫_α^1 VaRu(L) du

- so clear that ESα (L) ≥ VaRα (L).

29 (Section 2)
Expected Shortfall (ES)
A more well known representation of ESα (L) holds when FL is continuous:

Lemma: If FL is a continuous CDF then

ESα := E[L; L ≥ qα(L)] / (1 − α)
     = E[L | L ≥ VaRα].    (5)

Proof: See Lemma 2.13 in McNeil, Frey and Embrechts (MFE). 2

Expected Shortfall also known as Conditional Value-at-Risk (CVaR)


- when there are atoms in the distribution CVaR is defined slightly differently
- but we will continue to take (4) as our definition.

30 (Section 2)
Example: Expected Shortfall for a Normal Distribution
Can use (5) to compute expected shortfall of an N(µ, σ 2 ) random variable.

We find
ESα = µ + σ φ(Φ⁻¹(α)) / (1 − α)    (6)
where φ(·) is the PDF of the standard normal distribution.

31 (Section 2)
Example: Expected Shortfall for a t Distribution
Let L ∼ t(ν, µ, σ 2 ) so that L̃ := (L − µ)/σ has a standard t distribution with
ν > 2 dof.

Then easy to see that ESα (L) = µ + σESα (L̃).

Straightforward using direct integration to check that

ESα(L̃) = [ gν(tν⁻¹(α)) / (1 − α) ] · [ (ν + (tν⁻¹(α))²) / (ν − 1) ]    (7)

where tν (·) and gν (·) are the CDF and PDF, respectively, of the standard t
distribution with ν dof.

Remark: The t distribution is a much better model of stock (and other asset)
returns than the normal model. In empirical studies, values of ν around 5 or 6 are
often found to fit best.
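
The closed-form expressions (6) and (7) can be checked against a direct numerical average of the quantiles, since ESα = (1/(1−α)) ∫_α^1 VaRu du; a Python sketch with made-up parameters:

import numpy as np
from scipy.stats import norm, t

mu, sigma, nu, alpha = 0.0, 1.0, 5, 0.99
u = np.linspace(alpha, 1.0, 200001)[:-1]   # drop u = 1 where the quantile is infinite

# Normal case
es_normal_formula = mu + sigma * norm.pdf(norm.ppf(alpha)) / (1 - alpha)
es_normal_numeric = np.mean(mu + sigma * norm.ppf(u))   # approximate average of VaR_u

# Standard t case (mu = 0, sigma = 1)
q = t.ppf(alpha, df=nu)
es_t_formula = (t.pdf(q, df=nu) / (1 - alpha)) * (nu + q**2) / (nu - 1)
es_t_numeric = np.mean(t.ppf(u, df=nu))

print(es_normal_formula, es_normal_numeric)
print(es_t_formula, es_t_numeric)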

32 (Section 2)
The Shortfall-to-Quantile Ratio
Can compare VaRα and ESα by considering their ratio as α → 1.

Not too difficult to see that in the case of the normal distribution

ESα / VaRα → 1 as α → 1.

However, in the case of the t distribution with ν > 1 dof we have

ESα / VaRα → ν/(ν − 1) > 1 as α → 1.

33 (Section 2)
Standard Techniques for Risk Measurement
1. Historical simulation.
2. Monte-Carlo simulation.
3. Variance-covariance approach.

34 (Section 3)
Historical Simulation
Instead of using a probabilistic model to estimate distribution of Lt+1 (Xt+1 ), we
could estimate the distribution using a historical simulation.

In particular, if we know the values of Xt−i+1 for i = 1, . . . , n, then can use this
data to create a set of historical losses:

{L̃i := Lt+1 (Xt−i+1 ) : i = 1, . . . , n}

- so L̃i is the portfolio loss that would occur if the risk factor returns on date
t − i + 1 were to recur.

To calculate value of a given risk measure we simply assume the distribution of


Lt+1 (Xt+1 ) is discrete and takes on each of the values L̃i w.p. 1/n for
i = 1, . . . , n, i.e., we use the empirical distribution of the Xt ’s.

e.g. Suppose we wish to estimate VaRα . Then can do so by computing the


α-quantile of the L̃i ’s.

35 (Section 3)
Historical Simulation
Suppose the L̃i ’s are ordered by

L̃n,n ≤ · · · ≤ L̃1,n .

Then an estimator of VaRα (Lt+1 ) is L̃[n(1−α)],n where [n(1 − α)] is the largest
integer not exceeding n(1 − α).

Can estimate ESα using

ÊSα = ( L̃[n(1−α)],n + · · · + L̃1,n ) / [n(1 − α)].
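
A short Python sketch of these estimators (the "historical" losses below are simulated stand-ins; in practice the L̃i come from revaluing today's portfolio under past risk-factor changes):

import numpy as np

np.random.seed(1)
losses = np.random.standard_t(df=5, size=1000) * 1000.0   # stand-in for the L-tilde_i
alpha = 0.99
n = len(losses)
k = int(np.floor(n * (1 - alpha)))            # [n(1 - alpha)]

losses_sorted = np.sort(losses)[::-1]         # L_{1,n} >= ... >= L_{n,n}
var_hat = losses_sorted[k - 1]                # L_{[n(1-alpha)],n}
es_hat  = losses_sorted[:k].mean()            # average of the k largest losses
print(var_hat, es_hat)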

Historical simulation approach generally difficult to apply for derivative portfolios.


Why?
But if applicable, then easy to apply.

Historical simulation estimates the unconditional loss distribution


- so not good for financial applications!

36 (Section 3)
Monte-Carlo Simulation
Monte-Carlo approach similar to historical simulation approach.

But now use some parametric distribution for the change in risk factors to
generate sample portfolio losses.

The (conditional or unconditional) distribution of the risk factors is estimated


and m portfolio loss samples are generated.

Free to make m as large as possible


Subject to constraints on computational time.
Variance reduction methods often employed to obtain improved estimates of
required risk measures.

While Monte-Carlo is an excellent tool, it is only as good as the model used to


generate the data: if the estimated distribution of Xt+1 is poor, then
Monte-Carlo of little value.

37 (Section 3)
The Variance-Covariance Approach
In the variance-covariance approach assume that Xt+1 has a multivariate normal
distribution so that
Xt+1 ∼ MVN (µ, Σ) .
Also assume the linear approximation
L̂t+1(Xt+1) := − ( ft(t, Zt)∆ + Σ_{i=1}^{d} fzi(t, Zt) Xt+1,i )

is sufficiently accurate. Then can write

L̂t+1(Xt+1) = −(ct + bt⊤ Xt+1)

for a constant scalar, ct, and constant vector, bt.

Therefore obtain

L̂t+1(Xt+1) ∼ N( −ct − bt⊤µ, bt⊤Σbt )

and can calculate any risk measures of interest.
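
For example, the following Python sketch (with made-up ct, bt, µ and Σ) computes the 99% VaR and ES of the resulting normal loss distribution:

import numpy as np
from scipy.stats import norm

# Illustrative variance-covariance VaR / ES under the linear-normal assumption.
c_t = 0.0
b_t = np.array([1000.0, -500.0, 2000.0])                 # factor sensitivities
mu = np.array([0.0002, 0.0001, 0.0003])
Sigma = np.array([[1e-4, 2e-5, 1e-5],
                  [2e-5, 2e-4, 3e-5],
                  [1e-5, 3e-5, 1.5e-4]])

mean_L = -c_t - b_t @ mu
std_L  = np.sqrt(b_t @ Sigma @ b_t)

alpha = 0.99
var_99 = mean_L + std_L * norm.ppf(alpha)
es_99  = mean_L + std_L * norm.pdf(norm.ppf(alpha)) / (1 - alpha)
print(var_99, es_99)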


38 (Section 3)
The Variance-Covariance Approach
This technique can be either conditional or unconditional
- depends on how µ and Σ are estimated.
The approach provides straightforward analytically tractable method of
determining the loss distribution.

But it has several weaknesses: risk factor distributions are often fat- or
heavy-tailed but the normal distribution is light-tailed
- this is easy to overcome as there are other multivariate distributions that are
also closed under linear operations.
e.g. If Xt+1 has a multivariate t distribution so that

Xt+1 ∼ t (ν, µ, Σ)

then
L̂t+1(Xt+1) ∼ t( ν, −ct − bt⊤µ, bt⊤Σbt ).

A more serious problem is that the linear approximation will often not work well
- particularly true for portfolios of derivative securities.
39 (Section 3)
Evaluating Risk Measurement Techniques
Important for any risk manager to constantly evaluate the reported risk measures.

e.g. If daily 95% VaR is reported then should see daily losses exceeding the
reported VaR approximately 5% of the time.

So suppose reported VaR numbers are correct and define



Yi := 1 if Li ≥ VaRi, and Yi := 0 otherwise,

where VaRi and Li are the reported VaR and realized loss for period i.

If Yi ’s are IID, then


Σ_{i=1}^{n} Yi ∼ Binomial(n, 0.05)

Can use standard statistical tests to see if this is indeed the case.
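
For instance, a simple one-sided binomial test of "too many exceptions" can be sketched in Python as follows (the exception count below is simulated, deliberately from a too-high exception rate):

import numpy as np
from scipy.stats import binom

# With a correct daily 95% VaR the number of exceptions in n days is Binomial(n, 0.05).
np.random.seed(2)
n, p = 250, 0.05                           # one year of daily data, 95% VaR
exceptions = np.random.binomial(n, 0.07)   # pretend the true exception rate is 7%

# p-value for "too many exceptions": P(Binomial(n, 0.05) >= observed)
p_value = binom.sf(exceptions - 1, n, p)
print(exceptions, p_value)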

Similar tests can be constructed for ES and other risk measures.


40 (Section 3)
Other Considerations
Risk-Neutral and Data-Generating (Empirical) Probability Measures.
Data Risk.
Multi-Period Risk Measures and Scaling.
Model Risk.
Data Aggregation.
Liquidity Risk.
P&L Attribution.

41 (Section 4)
IEOR E4602: Quantitative Risk Management
Multivariate Distributions

Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Joint and Marginal CDFs
Let X = (X1 , . . . , Xn ) be an n-dimensional vector of random variables.

Definition (Joint CDF): For all x = (x1 , . . . , xn )> ∈ Rn , the joint cumulative
distribution function (CDF) of X satisfies

FX (x) = FX (x1 , . . . , xn ) = P(X1 ≤ x1 , . . . , Xn ≤ xn ).

Definition (Marginal CDF): For a fixed i, the marginal CDF of Xi satisfies

FXi (xi ) = FX (∞, . . . , ∞, xi , ∞, . . . ∞).

Straightforward to generalize to joint marginal distributions. e.g

Fij (xi , xj ) = FX (∞, . . . , ∞, xi , ∞, . . . , ∞, xj , ∞, . . . ∞).

2 (Section 1)
Conditional CDFs
If X has a probability density function (PDF) then
FX(x1, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f(u1, . . . , un) du1 . . . dun.

A collection of random variables is independent if the joint CDF (or PDF if it


exists) can be factored into the product of the marginal CDFs (or PDFs).

If X1 = (X1 , . . . , Xk )> and X2 = (Xk+1 , . . . , Xn )> is a partition of X then the


conditional CDF satisfies

FX2 |X1 (x2 |x1 ) = P(X2 ≤ x2 |X1 = x1 ).

If X has a PDF, f (·), then it satisfies


FX2|X1(x2 | x1) = ∫_{−∞}^{xk+1} · · · ∫_{−∞}^{xn} [ f(x1, . . . , xk, uk+1, . . . , un) / fX1(x1) ] duk+1 . . . dun

where fX1 (·) is the joint marginal PDF of X1 .


3 (Section 1)
Mean Vector and Covariance Matrix
Assuming it exists, mean vector of X given by
E[X] := (E[X1] . . . E[Xn])⊤.

Again assuming it exists, the covariance matrix of X satisfies

Cov(X) := Σ := E[ (X − E[X]) (X − E[X])⊤ ]

so that the (i, j)th element of Σ is simply the covariance of Xi and Xj .

Important properties of Σ:
1. It is symmetric so that Σ> = Σ
2. Diagonal elements satisfy Σi,i ≥ 0
3. It is positive semi-definite so that x > Σx ≥ 0 for all x ∈ Rn .
The correlation matrix, ρ(X), has (i, j)th element ρij := Corr(Xi , Xj )
- also symmetric, positive semi-definite
- has 1’s along the diagonal.

4 (Section 1)
Linear Combinations and Characteristic Functions
For any matrix A ∈ Rk×n and vector a ∈ Rk have

E [AX + a] = AE [X] + a (1)


Cov(AX + a) = A Cov(X) A> . (2)

The characteristic function of X given by


φX(s) := E[ e^{i s⊤X} ]  for s ∈ Rn    (3)

If it exists, the moment-generating function (MGF) is given by (3) with s


replaced by −is.

5 (Section 1)
The Multivariate Normal Distribution
If X multivariate normal with mean vector µ and covariance matrix Σ then write

X ∼ MNn (µ, Σ).

Standard multivariate normal: µ = 0 and Σ = In , the n × n identity matrix.

PDF of X given by
f(x) = (2π)^{−n/2} |Σ|^{−1/2} e^{−(1/2)(x−µ)⊤ Σ⁻¹ (x−µ)}    (4)

where | · | denotes the determinant.

Characteristic function satisfies


φX(s) = E[ e^{i s⊤X} ] = e^{ i s⊤µ − (1/2) s⊤Σ s }.

6 (Section 2)
The Multivariate Normal Distribution
Let X1 = (X1 , . . . , Xk )> and X2 = (Xk+1 , . . . , Xn )> be a partition of X with
   
µ = (µ1⊤, µ2⊤)⊤  and  Σ = [ Σ11  Σ12 ; Σ21  Σ22 ]  (in block form).

Then marginal distribution of a multivariate normal random vector is itself


(multivariate) normal. In particular, Xi ∼ MN(µi , Σii ), for i = 1, 2.

Assuming Σ is positive definite, the conditional distribution of a multivariate


normal distribution is also a (multivariate) normal distribution. In particular,

X2 | X1 = x1 ∼ MN(µ2.1 , Σ2.1 )

where

µ2.1 = µ2 + Σ21 Σ11⁻¹ (x1 − µ1)
Σ2.1 = Σ22 − Σ21 Σ11⁻¹ Σ12.

7 (Section 2)
Generating MN Distributed Random Vectors
Suppose we wish to generate X = (X1 , . . . , Xn ) where X ∼ MNn (0, Σ)
- it is then easy to handle the case where E[X] ≠ 0.

Let Z = (Z1 , . . . , Zn )> where Zi ∼ N(0, 1) and IID for i = 1, . . . , n.

If C is an (n × m) matrix then

C⊤Z ∼ MN(0, C⊤C).

Problem therefore reduces to finding C such that C⊤C = Σ.

Usually find such a matrix, C, via the Cholesky decomposition of Σ.

8 (Section 2)
The Cholesky Decomposition of a Symmetric PD Matrix
Any symmetric positive-definite matrix, M, may be written as

M = U> DU

where:
U is an upper triangular matrix
D is a diagonal matrix with positive diagonal elements.

Since Σ is symmetric positive-definite, can therefore write

Σ = U⊤ D U
  = (U⊤ √D)(√D U)
  = (√D U)⊤ (√D U).

C = √D U therefore satisfies C⊤C = Σ
- C is called the Cholesky Decomposition of Σ.

9 (Section 2)
The Cholesky Decomposition in Matlab
Easy to compute the Cholesky decomposition of a symmetric positive-definite
matrix in Matlab using the chol command
- so also easy to simulate multivariate normal random vectors in Matlab.

Sample Matlab Code


>> Sigma = [1.0 0.5 0.5;
0.5 2.0 0.3;
0.5 0.3 1.5];

>> C = chol(Sigma);
>> Z = randn(3,1000000);
>> X = C’*Z;
>> cov(X’)

ans =
0.9972 0.4969 0.4988
0.4969 1.9999 0.2998
0.4988 0.2998 1.4971
10 (Section 2)
The Cholesky Decomposition in Matlab and R
Must be very careful in Matlab and R to pre-multiply Z by C> and not C.

Some languages take C⊤ to be the Cholesky Decomposition rather than C


- must therefore always know what convention your programming language /
package is using.

Must also be careful that Σ is indeed a genuine variance-covariance matrix.

11 (Section 2)
Normal-Mixture Models
Normal-mixture models are a class of models generated by introducing
randomness into the covariance matrix and / or the mean vector:

Definition: The random vector X has a normal variance mixture if



X ∼ µ + √W A Z

where
(i) Z ∼ MNk (0, Ik )
(ii) W ≥ 0 is a scalar random variable independent of Z and
(iii) A ∈ Rn×k and µ ∈ Rn are a matrix and vector of constants, respectively.

12 (Section 3)
Normal-Mixture Models
If we condition on W , then X is multivariate normally distributed
- this observation also leads to an obvious simulation algorithm for generating
samples of X.

Typically interested in case when rank(A) = n ≤ k and Σ is a full-rank positive


definite matrix
- then obtain a non-singular normal variance mixture.

Assuming W is integrable, immediately see that


E[X] = µ and Cov(X) = E[W ] Σ
where Σ = AA> .

We call µ and Σ the location vector and dispersion matrix of the distribution.

Also clear that correlation matrices of X and AZ coincide


- implies that if A = In then components of X are uncorrelated though they
are not in general independent.
13 (Section 3)
Normal-Mixture Models
Lemma: Let X = (X1 , X2 ) have a normal mixture distribution with A = I2 ,
µ = 0 and E[W ] < ∞ so that Cov(X1 , X2 ) = 0.

Then X1 and X2 are independent if and only if W is constant with probability 1.

Proof: (i) If W constant then immediately follows from independence of Z1 and


Z2 that X1 and X2 are also independent.

(ii) Suppose now X1 and X2 are independent. Note that

E[|X1| |X2|] = E[W |Z1| |Z2|] = E[W] E[|Z1| |Z2|]
             ≥ ( E[√W] )² E[|Z1| |Z2|]
             = E[|X1|] E[|X2|]

with equality only if W is a constant.

But independence of X1 and X2 implies we must have equality and so W is


indeed constant almost surely. 2
14 (Section 3)
E.G. The Multivariate Two-Point Normal Mixture Model
Perhaps the simplest example of normal-variance mixture is obtained when W is
a discrete random variable.

If W is binary and takes on two values, w1 and w2 with probabilities p and 1 − p,


respectively, then obtain the two-point normal mixture model.

Can create a two regime model by setting w2 large relative to w1 and choosing p
large
- then W = w1 can correspond to an ordinary regime
- and W = w2 corresponds to a stress regime.

15 (Section 3)
E.G. The Multivariate t Distribution
The multivariate t distribution with ν degrees-of-freedom (dof) is obtained when
we take W to have an inverse gamma distribution.

Equivalently, the multivariate t distribution with ν dof is obtained if ν/W ∼ χ2ν


- the more familiar description of the t distribution.
We write X ∼ tn (ν, µ, Σ).

Note that Cov(X) = ν/(ν − 2)Σ


- only defined when ν > 2.

Can easily simulate chi-squared random variables so easy to simulate multivariate


t random vectors.

The multivariate t distribution plays an important role in risk management as it


often provides a very good fit to asset return distributions.
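
A Python sketch of this simulation approach, using the representation X = µ + √W AZ with ν/W ∼ χ²ν (all inputs made up):

import numpy as np

def simulate_multivariate_t(nu, mu, Sigma, m, rng=None):
    """Simulate m samples of X ~ t_n(nu, mu, Sigma) via the variance mixture."""
    rng = np.random.default_rng(rng)
    n = len(mu)
    A = np.linalg.cholesky(Sigma)                  # Sigma = A A'
    Z = rng.standard_normal((m, n))
    W = nu / rng.chisquare(nu, size=m)             # W is inverse gamma
    return mu + np.sqrt(W)[:, None] * (Z @ A.T)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
X = simulate_multivariate_t(nu=5, mu=mu, Sigma=Sigma, m=100000)
print(np.cov(X.T))      # should be close to (nu / (nu - 2)) * Sigma = (5/3) * Sigma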

16 (Section 3)
Characteristic Function of a Normal Variance Mixture
We have
φX(s) = E[ e^{i s⊤X} ] = E[ E[ e^{i s⊤X} | W ] ]
      = E[ e^{ i s⊤µ − (1/2) W s⊤Σ s } ]
      = e^{i s⊤µ} Ŵ( (1/2) s⊤Σ s )

where Ŵ(·) is the Laplace transform of W.

Sometimes use the notation X ∼ Mn(µ, Σ, Ŵ).

17 (Section 3)
Affine Transformations of Normal Variance Mixtures
 
Proposition: If X ∼ Mn(µ, Σ, Ŵ) and Y = BX + b for B ∈ R^{k×n} and
b ∈ R^k then Y ∼ Mk(Bµ + b, BΣB⊤, Ŵ).

So affine transformations of normal variance mixtures remain normal variance


mixtures
- useful when loss function is approximated with linear function of risk factors.

Proof is straightforward using characteristic function argument.

18 (Section 3)
Normal Mean-Variance Mixtures
Could also define normal mixture distributions where µ = m(W ).

Would still obtain that X is multivariate normal conditional on W .

Important class of normal mean-variance mixtures are the generalized hyperbolic


distributions. They:
are closed under addition
are easy to simulate
can be fitted using standard statistical techniques.

We will not study normal mean-variance mixtures in this course.

19 (Section 3)
Spherical Distributions
Recall that a linear transformation U ∈ Rn×n is orthogonal if
UU> = U> U = In .

Definition: A random vector X = (X1 , . . . , Xn ) has a spherical distribution if

UX ∼ X (5)

for every orthogonal linear transformation, U ∈ Rn×n .

Note that (5) implies the distribution of X is invariant under rotations.

A better understanding of spherical distributions may be obtained from the


following theorem ….

20 (Section 4)
Spherical Distributions
Theorem: The following are equivalent:
1. X is spherical.
2. There exists a function ψ(·) such that for all s ∈ Rn ,

φX(s) = ψ(s⊤s) = ψ(s1² + · · · + sn²).    (6)

3. For all a ∈ Rn
a⊤X ∼ ||a|| X1
where ||a||² = a⊤a = a1² + · · · + an².

(6) shows that characteristic function of a spherical distribution is completely


determined by a function, ψ(·), of a scalar variable.

ψ(·) is known as the generator of the distribution


- common to write X ∼ Sn (ψ).

21 (Section 4)
Example: Multivariate Normal
Let X ∼ MNn (0, In ). Then
φX(s) = e^{−(1/2) s⊤s}.

So X is spherical with generator ψ(s) = exp(−s/2).

22 (Section 4)
Example: Normal Variance Mixtures
 
Suppose X ∼ Mn(0, In, Ŵ)
- so X has a standardized, uncorrelated normal variance mixture.

Then part (2) of previous theorem implies that X is spherical with


ψ(s) = Ŵ (s/2).

Note there are spherical distributions that are not normal variance mixture
distributions.

Now for another important and insightful result . . .

23 (Section 4)
Spherical Distributions
Theorem: The random vector X = (X1 , . . . , Xn ) has a spherical distribution if
and only if it has the representation

X ∼ RS

where:
1. S is uniformly distributed on the unit sphere: S n−1 := {s ∈ Rn : s> s = 1}
and
2. R ≥ 0 is a random variable independent of S.
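
For intuition, the following Python sketch generates S uniformly on the unit sphere as Z/||Z|| with Z standard normal, and chooses R (independently) with R² ∼ χ²n, which recovers the standard multivariate normal as a special case:

import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 100000
Z = rng.standard_normal((m, n))
S = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # uniform on the unit sphere
R = np.sqrt(rng.chisquare(n, size=m))               # independent of S, R^2 ~ chi^2_n
X = R[:, None] * S                                  # then X ~ MN_n(0, I_n)
print(np.cov(X.T))   # approximately the identity matrix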

24 (Section 4)
Elliptical Distributions
Definition: The random vector X = (X1 , . . . Xn ) has an elliptical distribution if

X ∼ µ + AY

where Y ∼ Sk (ψ) and A ∈ Rn×k and µ ∈ Rn are a matrix and vector of


constants, respectively.

Elliptical distributions therefore obtained via multivariate affine transformations


of spherical distributions.

25 (Section 4)
Characteristic Function of Elliptical Distributions
Easy to calculate characteristic function of an elliptical distribution:
φX(s) = E[ e^{i s⊤(µ + AY)} ]
      = e^{i s⊤µ} E[ e^{i (A⊤s)⊤ Y} ]
      = e^{i s⊤µ} ψ( s⊤Σ s )

where as before Σ = AA⊤.

Common to write X ∼ En (µ, Σ, ψ)


µ known as the location vector
Σ known as the dispersion matrix.

But Σ and ψ only uniquely determined up to a positive constant.

26 (Section 4)
IEOR E4602: Quantitative Risk Management
Dimension Reduction Techniques

Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]

Reference: Chapter 18 of 2nd ed. of SDAFA by Ruppert and Matteson.


Outline

Principal Components Analysis


Empirical PCA
Applications of PCA

Factor Models
Calibration Approaches
Factor Models in Risk Management

2 (Section 0)
Principal Components Analysis
Let Y = (Y1 . . . Yn )> denote an n-dimensional random vector with
variance-covariance matrix, Σ.

Y represents (normalized) changes of risk factors over some appropriately chosen


time horizon.

These risk factors might be:


Security price returns
Returns on futures contracts of varying maturities or
Changes in spot interest rates, again of varying maturities.

Goal of PCA is to construct linear combinations of the Yi ’s


Pi := Σ_{j=1}^{n} wij Yj   for i = 1, . . . , n

in such a way that ….


3 (Section 1)
Principal Components Analysis
(1) The Pi ’s are orthogonal so that E[Pi Pj ] = 0 for i ≠ j
and
(2) The Pi ’s are ordered in such a way that:

(i) P1 explains the largest percentage of the total variability in the system
and
(ii) each Pi explains the largest percentage of the total variability in the
system that has not already been explained by P1 , . . . , Pi−1 .

4 (Section 1)
Principal Components Analysis
In practice common to apply PCA to normalized random variables so that
E[Yi ] = 0 and Var(Yi ) = 1
- can normalize by subtracting the means from the original random variables
and then dividing by their standard deviations.

We normalize to ensure no one component of Y can influence the analysis by


virtue of that component’s measurement units.

Will therefore assume the Yi ’s have already been normalized


- but common in financial applications to also work with non-normalized
variables if clear that components of Y all on similar scale.
Key tool of PCA is the spectral decomposition or (more generally) the singular
value decomposition (SVD) of linear algebra.

5 (Section 1)
Spectral Decomposition
The spectral decomposition states that any symmetric matrix, A ∈ Rn×n , can be
written as
A = Γ ∆ Γ> (1)
where:
(i) ∆ is a diagonal matrix, diag(λ1, . . . , λn), of the eigenvalues of A
- without loss of generality ordered so that λ1 ≥ λ2 ≥ · · · ≥ λn.

(ii) Γ is an orthogonal matrix with the i th column of Γ containing the i th
standardized eigenvector, γi, of A.
“Standardized” means γi⊤γi = 1
Orthogonality of Γ implies Γ Γ⊤ = Γ⊤ Γ = In.

6 (Section 1)
Spectral Decomposition
Since Σ is symmetric can take A = Σ in (1).

Positive semi-definiteness of Σ implies λi ≥ 0 for all i = 1, . . . , n.

The principal components of Y then given by P = (P1 , . . . , Pn ) satisfying

P = Γ> Y. (2)

Note that:
(a) E[P] = 0 since E[Y] = 0
and
(b) Cov(P) = Γ> Σ Γ = Γ> (Γ ∆ Γ> ) Γ = ∆

So components of P are uncorrelated and Var(Pi ) = λi are decreasing in i as


desired.

7 (Section 1)
Factor Loadings
The matrix Γ> is called the matrix of factor loadings.

Can invert (2) to obtain


Y = ΓP (3)

- so easy to go back and forth between Y and P.

8 (Section 1)
Explaining The Total Variance
Can measure the ability of the first few principal components to explain the total
variability in the system:
Σ_{i=1}^{n} Var(Pi) = Σ_{i=1}^{n} λi = trace(Σ) = Σ_{i=1}^{n} Var(Yi).    (4)

If we take Σ_{i=1}^{n} Var(Pi) = Σ_{i=1}^{n} Var(Yi) to measure the total variability then by
(4) can interpret

( Σ_{i=1}^{k} λi ) / ( Σ_{i=1}^{n} λi )

as the percentage of total variability explained by first k principal components.

9 (Section 1)
Explaining The Total Variance
Can also show that first principal component, P1 = γ1> Y, satisfies

Var(γ1⊤Y) = max{ Var(a⊤Y) : a⊤a = 1 }.
And that each successive principal component, Pi = γi> Y, satisfies the same
optimization problem but with the added constraint that it be orthogonal, i.e.
uncorrelated, to P1 , . . . , Pi−1 .

10 (Section 1)
Financial Applications of PCA
In financial applications, often the case that just two or three principal
components are sufficient to explain anywhere from 60% to 95% or more of the
total variability
- and often possible to interpret the first two or three components.

e.g. If Y represents (normalized) changes in the spot interest rate for n different
maturities, then:
1. 1st principal component can usually be interpreted as the (approximate)
change in overall level of the yield curve
2. 2nd component represents change in slope of the curve
3. 3rd component represents change in curvature of the curve.

In equity applications, first component often represents a systematic market


factor whereas the second (and possibly other) components may be identified
with industry specific factors.

But generally less interpretability with equity portfolios and 2 or 3 principal


components often not enough to explain most of overall variance.
11 (Section 1)
Empirical PCA
In practice do not know true variance-covariance matrix but it may be estimated
using historical data.

Suppose then we have multivariate observations, X1 , . . . Xm


Xt = (Xt1 . . . Xtn )> represents the date t sample observation.

Important (why?) that these observations are from a stationary time series
e.g. asset returns or yield changes
But not price levels which are generally non-stationary.

If µj and σj are sample mean and standard deviation, respectively, of


{Xtj : t = 1, . . . , m}, then can normalize by setting

Ytj = (Xtj − µj) / σj   for t = 1, . . . , m and j = 1, . . . , n.

12 (Section 1)
Empirical PCA
Let Σ be the sample variance-covariance matrix so that
Σ = (1/m) Σ_{t=1}^{m} Yt Yt⊤.

Principal components, Pt , then computed using this covariance matrix.

From (3), see that original data obtained from principal components as

Xt = diag(σ1, . . . , σn) Yt + µ
   = diag(σ1, . . . , σn) Γ Pt + µ    (5)

where Pt := (Pt1 . . . Ptn)⊤ = Γ⊤Yt is the t th sample principal component vector.
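
A compact Python sketch of empirical PCA along these lines, using simulated stand-in data: normalize, eigendecompose the sample covariance of the normalized series, and form the principal components.

import numpy as np

rng = np.random.default_rng(3)
m, n = 500, 5
X = rng.multivariate_normal(np.zeros(n), np.eye(n) + 0.8, size=m)   # stand-in data

mu = X.mean(axis=0)
sd = X.std(axis=0, ddof=0)
Y = (X - mu) / sd                                  # normalized observations

Sigma_hat = (Y.T @ Y) / m                          # sample covariance of Y
lam, Gamma = np.linalg.eigh(Sigma_hat)             # eigenvalues come out ascending
order = np.argsort(lam)[::-1]                      # reorder so lambda_1 >= ... >= lambda_n
lam, Gamma = lam[order], Gamma[:, order]

P = Y @ Gamma                                      # row t is P_t' = (Gamma' Y_t)'
explained = np.cumsum(lam) / lam.sum()             # % of total variance explained
print(explained)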

13 (Section 1)
Applications of PCA: Building Factor Models
If first k principal components explain sufficiently large amount of total variability
then may partition the n × n matrix Γ according to Γ = [Γ1 Γ2 ] where Γ1 is
n × k and Γ2 is n × (n − k).

Similarly can write Pt = [Pt(1) Pt(2)]⊤ where Pt(1) is k × 1 and Pt(2) is
(n − k) × 1.

May then use (5) to write

Xt+1 = µ + diag(σ1, . . . , σn) Γ1 Pt+1(1) + εt+1    (6)

where

εt+1 := diag(σ1, . . . , σn) Γ2 Pt+1(2)    (7)

represents an error term.

Can interpret (6) as a k-factor model for the changes in risk factors, X
- but then take εt+1 as an uncorrelated noise vector and ignore (7).

14 (Section 1)
Applications of PCA: Scenario Generation
Easy to generate scenarios using PCA.

Suppose today is date t and we want to generate scenarios over the period
[t, t + 1].

Can then use (6) to apply stresses to first few principal components, either singly
or jointly, to generate loss scenarios.

Moreover, know that Var(Pi ) = λi so can easily control severity of the stresses.

15 (Section 1)
Applications of PCA: Estimating VaR and CVaR
Can use the k-factor model and Monte-Carlo to simulate portfolio returns.

Could be done by estimating joint distribution of first k principal components

e.g. Could assume (why?)


Pt+1(1) ∼ MNk(0, diag(λ1, . . . , λk)).

but other heavy-tailed distributions may be more appropriate.

If we want to estimate the conditional loss distribution (as we usually do) of
Pt+1(1) then time series methods such as GARCH models should be used.
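
A Monte-Carlo sketch in Python along these lines, assuming for simplicity that Pt+1(1) ∼ MNk(0, diag(λ1, . . . , λk)) and that the loss is linear in Xt+1; the loadings, scalings and loss sensitivities below are placeholders rather than estimates:

import numpy as np

rng = np.random.default_rng(4)
n, k = 5, 2
lam_k  = np.array([3.0, 1.0])                            # variances of the first k PCs
Gamma1 = np.linalg.qr(rng.standard_normal((n, k)))[0]    # placeholder loadings (orthonormal cols)
sd     = np.full(n, 0.01)                                # std devs used in the normalization
mu_x   = np.zeros(n)                                     # means of the risk-factor changes
b      = np.array([1e4, -5e3, 2e4, 1e4, 0.0])            # portfolio sensitivities to X_{t+1}

m = 100000
P1 = rng.standard_normal((m, k)) * np.sqrt(lam_k)        # samples of P^(1)_{t+1}
X  = mu_x + (P1 @ Gamma1.T) * sd                         # equation (6), noise term ignored
L  = -(X @ b)                                            # linearized portfolio loss

alpha = 0.99
var_hat = np.quantile(L, alpha)
es_hat  = L[L >= var_hat].mean()
print(var_hat, es_hat)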

16 (Section 1)
E.G. An Analysis of (Risk-Free) Yield Curves
We use daily yields on U.S. Treasury bonds at 11 maturities: T = 1, 3, and
6 months and 1, 2, 3, 5, 7, 10, 20, and 30 years.
Time period is January 2, 1990, to October 31, 2008.
We use PCA to study how the curves change from day to day.
To analyze daily changes in yields, all 11 time series were differenced.
Daily yields were missing for some values of T for various reasons
- e.g. the 20-year constant maturity series was discontinued at the end of 1986
and reinstated on October 1, 1993.

All days with missing values of the differenced data were omitted.
- this left 819 days of data starting on July 31, 2001, when the one-month
series started and ending on October 31, 2008, with the exclusion of the
period February 19, 2002 to February 2, 2006 when the 30-year Treasury was
discontinued.

The covariance matrix rather than the correlation matrix was used
- which is fine here because the variables are comparable and in the same units.

17 (Section 1)
Figure 18.1 from Ruppert and Matteson: (a) Treasury yields on three dates (07/31/01, 07/02/07 and 10/31/08). (b) Scree plot for the changes in Treasury yields. Note that the first three principal components have most of the variation, and the first five have virtually all of it. (c) The first three eigenvectors for changes in the Treasury yields. (d) The first three eigenvectors for changes in the Treasury yields in the range 0 ≤ T ≤ 3.
Figure 18.2 from Ruppert and Matteson: (a) The mean yield curve plus and minus the first eigenvector. (b) The mean yield curve plus and minus the second eigenvector. (c) The mean yield curve plus and minus the third eigenvector. (d) The fourth and fifth eigenvectors for changes in the Treasury yields.
E.G. An Analysis of (Risk-Free) Yield Curves
Would actually be interested in the behavior of the yield changes over time.
But time series analysis based on the changes in the 11 yields would be
problematic.
- better approach would be to use first three principal components.
Their time series and auto- and cross-correlation plots are shown in Figs.
18.3 and 18.4, respectively.
Notice that lag-0 cross-correlations are zero; this is not a coincidence! Why?
Cross-correlations at nonzero lags are not zero, but in this example they are
small
- practical implication is that parallel shifts, changes in slopes, and changes in
convexity are nearly uncorrelated and could be analyzed separately.

The time series plots show substantial volatility clustering which could be
modeled using GARCH models.

20 (Section 1)
Figure 18.3 from Ruppert and Matteson: Time series plots of the first three principal components of the Treasury yields. There are 819 days of data, but they are not consecutive because of missing data; see text.
Figure 18.4 from Ruppert and Matteson: Sample auto- and cross-correlations of the first three principal components of the Treasury yields.
Applications of PCA: Portfolio Immunization
Also possible to hedge or immunize a portfolio against moves in the principal
components.

e.g. Suppose we wish to hedge value of a portfolio against movements in the


first k principal components.

Let Vt = time t value of portfolio and assume our hedge will consist of positions,
φti , in the securities with time t prices, Sti , for i = 1, . . . , k.

Let Z(t+1)j be date t + 1 level of the j th risk factor


- so ∆Z(t+1)j = X(t+1)j using our earlier notation.

If change in value of hedged portfolio between dates t and t + 1 is denoted by



∆Vt+1 , then we have ….

23 (Section 1)
Applications of PCA: Portfolio Immunization

∆Vt+1 ≈ Σ_{j=1}^{n} ( ∂Vt/∂Ztj + Σ_{i=1}^{k} φti ∂Sti/∂Ztj ) ∆Z(t+1)j

      = Σ_{j=1}^{n} ( ∂Vt/∂Ztj + Σ_{i=1}^{k} φti ∂Sti/∂Ztj ) X(t+1)j

      ≈ Σ_{j=1}^{n} ( ∂Vt/∂Ztj + Σ_{i=1}^{k} φti ∂Sti/∂Ztj ) ( µj + σj Σ_{l=1}^{k} Γjl Pl )

      = Σ_{j=1}^{n} ( ∂Vt/∂Ztj + Σ_{i=1}^{k} φti ∂Sti/∂Ztj ) µj
        + Σ_{l=1}^{k} [ Σ_{j=1}^{n} ( ∂Vt/∂Ztj + Σ_{i=1}^{k} φti ∂Sti/∂Ztj ) σj Γjl ] Pl    (8)

24 (Section 1)
Applications of PCA: Portfolio Immunization
Can now use (8) to hedge the risk associated with the first k principal
components.

In particular, we solve for the φtl ’s so that the coefficients of the Pl ’s in (8) are
zero
- a system of k linear equations in k unknowns so it is easily solved.

If we include an additional hedging asset then could also ensure that total value
of hedged portfolio is equal to value of original un-hedged portfolio.
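
A Python sketch of this step with made-up sensitivities: writing the coefficient of Pl in (8) as al + Σi φti Ml,i, the hedge solves the k × k linear system M φ = −a.

import numpy as np

# Illustrative sketch; all sensitivities and loadings below are placeholders.
#   a_l     = sum_j (dV/dZ_j) sigma_j Gamma_{j,l}        (portfolio term)
#   M_{l,i} = sum_j (dS_i/dZ_j) sigma_j Gamma_{j,l}      (hedge-asset terms)
rng = np.random.default_rng(5)
n, k = 6, 2
dV_dZ = rng.standard_normal(n) * 1e4                    # portfolio sensitivities dV/dZ_j
dS_dZ = rng.standard_normal((n, k))                     # hedge-asset sensitivities dS_i/dZ_j
sigma = np.full(n, 0.01)
Gamma1 = np.linalg.qr(rng.standard_normal((n, k)))[0]   # first k PCA loadings (placeholder)

a = Gamma1.T @ (sigma * dV_dZ)                # a_l
M = Gamma1.T @ (sigma[:, None] * dS_dZ)       # M_{l,i}
phi = np.linalg.solve(M, -a)                  # hedge positions phi_1, ..., phi_k
print(phi)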

25 (Section 1)
Factor Models
Definition: We say the random vector X = (X1 . . . Xn )> follows a linear
k-factor model if it satisfies

X = a + BF + ε    (9)

where
(i) F = (F1 . . . Fk)⊤ is a random vector of common factors with k < n and
with a positive-definite covariance matrix;
(ii) ε = (ε1, . . . , εn) is a random vector of idiosyncratic error terms which are
uncorrelated and have mean zero;
(iii) B is an n × k constant matrix of factor loadings, and a is an n × 1 vector of
constants;
(iv) Cov(Fi, εj) = 0 for all i, j.

26 (Section 2)
Factor Models
If X ∼ MN(·, ·) and follows (9) then possible to find a version of the model
where F ∼ MN(·, ·) and ε ∼ MN(·, ·).
In this case the error terms, εi, are independent.

If Ω is the covariance matrix of F then covariance matrix, Σ, of X satisfies


(why?)
Σ = B Ω B⊤ + Υ
where Υ is a diagonal matrix of the variances of ε.

27 (Section 2)
Exercise
Show that if (9) holds then there is also a representation

X = µ + B∗F∗ + ε    (10)

where

E[X] = µ and
Cov(F∗ ) = Ik

so that Σ = B∗ (B∗ )> + Υ.

28 (Section 2)
Example: Factor Models Based on Principal Components
Factor model of (6) may be interpreted as a k-factor model with

F = P(1) and
B = diag(σ1 , . . . , σn ) Γ1 .

As constructed, covariance of ε in (6) is not diagonal

- so it does not satisfy part (ii) of definition above.
Nonetheless, quite common to construct factor models in this manner and to
then make the assumption that ε is a vector of uncorrelated error terms.

29 (Section 2)
Calibration Approaches
Three different types of factor models:
1. Observable Factor Models
Factors, Ft , have been identified in advance and are observable.
They typically have a fundamental economic interpretation e.g. a 1-factor
model where market index plays role of the single factor.
These models are usually calibrated and tested for goodness-of-fit using
multivariate or time-series regression techniques.
e.g. A model with factors constructed from change in rate of inflation,
equity index return, growth in GDP, interest rate spreads etc.
2. Cross-Sectional Factor Models
Factors are unobserved and therefore need to be estimated.
The factor loadings, Bt , are observed, however.
e.g. A model with dividend yield, oil and tech factors. We assume the factor
returns are unobserved but the loadings are known. Why?
e.g. BARRA’s factor models are generally cross-sectional factor models.
3. Statistical Factor Models
Both factors and loadings need to be estimated.
Two standard methods for doing this: factor analysis and PCA.
30 (Section 2)
Factor Models in Risk Management
Straightforward to use a factor model to manage risk.

For a given portfolio composition and fixed matrix, B, of factor loadings, the
sensitivity of the total portfolio value to each factor, Fi for i = 1, . . . , k, is easily
computed.

Can then adjust portfolio composition to achieve desired overall factor sensitivity.

Process easier to understand and justify when the factors are easy to interpret.

When this is not the case then the model is purely statistical.
Tends to occur when statistical methods such as factor analysis or PCA are
employed
But still possible even then for identified factors to have an economic
interpretation.

31 (Section 2)
E.G. Scenario Analysis for an Options Portfolio
Mr. Smith has a portfolio consisting of various stock positions as well as a
number of equity options.

He would like to perform a basic scenario analysis with just two factors:
1. An equity factor, Feq , representing the equity market.
2. A volatility factor, Fvol , representing some general implied volatility factor.
Can perform the scenario analysis by stressing combinations of the factors and
computing the P&L resulting from each scenario.

Of course using just two factors for such a portfolio will result in a scenario
analysis that is quite coarse but in many circumstances this may be sufficient.

Question: How do we compute the value of each security in each scenario?

Answer: This is easy if we adopt a factor model framework


- see lecture notes for further details.

32 (Section 2)
IEOR E4602: Quantitative Risk Management
Introduction to Copulas

Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]

References: Chapter 7 of 2nd ed. of QRM by McNeil, Frey and Embrechts, Chapter 8 of SDAFA by
Ruppert and Matteson, and the book chapter “Coping With Copulas” by Thorsten Schmidt.
Outline
Introduction
Main Results
Sklar’s Theorem
Copula Invariance Under Monotonic Transformations
The Fréchet-Hoeffding Bounds
Other Examples of Copulas
Measures of Dependence
Spearman’s Rho
Kendall’s Tau
Tail Dependence
Simulating the Gaussian and t Copulas
Estimating Copulas
A Financial Application: Pricing CDOs
A Single-Period CDO
Multi-period CDO’s
Synthetic CDO’s
Calibrating the Gaussian Copula Model
2 (Section 0)
Why Study Copulas?
Copulas separate the marginal distributions from the dependency structure
in a given multivariate distribution.
Copulas help expose the various fallacies associated with correlation.
Copulas play an important role in pricing securities that depend on many
underlying securities
- e.g. equity basket options, collateralized debt obligations (CDO’s),
n th -to-default options.

They provide a source of examples regarding model risk!


- e.g. the (in)famous Gaussian copula model that was used for pricing CDO’s.

But there are problems with copulas as well!


1. Not always applied properly.
2. Generally static in nature.
3. Not easy to estimate in general.

Nevertheless, an understanding of copulas is important in QRM.

3 (Section 1)
The Definition of a Copula
Definition: A d-dimensional copula, C : [0, 1]^d → [0, 1], is a cumulative
distribution function (CDF) with uniform marginals.

We write C (u) = C (u1 , . . . , ud ) for a generic copula and immediately have


(why?) the following properties:

1. C (u1 , . . . , ud ) is non-decreasing in each component, ui .


2. The i th marginal distribution is obtained by setting uj = 1 for j ≠ i and
since it is uniformly distributed
C (1, . . . , 1, ui , 1, . . . , 1) = ui .
3. For ai ≤ bi , P(U1 ∈ [a1 , b1 ], . . . , Ud ∈ [ad , bd ]) must be non-negative. This
implies the rectangle inequality
Σ_{i1=1}^{2} · · · Σ_{id=1}^{2} (−1)^{i1+···+id} C(u1,i1, . . . , ud,id) ≥ 0

where uj,1 = aj and uj,2 = bj .


4 (Section 1)
Properties of a Copula
The reverse is also true: any function that satisfies properties 1 to 3 is a copula.
Easy then to confirm that C (1, u1 , . . . , ud−1 ) is a (d − 1)-dimensional copula
- more generally, all k-dimensional marginals with 2 ≤ k ≤ d are copulas.

Recall the definition of the quantile function or generalized inverse: for a CDF,
F , the generalized inverse, F ← , is defined as

F ← (x) := inf{v : F (v) ≥ x}.

We now recall the following well-known result:

Proposition: If U ∼ U [0, 1] and FX is a CDF, then

P (FX← (U ) ≤ x) = FX (x).

In the opposite direction, if X has a continuous CDF, FX , then

FX (X ) ∼ U [0, 1].

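A quick numerical illustration (not from the lecture) of both statements, using an exponential distribution as the assumed example of FX :

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
U = rng.uniform(size=100_000)

X = stats.expon.ppf(U)             # F_X^{-1}(U) should have CDF F_X = Expon(1)
print(stats.kstest(X, 'expon'))    # small KS statistic => consistent with Expon(1)

V = stats.expon.cdf(X)             # F_X(X) should be U[0,1] since F_X is continuous
print(stats.kstest(V, 'uniform'))
```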
5 (Section 1)
Sklar’s Theorem (1959)
Let X = (X1 , . . . , Xd ) be a multivariate random vector with CDF FX and with
continuous marginals.

Then (why?) the joint distribution of FX1 (X1 ), . . . , FXd (Xd ) is a copula, CX .

Can we find an expression for CX ? Yes! We have

CX (u1 , . . . , ud ) = P (FX1 (X1 ) ≤ u1 , . . . , FXd (Xd ) ≤ ud )
                     = P (X1 ≤ FX1−1 (u1 ), . . . , Xd ≤ FXd−1 (ud ))
                     = FX (FX1−1 (u1 ), . . . , FXd−1 (ud )).                (1)

Now let uj := FXj (xj ) so that (1) yields

FX (x1 , . . . , xd ) = CX (FX1 (x1 ), . . . , FXd (xd ))

– this is one side of Sklar’s Theorem!

6 (Section 2)
Sklar’s Theorem (1959)
Consider a d-dimensional CDF, F , with marginals F1 , …, Fd . Then there exists a
copula, C , such that

F (x1 , . . . , xd ) = C (F1 (x1 ), . . . , Fd (xd )) (2)

for all xi ∈ [−∞, ∞] and i = 1, . . . , d.

If Fi is continuous for all i = 1, . . . , d, then C is unique; otherwise C is uniquely


determined only on Ran(F1 ) × · · · × Ran(Fd ) where Ran(Fi ) denotes the range
of the CDF, Fi .

In the opposite direction, consider a copula, C , and univariate CDF’s, F1 , . . . , Fd .


Then F as defined in (2) is a multivariate CDF with marginals F1 , . . . , Fd .

7 (Section 2)
An Example
Let Y and Z be two IID random variables each with CDF, F (·).
Let X1 := min(Y , Z ) and X2 := max(Y , Z ).
We then have
P(X1 ≤ x1 , X2 ≤ x2 ) = 2 F (min{x1 , x2 }) F (x2 ) − F (min{x1 , x2 })²

- can show this by considering separately the two cases


(i) x2 ≤ x1
and
(ii) x2 > x1 .

Would like to compute the copula, C (u1 , u2 ), of (X1 , X2 )!

8 (Section 2)
An Example
First note the two marginals satisfy

F1 (x) = 2F (x) − F (x)²
F2 (x) = F (x)².

Sklar’s Theorem states that C (·, ·) satisfies

C (F1 (x1 ), F2 (x2 )) = F (x1 , x2 ).

So just need to connect the pieces(!) to obtain

C (u1 , u2 ) = 2 min{1 − √(1 − u1 ), √u2 } √u2 − min{1 − √(1 − u1 ), √u2 }².

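As a sanity check (a sketch, not part of the lecture) we can simulate (X1 , X2 ) with F taken to be the standard normal CDF and compare the empirical joint probabilities with the copula just derived:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
Y = rng.standard_normal(200_000)
Z = rng.standard_normal(200_000)
X1, X2 = np.minimum(Y, Z), np.maximum(Y, Z)

# probability-transform the marginals:  F1 = 2F - F^2,  F2 = F^2
U1 = 2 * norm.cdf(X1) - norm.cdf(X1) ** 2
U2 = norm.cdf(X2) ** 2

def C(u1, u2):                      # the copula derived above
    m = np.minimum(1 - np.sqrt(1 - u1), np.sqrt(u2))
    return 2 * m * np.sqrt(u2) - m ** 2

u1, u2 = 0.3, 0.7
print(np.mean((U1 <= u1) & (U2 <= u2)), C(u1, u2))   # the two numbers should agree
```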
9 (Section 2)
When the Marginals Are Continuous
Suppose the marginal distributions, F1 , . . . , Fn , are continuous. Then can show

Fi (Fi← (y)) = y. (3)

Now evaluate (2) at xi = Fi← (ui ) and use (3) to obtain

C (u) = F (F1← (u1 ), . . . , Fd← (ud )) (4)

- a very useful characterization!

10 (Section 2)
Invariance of the Copula Under Monotonic Transformations
Proposition: Suppose the random variables X1 , . . . , Xd have continuous
marginals and copula, CX . Let Ti : R → R, for i = 1, . . . , d be strictly
increasing functions.
Then the dependence structure of the random variables
Y1 := T1 (X1 ), . . . , Yd := Td (Xd )
is also given by the copula CX .
Sketch of proof when the Tj ’s are continuous and the FXj−1 ’s exist:
First note that

FY (y1 , . . . , yd ) = P(T1 (X1 ) ≤ y1 , . . . , Td (Xd ) ≤ yd )
                     = P(X1 ≤ T1−1 (y1 ), . . . , Xd ≤ Td−1 (yd ))
                     = FX (T1−1 (y1 ), . . . , Td−1 (yd ))                   (5)

so that (why?) FYj (yj ) = FXj (Tj−1 (yj )).

This in turn implies

FYj−1 (yj ) = Tj (FXj−1 (yj )).                                              (6)

11 (Section 2)
Invariance of the Copula Under Monotonic Transformations
Now to the proof:

CY (u1 , . . . , ud ) = FY (FY1−1 (u1 ), . . . , FYd−1 (ud ))                      by (4)
                     = FX (T1−1 (FY1−1 (u1 )), . . . , Td−1 (FYd−1 (ud )))         by (5)
                     = FX (FX1−1 (u1 ), . . . , FXd−1 (ud ))                       by (6)
                     = CX (u1 , . . . , ud )

and so CX = CY .

12 (Section 2)
The Fréchet-Hoeffding Bounds
Theorem: Consider a copula C (u) = C (u1 , . . . , ud ). Then

max{ 1 − d + Σ_{i=1}^d ui , 0 } ≤ C (u) ≤ min{u1 , . . . , ud }.

Sketch of Proof: The first inequality follows from the observation

C (u) = P( ∩_{1≤i≤d} {Ui ≤ ui } )
      = 1 − P( ∪_{1≤i≤d} {Ui > ui } )
      ≥ 1 − Σ_{i=1}^d P(Ui > ui ) = 1 − d + Σ_{i=1}^d ui .

The second inequality follows since ∩_{1≤i≤d} {Ui ≤ ui } ⊆ {Ui ≤ ui } for all i. 2

13 (Section 2)
Tightness of the Fréchet-Hoeffding Bounds
The upper Fréchet-Hoeffding bound is tight for all d.

The lower Fréchet-Hoeffding bound is tight only when d = 2.

Fréchet and Hoeffding showed independently that copulas always lie between
these bounds
- corresponding to the two extreme cases of dependency, i.e. comonotonicity and
countermonotonicity.

14 (Section 2)
Comonotonicity and The Perfect Dependence Copula
The comonotonic copula is given by

M (u) := min{u1 , . . . , ud }

- the Fréchet-Hoeffding upper bound.


It corresponds to the case of extreme positive dependence.

Proposition: Let X1 , . . . , Xd be random variables with continuous marginals and


suppose Xi = Ti (X1 ) for i = 2, . . . , d where T2 , . . . , Td are strictly increasing
transformations. Then X1 , . . . , Xd have the comonotonic copula.

Proof: Apply the invariance under monotonic transformations proposition and


observe that the copula of (X1 , X1 , . . . , X1 ) is the comonotonic copula.

15 (Section 2)
Countermonotonic Random Variables
The countermonotonic copula is the 2-dimensional copula that is the
Fréchet-Hoeffding lower bound.

It satisfies
W (u1 , u2 ) = max{u1 + u2 − 1, 0} (7)
and corresponds to the case of perfect negative dependence.

Can check that (7) is the joint distribution of (U , 1 − U ) where U ∼ U (0, 1).

Question: Why is the Fréchet-Hoeffding lower bound not a copula for d > 2?

16 (Section 2)
The Independence Copula
The independence copula satisfies

Π(u) = ∏_{i=1}^d ui .

And random variables are independent if and only if their copula is the
independence copula
- follows immediately from Sklar’s Theorem.

17 (Section 2)
The Gaussian Copula
Recall when marginals are continuous we have from (4)

C (u) = F (F1← (u1 ), . . . , Fd← (ud )) .

Now let X ∼ MNd (0, P), where P is the correlation matrix of X.

Then the corresponding Gaussian copula is given by

CPGauss (u) := ΦP ( Φ−1 (u1 ), . . . , Φ−1 (ud ) )                           (8)

- where Φ(·) is the standard univariate normal CDF
- and ΦP (·) denotes the joint CDF of X.

If Y ∼ MNd (µ, Σ) with Corr(Y) = P, then Y has (why?) the same copula as X
- hence a Gaussian copula is fully specified by a correlation matrix, P.

For d = 2, obtain countermonotonic, independence and comonotonic copulas


when ρ = −1, 0, and 1, respectively.

18 (Section 3)
The t Copula
Recall X = (X1 , . . . , Xd ) has a multivariate t distribution with ν dof if

X = Z / √(ξ/ν)

- where Z ∼ MNd (0, Σ)
- and ξ ∼ χ²ν independently of Z.

The d-dimensional t-copula is defined as

Cν,Pt (u) := tν,P ( tν−1 (u1 ), . . . , tν−1 (ud ) )                         (9)

- where again P is a correlation matrix
- tν,P is the joint CDF of X ∼ td (ν, 0, P)
- and tν is the standard univariate CDF of a t-distribution with ν dof.

19 (Section 3)
The Bivariate Gumbel Copula
The bivariate Gumbel copula is defined as

CθGu (u1 , u2 ) := exp( −[ (− ln u1 )^θ + (− ln u2 )^θ ]^{1/θ} )

where θ ∈ [1, ∞).

When θ = 1 obtain the independence copula.

As θ → ∞ the Gumbel copula → the comonotonicity copula


- an example of a copula with tail dependence in just one corner.

e.g. Consider bivariate Normal and meta-Gumbel distributions on next slide:


- 5, 000 points simulated from each distribution
- marginal distributions in each case are standard normal
- correlation is ≈ .7 in both cases
- but meta-Gumbel is much more likely to see large joint moves.

20 (Section 3)
[Figure: six scatter-plot panels (θ = 1.1, 1.5, 2, 4, 8 and 50) of bivariate samples (u1 , u2 ).]

Figure 8.4 from Ruppert and Matteson: Bivariate random samples of size 200 from various
Gumbel copulas.
The Bivariate Clayton Copula
The bivariate Clayton copula is defined as

CθCl (u1 , u2 ) := ( u1^{−θ} + u2^{−θ} − 1 )^{−1/θ}

where θ ∈ [−1, ∞)\{0}.


As θ → 0 obtain the independence copula.
As θ → ∞ the Clayton copula → the comonotonic copula.
For θ = −1 obtain the Fréchet-Hoeffding lower bound
- so Clayton moves from countermonotonic to independence to comonotonic
copulas.

Clayton and Gumbel copulas belong to the Archimedean family of copulas


- they can be generalized to d dimensions
- but their d-dimensional versions are exchangeable – so C (u1 , . . . , ud )
unchanged if we permute u1 , . . . , ud
- implies all pairs have same dependence structure – has implications for
modeling with Archimedean copulas!

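For θ > 0 the Clayton copula is easy to simulate in any dimension via the Marshall-Olkin (frailty) construction; the sketch below assumes that construction (draw V ∼ Gamma(1/θ, 1) and set Ui = (1 + Ei /V )^{−1/θ} with Ei ∼ Exp(1)) and is not code from the lecture.

```python
import numpy as np
from scipy.stats import kendalltau

def rclayton(n, d, theta, rng=None):
    """Sample n points from the d-dimensional Clayton copula, theta > 0."""
    rng = rng or np.random.default_rng()
    V = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))   # common frailty
    E = rng.exponential(size=(n, d))
    return (1.0 + E / V) ** (-1.0 / theta)

U = rclayton(50_000, 2, theta=2.0)
# Kendall's tau for the Clayton copula is theta / (theta + 2) = 0.5 here
print(kendalltau(U[:, 0], U[:, 1])[0])
```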
23 (Section 3)
[Figure: nine scatter-plot panels (θ = −0.98, −0.7, −0.3, −0.1, 0.1, 1, 5, 15 and 100) of bivariate samples (u1 , u2 ).]

Figure 8.3 from Ruppert and Matteson: Bivariate random samples of size 200 from various
Clayton copulas.
Measures of Dependence
There are three principal measures of dependence:
1. The usual Pearson, i.e. linear, correlation coefficient
- invariant under positive linear transformations, but not under general strictly
increasing transformations
- there are many fallacies associated with the Pearson correlation
- not defined unless second moments exist.

2. Rank correlations
- only depend on the unique copula of the joint distribution
- therefore (why?) invariant to strictly increasing transformations
- also very useful for calibrating copulas to data.

3. Coefficients of tail dependence


- a measure of dependence in the extremes of the distributions.

25 (Section 4)
Fallacies of The Correlation Coefficient
Each of the following statements is false!
1. The marginal distributions and correlation matrix are enough to determine
the joint distribution
- how would we find a counterexample?

2. For given univariate distributions, F1 and F2 , and any correlation value


ρ ∈ [−1, 1], it is always possible to construct a joint distribution F with
margins F1 and F2 and correlation ρ.

3. The VaR of the sum of two risks is largest when the two risks have maximal
correlation.

Definition: We say two random variables, X1 and X2 , are of the same type if
there exist constants a > 0 and b ∈ R such that

X1 ∼ aX2 + b.

26 (Section 4)
On Fallacy #2
Theorem: Let (X1 , X2 ) be a random vector with finite-variance marginal CDF’s
F1 and F2 , respectively, and an unspecified joint CDF.
Assuming Var(X1 ) > 0 and Var(X2 ) > 0, then the following statements hold:

1. The attainable correlations form a closed interval [ρmin , ρmax ] with


ρmin < 0 < ρmax .

2. The minimum correlation ρ = ρmin is attained if and only if X1 and X2 are


countermonotonic. The maximum correlation ρ = ρmax is attained if and
only if X1 and X2 are comonotonic.

3. ρmin = −1 if and only if X1 and −X2 are of the same type. ρmax = 1 if and
only if X1 and X2 are of the same type.

The proof is not very difficult; see Section 7.2 of MFE for details.

27 (Section 4)
Spearman’s Rho
Definition: For random variables X1 and X2 , Spearman’s rho is defined as

ρs (X1 , X2 ) := ρ(F1 (X1 ), F2 (X2 )).

So Spearman’s rho is simply the linear correlation of the probability-transformed


random variables.

The Spearman’s rho matrix is simply the matrix of pairwise Spearman’s rho
correlations, ρ(Fi (Xi ), Fj (Xj )) – a positive-definite matrix. Why?

If X1 and X2 have continuous marginals then can show

ρs (X1 , X2 ) = 12 ∫₀¹ ∫₀¹ ( C (u1 , u2 ) − u1 u2 ) du1 du2 .

Can show that for a bivariate Gaussian copula

ρs (X1 , X2 ) = (6/π) arcsin(ρ/2) ≈ ρ

where ρ is the Pearson, i.e., linear, correlation coefficient.
28 (Section 4)
Kendall’s Tau
Definition: For random variables X1 and X2 , Kendall’s tau is defined as

ρτ (X1 , X2 ) := E[ sign( (X1 − X̃1 ) (X2 − X̃2 ) ) ]

where (X̃1 , X̃2 ) is independent of (X1 , X2 ) but has same joint distribution as
(X1 , X2 ).

Note that Kendall’s tau can be written as

ρτ (X1 , X2 ) = P( (X1 − X̃1 ) (X2 − X̃2 ) > 0 ) − P( (X1 − X̃1 ) (X2 − X̃2 ) < 0 )

- so if both probabilities are equal then ρτ (X1 , X2 ) = 0.

29 (Section 4)
Kendall’s Tau
If X1 and X2 have continuous marginals then can show

ρτ (X1 , X2 ) = 4 ∫₀¹ ∫₀¹ C (u1 , u2 ) dC (u1 , u2 ) − 1.

Can also show that for a bivariate Gaussian copula, or more generally, if
X ∼ E2 (µ, P, ψ) and P(X = µ) = 0

ρτ (X1 , X2 ) = (2/π) arcsin ρ                               (10)
where ρ = P12 = P21 is the Pearson correlation coefficient.

Note (10) very useful for estimating ρ with fat-tailed elliptical distributions
- generally provides much more robust estimates of ρ than usual Pearson
estimator
- see figure on next slide where each estimate was constructed from a sample
of n = 60 (simulated) data-points.
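A small simulation (a sketch along the lines of the figure on the next slide) comparing the usual Pearson estimator with the estimator ρ̂ = sin(π τ̂ /2) implied by (10), for bivariate-t data with ν = 3 and ρ = 0.5:

```python
import numpy as np
from scipy.stats import kendalltau, multivariate_t

rho, nu, n = 0.5, 3, 60
rng = np.random.default_rng(2)
dist = multivariate_t(loc=[0, 0], shape=[[1, rho], [rho, 1]], df=nu)

pearson, kendall = [], []
for _ in range(2000):
    x = dist.rvs(size=n, random_state=rng)
    pearson.append(np.corrcoef(x[:, 0], x[:, 1])[0, 1])
    kendall.append(np.sin(0.5 * np.pi * kendalltau(x[:, 0], x[:, 1])[0]))

# the Kendall-based estimates are much less variable for heavy-tailed data
print(np.std(pearson), np.std(kendall))
```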
30 (Section 4)
[Figure: Pearson correlation estimates (top panel) and Kendall’s-tau-based estimates (bottom panel) across 2,000 simulated samples.]
Estimating Pearson’s correlation using the usual Pearson estimator versus using Kendall’s τ .
Underlying distribution was bivariate t with ν = 3 degrees-of-freedom and true Pearson
correlation ρ = 0.5.
Properties of Spearman’s Rho and Kendall’s Tau
Spearman’s rho and Kendall’s tau are examples of rank correlations in that, when
the marginals are continuous, they depend only on the bivariate copula and not
on the marginals
- they are invariant (why?) in this case under strictly increasing
transformations.

They both take values in [−1, 1]:


- they equal 0 for independent random variables; but possible for dependent
variables to also have a rank correlation of 0.
- they take the value 1 when X1 and X2 are comonotonic
- they take the value −1 when X1 and X2 are countermonotonic.
They are very useful for calibrating copulas via method-of-moments type
algorithms.

Fallacy #2 is no longer an issue when we work with rank correlations.

32 (Section 4)
Tail Dependence
Definition: Let X1 and X2 denote two random variables with CDF’s F1 and F2 ,
respectively. Then the coefficient of upper tail dependence, λu , is given by

λu := lim_{q↗1} P (X2 > F2← (q) | X1 > F1← (q))

provided that the limit exists.

Similarly, the coefficient of lower tail dependence, λl , is given by

λl := lim_{q↘0} P (X2 ≤ F2← (q) | X1 ≤ F1← (q))

provided again that the limit exists.

If λu > 0, then we say that X1 and X2 have upper tail dependence while if
λu = 0 we say they are asymptotically independent in the upper tail.
Lower tail dependence and asymptotically independent in the lower tail are
similarly defined using λl .

33 (Section 4)
[Figure: λl = λu plotted against ρ ∈ [−1, 1] for ν = 1, 4, 25 and 250.]
Figure 8.6 from Ruppert and Matteson: Coefficients of tail dependence for bivariate t-copulas
as functions of ρ for ν = 1, 4, 25 and 250.
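The curves in the figure can be reproduced from the standard formula for the coefficient of tail dependence of the t copula, λ = 2 tν+1( −√((ν + 1)(1 − ρ)/(1 + ρ)) ); the short sketch below assumes that formula.

```python
import numpy as np
from scipy.stats import t

def t_copula_tail_dep(rho, nu):
    """lambda_l = lambda_u for the bivariate t copula with parameters (nu, rho)."""
    return 2.0 * t.cdf(-np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho)), df=nu + 1)

rho = np.linspace(-0.99, 0.99, 9)
for nu in (1, 4, 25, 250):
    print(nu, np.round(t_copula_tail_dep(rho, nu), 3))
```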
Simulating the Gaussian Copula
1. For an arbitrary covariance matrix, Σ, let P be its corresponding correlation
matrix.
2. Compute the Cholesky decomposition, A, of P so that P = AT A.
3. Generate Z ∼ MNd (0, Id ).
4. Set X = AT Z.
5. Return U = (Φ(X1 ), . . . , Φ(Xd )).
The distribution of U is the Gaussian copula CPGauss (u) so that

Prob(U1 ≤ u1 , . . . , Ud ≤ ud ) = ΦP ( Φ−1 (u1 ), . . . , Φ−1 (ud ) )

- this is also (why?) the copula of X.

Desired marginal inverses can now be applied to each component of U in order


to generate the desired multivariate distribution with the given Gaussian copula
- follows again by invariance of copula to monotonic transformations.

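A direct implementation of steps 1-5 (a sketch, not the lecture's code), followed by an application of illustrative marginal inverses:

```python
import numpy as np
from scipy import stats

P = np.array([[1.0, 0.7],
              [0.7, 1.0]])                    # correlation matrix
A = np.linalg.cholesky(P).T                   # upper-triangular A with P = A'A
rng = np.random.default_rng(3)

Z = rng.standard_normal((100_000, 2))         # step 3
X = Z @ A                                     # step 4 (each row is A'Z)
U = stats.norm.cdf(X)                         # step 5: Gaussian copula sample

# apply desired marginal inverses (illustrative choices of marginals):
Y1 = stats.expon.ppf(U[:, 0])                 # Exp(1) marginal
Y2 = stats.t.ppf(U[:, 1], df=4)               # t_4 marginal
print(stats.spearmanr(Y1, Y2)[0])             # rank correlation is preserved
```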
35 (Section 5)
Simulating the t Copula
1. For an arbitrary covariance matrix, Σ, let P be its corresponding correlation
matrix.
2. Generate X ∼ MNd (0, P).
3. Generate ξ ∼ χ2ν independent of X.
4. Return U = ( tν (X1 / √(ξ/ν)), . . . , tν (Xd / √(ξ/ν)) ) where tν is the CDF of a
univariate t distribution with ν degrees-of-freedom.

The distribution of U is the t copula Cν,Pt (u)
- this is also (why?) the copula of X.

Desired marginal inverses can now be applied to each component of U in order


to generate the desired multivariate distribution with the given t copula.
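The analogous sketch for the t copula (again illustrative rather than the lecture's code):

```python
import numpy as np
from scipy import stats

nu = 4
P = np.array([[1.0, 0.7],
              [0.7, 1.0]])
rng = np.random.default_rng(4)

X = rng.multivariate_normal(np.zeros(2), P, size=100_000)   # steps 1-2
xi = rng.chisquare(df=nu, size=(100_000, 1))                # step 3
U = stats.t.cdf(X / np.sqrt(xi / nu), df=nu)                # step 4: t copula sample

# e.g. give both components standard normal marginals (a "meta-t" vector)
Y = stats.norm.ppf(U)
```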

36 (Section 5)
Estimating / Calibrating Copulas
There are several related methods that can be used for estimating copulas:
1. Maximum likelihood estimation (MLE)
2. Pseudo-MLE of which there are two types:
- parametric pseudo-MLE
- semiparametric pseudo-MLE
3. Moment-matching methods are also sometimes used
- they can also be used for finding starting points for (pseudo) MLE.

MLE is often considered too difficult to apply


- too many parameters to estimate.
Pseudo-MLE seems to be used most often in practice
- marginals are estimated via their empirical CDFs
- then the copula can be estimated via MLE.

37 (Section 6)
Maximum Likelihood Estimation
Let Y = (Y1 . . . Yd )> be a random vector and suppose we have parametric
models FY1 (· | θ 1 ), . . . , FYd (· | θ d ) for the marginal CDFs.

Also have a parametric model cY (· | θ C ) for the copula density of Y.

By differentiating (2) we see that the density of Y is given by

fY (y) = fY (y1 , . . . , yd ) = cY (FY1 (y1 ), . . . , FYd (yd )) ∏_{j=1}^d fYj (yj ).       (11)

Given an IID sample Y1:n = (Y1 , . . . , Yn ), we obtain the log-likelihood as

log L(θ 1 , . . . , θ d , θ C ) = log ∏_{i=1}^n fY (yi )
   = Σ_{i=1}^n { log[ cY (FY1 (yi,1 | θ 1 ), . . . , FYd (yi,d | θ d ) | θ C ) ]
                 + log(fY1 (yi,1 | θ 1 )) + · · · + log(fYd (yi,d | θ d )) }                  (12)

38 (Section 6)
Maximum Likelihood Estimation
The ML estimators θ̂ 1 , . . . , θ̂ d , θ̂ C are obtained by maximizing (12).

But there are problems with this:


1. Many parameters – especially for large d – so optimization can be difficult.
2. If any of the parametric univariate distributions FYi (· | θ i ) are misspecified
then this can cause biases in estimation of both univariate distributions and
the copula.

The pseudo-MLE approach helps to resolve these problems.

39 (Section 6)
Pseudo-Maximum Likelihood Estimation
The pseudo-MLE approach has two steps:
1. First estimate the marginal CDFs to obtain F̂Yj for j = 1, . . . , d. Can do
this using either:
The empirical CDF of y1,j , . . . , yn,j so that

    F̂Yj (y) = Σ_{i=1}^n 1{yi,j ≤ y} / (n + 1)

A parametric model with θ̂ j obtained using usual MLE approach.
2. Then estimate the copula parameters θ C by maximizing

    Σ_{i=1}^n log[ cY ( F̂Y1 (yi,1 ), . . . , F̂Yd (yi,d ) | θ C ) ]                           (13)

- Note relation between (12) and (13)!

Even (13) may be difficult to maximize if d large


- important then to have good starting point for the optimization or to impose
additional structure on θ C .
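A semiparametric pseudo-MLE sketch for the simplest possible case, a bivariate Gaussian copula with a single parameter ρ: the marginals are replaced by the scaled empirical CDFs of step 1 and ρ is then estimated by maximizing (13). The data-generating model at the end is purely illustrative.

```python
import numpy as np
from scipy import stats, optimize

def pseudo_mle_gauss(y):
    """y is an (n x 2) data array; returns the pseudo-MLE of the copula parameter rho."""
    n = y.shape[0]
    u = stats.rankdata(y, axis=0) / (n + 1)    # empirical CDF values, rank/(n+1)
    x = stats.norm.ppf(u)                      # Phi^{-1}(u-hat)
    a, b = x[:, 0], x[:, 1]

    def neg_loglik(rho):                       # minus the Gaussian copula log-density
        return -np.sum(-0.5 * np.log(1 - rho ** 2)
                       + (2 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2))
                       / (2 * (1 - rho ** 2)))

    res = optimize.minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method='bounded')
    return res.x

# quick check on simulated meta-Gaussian data with rho = 0.6
rng = np.random.default_rng(5)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2000)
y = np.column_stack([np.exp(z[:, 0]), stats.t.ppf(stats.norm.cdf(z[:, 1]), df=3)])
print(pseudo_mle_gauss(y))                     # should be close to 0.6
```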
40 (Section 6)
Fitting Gaussian and t Copulas
Proposition (Results 8.1 from Ruppert and Matteson):
Let Y = (Y1 , . . . , Yd )⊤ have a meta-Gaussian distribution with continuous
univariate marginal distributions and copula CΩGauss , and let Ωi,j = [Ω]i,j . Then

ρτ (Yi , Yj ) = (2/π) arcsin(Ωi,j )                          (14)
and
ρS (Yi , Yj ) = (6/π) arcsin(Ωi,j /2) ≈ Ωi,j                 (15)

If instead Y has a meta-t distribution with continuous univariate marginal
distributions and copula Cν,Ωt then (14) still holds but (15) does not. 2

Question: There are several ways to use this Proposition to fit meta Gaussian
and t copulas. What are some of them?

41 (Section 6)
Collateralized Debt Obligations (CDO’s)
Want to find the expected losses in a simple 1-period CDO with the following
characteristics:
The maturity is 1 year.
There are N = 125 bonds in the reference portfolio.
Each bond pays a coupon of one unit after 1 year if it has not defaulted.
The recovery rate on each defaulted bond is zero.
There are 3 tranches of interest:

1. The equity tranche with attachment points: 0-3 defaults


2. The mezzanine tranche with attachment points: 4-6 defaults
3. The senior tranche with attachment points: 7-125 defaults.

Assume probability, q, of defaulting within 1 year is identical across all bonds.

42 (Section 7)
Collateralized Debt Obligations (CDO’s)
Xi is the normalized asset value of the i th credit and we assume

Xi = √ρ M + √(1 − ρ) Zi                                      (16)

where M , Z1 , . . . , ZN are IID normal random variables
- note correlation between each pair of asset values is identical.

We assume also that the i th credit defaults if Xi ≤ x̄i .

Since the probability, q, of default is identical across all bonds we must therefore have

x̄1 = · · · = x̄N = Φ−1 (q).                                   (17)

It now follows from (16) and (17) that

P(i defaults | M ) = P(Xi ≤ x̄i | M )
                   = P( √ρ M + √(1 − ρ) Zi ≤ Φ−1 (q) | M )
                   = P( Zi ≤ (Φ−1 (q) − √ρ M ) / √(1 − ρ) | M ).

43 (Section 7)
Collateralized Debt Obligations (CDO’s)
Therefore conditional on M , the total number of defaults is Bin(N , qM ) where

qM := Φ( (Φ−1 (q) − √ρ M ) / √(1 − ρ) ).

That is,

p(k | M ) = (N choose k) qM^k (1 − qM )^{N−k} .

Unconditional probabilities computed by numerically integrating the binomial
probabilities with respect to M so that

P(k defaults) = ∫_{−∞}^{∞} p(k | M ) φ(M ) dM .

Can now compute expected (risk-neutral) loss on each of the three tranches:

44 (Section 7)
Collateralized Debt Obligations (CDO’s)
EQ0 [Equity tranche loss] = 3 × P(3 or more defaults) + Σ_{k=1}^{2} k P(k defaults)

EQ0 [Mezz tranche loss]   = 3 × P(6 or more defaults) + Σ_{k=1}^{2} k P(k + 3 defaults)

EQ0 [Senior tranche loss] = Σ_{k=1}^{119} k P(k + 6 defaults).

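A sketch of the full calculation above (integrate the conditional binomial probabilities over M , then form the expected tranche losses); the values q = 2% and ρ = 0.3 are illustrative.

```python
import numpy as np
from scipy import stats

N, q, rho = 125, 0.02, 0.3

# Gauss-Hermite quadrature for the integral against the N(0,1) density phi(M)
nodes, weights = np.polynomial.hermite_e.hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)

qM = stats.norm.cdf((stats.norm.ppf(q) - np.sqrt(rho) * nodes) / np.sqrt(1 - rho))
k = np.arange(N + 1)
p_k = np.array([stats.binom.pmf(k, N, qm) for qm in qM]).T @ weights   # P(k defaults)

eq     = 3 * p_k[3:].sum() + sum(j * p_k[j] for j in (1, 2))
mezz   = 3 * p_k[6:].sum() + sum(j * p_k[j + 3] for j in (1, 2))
senior = sum(j * p_k[j + 6] for j in range(1, 120))
print(eq, mezz, senior)
```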
45 (Section 7)
Collateralized Debt Obligations (CDO’s)
Regardless of the individual default probability, q, and correlation, ρ, we have:

EQ0 [% Equity tranche loss] ≥ EQ0 [% Mezz tranche loss] ≥ EQ0 [% Senior tranche loss] .

Also note that expected equity tranche loss always decreasing in ρ.

Expected mezzanine tranche loss often relatively insensitive to ρ.

Expected senior tranche loss (with upper attachment point of 100%) always
increasing in ρ.

46 (Section 7)
Expected Tranche Losses As a Function of ρ
Collateralized Debt Obligations (CDO’s)
Question: How does the total expected loss in the portfolio vary with ρ?

The dependence structure we used in (16) to link the default events of the
various bonds is the famous Gaussian-copula model.

In practice CDO’s are multi-period securities and can be cash or synthetic


CDO’s.

48 (Section 7)
Multi-period CDO’s
Will now assume:
There are N credits in the reference portfolio.
Each credit has a notional amount of Ai .
If the i th credit defaults, then the portfolio incurs a loss of Ai × (1 − Ri )
- Ri is the recovery rate
- we assume Ri is fixed and known.
The default time of i th credit is Exp(λi ) with CDF Fi
- λi easily estimated from either CDS spreads or prices of corporate bonds
- so can compute Fi (t), the risk-neutral probability that i th credit defaults
before time t.

We can easily relax the exponential distribution assumption


- we simply need to be able to estimate Fi (t).

49 (Section 7)
Multi-period CDO’s
Again Xi is the normalized asset value of the i th credit and we assume

Xi = ai M + √(1 − ai² ) Zi                                   (18)

where M , Z1 , . . . , ZN are IID normal random variables.

The factor loadings, ai , are assumed to lie in interval [0, 1].

Clear that Corr(Xi , Xj ) = ai aj with covariance matrix equal to correlation


matrix, P, where Pi,j = ai aj for i ≠ j.

Let F (t1 , . . . , tn ) denote joint distribution of the default times of the N credits.

Then assume

F (t1 , . . . , tn ) = ΦP ( Φ−1 (F1 (t1 )), . . . , Φ−1 (Fn (tn )) )        (19)

where ΦP (·) is the multivariate normal CDF with mean 0 and correlation P
- so distribution of default times has a Gaussian copula!
50 (Section 7)
Computing the Portfolio Loss Distribution
In order to price credit derivatives, we need to compute the portfolio loss
distribution.

We fix t = t1 = · · · = tN and set qi := Fi (t).

As before, the N default events are independent, conditional on M . They are
given by

qi (t|M ) = Φ( (Φ−1 (qi ) − ai M ) / √(1 − ai² ) ).

Now let pN (l, t) = risk-neutral probability that there are a total of l portfolio
defaults before time t.

Then may write

p(l, t) = ∫_{−∞}^{∞} pN (l, t|M ) φ(M ) dM .                  (20)

51 (Section 7)
Computing the Portfolio Loss Distribution
Straightforward to calculate pN (l, t|M ) using a simple iterative procedure.

Can then perform a numerical integration on the right-hand-side of (20) to


calculate p(l, t).
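A sketch of that iterative procedure: conditional on M the default indicators are independent Bernoullis, so the conditional distribution of the number of defaults can be built up one credit at a time.

```python
import numpy as np

def conditional_default_dist(q_cond):
    """q_cond[i] = P(credit i defaults by time t | M); returns p[l] = P(l defaults | M)."""
    p = np.array([1.0])                      # distribution before any credit is added
    for qi in q_cond:
        # adding credit i: shift by one default with prob qi, stay with prob 1 - qi
        p = np.concatenate(([0.0], p)) * qi + np.concatenate((p, [0.0])) * (1 - qi)
    return p                                 # array of length N + 1

# p(l, t) then follows by integrating conditional_default_dist(q_i(t|M)) against
# the N(0,1) density of M, e.g. with Gauss-Hermite quadrature as earlier.
```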

If we assume that the notional, Ai , and the recovery rate, Ri , are constant across
all credits, then the loss on any given credit will be either 0 or A(1 − R).

This then implies that knowing probability distribution of the number of defaults
is equivalent to knowing probability distribution of the total loss in the reference
portfolio.

52 (Section 7)
The Mechanics and Pricing of a CDO Tranche
A tranche is defined by the lower and upper attachment points, L and U ,
respectively.

The tranche loss function, TLL,U (l), for a fixed time, t, is a function of the
number of defaults, l, and is given by

TLtL,U (l) := max {min{lA(1 − R), U } − L, 0} .

For a given number of defaults it tells us the loss suffered by the tranche.

For example:
Suppose L = 3% and U = 7%
Suppose also that total portfolio loss is lA(1 − R) = 5%
Then tranche loss is 2% of total portfolio notional
- or 50% of tranche notional.

53 (Section 7)
The Mechanics and Pricing of a CDO Tranche
When an investor sells protection on the tranche she is guaranteeing to reimburse
any realized losses on the tranche to the protection buyer.

In return, the protection seller is paid a premium at regular intervals


- typically every three months
- though in some cases protection buyer may also pay an upfront amount in
addition to, or instead of, a regular premium
- an upfront typically occurs for equity tranches which have a lower
attachment point of zero.

The fair value of the CDO tranche is that value of the premium for which the
expected value of the premium leg equals the expected value of the default leg.

54 (Section 7)
The Mechanics and Pricing of a CDO Tranche
Clearly then the fair value of the CDO tranche depends on the expected value of
the tranche loss function.

Indeed, for a fixed time, t, the expected tranche loss is given by

E[ TLtL,U ] = Σ_{l=0}^{N} TLtL,U (l) p(l, t)

– which we can compute using (20).

We now compute the fair value of the premium and default legs ...

55 (Section 7)
Fair Value of the Premium Leg
Premium leg represents the premium payments that are paid periodically by the
protection buyer to the protection seller.

They are paid at the end of each time interval and they are based upon the
remaining notional in the tranche.

Formally, the time t = 0 value of the premium leg, PL0L,U , satisfies

PL0L,U = s Σ_{t=1}^{n} dt ∆t ( (U − L) − E0 [ TLtL,U ] )                    (21)

n is the number of periods in the contract


dt is the risk-free discount factor for payment date t
s is the annualized spread or premium paid to the protection seller
∆t is the accrual factor for payment date t, e.g. ∆t = 1/4 if payments take
place quarterly.

Note that (21) is consistent with the statement that the premium paid at any
time t is based only on the remaining notional in the tranche.
56 (Section 7)
Fair Value of the Default Leg
Default leg represents the cash flows paid to the protection buyer upon losses
occurring in the tranche.

Formally, the time t = 0 value of the default leg, DL0L,U , satisfies

DL0L,U = Σ_{t=1}^{n} dt ( E0 [ TLtL,U ] − E0 [ TLt−1L,U ] ).                (22)

57 (Section 7)
Fair Value of the Tranche
The fair premium, s∗ say, is the value of s that equates the value of the default
leg with the value of the premium leg:

s∗ := DL0L,U / ( Σ_{t=1}^{n} dt ∆t ( (U − L) − E0 [ TLtL,U ] ) ).

As is the case with swaps and forwards, the fair value of the tranche to the
protection buyer and seller at initiation is therefore zero.

Easy to incorporate any possible upfront payments that the protection buyer
must pay at time t = 0 in addition to the regular premium payments.

Can also incorporate recovery values and notional values that vary with each
credit in the portfolio.
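A sketch of (21), (22) and s∗ given the expected tranche losses E0 [TLt ] at each payment date (which in turn come from the loss distribution p(l, t)); all of the numerical inputs below are made up for illustration.

```python
import numpy as np

def fair_tranche_spread(exp_tranche_loss, discount, delta_t, L, U):
    """exp_tranche_loss[t] = E0[TL_t] for t = 0, 1, ..., n (the t = 0 entry is 0)."""
    etl = np.asarray(exp_tranche_loss)
    d, dt = np.asarray(discount), np.asarray(delta_t)
    prem_leg_per_unit_spread = np.sum(d * dt * ((U - L) - etl[1:]))   # (21) without s
    default_leg = np.sum(d * (etl[1:] - etl[:-1]))                    # (22)
    return default_leg / prem_leg_per_unit_spread

# illustrative 5-year quarterly 3%-7% tranche on a unit portfolio notional
n = 20
etl  = np.concatenate(([0.0], np.linspace(0.0005, 0.012, n)))   # made-up E0[TL_t]
disc = np.exp(-0.03 * 0.25 * np.arange(1, n + 1))
print(fair_tranche_spread(etl, disc, 0.25 * np.ones(n), L=0.03, U=0.07))
```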

58 (Section 7)
Cash CDO’s
First CDOs to be traded were all cash CDOs
- the reference portfolio actually existed and consisted of corporate bonds that
the CDO issuer usually kept on its balance sheet.

Capital requirements meant that these bonds required a substantial amount of


capital to be set aside to cover any potential losses.

To reduce these capital requirements, banks converted the portfolio into a series
of tranches and sold most of these tranches to investors.

Banks usually kept the equity tranche for themselves. This meant:
They kept most of the economic risk and rewards of the portfolio
But they also succeeded in dramatically reducing the amount of capital they
needed to set aside
Hence first CDO deals were motivated by regulatory arbitrage considerations.

59 (Section 7)
Synthetic CDO’s
Soon become clear there was an appetite in the market-place for these products.

e.g. Hedge funds were keen to buy the riskier tranches while insurance companies
and others sought the AAA-rated senior and super-senior tranches.

This appetite and explosion in the CDS market gave rise to synthetic tranches
where:
the underlying reference portfolio is no longer a physical portfolio of
corporate bonds or loans
it is instead a fictitious portfolio consisting of a number of credits with an
associated notional amount for each credit.

60 (Section 7)
Synthetic CDO’s
Mechanics of a synthetic tranche are precisely as described earlier.

But they have at least two features that distinguish them from cash CDOs:
(i) With a synthetic CDO it is no longer necessary to tranche the entire
portfolio and sell the entire “deal"
e.g. A bank could sell protection on a 3%-7% tranche and never have to
worry about selling the other pieces of the reference portfolio. This is not
the case with cash CDOs.

(ii) Because the issuer no longer owns the underlying bond portfolio, it is no
longer hedged against adverse price movements
- it therefore needs to dynamically hedge its synthetic tranche position and
would have typically done so using the CDS markets.

61 (Section 7)
Calibrating the Gaussian Copula Model
In practice, very common to calibrate synthetic tranches as follows:
1. Assume all pairwise correlations, Corr(Xi , Xj ), are identical
- equivalent to taking a1 = · · · = aN = a in (18) so that
Corr(Xi , Xj ) = a 2 := ρ for all i, j.

2. In the case of the liquid CDO tranches whose prices are observable in the
market-place, we then choose ρ so that the fair tranche spread in the model
is equal to the quoted spread in the market place.

We refer to this calibrated correlation, ρimp say, as the tranche implied


correlation.

62 (Section 7)
Calibrating the Gaussian Copula Model
If the model is correct, then every tranche should have the same ρimp .

Unfortunately, this does not occur in practice.

Indeed, in the case of mezzanine tranches it is possible that there is no value of ρ


that fits the market price!
It is also possible that there are multiple solutions.

The market responded to this problem by introducing the concept of base


correlations
- they are the implied correlations of equity tranches with increasing upper
attachment points.

Implied base correlations can always be computed and then bootstrapping


techniques are employed to price the mezzanine tranches.

Just as equity derivatives markets have an implied volatility surface, the CDO
market has implied base correlation curves.
63 (Section 7)
Calibrating the Gaussian Copula Model
The implied base correlation curve is generally an increasing function of the
upper attachment point.

A distinguishing feature of CDOs and other credit derivatives such as


n th -to-default options is that they can be very sensitive to correlation
assumptions.

As a risk manager or investor in structured credit, it is very important to


understand why equity, mezzanine and super senior tranches react as they do to
changes in implied correlation.

64 (Section 7)
IEOR E4602: Quantitative Risk Management
Risk Measures

Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]

Reference: Chapter 8 of 2nd ed. of MFE’s Quantitative Risk Management.


Risk Measures
Let M denote the space of random variables representing portfolio losses
over some fixed time interval, ∆.
Assume that M is a convex cone so that
If L1 , L2 ∈ M then L1 + L2 ∈ M
And λL1 ∈ M for every λ > 0.

A risk measure is a real-valued function, % : M → R, that satisfies certain


desirable properties.
%(L) may be interpreted as the riskiness of a portfolio or ...
... the amount of capital that should be added to the portfolio so that it can
be deemed acceptable
Under this interpretation, portfolios with %(L) < 0 are already acceptable
In fact, if %(L) < 0 then capital could even be withdrawn.

2 (Section 1)
Axioms of Coherent Risk Measures
Translation Invariance For all L ∈ M and every constant a ∈ R, we have

%(L + a) = %(L) + a.

- necessary if earlier risk-capital interpretation is to make sense.

Subadditivity: For all L1 , L2 ∈ M we have

%(L1 + L2 ) ≤ %(L1 ) + %(L2 )

- reflects the idea that pooling risks helps to diversify a portfolio


- the most debated of the risk axioms
- allows for the decentralization of risk management.

3 (Section 1)
Axioms of Coherent Risk Measures
Positive Homogeneity For all L ∈ M and every λ > 0 we have

%(λL) = λ%(L).

- also controversial: has been criticized for not penalizing concentration of risk
- e.g. if λ > 0 very large, then perhaps we should require %(λL) > λ%(L)
- but this would be inconsistent with subadditivity:

%(nL) = %(L + · · · + L) ≤ n%(L) (1)

- positive homogeneity implies we must have equality in (1).

Monotonicity For L1 , L2 ∈ M such that L1 ≤ L2 almost surely, we have

%(L1 ) ≤ %(L2 )

- clear that any risk measure should satisfy this axiom.


4 (Section 1)
Coherent Risk Measures
Definition: A risk measure, %, acting on the convex cone M is called coherent if
it satisfies the translation invariance, subadditivity, positive homogeneity and
monotonicity axioms. Otherwise it is incoherent.

Coherent risk measures were introduced in 1998


- and a large literature has developed since then.

5 (Section 1)
Convex Risk Measures
Criticisms of subadditivity and positive homogeneity axioms led to the study
of convex risk measures.
A convex risk measure satisfies the same axioms as a coherent risk measure
except that subadditivity and positive homogeneity axioms are replaced by
the convexity axiom:

Convexity Axiom For L1 , L2 ∈ M and λ ∈ [0, 1]

%(λL1 + (1 − λ)L2 ) ≤ λ%(L1 ) + (1 − λ)%(L2 )

It is possible to find risk measures within the convex class that satisfy
%(λL) > λ%(L) for λ > 1.

6 (Section 1)
Value-at-Risk
Recall ...

Definition: Let α ∈ (0, 1) be some fixed confidence level. Then the VaR of the
portfolio loss, L, at the confidence level, α, is given by

VaRα := qα (L) = inf{x ∈ R : FL (x) ≥ α}

where FL (·) is the CDF of the random variable, L.

Value-at-Risk is not a coherent risk measure since it fails to be subadditive!

7 (Section 1)
Example 1
Consider two IID assets, X and Y where

X = ε + η where ε ∼ N(0, 1)

            0,     with prob .991
and  η  =
            −10,   with prob .009.

Consider a portfolio consisting of X and Y . Then

VaR.99 (X + Y ) = 9.8
> VaR.99 (X ) + VaR.99 (Y )
= 3.1 + 3.1
= 6.2

- thereby demonstrating the non-subadditivity of VaR.
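A numerical check (a sketch; it treats X and Y as P&L so that the relevant losses are −X, −Y and −(X + Y ), which is what produces the 3.1 and 9.8 quoted above):

```python
import numpy as np
from scipy import stats, optimize

p = 0.009                            # probability of the -10 jump in eta

def cdf_loss_single(x):              # CDF of the loss -X = -eps - eta
    return (1 - p) * stats.norm.cdf(x) + p * stats.norm.cdf(x - 10)

def cdf_loss_sum(x):                 # CDF of the loss -(X + Y); the eps-part is N(0, 2)
    s = np.sqrt(2.0)
    return ((1 - p) ** 2 * stats.norm.cdf(x / s)
            + 2 * p * (1 - p) * stats.norm.cdf((x - 10) / s)
            + p ** 2 * stats.norm.cdf((x - 20) / s))

var_x  = optimize.brentq(lambda x: cdf_loss_single(x) - 0.99, -5, 15)
var_xy = optimize.brentq(lambda x: cdf_loss_sum(x) - 0.99, -5, 25)
print(var_x, var_xy)                 # roughly 3.1 and 9.8, so VaR(X+Y) > VaR(X) + VaR(Y)
```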

8 (Section 1)
Example 2: Defaultable Bonds
Consider a portfolio of n = 100 defaultable corporate bonds
Probability of default over next year identical for all bonds and equal to 2%.
Default events of different bonds are independent.
Current price of each bond is 100.
If bond does not default then will pay 105 one year from now
- otherwise there is no repayment.

Therefore can define the loss on the i th bond, Li , as

Li := 105Yi − 5

where Yi = 1 if the bond defaults over the next year and Yi = 0 otherwise.

By assumption also see that P(Li = −5) = .98 and P(Li = 100) = .02.

9 (Section 1)
Example 2: Defaultable Bonds
Consider now the following two portfolios:
A: A fully concentrated portfolio consisting of 100 units of bond 1.
B: A completely diversified portfolio consisting of 1 unit of each of the 100
bonds.
We want to compute the 95% VaR for each portfolio.

Obtain VaR.95 (LA ) = −500, representing a gain(!) and VaR.95 (LB ) = 25.

So according to VaR.95 , portfolio B is riskier than portfolio A


- absolute nonsense!
Have shown that

VaR.95 ( Σ_{i=1}^{100} Li ) ≥ 100 VaR.95 (L1 ) = Σ_{i=1}^{100} VaR.95 (Li )

demonstrating again that VaR is not sub-additive.
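A short check of the two VaR numbers using the binomial distribution of the number of defaults (a sketch):

```python
from scipy import stats

n, p, alpha = 100, 0.02, 0.95

# Portfolio A: 100 units of bond 1, so L_A = 100 * (105 * Y1 - 5)
# P(L_A = -500) = 0.98 >= 0.95, hence the 95% quantile is -500
var_A = 100 * (105 * stats.bernoulli.ppf(alpha, p) - 5)

# Portfolio B: one unit of each bond, so L_B = 105 * D - 500 with D ~ Bin(100, 0.02)
var_B = 105 * stats.binom.ppf(alpha, n, p) - 500

print(var_A, var_B)                  # -500 and 25
```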

10 (Section 1)
Example 2: Defaultable Bonds
Now let % be any coherent risk measure depending only on the distribution of L.

Then obtain (why?)

%( Σ_{i=1}^{100} Li ) ≤ Σ_{i=1}^{100} %(Li ) = 100 %(L1 )

- so % would correctly classify portfolio A as being riskier than portfolio B.


We now describe a situation where VaR is always sub-additive ...

11 (Section 1)
Subadditivity of VaR for Elliptical Risk Factors
Theorem
Suppose that X ∼ En (µ, Σ, ψ) and let M be the set of linearized portfolio losses
of the form

M := {L : L = λ0 + Σ_{i=1}^{n} λi Xi , λi ∈ R}.

Then for any two losses L1 , L2 ∈ M, and 0.5 ≤ α < 1,

VaRα (L1 + L2 ) ≤ VaRα (L1 ) + VaRα (L2 ).

12 (Section 1)
Proof of Subadditivity of VaR for Elliptical Risk Factors
Without (why?) loss of generality assume that λ0 = 0.

Recall if X ∼ En (µ, Σ, ψ) then X = AY + µ where A ∈ R^{n×k} , µ ∈ R^n and
Y ∼ Sk (ψ) is a spherical random vector.

Any element L ∈ M can therefore be represented as

L = λᵀX = λᵀAY + λᵀµ
  ∼ ||λᵀA|| Y1 + λᵀµ                                          (2)

- (2) follows from part 3 of Theorem 2 in Multivariate Distributions notes.

Translation invariance and positive homogeneity of VaR imply

VaRα (L) = ||λᵀA|| VaRα (Y1 ) + λᵀµ.

Suppose now that L1 := λ1ᵀX and L2 := λ2ᵀX. Triangle inequality implies

||(λ1 + λ2 )ᵀA|| ≤ ||λ1ᵀA|| + ||λ2ᵀA||

Since VaRα (Y1 ) ≥ 0 for α ≥ .5 (why?), result follows from (2). 2


13 (Section 1)
Subadditivity of VaR
Widely believed that if individual loss distributions under consideration are
continuous and symmetric then VaR is sub-additive.

This is not true(!)


Counterexample may be found in Chapter 8 of MFE
The loss distributions in the counterexample are smooth and symmetric but
the copula is highly asymmetric.

VaR can also fail to be sub-additive when the individual loss distributions have
heavy tails.

14 (Section 1)
Expected Shortfall
Recall ...

Definition: For a portfolio loss, L, satisfying E[|L|] < ∞ the expected shortfall
(ES) at confidence level α ∈ (0, 1) is given by

ESα := (1/(1 − α)) ∫_α^1 qu (FL ) du.

Relationship between ESα and VaRα therefore given by

ESα := (1/(1 − α)) ∫_α^1 VaRu (L) du                          (3)

- clear that ESα (L) ≥ VaRα (L).

When the CDF, FL , is continuous then a more well-known representation is given by

ESα = E [L | L ≥ VaRα ] .

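A simple Monte-Carlo estimator of VaRα and ESα from simulated losses (a sketch using the empirical quantile and the average loss beyond it):

```python
import numpy as np

def var_es(losses, alpha=0.99):
    losses = np.asarray(losses)
    var = np.quantile(losses, alpha)
    es = losses[losses >= var].mean()      # average of the losses beyond VaR
    return var, es

rng = np.random.default_rng(6)
L = rng.standard_t(df=3, size=1_000_000)   # illustrative heavy-tailed losses
print(var_es(L, 0.99))                     # ES exceeds VaR
```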
15 (Section 1)
Expected Shortfall
Theorem: Expected shortfall is a coherent risk measure.

Proof: Translation invariance, positive homogeneity and monotonicity properties


all follow from the representation of ES in (3) and the same properties for
quantiles.

Therefore only need to demonstrate subadditivity


- this is proven in lecture notes. 2

There are many other examples of risk measures that are coherent
- e.g. risk measures based on generalized scenarios
- e.g. spectral risk measures
- of which expected shortfall is an example.

16 (Section 1)
Risk Aggregation
Let L = (L1 , . . . , Ln ) denote a vector of random variables
- perhaps representing losses on different trading desks, portfolios or operating
units within a firm.

Sometimes need to aggregate these losses into a random variable, ψ(L), say.

Common examples include:


1. The total loss so that ψ(L) = Σ_{i=1}^n Li .
2. The maximum loss where ψ(L) = max{L1 , . . . , Ln }.
3. The excess-of-loss treaty so that ψ(L) = Σ_{i=1}^n (Li − ki )+ .
4. The stop-loss treaty in which case ψ(L) = ( Σ_{i=1}^n Li − k )+ .

17 (Section 2)
Risk Aggregation
Want to understand the risk of the aggregate loss function, %(ψ(L))
- but first need the distribution of ψ(L).

Often know only the distributions of the Li ’s


- so have little or no information about the dependency or copula of the Li ’s.

In this case can try to compute lower and upper bounds on %(ψ(L)):

%min := inf{%(ψ(L)) : Li ∼ Fi , i = 1, . . . , n}
%max := sup{%(ψ(L)) : Li ∼ Fi , i = 1, . . . , n}

where Fi is the CDF of the loss, Li .

Problems of this type are referred to as Frechet problems


- solutions are available in some circumstances, e.g. attainable correlations.
Pn
Have been studied in some detail when ψ(L) = i=1 Li and %(·) is the VaR
function.
18 (Section 2)
Capital Allocation
Total loss given by L = Σ_{i=1}^n Li .

Suppose we have determined the risk, %(L), of this loss.

The capital allocation problem seeks a decomposition, AC1 , . . . , ACn , such that

%(L) = Σ_{i=1}^n ACi                                          (4)

- ACi is interpreted as the risk capital allocated to the i th loss, Li .


This problem is important in the setting of performance evaluation where we
want to compute a risk-adjusted return on capital (RAROC).

e.g. We might set RAROCi = Expected Profiti / Risk Capitali


- must determine risk capital of each Li in order to compute RAROCi .

19 (Section 3)
Capital Allocation
More formally, let L(λ) := Σ_{i=1}^n λi Li be the loss associated with the portfolio
consisting of λi units of the loss, Li , for i = 1, . . . , n.

Loss on actual portfolio under consideration then given by L(1).

Let %(·) be a risk measure on a space M that contains L(λ) for all λ ∈ Λ, an
open set containing 1.

Then the associated risk measure function, r% : Λ → R, is defined by

r% (λ) = %(L(λ)).

We have the following definition ...

20 (Section 3)
Capital Allocation Principles
Definition: Let r% be a risk measure function on some set Λ ⊂ Rn \ 0 such that
1 ∈ Λ.
Then a mapping, f^{r%} : Λ → Rn , is called a per-unit capital allocation principle
associated with r% if, for all λ ∈ Λ, we have

Σ_{i=1}^n λi fi^{r%} (λ) = r% (λ).                            (5)

We then interpret fi^{r%} as the amount of capital allocated to one unit of Li
when the overall portfolio loss is L(λ).
The amount of capital allocated to a position of λi Li is therefore λi fi^{r%} and
so by (5), the total risk capital is fully allocated.

21 (Section 3)
The Euler Allocation Principle
Definition: If r% is a positive-homogeneous risk-measure function which is
differentiable on the set Λ, then the per-unit Euler capital allocation principle
associated with r% is the mapping

f^{r%} : Λ → Rn :    fi^{r%} (λ) = ∂r% (λ) / ∂λi .

The Euler allocation principle is a full allocation principle since a well-known
property of any positive homogeneous and differentiable function, r(·), is
that it satisfies

r(λ) = Σ_{i=1}^n λi ∂r(λ) / ∂λi .
∂λi
The Euler allocation principle therefore gives us different risk allocations for
different positive homogeneous risk measures.
There are good economic reasons for employing the Euler principle when
computing capital allocations.

22 (Section 3)
Value-at-Risk and Value-at-Risk Contributions
Let rVaRα (λ) = VaRα (L(λ)) be our risk measure function.

Then subject to technical conditions can be shown that

fi^{rVaRα} (λ) = ∂rVaRα (λ) / ∂λi
               = E [Li | L(λ) = VaRα (L(λ))] ,   for i = 1, . . . , n.      (6)

Capital allocation, ACi , for Li is then obtained by setting λ = 1 in (6).

Will now use (6) and Monte-Carlo to estimate the VaR contributions from each
security in a portfolio.
- Monte-Carlo is a general approach that can be used for complex portfolios
where (6) cannot be calculated analytically.

23 (Section 3)
An Application: Estimating Value-at-Risk Contributions
Recall total portfolio loss is L = Σ_{i=1}^n Li .

According to (6) with λ = 1 we know that

ACi = E [Li | L = VaRα (L)]                                   (7)
    = ∂ VaRα (λ) / ∂λi |_{λ=1}
    = wi ∂ VaRα / ∂wi                                         (8)

for i = 1, . . . , n and where wi is the number of units of the i th security held in
the portfolio.
the portfolio.

Question: How might we use Monte-Carlo to estimate the VaR contribution,


ACi , of the i th asset?

Solution: There are three approaches we might take:


24 (Section 3)
First Approach: Monte-Carlo and Finite Differences
As ACi is a (mathematical) derivative we could estimate it numerically using a
finite-difference estimator.

Such an estimator based on (8) would take the form

ÂCi := ( VaRα^{i,+} − VaRα^{i,−} ) / (2δi )                   (9)

where VaRα^{i,+} (VaRα^{i,−}) is the portfolio VaR when number of units of the i th
security is increased (decreased) by δi wi units.

Each term in numerator of (9) can be estimated via Monte-Carlo
- same set of random returns should be used to estimate each term.

What value of δi should we use? There is a bias-variance tradeoff but a value of
δi = .1 seems to work well.

This estimator will not satisfy the additivity property so that Σi ÂCi ≠ VaRα
- but easy to re-scale estimated ÂCi ’s so that the property will be satisfied.
25 (Section 3)
Second Approach: Naive Monte-Carlo
Another approach is to estimate (7) directly. Could do this by simulating N
portfolio losses L(1) , . . . , L(N) with L(j) = Σ_{i=1}^n Li(j)
- Li(j) is the loss on the i th security in the j th simulation trial.

Could then set (why?) ÂCi = Li(m) where m denotes the VaRα scenario, i.e.
L(m) is the ⌈N (1 − α)⌉th largest of the N simulated portfolio losses.

Question: Will this estimator satisfy the additivity property, i.e. will
Σi ÂCi = VaRα ?

Question: What is the problem with this approach? Will this problem disappear
if we let N → ∞?

26 (Section 3)
A Third Approach: Kernel Smoothing Monte-Carlo
An alternative approach that resolves the problem with the second approach is to
take a weighted average of the losses in the i th security around the VaRα
scenario.

A convenient way to do this is via a kernel function.

In particular, say K (x; h) := K (x/h) is a kernel function if it is:
1. Symmetric about zero
2. Takes a maximum at x = 0
3. And is non-negative for all x.

A simple choice is to take the triangle kernel so that

K (x; h) := max( 1 − |x/h| , 0 ).

27 (Section 3)
A Third Approach: Kernel Smoothing Monte-Carlo
The kernel estimate of ACi is then given by

ÂCi^{ker} := Σ_{j=1}^{N} K( L(j) − VâRα ; h ) Li(j)  /  Σ_{j=1}^{N} K( L(j) − VâRα ; h )                (10)

where VâRα := L(m) with m as defined above.

One minor problem with (10) is that the additivity property doesn’t hold. Can
easily correct this by instead setting

ÂCi^{ker} := VâRα  Σ_{j=1}^{N} K( L(j) − VâRα ; h ) Li(j)  /  Σ_{j=1}^{N} K( L(j) − VâRα ; h ) L(j) .   (11)

Must choose an appropriate value of smoothing parameter, h.

Can be shown that an optimal choice is to set

h = 2.575 σ N^{−1/5}

where σ = std(L), a quantity that we can easily estimate.
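A sketch of the kernel estimator (11) with the triangle kernel and this choice of h, applied to per-security losses simulated from an illustrative multivariate normal model:

```python
import numpy as np

def kernel_var_contributions(L_i, alpha=0.99):
    """L_i: (N, n) array of per-security losses across N simulated scenarios."""
    L = L_i.sum(axis=1)                                   # portfolio losses
    N = len(L)
    m = np.argsort(L)[-int(np.ceil(N * (1 - alpha)))]     # index of the VaR_alpha scenario
    var_hat = L[m]
    h = 2.575 * L.std() * N ** (-0.2)
    w = np.maximum(1 - np.abs((L - var_hat) / h), 0.0)    # triangle kernel weights
    ac = var_hat * (w @ L_i) / (w @ L)                    # equation (11)
    return var_hat, ac                                    # note ac.sum() equals var_hat

rng = np.random.default_rng(7)
Sigma = 0.3 * np.ones((10, 10)) + 0.7 * np.eye(10)
L_i = rng.multivariate_normal(np.zeros(10), Sigma, size=200_000)
var_hat, ac = kernel_var_contributions(L_i)
print(var_hat, ac.sum(), np.round(ac, 3))
```

For this elliptical example the estimates can be compared with the analytic formula (12) on the next slide.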
28 (Section 3)
When Losses Are Elliptically Distributed
If L1 , . . . , Ln have an elliptical distribution then it may be shown that

ACi = E [Li ] + ( Cov (L, Li ) / Var (L) ) ( VaRα (L) − E [L] ).            (12)

In the numerical example below, we assume 10 security returns are elliptically
distributed. In particular, losses satisfy (L1 , . . . , Ln ) ∼ MNn (0, Σ).

Other details include:


1. First eight securities were all positively correlated with one another.
2. Second-to-last security uncorrelated with all other securities.
3. Last security had a correlation of -0.2 with the remaining securities.
4. Long position held on each security.
Estimated VaRα=.99 contributions of the securities displayed in figure below
- last two securities have a negative contribution to total portfolio VaR
- also note how inaccurate the “naive” Monte-Carlo estimator is
- but kernel Monte-Carlo is very accurate!
29 (Section 3)
VaR Contributions By Security

[Figure: estimated VaRα contributions for securities 1-10, comparing the Analytic, Kernel Monte-Carlo and Naive Monte-Carlo estimators.]

30 (Section 3)
IEOR E4602: Quantitative Risk Management
Model Risk

Martin Haugh
Department of Industrial Engineering and Operations Research
Columbia University
Email: [email protected]
Outline

Introduction to Model Risk


E.G: Pricing Barrier Options
E.G: Parameter Uncertainty and Hedging in Black-Scholes
E.G: Calibration and Extrapolation in Short-Rate Models
E.G: Regime Shift in Market Dynamics

Model Transparency
Local Volatility Models
Stochastic Volatility Models
Jump-Diffusion Models and Levy Processes

More on Model Risk and Calibration


Using Many Models to Manage Model Risk
Other Problems With Model Calibration

2 (Section 0)
A Simple Example: Pricing Barrier Options
Barrier options are options whose payoff depends in some way on whether or not
a particular barrier has been crossed before the option expires.

A barrier can be:


A put or a call
A knock-in or knock-out
A digital or vanilla
– so (at least) eight different payoff combinations.

e.g. A knockout put option with strike K , barrier B and maturity T has payoff

Knock-Out Put Payoff = max( 0, (K − ST ) 1{St ≥ B for all t ∈ [0,T]} ).

e.g. A digital down-and-in call option with strike K , barrier B and maturity T
has payoff

Digital Down-and-In Call = max( 0, 1{min_{t∈[0,T]} St ≤ B} × 1{ST ≥ K} ).

3 (Section 1)
Barrier Options
Knock-in options can be priced from knock-out options and vice versa since a
knock-in plus a knock-out – each with the same strike – is equal to the vanilla
option or digital with the same strike.

Analytic solutions can be computed for European barrier options in the


Black-Scholes framework where the underlying security follows a geometric
Brownian motion (GBM).

Will not bother to derive or present these solutions here, however, since they are
of little use in practice
- this is because the Black-Scholes model is a terrible(!) model for pricing
barrier options.

But can still use GBM to develop intuition and as an example of model risk.

Will concentrate on knock-out put option


- they are traded quite frequently in practice.
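A Monte-Carlo sketch of the knockout put price under GBM (discretely monitored, so only an approximation to a continuously monitored barrier), using the parameters of the figure on the next slide: S0 = 105, K = 100, B = 85, T = 0.5 and r = q = 0.

```python
import numpy as np

def knockout_put_mc(S0, K, B, T, sigma, r=0.0, q=0.0, steps=126, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / steps
    Z = rng.standard_normal((n_paths, steps))
    logS = np.log(S0) + np.cumsum((r - q - 0.5 * sigma ** 2) * dt
                                  + sigma * np.sqrt(dt) * Z, axis=1)
    S = np.exp(logS)
    alive = S.min(axis=1) >= B                       # barrier never breached
    payoff = np.where(alive, np.maximum(K - S[:, -1], 0.0), 0.0)
    return np.exp(-r * T) * payoff.mean()

for sigma in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(sigma, round(knockout_put_mc(105, 100, 85, 0.5, sigma), 3))
```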

4 (Section 1)
Value of a Knockout Put Option Using Black-Scholes GBM

[Figure: knockout put option price as a function of the Black-Scholes implied volatility (%), from 0 to 50.]

S0 = 105, K = 100, Barrier = 85, T = 6 months, r = q = 0


5 (Section 1)
Value of Knockout Put with Corresponding Vanilla Put

[Figure: knockout put and vanilla put prices as functions of the Black-Scholes implied volatility (%), from 0 to 25.]

6 (Section 1)
Barrier Options
So knock-out put option always cheaper than corresponding vanilla put option.

For low values of σ, however, the prices almost coincide. Why?

While the vanilla option is unambiguously increasing in σ the same is not true for
the knock-out option. Why?

Question: What do you think would happen to the value of the knock-out put
option as σ → ∞?

Black-Scholes model is not a good model for pricing barrier options:


It cannot price vanilla options correctly so certainly cannot price barriers.
Suppose there was no market for vanillas but there was a liquid market for
knockout puts. What value of σ should be used to calibrate your BS price to
the market price?
The Black-Scholes Greeks would also be very problematic: what implied
volatility would you use to calculate the Greeks?
In summary, the Black-Scholes model is a disaster when it comes to pricing
barrier options.
7 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Now consider the use of the Black-Scholes model to hedge a vanilla European
call option in the model.

Will assume that assumptions of Black-Scholes are correct:


Security price has GBM dynamics
Possible to trade continuously at no cost
Borrowing and lending at the risk-free rate are also possible.

Then possible to dynamically replicate payoff of the call option using a


self-financing (s.f.) trading strategy
- initial value of this s.f. strategy is the famous Black-Scholes arbitrage-free
price of the option.
The s.f. replication strategy requires continuous delta-hedging of the option but
of course not practical to do this.

Instead we hedge periodically – this results in some replication error


- but this error goes to 0 as the interval between rebalancing goes to 0.
8 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Pt denotes time t value of the discrete-time s.f. strategy and C0 denotes initial
value of the option.

The replicating strategy then satisfies

P0 := C0                                                      (1)

Pti+1 = Pti + (Pti − δti Sti ) r∆t + δti ( Sti+1 − Sti + qSti ∆t )          (2)

where:
∆t := ti+1 − ti
r = risk-free interest rate
q is the dividend yield
δti is the Black-Scholes delta at time ti
– a function of Sti and some assumed implied volatility, σimp .

Note that (1) and (2) respect the s.f. condition.

9 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Stock prices are simulated assuming St ∼ GBM(µ, σ) so that

St+∆t = St exp( (µ − σ²/2)∆t + σ√∆t Z )

where Z ∼ N(0, 1).

In the case of a short position in a call option with strike K and maturity T , the
final trading P&L is then defined as

P&L := PT − (ST − K )+ (3)

where PT is the terminal value of the replicating strategy in (2).

In the Black-Scholes world we have σ = σimp and the P&L = 0 along every price
path in the limit as ∆t → 0.

In practice, however, we cannot know σ and so the market (and hence the option
hedger) has no way to ensure a value of σimp such that σ = σimp .

10 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
This has interesting implications for the trading P&L: it means we cannot exactly
replicate the option even if all of the assumptions of Black-Scholes are correct!

In figures on next two slides we display histograms of the P&L in (3) that results
from simulating 100k sample paths of the underlying price process with
S0 = K = $100.
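Below is a minimal Python sketch of this experiment. The slides specify S0 = K = 100; the drift, rates, maturity and hedging frequency used here are illustrative assumptions. The code simulates GBM paths under the true volatility while the hedger computes the initial price and the deltas with σ_imp, i.e. it implements (1)-(3) directly.

```python
import numpy as np
from scipy.stats import norm

def bs_call_price(S, K, tau, r, q, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * np.exp(-q * tau) * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

def bs_call_delta(S, K, tau, r, q, sigma):
    """Black-Scholes delta of a European call."""
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    return np.exp(-q * tau) * norm.cdf(d1)

def hedging_pnl(S0=100.0, K=100.0, T=0.5, r=0.01, q=0.0, mu=0.1,
                sigma_true=0.3, sigma_imp=0.2, n_steps=126, n_paths=100_000, seed=0):
    """Discrete delta-hedging P&L (3) when the stock follows GBM(mu, sigma_true)
    but the hedger prices and hedges with sigma_imp.  Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0)
    P = np.full(n_paths, bs_call_price(S0, K, T, r, q, sigma_imp))   # P_0 := C_0, (1)
    for i in range(n_steps):
        tau = T - i * dt                                             # time to maturity at t_i
        delta = bs_call_delta(S, K, tau, r, q, sigma_imp)
        Z = rng.standard_normal(n_paths)
        S_new = S * np.exp((mu - 0.5 * sigma_true**2) * dt + sigma_true * np.sqrt(dt) * Z)
        P = P + (P - delta * S) * r * dt + delta * (S_new - S + q * S * dt)   # s.f. update (2)
        S = S_new
    return P - np.maximum(S - K, 0.0)                                # P&L (3), short call

pnl = hedging_pnl()
print(pnl.mean(), pnl.std())
```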

11 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes

[Histogram: number of paths vs. delta-hedging P&L.]

Histogram of delta-hedging P&L with true vol. = 30% and implied vol. = 20%.

Option hedger makes substantial losses. Why?


12 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes

[Histogram: number of paths vs. delta-hedging P&L.]

Histogram of delta-hedging P&L with true vol. = 30% and implied vol. = 40%.

Option hedger makes substantial gains. Why?


13 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Clearly then this is a situation where substantial errors in the form of non-zero
hedging P&L’s are made
- and this can only be due to the use of incorrect model parameters.

This example is intended to highlight the importance of not just having a good
model but also having the correct model parameters.

The payoff from delta-hedging an option is in general path-dependent.

Can be shown that the payoff from continuously delta-hedging an option satisfies

P&L = ∫_0^T (S_t² / 2) (∂²V_t / ∂S²) (σ_imp² − σ_t²) dt

where V_t is the time t value of the option and σ_t is the realized instantaneous
volatility at time t.

We recognize the term (S_t² / 2) ∂²V_t / ∂S² as the dollar gamma
- always positive for a vanilla call or put option.
14 (Section 1)
E.G: Parameter Uncertainty and Hedging in Black-Scholes
Returning to s.f. trading strategy of (1) and (2), note that we can choose any
model we like for the security price dynamics
- e.g. other diffusions or jump-diffusion models.

It is interesting to simulate these alternative models and to then observe what


happens to the replication error from (1) and (2).

It is common to perform numerical experiments like this when using a model to


price and hedge a particular security.

Goal then is to understand how robust the hedging strategy (based on the given
model) is to alternative price dynamics that might prevail in practice.

Given the appropriate data, one can also back-test the performance of a model
on realized historical price data to assess its hedging performance.

15 (Section 1)
E.G: Calibration and Extrapolation in Short-Rate Models


[Figure: a binomial lattice for the short-rate with nodes r_{i,j}, i = 0, . . . , 3,
up-probability q_u and down-probability q_d, over periods t = 0, 1, 2, 3, 4.]


Binomial models for the short-rate were very popular for pricing fixed income
derivatives in the 1990’s and into the early 2000’s.

Lattice above shows a generic binomial model for the short-rate, r_t, which is a
1-period risk-free rate.

Risk-neutral probabilities of up- and down-moves in any period are given by qu


and qd = 1 − qu , respectively.

Securities then priced in the lattice using risk-neutral pricing in the usual
backwards evaluation manner.
16 (Section 1)
E.G: Calibration and Extrapolation in Short-Rate Models
e.g. Can compute time t price, Z_t^T, of a zero-coupon bond (ZCB) maturing at
time T by setting Z_T^T ≡ 1 and then calculating

Z_{t,j} = E_t^Q[ (B_t / B_{t+1}) Z_{t+1} ] = (1 / (1 + r_{t,j})) [ q_u × Z_{t+1,j+1} + q_d × Z_{t+1,j} ]

for t = T − 1, . . . , 0 and j = 0, . . . , t.

More generally, risk-neutral pricing for a “coupon”-paying security takes the form

Z_{t,j} = E_t^Q[ (Z_{t+1} + C_{t+1}) / (1 + r_{t,j}) ]
        = (1 / (1 + r_{t,j})) [ q_u (Z_{t+1,j+1} + C_{t+1,j+1}) + q_d (Z_{t+1,j} + C_{t+1,j}) ]     (4)

where C_{t,j} is the coupon paid at time t and state j, and Z_{t,j} is the “ex-coupon”
value of the security at time t and state j.
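A short sketch of this backward-evaluation recursion, specialized to a ZCB (all coupons zero); the lattice of one-period rates used in the example is hypothetical.

```python
import numpy as np

def zcb_price(short_rates, q_u=0.5):
    """Backward induction (4) with zero coupons: time-0 price of a ZCB paying 1
    at T = len(short_rates).  short_rates[t][j] is r_{t,j}; time t has t+1 states."""
    q_d = 1.0 - q_u
    T = len(short_rates)
    Z = np.ones(T + 1)                                  # terminal condition Z_{T,j} = 1
    for t in range(T - 1, -1, -1):
        r = np.asarray(short_rates[t][:t + 1], dtype=float)
        Z = (q_u * Z[1:t + 2] + q_d * Z[:t + 1]) / (1.0 + r)
    return float(Z[0])

# e.g. a hypothetical 3-period lattice of 1-period rates
rates = [[0.06], [0.05, 0.07], [0.04, 0.06, 0.08]]
print(zcb_price(rates))        # time-0 price of the ZCB maturing at t = 3
```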

17 (Section 1)
E.G: Calibration and Extrapolation in Short-Rate Models
Can iterate (4) to obtain

Z_t / B_t = E_t^Q[ Σ_{j=t+1}^{t+s} C_j / B_j  +  Z_{t+s} / B_{t+s} ]     (5)

Other securities including caps, floors, swaps and swaptions can all be priced
using (4) and appropriate boundary conditions.

Moreover, when we price securities like in this manner the model is guaranteed to
be arbitrage-free. Why?

18 (Section 1)
The Black-Derman-Toy (BDT) Model
The Black-Derman-Toy (BDT) model assumes the interest rate at node N_{i,j} is
given by

r_{i,j} = a_i e^{b_i j}

where log(a_i) and b_i are drift and volatility parameters for log(r), respectively.

To use such a model in practice we must first calibrate it to the observed


term-structure in the market and, ideally, other liquid fixed-income security prices
- can do this by choosing the ai ’s and bi ’s to match market prices.

Once the parameters have been calibrated we can now consider using the model
to price less liquid or more exotic securities.

19 (Section 1)
Using BDT to Price a 2 − 8 Payer Swaption
Consider pricing a 2 − 8 payer swaption with fixed rate = 11.65%.

This is:

An option to enter an 8-year swap in 2 years time.


The underlying swap is settled in arrears so payments would take place in
years 3 through 10.
Each floating rate payment in the swap is based on the prevailing short-rate
of the previous year.
The “payer" feature of the option means that if the option is exercised, the
exerciser “pays fixed and receives floating" in the underlying swap.

20 (Section 1)
Using BDT to Price a 2 − 8 Payer Swaption
We use a 10-period BDT lattice so that 1 period corresponds to 1 year.

Lattice was calibrated to the term structure of interest rates in the market
- there are therefore 20 free parameters, ai and bi for i = 0, . . . , 9, to choose
- and want to match 10 spot interest rates, st for t = 1, . . . , 10.

Therefore have 10 equations in 20 unknowns and so the calibration problem has


too many parameters.

We can (and do) resolve this issue by simply setting bi = b = .005 for all i
- which leaves 10 unknown parameters.

Will assume a notional principal for the underlying swaps of $1m.

Let S2 denote swap value at t = 2


- can compute S2 by discounting the swap’s cash-flows back from t = 10 to
t = 2 using risk-neutral probabilities which we take to be qd = qu = 0.5.
21 (Section 1)
Using BDT to Price a 2 − 8 Payer Swaption
Option then exercised at time t = 2 if and only if S2 > 0
- so value of swaption at t = 2 is max(0, S2 ).

Time t = 0 value of swaption can be computed using backwards evaluation


- after calibration we find a swaption price of $1, 339 (when b = .005).

It would be naive(!) in the extreme to assume that this is a fair price for the
swaption
- after all, what was so special about the choice of b = .005?

Suppose instead we chose b = .01.

In that case, after recalibrating to the same term structure of interest rates, we
find a swaption price of $1, 962
- a price increase of approximately 50%!

This price discrepancy should not be at all surprising! Why?


22 (Section 1)
Using BDT to Price a 2 − 8 Payer Swaption
Several important lessons regarding model risk and calibration here:
1. Model transparency is very important. In particular, it is very important to
understand what type of dynamics are implied by a model and, more
importantly in this case, what role each of the parameters plays.
2. Should also be clear that calibration is an intrinsic part of the pricing process
and is most definitely not just an afterthought.
Model selection and model calibration (including the choice of calibration
instruments) should not be separated.
3. When calibration is recognized to be an intrinsic part of the pricing process,
we begin to recognize that pricing is really no more than an exercise in
interpolating / extrapolating from observed market prices to unobserved
market prices.
Since extrapolation is in general difficult, we would have much more
confidence in our pricing if our calibration securities were “close” to the
securities we want to price.
This was not the case with the swaption!
23 (Section 1)
E.G: Regime Shifts: LIBOR Pre- and Post-Crisis
LIBOR calculated on a daily basis:
The British Bankers’ Association (BBA) polls a pre-defined list of banks
with strong credit ratings for their interest rates of various maturities.
Highest and lowest responses are dropped
Average of remainder is taken to be the LIBOR rate.

Understood there was some (very small) credit risk associated with these banks
- so LIBOR would therefore be higher than corresponding rates on
government treasuries.

Since banks that are polled always had strong credit ratings (prior to the crisis)
spread between LIBOR and treasury rates was generally quite small.

Moreover, the pre-defined list of banks regularly updated so that banks whose
credit ratings have deteriorated are replaced with banks with superior credit
ratings.
This had the practical impact of ensuring (or so market participants believed up
until the crisis(!)) that forward LIBOR rates would only have a very modest
degree of credit risk associated with them.
24 (Section 1)
E.G: Regime Shifts: LIBOR Pre- and Post-Crisis
LIBOR extremely important since it’s a benchmark interest rate and many of the
most liquid fixed-income derivative securities are based upon it.

These securities include:


1. Floating rate notes (FRNs)
2. Forward-rate agreements (FRAs)
3. Swaps and (Bermudan) swaptions
4. Caps and floors.

Cash-flows associated with these securities are determined by LIBOR rates of


different tenors.

Before 2008 financial crisis, however, these LIBOR rates were viewed as being
essentially (default) risk-free
- led to many simplifications when it came to the pricing of the
aforementioned securities.

25 (Section 1)
Derivatives Pricing Pre-Crisis
e.g.1. Consider a floating rate note (FRN) with face value 100.

At each time Ti > 0 for i = 1, . . . , M the note pays interest equal to


100 × τi × L(Ti−1 , Ti ) where τi := Ti − Ti−1 and L(Ti−1 , Ti ) is the LIBOR
rate at time Ti−1 for borrowing / lending until the maturity Ti .

Note expires at time T = TM and pays 100 (in addition to the interest) at that
time.

A well known and important result is that the fair value of the FRN at any reset
point just after the interest has been paid, is 100
- follows by a simple induction argument.

26 (Section 1)
Derivatives Pricing Pre-Crisis
e.g.2. The forward LIBOR rate at time t based on simple interest for lending in
the interval [T1 , T2 ] is given by
 
L(t, T_1, T_2) = (1 / (T_2 − T_1)) [ P(t, T_1) / P(t, T_2) − 1 ]     (6)

where P(t, T ) is the time t price of a deposit maturing at time T .

Spot LIBOR rates are obtained by setting T1 = t.

LIBOR rates are quoted as simply-compounded interest rates, and are quoted on
an annual basis.

The accrual period or tenor, T2 − T1 , usually fixed at δ = 1/4 or δ = 1/2


- corresponding to 3 months and 6 months, respectively.

27 (Section 1)
Derivatives Pricing Pre-Crisis
With a fixed value of δ in mind can define the δ-year forward rate at time t with
maturity T as  
L(t, T, T + δ) = (1/δ) [ P(t, T) / P(t, T + δ) − 1 ].     (7)
The δ-year spot LIBOR rate at time t then given by L(t, t + δ) := L(t, t, t + δ).

Also note that L(t, T , T + δ) is the FRA rate at time t for the swap(let)
maturing at time T + δ.

That is, L(t, T , T + δ) is unique value of K for which the swaplet that pays
±(L(T , T + δ) − K ) at time T + δ is worth zero at time t < T .
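A small sketch of (6)-(7) in code, using hypothetical discount factors; the compounding identity (8) on the next slide then holds automatically.

```python
def forward_rate(P_T1, P_T2, tau):
    """Simple-interest forward rate over [T1, T2] implied by discount factors,
    as in (6)-(7): L(t, T1, T2) = (P(t,T1)/P(t,T2) - 1) / tau."""
    return (P_T1 / P_T2 - 1.0) / tau

# Hypothetical time-0 discount factors for 3, 6 and 9 months
P3, P6, P9 = 0.9902, 0.9801, 0.9698
F1 = forward_rate(P3, P6, 0.25)      # L(0, 3m, 6m)
F2 = forward_rate(P6, P9, 0.25)      # L(0, 6m, 9m)
F6m = forward_rate(P3, P9, 0.50)     # L(0, 3m, 9m)

# The (pre-crisis) compounding identity (8) holds by construction
assert abs((1 + F1 / 4) * (1 + F2 / 4) - (1 + F6m / 2)) < 1e-12
```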

28 (Section 1)
Derivatives Pricing Pre-Crisis
e.g.3. We can compound two consecutive 3-month forward LIBOR rates to
obtain corresponding 6-month forward LIBOR rate.

In particular, we have

(1 + F_1^{3m}/4)(1 + F_2^{3m}/4) = 1 + F^{6m}/2     (8)

where:
F_1^{3m} := L(0, 3m, 6m)
F_2^{3m} := L(0, 6m, 9m)
F^{6m} := L(0, 3m, 9m).

29 (Section 1)
During the Crisis
All three pricing results broke down during the 2008 financial crisis.

Because these results were also required for the pricing of swaps and swaptions, caps
and floors, etc., the entire approach to the pricing of fixed-income derivative
securities broke down.

Cause of this breakdown was the loss of trust in the banking system and the loss
of trust between banks.

This meant that LIBOR rates were no longer viewed as being risk-free.

Easiest way to demonstrate this loss of trust is via the spread between LIBOR
and OIS rates.

30 (Section 1)
Overnight Indexed Swaps (OIS)
An overnight indexed swap (OIS) is an interest-rate swap where the periodic
floating payment is based on a return calculated from the daily compounding of
an overnight rate (or index).
e.g. Fed Funds rate in the U.S., EONIA in the Eurozone and SONIA in the
U.K.
The fixed rate in the swap is the fixed rate that makes the swap worth zero at
inception.

Note there is essentially no credit / default risk premium included in OIS rates
- due to fact that floating payments in the swap are based on overnight
lending rates.

31 (Section 1)
Overnight Indexed Swaps (OIS)
We see that the LIBOR-OIS spreads were essentially zero leading up to the
financial crisis
- because market viewed LIBOR rates as being essentially risk-free with no
associated credit risk.

This changed drastically during the crisis when entire banking system nearly
collapsed and market participants realized there were substantial credit risks in
the interbank lending market.

Since the crisis the spreads have not returned to zero and must now be
accounted for in all fixed-income pricing models.

This regime switch constituted an extreme form of model risk where the entire
market made an assumption that resulted in all pricing models being hopelessly
inadequate!

33 (Section 1)
Derivatives Pricing Post-Crisis
Since the financial crisis we no longer take LIBOR rates to be risk-free.

And risk-free curve is now computed from OIS rates


- so (for example) we no longer obtain the result that an FRN is always worth
par at maturity.

OIS forward rates (but not LIBOR rates!) are calculated as in (6) or (7) so that
 
F_d(t, T_1, T_2) = (1 / (T_2 − T_1)) [ P_d(t, T_1) / P_d(t, T_2) − 1 ]

where Pd denotes the discount factor computed from the OIS curve and Fd
denotes forward rates implied by these OIS discount factors.

34 (Section 1)
Derivatives Pricing Post-Crisis
Forward LIBOR rates are now defined as risk-neutral expectations (under the
forward measure) of the spot LIBOR rate, L(T , T + δ).

Relationships such as (8) no longer hold and we now live in a multi-curve world
- with a different LIBOR curve for different tenors
- e.g. we have a 3-month LIBOR curve, a 6-month LIBOR curve etc.

Question: Which if any of the 3-month and 6-month LIBOR curves will be
lower? Why?

With these new definitions, straightforward extensions of the traditional pricing


formulas (that held pre-crisis) can be used to price swaps, caps, floors, swaptions
etc. in the post-crisis world.

35 (Section 1)
Model Transparency
Will now consider some well known models in equity derivatives space
- these models (or slight variations of them) also be used in foreign exchange
and commodity derivatives markets.

Can be useful when trading in exotic derivative securities and not just the
“vanilla" securities for which prices are readily available.

If these models can be calibrated to observable vanilla security prices, they can
then be used to construct a range of plausible prices for more exotic securities
- so they can help counter extrapolation risk
- and provide alternative estimates of the Greeks or hedge ratios.

Will not focus on stochastic calculus or various numerical pricing techniques here.

Instead simply want to emphasize that a broad class of tractable models exist and
that they should be employed when necessary.

36 (Section 2)
Model Transparency
Very important for the users of these models to fully understand their various
strengths and weaknesses
- and the implications of these strengths and weaknesses when they are used
to price and risk manage a given security.

Will see later how these models and others can be used together to infer prices of
exotic securities as well as their Greeks or hedge ratios.

In particular will emphasize how they can be used to avoid the pitfalls associated
with price extrapolation.

Any risk manager or investor in exotic securities should therefore maintain a


library of such models that can be called upon as needed.

37 (Section 2)
Recalling the Implied Volatility Surface
Recall the GBM model:

dSt = µSt dt + σSt dWt .

When we use risk-neutral pricing we know that µ = r − q.

Therefore have a single free parameter, σ, which we can fit to option prices or,
equivalently, the volatility surface.

Not at all surprising then that this exercise fails: the volatility surface is never flat so
that a constant σ fails to reproduce market prices.

This became particularly apparent after the stock market crash of October 1987
when market participants began to correctly identify that lower strike options
should be priced with a higher volatility, i.e. there should be a volatility skew.

The volatility surface of the Eurostoxx 50 Index on 28th November 2007 is


plotted in Figure 39.

38 (Section 2)
The Implied Volatility Surface

The Eurostoxx 50 Volatility Surface.

39 (Section 2)
Local Volatility Models
The local volatility framework assumes risk-neutral dynamics satisfy

dS_t = (r − q) S_t dt + σ_l(t, S_t) S_t dW_t     (9)

– so σ_l(t, S_t) is now a function of time and stock price.

Key result is the Dupire formula that links the local volatilities, σ_l(t, S_t), to the
implied volatility surface:

σ_l²(T, K) = [ ∂C/∂T + (r − q) K ∂C/∂K + qC ] / [ (K²/2) ∂²C/∂K² ]     (10)

where C = C (K , T ) = option price as a function of strike and time-to-maturity.

Calculating local volatilities from (10) is difficult and numerically unstable.
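A rough finite-difference sketch of (10), assuming a smooth grid of call prices C(T, K) is available; as just noted, the ratio is numerically delicate in practice.

```python
import numpy as np

def dupire_local_vol(C, K, T, r=0.0, q=0.0):
    """Finite-difference evaluation of Dupire's formula (10) on a grid of call
    prices C[i, j] = C(T[i], K[j]).  A sketch only: without smoothed,
    arbitrage-free inputs the ratio below is numerically unstable."""
    C = np.asarray(C, dtype=float)
    K = np.asarray(K, dtype=float)
    T = np.asarray(T, dtype=float)
    dC_dT = np.gradient(C, T, axis=0)              # ∂C/∂T
    dC_dK = np.gradient(C, K, axis=1)              # ∂C/∂K
    d2C_dK2 = np.gradient(dC_dK, K, axis=1)        # ∂²C/∂K²
    num = dC_dT + (r - q) * K[None, :] * dC_dK + q * C
    den = 0.5 * K[None, :] ** 2 * d2C_dK2
    return np.sqrt(np.maximum(num / den, 0.0))     # σ_l(T, K)
```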

Local volatility is very nice and interesting


- a model that is guaranteed to replicate the implied volatility surface in the
market.

But it also has weaknesses ...


40 (Section 2)
Local Volatility Models
For example, it leads to unreasonable skew dynamics and underestimates the
volatility of volatility or “vol-of-vol".

Moreover the Greeks that are calculated from a local volatility model are
generally not consistent with what is observed empirically.

Understanding these weaknesses is essential from a risk management point of


view.

Nevertheless, local volatility framework is theoretically interesting and is still


often used in practice for pricing certain types of exotic options such as barrier
and digital options.

They are known to be particularly unsuitable for pricing derivatives that depend
on the forward skew such as forward-start options and cliquets.

41 (Section 2)
Local Volatility

[Figure: (a) implied volatility surface and (b) local volatility surface, each plotted
against index level and time-to-maturity (years).]

Implied and local volatility surfaces: local volatility surface is constructed from
implied volatility surface using Dupire’s formula.

42 (Section 2)
Stochastic Volatility Models
Most well-known stochastic volatility model is due to Heston (1993).

It’s a two-factor model and assumes separate dynamics for both the stock price
and instantaneous volatility so that
dS_t = (r − q) S_t dt + √σ_t S_t dW_t^{(s)}     (11)

dσ_t = κ(θ − σ_t) dt + γ √σ_t dW_t^{(vol)}     (12)

where W_t^{(s)} and W_t^{(vol)} are standard Q-Brownian motions with constant
correlation coefficient, ρ.

Heston’s stochastic volatility model is an incomplete model


- why?

The particular EMM that we choose to work with would be determined by some
calibration algorithm
- the typical method of choosing an EMM in incomplete market models.

The volatility process in (12) is commonly used in interest rate modeling


- where it’s known as the CIR model.
43 (Section 2)
Stochastic Volatility Models
The price, C(t, S_t, σ_t), of any derivative security under Heston must satisfy

∂C/∂t + (1/2)σS² ∂²C/∂S² + ρσγS ∂²C/∂S∂σ + (1/2)γ²σ ∂²C/∂σ² + (r − q)S ∂C/∂S + κ(θ − σ) ∂C/∂σ = rC.     (13)

Price then obtained by solving (13) subject to the relevant boundary conditions.

Once parameters of the model have been identified via some calibration
algorithm, pricing can be done by either solving (13) numerically or alternatively,
using Monte-Carlo.

Note that some instruments can be priced analytically in Heston’s model


- e.g. continuous-time version of a variance swap.

Heston generally captures long-dated skew quite well but struggles with
short-dated skew, particularly when it is steep.
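A minimal Monte-Carlo sketch of the dynamics (11)-(12) using a full-truncation Euler scheme; v below plays the role of σ_t and all parameter values are illustrative.

```python
import numpy as np

def heston_paths(S0, v0, r, q, kappa, theta, gamma, rho, T, n_steps, n_paths, seed=0):
    """Simulate Heston dynamics (11)-(12) with a full-truncation Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(S0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)                          # full truncation of v
        S *= np.exp((r - q - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + gamma * np.sqrt(v_pos * dt) * z2
    return S, v

# e.g. a 1-year ATM call priced by risk-neutral discounting of the simulated payoff
S_T, _ = heston_paths(100, 0.04, 0.02, 0.0, 1.5, 0.04, 0.5, -0.7, 1.0, 252, 100_000)
call = np.exp(-0.02 * 1.0) * np.mean(np.maximum(S_T - 100, 0.0))
```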
44 (Section 2)
Stochastic Volatility Models

[Figure: implied volatility (%) plotted against strike and time-to-maturity.]

Figure displays an implied volatility surface under Heston’s stochastic volatility


model.

45 (Section 2)
A Detour: Forward Skew
Forward skew is the implied volatility skew that prevails at some future date and
that is consistent with some model that has been calibrated to today’s market
data.

e.g. Suppose we simulate some model that has been calibrated to today’s
volatility surface forward to some date T > 0.

On any simulated path can then compute the implied volatility surface as of that
date T .

This is the date T forward implied volatility surface


- forward skew refers to general shape of the skew in this forward vol. surface
- note that this forward skew is model dependent.

46 (Section 2)
Forward Skew and Local Volatility
Well known that forward skew in local volatility models tends to be very flat.

A significant feature of local volatility models that is not at all obvious until one
explores and works with the model in some detail.

Not true of stochastic volatility or jump diffusion models


- where forward skew is generally similar in shape to the spot skew.

These observations have very important implications for pricing cliquet-style


securities.

The failure of market participants to understand the properties of their models


has often led to substantial trading losses
- such losses might have been avoided had they priced the securities in
question with several different models.

47 (Section 2)
Example: The Locally Capped, Globally Floored Cliquet
The locally capped, globally floored cliquet is structured like a bond:
Investor pays the bond price upfront at t = 0.
In return he is guaranteed to receive the principal at maturity as well as an
annual coupon.
The coupon is based on monthly returns of underlying security over previous
year. It is calculated as
Payoff = max{ Σ_{t=1}^{12} min( max(r_t, −.01), .01 ),  MinCoupon }     (14)

where MinCoupon = .02 and each monthly return satisfies

r_t = (S_t − S_{t−1}) / S_{t−1}.

Annual coupon therefore capped at 12% and floored at 2%.


Ignoring global floor, coupon behaves like a strip of forward-starting monthly
call-spreads.
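A small sketch of the coupon calculation in (14), applied to a hypothetical path of monthly fixings.

```python
import numpy as np

def cliquet_coupon(monthly_prices, local_cap=0.01, local_floor=-0.01, min_coupon=0.02):
    """Annual coupon of the locally capped, globally floored cliquet in (14).
    monthly_prices holds the 13 observations S_0, S_1, ..., S_12 of the underlying."""
    S = np.asarray(monthly_prices, dtype=float)
    monthly_returns = S[1:] / S[:-1] - 1.0
    clipped = np.clip(monthly_returns, local_floor, local_cap)   # local cap / floor
    return max(clipped.sum(), min_coupon)                        # global floor

# e.g. a hypothetical path of 13 monthly fixings
prices = [100, 103, 101, 99, 104, 102, 105, 103, 101, 100, 102, 104, 106]
print(cliquet_coupon(prices))    # coupon lies between 0.02 and 0.12
```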
48 (Section 2)
Example: The Locally Capped, Globally Floored Cliquet
A call spread is a long position in a call option with strike k1 and a short position
in a call option with strike k2 > k1 .

A call spread is sensitive to the implied volatility skew.

Would therefore expect coupon value to be very sensitive to the forward skew in
the model.

In particular, would expect coupon value to increase as the skew becomes more
negative.

So would expect a local volatility model to underestimate the price of this


security.

In the words of Gatheral (2006):


“We would guess that the structure should be very sensitive to forward skew
assumptions. Thus our prediction would be that a local volatility assumption
would substantially underprice the deal because it generates forward skews that
are too flat . . .”
49 (Section 2)
Example: The Locally Capped, Globally Floored Cliquet
And indeed this intuition can be confirmed.

In fact numerical experiments suggest that a stochastic volatility model places a


much higher value on this bond than a comparable local volatility model.

But our intuition is not always correct!

So important to evaluate exotic securities with which we are not familiar under
different modeling assumptions.

And any mistakes are likely to be costly!

In the words of Gatheral (2006) again:


“ . . . since the lowest price invariably gets the deal, it was precisely those
traders that were using the wrong model that got the business . . .
The importance of trying out different modeling assumptions cannot be
overemphasized. Intuition is always fallible!”

50 (Section 2)
Jump-Diffusion Models
Merton’s jump diffusion model assumes that the time t stock price satisfies

S_t = S_0 exp{ (µ − σ²/2)t + σW_t } ∏_{i=1}^{N_t} Y_i     (15)

where N_t ∼ Poisson(λt) and the Y_i ’s are log-normal and IID.

Each Yi represents the magnitude of the i th jump and stock price behaves like a
regular GBM between jumps.

If the dynamics in (15) are under an EMM, Q, then µ, λ and the mean jump size
are constrained in such a way that Q-expected rate of return must equal r − q.

Question: Can you see how using (15) to price European options might lead to
an infinitely weighted sum of Black-Scholes options prices?

Other tractable jump-diffusion models are due to Duffie, Pan and Singleton
(1998), Kou (2002), Bates (1996) etc.
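Answering the question above: conditioning on the number of jumps N_T gives a Poisson-weighted (infinite) sum of Black-Scholes prices. A sketch, assuming log Y_i ∼ N(m, δ²), with the sum truncated and purely illustrative parameter values:

```python
from math import exp, log, sqrt, factorial
from scipy.stats import norm

def bs_call(S0, K, T, r, q, sigma):
    """Black-Scholes call price."""
    d1 = (log(S0 / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * exp(-q * T) * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def merton_call(S0, K, T, r, q, sigma, lam, m, delta, n_terms=50):
    """European call under Merton's jump-diffusion (15), with log Y_i ~ N(m, delta^2),
    written as a Poisson-weighted sum of Black-Scholes prices (truncated at n_terms)."""
    k = exp(m + 0.5 * delta**2) - 1.0                # mean jump size minus 1
    lam_p = lam * (1.0 + k)
    price = 0.0
    for n in range(n_terms):
        sigma_n = sqrt(sigma**2 + n * delta**2 / T)  # vol conditional on n jumps
        r_n = r - lam * k + n * log(1.0 + k) / T     # drift adjustment for n jumps
        weight = exp(-lam_p * T) * (lam_p * T)**n / factorial(n)
        price += weight * bs_call(S0, K, T, r_n, q, sigma_n)
    return price

# e.g. (illustrative parameters)
print(merton_call(S0=100, K=100, T=1.0, r=0.02, q=0.0,
                  sigma=0.2, lam=0.5, m=-0.1, delta=0.15))
```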
51 (Section 2)
Merton’s Jump-Diffusion Model

[Figure: implied volatility (%) plotted against strike and time-to-maturity.]

An implied volatility surface under Merton’s jump diffusion model


- any observations?

52 (Section 2)
Levy Processes
Definition: A Levy process is any continuous-time process with stationary and
independent increments.

Definition: An exponential Levy process, St , satisfies St = exp(Xt ) where Xt is a


Levy process.

The two most common examples of Levy processes are Brownian motion and
the Poisson process.
A Levy process with jumps can be of infinite activity so that it jumps
infinitely often in any finite time interval, or of finite activity so that it
makes only finitely many jumps in any finite time interval.
The Merton jump-diffusion model is an example of an exponential Levy
process of finite activity.
The most important result for Levy processes is the Levy-Khintchine formula
which describes the characteristic function of a Levy process.

53 (Section 2)
Time-Changed Exponential Levy Processes
Levy processes cannot capture “volatility clustering", the tendency of high
volatility periods to be followed by periods of high volatility, and low volatility
periods to be followed by periods of low volatility.

This is due to the stationary and independent increments assumption.

Levy models with stochastic time, however, can capture volatility clustering.

Definition: A Levy subordinator is a non-negative, non-decreasing Levy process.

A subordinator can be used to change the “clock speed" or “calendar speed".

More generally, if y_t is a positive process, then we can define our stochastic
clock, Y_t, as

Y_t := ∫_0^t y_s ds.     (16)
Using Yt to measure time instead of the usual t, we can then model a security
price process, St , as an exponential time-changed Levy process.
54 (Section 2)
Time-Changed Exponential Levy Processes
Can write the Q-dynamics for S_t as

S_t = S_0 e^{(r−q)t} · e^{X_{Y_t}} / E_0^Q[ e^{X_{Y_t}} | y_0 ].     (17)
Volatility clustering captured using the stochastic clock, Yt .

e.g. Between t and t + ∆t time-changed process will have moved approximately


yt × ∆t units.
So when yt large the clock will have moved further and increments of St will
generally be more volatile as a result.

Note that if the subordinator, Yt , is a jump process, then St can jump even if the
process Xt cannot jump itself.

Note that in (17) the dynamics of St are Q-dynamics corresponding to the cash
account as numeraire. Why is this the case?
- typical of how incomplete markets are often modeled: we directly specify
Q-dynamics so that martingale pricing holds by construction
- other free parameters are then chosen by a calibration algorithm.
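A small simulation sketch of the idea: a CIR activity rate y_t drives the clock (16) and, purely for simplicity, a Brownian motion stands in for the Levy process X_t; all parameter values are illustrative.

```python
import numpy as np

def time_changed_bm(T=3.0, n_steps=750, kappa=2.0, eta=1.0, lam=0.5, y0=1.0, seed=0):
    """Simulate a process run on a stochastic clock: a CIR activity rate y_t drives
    Y_t = ∫ y_s ds as in (16), and a Brownian motion (standing in for the Levy
    process X_t) is evaluated at Y_t.  High-y periods give clustered volatility."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y, X = y0, 0.0
    path = np.empty(n_steps)
    for i in range(n_steps):
        # Euler step for the CIR activity rate (truncated at zero)
        y = max(y + kappa * (eta - y) * dt
                  + lam * np.sqrt(max(y, 0.0) * dt) * rng.standard_normal(), 0.0)
        dY = y * dt                                  # increment of the clock Y_t
        X += np.sqrt(dY) * rng.standard_normal()     # X_{Y_t}: BM run on the new clock
        path[i] = X
    return path
```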
55 (Section 2)
Normal-Inverse-Gaussian Process with CIR Clock

[Figure: a simulated sample path (price vs. time) of an exponential NIG process
with a CIR stochastic clock.]
56 (Section 2)
Variance-Gamma Process with OU-Gamma Clock

[Figure: a simulated sample path (price vs. time) of an exponential Variance-Gamma
process with an OU-Gamma stochastic clock.]

57 (Section 2)
Time-Changed Exponential Levy Processes
Many Levy processes are quite straightforward to simulate.

However, the characteristic functions of the log-stock price are often available in
closed form, even when the Levy process has a stochastic clock.

In fact it can be shown that

φ(u, t) := E[ e^{iu log(S_t)} | S_0, y_0 ]
         = e^{iu((r−q)t + log(S_0))} · ϕ(−iψ_x(u); t, y_0) / ϕ(−iψ_x(−i); t, y_0)^{iu}     (18)

where

ψ_x(u) := log E[ e^{iuX_1} ]     (19)

is the characteristic exponent of the Levy process and ϕ(u; t, y_0) is the
characteristic function of Y_t given y_0.

Therefore if we know the characteristic function of the integrated subordinator,


Yt , and the characteristic function of the Levy process, Xt , then we also know
the characteristic function of the log-stock price
- and (vanilla) options can be priced using numerical transform methods.
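Equation (18) can be coded directly once its two ingredients are supplied. In the sketch below, psi_x and phi_clock are user-provided callables (e.g. the NIG characteristic exponent and the characteristic function of an integrated CIR clock); the function and argument names are illustrative.

```python
import numpy as np

def log_price_cf(u, t, S0, r, q, psi_x, phi_clock, y0):
    """Characteristic function of log(S_t) for a time-changed exponential Levy
    process, composing the ingredients exactly as in (18).  psi_x(u) is the
    characteristic exponent (19) of the Levy process and phi_clock(u, t, y0) is
    the characteristic function of the integrated clock Y_t given y0."""
    u = np.asarray(u, dtype=complex)
    num = phi_clock(-1j * psi_x(u), t, y0)
    den = phi_clock(-1j * psi_x(-1j), t, y0) ** (1j * u)
    return np.exp(1j * u * ((r - q) * t + np.log(S0))) * num / den
```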
58 (Section 2)
Calibration is an Integral part of the Pricing Process
When we used the BDT model to price swaptions, we saw that the calibration process
is an integral component of the pricing process.

Indeed, in the early 2000’s there was a debate about precisely this issue.

Context for the debate was general approach in the market to use simple
one-factor models to price Bermudan swaptions.

Longstaff, Santa-Clara and Schwartz (2001) argued that since the term-structure
of interest rates was driven by several factors, using a one-factor model to price
Bermudan swaptions was fundamentally flawed.

This argument was sound ... but they neglected to account for the calibration
process.

In response Andersen and Andreasen (2001) argued the Bermudan swaption


prices were actually quite accurate even though they were indeed typically priced
at the time using simple one-factor models.

59 (Section 3)
Calibration is an Integral part of the Pricing Process
Their analysis relied on the fact that when pricing Bermudan swaptions it was
common to calibrate the one-factor models to the prices of European
swaptions.

On the basis that Bermudan swaptions are actually quite “close” to European
swaptions, they argued the extrapolation risk was small.

This debate clearly tackled the issue of model transparency and highlighted
that model dynamics may not be important at all if the exotic security being
priced is within or close to the span of the securities used to calibrate the model.

Essentially two wrongs, i.e. a bad model and bad parameters, can together make
a right, i.e. an accurate price!

60 (Section 3)
Using Many Models to Manage Model Risk
The models we have described are quite representative of the models that are
used for pricing many equity and FX derivative securities in practice.

Therefore very important for users of these models to fully understand their
strengths and weaknesses and the implications of these strengths and
weaknesses when used to price and risk manage a given security.

Will see how these models and others can be used together to infer the prices of
exotic securities as well as their Greeks or hedge ratios.

In particular we will emphasize how they can be used to avoid the pitfalls
associated with price extrapolation.

Any risk manager or investor in exotic securities would therefore be well-advised


to maintain a library of such models that can be called upon as needed.

61 (Section 3)
Another Example: Back to Barrier Options
Well known that the price of a barrier option is not solely determined by the
marginal distributions of the underlying stock price.

Figures on next 2 slides emphasize this point:


They display prices of down-and-out (DOB) and up-and-out (UOB) barrier
call options as a function of the barrier level for several different models.
In each case strike was set equal to the initial value of underlying security.
Parameter values for all models were calibrated to implied volatility surface
of the Eurostoxx 50 Index on October 7th , 2003.
So their marginal distributions coincide, at least to the extent that
calibration errors are small.

62 (Section 3)
Down-and-Out Barrier Call Prices Under Different Models
[Figure: down-and-out barrier call price vs. barrier level (% of spot) for the Hest,
Hest-J, BNS, VG-OUG, VG-CIR, NIG-CIR and NIG-OUG models.]

Down-and-out (DOB) barrier call option prices for different models, all of which
have been calibrated to the same implied volatility surface.
See “A Perfect Calibration! Now What?” by Schoutens, Simons and Tistaert
(2003) for further details.
63 (Section 3)
Up-and-Out Barrier Call Prices Under Different Models
[Figure: up-and-out barrier call price vs. barrier level (% of spot) for the Hest,
Hest-J, BNS, VG-OUG, VG-CIR, NIG-CIR and NIG-OUG models.]

Up-and-out (UOB) barrier call option prices for different models, all of which
have been calibrated to the same implied volatility surface.
See “A Perfect Calibration! Now What?” by Schoutens, Simons and Tistaert
(2003) for further details.
64 (Section 3)
Barrier Options and Extrapolation Risk
Clear that the different models result in very different barrier prices.

Question therefore arises as to what model one should use in practice?


- a difficult question to answer!

Perhaps best solution is to price the barrier using several different models that:
(a) have been calibrated to the market prices of liquid securities and
(b) have reasonable dynamics that make sense from a modeling viewpoint.

The minimum and maximum of these prices could then be taken as the bid-offer
prices if they are not too far apart.

If they are far apart, then they simply provide guidance on where the fair price
might be.

Using many plausible models is perhaps the best way to avoid extrapolation risk.

65 (Section 3)
Model Calibration Risk
Have to be very careful when calibrating models to market prices!

In general (and this is certainly the case with equity derivatives markets) the
most liquid instruments are vanilla call and put options.

Assuming then that we have a volatility surface available to us, we can at the
very least calibrate our model to this surface.

An obvious but potentially hazardous approach would be to solve

min_γ  Σ_{i=1}^N ω_i ( ModelPrice_i(γ) − MarketPrice_i )²     (20)

where ModelPricei and MarketPricei are model and market prices, respectively,
of the i th option used in the calibration.

The ωi ’s are fixed weights that we choose to reflect either the importance or
accuracy of the i th observation and γ is the vector of model parameters.
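A bare-bones sketch of solving (20) with a generic optimizer; model_price is a user-supplied (hypothetical) function returning the vector of N model prices for a given parameter vector γ.

```python
import numpy as np
from scipy.optimize import minimize

def calibrate(model_price, market_prices, weights, gamma0, bounds=None):
    """Weighted least-squares calibration (20).  model_price(gamma) must return
    the vector of the N model prices for parameter vector gamma.  A sketch only:
    the objective is typically non-convex and the output can depend on gamma0."""
    market_prices = np.asarray(market_prices, dtype=float)
    weights = np.asarray(weights, dtype=float)

    def objective(gamma):
        resid = model_price(gamma) - market_prices
        return np.sum(weights * resid**2)

    res = minimize(objective, gamma0, method="L-BFGS-B", bounds=bounds)
    return res.x, res.fun
```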

There are many problems with performing a calibration in this manner!


66 (Section 3)
Problems with Calibrating Using (20)
1. In general (20) is a non-linear and non-convex optimization problem. It may
therefore be difficult to solve as there may be many local minima.

2. Even if there is only one local minimum, there may be “valleys" containing
the local minimum in which the objective function is more or less flat.

Then possible that the optimization routine will terminate at different points
in the valley even when given the same input, i.e. market option prices
- this is clearly unsatisfactory!
Consider following scenario:

Day 1: Model is calibrated to option prices and resulting calibrated model is


then used to price some path-dependent security. Let P1 be the price of this
path-dependent security.

Day 2: Market environment is unchanged. The calibration routine is rerun,


the path-dependent option is priced again and this time its price is P2 . But
now P2 is very different from P1 , despite the fact that the market hasn’t
changed. What has happened?
67 (Section 3)
Problems with Calibrating Using (20)
Recall that the implied volatility surface only determines the marginal
distributions of the stock prices at different times, T .

It tells you nothing(!) about the joint distributions of the stock price at different
times.

Therefore, when we are calibrating to the volatility surface, we are only


determining the model dynamics up-to the marginal distributions.

All of the parameter combinations in the “valley" might result in the very similar
marginal distributions, but they will often result in very different joint
distributions.

This was demonstrated when we saw how different models, that had been
calibrated to the same volatility surface, gave very different prices for
down-and-out call prices.

Also saw similar results when we priced interest-rate swaptions.

68 (Section 3)
