
Financial Risk Manager
EXAM PART II

Market Risk Measurement and Management

GARP

2020

Pearson

Excerpts taken from:

Options, Futures, and Other Derivatives, Tenth Edition by John C. Hull
Copyright © 2017, 2015, 2012, 2009, 2006, 2003, 2000 by Pearson Education, Inc.
New York, New York 10013

Copyright © 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012 by Pearson Learning Solutions
All rights reserved.

This copyright covers material written expressly for this volume by the editor/s as well as the compilation itself. It does not cover the individual selections herein that first appeared elsewhere. Permission to reprint these has been obtained by Pearson Learning Solutions for this edition only. Further reproduction by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, must be arranged with the individual copyright holders noted.

Grateful acknowledgment is made to the following sources for permission to reprint material copyrighted or controlled by them:

"Estimating Market Risk Measures," "Non-Parametric Approaches," and "Parametric Approaches (II): Extreme Value" by Kevin Dowd, reprinted from Measuring Market Risk, Second Edition (2005), by permission of John Wiley & Sons, Inc.

"Backtesting VaR" and "VaR Mapping," by Philippe Jorion, reprinted from Value at Risk: The New Benchmark for Managing Financial Risk, Third Edition (2007), by permission of The McGraw-Hill Companies.

"Messages from the Academic Literature on Risk Measurement for the Trading Book," Working Paper No. 19, reprinted by permission of the Basel Committee on Banking Supervision, January 2011.

"Some Correlation Basics: Definitions, Applications, and Terminology," "Empirical Properties of Correlation: How Do Correlations Behave in the Real World?," and "Financial Correlation Modeling - Bottom-Up Approaches," by Gunter Meissner, reprinted from Correlation Risk Modeling and Management, Second Edition (2019), by permission of Risk Books/InfoPro Digital Services, Ltd.

"Empirical Approaches to Risk Metrics and Hedges," "The Science of Term Structure Models," "The Evolution of Short Rates and the Shape of the Term Structure," "The Art of Term Structure Models: Drift," and "The Art of Term Structure Models: Volatility and Distribution," by Bruce Tuckman and Angel Serrat, reprinted from Fixed Income Securities: Tools for Today's Markets, Third Edition (2012), by permission of John Wiley & Sons, Inc.

"Fundamental Review of the Trading Book," by John C. Hull, reprinted from Risk Management and Financial Institutions, Fifth Edition (2018), by permission of John Wiley & Sons, Inc.

Learning Objectives provided by the Global Association of Risk Professionals.

All trademarks, service marks, registered trademarks, and registered service marks are the property of their respective owners and are used herein for identification purposes only.

Pearson Education, Inc., 330 Hudson Street, New York, New York 10013
A Pearson Education Company
www.pearsoned.com

Printed in the United States of America


ISBN 10: 0135965896


ISBN 13: 9780135965894
Contents

Chapter 1 Estimating Market Risk Measures  1
  1.1 Data  2
    Profit/Loss Data  2
    Loss/Profit Data  2
    Arithmetic Return Data  2
    Geometric Return Data  2
  1.2 Estimating Historical Simulation VaR  3
  1.3 Estimating Parametric VaR  4
    Estimating VaR with Normally Distributed Profits/Losses  4
    Estimating VaR with Normally Distributed Arithmetic Returns  5
    Estimating Lognormal VaR  6
  1.4 Estimating Coherent Risk Measures  7
    Estimating Expected Shortfall  7
    Estimating Coherent Risk Measures  8
  1.5 Estimating the Standard Errors of Risk Measure Estimators  10
    Standard Errors of Quantile Estimators  10
    Standard Errors in Estimators of Coherent Risk Measures  12
  1.6 The Core Issues: An Overview  13
  1.7 Appendix  13
    Preliminary Data Analysis  13
    Plotting the Data and Evaluating Summary Statistics  14
    QQ Plots  14

Chapter 2 Non-Parametric Approaches  17
  2.1 Compiling Historical Simulation Data  18
  2.2 Estimation of Historical Simulation VaR and ES  19
    Basic Historical Simulation  19
    Bootstrapped Historical Simulation  19
    Historical Simulation Using Non-parametric Density Estimation  19
    Estimating Curves and Surfaces for VaR and ES  21
  2.3 Estimating Confidence Intervals for Historical Simulation VaR and ES  21
    An Order Statistics Approach to the Estimation of Confidence Intervals for HS VaR and ES  22
    A Bootstrap Approach to the Estimation of Confidence Intervals for HS VaR and ES  22
  2.4 Weighted Historical Simulation  23
    Age-weighted Historical Simulation  24
    Volatility-weighted Historical Simulation  25
    Correlation-weighted Historical Simulation  26
    Filtered Historical Simulation  26
  2.5 Advantages and Disadvantages of Non-Parametric Methods  28
    Advantages  28
    Disadvantages  28

Chapter 3 Parametric Approaches (II): Extreme Value  35
  3.1 Generalised Extreme-Value Theory  36
    Theory  36
    A Short-Cut EV Method  39
    Estimation of EV Parameters  39
  3.2 The Peaks-Over-Threshold Approach: The Generalised Pareto Distribution  43
    Theory  43
    Estimation  45
    GEV vs POT  45
  3.3 Refinements to EV Approaches  46
    Conditional EV  46
    Dealing with Dependent (or Non-iid) Data  46
    Multivariate EVT  47
  3.4 Conclusions  47

Chapter 4 Backtesting VaR  49
  4.1 Setup for Backtesting  50
    An Example  50
    Which Return?  50
  4.2 Model Backtesting with Exceptions  51
    Model Verification Based on Failure Rates  51
    The Basel Rules  54
    Conditional Coverage Models  55
    Extensions  56
  4.3 Applications  56
  4.4 Conclusions  57

Chapter 5 VaR Mapping  59
  5.1 Mapping for Risk Measurement  60
    Why Mapping?  60
    Mapping as a Solution to Data Problems  60
    The Mapping Process  61
    General and Specific Risk  62
  5.2 Mapping Fixed-Income Portfolios  63
    Mapping Approaches  63
    Stress Test  64
    Benchmarking  64
  5.3 Mapping Linear Derivatives  66
    Forward Contracts  66
    Commodity Forwards  67
    Forward Rate Agreements  68
    Interest-Rate Swaps  69
  5.4 Mapping Options  70
  5.5 Conclusions  72

Chapter 6 Messages from the Academic Literature on Risk Management for the Trading Book  73
  6.1 Introduction  74
  6.2 Selected Lessons on VaR Implementation  74
    Overview  74
    Time Horizon for Regulatory VaR  74
    Time-Varying Volatility in VaR  76
    Backtesting VaR Models  77
    Conclusions  78
  6.3 Incorporating Liquidity  78
    Overview  78
    Exogenous Liquidity  79
    Endogenous Liquidity: Motivation  79
    Endogenous Liquidity and Market Risk for Trading Portfolios  80
    Adjusting the VaR Time Horizon to Account for Liquidity Risk  81
    Conclusions  81
  6.4 Risk Measures  82
    Overview  82
    VaR  82
    Expected Shortfall  84
    Spectral Risk Measures  85
    Other Risk Measures  86
    Conclusions  86
  6.5 Stress Testing Practices for Market Risk  87
    Overview  87
    Incorporating Stress Testing into Market-Risk Modelling  87
    Stressed VaR  88
    Conclusions  89
  6.6 Unified Versus Compartmentalised Risk Measurement  89
    Overview  89
    Aggregation of Risk: Diversification versus Compounding Effects  90
    Papers Using the "Bottom-Up" Approach  91
    Papers Using the "Top-Down" Approach  94
    Conclusions  95
  6.7 Risk Management and Value-at-Risk in a Systemic Context  95
    Overview  95
    Intermediation, Leverage and Value-at-Risk: Empirical Evidence  96
    What Has All This to Do with VaR-Based Regulation?  97
    Conclusions  98

Chapter 7 Correlation Basics: Definitions, Applications, and Terminology  105
  7.1 A Short History of Correlation  106
  7.2 What Are Financial Correlations?  106
  7.3 What Is Financial Correlation Risk?  106
  7.4 Motivation: Correlations and Correlation Risk Are Everywhere in Finance  108
    Investments and Correlation  108
  7.5 Trading and Correlation  109
    Risk Management and Correlation  112
    The Global Financial Crises 2007 to 2009 and Correlation  113
    Regulation and Correlation  116
  7.6 How Does Correlation Risk Fit into the Broader Picture of Risks in Finance?  116
    Correlation Risk and Market Risk  117
    Correlation Risk and Credit Risk  117
  7.7 Correlation Risk and Systemic Risk  119
  7.8 Correlation Risk and Concentration Risk  119
  7.9 A Word on Terminology  121

Chapter 8 Empirical Properties of Correlation: How Do Correlations Behave in the Real World?  125
  8.1 How Do Equity Correlations Behave in a Recession, Normal Economic Period or Strong Expansion?  126
  8.2 Do Equity Correlations Exhibit Mean Reversion?  128
    How Can We Quantify Mean Reversion?  128
  8.3 Do Equity Correlations Exhibit Autocorrelation?  129
  8.4 How Are Equity Correlations Distributed?  130
  8.5 Is Equity Correlation Volatility an Indicator for Future Recessions?  130
  8.6 Properties of Bond Correlations and Default Probability Correlations  131

Chapter 9 Financial Correlation Modeling - Bottom-Up Approaches  133
  9.1 Copula Correlations  134
    The Gaussian Copula  134
    Simulating the Correlated Default Time for Multiple Assets  137

Chapter 10 Empirical Approaches to Risk Metrics and Hedging  139
  10.1 Single-Variable Regression-Based Hedging  140
    Least-Squares Regression Analysis  141
    The Regression Hedge  142
    The Stability of Regression Coefficients over Time  143
  10.2 Two-Variable Regression-Based Hedging  144
  10.3 Level Versus Change Regressions  146
  10.4 Principal Components Analysis  146
    Overview  146
    PCAs for USD Swap Rates  147
    Hedging with PCA and an Application to Butterfly Weights  149
    Principal Component Analysis of EUR, GBP, and JPY Swap Rates  150
    The Shape of PCs over Time  150

Chapter 11 The Science of Term Structure Models  155
  11.1 Rate and Price Trees  156
  11.2 Arbitrage Pricing of Derivatives  157
  11.3 Risk-Neutral Pricing  158
  11.4 Arbitrage Pricing in a Multi-Period Setting  159
  11.5 Example: Pricing a Constant-Maturity Treasury Swap  161
  11.6 Option-Adjusted Spread  162
  11.7 Profit and Loss Attribution with an OAS  162
  11.8 Reducing the Time Step  163
  11.9 Fixed Income Versus Equity Derivatives  164

Chapter 12 The Evolution of Short Rates and the Shape of the Term Structure  167
  12.1 Introduction  168
  12.2 Expectations  168
  12.3 Volatility and Convexity  169
  12.4 Risk Premium  171

Chapter 13 The Art of Term Structure Models: Drift  175
  13.1 Model 1: Normally Distributed Rates and No Drift  176
  13.2 Model 2: Drift and Risk Premium  178
  13.3 The Ho-Lee Model: Time-Dependent Drift  179
  13.4 Desirability of Fitting to the Term Structure  180
  13.5 The Vasicek Model: Mean Reversion  181

Chapter 14 The Art of Term Structure Models: Volatility and Distribution  187
  14.1 Time-Dependent Volatility: Model 3  188
  14.2 The Cox-Ingersoll-Ross and Lognormal Models: Volatility as a Function of the Short Rate  189
  14.3 Tree for the Original Salomon Brothers Model  190
  14.4 The Black-Karasinski Model: A Lognormal Model with Mean Reversion  191
  14.5 Appendix  191
    Closed-Form Solutions for Spot Rates  191

Chapter 15 Volatility Smiles  193
  15.1 Why the Volatility Smile Is the Same for Calls and Puts  194
  15.2 Foreign Currency Options  195
    Empirical Results  195
    Reasons for the Smile in Foreign Currency Options  196
  15.3 Equity Options  196
    The Reason for the Smile in Equity Options  197
  15.4 Alternative Ways of Characterizing the Volatility Smile  198
  15.5 The Volatility Term Structure and Volatility Surfaces  198
  15.6 Minimum Variance Delta  199
  15.7 The Role of the Model  199
  15.8 When a Single Large Jump Is Anticipated  199

Chapter 16 Fundamental Review of the Trading Book  203
  16.1 Background  204
  16.2 Standardized Approach  205
    Term Structures  206
    Curvature Risk Charge  206
    Default Risk Charge  207
    Residual Risk Add-On  207
    A Simplified Approach  207
  16.3 Internal Models Approach  207
    Back-Testing  208
    Profit and Loss Attribution  209
    Credit Risk  209
    Securitizations  209
  16.4 Trading Book vs. Banking Book  209

Index  211
FRM® Committee

Chairman
Dr. Rene Stulz
Everett D. Reese Chair of Banking and Monetary Economics, The Ohio State University

Members
Richard Apostolik, President and CEO, Global Association of Risk Professionals
Michelle McCarthy Beck, SMD, Chief Risk Officer, TIAA Financial Solutions
Richard Brandt, MD, Operational Risk Management, Citigroup
Julian Chen, FRM, SVP, FRM Program Manager, Global Association of Risk Professionals
Dr. Christopher Donohue, MD, GARP Benchmarking Initiative, Global Association of Risk Professionals
Donald Edgar, FRM, MD, Risk & Quantitative Analysis, BlackRock
Herve Geny, Group Head of Internal Audit, London Stock Exchange Group
Keith Isaac, FRM, VP, Capital Markets Risk Management, TD Bank Group
William May, SVP, Global Head of Certifications and Educational Programs, Global Association of Risk Professionals
Dr. Attilio Meucci, CFA, Founder, ARPM
Dr. Victor Ng, CFA, MD, Chief Risk Architect, Market Risk Management and Analysis, Goldman Sachs
Dr. Matthew Pritsker, Senior Financial Economist and Policy Advisor / Supervision, Regulation, and Credit, Federal Reserve Bank of Boston
Dr. Samantha Roberts, FRM, SVP, Balance Sheet Analytics & Modeling, PNC Bank
Dr. Til Schuermann, Partner, Oliver Wyman
Nick Strange, FCA, Director, Supervisory Risk Specialists, Prudential Regulation Authority, Bank of England
Dr. Sverrir Porvaldsson, FRM, Senior Quant, SEB
ESTIMATING MARKET RISK MEASURES
An Introduction and Overview

Learning Objectives

After completing this reading you should be able to:

• Estimate VaR using a historical simulation approach.
• Estimate VaR using a parametric approach for both normal and lognormal return distributions.
• Estimate the expected shortfall given P/L or return data.
• Define coherent risk measures.
• Estimate risk measures by estimating quantiles.
• Evaluate estimators of risk measures by estimating their standard errors.
• Interpret QQ plots to identify the characteristics of a distribution.

Excerpt is Chapter 3 of Measuring Market Risk, Second Edition, by Kevin Dowd.
This chapter provides a brief introduction and overview of the main issues in market risk measurement. Our main concerns are:

• Preliminary data issues: How to deal with data in profit/loss form, rate-of-return form, and so on.
• Basic methods of VaR estimation: How to estimate simple VaRs, and how VaR estimation depends on assumptions about data distributions.
• How to estimate coherent risk measures.
• How to gauge the precision of our risk measure estimators by estimating their standard errors.
• Overview: An overview of the different approaches to market risk measurement, and of how they fit together.

We begin with the data issues.

1.1 DATA

Profit/Loss Data

Our data can come in various forms. Perhaps the simplest is in terms of profit/loss (or P/L). The P/L generated by an asset (or portfolio) over the period t, P/L_t, can be defined as the value of the asset (or portfolio) at the end of t plus any interim payments D_t minus the asset value at the end of t − 1:

P/L_t = P_t + D_t − P_{t−1}    (1.1)

If data are in P/L form, positive values indicate profits and negative values indicate losses.

If we wish to be strictly correct, we should evaluate all payments from the same point of time (i.e., we should take account of the time value of money). We can do so in one of two ways. The first way is to take the present value of P/L_t evaluated at the end of the previous period, t − 1:

Present Value(P/L_t) = (P_t + D_t)/(1 + d) − P_{t−1}    (1.2)

where d is the discount rate and we assume for convenience that D_t is paid at the end of t. The alternative is to take the forward value of P/L_t evaluated at the end of period t:

Forward Value(P/L_t) = P_t + D_t − (1 + d)P_{t−1}    (1.3)

which involves compounding P_{t−1} by d. The differences between these values depend on the discount rate d, and will be small if the periods themselves are short. We will ignore these differences to simplify the discussion, but they can make a difference in practice when dealing with longer periods.

Loss/Profit Data

When estimating VaR and ES, it is sometimes more convenient to deal with data in loss/profit (L/P) form. L/P data are a simple transformation of P/L data:

L/P_t = −P/L_t    (1.4)

L/P observations assign a positive value to losses and a negative value to profits, and we will call these L/P data 'losses' for short. Dealing with losses is sometimes a little more convenient for risk measurement purposes because the risk measures are themselves denominated in loss terms.

Arithmetic Return Data

Data can also come in the form of arithmetic (or simple) returns. The arithmetic return r_t is defined as:

r_t = (P_t + D_t − P_{t−1})/P_{t−1} = (P_t + D_t)/P_{t−1} − 1    (1.5)

which is the same as the P/L over period t divided by the value of the asset at the end of t − 1.

In using arithmetic returns, we implicitly assume that the interim payment D_t does not earn any return of its own. However, this assumption will seldom be appropriate over long periods because interim income is usually reinvested. Hence, arithmetic returns should not be used when we are concerned with long horizons.

Geometric Return Data

Returns can also be expressed in geometric (or compound) form. The geometric return R_t is

R_t = ln((P_t + D_t)/P_{t−1})    (1.6)

The geometric return implicitly assumes that interim payments are continuously reinvested. The geometric return is often more economically meaningful than the arithmetic return, because it ensures that the asset price (or portfolio value) can never become negative regardless of how negative the returns might be. With arithmetic returns, on the other hand, a very low realized return (or a high loss) implies that the asset value P_t can become negative, and a negative asset price seldom makes economic sense.¹

1 This is mainly a point of principle rather than practice. In practice, any distribution we fit to returns is only likely to be an approximation, and many distributions are ill-suited to extreme returns anyway.
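The four data forms above are simple transformations of one another, and it may help to see them side by side in code. The following is a minimal Python sketch of Equations (1.1) and (1.4)-(1.6); the price and payment series and the variable names are illustrative, not from the text:

```python
import numpy as np

# Hypothetical asset values P_t and interim payments D_t over four periods.
P = np.array([100.0, 101.0, 99.5, 103.0])
D = np.array([0.0, 0.5, 0.0, 0.5])

pl = P[1:] + D[1:] - P[:-1]            # Equation (1.1): P/L_t
lp = -pl                               # Equation (1.4): L/P_t ('losses')
r = (P[1:] + D[1:]) / P[:-1] - 1       # Equation (1.5): arithmetic returns
R = np.log((P[1:] + D[1:]) / P[:-1])   # Equation (1.6): geometric returns

print(pl, lp, r, R, sep="\n")
```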
The geometric return is also more convenient. For example, if we are dealing with foreign currency positions, geometric returns will give us results that are independent of the reference currency. Similarly, if we are dealing with multiple periods, the geometric return over those periods is the sum of the one-period geometric returns. Arithmetic returns have neither of these convenient properties.

The relationship of the two types of return can be seen by rewriting Equation (1.6) (using a Taylor's series expansion for the natural log) as:

R_t = ln((P_t + D_t)/P_{t−1}) = ln(1 + r_t) = r_t − r_t²/2 + r_t³/3 − . . .    (1.7)

from which we can see that R_t ≈ r_t provided that returns are 'small'. This conclusion is illustrated by Figure 1.1, which plots the geometric return R_t against its arithmetic counterpart r_t. The difference between the two returns is negligible when both returns are small, but the difference grows as the returns get bigger, which is to be expected, as the geometric return is a log function of the arithmetic return. Since we would expect returns to be low over short periods and higher over longer periods, the difference between the two types of return is negligible over short periods but potentially substantial over longer ones. And since the geometric return takes account of earnings on interim income, and the arithmetic return does not, we should always use the geometric return if we are dealing with returns over longer periods.

Figure 1.1 Geometric and arithmetic returns.

Example 1.1 Arithmetic and Geometric Returns

If arithmetic returns r_t over some period are 0.05, Equation (1.7) tells us that the corresponding geometric returns are R_t = ln(1 + r_t) = ln(1.05) = 0.0488. Similarly, if geometric returns R_t are 0.05, Equation (1.7) implies that arithmetic returns are 1 + r_t = exp(R_t), so r_t = exp(R_t) − 1 = exp(0.05) − 1 = 0.0513. In both cases the arithmetic return is close to, but a little higher than, the geometric return, and this makes intuitive sense when one considers that the geometric return compounds at a faster rate.

1.2 ESTIMATING HISTORICAL SIMULATION VaR

The simplest way to estimate VaR is by means of historical simulation (HS). The HS approach estimates VaR by means of ordered loss observations.

Suppose we have 1000 loss observations and are interested in the VaR at the 95% confidence level. Since the confidence level implies a 5% tail, we know that there are 50 observations in the tail, and we can take the VaR to be the 51st highest loss observation.

We can estimate the VaR on a spreadsheet by ordering our data and reading off the 51st largest observation from the spreadsheet. We can also estimate it more directly by using the 'Large' command in Excel, which gives us the kth largest value in an array. Thus, if our data are an array called 'Loss_data', then our VaR is given by the Excel command 'Large(Loss_data,51)'. If we are using MATLAB, we first order the loss/profit data using the 'Sort()' command (i.e., by typing 'Loss_data = Sort(Loss_data)'), and then derive the VaR by typing in 'Loss_data(51)' at the command line.

More generally, if we have n observations, and our confidence level is α, we would want the (1 − α)n + 1th highest observation, and we would use the commands 'Large(Loss_data,(1 − alpha)*n + 1)' using Excel, or 'Loss_data((1 − alpha)*n + 1)' using MATLAB, provided in the latter case that our 'Loss_data' array is already sorted into ordered observations.²,³

2 In theory, the VaR is the quantile that demarcates the tail region from the non-tail region, where the size of the tail is determined by the confidence level, but with finite samples there is a certain level of arbitrariness in how the ordered observations relate to the VaR itself: that is, do we take the VaR to be the 50th observation, the 51st observation, or some combination of them? However, this is just an issue of approximation, and taking the VaR to be the 51st highest observation is not unreasonable.

3 We can also estimate HS VaR using percentile functions such as the 'Percentile' function in Excel or the 'prctile' function in MATLAB. However, such functions are less transparent (i.e., it is not obvious to the reader how the percentiles are calculated), and the Excel percentile function can be unreliable.
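The same order-statistics recipe is easy to reproduce outside a spreadsheet. A minimal Python sketch (the function name, array name and random seed are illustrative, not from the MMR Toolbox):

```python
import numpy as np

def hs_var(loss_data, alpha):
    """HS VaR as in Section 1.2: with n ordered losses, take the
    (1 - alpha)*n + 1 th highest observation."""
    ordered = np.sort(loss_data)[::-1]     # largest loss first
    k = int((1 - alpha) * len(loss_data))  # number of observations in the tail
    return ordered[k]                      # the (k + 1)th highest loss

rng = np.random.default_rng(1)
loss_data = rng.standard_normal(1000)      # 1000 hypothetical L/P observations
print(hs_var(loss_data, 0.95))             # the 51st highest loss; near 1.645
```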


An example of an HS VaR is given in Figure 1.2. This figure shows the histogram of 1000 hypothetical loss observations and the 95% VaR. The figure is generated using the 'hsvarfigure' command in the MMR Toolbox. The VaR is 1.704 and separates the top 5% from the bottom 95% of loss observations.

Figure 1.2 Historical simulation VaR.
Note: Based on 1000 random numbers drawn from a standard normal L/P distribution, and estimated with the 'hsvarfigure' function.

In practice, it is often helpful to obtain HS VaR estimates from a cumulative histogram, or empirical cumulative frequency function. This is a plot of the ordered loss observations against their empirical cumulative frequency (e.g., so if there are n observations in total, the empirical cumulative frequency of the ith such ordered observation is i/n). The empirical cumulative frequency function of our earlier data set is shown in Figure 1.3. The empirical frequency function makes it very easy to obtain the VaR: we simply move up the cumulative frequency axis to where the cumulative frequency equals our confidence level, draw a horizontal line along to the curve, and then draw a vertical line down to the x-axis, which gives us our VaR.

Figure 1.3 Historical simulation via an empirical cumulative frequency function.
Note: Based on the same data as Figure 1.2.

1.3 ESTIMATING PARAMETRIC VaR

We can also estimate VaR using parametric approaches, the distinguishing feature of which is that they require us to explicitly specify the statistical distribution from which our data observations are drawn. We can also think of parametric approaches as fitting curves through the data and then reading off the VaR from the fitted curve.

In making use of a parametric approach, we therefore need to take account of both the statistical distribution and the type of data to which it applies.

Estimating VaR with Normally Distributed Profits/Losses

Suppose that we wish to estimate VaR under the assumption that P/L is normally distributed. In this case our VaR at the confidence level α is:

αVaR = −μ_P/L + σ_P/L z_α    (1.8)

where z_α is the standard normal variate corresponding to α, and μ_P/L and σ_P/L are the mean and standard deviation of P/L. Thus, z_α is the value of the standard normal variate such that α of the probability density mass lies to its left, and 1 − α of the probability density mass lies to its right. For example, if our confidence level is 95%, z_α = z_0.95 will be 1.645.

In practice, μ_P/L and σ_P/L would be unknown, and we would have to estimate VaR based on estimates of these parameters. Our VaR estimate, αVaR^e, would then be:

αVaR^e = −m_P/L + s_P/L z_α    (1.9)

where m_P/L and s_P/L are estimates of the mean and standard deviation of P/L.
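Equations (1.8) and (1.9) are a one-liner given a normal quantile function. A short Python sketch using scipy's norm.ppf for z_α (the function name is ours; the numbers anticipate Example 1.2 below):

```python
from scipy.stats import norm

def normal_var(mu, sigma, alpha):
    """Equation (1.8): alpha-VaR for normally distributed P/L. With sample
    estimates m and s in place of mu and sigma, this is Equation (1.9)."""
    return -mu + sigma * norm.ppf(alpha)

print(normal_var(10, 20, 0.95))   # -10 + 20 * 1.645 = 22.9
print(normal_var(10, 20, 0.99))   # -10 + 20 * 2.326 = 36.5
```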

Figure 1.4 shows the 95% VaR for a normally distributed P/L with mean 0 and standard deviation 1. Since the data are in P/L form, the VaR is indicated by the negative of the cut-off point between the lower 5% and the upper 95% of P/L observations. The actual VaR is the negative of −1.645, and is therefore 1.645.

Figure 1.4 VaR with standard normally distributed profit/loss data.
Note: Obtained from Equation (1.9) with μ_P/L = 0 and σ_P/L = 1. Estimated with the 'normalvarfigure' function.

If we are working with normally distributed L/P data, then μ_L/P = −μ_P/L and σ_L/P = σ_P/L, and it immediately follows that:

αVaR = μ_L/P + σ_L/P z_α    (1.10a)
αVaR^e = m_L/P + s_L/P z_α    (1.10b)

Figure 1.5 illustrates the corresponding VaR. This figure gives the same information as Figure 1.4, but is a little more straightforward to interpret because the VaR is defined in units of losses (or 'lost money') rather than P/L. In this case, the VaR is given by the point on the x-axis that cuts off the top 5% of the pdf mass from the bottom 95% of pdf mass. If we prefer to work with the cumulative density function, the VaR is the x-value that corresponds to a cdf value of 95%. Either way, the VaR is again 1.645, as we would (hopefully) expect.

Figure 1.5 VaR with normally distributed loss/profit data.
Note: Obtained from Equation (1.10a) with μ_L/P = 0 and σ_L/P = 1.

Example 1.2 VaR with Normal P/L

If P/L over some period is normally distributed with mean 10 and standard deviation 20, then (by Equation (1.8)) the 95% VaR is −10 + 20z_0.95 = −10 + 20 × 1.645 = 22.9. The corresponding 99% VaR is −10 + 20z_0.99 = −10 + 20 × 2.326 = 36.52.

Estimating VaR with Normally Distributed Arithmetic Returns

We can also estimate VaR making assumptions about returns rather than P/L. Suppose then that we assume that arithmetic returns are normally distributed with mean μ_r and standard deviation σ_r. To derive the VaR, we begin by obtaining the critical value of r_t, r*, such that the probability that r_t exceeds r* is equal to our confidence level α. r* is therefore:

r* = μ_r − σ_r z_α    (1.11)

Since the actual return r_t is the loss/profit divided by the earlier asset value, P_{t−1}, it follows that:

r_t = P/L_t / P_{t−1}    (1.12)


Substituting r* for r_t then gives us the relationship between r* and the VaR:

r* = −αVaR/P_{t−1}    (1.13)

Substituting Equation (1.11) into Equation (1.13) and rearranging then gives us the VaR itself:

αVaR = −(μ_r − σ_r z_α)P_{t−1}    (1.14)

Equation (1.14) will give us equivalent answers to our earlier VaR equations. For example, if we set α = 0.95, μ_r = 0, σ_r = 1 and P_{t−1} = 1, which correspond to our earlier illustrative P/L and L/P parameter assumptions, αVaR is 1.645: the three approaches give the same results, because all three sets of underlying assumptions are equivalent.

Example 1.3 VaR with Normally Distributed Arithmetic Returns

Suppose arithmetic returns r_t over some period are distributed as normal with mean 0.1 and standard deviation 0.25, and we have a portfolio currently worth 1. Then (by Equation (1.14)) the 95% VaR is −0.1 + 0.25 × 1.645 = 0.311, and the 99% VaR is −0.1 + 0.25 × 2.326 = 0.482.

Estimating Lognormal VaR

Each of the previous approaches assigns a positive probability of the asset value, P_t, becoming negative, but we can avoid this drawback by working with geometric returns. Now assume that geometric returns are normally distributed with mean μ_R and standard deviation σ_R. If D_t is zero or reinvested continually in the asset itself (e.g., as with profits reinvested in a mutual fund), this assumption implies that the natural logarithm of P_t is normally distributed, or that P_t itself is lognormally distributed. The lognormal distribution is explained in Box 1.1, and a lognormal asset price is shown in Figure 1.6: observe that the price is always non-negative, and its distribution is skewed with a long right-hand tail.

BOX 1.1 THE LOGNORMAL DISTRIBUTION

A random variate X is said to be lognormally distributed if the natural log of X is normally distributed. The lognormal distribution can be specified in terms of the mean and standard deviation of ln X. Call these parameters μ and σ. The lognormal is often also represented in terms of m and σ, where m is the median of X, and m = exp(μ).

The pdf of X can be written:

f(x) = (1/(xσ√(2π))) exp(−(ln x − μ)²/(2σ²))

for x > 0. Thus, the lognormal pdf is only defined for positive values of x and is skewed to the right as in Figure 1.6.

Let ω = exp(σ²) for convenience. The mean and variance of the lognormal can be written as:

mean = m exp(σ²/2) and variance = m²ω(ω − 1)

Turning to higher moments, the skewness of the lognormal is

skewness = (ω + 2)(ω − 1)^{1/2}

and is always positive, which confirms the lognormal has a long right-hand tail. The kurtosis of the lognormal is

kurtosis = ω⁴ + 2ω³ + 3ω² − 3

and therefore varies from a minimum of (just over) 3 to a potentially large value depending on the value of σ.

Since the VaR is a loss, and since the loss is the difference between P_t (which is random) and P_{t−1} (which we can take here as given), then the VaR itself has the same distribution as P_t. Normally distributed geometric returns imply that the VaR is lognormally distributed.

If we proceed as we did earlier with the arithmetic return, we begin by deriving the critical value of R, R*, such that the probability that R > R* is α:

R* = μ_R − σ_R z_α    (1.15)

We then use the definition of the geometric return to unravel the critical value of P* (i.e., the value of P_t corresponding to a loss equal to our VaR), and thence infer our VaR:

R* = ln(P*/P_{t−1}) = ln P* − ln P_{t−1}
⇒ ln P* = R* + ln P_{t−1}
⇒ P* = P_{t−1} exp(R*) = P_{t−1} exp(μ_R − σ_R z_α)
⇒ αVaR = P_{t−1} − P* = P_{t−1}(1 − exp(μ_R − σ_R z_α))    (1.16)

This gives us the lognormal VaR, which is consistent with normally distributed geometric returns. The lognormal VaR is illustrated in Figure 1.7, based on the standardised (but typically unrealistic) assumptions that μ_R = 0, σ_R = 1, and P_{t−1} = 1. In this case, the VaR at the 95% confidence level is 0.807. The figure also shows that the distribution of L/P is a reflection of the distribution of P_t shown earlier in Figure 1.6.

Figure 1.6 A lognormally distributed asset price.
Note: Estimated using the 'lognpdf' function in the Statistics Toolbox.

Figure 1.7 Lognormal VaR.
Note: Estimated assuming the mean and standard deviation of geometric returns are 0 and 1, and for an initial investment of 1. The figure is produced using the 'lognormalvarfigure' function.
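Equation (1.16) is equally direct in code. A minimal Python sketch (function and argument names are illustrative) whose outputs anticipate the numbers in Examples 1.4 and 1.5 below:

```python
import numpy as np
from scipy.stats import norm

def lognormal_var(mu_R, sigma_R, alpha, p_prev=1.0):
    """Equation (1.16): VaR when geometric returns are N(mu_R, sigma_R^2)."""
    return p_prev * (1.0 - np.exp(mu_R - sigma_R * norm.ppf(alpha)))

print(lognormal_var(0.05, 0.20, 0.95))   # 0.244, as in Example 1.4

# Example 1.5's daily parameters: normal and lognormal VaRs nearly coincide.
mu_d, sd_d = 0.10 / 250, 0.40 / np.sqrt(250)
print(-mu_d + sd_d * norm.ppf(0.95))     # normal VaR, about 0.0412
print(lognormal_var(mu_d, sd_d, 0.95))   # lognormal VaR, about 0.0404
```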
Example 1.4 Lognormal VaR

Suppose that geometric returns R_t over some period are distributed as normal with mean 0.05, standard deviation 0.20, and we have a portfolio currently worth 1. Then (by Equation (1.16)) the 95% VaR is 1 − exp(0.05 − 0.20 × 1.645) = 0.244. The corresponding 99% VaR is 1 − exp(0.05 − 0.20 × 2.326) = 0.340. Observe that these VaRs are quite close to those obtained in Example 1.3, where the arithmetic return parameters were the same as the geometric return parameters assumed here.

Example 1.5 Lognormal VaR vs Normal VaR

Suppose that we make the empirically not too unrealistic assumptions that the mean and volatility of annualised returns are 0.10 and 0.40. We are interested in the 95% VaR at the 1-day holding period for a portfolio worth USD 1. Assuming 250 trading days to a year, the daily return has a mean 0.1/250 = 0.00040 and standard deviation 0.40/√250 = 0.0253. The normal 95% VaR is −0.0004 + 0.0253 × 1.645 = 0.0412. If we assume a lognormal, then the 95% VaR is 1 − exp(0.0004 − 0.0253 × 1.645) = 0.0404. The normal VaR is 4.12% and the lognormal VaR is 4.04% of the value of the portfolio. This illustrates that normal and lognormal VaRs are much the same if we are dealing with short holding periods and realistic return parameters.

1.4 ESTIMATING COHERENT RISK MEASURES

Estimating Expected Shortfall

We turn now to the estimation of coherent risk measures, and the easiest of these to estimate is the expected shortfall (ES). The ES is the probability-weighted average of tail losses, and a normal ES is illustrated in Figure 1.8. In this case, the 95% ES is 2.063, corresponding to our earlier normal 95% VaR of 1.645.

The fact that the ES is a probability-weighted average of tail losses suggests that we can estimate ES as an average of 'tail VaRs'.⁴ The easiest way to implement this approach is to slice the tail into a large number n of slices, each of which has the same probability mass, estimate the VaR associated with each slice, and take the ES as the average of these VaRs.

To illustrate the method, suppose we wish to estimate a 95% ES on the assumption that losses are normally distributed with mean 0 and standard deviation 1. In practice, we would use a high value of n and carry out the calculations on a spreadsheet or using appropriate software.

4 The obvious alternative is to seek a 'closed-form' solution, which we could use to estimate the ES, but ES formulas seem to be known only for a limited number of parametric distributions (e.g., elliptical, including normal, and generalised Pareto distributions), whereas the 'average-tail-VaR' method is easy to implement and can be applied to any 'well-behaved' ESs that we might encounter, parametric or otherwise.


However, to show the procedure manually, let us work with a very small n value of 10. This value gives us 9 (i.e., n − 1) tail VaRs, or VaRs at confidence levels in excess of 95%. These VaRs are shown in Table 1.1, and vary from 1.6954 (for the 95.5% VaR) to 2.5758 (for the 99.5% VaR). Our estimated ES is the average of these VaRs, which is 2.0250.

Table 1.1 Estimating ES as a Weighted Average of Tail VaRs

Confidence Level    Tail VaR
95.5%               1.6954
96.0%               1.7507
96.5%               1.8119
97.0%               1.8808
97.5%               1.9600
98.0%               2.0537
98.5%               2.1701
99.0%               2.3263
99.5%               2.5758
Average of tail VaRs: 2.0250

Note: VaRs estimated assuming the mean and standard deviation of losses are 0 and 1, using the 'normalvar' function in the MMR Toolbox.

Of course, in using this method for practical purposes, we would want a value of n large enough to give accurate results. To give some idea of what this might be, Table 1.2 reports some alternative ES estimates obtained using this procedure with varying values of n. These results show that the estimated ES rises with n, and gradually converges to the true value of 2.063. These results also show that our ES estimation procedure seems to be reasonably accurate even for quite small values of n. Any decent computer should therefore be able to produce accurate ES estimates quickly in real time.

Table 1.2 ES Estimates as a Function of the Number of Tail Slices

Number of Tail Slices (n)    ES
10                           2.0250
25                           2.0433
50                           2.0513
100                          2.0562
250                          2.0597
500                          2.0610
1000                         2.0618
2500                         2.0623
5000                         2.0625
10 000                       2.0626
True value                   2.0630

Note: VaRs estimated assuming the mean and standard deviation of losses are 0 and 1.

Figure 1.8 Expected shortfall for standard normal losses.
Note: Estimated with the mean and standard deviation of P/L equal to 0 and 1 respectively, using the 'normalesfigure' function.

Estimating Coherent Risk Measures

Other coherent risk measures can be estimated using modifications of this 'average VaR' method. Recall that a coherent risk measure M_φ is a weighted average of the quantiles (denoted by q_p) of our loss distribution:

M_φ = ∫₀¹ φ(p) q_p dp    (1.17)

where the weighting function or risk-aversion function φ(p) is specified by the user. The ES gives all tail-loss quantiles an equal weight, and other quantiles a weight of 0. Thus the ES is a special case of M_φ obtained by setting φ(p) to the following:

φ(p) = 0 if p < α;  φ(p) = 1/(1 − α) if p ≥ α    (1.18)
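The 'average tail VaR' slicing behind Tables 1.1 and 1.2 takes only a few lines. A Python sketch under the chapter's standard normal assumption; the slicing convention follows the n = 10 worked example above:

```python
import numpy as np
from scipy.stats import norm

def normal_es(alpha, n_slices):
    """ES as the average of n_slices - 1 tail VaRs, at confidence levels
    alpha + (1 - alpha) * i / n_slices for i = 1, ..., n_slices - 1."""
    i = np.arange(1, n_slices)
    levels = alpha + (1 - alpha) * i / n_slices
    return norm.ppf(levels).mean()

print(normal_es(0.95, 10))      # 2.0250, the Table 1.1 average
print(normal_es(0.95, 5000))    # approaches the true value 2.063
```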
The more general coherent risk measure M_φ involves a potentially more sophisticated weighting function φ(p). We can therefore estimate any of these measures by replacing the equal weights in the 'average VaR' algorithm with the φ(p) weights appropriate to the risk measure being estimated.

To show how this might be done, suppose we have the exponential weighting function:

φ(p) = e^{−(1−p)/γ} / (γ(1 − e^{−1/γ}))    (1.19)

and we believe that we can represent the degree of our risk-aversion by setting γ = 0.05. To illustrate the procedure manually, we continue to assume that losses are standard normally distributed and we set n = 10 (i.e., we divide the complete losses density function into 10 equal-probability slices). With n = 10, we have n − 1 = 9 loss quantiles or VaRs spanning confidence levels from 0.1 to 0.9. These VaRs are shown in the second column of Table 1.3, and vary from −1.2816 (for the 10% VaR) to 1.2816 (for the 90% VaR). The third column shows the φ(p) weights corresponding to each confidence level, and the fourth column shows the products of each VaR and corresponding weight. Our estimated exponential spectral risk measure is the φ(p)-weighted average of the VaRs, and is therefore equal to 0.4228.

Table 1.3 Estimating Exponential Spectral Risk Measure as a Weighted Average of VaRs

Confidence Level (α)    αVaR       Weight φ(α)    φ(α) × αVaR
10%                     −1.2816    0              0.0000
20%                     −0.8416    0              0.0000
30%                     −0.5244    0              0.0000
40%                     −0.2533    0.0001         0.0000
50%                     0          0.0009         0.0000
60%                     0.2533     0.0067         0.0017
70%                     0.5244     0.0496         0.0260
80%                     0.8416     0.3663         0.3083
90%                     1.2816     2.7067         3.4689
Risk measure = mean(φ(α) × αVaR) = 0.4228

Note: VaRs estimated assuming the mean and standard deviation of losses are 0 and 1, using the 'normalvar' function in the MMR Toolbox. The weights φ(α) are given by the exponential function (Equation (1.19)) with γ = 0.05.

As when estimating the ES earlier, when using this type of routine in practice we would want a value of n large enough to give accurate results. Table 1.4 reports some alternative estimates obtained using this procedure with increasing values of n. These results show that the estimated risk measure rises with n, and gradually converges to a value in the region of about 1.854. The estimates in this table indicate that we may need a considerably larger value of n than we did earlier to get results of the same level of accuracy. Even so, a good computer should still be able to produce accurate estimates of spectral risk measures fairly quickly.

Table 1.4 Estimates of Exponential Spectral Coherent Risk Measure as a Function of the Number of Tail Slices

Number of Tail Slices    Estimate of Exponential Spectral Risk Measure
10                       0.4227
50                       1.3739
100                      1.5853
250                      1.7338
500                      1.7896
1000                     1.8197
2500                     1.8392
5000                     1.8461
10 000                   1.8498
50 000                   1.8529
100 000                  1.8533
500 000                  1.8536

Note: VaRs estimated assuming the mean and standard deviation of losses are 0 and 1, using the 'normalvar' function in the MMR Toolbox. The weights φ(α) are given by the exponential function (Equation (1.19)) with γ = 0.05.

When estimating ES or more general coherent risk measures in practice, it also helps to have some guidance on how to choose the value of n. Granted that the estimate does eventually converge to the true value as n gets large, one useful approach is to start with some small value of n, and then double n repeatedly until we feel the estimates have settled down sufficiently. Each time we do so, we halve the width of the discrete slices, and we can monitor how this 'halving' process affects our estimates. This suggests that we look at the 'halving error' e_n given by:

e_n = M(n) − M(n/2)    (1.20)

where M(n) is our estimated risk measure based on n slices. We stop doubling n when e_n falls below some tolerance level that indicates an acceptable level of accuracy.
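The spectral estimate and the halving-error stopping rule of Equation (1.20) can be combined in one short sketch. This is an illustrative Python version, not the MMR Toolbox routine, and the tolerance value is an arbitrary choice for demonstration:

```python
import numpy as np
from scipy.stats import norm

def exp_spectral(gamma, n):
    """phi(p)-weighted average of the n - 1 quantiles at p = 1/n, ..., (n-1)/n,
    with phi(p) as in Equation (1.19)."""
    p = np.arange(1, n) / n
    phi = np.exp(-(1 - p) / gamma) / (gamma * (1 - np.exp(-1 / gamma)))
    return np.mean(phi * norm.ppf(p))

# Double n until the halving error of Equation (1.20) is below tolerance.
n, tol = 100, 0.001
while abs(exp_spectral(0.05, 2 * n) - exp_spectral(0.05, n)) >= tol:
    n *= 2
print(2 * n, exp_spectral(0.05, 2 * n))   # settles near 1.854, cf. Table 1.5
```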


The process is shown in Table 1.5. Starting from an arbitrary value of 100, we repeatedly double n (so it becomes 200, 400, 800, etc.). As we do so, the estimated risk measure gradually converges, and the halving error gradually falls. So, for example, for n = 6400, the estimated risk measure is 1.8477, and the halving error is 0.0055. If we double n to 12,800, the estimated risk measure becomes 1.8506, and the halving error falls to 0.0029, and so on.

Table 1.5 Estimated Risk Measures and Halving Errors

Number of Tail Slices    Estimated Spectral Risk Measure    Halving Error
100                      1.5853                             0.2114
200                      1.7074                             0.1221
400                      1.7751                             0.0678
800                      1.8120                             0.0368
1600                     1.8317                             0.0197
3200                     1.8422                             0.0105
6400                     1.8477                             0.0055
12 800                   1.8506                             0.0029
25 600                   1.8521                             0.0015
51 200                   1.8529                             0.0008

Note: VaRs estimated assuming the mean and standard deviation of losses are 0 and 1, using the 'normalvar' function in the MMR Toolbox. The weights φ(α) are given by the exponential function (Equation (1.19)) with γ = 0.05.

However, this 'weighted average quantile' procedure is rather crude, and (bearing in mind that the risk measure (Equation (1.17)) involves an integral) we can in principle expect to get substantial improvements in accuracy if we resorted to more 'respectable' numerical integration or quadrature methods. This said, the crude 'weighted average quantile' method actually seems to perform well for spectral exponential risk measures when compared against some of these alternatives, so one is not necessarily better off with the more sophisticated methods.⁵

Thus, the key to estimating any coherent risk measure is to be able to estimate quantiles or VaRs: the coherent risk measures can then be obtained as appropriately weighted averages of quantiles. From a practical point of view, this is extremely helpful, as all the building blocks that go into quantile or VaR estimation (databases, calculation routines, etc.) are exactly what we need for the estimation of coherent risk measures as well. If an institution already has a VaR engine, then that engine needs only small adjustments to produce estimates of coherent risk measures: indeed, in many cases, all that needs changing is the last few lines of code in a long data processing system. The costs of switching from VaR to more sophisticated risk measures are therefore very low.

5 There is an interesting reason for this: the spectral weights give the highest loss the highest weight, whereas quadrature methods such as the trapezoidal and Simpson's rules involve algorithms in which the two most extreme quantiles have their weights specifically cut, and this undermines the accuracy of the algorithm relative to the crude approach. However, there are ways round these sorts of problems, and in principle versions of the sophisticated approaches should give better results.

1.5 ESTIMATING THE STANDARD ERRORS OF RISK MEASURE ESTIMATORS

We should always bear in mind that any risk measure estimates that we produce are just that: estimates. We never know the true value of any risk measure, and an estimate is only as good as its precision: if a risk measure is very imprecisely estimated, then the estimator is virtually worthless, because its imprecision tells us that the true value could be almost anything; on the other hand, if we know that an estimator is fairly precise, we can be confident that the true value is fairly close to the estimate, and the estimator has some value. Hence, we should always seek to supplement any risk estimates we produce with some indicator of their precision. This is a fundamental principle of good risk measurement practice.

We can evaluate the precision of estimators of risk measures by means of their standard errors, or (generally better) by producing confidence intervals for them. In this chapter we focus on the more basic indicator, the standard error of a risk measure estimator.

Standard Errors of Quantile Estimators

We first consider the standard errors of quantile (or VaR) estimators. Following Kendall and Stuart,⁶ suppose we have a distribution (or cumulative density) function F(x), which might be a parametric distribution function or an empirical distribution function (i.e., a cumulative histogram) estimated from real data.

6 Kendall and Stuart (1972), pp. 251-252.
Its corresponding density or relative-frequency function is f(x). Suppose also that we have a sample of size n, and we select a bin width h. Let dF be the probability that (k − 1) observations fall below some value q − h/2, that one observation falls in the range q ± h/2, and that (n − k) observations are greater than q + h/2. dF is proportional to

{F(q)}^{k−1} f(q) dq {1 − F(q)}^{n−k}    (1.21)

This gives us the frequency function for the quantile q not exceeded by a proportion k/n of our sample, i.e., the 100(k/n)th percentile.

If this proportion is p, Kendall and Stuart show that Equation (1.21) is approximately equal to p^{np}(1 − p)^{n(1−p)} for large values of n. If ε is a very small increment to p, then

p^{np}(1 − p)^{n(1−p)} ≈ (p + ε)^{np}(1 − p − ε)^{n(1−p)}    (1.22)

Taking logs and expanding, Equation (1.22) is itself approximately

(p + ε)^{np}(1 − p − ε)^{n(1−p)} ≈ p^{np}(1 − p)^{n(1−p)} exp(−nε²/(2p(1 − p)))    (1.23)

which implies that the distribution function dF is approximately proportional to

exp(−nε²/(2p(1 − p)))    (1.24)

Integrating this out,

dF ≈ (1/(√(2π) √(p(1 − p)/n))) exp(−nε²/(2p(1 − p))) dε    (1.25)

which tells us that ε is normally distributed in the limit with variance p(1 − p)/n. However, we know that dε = dF(q) = f(q)dq, so the variance of q is

var(q) ≈ p(1 − p)/(n[f(q)]²)    (1.26)

This gives us an approximate expression for the variance, and hence its square root, the standard error, of a quantile estimator q.

This expression shows that the quantile standard error depends on p, the sample size n and the pdf value f(q). The way in which the (normal) quantile standard errors depend on these parameters is apparent from Figure 1.9. This shows that:

• The standard error falls as the sample size n rises.
• The standard error rises as the probabilities become more extreme and we move further into the tail: hence, the more extreme the quantile, the less precise its estimator.

In addition, the quantile standard error depends on the probability density function f(·), so the choice of density function can make a difference to our estimates, and also on the bin width h, which is essentially arbitrary.

Figure 1.9 Quantile standard errors.
Note: Based on random samples of size n drawn from a standard normal distribution. The bin width h is set to 0.1.

The standard error can be used to construct confidence intervals around our quantile estimates in the usual textbook way. For example, a 90% confidence interval for a quantile q is given by

[q − 1.645 se(q), q + 1.645 se(q)]
= [q − 1.645 √(p(1 − p)/n)/f(q), q + 1.645 √(p(1 − p)/n)/f(q)]    (1.27)

Example 1.6 Obtaining VaR Confidence Intervals Using Quantile Standard Errors

Suppose we wish to estimate the 90% confidence interval for a 95% VaR estimated on a sample of size n = 1000 to be drawn from a standard normal distribution, based on an assumed bin width h = 0.1.

We know that the 95% VaR of a standard normal is 1.645. We take this to be q in Equation (1.27), and we know that q falls in the bin spanning 1.645 ± 0.1/2 = [1.595, 1.695]. The probability of a loss exceeding 1.695 is 0.0450, and this is also equal to p, and the probability of profit or a loss less than 1.595 is 0.9446. Hence f(q), the probability mass in the q range, is 1 − 0.0450 − 0.9446 = 0.0104. We now plug the


relevant values into Equation (1.27) to obtain the 90% confidence interval for the VaR:

[1.645 − 1.645 × √(0.045(1 − 0.045)/1000)/0.0104, 1.645 + 1.645 × √(0.045(1 − 0.045)/1000)/0.0104]
= [0.6081, 2.6819]

This is a wide confidence interval, especially when compared to the OS (order statistics) and bootstrap confidence intervals.

The confidence interval narrows if we take a wider bin width, so suppose that we now repeat the exercise using a bin width h = 0.2, which is probably as wide as we can reasonably go with these data. q now falls into the range 1.645 ± 0.2/2 = [1.545, 1.745]. p, the probability of a loss exceeding 1.745, is 0.0405, and the probability of profit or a loss less than 1.545 is 0.9388. Hence f(q) = 1 − 0.0405 − 0.9388 = 0.0207. Plugging these values into Equation (1.27) now gives us a new estimate of the 90% confidence interval:

[1.645 − 1.645 × √(0.0405(1 − 0.0405)/1000)/0.0207, 1.645 + 1.645 × √(0.0405(1 − 0.0405)/1000)/0.0207]
= [1.1496, 2.1404]

This is still a rather wide confidence interval.

This example illustrates that although we can use quantile standard errors to estimate VaR confidence intervals, the intervals can be wide and also sensitive to the arbitrary choice of bin width.

The quantile-standard-error approach is easy to implement and has some plausibility with large sample sizes. However, it also has weaknesses relative to other methods of assessing the precision of quantile (or VaR) estimators: it relies on asymptotic theory and requires large sample sizes; it can produce imprecise estimators, or wide confidence intervals; it depends on the arbitrary choice of bin width; and the symmetric confidence intervals it produces are misleading for extreme quantiles, whose 'true' confidence intervals are asymmetric, reflecting the increasing sparsity of extreme observations as we move further out into the tail.

Standard Errors in Estimators of Coherent Risk Measures

We now consider standard errors in estimators of coherent risk measures. One of the first studies to examine this issue (Yamai and Yoshiba (2001b)) did so by investigating the relative accuracy of VaR and ES estimators for Levy distributions with varying α stability parameters. Their results suggested that VaR and ES estimators had comparable standard errors for near-normal Levy distributions, but the ES estimators had much bigger standard errors for particularly heavy-tailed distributions. They explained this finding by saying that as tails became heavier, ES estimators became more prone to the effects of large but infrequent losses. This finding suggests the depressing conclusion that the presence of heavy tails might make ES estimators in general less accurate than VaR estimators.

Fortunately, there are grounds to think that such a conclusion might be overly pessimistic. For example, Inui and Kijima (2003) present alternative results showing that the application of a Richardson extrapolation method can produce ES estimators that are both unbiased and have comparable standard errors to VaR estimators.⁷ Acerbi (2004) also looked at this issue and, although he confirmed that tail heaviness did increase the standard errors of ES estimators relative to VaR estimators, he concluded that the relative accuracies of VaR and ES estimators were roughly comparable in empirically realistic ranges.

However, the standard error of any estimator of a coherent risk measure will vary from one situation to another, and the best practical advice is to get into the habit of always estimating the standard error whenever one estimates the risk measure itself.

Estimating the standard error of an estimator of a coherent risk measure is also relatively straightforward. One way to do so starts from recognition that a coherent risk measure is an L-estimator (i.e., a weighted average of order statistics), and L-estimators are asymptotically normal. If we take N discrete points in the density function, then as N gets large the variance of the estimator of the coherent risk measure (Equation (1.17)) is approximately

σ²(M_φ) → (1/N) ∬_{p<q} φ(p)φ(q) p(1 − q) / [f(F⁻¹(p)) f(F⁻¹(q))] dp dq
        = (1/N) ∬_{x<y} φ(F(x)) φ(F(y)) F(x)(1 − F(y)) dx dy    (1.28)

and this can be computed numerically using a suitable numerical integration procedure. Where the risk measure is the ES, the standard error becomes

σ²(ES) → (1/(N(1 − α)²)) ∫₀^{F⁻¹(α)} ∫₀^{F⁻¹(α)} [min(F(x), F(y)) − F(x)F(y)] dx dy    (1.29)

and used in conjunction with a suitable numerical integration method, this gives good estimates even for relatively low values of N.⁸

7 See Inui and Kijima (2003).
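Before moving on, Example 1.6's intervals are easy to verify numerically. A Python sketch of Equations (1.26)-(1.27) under the same standard normal and bin-width assumptions (the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def quantile_ci(q, n, h, z=1.645):
    """90% confidence interval for a quantile q via Equations (1.26)-(1.27),
    approximating f(q) by the probability mass in a bin of width h."""
    p = 1 - norm.cdf(q + h / 2)                      # tail probability beyond the bin
    f_q = norm.cdf(q + h / 2) - norm.cdf(q - h / 2)  # probability mass in the q bin
    se = np.sqrt(p * (1 - p) / n) / f_q              # standard error, Equation (1.26)
    return q - z * se, q + z * se                    # interval, Equation (1.27)

print(quantile_ci(1.645, 1000, 0.1))   # roughly [0.61, 2.68], as in Example 1.6
print(quantile_ci(1.645, 1000, 0.2))   # roughly [1.15, 2.14]
```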
If we wish to obtain confidence intervals for our risk measure estimators, we can make use of the asymptotic normality of these estimators to apply textbook formulas (e.g., such as Equation (1.27)) based on the estimated standard errors and centred around a 'good' best estimate of the risk measure.

An alternative approach to the estimation of standard errors for estimators of coherent risk measures is to apply a bootstrap: we bootstrap a large number of estimators from the given distribution function (which might be parametric or non-parametric, e.g., historical); and we estimate the standard error of the sample of bootstrapped estimators. Even better, we can also use a bootstrapped sample of estimators to estimate a confidence interval for our risk measure.

8 See Acerbi (2004, pp. 200-201).
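The bootstrap just described takes only a few lines: resample the data with replacement, re-estimate the risk measure on each resample, and take the standard deviation (and percentiles) of the resulting estimates. A sketch, reusing the illustrative hs_var function from Section 1.2 (any other risk-measure estimator could be passed in instead):

```python
import numpy as np

def bootstrap_se(loss_data, estimator, n_boot=1000, seed=0):
    """Bootstrap the standard error and a 90% confidence interval of a
    risk-measure estimator, as described above."""
    rng = np.random.default_rng(seed)
    estimates = np.array([
        estimator(rng.choice(loss_data, size=len(loss_data), replace=True))
        for _ in range(n_boot)
    ])
    return estimates.std(ddof=1), np.percentile(estimates, [5, 95])

# e.g.: se, ci = bootstrap_se(loss_data, lambda x: hs_var(x, 0.95))
```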

1.6 THE CORE ISSUES: AN OVERVIEW

Before proceeding to more detailed issues, it might be helpful to pause for a moment to take an overview of the structure, as it were, of the subject matter itself. This is very useful, as it gives the reader a mental frame of reference within which the 'detailed' material that follows can be placed. Essentially, there are three core issues, and all the material that follows can be related to these. They also have a natural sequence, so we can think of them as providing a roadmap that leads us to where we want to be.

Which risk measure? The first and most important is to choose the type of risk measure: do we want to estimate VaR, ES, etc.? This is logically the first issue, because we need to know what we are trying to estimate before we start thinking about how we are going to estimate it.

Which level? The second issue is the level of analysis. Do we wish to estimate our risk measure at the level of the portfolio as a whole or at the level of the individual positions in it? The former would involve us taking the portfolio as our basic unit of analysis (i.e., we take the portfolio to have a specified composition, which is taken as given for the purposes of our analysis), and this will lead to a univariate stochastic analysis. The alternative is to work from the position level, and this has the advantage of allowing us to accommodate changes in the portfolio composition within the analysis itself. The disadvantage is that we then need a multivariate stochastic framework, and this is considerably more difficult to handle: we have to get to grips with the problems posed by variance-covariance matrices, copulas, and so on, all of which are avoided if we work at the portfolio level. There is thus a trade-off: working at the portfolio level is more limiting, but easier, while working at the position level gives us much more flexibility, but can involve much more work.

Which method? Having chosen our risk measure and level of analysis, we then choose a suitable estimation method. To decide on this, we would usually think in terms of the classic 'VaR trinity':

• Non-parametric methods
• Parametric methods
• Monte Carlo simulation methods

Each of these involves some complex issues.

1.7 APPENDIX

Preliminary Data Analysis

When confronted with a new data set, we should never proceed straight to estimation without some preliminary analysis to get to know our data. Preliminary data analysis is useful because it gives us a feel for our data, and because it can highlight problems with our data set. Remember that we never really know where our data come from, so we should always be a little wary of any new data set, regardless of how reputable the source might appear to be. For example, how do you know that a clerk hasn't made a mistake somewhere along the line in copying the data and, say, put a decimal point in the wrong place? The answer is that you don't, and never can. Even the most reputable data providers provide data with errors in them, however careful they are. Everyone who has ever done any empirical work will have encountered such problems at some time or other: the bottom line is that real data must always be viewed with a certain amount of suspicion.

Such preliminary analysis should consist of at least the first two and preferably all three of the following steps:

• The first and by far the most important step is to eyeball the data to see if they 'look right', or, more to the point, we should eyeball the data to see if anything looks wrong. Does the pattern of observations look right? Do any observations stand out as questionable? And so on. The interocular trauma test is the most important test ever invented and also the easiest to carry out, and we should always perform it on any new data set.

• We should plot our data on a histogram and estimate the relevant summary statistics (i.e., mean, standard deviation, skewness, kurtosis, etc.). In risk measurement, we are particularly interested in any non-normal features of our data: skewness, excess kurtosis, outliers in our data, and the like.


• Having done this initial analysis, we should consider what kind of distribution might fit our data, and there are a number of useful diagnostic tools available for this purpose, the most popular of which are QQ plots—plots of empirical quantiles against their theoretical equivalents.

Plotting the Data and Evaluating Summary Statistics

To get to know our data, we should first obtain their histogram and see what might stand out. Do the data look normal, or non-normal? Do they show one pronounced peak, or more than one? Do they seem to be skewed? Do they have fat tails or thin tails? Are there outliers? And so on.

As an example, Figure 1.10 shows a histogram of 100 random observations. In practice, we would usually wish to work with considerably longer data sets, but a data set this small helps to highlight the uncertainties one often encounters in practice. These observations show a dominant peak in the centre, which suggests that they are probably drawn from a unimodal distribution. On the other hand, there may be a negative skew, and there are some large outlying observations on the extreme left of the distribution, which might indicate fat tails on at least the left-hand side. In fact, these particular observations are drawn from a Student-t distribution with 5 degrees of freedom, so in this case we know that the underlying true distribution is unimodal, symmetric and heavy tailed. However, we would not know this in a situation with 'real' data, and it is precisely because we do not know the distributions of real-world data sets that preliminary analysis is so important.

Figure 1.10 A histogram.
Note: Data are 100 observations randomly drawn from a Student-t with 5 degrees of freedom.

Some summary statistics for this data set are shown in Table 1.6. The sample mean (-0.099) and the sample mode (-0.030) differ somewhat, but this difference is small relative to the sample standard deviation (1.363). However, the sample skew (-0.503) is somewhat negative and the sample kurtosis (3.985) is a little bigger than normal. The sample minimum (-4.660) and the sample maximum (3.010) are also not symmetric about the sample mean or mode, which is further evidence of asymmetry. If we encountered these results with 'real' data, we would be concerned about possible skewness and kurtosis. However, in this hypothetical case we know that the sample skewness is merely a product of sample variation, because we happen to know that the data are drawn from a symmetric distribution.

Depending on the context, we might also seriously consider carrying out some formal tests. For example, we might test whether the sample parameters (mean, standard deviation, etc.) are consistent with what we might expect under a null hypothesis (e.g., normality).

The underlying principle is very simple: since we never know the true distribution in practice, all we ever have to work with are estimates based on the sample at hand; it therefore behoves us to make the best use of the data we have, and to extract as much information as possible from them.
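A summary-statistics pass of this kind takes only a few lines of Python. The sketch below (an illustrative addition) draws a fresh Student-t(5) sample, so its numbers will resemble, but not reproduce, those in Table 1.6; scipy's kurtosis is requested with fisher=False so that, as in the table, a normal distribution has kurtosis 3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = stats.t.rvs(df=5, size=100, random_state=rng)   # 100 Student-t(5) draws
print("mean", x.mean(), "std", x.std(ddof=1))
print("skew", stats.skew(x), "kurtosis", stats.kurtosis(x, fisher=False))
print("min", x.min(), "max", x.max())
```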
Table 1.6 Summary Statistics

Parameter                 Value
Mean                     -0.099
Mode                     -0.030
Standard deviation        1.363
Skewness                 -0.503
Kurtosis                  3.985
Minimum                  -4.660
Maximum                   3.010
Number of observations      100

Note: Data are the same observations shown in Figure 1.10.

QQ Plots

Having done our initial analysis, it is often good practice to ask what distribution might fit our data, and a very useful device for identifying the distribution of our data is a quantile-quantile or QQ plot—a plot of the quantiles of the empirical distribution against those of some specified distribution. The shape of the QQ plot tells us a lot about how the empirical distribution compares to the specified one. In particular, if the QQ plot is linear, then the specified distribution fits the data, and we have identified the distribution to which our data belong. In addition, departures of the QQ plot from linearity in the tails can tell us whether the tails of our empirical distribution are fatter, or thinner, than the tails of the reference distribution to which it is being compared.
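Plots of this kind are easy to produce with standard tools. The minimal sketch below (an illustrative addition) uses scipy's probplot to plot a heavy-tailed Student-t(5) sample against a normal reference; the seed and sample size are arbitrary.

```python
import matplotlib.pyplot as plt
from scipy import stats

x = stats.t.rvs(df=5, size=500, random_state=42)   # heavy-tailed sample
stats.probplot(x, dist="norm", plot=plt)           # QQ plot vs. a standard normal
plt.title("QQ plot: t sample against normal reference")
plt.show()
```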
To illustrate, Figure 1.11 shows a QQ plot for a data sample drawn from a normal distribution, compared to a reference distribution that is also normal. The QQ plot is obviously close to linear: the central mass observations fit a linear QQ plot very closely, and the extreme tail observations somewhat less so. However, there is no denying that the overall plot is approximately linear. Figure 1.11 is a classic example of a QQ plot in which the empirical distribution matches the reference population.

By contrast, Figure 1.12 shows a good example of a QQ plot where the empirical distribution does not match the reference population. In this case, the data are drawn from a Student-t with 5 degrees of freedom, but the reference distribution is standard normal. The QQ plot is now clearly non-linear: although the central mass observations are close to linear, the tails show steeper slopes, indicative of the tails being heavier than those of the reference distribution.

Figure 1.12 QQ plot: t sample against normal reference distribution.
Note: The empirical sample is a random sample of 500 observations drawn from Student-t with 5 degrees of freedom. The reference distribution is standard normal.

A QQ plot is useful in a number of ways. First, as noted already, if the data are drawn from the reference population, then the QQ plot should be linear. This remains true if the data are drawn from some linear transformation of the reference distribution (i.e., are drawn from the same distribution but with different location and scale parameters). We can therefore use a QQ plot to form a tentative view of the distribution from which our data might be drawn: we specify a variety of alternative distributions, and construct QQ plots for each. Any reference distributions that produce non-linear QQ plots can then be dismissed, and any distribution that produces a linear QQ plot is a good candidate distribution for our data.

Figure 1.11 QQ plot: normal sample against normal reference distribution.
Note: The empirical sample is a random sample of 500 observations drawn from a standard normal. The reference distribution is standard normal.



Second, because a linear transformation in one of the distributions in a QQ plot merely changes the intercept and slope of the QQ plot, we can use the intercept and slope of a linear QQ plot to get a rough idea of the location and scale parameters of our sample data. For example, the reference distribution in Figure 1.11 is a standard normal. The linearity of the QQ plot in this figure suggests that the data are normal, as mentioned already, but Figure 1.11 also shows that the intercept and slope are approximately 0 and 1 respectively, and this indicates that the data are drawn from a standard normal, and not just any normal. Such rough approximations give us a helpful yardstick against which we can judge more 'sophisticated' estimates of location and scale, and also provide useful initial values for iterative algorithms.

Third, if the empirical distribution has heavier tails than the reference distribution, the QQ plot will have steeper slopes at its tails, even if the central mass of the empirical observations is approximately linear. Figure 1.12 is a good case in point. A QQ plot where the tails have slopes different from the central mass is therefore suggestive of the empirical distribution having heavier, or thinner, tails than the reference distribution.

Finally, a QQ plot is good for identifying outliers (e.g., observations contaminated by large errors): such observations will stand out in a QQ plot, even if the other observations are broadly consistent with the reference distribution.9

9 Another useful tool, especially when dealing with the tails, is the mean excess function (MEF): the expected amount by which a random variable X exceeds some threshold u, given that X > u. The usefulness of the MEF stems from the fact that each distribution has its own distinctive MEF. A comparison of the empirical MEF with the theoretical MEF associated with some specified distribution function therefore gives us an indication of whether the chosen distribution fits the tails of our empirical distribution. However, the results of MEF plots need to be interpreted with some care, because data observations become more scarce as X gets larger. For more on these and how they can be used, see Embrechts et al. (1997, Chapters 3.4 and 6.2).
Non-Parametric Approaches

Learning Objectives

After completing this reading you should be able to:

• Apply the bootstrap historical simulation approach to estimate coherent risk measures.
• Describe historical simulation using non-parametric density estimation.
• Compare and contrast the age-weighted, the volatility-weighted, the correlation-weighted, and the filtered historical simulation approaches.
• Identify advantages and disadvantages of non-parametric estimation methods.

Excerpt is Chapter 4 of Measuring Market Risk, Second Edition, by Kevin Dowd.
This chapter looks at some of the most popular approaches to the estimation of risk measures—the non-parametric approaches, which seek to estimate risk measures without making strong assumptions about the relevant (e.g., P/L) distribution. The essence of these approaches is that we try to let the P/L data speak for themselves as much as possible, and use the recent empirical (or in some cases simulated) distribution of P/L—not some assumed theoretical distribution—to estimate our risk measures. All non-parametric approaches are based on the underlying assumption that the near future will be sufficiently like the recent past that we can use the data from the recent past to forecast risks over the near future—and this assumption may or may not be valid in any given context. In deciding whether to use any non-parametric approach, we must make a judgment about the extent to which data from the recent past are likely to give us a good guide to the risks we face over the horizon period we are concerned with.

To keep the discussion as clear as possible, we will focus on the estimation of non-parametric VaR and ES. However, the methods discussed here extend very naturally to the estimation of coherent and other risk measures as well. These can be estimated using an 'average quantile' approach along the lines discussed in Chapter 1: we would select our weighting function φ(p), decide on the number of probability 'slices' n to take, estimate the associated quantiles, and take the weighted average using an appropriate numerical algorithm (see Chapter 1).1 We can then obtain standard errors or confidence intervals for our risk measures using suitably modified forms.

1 Nonetheless, there is an important caveat. This method was explained in Chapter 1 in an implicit context where the risk measurer could choose n, and this is sometimes not possible in a non-parametric context. For example, a risk measurer might be working with an n determined by the HS data set, and even where he/she has some freedom to select n, their range of choice might be limited by the data available. Such constraints can limit the degree of accuracy of any resulting estimated risk measures. However, a good solution to such problems is to increase the sample size by bootstrapping from the sample data. (The bootstrap is discussed further in Appendix 2 to this chapter.)

In this chapter we begin by discussing how to assemble the P/L data to be used for estimating risk measures. We then discuss the most popular non-parametric approach—historical simulation (HS). Loosely speaking, HS is a histogram-based approach: it is conceptually simple, easy to implement, very widely used, and has a fairly good historical record. We focus on the estimation of VaR and ES, but as explained in the previous chapter, more general coherent risk measures can be estimated using appropriately weighted averages of non-parametric VaR estimates. We then discuss refinements to basic HS using bootstrap and kernel methods, and the estimation of VaR or ES curves and surfaces. We will discuss how we can estimate confidence intervals for HS VaR and ES. Then we will address weighted HS—how we might weight our data to capture the effects of observation age and changing market conditions. These methods introduce parametric formulas (such as GARCH volatility forecasting equations) into the picture, and in so doing convert hitherto non-parametric methods into what are best described as semi-parametric methods. Such methods are very useful because they allow us to retain the broad HS framework while also taking account of ways in which we think that the risks we face over the foreseeable horizon period might differ from those in our sample period. Finally, we review the main advantages and disadvantages of non-parametric and semi-parametric approaches, and offer some conclusions.

2.1 COMPILING HISTORICAL SIMULATION DATA

The first task is to assemble a suitable P/L series for our portfolio, and this requires a set of historical P/L or return observations on the positions in our current portfolio. These P/Ls or returns will be measured over a particular frequency (e.g., a day), and we want a reasonably large set of historical P/L or return observations over the recent past. Suppose we have a portfolio of n assets, and for each asset i we have the observed return for each of T subperiods (e.g., daily subperiods) in our historical sample period. If R_{i,t} is the (possibly mapped) return on asset i in subperiod t, and if w_i is the amount currently invested in asset i, then the historically simulated portfolio P/L over subperiod t is:

$$\text{P/L}_t = \sum_{i=1}^{n} w_i R_{i,t} \tag{2.1}$$

Equation (2.1) gives us a historically simulated P/L series for our current portfolio, and is the basis of HS VaR and ES. This series will not generally be the same as the P/L actually earned on our portfolio—because our portfolio may have changed in composition over time or be subject to mapping approximations, and so on. Instead, the historical simulation P/L is the P/L we would have earned on our current portfolio had we held it throughout the historical sample period.2

2 To be more precise, the historical simulation P/L is the P/L we would have earned over the sample period had we rearranged the portfolio at the end of each trading day to ensure that the amount left invested in each asset was the same as at the end of the previous trading day: we take out our profits, or make up for our losses, to keep the w_i constant from one end-of-day to the next.
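Equation (2.1) amounts to one matrix-vector product per day. The following Python sketch (an illustrative addition) builds an HS P/L series from hypothetical inputs; the return matrix, position sizes and seed are stand-ins for a real, possibly mapped, data set.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((1000, 5)) * 0.01         # T = 1000 days, n = 5 assets
w = np.array([100.0, 250.0, 50.0, 400.0, 200.0])  # amounts currently invested
pl = R @ w                                        # Equation (2.1): one P/L per day
```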
As an aside, the fact that multiple positions collapse into one single HS P/L series as given by Equation (2.1) implies that it is very easy for non-parametric methods to accommodate high dimensions—unlike the case for some parametric methods. With non-parametric methods, there are no problems dealing with variance-covariance matrices, curses of dimensionality, and the like. This means that non-parametric methods will often be the most natural choice for high-dimension problems.

2.2 ESTIMATION OF HISTORICAL SIMULATION VaR AND ES

Basic Historical Simulation

Having obtained our historical simulation P/L data, we can estimate VaR by plotting the P/L (or L/P) on a simple histogram and then reading off the VaR from the histogram. To illustrate, suppose we have 1000 observations in our HS P/L series and we plot the L/P histogram shown in Figure 2.1. If these were daily data, this sample size would be equivalent to four years' daily data at 250 trading days to a year. If we take our confidence level to be 95%, our VaR is given by the x-value that cuts off the upper 5% of very high losses from the rest of the distribution. Given 1000 observations, we can take this value (i.e., our VaR) to be the 51st highest loss value, or 1.704.3 The ES is then the average of the 50 highest losses, or 2.196.

Figure 2.1 Basic historical simulation VaR and ES.
Note: This figure and the associated VaR and ES estimates are obtained using the 'hsesfigure' function.

3 We can also estimate the HS VaR more directly (i.e., without bothering with the histogram) by using a spreadsheet function that gives us the 51st highest loss value (e.g., the 'Large' command in Excel), or we can sort our losses data with the highest losses ranked first, and then obtain the VaR as the 51st observation in our sorted loss data. We could also take VaR to be any point between the 50th and 51st largest losses (e.g., their mid-point), but with a reasonable sample size (as here) there will seldom be much difference between these losses anyway. For convenience, we will adhere throughout to this convention of taking the VaR to be the highest loss observation outside the tail.

The imprecision of these estimates should be obvious when we consider that the sample data set was drawn from a standard normal distribution. In this case the 'true' underlying VaR and ES are 1.645 and 2.063, and Figure 2.1 should (ideally) be normal. Of course, this imprecision underlines the need to work with large sample sizes where practically feasible.

Bootstrapped Historical Simulation

One simple but powerful improvement over basic HS is to estimate VaR and ES from bootstrapped data. As explained in Appendix 2 to this chapter, a bootstrap procedure involves resampling from our existing data set with replacement. The bootstrap is very intuitive and easy to apply. A bootstrapped estimate will often be more accurate than a 'raw' sample estimate, and bootstraps are also useful for gauging the precision of our estimates. To apply the bootstrap, we create a large number of new samples, each observation of which is obtained by drawing at random from our original sample and replacing the observation after it has been drawn. Each new 'resampled' sample gives us a new VaR estimate, and we can take our 'best' estimate to be the mean of these resample-based estimates. The same approach can also be used to produce resample-based ES estimates—each one of which would be the average of the losses in each resample exceeding the resample VaR—and our 'best' ES estimate would be the mean of these estimates. In our particular case, if we take 1000 resamples, then our best VaR and ES estimates are (because of bootstrap sampling variation) about 1.669 and 2.114—and the fact that these are much closer to the known true values than our earlier basic HS estimates suggests that bootstrap estimates might be more accurate.
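The following Python sketch (an illustrative addition) mirrors the conventions above on a hypothetical standard-normal L/P sample of 1000 observations: the 95% VaR is taken as the 51st highest loss, per footnote 3, the ES as the average of the 50 highest, and their bootstrapped counterparts as means over 1000 resamples. Exact values will depend on the seed.

```python
import numpy as np

rng = np.random.default_rng(3)
losses = rng.standard_normal(1000)       # hypothetical L/P drawn from N(0, 1)
srt = np.sort(losses)[::-1]              # rank losses, largest first
var_95 = srt[50]                         # 51st highest loss = basic 95% HS VaR
es_95 = srt[:50].mean()                  # average of the 50 highest losses

# Bootstrapped versions: the mean of 1000 resample-based estimates.
b_var, b_es = [], []
for _ in range(1000):
    s = np.sort(rng.choice(losses, size=losses.size, replace=True))[::-1]
    b_var.append(s[50])
    b_es.append(s[:50].mean())
print(var_95, es_95, np.mean(b_var), np.mean(b_es))
```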


Historical Simulation Using Non-parametric Density Estimation

Another potential improvement over basic HS sometimes suggested is to make use of non-parametric density estimation. To appreciate what this involves, we must recognise that basic HS does not make the best use of the information we have. It also has the practical drawback that it only allows us to estimate VaRs at discrete confidence levels determined by the size of our data set. For example, if we have 100 HS P/L observations, basic HS allows us to estimate VaR at the 95% confidence level, but not the VaR at the 95.1% confidence level. The VaR at the 95% confidence level is given by the sixth largest loss, but the VaR at the 95.1% confidence level is a problem because there is no corresponding loss observation to go with it. We know that it should be greater than the sixth largest loss (or the 95% VaR), and smaller than the fifth largest loss (or the 96% VaR), but with only 100 observations there is no observation that corresponds to any VaR whose confidence level involves a fraction of 1%. With n observations, basic HS only allows us to estimate the VaRs associated with, at best, n different confidence levels.

Non-parametric density estimation offers a potential solution to both these problems. The idea is to treat our data as if they were drawings from some unspecified or unknown empirical distribution function. This approach also encourages us to confront potentially important decisions about the width of bins and where bins should be centred, and these decisions can sometimes make a difference to our results. Besides using a histogram, we can also represent our data using naive estimators or, more generally, kernels, and the literature tells us that kernels are (or ought to be) superior. So, having assembled our 'raw' HS data, we need to make decisions on the widths of bins and where they should be centred, and whether to use a histogram, a naive estimator, or some form of kernel. If we make good decisions on these issues, we can hope to get better estimates of VaR and ES (and more general coherent measures).

Non-parametric density estimation also allows us to estimate VaRs and ESs for any confidence levels we like and so avoid constraints imposed by the size of our data set. In effect, it enables us to draw lines through points on or near the edges of the 'bars' of a histogram. We can then treat the areas under these lines as a surrogate pdf, and so proceed to estimate VaRs for arbitrary confidence levels. The idea is illustrated in Figure 2.2. The left-hand side of this figure shows three bars from a histogram (or naive estimator) close up. Assuming that the height of the histogram (or naive estimator) measures relative frequency, one option is to treat the histogram itself as a pdf. Unfortunately, the resulting pdf would be a strange one—just look at the corners of each bar—and it makes more sense to approximate the pdf by drawing lines through the upper parts of the histogram.

A simple way to do this is to draw in straight lines connecting the mid-points at the top of each histogram bar, as illustrated in the figure. Once we draw these lines, we can forget about the histogram bars and treat the area under the lines as if it were a pdf. Treating the area under the lines as a pdf then enables us to estimate VaRs at any confidence level, regardless of the size of our data set. Each possible confidence level would correspond to its own tail, similar to the shaded area shown in Figure 2.2(b), and we can then use a suitable calculation method to estimate the VaR (e.g., we can carry out the calculations on a spreadsheet or, more easily, by using a purpose-built function such as the 'hsvar' function in the MMR Toolbox).4 Of course, drawing straight lines through the mid-points of the tops of histogram bars is not the best we can do: we could draw smooth curves that meet up nicely, and so on. This is exactly the point of non-parametric density estimation, the purpose of which is to give us some guidance on how 'best' to draw lines through the data points we have. Such methods are also straightforward to apply if we have suitable software.

Figure 2.2 Histograms and surrogate density functions. (a) Original histogram; (b) surrogate density function.

4 The actual programming is a little tedious, but the gist of it is that if the confidence level is such that the VaR falls between two loss observations, then we take the VaR to be a weighted average of these two observations. The weights are chosen so that a vertical line drawn through the VaR demarcates the area under the 'curve' in the correct proportions, with α to one side and 1 − α to the other. The details can be seen in the coding for the 'hsvar' and related functions.
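In a similar spirit, though it is not the MMR Toolbox routine itself, numpy's quantile function interpolates linearly between the two loss observations that bracket the requested confidence level, so it can serve as a rough stand-in for the weighted-average calculation sketched in footnote 4. The data and seed below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.standard_normal(100)        # hypothetical L/P data
var_951 = np.quantile(losses, 0.951)     # interpolates between bracketing losses,
                                         # a level basic HS alone cannot deliver
```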
Some empirical evidence by Butler and Schachter (1998) using real trading portfolios suggests that kernel-type methods produce VaR estimates that are a little different to those we would obtain under basic HS. However, their work also suggests that the different types of kernel methods produce quite similar VaR estimates, although to the extent that there are differences among them, they also found that the 'best' kernels were the adaptive Epanechnikov and adaptive Gaussian ones. To investigate these issues myself, I applied four standard kernel estimators—based on normal, box, triangular and Epanechnikov kernels—to the test data used in earlier examples, and found that each of these gave the same VaR estimate of 1.735. In this case, these different kernels produced the same VaR estimate, which is a little higher (and, curiously, a little less accurate) than the basic HS VaR estimate of 1.704 obtained earlier. Other results not reported here suggest that the different kernels can give somewhat different estimates with smaller samples, but again suggest that the exact kernel specification does not make a great deal of difference.

So although kernel methods are better in theory, they do not necessarily produce much better estimates in practice. There are also practical reasons why we might prefer simpler non-parametric density estimation methods over kernel ones. Although the kernel methods are theoretically better, crude methods like drawing straight-line 'curves' through the tops of histograms are more transparent and easier to check. We should also not forget that our results are subject to a number of sources of error (e.g., due to errors in P/L data, mapping approximations, and so on), so there is a natural limit to how much real fineness we can actually achieve.

Estimating Curves and Surfaces for VaR and ES

It is straightforward to produce plots of VaR or ES against the confidence level. For example, our earlier hypothetical P/L data yield the curves of VaR and ES against the confidence level shown in Figure 2.3. Note that the VaR curve is fairly unsteady, as it directly reflects the randomness of individual loss observations, but the ES curve is smoother, because each ES is an average of tail losses.

Figure 2.3 Plots of HS VaR and ES against confidence level.
Note: Obtained using the 'hsvaresplot2D_cl' function and the same hypothetical P/L data used in Figure 2.1.

It is more difficult to construct curves that show how non-parametric VaR or ES changes with the holding period. The methods discussed so far enable us to estimate the VaR or ES at a single holding period equal to the frequency period over which our data are observed (e.g., they give us VaR or ES for a daily holding period if P/L is measured daily). In theory, we can then estimate VaRs or ESs for any other holding periods we wish by constructing a HS P/L series whose frequency matches our desired holding period: if we wanted to estimate VaR over a weekly holding period, say, we could construct a weekly P/L series and estimate the VaR from that. There is, in short, no theoretical problem as such with estimating HS VaR or ES over any holding period we like.

However, there is a major practical problem: as the holding period rises, the number of observations rapidly falls, and we soon find that we don't have enough data. To illustrate, if we have 1000 observations of daily P/L, corresponding to four years' worth of data at 250 trading days a year, then we have 1000 P/L observations if we use a daily holding period. If we have a weekly holding period, with five days to a week, each weekly P/L will be the sum of five daily P/Ls, and we end up with only 200 observations of weekly P/L; if we have a monthly holding period, we have only 50 observations of monthly P/L; and so on. Given our initial data, the number of effective observations rapidly falls as the holding period rises, and the size of the data set imposes a major constraint on how large the holding period can practically be. In any case, even if we had a very long run of data, the older observations might have very little relevance for current market conditions.
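A curve like Figure 2.3 can be traced by looping over confidence levels. The sketch below (an illustrative addition) uses hypothetical standard-normal L/P data and takes each ES as the average of losses at or beyond the corresponding VaR; seed and grid are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
losses = rng.standard_normal(1000)                        # hypothetical L/P data
cls = np.linspace(0.90, 0.999, 100)                       # confidence levels
var = np.quantile(losses, cls)                            # VaR curve
es = np.array([losses[losses >= v].mean() for v in var])  # ES = average tail loss
plt.plot(cls, var, label="VaR")
plt.plot(cls, es, label="ES")
plt.xlabel("Confidence level"); plt.legend(); plt.show()
```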


2.3 ESTIMATING CONFIDENCE INTERVALS FOR HISTORICAL SIMULATION VaR AND ES

The methods considered so far are good for giving point estimates of VaR or ES, but they don't give us any indication of the precision of these estimates or any indication of VaR or ES confidence intervals. However, there are methods to get around this limitation and produce confidence intervals for our risk estimates.5

5 In addition to the methods considered in this section, we can also estimate confidence intervals for VaR using estimates of the quantile standard errors. However, as made clear there, such confidence intervals are subject to a number of problems, and the methods suggested here are usually preferable.

An Order Statistics Approach to the Estimation of Confidence Intervals for HS VaR and ES

One of the most promising methods is to apply the theory of order statistics, explained in Appendix 1 to this chapter. This approach gives us not just a VaR (or ES) estimate, but a complete VaR (or ES) distribution function from which we can read off the VaR (or ES) confidence interval. (The central tendency parameters (mean, mode, median) also give us alternative point estimates of our VaR or ES, if we want them.) This approach is (relatively) easy to programme and very general in its application.

Applied to our earlier P/L data, the OS approach gives us estimates (obtained using the 'hsvarpdfperc' function) of the 5% and 95% points of the 95% VaR distribution function—that is, the bounds of the 90% confidence interval for our VaR—of 1.552 and 1.797. This tells us we can be 90% confident that the 'true' VaR lies in the range [1.552, 1.797].

The corresponding points of the ES distribution function can be obtained (using the 'hsesdfperc' function) by mapping from the VaR to the ES: we take a point on the VaR distribution function, and estimate the corresponding percentile point on the ES distribution function. Doing this gives us an estimated 90% confidence interval of [2.021, 2.224].6

6 Naturally, the order statistics approach can be combined with more sophisticated non-parametric density estimation approaches. Instead of applying the OS theory to the histogram or naive estimator, we could apply it to a more sophisticated kernel estimator, and thereby extract more information from our data. This approach has some merit and is developed in detail by Butler and Schachter (1998).

A Bootstrap Approach to the Estimation of Confidence Intervals for HS VaR and ES

We can also estimate confidence intervals using a bootstrap approach: we produce a bootstrapped histogram of resample-based VaR (or ES) estimates, and then read the confidence interval from the quantiles of this histogram. For example, if we take 1000 bootstrapped samples from our P/L data set, estimate the 95% VaR of each, and then plot them, we get the histogram shown in Figure 2.4. Using the basic percentile interval approach outlined in Appendix 2 to this chapter, the 90% confidence interval for our VaR is [1.554, 1.797]. The simulated histogram is surprisingly disjointed, although the bootstrap seems to give a relatively robust estimate of the confidence interval if we keep repeating the exercise.

Figure 2.4 Bootstrapped VaR.
Note: Results obtained using the 'bootstrapvarfigure' function with 1000 resamples, and the same hypothetical data as in earlier figures.

We can also use the bootstrap to estimate ESs in much the same way: for each new resampled data set, we estimate the VaR, and then estimate the ES as the average of losses in excess of VaR. Doing this a large number of times gives us a large number of ES estimates, and we can plot them in the same way as the VaR estimates.
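A minimal Python sketch of this percentile-interval bootstrap follows (an illustrative addition, using hypothetical standard-normal L/P data); with a different seed the intervals will differ somewhat from the [1.554, 1.797] and [1.986, 2.271] figures quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(11)
losses = rng.standard_normal(1000)        # hypothetical L/P data
var_b, es_b = [], []
for _ in range(1000):                     # 1000 resamples, as in the text
    s = rng.choice(losses, size=losses.size, replace=True)
    v = np.quantile(s, 0.95)
    var_b.append(v)
    es_b.append(s[s >= v].mean())         # ES = average loss in excess of VaR
print("VaR 90% CI:", np.percentile(var_b, [5, 95]))
print("ES 90% CI:", np.percentile(es_b, [5, 95]))
```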
The histogram of bootstrapped ES values is shown in Figure 2.5, and is better behaved than the VaR histogram in the last figure because the ES is an average of tail VaRs. The 90% confidence interval for our ES is [1.986, 2.271].

Figure 2.5 Bootstrapped ES.
Note: Results obtained using the 'bootstrapesfigure' function with 1000 resamples, and the same hypothetical data as in earlier figures.

It is also interesting to compare the VaR and ES confidence intervals obtained by the two methods. These are summarised in Table 2.1, and we can see that the OS and bootstrap approaches give very similar results. This suggests that either approach is likely to be a reasonable one to use in practice.

Table 2.1 90% Confidence Intervals for Non-parametric VaR and ES

Approach              Lower bound   Upper bound
95% VaR
  Order statistics    1.552         1.797
  Bootstrap           1.554         1.797
95% ES
  Order statistics    2.021         2.224
  Bootstrap           1.986         2.271

Note: Bootstrap estimates based on 1000 resamples.

2.4 WEIGHTED HISTORICAL SIMULATION

One of the most important features of traditional HS is the way it weights past observations. Recall that R_{i,t} is the return on asset i in period t, and that we are implementing HS using the past n observations. An observation will belong to our data set if j takes any of the values 1, . . . , n, where j is the age of the observation (e.g., so j = 1 indicates that the observation is 1 day old, and so on). If we construct a new HS P/L series, P/L_t, each day, our observation R_{i,t−j} will first affect P/L_t, then affect P/L_{t+1}, and so on, and finally affect P/L_{t+n}: our return observation will affect each of the next n observations in our P/L series. Also, other things (e.g., position weights) being equal, R_{i,t−j} will affect each P/L in exactly the same way. But after n periods have passed, R_{i,t−j} will fall out of the data set used to calculate the current HS P/L series, and will thereafter have no effect on P/L. In short, our HS P/L series is constructed in a way that gives any observation the same weight on P/L provided it is less than n periods old, and no weight (i.e., a zero weight) if it is older than that.

This weighting structure has a number of problems. One problem is that it is hard to justify giving each observation in our sample period the same weight, regardless of age, market volatility, or anything else. A good example of the difficulties this can create is given by Shimko et al. (1998). It is well known that natural gas prices are usually more volatile in the winter than in the summer, so a raw HS approach that incorporates both summer and winter observations will tend to average the summer and winter observations together. As a result, treating all observations as having equal weight will tend to underestimate true risks in the winter, and overestimate them in the summer.7

7 If we have data that show seasonal volatility changes, a solution—suggested by Shimko et al. (1998)—is to weight the data to reflect seasonal volatility (e.g., so winter observations get more weight, if we are estimating a VaR in winter).

The equal-weight approach can also make risk estimates unresponsive to major events.


For instance, a stock market crash might have no effect on VaRs except at a very high confidence level, so we could have a situation where everyone might agree that risk had suddenly increased, and yet that increase in risk would be missed by most HS VaR estimates. The increase in risk would only show up later in VaR estimates if the stock market continued to fall in subsequent days—a case of the stable door closing only well after the horse had long since bolted. That said, the increase in risk would show up in ES estimates just after the first shock occurred—which is, incidentally, a good example of how ES can be a more informative risk measure than the VaR.8

8 However, both VaR and ES suffer from a related problem. As Pritsker (2001, p. 5) points out, HS fails to take account of useful information from the upper tail of the P/L distribution. If the stock experiences a series of large falls, then a position that was long the market would experience large losses that should show up, albeit later, in HS risk estimates. However, a position that was short the market would experience a series of large profits, and risk estimates at the usual confidence levels would be completely unresponsive. Once again, we could have a situation where risk had clearly increased—because the fall in the market signifies increased volatility, and therefore a significant chance of losses due to large rises in the stock market—and yet our risk estimates had failed to pick up this increase in risk.

The equal-weight structure also presumes that each observation in the sample period is equally likely and independent of the others over time. However, this 'iid' assumption is unrealistic because it is well known that volatilities vary over time, and that periods of high and low volatility tend to be clustered together. The natural gas example just considered is a good case in point.

It is also hard to justify why an observation should have a weight that suddenly goes to zero when it reaches age n. Why is it that an observation of age n − 1 is regarded as having a lot of value (and, indeed, the same value as any more recent observation), but an observation of age n is regarded as having no value at all? Even old observations usually have some information content, and giving them zero value tends to violate the old statistical adage that we should never throw information away.

This weighting structure also creates the potential for ghost effects—we can have a VaR that is unduly high (or low) because of a small cluster of high loss observations, or even just a single high loss, and the measured VaR will continue to be high (or low) until n days or so have passed and the observation has fallen out of the sample period. At that point, the VaR will fall again, but the fall in VaR is only a ghost effect created by the weighting structure and the length of sample period used.

We now address various ways in which we might 'adjust' our data to overcome some of these problems and take account of ways in which current market conditions might differ from those in our sample. These fall under the broad heading of 'weighted historical simulation' and can be regarded as semi-parametric methods because they combine features of both parametric and non-parametric methods.

Age-weighted Historical Simulation

One such approach is to weight the relative importance of our observations by their age, as suggested by Boudoukh, Richardson and Whitelaw (BRW: 1998). Instead of treating each observation for asset i as having the same implied probability as any other (i.e., 1/n), we could weight their probabilities to discount the older observations in favour of newer ones. Thus, if w(1) is the probability weight given to an observation 1 day old, then w(2), the probability given to an observation 2 days old, could be λw(1); w(3) could be λ²w(1); and so on. The λ term lies between 0 and 1, and reflects the exponential rate of decay in the weight or value given to an observation as it ages: a λ close to 1 indicates a slow rate of decay, and a λ far away from 1 indicates a high rate of decay. w(1) is set so that the sum of the weights is 1, and this is achieved if we set w(1) = (1 − λ)/(1 − λⁿ). The weight given to an observation i days old is therefore:

$$w(i) = \frac{\lambda^{i-1}(1 - \lambda)}{1 - \lambda^{n}} \tag{2.2}$$

and this corresponds to the weight of 1/n given to any in-sample observation under basic HS.

Our core information—the information inputted to the HS estimation process—is the paired set of P/L values and associated probability weights. To implement age-weighting, we merely replace the old equal weights 1/n with the age-dependent weights w(i) given by Equation (2.2). For example, if we are using a spreadsheet, we can order our P/L observations in one column, put their weights w(i) in the next column, and go down that column until we reach our desired percentile. Our VaR is then the negative of the corresponding value in the first column. And if our desired percentile falls between two percentiles, we can take our VaR to be the (negative of the) interpolated value of the corresponding first-column observations.
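One simple way to implement this spreadsheet procedure in Python is sketched below (an illustrative addition). The decay factor, seed, and the convention of reading the VaR at the observation where the cumulative tail weight first reaches 1 − cl are illustrative choices; as the text notes, interpolating between adjacent observations is equally acceptable.

```python
import numpy as np

def age_weighted_var(losses, lam=0.98, cl=0.95):
    """BRW age-weighted HS VaR. losses[0] is the most recent observation
    (age j = 1); lam is the decay factor in Equation (2.2)."""
    n = losses.size
    ages = np.arange(1, n + 1)
    w = lam ** (ages - 1) * (1 - lam) / (1 - lam ** n)  # Equation (2.2)
    order = np.argsort(losses)[::-1]                    # largest loss first
    cum = np.cumsum(w[order])                           # cumulative tail weight
    idx = np.searchsorted(cum, 1 - cl)                  # where the tail mass hits 5%
    return losses[order][idx]

rng = np.random.default_rng(2)
print(age_weighted_var(rng.standard_normal(500)))       # hypothetical L/P data
```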
This age-weighted approach has four major attractions. First, it provides a nice generalisation of traditional HS, because we can regard traditional HS as a special case with zero decay, or λ → 1. If HS is like driving along a road looking only at the rear-view mirror, then traditional equal-weighted HS is only safe if the road is straight, and the age-weighted approach is safe if the road bends gently.

Second, a suitable choice of λ can make the VaR (or ES) estimates more responsive to large loss observations: a large loss event will receive a higher weight than under traditional HS, and the resulting next-day VaR would be higher than it would otherwise have been. This not only means that age-weighted VaR estimates are more responsive to large losses, but also makes them better at handling clusters of large losses.

Third, age-weighting helps to reduce distortions caused by events that are unlikely to recur, and helps to reduce ghost effects. As an observation ages, its probability weight gradually falls and its influence diminishes gradually over time. Furthermore, when it finally falls out of the sample period, its weight will fall from λⁿw(1) to zero, instead of from 1/n to zero. Since λⁿw(1) is less than 1/n for any reasonable values of λ and n, the shock—the ghost effect—will be less than it would be under equal-weighted HS.

Finally, we can also modify age-weighting in a way that makes our risk estimates more efficient and effectively eliminates any remaining ghost effects. Since age-weighting allows the impact of past extreme events to decline as past events recede in time, it gives us the option of letting our sample size grow over time. (Why can't we do this under equal-weighted HS? Because we would be stuck with ancient observations whose information content was assumed never to date.) Age-weighting allows us to let our sample period grow with each new observation, so we never throw potentially valuable information away. This would improve efficiency and eliminate ghost effects, because there would no longer be any 'jumps' in our sample resulting from old observations being thrown away.

However, age-weighting also reduces the effective sample size, other things being equal, and a sequence of major profits or losses can produce major distortions in its implied risk profile. In addition, Pritsker shows that even with age-weighting, VaR estimates can still be insufficiently responsive to changes in underlying risk.9 Furthermore, there is the disturbing point that the BRW approach is ad hoc, and that except for the special case where λ = 1 we cannot point to any asset-return process for which the BRW approach is theoretically correct.

9 If VaR is estimated at the confidence level α, the probability of an HS estimate of VaR rising on any given day is equal to the probability of a loss in excess of VaR, which is of course 1 − α. However, if we assume a standard GARCH(1,1) process and volatility is at its long-run mean value, then Pritsker's proposition 2 shows that the probability that HS VaR should increase is about 32% (Pritsker (2001, pp. 7-9)). In other words, most of the time when HS VaR estimates should increase (i.e., when risk rises), they fail to.

Volatility-weighted Historical Simulation

We can also weight our data by volatility. The basic idea—suggested by Hull and White (HW; 1998b)—is to update return information to take account of recent changes in volatility. For example, if the current volatility in a market is 1.5% a day, and it was only 1% a day a month ago, then data a month old understate the changes we can expect to see tomorrow, and this suggests that historical returns would underestimate tomorrow's risks; on the other hand, if last month's volatility was 2% a day, month-old data will overstate the changes we can expect tomorrow, and historical returns would overestimate tomorrow's risks. We therefore adjust the historical returns to reflect how volatility tomorrow is believed to have changed from its past values.

Suppose we are interested in forecasting VaR for day T. Let r_{t,i} be the historical return on asset i on day t in our historical sample, σ_{t,i} be the historical GARCH (or EWMA) forecast of the volatility of the return on asset i for day t, made at the end of day t − 1, and σ_{T,i} be our most recent forecast of the volatility of asset i. We then replace the returns in our data set, r_{t,i}, with volatility-adjusted returns, given by:

$$r_{t,i}^{*} = \frac{\sigma_{T,i}}{\sigma_{t,i}}\, r_{t,i} \tag{2.3}$$

Actual returns in any period t are therefore increased (or decreased), depending on whether the current forecast of volatility is greater (or less) than the estimated volatility for period t. We now calculate the HS P/L using the adjusted returns of Equation (2.3) instead of the original data set r_{t,i}, and then proceed to estimate HS VaRs or ESs in the traditional way (i.e., with equal weights, etc.).10

10 Naturally, volatility weighting presupposes that one has estimates of the current and past volatilities to work with.

The HW approach has a number of advantages relative to the traditional equal-weighted and/or the BRW age-weighted approaches:

• It takes account of volatility changes in a natural and direct way, whereas equal-weighted HS ignores volatility changes and the age-weighted approach treats volatility changes in a rather arbitrary and restrictive way.

• It produces risk estimates that are appropriately sensitive to current volatility estimates, and so enables us to incorporate information from GARCH forecasts into HS VaR and ES estimation.

• It allows us to obtain VaR and ES estimates that can exceed the maximum loss in our historical data set: in periods of high volatility, historical returns are scaled upwards, and the HS P/L series used in the HW procedure will have values that exceed actual historical losses. This is a major advantage over traditional HS, which prevents the VaR or ES from being any bigger than the losses in our historical data set.

• Empirical evidence presented by HW indicates that their approach produces superior VaR estimates to the BRW one.
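The adjustment in Equation (2.3) can be sketched as follows (an illustrative addition), with an EWMA recursion standing in for a fitted GARCH model and the last in-sample estimate standing in for the current forecast σ_{T,i}; the decay factor, seed and seeding window are assumptions made purely for illustration. The adjusted series r_star would then be fed into basic HS as usual.

```python
import numpy as np

def vol_weighted_returns(r, lam=0.94):
    """Hull-White adjustment (Equation 2.3), with an EWMA volatility proxy
    standing in for a fitted GARCH model."""
    sig2 = np.empty_like(r)
    sig2[0] = r[:30].var()                   # crude initial variance estimate
    for t in range(1, r.size):
        sig2[t] = lam * sig2[t - 1] + (1 - lam) * r[t - 1] ** 2
    sig = np.sqrt(sig2)
    return (sig[-1] / sig) * r               # scale past returns to current vol

rng = np.random.default_rng(4)
r = rng.standard_normal(750) * 0.01          # hypothetical daily returns
r_star = vol_weighted_returns(r)
```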


The HW approach is also capable of various extensions. For instance, we can combine it with the age-weighted approach if we wished to increase the sensitivity of risk estimates to large losses, and to reduce the potential for distortions and ghost effects. We can also combine the HW approach with OS or bootstrap methods to estimate confidence intervals for our VaR or ES—that is, we would work with order statistics or resample with replacement from the HW-adjusted P/L, rather than from the traditional HS P/L.

Correlation-weighted Historical Simulation

We can also adjust our historical returns to reflect changes between historical and current correlations. Correlation-weighting is a little more involved than volatility-weighting. To see the principles involved, suppose for the sake of argument that we have already made any volatility-based adjustments to our HS returns along Hull-White lines, but also wish to adjust those returns to reflect changes in correlations.11

11 The correlation adjustment discussed here is based on a suggestion by Duffie and Pan (1997).

To make the discussion concrete, we have m positions and our (perhaps volatility-adjusted) 1 × m vector of historical returns R for some period t reflects an m × m variance-covariance matrix Σ. Σ in turn can be decomposed into the product σCσᵀ, where σ is an m × m diagonal matrix of volatilities (i.e., the ith diagonal element of σ is the ith volatility σᵢ, and the off-diagonal elements are zero), σᵀ is its transpose, and C is the m × m matrix of historical correlations. R therefore reflects an historical correlation matrix C, and we wish to adjust R so that it becomes a series R̃ reflecting a current correlation matrix C̃. Now suppose for convenience that both correlation matrices are positive definite. This means that each correlation matrix has an m × m 'matrix square root', A and Ã respectively, given by a Choleski decomposition (which also implies that they are easy to obtain). We can now write R and R̃ as matrix products of the relevant Choleski matrices and an uncorrelated noise process e:

$$R = Ae \tag{2.4a}$$
$$\tilde{R} = \tilde{A}e \tag{2.4b}$$

We then invert Equation (2.4a) to obtain e = A⁻¹R, and substitute this into Equation (2.4b) to obtain the correlation-adjusted series R̃ that we are seeking:

$$\tilde{R} = \tilde{A}A^{-1}R \tag{2.5}$$

The returns adjusted in this way will then have the currently prevailing correlation matrix C̃ and, more generally, the currently prevailing covariance matrix Σ̃. This approach is a major generalisation of the HW approach, because it gives us a weighting system that takes account of correlations as well as volatilities.

Example 2.1 Correlation-weighted HS

Suppose we have only two positions in our portfolio, so m = 2. The historical correlation between our two positions is 0.3, and we wish to adjust our historical returns R to reflect a current correlation of 0.9.

If a_{ij} is the (i, j)th element of the 2 × 2 matrix A, then applying the Choleski decomposition tells us that

$$a_{11} = 1,\quad a_{12} = 0,\quad a_{21} = \rho,\quad a_{22} = \sqrt{1 - \rho^{2}}$$

where ρ = 0.3. The matrix Ã is the same except for having ρ = 0.9. Standard matrix theory also tells us that

$$A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$

Substituting these into Equation (2.5), we find that

$$\tilde{R} = \tilde{A}A^{-1}R
= \begin{bmatrix} 1 & 0 \\ 0.9 & \sqrt{1 - 0.9^{2}} \end{bmatrix}
\frac{1}{\sqrt{1 - 0.3^{2}}}
\begin{bmatrix} \sqrt{1 - 0.3^{2}} & 0 \\ -0.3 & 1 \end{bmatrix} R
= \begin{bmatrix} 1 & 0 \\ 0.7629 & 0.4569 \end{bmatrix} R$$
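Example 2.1 can be verified in a few lines of Python (an illustrative addition; the helper name chol2 is hypothetical). The printed matrix reproduces the adjustment derived above.

```python
import numpy as np

def chol2(rho):
    """2 x 2 Choleski factor of a correlation matrix with correlation rho."""
    return np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho ** 2)]])

A, A_new = chol2(0.3), chol2(0.9)     # historical and current correlations
M = A_new @ np.linalg.inv(A)          # Equation (2.5): R_tilde = A_new A^-1 R
print(M.round(4))                     # [[1.     0.    ]
                                      #  [0.7629 0.4569]]
```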
In FHS, the bootstrap preserves the non-parametric nature of HS, and the volatility model gives us a sophisticated treatment of volatility.12

12 This approach is suggested in Barone-Adesi et al. (1998), Barone-Adesi et al. (1999), Barone-Adesi and Giannopoulos (2000) and in other papers by some of the same authors.

Suppose we wish to use FHS to estimate the VaR of a single-asset portfolio over a 1-day holding period. The first step in FHS is to fit, say, a GARCH model to our portfolio-return data. We want a model that is rich enough to accommodate the key features of our data, and Barone-Adesi and colleagues recommend an asymmetric GARCH, or AGARCH, model. This not only accommodates conditionally changing volatility, volatility clustering, and so on, but also allows positive and negative returns to have differential impacts on volatility, a phenomenon known as the leverage effect. The AGARCH postulates that portfolio returns obey the following process:

$$r_t = \mu + \varepsilon_t \tag{2.6a}$$

$$\sigma_t^2 = \omega + \alpha(\varepsilon_{t-1} + \gamma)^2 + \beta\sigma_{t-1}^2 \tag{2.6b}$$

The daily return in Equation (2.6a) is the sum of a mean daily return (which can often be neglected in volatility estimation) and a random error $\varepsilon_t$. The volatility in Equation (2.6b) is the sum of a constant and terms reflecting last period's 'surprise' and last period's volatility, plus an additional term $\gamma$ that allows for the surprise to have an asymmetric effect on volatility, depending on whether the surprise term is positive or negative.

The second step is to use the model to forecast volatility for each of the days in a sample period. These volatility forecasts are then divided into the realised returns to produce a set of standardised returns. These standardised returns should be independently and identically distributed (iid), and therefore be suitable for HS.

Assuming a 1-day VaR holding period, the third stage involves bootstrapping from our data set of standardised returns: we take a large number of drawings from this data set, which we now treat as a sample, replacing each one after it has been drawn, and multiply each random drawing by the AGARCH forecast of tomorrow's volatility. If we take M drawings, we therefore get M simulated returns, each of which reflects current market conditions because it is scaled by today's forecast of tomorrow's volatility.

Finally, each of these simulated returns gives us a possible end-of-tomorrow portfolio value, and a corresponding possible loss, and we take the VaR to be the loss corresponding to our chosen confidence level.13

13 The FHS approach can also be extended easily to allow for the estimation of ES as well as VaR. For more on how this might be done, see Giannopoulos and Tunaru (2004).

We can easily modify this procedure to encompass the obvious complications of a multi-asset portfolio or a longer holding period. If we have a multi-asset portfolio, we would fit a multivariate GARCH (or AGARCH) to the set or vector of asset returns, and we would standardise this vector of asset returns. The bootstrap would then select, not just a standardised portfolio return for some chosen past (daily) period, but the standardised vector of asset returns for the chosen past period. This is important because it means that our simulations would keep any correlation structure present in the raw returns. The bootstrap thus maintains existing correlations, without our having to specify an explicit multivariate pdf for asset returns.

The other obvious extension is to a longer holding period. If we have a longer holding period, we would first take a drawing and use Equation (2.6) to get a return for tomorrow; we would then use this drawing to update our volatility forecast for the day after tomorrow, and take a fresh drawing to determine the return for that day; and we would carry on in the same manner, taking a drawing, updating our volatility forecasts, taking another drawing for the next period, and so on, until we had reached the end of our holding period. At that point we would have enough information to produce a single simulated P/L observation; and we would repeat the process as many times as we wished in order to produce the histogram of simulated P/L observations from which we can estimate our VaR.

FHS has a number of attractions: (i) It enables us to combine the non-parametric attractions of HS with a sophisticated (e.g., GARCH) treatment of volatility, and so take account of changing market volatility conditions. (ii) It is fast, even for large portfolios. (iii) As with the earlier HW approach, FHS allows us to get VaR and ES estimates that can exceed the maximum historical loss in our data set. (iv) It maintains the correlation structure in our return data without relying on knowledge of the variance-covariance matrix or the conditional distribution of asset returns. (v) It can be modified to take account of autocorrelation or past cross-correlations in asset returns. (vi) It can be modified to produce estimates of VaR or ES confidence intervals by combining it with an OS or bootstrap approach to confidence interval estimation.14 (vii) There is evidence that FHS works well.15

14 The OS approach would require a set of paired P/L and associated probability observations, so we could apply this to FHS by using a P/L series that had been through the FHS filter. The bootstrap is even easier, since FHS already makes use of a bootstrap. If we want B bootstrapped estimates of VaR, we could produce, say, 100·B or 1000·B bootstrapped P/L values; each set of 100 (or 1000) P/L series would give us one HS VaR estimate, and the histogram of B such estimates would enable us to infer the bounds of the VaR confidence interval.

15 Barone-Adesi and Giannopoulos (2000), p. 17. However, FHS does have problems. In his thorough simulation study of FHS, Pritsker (2001, pp. 22-24) comes to the tentative conclusions that FHS VaR might not pay enough attention to extreme observations or time-varying correlations, and Barone-Adesi and Giannopoulos (2000, p. 18) largely accept these points. A partial response to the first point would be to use ES instead of VaR as our preferred risk measure, and the natural response to the second concern is to develop FHS with a more sophisticated past cross-correlation structure. Pritsker (2001, p. 22) also presents simulation results that suggest that FHS-VaR tends to underestimate 'true' VaR over a 10-day holding period by about 10%, but this finding conflicts with results reported by Barone-Adesi et al. (2000) based on real data. The evidence on FHS is thus mixed.
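The following is a minimal Python sketch of the three FHS steps just described for a single-asset portfolio. For simplicity it filters the returns with a plain GARCH(1,1) using assumed (not estimated) parameters rather than a fitted AGARCH, and the return history is simulated; both simplifications are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of 1-day FHS for a single-asset portfolio. Instead of
# a fitted AGARCH model, it runs a GARCH(1,1) filter with assumed
# parameters (omega, alpha, beta); in practice these would be estimated.
rng = np.random.default_rng(1)
r = rng.standard_t(df=5, size=2000) * 0.01          # stand-in return history

omega, alpha, beta = 1e-6, 0.08, 0.90               # assumed GARCH(1,1) parameters

# Steps 1-2: filter the returns to get conditional volatilities and
# standardised (approximately iid) returns.
sigma2 = np.empty(len(r) + 1)
sigma2[0] = r.var()
for t in range(len(r)):
    sigma2[t + 1] = omega + alpha * r[t] ** 2 + beta * sigma2[t]
z = r / np.sqrt(sigma2[:-1])                        # standardised returns

# Step 3: bootstrap the standardised returns and scale each drawing by
# the forecast of tomorrow's volatility.
M = 100_000
sim_returns = rng.choice(z, size=M, replace=True) * np.sqrt(sigma2[-1])

# The VaR is the loss at the chosen confidence level.
var_99 = -np.quantile(sim_returns, 0.01)
print(f"99% 1-day FHS VaR: {var_99:.4%}")
```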



2.5 ADVANTAGES AND DISADVANTAGES OF NON-PARAMETRIC METHODS

Advantages

In drawing our discussion to a close, it is perhaps a good idea to summarise the main advantages and disadvantages of non-parametric approaches. The advantages include:

• Non-parametric approaches are intuitive and conceptually simple.

• Since they do not depend on parametric assumptions about P/L, they can accommodate fat tails, skewness, and any other non-normal features that can cause problems for parametric approaches.

• They can in theory accommodate any type of position, including derivatives positions.

• There is a widespread perception among risk practitioners that HS works quite well empirically, although formal empirical evidence on this issue is inevitably mixed.

• They are (in varying degrees, fairly) easy to implement on a spreadsheet.

• Non-parametric methods are free of the operational problems to which parametric methods are subject when applied to high-dimensional problems: no need for covariance matrices, no curses of dimensionality, etc.

• They use data that are (often) readily available, either from public sources (e.g., Bloomberg) or from in-house data sets (e.g., collected as a by-product of marking positions to market).

• They provide results that are easy to report and communicate to senior managers and interested outsiders (e.g., bank supervisors or rating agencies).

• It is easy to produce confidence intervals for non-parametric VaR and ES.

• Non-parametric approaches are capable of considerable refinement and potential improvement if we combine them with parametric 'add-ons' to make them semi-parametric: such refinements include age-weighting (as in BRW), volatility-weighting (as in HW and FHS), and correlation-weighting.

Disadvantages

Perhaps their biggest potential weakness is that their results are very (and in most cases, completely) dependent on the historical data set.16 There are various other related problems:

16 There can also be problems getting the data set. We need time series data on all current positions, and such data are not always available (e.g., if the positions are in emerging markets). We also have to ensure that data are reliable, compatible, and delivered to the risk estimation system on a timely basis.

• If our data period was unusually quiet, non-parametric methods will often produce VaR or ES estimates that are too low for the risks we are actually facing; and if our data period was unusually volatile, they will often produce VaR or ES estimates that are too high.

• Non-parametric approaches can have difficulty handling shifts that take place during our sample period. For example, if there is a permanent change in exchange rate risk, it will usually take time for the HS VaR or ES estimates to reflect the new exchange rate risk. Similarly, such approaches are sometimes slow to reflect major events, such as the increases in risk associated with sudden market turbulence.

• If our data set incorporates extreme losses that are unlikely to recur, these losses can dominate non-parametric risk estimates even though we don't expect them to recur.

• Most (if not all) non-parametric methods are subject (to a greater or lesser extent) to the phenomenon of ghost or shadow effects.

• In general, non-parametric estimates of VaR or ES make no allowance for plausible events that might occur, but did not actually occur, in our sample period.

• Non-parametric estimates of VaR and ES are to a greater or lesser extent constrained by the largest loss in our historical data set. In the simpler versions of HS, we cannot extrapolate from the largest historical loss to anything larger that might conceivably occur in the future. More sophisticated versions of HS can relax this constraint, but even so, the fact remains that non-parametric estimates of VaR or ES are still constrained by the largest loss in a way that parametric estimates are not. This means that such methods are not well suited to handling extremes, particularly with small- or medium-sized samples.

However, we can often ameliorate these problems by suitable refinements. For example, we can ameliorate volatility, market turbulence, correlation and other problems by semi-parametric adjustments, and we can ameliorate ghost effects by age-weighting our data and allowing our sample size to rise over time.

There can also be problems associated with the length of the sample window period. We need a reasonably long window

to have a sample size large enough to get risk estimates of acceptable precision, and as a broad rule of thumb, most experts believe that we usually need at least a couple of years' worth of daily observations (i.e., 500 observations, at 250 trading days to the year), and often more. On the other hand, a very long window can also create its own problems. The longer the window:

• the greater the problems with aged data;

• the longer the period over which results will be distorted by unlikely-to-recur past events, and the longer we will have to wait for ghost effects to disappear;

• the more the news in current market observations is likely to be drowned out by older observations, and the less responsive will be our risk estimates to current market conditions; and

• the greater the potential for data-collection problems. This is a particular concern with new or emerging market instruments, where long runs of historical data don't exist and are not necessarily easy to proxy.

CONCLUSIONS

Non-parametric methods are widely used and in many respects highly attractive approaches to the estimation of financial risk measures. They have a reasonable track record and are often superior to parametric approaches based on simplistic assumptions such as normality. They are also capable of considerable refinement to deal with some of the weaknesses of more basic non-parametric approaches. As a general rule, they work fairly well if market conditions remain reasonably stable. However, they have their limitations and it is often a good idea to supplement them with other approaches. Wherever possible, we should also complement non-parametric methods with stress testing to gauge our vulnerability to 'what if' events. We should never rely on non-parametric methods alone.

APPENDIX 1

Estimating Risk Measures with Order Statistics

The theory of order statistics is very useful for risk measurement because it gives us a practical and accurate means of estimating the distribution function for a risk measure, and this is useful because it enables us to estimate confidence intervals for it.

Using Order Statistics to Estimate Confidence Intervals for VaR

If we have a sample of n P/L observations, we can regard each observation as giving an estimate of VaR at an implied confidence level. For example, if n = 1000, we might take the 95% VaR as the negative of the 51st smallest P/L observation, we might take the 99% VaR as the negative of the 11th smallest, and so on. In this example, we take the α VaR to be equal to the negative of the rth lowest observation, where r is equal to 1000(1 − α) + 1. More generally, with n observations, we take the VaR as equal to the negative of the rth lowest observation, where r = n(1 − α) + 1.

The rth order statistic is the rth lowest (or, alternatively, highest) in a sample of n observations, and the theory of order statistics is well established in the statistical literature. Suppose our observations x₁, x₂, ..., xₙ come from some known distribution (or cumulative density) function F(x), with rth order statistic x₍ᵣ₎. Now suppose that x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎. The probability that j of our n observations do not exceed a fixed value x must obey the following binomial distribution:

$$\Pr\{j\ \text{observations} \le x\} = \binom{n}{j}\{F(x)\}^{j}\{1 - F(x)\}^{n-j} \tag{2.7}$$

It follows that the probability that at least r observations in the sample do not exceed x is also a binomial:

$$G_r(x) = \sum_{j=r}^{n}\binom{n}{j}\{F(x)\}^{j}\{1 - F(x)\}^{n-j} \tag{2.8}$$

G_r(x) is therefore the distribution function of our order statistic and, hence, of our quantile or VaR.17

17 See, e.g., Kendall and Stuart (1973), p. 348, or Reiss (1989), p. 20.

This VaR distribution function provides us with estimates of our VaR and of its associated confidence intervals. The median (i.e., 50th percentile) of the estimated VaR distribution function gives us a natural 'best' estimate of our VaR, and estimates of the lower and upper percentiles of the VaR distribution function give us estimates of the bounds of our VaR confidence interval. This is useful, because the calculations are accurate and easy to carry out on a spreadsheet. Equation (2.8) is also very general and gives us confidence intervals for any distribution function F(x), parametric (normal, t, etc.) or empirical.

To use this approach, all we need to do is specify F(x) (as normal, t, etc.), set our parameter values, and use Equation (2.8) to estimate our VaR distribution function.

To illustrate, suppose we want to apply the order-statistics (OS) approach to estimate the distribution function of a standard



normal VaR. We then assume that F(x) is standard normal and use Equation (2.8) to estimate three key parameters of the VaR distribution: the median or 50th percentile of the estimated VaR distribution, which can be interpreted as an OS estimate of normal VaR; and the 5th and 95th percentiles of the estimated VaR distribution, which can be interpreted as the OS estimates of the bounds of the 90% confidence interval for standard normal VaR.

Some illustrative estimates for the 95% VaR are given in Table 2.2. To facilitate comparison, the table also shows the estimates of standard normal VaR based on the conventional normal VaR formula as explained in Chapter 1. The main results are:

• The confidence interval, the gap between the 5th and 95th percentiles, is quite wide for low values of n, but narrows as n gets larger.

• As n rises, the median of the estimated VaR distribution converges to the conventional estimate.

• The confidence interval is (in this case, a little) wider for more extreme VaR confidence levels than it is for the more central ones.

The same approach can also be used to estimate the percentiles of other VaR distribution functions. If we wish to estimate the percentiles of a non-normal parametric VaR, we replace the normal distribution function F(x) by the non-normal equivalent: the t-distribution function, the Gumbel distribution function, and so on. We can also use the same approach to estimate the confidence intervals for an empirical distribution function (i.e., for historical simulation VaR), where F(x) is some empirical distribution function.

Conclusions

The OS approach provides an ideal method for estimating the confidence intervals for our VaRs and ESs. In particular, the OS approach is:

• Completely general, in that it can be applied to any parametric or non-parametric VaR or ES.

• Reasonable even for relatively small samples, because it is not based on asymptotic theory, although it is also the case that estimates based on small samples will be less accurate, precisely because the samples are small.

• Easy to implement in practice.

The OS approach is also superior to confidence-interval estimation methods based on estimates of quantile standard errors (see Chapter 1), because it does not rely on asymptotic theory and/or force estimated confidence intervals to be symmetric (which can be a problem for extreme VaRs and ESs).

Table 2.2 Order Statistics Estimates of Standard Normal 95% VaRs and Associated Confidence Intervals

(a) As n varies

No. of observations                  100     500     1000    5000    10,000
Lower bound of confidence interval   1.267   1.482   1.531   1.595   1.610
Median of VaR distribution           1.585   1.632   1.639   1.644   1.644
Standard estimate of VaR             1.645   1.645   1.645   1.645   1.645
Upper bound of confidence interval   1.936   1.791   1.750   1.693   1.679
Width of interval/median             42.2%   18.9%   13.4%   6.0%    4.2%

(b) As VaR confidence level varies (with n = 500)

VaR confidence level                 0.90    0.95    0.99
Lower bound of confidence interval   1.151   1.482   2.035
Median of VaR distribution           1.274   1.632   2.279
Standard estimate of VaR             1.282   1.645   2.326
Upper bound of confidence interval   1.402   1.791   2.560
Width of interval/median             19.7%   18.9%   23.0%

Notes: The confidence interval is specified at a 90% level of confidence, and the lower and upper bounds of the confidence interval are estimated as the 5th and 95th percentiles of the estimated VaR distribution (Equation (2.8)).
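A short sketch of how entries like those in Table 2.2 can be reproduced: because G_r(x) in Equation (2.8) is the distribution function of the rth order statistic, its percentiles can be read off the Beta(r, n − r + 1) distribution of F(x). The choice r = nα for the ordered losses is an assumption about the ordering convention; the printed values should come out close to the n = 500 column of panel (a).

```python
from scipy.stats import beta, norm

# Sketch of the OS approach for standard normal 95% VaR with n = 500.
# The rth order statistic's F-value follows a Beta(r, n - r + 1)
# distribution, which is another way of writing Equation (2.8).
n, var_conf = 500, 0.95
r = int(n * var_conf)                     # 475th ordered loss (assumed convention)

for q in (0.05, 0.50, 0.95):              # CI bounds and median
    x = norm.ppf(beta.ppf(q, r, n - r + 1))
    print(f"{q:.0%} percentile of the VaR distribution: {x:.3f}")
```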

APPENDIX 2

The Bootstrap

The bootstrap is a simple and useful method for assessing uncertainty in estimation procedures. Its distinctive feature is that it replaces mathematical or statistical analysis with simulation-based resampling from a given data set. It therefore provides a means of assessing the accuracy of parameter estimators without having to resort to strong parametric assumptions or closed-form confidence-interval formulas. The roots of the bootstrap go back a couple of centuries, but the idea only took off in the last three decades after it was developed and popularised by the work of Bradley Efron. It was Efron, too, who first gave it its name, which refers to the phrase 'to pull oneself up by one's bootstraps'. The bootstrap is a form of statistical 'trick', and is therefore very aptly named.

The main purpose of the bootstrap is to assess the accuracy of parameter estimates. The bootstrap is ideally suited for this purpose, as it can provide such estimates without having to rely on potentially unreliable assumptions (e.g., assumptions of normality or large samples).18 The bootstrap is also easy to use because it does not require the user to engage in any difficult mathematical or statistical analysis. In any case, traditional methods only work in a limited number of cases, whereas the bootstrap can be applied more or less universally. So the bootstrap is easier to use, more powerful and (as a rule) more reliable than traditional means of estimating confidence intervals for parameters of interest. In addition, the bootstrap can be used to provide alternative 'point' estimates of parameters as well.19

18 The bootstrap is also superior to the jackknife, which was often used for similar purposes before the advent of powerful computers. The jackknife is a procedure in which we construct a large number of subsamples from an original sample by taking the original sample and leaving one observation out at a time. For each such subsample, we estimate the parameter of interest, and the jackknife estimator is the average of the subsample-based estimators. The jackknife can also be regarded as an approximation to the bootstrap, but it can provide a very poor approximation when the parameter estimator is a non-smooth function of the data. The bootstrap is therefore more reliable and easier to implement.

19 The bootstrap also has other uses. For example, it can be used to relax and check assumptions, to give quick approximations and to check the results obtained using other methods.

Limitations of Conventional Sampling Approaches

The bootstrap is best appreciated by considering the limitations of conventional sampling approaches. Suppose we have a sample of size n drawn from a population. The parameters of the population distribution are unknown, and, more likely than not, so too is the distribution itself. We are interested in a particular parameter θ, where θ might be a mean, variance (or standard deviation), quantile, or some other parameter. The obvious approach is to estimate θ using a suitable sample estimator: so if θ is the mean, our estimator θ̂ would be the sample mean; if θ is the variance, our estimator θ̂ would be based on some sample variance; and so on. Obtaining an estimator for θ is therefore straightforward, but how do we obtain a confidence interval for it?

To estimate confidence intervals for θ using traditional closed-form approaches requires us to resort to statistical theory, and the theory available is of limited use. For example, suppose we wish to obtain a confidence interval for a variance. If we assume that the underlying distribution is normal, then we know that $(n-1)s^2/\sigma^2$ is distributed as $\chi^2$ with n − 1 degrees of freedom, and this allows us to obtain a confidence interval for $\sigma^2$. If we denote the α point of this distribution as $\chi^2_{\alpha,n-1}$, then the 90% confidence interval for $(n-1)s^2/\sigma^2$ is:

$$\left[\chi^2_{0.05,n-1},\ \chi^2_{0.95,n-1}\right] \tag{2.9}$$

This implies that the 90% confidence interval for $\sigma^2$ is:

$$\left[\frac{(n-1)s^2}{\chi^2_{0.95,n-1}},\ \frac{(n-1)s^2}{\chi^2_{0.05,n-1}}\right] \tag{2.10}$$

On the other hand, if we cannot assume that the underlying distribution is normal, then obtaining a confidence interval for $\sigma^2$ can become very difficult: the problem is that although we can estimate $\sigma^2$ itself, under more general conditions we would often not know the distribution of $s^2$, or have expressions for standard errors, and we cannot usually obtain closed-form confidence intervals without them.

We can face similar problems with other parameters as well, such as medians, correlations, and tail probabilities.20 So in general, closed-form confidence intervals are of limited applicability, and will not apply to many of the situations we are likely to meet in practice.

20 However, in the case of quantiles, we can use order statistics to write down their distribution functions.
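For comparison with the bootstrap below, here is a minimal sketch of the closed-form interval in Equation (2.10), using scipy's chi-squared quantiles; the sample is simulated for illustration.

```python
import numpy as np
from scipy.stats import chi2

# Sketch of the closed-form 90% confidence interval for a variance
# under normality, Equation (2.10); the data are illustrative.
rng = np.random.default_rng(7)
x = rng.normal(0.0, 2.0, size=100)

n, s2 = len(x), x.var(ddof=1)
lower = (n - 1) * s2 / chi2.ppf(0.95, n - 1)
upper = (n - 1) * s2 / chi2.ppf(0.05, n - 1)
print(f"sample variance {s2:.3f}, 90% CI [{lower:.3f}, {upper:.3f}]")
```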



The Bootstrap and Its Implementation

The bootstrap frees us of this type of limitation, and is also much easier to implement. It enables us to estimate a confidence interval for any parameter that we can estimate, regardless of whether we have any formulas for the distribution function for that parameter or for the standard error of its estimator. The bootstrap also has the advantage that it comes with less baggage, in the sense that the assumptions needed to implement the bootstrap are generally less demanding than the assumptions needed to estimate confidence intervals using more traditional (i.e., closed-form) methods.

The basic bootstrap procedure is very simple.21 We start with a given original sample of size n.22 We now draw a new random sample of the same size from this original sample, taking care to replace each chosen observation back in the sample pool after it has been drawn. (This random sampling, or resampling, is the very heart of the bootstrap. It requires that we have a uniform random number generator to select a random number between 1 and n, which determines the particular observation that is chosen each time.) When constructing the new sample, known as a resample, we would typically find that some observations get chosen more than once, and others don't get chosen at all: so the resample would typically be different from the original one, even though every observation included in it was drawn from the original sample. Once we have our resample, we use it to estimate the parameter we are interested in. This gives us a resample estimate of the parameter. We then repeat the 'resampling' process again and again, and obtain a set of B resample parameter estimates. This set of B resample estimates can also be regarded as a bootstrapped sample of parameter estimates.

21 This application of the bootstrap can be described as a non-parametric one because we bootstrap from a given data sample. The bootstrap can also be implemented parametrically, where we bootstrap from the assumed distribution. When used in parametric mode, the bootstrap provides more accurate answers than textbook formulas usually do, and it can provide answers to problems for which no textbook formulas exist. The bootstrap can also be implemented semi-parametrically, and a good example of this is the FHS approach.

22 In practice, it might be possible to choose the value of n, but we will assume for the sake of argument that n is given.

We can then use the bootstrapped sample to estimate a confidence interval for our parameter θ. For example, if each resample i gives us a resample estimator θ̂ᴮ(i), we might construct a simulated density function from the distribution of our θ̂ᴮ(i) values and infer the confidence intervals from its percentile points. If our confidence interval spans the central 1 − 2α of the probability mass, then it is given by:

$$\text{Confidence Interval} = \left[\hat{\theta}^B_{\alpha},\ \hat{\theta}^B_{1-\alpha}\right] \tag{2.11}$$

where $\hat{\theta}^B_{\alpha}$ is the α quantile of the distribution of bootstrapped θ̂ᴮ(i) values. This 'percentile interval' approach is very easy to apply and does not rely on any parametric theory, asymptotic or otherwise.

Nonetheless, this basic percentile interval approach is limited itself, particularly if parameter estimators are biased. It is therefore often better to use more refined percentile approaches, and perhaps the best of these is the bias-corrected and accelerated (or BCa) approach, which generates a substantial improvement in both theory and practice over the basic percentile interval approach. To use this approach we replace the α and 1 − α subscripts in Equation (2.11) with α₁ and α₂, where

$$\alpha_1 = \Phi\left(z^0 + \frac{z^0 + z_{\alpha}}{1 - a(z^0 + z_{\alpha})}\right), \qquad \alpha_2 = \Phi\left(z^0 + \frac{z^0 + z_{1-\alpha}}{1 - a(z^0 + z_{1-\alpha})}\right) \tag{2.12}$$

If the parameters a and z⁰ are zero, this BCa confidence interval will coincide with the earlier percentile interval. However, in general, they will not be 0, and we can think of the BCa method as correcting the end-points of the confidence interval. The parameter a refers to the rate of change of the standard error of θ̂ with respect to the true parameter θ, and it can be regarded as a correction for skewness. This parameter can be estimated from the following, which would be based on an initial bootstrap or jackknife exercise:

$$\hat{a} = \frac{\sum_{i=1}^{B}\left(\bar{\theta}^B - \hat{\theta}^B(i)\right)^3}{6\left[\sum_{i=1}^{B}\left(\bar{\theta}^B - \hat{\theta}^B(i)\right)^2\right]^{3/2}} \tag{2.13}$$

The parameter z⁰ can be estimated as the standard normal inverse of the proportion of bootstrap replications that is less than the original estimate θ̂. The BCa method is therefore (relatively) straightforward to implement, and it has the theoretical advantages over the percentile interval approach of being both more accurate and transformation-respecting, the latter property meaning that if we take a transformation of θ (e.g., if θ is a variance, we might wish to take its square root to obtain the standard deviation), then the BCa method will automatically correct the end-points of the confidence interval of the transformed parameter.23

23 For more on BCa and other refinements to the percentile interval approach, see Efron and Tibshirani (1993, Chapters 14 and 22) or Davison and Hinkley (1997, Chapter 5).

We can also use a bootstrapped sample of parameter estimates to provide an alternative point estimator of a parameter that is often superior to the raw sample estimator θ̂. Given that there are B resample estimators, we can take our bootstrapped point estimator θ̂ᴮ as the sample mean of our θ̂ᴮ(i) values:24

$$\hat{\theta}^B = \frac{1}{B}\sum_{i=1}^{B}\hat{\theta}^B(i) \tag{2.14}$$

24 This basic bootstrap estimation method can also be supplemented by variance-reduction methods (e.g., importance sampling) to improve accuracy at a given computational cost. See Efron and Tibshirani (1993, Chapter 23) or Davison and Hinkley (1997, Chapter 9).
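A minimal sketch of the procedure just described: B resamples of a P/L series, the percentile interval of Equation (2.11), and the bootstrapped point estimator of Equation (2.14), applied here to a 95% VaR estimator; the data, B and α are illustrative assumptions.

```python
import numpy as np

# Sketch of the basic bootstrap: B resamples, the percentile interval
# of Equation (2.11), and the point estimate of Equation (2.14), all
# for a 95% VaR estimator; data and B are illustrative.
rng = np.random.default_rng(3)
pnl = rng.standard_normal(750)            # stand-in P/L sample

def var_95(sample):
    return -np.quantile(sample, 0.05)     # 95% VaR of a P/L series

B = 5000
boot = np.array([var_95(rng.choice(pnl, size=pnl.size, replace=True))
                 for _ in range(B)])

alpha = 0.05                              # central 1 - 2*alpha mass
ci = np.quantile(boot, [alpha, 1 - alpha])
print(f"point estimate {boot.mean():.3f}, 90% percentile interval {ci}")
```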

Relatedly, we can also use a bootstrap to estimate the bias in an estimator. The bias is the difference between the expectation of an estimator and the quantity estimated (i.e., the bias equals E[θ̂] − θ), and can be estimated by plugging Equation (2.14) and a basic sample estimator θ̂ into the bias equation:

$$\text{Bias} = E[\hat{\theta}] - \theta \ \Rightarrow\ \text{Estimated Bias} = \hat{\theta}^B - \hat{\theta} \tag{2.15}$$

We can use an estimate of bias for various purposes (e.g., to correct a biased estimator, to correct prediction errors, etc.). However, the bias can have a (relatively) large standard error. In such cases, correcting for the bias is not always a good idea, because the bias-corrected estimate can have a larger standard error than the unadjusted, biased, estimate.

The programs to compute bootstrap statistics are easy to write, and the most obvious price of the bootstrap, increased computation, is no longer a serious problem.25

25 An example of the bootstrap approach applied to VaR is given earlier in this chapter discussing the bootstrap point estimator and bootstrapped confidence intervals for VaR.

Standard Errors of Bootstrap Estimators

Naturally, bootstrap estimates are themselves subject to error. Typically, bootstrap estimates have little bias, but they often have substantial variance. The latter comes from basic sampling variability (i.e., the fact that we have a sample of size n drawn from our population, rather than the population itself) and from resampling variability (i.e., the fact that we take only B bootstrap resamples rather than an infinite number of them). The estimated standard error for θ̂, ŝ_B, can be obtained from:

$$\hat{s}_B = \left[\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\theta}^B(i) - \bar{\theta}^B\right)^2\right]^{1/2} \tag{2.16}$$

where $\bar{\theta}^B = (1/B)\sum_{i=1}^{B}\hat{\theta}^B(i)$. ŝ_B is of course also easy to estimate. However, ŝ_B is itself variable, and the variance of ŝ_B is:

$$\operatorname{var}(\hat{s}_B) = \operatorname{var}[E(\hat{s}_B)] + E[\operatorname{var}(\hat{s}_B)] \tag{2.17}$$

Following Efron and Tibshirani (1993, Chapter 19), this can be rearranged as:

$$\operatorname{var}(\hat{s}_B) = \operatorname{var}\left[m_2^{1/2}\right] + E\left[\frac{m_4 - m_2^2}{4Bm_2}\right] \tag{2.18}$$

where $m_r$ is the rth moment of the bootstrap distribution of the θ̂ᴮ(i). In the case where θ is the mean, Equation (2.18) reduces to:

$$\operatorname{var}(\hat{s}_B) = \frac{m_4/m_2 - m_2}{4n^2} + \frac{\sigma^2}{2nB} + \frac{\sigma^2\left(m_4/m_2^2 - 3\right)}{4nB} \tag{2.19}$$

If the distribution is normal, this further reduces to:

$$\operatorname{var}(\hat{s}_B) = \frac{\sigma^2}{2n^2} + \frac{\sigma^2}{2nB} \tag{2.20}$$

We can then set B to reduce var(ŝ_B) to a desired level, and so achieve a target level of accuracy in our estimate of ŝ_B. However, these results are limited, because Equation (2.19) only applies to the mean and Equation (2.20) presupposes normality as well.

We therefore face two related questions: (a) how can we estimate var(ŝ_B) in general? and (b) how can we choose B to achieve a given level of accuracy in our estimate of ŝ_B? One approach to these problems is to apply brute force: we can estimate var(ŝ_B) using a jackknife-after-bootstrap (in which we first bootstrap the data and then estimate var(ŝ_B) by jackknifing from the bootstrapped data), or by using a double bootstrap (in which we estimate a sample of bootstrapped ŝ_B values and then estimate their variance). We can then experiment with different values of B to determine the values of these parameters needed to bring var(ŝ_B) down to an acceptable level.

If we are more concerned about the second problem (i.e., how to choose B), a more elegant approach is the following, suggested by Andrews and Buchinsky (1997). Suppose we take as our 'ideal' the value of ŝ_B associated with an infinite number of resamples, i.e., ŝ_∞. Let τ be a target probability that is close to 1, and let bound be a chosen bound on the percentage deviation of ŝ_B from ŝ_∞. We want to choose B = B(bound, τ) such that the probability that ŝ_B is within the desired bound is τ:

$$\Pr\left[100\,\frac{|\hat{s}_B - \hat{s}_\infty|}{\hat{s}_\infty} \le bound\right] = \tau \tag{2.21}$$

If B is large, then the required number of resamples is approximately

$$B = \frac{2500(\kappa - 1)\,z^2_{1-(1-\tau)/2}}{bound^2} \tag{2.22}$$

where $z_{1-(1-\tau)/2}$ denotes the relevant standard normal quantile. However, this formula is not operational because κ, the kurtosis of the distribution of θ̂ᴮ, is unknown. To get around this problem, we replace κ with a consistent estimator of κ, and this leads Andrews and Buchinsky to suggest the following three-step method to determine B:

• We initially assume that κ = 3, and plug this into Equation (2.22) to obtain a preliminary value of B, denoted by B₀, where

$$B_0 = \operatorname{int}\left(\frac{5000\,z^2_{1-(1-\tau)/2}}{bound^2}\right) \tag{2.23}$$

and where int(a) refers to the smallest integer greater than or equal to a.

• We simulate B₀ resamples, and estimate the sample kurtosis of the bootstrapped θ̂ᴮ values, κ̂.

• We take the desired number of bootstrap resamples as equal to max(B₀, B₁), where

$$B_1 = \operatorname{int}\left(\frac{2500(\hat{\kappa} - 1)\,z^2_{1-(1-\tau)/2}}{bound^2}\right) \tag{2.24}$$



• This method does not directly tell us what the variance of ŝ_B might be, but we already know how to estimate this in any case. Instead, this method gives us something more useful: it tells us how to set B to achieve a target level of precision in our bootstrap estimators, and (unlike Equations (2.19) and (2.20)) it applies for any parameter θ and applies however θ̂ is distributed.26

26 This three-step method can also be improved and extended. For example, it can be improved by correcting for bias in the kurtosis estimator, and a similar (although more involved) three-step method can be used to achieve given levels of accuracy in estimates of confidence intervals as well. For more on these refinements, see Andrews and Buchinsky (1997).

Time Dependency and the Bootstrap

Perhaps the main limitation of the bootstrap is that standard bootstrap procedures presuppose that observations are independent over time, and they can be unreliable if this assumption does not hold. Fortunately, there are various ways in which we can modify bootstraps to allow for such dependence (a short sketch of the block approach follows the list below):

• If we are prepared to make parametric assumptions, we can model the dependence parametrically (e.g., using a GARCH procedure). We can then bootstrap from the residuals, which should be independent. However, this solution requires us to identify the underlying stochastic model and estimate its parameters, and this exposes us to model and parameter risk.

• An alternative is to use a block approach: we divide sample data into non-overlapping blocks of equal length, and select a block at random. However, this approach can 'whiten' the data (as the joint observations spanning different blocks are taken to be independent), which can undermine our results. On the other hand, there are also various methods of dealing with this problem (e.g., making block lengths stochastic, etc.), but these refinements also make the block approach more difficult to implement.

• A third solution is to modify the probabilities with which individual observations are chosen. Instead of assuming that each observation is chosen with the same probability, we can make the probabilities of selection dependent on the time indices of recently selected observations: so, for example, if the sample data are in chronological order and observation i has just been chosen, then observation i + 1 is more likely to be chosen next than most other observations.

Parametric Approaches (II): Extreme Value

Learning Objectives

After completing this reading you should be able to:

• Explain the importance and challenges of extreme values in risk management.

• Describe extreme value theory (EVT) and its use in risk management.

• Describe the peaks-over-threshold (POT) approach.

• Compare and contrast generalized extreme value and POT.

• Evaluate the tradeoffs involved in setting the threshold level when applying the GP distribution.

• Explain the importance of multivariate EVT for risk management.

Excerpt is Chapter 7 of Measuring Market Risk, Second Edition, by Kevin Dowd.
There are many problems in risk management that deal with extreme events: events that are unlikely to occur, but can be very costly when they do. These events are often referred to as low-probability, high-impact events, and they include large market falls, the failures of major institutions, the outbreak of financial crises and natural catastrophes. Given the importance of such events, the estimation of extreme risk measures is a key concern for risk managers.

However, to estimate such risks we have to confront a difficult problem: extreme events are rare by definition, so we have relatively few extreme observations on which to base our estimates. Estimates of extreme risks must therefore be very uncertain, and this uncertainty is especially pronounced if we are interested in extreme risks not only within the range of observed data, but well beyond it, as might be the case if we were interested in the risks associated with events more extreme than any in our historical data set (e.g., an unprecedented stock market fall).

Practitioners can only respond by relying on assumptions to make up for lack of data. Unfortunately, the assumptions they make are often questionable. Typically, a distribution is selected arbitrarily, and then fitted to the whole data set. However, this means that the fitted distribution will tend to accommodate the more central observations, because there are so many of them, rather than the extreme observations, which are much sparser. Hence, this type of approach is often good if we are interested in the central part of the distribution, but is ill-suited to handling extremes.

When dealing with extremes, we need an approach that comes to terms with the basic problem posed by extreme-value estimation: that the estimation of the risks associated with low-frequency events with limited data is inevitably problematic, and that these difficulties increase as the events concerned become rarer. Such problems are not unique to risk management, but occur in other disciplines as well. The standard example is hydrology, where engineers have long struggled with the question of how high dikes, sea walls and similar barriers should be to contain the probabilities of floods within reasonable limits. They have had to do so with even less data than financial risk practitioners usually have, and their quantile estimates, the flood water levels they were contending with, were also typically well out of the range of their sample data. So they have had to grapple with comparable problems to those faced by insurers and risk managers, but have had to do so with even less data and potentially much more at stake.

The result of their efforts is extreme-value theory (EVT): a branch of applied statistics that is tailor-made to these problems.1 EVT focuses on the distinctiveness of extreme values and makes as much use as possible of what theory has to offer. Not surprisingly, EVT is quite different from the more familiar 'central tendency' statistics that most of us have grown up with. The underlying reason for this is that central tendency statistics are governed by central limit theorems, but central limit theorems do not apply to extremes. Instead, extremes are governed, appropriately enough, by extreme-value theorems. EVT uses these theorems to tell us what distributions we should (and should not!) fit to our extremes data, and also guides us on how we should estimate the parameters involved. These EV distributions are quite different from the more familiar distributions of central tendency statistics. Their parameters are also different, and the estimation of these parameters is more difficult.

1 The literature on EVT is vast. However, some standard book references on EVT and its finance applications are Embrechts et al. (1997), Reiss and Thomas (1997) and Beirlant et al. (2004). There is also a plethora of good articles on the subject, e.g., Bassi et al. (1998), Longin (1996, 1999), Danielsson and de Vries (1997a,b), McNeil (1998), McNeil and Saladin (1997), Cotter (2001, 2004), and many others.

This chapter provides an overview of EV theory, and of how it can be used to estimate measures of financial risk. We will focus mainly on the VaR (and to a lesser extent, the ES) to keep the discussion brief, but the approaches considered here extend naturally to the estimation of other coherent risk measures as well.

The chapter itself is divided into four sections. The first two discuss the two main branches of univariate EV theory, the next discusses some extensions to it, including multivariate EVT, and the last concludes.

3.1 GENERALISED EXTREME-VALUE THEORY

Theory

Suppose we have a random loss variable X, and we assume to begin with that X is independent and identically distributed (iid) from some unknown distribution F(x) = Prob(X ≤ x). We wish to estimate the extreme risks (e.g., extreme VaR) associated with the distribution of X. Clearly, this poses a problem because we don't know what F(x) actually is.

This is where EVT comes to our rescue. Consider a sample of size n drawn from F(x), and let the maximum of this sample be Mn.2 If n is large, we can regard Mn as an extreme value. Under relatively general conditions, the celebrated Fisher-Tippett theorem (1928) then tells us that as n gets large, the distribution of extremes (i.e., Mn) converges to the following generalised extreme-value (GEV) distribution:

$$H_{\xi,\mu,\sigma} = \begin{cases} \exp\left[-\left(1 + \xi\,\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right], & \xi \neq 0 \\[2mm] \exp\left[-\exp\left(-\dfrac{x-\mu}{\sigma}\right)\right], & \xi = 0 \end{cases} \tag{3.1}$$

where x satisfies the condition 1 + ξ(x − μ)/σ > 0.3 This distribution has three parameters. The first two are μ, the location parameter of the limiting distribution, which is a measure of the central tendency of Mn, and σ, the scale parameter of the limiting distribution, which is a measure of the dispersion of Mn. These are related to, but distinct from, the more familiar mean and standard deviation, and we will return to these presently. The third parameter, ξ, the tail index, gives an indication of the shape (or heaviness) of the tail of the limiting distribution.

2 The same theory also works for extremes that are the minima rather than the maxima of a (large) sample: to apply the theory to minima extremes, we simply apply the maxima extremes results but multiply our data by −1.

3 See, e.g., Embrechts et al. (1997), p. 316.

The GEV Equation (3.1) has three special cases:

• If ξ > 0, the GEV becomes the Frechet distribution. This case applies where the tail of F(x) obeys a power function and is therefore heavy (e.g., as would be the case if F(x) were a Levy distribution, a t-distribution, a Pareto distribution, etc.). This case is particularly useful for financial returns because they are typically heavy-tailed, and we often find that estimates of ξ for financial return data are positive but less than 0.35.

• If ξ = 0, the GEV becomes the Gumbel distribution, corresponding to the case where F(x) has exponential tails. These are relatively light tails such as those we would get with normal or lognormal distributions.

• If ξ < 0, the GEV becomes the Weibull distribution, corresponding to the case where F(x) has lighter than normal tails. However, the Weibull distribution is not particularly useful for modelling financial returns, because few empirical financial returns series are so light-tailed.4

4 We can also explain these three cases in terms of domains of attraction. Extremes drawn from Levy or t-distributions fall in the domain of attraction of the Frechet distribution, and so obey a Frechet distribution as n gets large; extremes drawn from normal and lognormal distributions fall in the domain of attraction of the Gumbel, and obey the Gumbel as n gets large, and so on.

The standardised (i.e., μ = 0, σ = 1) Frechet and Gumbel probability density functions are illustrated in Figure 3.1. Both are skewed to the right, but the Frechet is more skewed than the Gumbel and has a noticeably longer right-hand tail. This means that the Frechet has considerably higher probabilities of producing very large X-values. Observe that most of the probability mass is located between x values of −2 and +6. More generally, this means most of the probability mass will lie between x values of μ − 2σ and μ + 6σ.

To obtain the quantiles associated with the GEV distribution, we set the left-hand side of Equation (3.1) to p, take logs of both sides of Equation (3.1) and rearrange to get:

$$\ln p = \begin{cases} -\left(1 + \xi\,\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}, & \xi \neq 0 \\[2mm] -\exp\left(-\dfrac{x-\mu}{\sigma}\right), & \xi = 0 \end{cases} \tag{3.2}$$

We then unravel the x-values to get the quantiles associated with any chosen (cumulative) probability p:5

$$x = \mu - \frac{\sigma}{\xi}\left[1 - (-\ln p)^{-\xi}\right] \quad \text{(Frechet, } \xi > 0\text{)} \tag{3.3a}$$

$$x = \mu - \sigma\ln[-\ln p] \quad \text{(Gumbel, } \xi = 0\text{)} \tag{3.3b}$$

5 We can obtain estimates of EV VaR over longer time periods by using appropriately scaled parameters, bearing in mind that the mean scales proportionately with the holding period h, the standard deviation scales with the square root of h, and (subject to certain conditions) the tail index does not scale at all. In general, we find that the VaR scales with a parameter k (i.e., so VaR(h) = VaR(1)(h)^k, where h is the holding period), and empirical evidence reported by Hauksson et al. (2001, p. 93) suggests an average value for k of about 0.45. The square-root scaling rule (i.e., k = 0.5) is therefore usually inappropriate for EV distributions.

Example 3.1 Gumbel quantiles

For the standardised Gumbel, the 5% quantile is −ln[−ln(0.05)] = −1.0972 and the 95% quantile is −ln[−ln(0.95)] = 2.9702.

Example 3.2 Frechet quantiles

For the standardised Frechet with ξ = 0.2, the 5% quantile is −(1/0.2)[1 − (−ln(0.05))⁻⁰·²] = −0.9851 and the 95% quantile is −(1/0.2)[1 − (−ln(0.95))⁻⁰·²] = 4.0564. For ξ = 0.3, the 5% quantile is −(1/0.3)[1 − (−ln(0.05))⁻⁰·³] = −0.9349 and the 95% quantile is −(1/0.3)[1 − (−ln(0.95))⁻⁰·³] = 4.7924. Thus, Frechet quantiles are sensitive to the value of the tail index ξ, and tend to rise with ξ. Conversely, as ξ → 0, the Frechet quantiles tend to their Gumbel equivalents.
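A small sketch of the quantile formulas in Equation (3.3), which reproduces the standardised values in Examples 3.1 and 3.2:

```python
import numpy as np

# Sketch of the GEV quantile formulas in Equation (3.3), reproducing
# the standardised (mu = 0, sigma = 1) values in Examples 3.1 and 3.2.
def gev_quantile(p, mu=0.0, sigma=1.0, xi=0.0):
    if xi == 0.0:                                   # Gumbel, Equation (3.3b)
        return mu - sigma * np.log(-np.log(p))
    return mu - (sigma / xi) * (1.0 - (-np.log(p)) ** -xi)  # Frechet, (3.3a)

for p in (0.05, 0.95):
    print(f"Gumbel  {p:.0%} quantile: {gev_quantile(p):+.4f}")
    print(f"Frechet {p:.0%} quantile (xi = 0.2): {gev_quantile(p, xi=0.2):+.4f}")
```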



[Figure 3.1 Standardised Frechet and Gumbel probability density functions]

We need to remember that the probabilities in Equations (3.1)-(3.3) refer to the probabilities associated with the extreme loss distribution, not to those associated with the 'parent' loss distribution from which the extreme losses are drawn. For example, a 5th percentile in Equation (3.3) is the cut-off point between the lowest 5% of extreme (high) losses and the highest 95% of extreme (high) losses; it is not the 5th percentile point of the parent distribution. The 5th percentile of the extreme loss distribution is therefore on the left-hand side of the distribution of extreme losses (because it is a small extreme loss), but on the right-hand tail of the original loss distribution (because it is an extreme loss).

To see the connection between the probabilities associated with the distribution of Mn and those associated with the distribution of X, we now let Mn* be some extreme threshold value. It then follows that:

$$\Pr[M_n \le M_n^*] = p = \{\Pr[X \le M_n^*]\}^n = [\alpha]^n \tag{3.4}$$

where α is the VaR confidence level associated with the threshold Mn*. To obtain the α VaR, we now use Equation (3.4) to substitute [α]ⁿ for p in Equation (3.3), and this gives us:

$$VaR = \mu_n - \frac{\sigma_n}{\xi_n}\left[1 - (-n\ln(\alpha))^{-\xi_n}\right] \quad \text{(Frechet, } \xi > 0\text{)} \tag{3.5a}$$

$$VaR = \mu_n - \sigma_n\ln[-n\ln(\alpha)] \quad \text{(Gumbel, } \xi = 0\text{)} \tag{3.5b}$$

(Since n is now explicit, we have also subscripted the parameters with n to make explicit that in practice these would refer to the parameters associated with maxima drawn from samples of size n. This helps to avoid errors with the limiting VaRs as n gets large.) Given values for the extreme-loss distribution parameters μn, σn and (where needed) ξn, Equation (3.5) allows us to estimate the relevant VaRs. Of course, the VaR formulas given by Equation (3.5) are meant only for extremely high confidence levels, and we cannot expect them to provide accurate estimates for VaRs at low confidence levels.

Example 3.3 Gumbel VaR

For the standardised Gumbel and n = 100, the 99.5% VaR is −ln[−100 × ln(0.995)] = 0.6906, and the 99.9% VaR is −ln[−100 × ln(0.999)] = 2.3021.

Example 3.4 Frechet VaR

For the standardised Frechet with ξ = 0.2 and n = 100, the 99.5% VaR is −(1/0.2)[1 − (−100 × ln(0.995))⁻⁰·²] = 0.7406 and the 99.9% VaR is −(1/0.2)[1 − (−100 × ln(0.999))⁻⁰·²] = 2.9237. For ξ = 0.3, the 99.5% VaR is −(1/0.3)[1 − (−100 × ln(0.995))⁻⁰·³] = 0.7674 and the 99.9% VaR is −(1/0.3)[1 − (−100 × ln(0.999))⁻⁰·³] = 3.3165.

These results tell us that EV-VaRs (and, by implication, other EV risk measures) are sensitive to the value of the tail index ξn, which highlights the importance of getting a good estimate of ξn when applying EVT. This applies even if we use a Gumbel, because we should use the Gumbel only if we think ξn is insignificantly different from zero.
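A small sketch of the VaR formulas in Equation (3.5), which reproduces the values in Examples 3.3 and 3.4 and, with the parameters of Example 3.5 below, that example as well:

```python
import numpy as np

# Sketch of the GEV VaR formulas in Equation (3.5), reproducing the
# numbers in Examples 3.3-3.5 (n = 100 throughout).
def gev_var(alpha, n, mu=0.0, sigma=1.0, xi=0.0):
    if xi == 0.0:                                   # Gumbel, Equation (3.5b)
        return mu - sigma * np.log(-n * np.log(alpha))
    return mu - (sigma / xi) * (1.0 - (-n * np.log(alpha)) ** -xi)  # (3.5a)

print(gev_var(0.995, 100))                             # 0.6906 (Example 3.3)
print(gev_var(0.995, 100, xi=0.2))                     # 0.7406 (Example 3.4)
print(gev_var(0.995, 100, mu=2.0, sigma=0.7, xi=0.3))  # 2.537  (Example 3.5)
```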

Example 3.5 Realistic Frechet VaR

Suppose we wish to estimate Frechet VaRs with more realistic parameters. For US stock markets, some fairly plausible parameters are μ = 2%, σ = 0.7% and ξ = 0.3. If we put these into our Frechet VaR formula Equation (3.5a) and retain the earlier n value, the estimated 99.5% VaR (in %) is 2 − (0.7/0.3)[1 − (−100 × ln(0.995))⁻⁰·³] = 2.537, and the estimated 99.9% VaR (in %) is 2 − (0.7/0.3)[1 − (−100 × ln(0.999))⁻⁰·³] = 4.322. For the next trading day, these estimates tell us that we can be 99.5% confident of not making a loss in excess of 2.537% of the value of our portfolio, and so on.

It is also interesting to note that had we assumed a Gumbel (i.e., ξ = 0) we would have estimated these VaRs (again in %) to be 2 − 0.7 × ln[−100 × ln(0.995)] = 2.483 and 2 − 0.7 × ln[−100 × ln(0.999)] = 3.612. These are lower than the Frechet VaRs, which underlines the importance of getting the ξ right.

How do we choose between the Gumbel and the Frechet? There are various ways we can decide which EV distribution to use:

• If we are confident that we can identify the parent loss distribution, we can choose the EV distribution in whose domain of attraction the parent distribution resides. For example, if we are confident that the original distribution is a t, then we would choose the Frechet distribution because the t belongs in the domain of attraction of the Frechet. In plain English, we choose the EV distribution to which the extremes from the parent distribution will tend.

• We could test the significance of the tail index, and we might choose the Gumbel if the tail index was insignificant and the Frechet otherwise. However, this leaves us open to the danger that we might incorrectly conclude that ξ is 0, and this could lead us to underestimate our extreme risk measures.

• Given the dangers of model risk and bearing in mind that the estimated risk measure increases with the tail index, a safer option is always to choose the Frechet.

A Short-Cut EV Method

There are also short-cut ways to estimate VaR (or ES) using EV theory. These are based on the idea that if ξ > 0, the tail of an extreme loss distribution follows a power-law times a slowly varying function:

$$F(x) = k(x)\,x^{-1/\xi} \tag{3.6}$$

where k(x) varies slowly with x. For example, if we assume for convenience that k(x) is approximately constant, then Equation (3.6) becomes:

$$F(x) \approx k\,x^{-1/\xi} \tag{3.7}$$

Now consider two probabilities, a first, 'in-sample' probability p_in-sample, and a second, smaller and typically out-of-sample probability p_out-of-sample. Equation (3.7) implies:

$$p_{\text{in-sample}} \approx k\,x_{\text{in-sample}}^{-1/\xi} \quad \text{and} \quad p_{\text{out-of-sample}} \approx k\,x_{\text{out-of-sample}}^{-1/\xi} \tag{3.8}$$

which in turn implies:

$$\frac{p_{\text{in-sample}}}{p_{\text{out-of-sample}}} = \left(\frac{x_{\text{in-sample}}}{x_{\text{out-of-sample}}}\right)^{-1/\xi} \ \Rightarrow\ x_{\text{out-of-sample}} = x_{\text{in-sample}}\left(\frac{p_{\text{in-sample}}}{p_{\text{out-of-sample}}}\right)^{\xi} \tag{3.9}$$

This allows us to estimate one quantile (denoted here as x_out-of-sample) based on a known in-sample quantile x_in-sample, a known out-of-sample probability p_out-of-sample (which is known because it comes directly from our VaR confidence level), and an unknown in-sample probability p_in-sample. The latter can easily be proxied by its empirical counterpart, t/n, where n is the sample size and t the number of observations higher than x_in-sample. Using this proxy then gives us:

$$x_{\text{out-of-sample}} \approx x_{\text{in-sample}}\left(\frac{t}{n\,p_{\text{out-of-sample}}}\right)^{\xi} \tag{3.10}$$

which is easy to estimate using readily available information.

To use this approach, we take an arbitrarily chosen in-sample quantile, x_in-sample, and determine its counterpart empirical probability, t/n. We then determine our out-of-sample probability from our chosen confidence level, estimate our tail index using a suitable method, and our out-of-sample quantile estimator immediately follows from Equation (3.10).6

6 An alternative short-cut is suggested by Diebold et al. (2000). They suggest that we take logs of Equation (3.7) and estimate the log-transformed relationship using regression methods. However, their method is still relatively untried, and its reliability is doubtful because there is no easy way to ensure that the regression procedure will produce a 'sensible' estimate of the tail index.
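A minimal sketch of the short-cut in Equation (3.10); all inputs (sample size, exceedance count, in-sample quantile and tail-index estimate) are illustrative assumptions.

```python
# Sketch of the short-cut extrapolation in Equation (3.10): scale a
# known in-sample quantile out to a more extreme probability using the
# tail index. All numbers here are illustrative assumptions.
n, t = 1000, 50          # t losses exceed the in-sample quantile (t/n = 5%)
x_in = 0.025             # in-sample quantile (e.g., the 95% HS VaR)
xi = 0.25                # assumed tail-index estimate

p_out = 0.001            # out-of-sample tail probability (99.9% VaR)
x_out = x_in * (t / (n * p_out)) ** xi   # Equation (3.10)
print(f"extrapolated quantile: {x_out:.4f}")
```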



(i.e., Equation (3.5)). We can obtain estim ators using maximum above.) In the case where £ 0, Equations (3.1) and (3.13)
likelihood (ML) m ethods, regression m ethods, moment-based or together give
sem i-param etric methods.
T- “ e xp [-(1 + « - (3.14)
\ + m
ML Estimation Methods
Taking logs twice of both sides yields:
ML m ethods derive the most probable param eter estim ators
given the data, and are obtained by maximising the likelihood function. To apply an ML approach, we begin by constructing the likelihood or log-likelihood function. In the case of the Gumbel (ξ = 0) and with m observations for M_n, the log-likelihood function is:

l(\mu_n, \sigma_n) = -m\ln(\sigma_n) - \sum_{i=1}^{m}\frac{M_n^{(i)}-\mu_n}{\sigma_n} - \sum_{i=1}^{m}\exp\left(-\frac{M_n^{(i)}-\mu_n}{\sigma_n}\right)   (3.11)

Where ξ ≠ 0 the log-likelihood function is:

l(\mu_n, \sigma_n, \xi_n) = -m\ln(\sigma_n) - (1 + 1/\xi_n)\sum_{i=1}^{m}\ln\left[1 + \xi_n\frac{M_n^{(i)}-\mu_n}{\sigma_n}\right] - \sum_{i=1}^{m}\left[1 + \xi_n\frac{M_n^{(i)}-\mu_n}{\sigma_n}\right]^{-1/\xi_n}   (3.12)

which would be maximised subject to the constraint that any observation M_n^{(i)} satisfies 1 + ξ_n(M_n^{(i)} − μ_n)/σ_n > 0. The ML approach has some attractive properties (e.g., it is statistically well grounded, parameter estimators are consistent and asymptotically normal if ξ_n > −1/2, we can easily test hypotheses about parameters using likelihood ratio statistics, etc.). However, it also lacks closed-form solutions for the parameters, so the ML approach requires the use of an appropriate numerical solution method. This requires suitable software, and there is the danger that ML estimators might not be robust. In addition, because the underlying theory is asymptotic, there is also the potential for problems arising from smallness of samples.

Regression Methods

An easier method to apply is a regression method due to Gumbel (1958).⁷ To see how the method works, we begin by ordering our sample of M_n values from lowest to highest, so M_n^{(1)} ≤ M_n^{(2)} ≤ ... ≤ M_n^{(m)}. Because these are order statistics, it follows that, for large n:

E[H(M_n^{(i)})] = \frac{i}{1+m} \;\Rightarrow\; H(M_n^{(i)}) \approx \frac{i}{1+m}   (3.13)

where H(M_n^{(i)}) is the cumulative density function of maxima, and we drop all redundant scripts for convenience. (See Equation (3.1).) For ξ_n ≠ 0, substituting the GEV form of H into Equation (3.13) and taking logs gives

-\log\left(\frac{i}{1+m}\right) \approx \left[1 + \xi_n\frac{M_n^{(i)}-\mu_n}{\sigma_n}\right]^{-1/\xi_n}   (3.14)

and taking logs a second time,

\log\left[-\log\left(\frac{i}{1+m}\right)\right] \approx -\frac{1}{\xi_n}\ln\left[1 + \xi_n\frac{M_n^{(i)}-\mu_n}{\sigma_n}\right]   (3.15)

and we can obtain least squares estimates of μ_n, σ_n and ξ_n from a regression of log[−log(i/(1 + m))] against [1 + ξ_n(M_n^{(i)} − μ_n)/σ_n]. When ξ = 0, the equivalent of Equation (3.14) is:

\log\left[-\log\left(\frac{i}{1+m}\right)\right] \approx -\frac{M_n^{(i)}-\mu_n}{\sigma_n}   (3.16)

and the recovery of parameter estimates from a regression is straightforward.

Semi-Parametric Estimation Methods

We can also estimate parameters using semi-parametric methods. These are typically used to estimate the tail index ξ, and the most popular of these is the Hill estimator. This estimator is directly applied to the ordered parent loss observations. Denoting these from highest to lowest by X_1, X_2, ..., X_n, the Hill estimator \hat{\xi}_k^{(H)} is:

\hat{\xi}_k^{(H)} = \frac{1}{k}\sum_{i=1}^{k}\ln X_i - \ln X_{k+1}   (3.17)

where k, the tail threshold used to estimate the Hill estimator, has to be chosen in an appropriate way. The Hill estimator is the average of the k most extreme (i.e., tail) observations, minus the k + 1th observation, or the one next to the tail. The Hill estimator is known to be consistent and asymptotically normally distributed, but its properties in finite samples are not well understood, and there are concerns in the literature about its small-sample properties and its sensitivity to the choice of threshold k. However, these (and other) reservations notwithstanding, many EVT practitioners regard the Hill estimator as being as good as any other.⁸

The main problem in practice is choosing a cut-off value for k. We know that our tail index estimates can be sensitive to the choice of k, but theory gives us little guidance on what the value of k should be.

7 See Gumbel (1958), pp. 226, 260, 296.

8 An alternative is the Pickands estimator (see, e.g., Bassi et al. (1998), p. 125 or Longin (1996), p. 389). This estimator does not require a positive tail index (unlike the Hill estimator) and is asymptotically normal and weakly consistent under reasonable conditions, but is otherwise less efficient than the Hill estimator.
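Equation (3.17) is easy to verify numerically. The following is a minimal NumPy sketch (ours, not Dowd's; the simulated data and the grid of k values are illustrative assumptions) that computes Hill estimates over a range of k values, which is the raw ingredient of the 'Hill plots' discussed next:

```python
import numpy as np

def hill_estimator(losses, k):
    """Hill estimator of the tail index, Equation (3.17).

    losses: array of parent loss observations (positive tail values)
    k: tail threshold, i.e., the number of extreme observations used
    """
    x = np.sort(np.asarray(losses))[::-1]      # order from highest to lowest
    # mean log of the k most extreme observations, minus log of the (k+1)th
    return np.mean(np.log(x[:k])) - np.log(x[k])

# Illustrative scan over k, as one would do to build a Hill plot
rng = np.random.default_rng(42)
sample = rng.standard_t(df=4, size=1000)       # heavy-tailed toy data (true xi = 0.25)
losses = sample[sample > 0]                    # keep the loss tail
for k in (20, 40, 60, 80, 100):
    print(k, round(hill_estimator(losses, k), 3))
```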

BOX 3.1 MOMENT-BASED ESTIMATORS OF EV PARAMETERS

An alternative approach is to estimate EV parameters using empirical moments. Let m_r be the rth empirical moment of our extremes data set. Assuming ξ ≠ 0, we can adapt Embrechts et al. (1997, pp. 293-295) to show that:

m_1 = \mu - \frac{\sigma}{\xi}\left(1 - \Gamma(1-\xi)\right)

2m_2 - m_1 = \frac{\sigma}{\xi}\,\Gamma(1-\xi)\left(2^{\xi} - 1\right)

3m_3 - m_1 = \frac{\sigma}{\xi}\,\Gamma(1-\xi)\left(3^{\xi} - 1\right)

where Γ(.) is a gamma function. Dividing the last of these into the preceding one gives us an implied estimator \hat{\xi} of ξ. The first two equations can then be rearranged to give us estimators for μ and σ in terms of \hat{\xi} and sample moments m_1 and m_2:

\hat{\sigma} = \frac{(2m_2 - m_1)\,\hat{\xi}}{\Gamma(1-\hat{\xi})\left(2^{\hat{\xi}} - 1\right)}

\hat{\mu} = m_1 + \frac{\hat{\sigma}}{\hat{\xi}}\left(1 - \Gamma(1-\hat{\xi})\right)

The Gumbel equivalents are obtained by taking the limit as ξ → 0. In this case

\hat{\sigma} = \frac{2m_2 - m_1}{\ln 2}, \qquad \hat{\mu} = m_1 + \Gamma'(1)\hat{\sigma} = m_1 - 0.57722\,\hat{\sigma}

This moment-based approach is easy to apply, but it is unreliable because of the poor sampling properties of the second- and higher-order moments. However, following Hosking et al. (1985), we can obtain estimates with superior sampling properties if we replace the m_j in the above expressions with their probability-weighted counterparts w_i, where w_i = E[X(F(X))^{i-1}] for i = 1, 2, .... If we wish to, we can also replace the m_i with more general probability-weighted moments w_{r,s}, where w_{r,s} = E[X(F(X))^{r-1}(1 - F(X))^{s-1}] for r, s = 1, 2, ....
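As a numerical companion to Box 3.1 (ours, not part of the original), the implied estimator \hat{\xi} can be backed out by solving (3m_3 − m_1)/(2m_2 − m_1) = (3^ξ − 1)/(2^ξ − 1) with a root finder; the bracketing interval below assumes the heavy-tailed case ξ > 0:

```python
from scipy.optimize import brentq
from scipy.special import gamma

def implied_xi(m1, m2, m3):
    """Implied tail index from the moment relations in Box 3.1 (assumes xi > 0)."""
    ratio = (3.0 * m3 - m1) / (2.0 * m2 - m1)
    f = lambda xi: (3.0**xi - 1.0) / (2.0**xi - 1.0) - ratio
    # the ratio tends to ln(3)/ln(2) as xi -> 0, so bracket just above zero
    return brentq(f, 1e-6, 0.99)

def implied_mu_sigma(m1, m2, xi):
    """Recover sigma and mu given xi, per the Box 3.1 rearrangements."""
    sigma = (2.0 * m2 - m1) * xi / (gamma(1.0 - xi) * (2.0**xi - 1.0))
    mu = m1 + (sigma / xi) * (1.0 - gamma(1.0 - xi))
    return mu, sigma
```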

A suggestion often given to this problem is that we estimate Hill (or Pickands) estimators for a range of k values, and go for k values where the plot of estimators against k-values (hopefully) becomes more or less horizontal: if the plot stabilises and flattens out, then the plateau value should give a reasonable estimate of our tail index. This suggestion tries to extract the maximum possible information from all our data, albeit in an informal way.

To show how this might be done, Figure 3.2 shows a 'Hill plot', that is, a plot of the values of the Hill estimator against k, the tail threshold size used to estimate the Hill estimator, based on 1000 simulated observations from an underlying distribution. As we can see, the Hill estimates are a little unsteady for low values of k, but they become more stable and settle down as k gets larger, and one might suppose that the 'true' value of the tail index lies in the region of 0.18-0.20. Such a value would be plausible for many real situations, so if we met such a situation in practice we could easily persuade ourselves that this was a fair estimate.

However, in coming to such a conclusion, we are implicitly presuming that the values of the Hill estimator do indeed settle down for values of k bigger than 40. Is this assumption justified? The answer, sadly, is that it is often not. We can see why when we extend the same plot for higher values of k: despite the fact that the values of the Hill estimator looked like they were settling down as k approached 40, it turns out that they were doing nothing of the sort. This comes as a shock. In fact, the values of the Hill estimator show no sign of settling down at all. The Hill plot becomes a 'Hill horror plot' and gives us no real guidance on how we might choose k, and this means that it does not help us to determine what the value of the Hill estimator might be.⁹ The Hill horror plot is shown in Figure 3.3.

Hill horror plots can be a real problem, and it is sometimes suggested that the best practical response when meeting them is to 'patch' up the estimator and hope for the best. To illustrate this in the present context, I played around a little with the above data and soon discovered that I could obtain a fairly nice Hill plot by making a very small adjustment to the Hill formula.¹⁰ The resulting 'Hill happy plot' is shown in Figure 3.4. In this case, the values of the Hill estimator do settle down as k gets larger, and the plot suggests that we can take the best value of the tail index to be somewhere in the region of 0.15. We have therefore 'solved' the problem of the Hill horror plot.

9 Purists might point out that we might expect a badly behaved Hill estimator when using data drawn from a normal distribution. This may be true, but it misses the main point of the exercise: Hill horror plots are all too common, and occur with many non-normal distributions as well.

10 For those who are curious about it, the adjustment used is to add in the extra term -0.0015k to the Hill formula Equation (3.17).



Figure 3.2 Hill plot.
Note: Based on 1000 simulated drawings from a standard normal distribution.

Figure 3.3 Hill horror plot.


Note: Based on 1000 simulated drawings from a standard normal distribution.

Figure 3.4 'Hill happy plot.'
Note: Based on 1000 simulated drawings from a standard normal distribution.

However, this 'solution' comes at a big price: the adjustment is completely ad hoc and has no theoretical basis whatever. More to the point, we don't even know whether the answer it gives us is any good: all we really know is that we have managed to patch up the estimator to stabilise the Hill plot, but whether this actually helps us is quite a different matter.

If we have a very large sample size, we can also use an alternative method of gauging the 'right' value of k. Danielsson and de Vries (1997b) have suggested an ingenious (though rather involved) procedure based on the fact that the choice of k implies a trade-off between bias and variance. If we increase k, we get more data and so move to the centre of the distribution. This increases the precision of our estimator (and therefore reduces its variance), but also increases the bias of the tail estimator by placing relatively more weight on observations closer to the centre of our distribution. Alternatively, if we decrease k and move further out along the tail, we decrease bias but have fewer data to work with and get a higher variance. These authors suggest that we choose the value of k to minimise a mean-squared-error (MSE) loss function, which reflects an optimal trade-off, in an MSE sense, between bias and variance. The idea is that we take a second-order approximation to the tail of the distribution function F(x), and exploit the point that the tail size is optimal in an asymptotic mean-squared-error sense where bias and variance disappear at the same rate. This optimal size can be found by a subsample bootstrap procedure. However, this approach requires a large sample size (at least 1500 observations) and is therefore impractical with small sample sizes. In addition, any automatic procedure for selecting k tends to ignore other, softer, but nonetheless often very useful, information, and this leads some writers to be somewhat sceptical of such methods.

3.2 THE PEAKS-OVER-THRESHOLD APPROACH: THE GENERALISED PARETO DISTRIBUTION

Theory

We turn now to the second strand of the EV literature, which deals with the application of EVT to the distribution of excess losses over a (high) threshold. This gives rise to the peaks-over-threshold (POT) or generalised Pareto approach, which (generally) requires fewer parameters than EV approaches based on the generalised extreme value theorem.



BOX 3.2 ESTIMATING VaR UNDER MAXIMUM DOMAIN OF ATTRACTION CONDITIONS

We have assumed so far that our maxima data were drawn exactly from the GEV. But what happens if our data are only approximately GEV distributed (i.e., are drawn from the maximum domain of attraction of the GEV)? The answer is that the analysis becomes somewhat more involved. Consider the Fréchet case where ξ = 1/α > 0. The far-right tail \bar{F}(x) = 1 − F(x) is now \bar{F}(x) = x^{-\alpha}L(x) for some slowly varying function L. However, the fact that the data are drawn from the maximum domain of attraction of the Fréchet also means that

\lim_{n\to\infty} n\,\bar{F}(c_n x + d_n) = -\log H_\xi(x)

where H_ξ(x) is the standardised (0 location, unit scale) Fréchet, and c_n and d_n are appropriate norming (or scaling) parameters. Invoking Equation (3.1), it follows for large u = c_n x + d_n that

\bar{F}(u) \approx \frac{1}{n}\left(1 + \xi\,\frac{u - d_n}{c_n}\right)^{-1/\xi}

This leads to the quantile estimator

\hat{x}_p = \hat{d}_n + \frac{\hat{c}_n}{\hat{\xi}}\left([n(1-p)]^{-\hat{\xi}} - 1\right)

for appropriate parameter estimators \hat{c}_n, \hat{d}_n and \hat{\xi}, and some high probability (confidence level) p. The problem is then to estimate p-quantiles outside the range of the data where the empirical tail \bar{F}() = 0. The standard approach to this problem is a subsequence trick: in effect, we replace n with n/k. This yields the quantile estimator

\hat{x}_p = \hat{d}_{n/k} + \frac{\hat{c}_{n/k}}{\hat{\xi}}\left([(n/k)(1-p)]^{-\hat{\xi}} - 1\right)

\hat{c}_{n/k} and \hat{d}_{n/k} can be obtained using suitable semi-parametric methods, and \hat{\xi} can be obtained using the usual Hill or other tail index approaches.¹¹

The POT approach provides the natural way to model exceedances over a high threshold, in the same way that GEV theory provides the natural way to model the maxima or minima of a large sample.

If X is a random iid loss with distribution function F(x), and u is a threshold value of X, we can define the distribution of excess losses over our threshold u as:

F_u(x) = \Pr\{X - u \le x \mid X > u\} = \frac{F(x+u) - F(u)}{1 - F(u)}   (3.18)

for x > 0. This gives the probability that a loss exceeds the threshold u by at most x, given that it does exceed the threshold. The distribution of X itself can be any of the commonly used distributions: normal, lognormal, t, etc., and will usually be unknown to us. However, as u gets large, the Gnedenko-Pickands-Balkema-deHaan (GPBdH) theorem states that the distribution F_u(x) converges to a generalised Pareto distribution, given by:

G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - \exp(-x/\beta) & \text{if } \xi = 0 \end{cases}   (3.19)

defined for x ≥ 0 when ξ ≥ 0, and 0 ≤ x ≤ −β/ξ when ξ < 0. This distribution has only two parameters: a positive scale parameter, β, and a shape or tail index parameter, ξ, that can be positive, zero or negative. This latter parameter is the same as the tail index encountered already with GEV theory. The cases that usually interest us are the first two, and particularly the first (i.e., ξ > 0), as this corresponds to data being heavy tailed.

The GPBdH theorem is a very useful result, because it tells us that the distribution of excess losses always has the same form (in the limit, as the threshold gets high), pretty much regardless of the distribution of the losses themselves. Provided the threshold is high enough, we should therefore regard the GP distribution as the natural model for excess losses.

To apply the GP distribution, we need to choose a reasonable threshold u, which determines the number of observations, N_u, in excess of the threshold value. Choosing u involves a trade-off: we want a threshold u to be sufficiently high for the GPBdH theorem to apply reasonably closely; but if u is too high, we won't have enough excess-threshold observations on which to make reliable estimates. We also need to estimate the parameters ξ and β. As with the GEV distributions, we can estimate these using maximum likelihood approaches or semi-parametric approaches.

We now rearrange the right-hand side of Equation (3.18) and move from the distribution of exceedances over the threshold to the parent distribution F(x) defined over 'ordinary' losses:

F(x) = (1 - F(u))\,G_{\xi,\beta}(x - u) + F(u), \quad x > u   (3.20)

11 For more on estimation under maximum domain of attraction conditions, see Embrechts et al. (1997, section 6.4).

To make use of this equation, we need an estimate of F(u), the proportion of observations that do not exceed the threshold, and the most natural estimator is the observed proportion of below-threshold observations, (n − N_u)/n. We then substitute this for F(u), and plug Equation (3.19) into Equation (3.20):

F(x) = 1 - \frac{N_u}{n}\left[1 + \xi\,\frac{x-u}{\beta}\right]^{-1/\xi}   (3.21)

The VaR is given by the x-value in Equation (3.21), which can be recovered by inverting Equation (3.21) and rearranging to get:

\text{VaR} = u + \frac{\beta}{\xi}\left\{\left[\frac{n}{N_u}(1-\alpha)\right]^{-\xi} - 1\right\}   (3.22)

where α, naturally, is the VaR confidence level.

The ES is then equal to the VaR plus the mean-excess loss over VaR. Provided ξ < 1, our ES is:

\text{ES} = \frac{\text{VaR}}{1-\xi} + \frac{\beta - \xi u}{1-\xi}   (3.23)

Example 3.6 POT risk measures

Suppose we set our parameters at some empirically plausible values denominated in % terms (i.e., β = 0.8, ξ = 0.15, u = 2% and N_u/n = 4%; these are based on the empirical values associated with contracts on futures clearinghouses). The 99.5% VaR (in %) is therefore

VaR = 2 + (0.8/0.15) × {[(1/0.04)(1 − 0.995)]^{−0.15} − 1} = 3.952

The corresponding ES (in %) is

ES = 3.952/(1 − 0.15) + (0.8 − 0.15 × 2)/(1 − 0.15) = 5.238

If we change the confidence level to 99.9%, the VaR and ES are easily shown to be 5.942 and 7.578.
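The closed forms in Equations (3.22) and (3.23) translate directly into code; this short sketch (ours, not Dowd's) reproduces the Example 3.6 figures:

```python
def pot_var(u, beta, xi, tail_frac, alpha):
    """POT VaR, Equation (3.22); tail_frac is N_u / n."""
    return u + (beta / xi) * (((1.0 - alpha) / tail_frac) ** (-xi) - 1.0)

def pot_es(var, u, beta, xi):
    """POT expected shortfall, Equation (3.23); requires xi < 1."""
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

beta, xi, u, tail_frac = 0.8, 0.15, 2.0, 0.04     # Example 3.6 inputs (% terms)
for alpha in (0.995, 0.999):
    v = pot_var(u, beta, xi, tail_frac, alpha)
    print(alpha, round(v, 3), round(pot_es(v, u, beta, xi), 3))
# prints approximately: 0.995 3.952 5.238, then 0.999 5.942 7.578
```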

Estimation

To obtain estimates, we need to choose a reasonable threshold u, which then determines the number of excess-threshold observations, N_u. The choice of threshold is the weak spot of POT theory: it is inevitably arbitrary and therefore judgmental. Choosing u also involves a trade-off: we want the threshold u to be sufficiently high for the GPBdH theorem to apply reasonably closely; but if u is too high, we will not have enough excess-threshold observations from which to obtain reliable estimates. This threshold problem is very much akin to the problem of choosing k to estimate the tail index. We can also (if we are lucky!) deal with it in a similar way. In this case, we would plot the mean-excess function, and choose a threshold where the MEF becomes horizontal. We also need to estimate the parameters ξ and β and, as with the earlier GEV approaches, we can estimate these using maximum likelihood or other appropriate methods.¹² Perhaps the most reliable are the ML approaches, which involve the maximisation of the following log-likelihood:

l(\xi,\beta) = \begin{cases} -m\ln\beta - (1 + 1/\xi)\sum_{i=1}^{m}\ln(1 + \xi X_i/\beta) & \text{if } \xi \ne 0 \\ -m\ln\beta - (1/\beta)\sum_{i=1}^{m} X_i & \text{if } \xi = 0 \end{cases}   (3.24)

subject to the conditions on which G_{ξ,β}(x) is defined. Provided ξ > −0.5, ML estimators are asymptotically normal, and therefore (relatively) well behaved.

GEV vs POT

Both GEV and POT approaches are different manifestations of the same underlying EV theory, although one is geared towards the distribution of extremes as such, whereas the other is geared towards the distribution of exceedances over a high threshold. In theory, there is therefore not too much to choose between them, but in practice there may sometimes be reasons to prefer one over the other:

• One might be more natural in a given context than the other (e.g., we may have limited data that would make one preferable).

• The GEV typically involves an additional parameter relative to the POT, and the most popular GEV approach, the block maxima approach (which we have implicitly assumed so far), can involve some loss of useful data relative to the POT approach, because some blocks might have more than one extreme in them. Both of these are disadvantages of the GEV relative to the POT.

• On the other hand, the POT approach requires us to grapple with the problem of choosing the threshold, and this problem does not arise with the GEV.

However, at the end of the day, either approach is usually reasonable, and one should choose the one that seems to best suit the problem at hand.

12 We can also estimate these parameters using moment-based methods, as for the GEV parameters (see Box 3.1). For the GPD, the parameter estimators are \hat{\beta} = 2m_1m_2/(m_1 - 2m_2) and \hat{\xi} = 2 - m_1/(m_1 - 2m_2) (see, e.g., Embrechts et al. (1997), p. 358). However, as with their GEV equivalents, moment-based estimators can be unreliable, and the probability-weighted or ML ones are usually to be preferred.



3.3 REFINEMENTS TO EV APPROACHES

Having outlined the basics of EVT and its implementation, we now consider some refinements to it. These fall under three headings:

• Conditional EV.

• Dealing with dependent (or non-iid) data.

• Multivariate EVT.

Conditional EV

The EVT procedures described above are all unconditional: they are applied directly (i.e., without any adjustment) to the random variable of interest, X. As with other unconditional applications, unconditional EVT is particularly useful when forecasting VaR or ES over a long horizon period. However, it will sometimes be the case that we wish to apply EVT to X adjusted for (i.e., conditional on) some dynamic structure, and this involves distinguishing between X and the random factors driving it. This conditional or dynamic EVT is most useful when we are dealing with a short horizon period, and where X has a dynamic structure that we can model. A good example is where X might be governed by a GARCH process. In such circumstances we might want to take account of the GARCH process and apply EVT not to the raw return process itself, but to the random innovations that drive it.

One way to take account of this dynamic structure is to estimate the GARCH process and apply EVT to its residuals. This suggests the following two-step procedure (a code sketch follows the list):¹³

• We estimate a GARCH-type process (e.g., a simple GARCH, etc.) by some appropriate econometric method and extract its residuals. These should turn out to be iid. The GARCH-type model can then be used to make one-step ahead predictions of next period's location and scale parameters, μ_{t+1} and σ_{t+1}.

• We apply EVT to these residuals, and then derive VaR estimates taking account of both the dynamic (i.e., GARCH) structure and the residual process.
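The following is a minimal, hedged sketch of this two-step procedure, assuming the third-party arch package for the GARCH step and the POT quantile formula of Section 3.2 for the residual tail; the model settings, tail fraction, and percentage scaling are illustrative assumptions, not part of the original text:

```python
import numpy as np
import pandas as pd
from arch import arch_model          # assumption: the 'arch' package is installed
from scipy import stats

def garch_evt_var(returns: pd.Series, alpha=0.99, tail_frac=0.05) -> float:
    """Two-step conditional EVT VaR sketch: GARCH filter, then POT on residuals."""
    # Step 1: fit a GARCH(1,1) and extract (approximately iid) standardized residuals
    res = arch_model(100 * returns, vol="GARCH", p=1, q=1).fit(disp="off")
    z = res.resid / res.conditional_volatility
    # one-step ahead forecasts of next period's location and scale
    fc = res.forecast(horizon=1)
    mu_next = float(fc.mean.iloc[-1, 0])
    sigma_next = float(np.sqrt(fc.variance.iloc[-1, 0]))
    # Step 2: POT applied to the loss tail of the standardized residuals
    losses = -z.dropna().to_numpy()
    u = np.quantile(losses, 1.0 - tail_frac)
    excess = losses[losses > u] - u
    xi, _, beta = stats.genpareto.fit(excess, floc=0)
    z_q = u + (beta / xi) * (((1.0 - alpha) / tail_frac) ** (-xi) - 1.0)  # Eq. (3.22)
    # conditional VaR for next period's return (a loss), undoing the 100x scaling
    return (-mu_next + sigma_next * z_q) / 100.0
```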

Dealing with Dependent (or Non-iid) Data

We have assumed so far that the stochastic process driving our data is iid, but most financial returns exhibit some form of time dependency (or pattern over time). This time dependency usually takes the form of clustering, where high/low observations are clustered together. Clustering matters for a number of reasons:

• It violates an important premise on which the earlier results depend, and the statistical implications of clustering are not well understood.¹⁴

• There is evidence that data dependence can produce very poor estimator performance.¹⁵

• Clustering alters the interpretation of our results. For example, we might say that there is a certain quantile or VaR value that we would expect to be exceeded, on average, only once every so often. But if data are clustered, we do not know how many times to expect this value to be breached in any given period: how frequently it is breached will depend on the tendency of the breaches to be clustered. Clustering therefore has an important effect on the interpretation of our results.

There are two simple methods of dealing with time dependency in our data. Perhaps the most common (and certainly the easiest) is just to apply GEV distributions to block maxima. This is the simplest and most widely used approach. It exploits the point that maxima are usually less clustered than the underlying data from which they are drawn, and become even less clustered as the periods of time from which they are drawn get longer. We can therefore completely eliminate time dependence if we choose long enough block periods. This block maxima approach is very easy to use, but involves some efficiency loss, because we throw away extreme observations that are not block maxima. There is also the drawback that there is no clear guide about how long the block periods should be, which leads to a new bandwidth problem comparable to the earlier problem of how to select k.

A second solution to the problem of clustering is to estimate the tail of the conditional distribution rather than the unconditional one: we would first estimate the conditional volatility model (e.g., via a GARCH procedure), and then estimate the tail index of conditional standardized data. The time dependency in our data is then picked up by the deterministic part of our model, and we can treat the random process as independent.¹⁶

13 This procedure is developed in more detail by McNeil and Frey (2000).

14 See, e.g., Kearns and Pagan (1997).

15 See McNeil (1998), p. 13.

16 There is also a third, more advanced but also more difficult, solution. This is to estimate an extremal index, a measure of clustering, and use this index to adjust our quantiles for clustering. For more details on the extremal index and how to use it, see, e.g., Embrechts et al. (1997, Chapter 8.1).

Multivariate EVT

We have been dealing so far with univariate EVT, but there also exists multivariate extreme value theory (MEVT), which can be used to model the tails of multivariate distributions in a theoretically appropriate way. The key issue here is how to model the dependence structure of extreme events. To appreciate this issue, it is again important to recognise how EV theory differs from more familiar central-value theory. As we all know, when dealing with central values, we often rely on the central limit theorem to justify the assumption of a normal (or more broadly, elliptical) distribution. When we have such a distribution, the dependence structure can then be captured by the (linear) correlations between the different variables. Given our distributional assumptions, knowledge of variances and correlations (or, if we like, covariances) suffices to specify the multivariate distribution. This is why correlations are so important in central-value theory.

However, this logic does not carry over to extremes. When we go beyond elliptical distributions, correlation no longer suffices to describe the dependence structure. Instead, the modeling of multivariate extremes requires us to make use of copulas. MEVT tells us that the limiting distribution of multivariate extreme values will be a member of the family of EV copulas, and we can model multivariate EV dependence by assuming one of these EV copulas. In theory, our copulas can also have as many dimensions as we like, reflecting the number of random variables to be considered. However, there is a curse of dimensionality here. For example, if we have two independent variables and classify univariate extreme events as those that occur one time in 100, then we should expect to see one multivariate extreme event (i.e., both variables taking extreme values) only one time in 100², or one time in 10,000 observations; with three independent variables, we should expect to see a multivariate extreme event one time in 100³, or one time in 1,000,000 observations, and so on. As the dimensionality rises, our multivariate EV events rapidly become much rarer: we have fewer multivariate extreme observations to work with, and more parameters to estimate. There is clearly a limit to how many dimensions we can handle.

One might be tempted to conclude from this example that multivariate extremes are sufficiently rare that we need not worry about them. However, this would be a big mistake. Even in theory, the occurrence of multivariate extreme events depends on their joint distribution, and extreme events cannot be assumed to be independent. Instead the occurrence of such events is governed by the tail dependence of the multivariate distribution. Indeed, it is for exactly this reason that tail dependence is the central focus of MEVT. And, as a matter of empirical fact, it is manifestly obvious that (at least some) extreme events are not independent: a major earthquake can trigger other natural or financial disasters (e.g., tsunamis or market crashes). We all know that disasters are often related. It is therefore important for risk managers to have some awareness of multivariate extreme risks.

3.4 CONCLUSIONS

EVT provides a tailor-made approach to the estimation of extreme probabilities and quantiles. It is intuitive and plausible; and it is relatively easy to apply, at least in its more basic forms. It also gives us considerable practical guidance on what we should estimate and how we should do it; and it has a good track record. It therefore provides the ideal, tailor-made, way to estimate extreme risk measures.

EVT is also important in what it tells us not to do, and the most important point is not to use distributions justified by central limit theory, most particularly the normal or Gaussian distribution, for extreme-value estimation. If we wish to estimate extreme risks, we should do so using the distributions suggested by EVT, not arbitrary distributions (such as the normal) that go against what EVT tells us.

But we should not lose sight of the limitations of EV approaches, and certain limitations stand out:

• EV problems are intrinsically difficult, because by definition we always have relatively few extreme-value observations to work with. This means that any EV estimates will necessarily be very uncertain, relative to any estimates we might make of more central quantiles or probabilities. EV estimates will therefore have relatively wide confidence intervals attached to them. This uncertainty is not a fault of EVT as such, but an inevitable consequence of our paucity of data.

• EV estimates are subject to considerable model risk. We have to make various assumptions in order to carry out extreme-value estimations, and our results will often be very sensitive to the precise assumptions we make. At the same time, the veracity or otherwise of these assumptions can be difficult to verify in practice. Hence, our estimates are often critically dependent on assumptions that are effectively unverifiable. EVT also requires us to make ancillary decisions about threshold values and the like, and there are no easy ways to make those decisions: the application of EV methods involves a lot of subjective 'judgment'. Because of this uncertainty, it is especially important with extremes to estimate confidence intervals for our estimated risk measures and to subject the latter to stress testing.



• Because we have so little data and the theory we have is (mostly) asymptotic, EV estimates can be very sensitive to small sample effects, biases, non-linearities, and other unpleasant problems.

In the final analysis, we need to make the best use of theory while acknowledging that the paucity of our data inevitably limits the reliability of our results. To quote McNeil:

    We are working in the tail . . . and we have only a limited amount of data which can help us. The uncertainty in our analyses is often high, as reflected by large confidence intervals. . . . However, if we wish to quantify rare events we are better off using the theoretically supported methods of EVT than other ad hoc approaches. EVT gives the best estimates of extreme events and represents the most honest approach to measuring the uncertainty inherent in the problem.¹⁷

Thus EVT has a very useful, albeit limited, role to play in risk measurement. As Diebold et al. nicely put it:

    EVT is here to stay, but we believe that best-practice applications of EVT to financial risk management will benefit from awareness of its limitations, as well as its strengths. When the smoke clears, the contribution of EVT remains basic and useful: It helps us to draw smooth curves through the extreme tails of empirical survival functions in a way that is guided by powerful theory. . . . [But] we shouldn't ask more of the theory than it can deliver.¹⁸

17 McNeil (1998, p. 18).

18 Diebold et al. (2000), p. 34.
Chapter 4 Backtesting VaR

Learning Objectives

After completing this reading you should be able to:

• Define backtesting and exceptions and explain the importance of backtesting VaR models.

• Explain the significant difficulties in backtesting a VaR model.

• Verify a model based on exceptions or failure rates.

• Define and identify Type I and Type II errors.

• Explain the need to consider conditional coverage in the backtesting framework.

• Describe the Basel rules for backtesting.

Excerpt is Chapter 6 of Value at Risk: The New Benchmark for Managing Financial Risk, Third Edition, by Philippe Jorion.
    Disclosure of quantitative measures of market risk, such as value-at-risk, is enlightening only when accompanied by a thorough discussion of how the risk measures were calculated and how they related to actual performance.

    - Alan Greenspan (1996)

Value-at-risk (VaR) models are only useful insofar as they predict risk reasonably well. This is why the application of these models always should be accompanied by validation. Model validation is the general process of checking whether a model is adequate. This can be done with a set of tools, including backtesting, stress testing, and independent review and oversight.

This chapter turns to backtesting techniques for verifying the accuracy of VaR models. Backtesting is a formal statistical framework that consists of verifying that actual losses are in line with projected losses. This involves systematically comparing the history of VaR forecasts with their associated portfolio returns.

These procedures, sometimes called reality checks, are essential for VaR users and risk managers, who need to check that their VaR forecasts are well calibrated. If not, the models should be reexamined for faulty assumptions, wrong parameters, or inaccurate modeling. This process also provides ideas for improvement and as a result should be an integral part of all VaR systems.

Backtesting is also central to the Basel Committee's groundbreaking decision to allow internal VaR models for capital requirements. It is unlikely the Basel Committee would have done so without the discipline of a rigorous backtesting mechanism. Otherwise, banks may have an incentive to understate their risk. This is why the backtesting framework should be designed to maximize the probability of catching banks that willfully understate their risk. On the other hand, the system also should avoid unduly penalizing banks whose VaR is exceeded simply because of bad luck. This delicate choice is at the heart of statistical decision procedures for backtesting.

This chapter first provides an actual example of model verification and discusses important data issues for the setup of VaR backtesting, then presents the main method for backtesting, which consists of counting deviations from the VaR model. It also describes the supervisory framework by the Basel Committee for backtesting the internal-models approach. Finally, practical uses of VaR backtesting are illustrated.

4.1 SETUP FOR BACKTESTING

VaR models are only useful insofar as they can be demonstrated to be reasonably accurate. To do this, users must check systematically the validity of the underlying valuation and risk models through comparison of predicted and actual loss levels.

When the model is perfectly calibrated, the number of observations falling outside VaR should be in line with the confidence level. The number of exceedences is also known as the number of exceptions. With too many exceptions, the model underestimates risk. This is a major problem because too little capital may be allocated to risk-taking units; penalties also may be imposed by the regulator. Too few exceptions are also a problem because they lead to excess, or inefficient, allocation of capital across units.

An Example

An example of model calibration is described in Figure 4.1, which displays the fit between actual and forecast daily VaR numbers for Bankers Trust. The diagram shows the absolute value of the daily profit and loss (P&L) against the 99 percent VaR, defined here as the daily price volatility.¹ The graph shows substantial time variation in the VaR measures, which reflects changes in the risk profile of the bank. Observations that lie above the diagonal line indicate days when the absolute value of the P&L exceeded the VaR.

Assuming symmetry in the P&L distribution, about 2 percent of the daily observations (both positive and negative) should lie above the diagonal, or about 5 data points in a year. Here we observe four exceptions. Thus the model seems to be well calibrated. We could have observed, however, a greater number of deviations simply owing to bad luck. The question is: At what point do we reject the model?

Figure 4.1 Model evaluation: Bankers Trust.

Which Return?

Before we even start addressing the statistical issue, a serious data problem needs to be recognized. VaR measures assume that the current portfolio is "frozen" over the horizon.

1 Note that the graph does not differentiate losses from gains. This is typically the case because companies usually are reluctant to divulge the extent of their trading losses. This illustrates one of the benefits of VaR relative to other methods, namely, that by taking the absolute value, it hides the direction of the positions.

In practice, the trading portfolio evolves dynamically during the day. Thus the actual portfolio is "contaminated" by changes in its composition. The actual return corresponds to the actual P&L, taking into account intraday trades and other profit items such as fees, commissions, spreads, and net interest income.

This contamination will be minimized if the horizon is relatively short, which explains why backtesting usually is conducted on daily returns. Even so, intraday trading generally will increase the volatility of revenues because positions tend to be cut down toward the end of the trading day. Counterbalancing this is the effect of fee income, which generates steady profits that may not enter the VaR measure.

For verification to be meaningful, the risk manager should track both the actual portfolio return R_t and the hypothetical return R*_t that most closely matches the VaR forecast. The hypothetical return R*_t represents a frozen portfolio, obtained from fixed positions applied to the actual returns on all securities, measured from close to close.

Sometimes an approximation is obtained by using a cleaned return, which is the actual return minus all non-mark-to-market items, such as fees, commissions, and net interest income. Under the latest update to the market-risk amendment, supervisors will have the choice to use either hypothetical or cleaned returns.²

Since the VaR forecast really pertains to R*, backtesting ideally should be done with these hypothetical returns. Actual returns do matter, though, because they entail real profits and losses and are scrutinized by bank regulators. They also reflect the true ex post volatility of trading returns, which is also informative. Ideally, both actual and hypothetical returns should be used for backtesting because both sets of numbers yield informative comparisons. If, for instance, the model passes backtesting with hypothetical but not actual returns, then the problem lies with intraday trading. In contrast, if the model does not pass backtesting with hypothetical returns, then the modeling methodology should be reexamined.

4.2 MODEL BACKTESTING WITH EXCEPTIONS

Model backtesting involves systematically comparing historical VaR measures with the subsequent returns. The problem is that since VaR is reported only at a specified confidence level, we expect the figure to be exceeded in some instances, for example, in 5 percent of the observations at the 95 percent confidence level. But surely we will not observe exactly 5 percent exceptions. A greater percentage could occur because of bad luck, perhaps 8 percent. At some point, if the frequency of deviations becomes too large, say, 20 percent, the user must conclude that the problem lies with the model, not bad luck, and undertake corrective action. The issue is how to make this decision. This accept or reject decision is a classic statistical decision problem.

At the outset, it should be noted that this decision must be made at some confidence level. The choice of this level for the test, however, is not related to the quantitative level p selected for VaR. The decision rule may involve, for instance, a 95 percent confidence level for backtesting VaR numbers, which are themselves constructed at some confidence level, say, 99 percent for the Basel rules.

Model Verification Based on Failure Rates

The simplest method to verify the accuracy of the model is to record the failure rate, which gives the proportion of times VaR is exceeded in a given sample. Suppose a bank provides a VaR figure at the 1 percent left-tail level (p = 1 − c) for a total of T days. The user then counts how many times the actual loss exceeds the previous day's VaR.

2 See BCBS (2005b).



Define N as the number of exceptions and N/T as the failure rate. Ideally, the failure rate should give an unbiased measure of p, that is, should converge to p as the sample size increases.

We want to know, at a given confidence level, whether N is too small or too large under the null hypothesis that p = 0.01 in a sample of size T. Note that this test makes no assumption about the return distribution. The distribution could be normal, or skewed, or with heavy tails, or time-varying. We simply count the number of exceptions. As a result, this approach is fully nonparametric.

The setup for this test is the classic testing framework for a sequence of success and failures, also called Bernoulli trials. Under the null hypothesis that the model is correctly calibrated, the number of exceptions x follows a binomial probability distribution:

f(x) = \binom{T}{x} p^{x} (1-p)^{T-x}   (4.1)

We also know that x has expected value of E(x) = pT and variance V(x) = p(1 − p)T. When T is large, we can use the central limit theorem and approximate the binomial distribution by the normal distribution

z = \frac{x - pT}{\sqrt{p(1-p)T}} \approx N(0, 1)   (4.2)

which provides a convenient shortcut. If the decision rule is defined at the two-tailed 95 percent test confidence level, then the cutoff value of |z| is 1.96. Box 4.1 illustrates how this can be used in practice.

BOX 4.1 J.P. MORGAN'S EXCEPTIONS

In its 1998 annual report, the U.S. commercial bank J.P. Morgan (JPM) explained that:

    In 1998, daily revenue fell short of the downside (95 percent VaR) band . . . on 20 days, or more than 5 percent of the time. Nine of these 20 occurrences fell within the August to October period.

We can test whether this was bad luck or a faulty model, assuming 252 days in the year. Based on Equation (4.2), we have z = (x − pT)/√(p(1 − p)T) = (20 − 0.05 × 252)/√(0.05 × 0.95 × 252) = 2.14. This is larger than the cutoff value of 1.96. Therefore, we reject the hypothesis that the VaR model is unbiased. It is unlikely (at the 95 percent test confidence level) that this was bad luck.

The bank suffered too many exceptions, which must have led to a search for a better model. The flaw probably was due to the assumption of a normal distribution, which does not model tail risk adequately. Indeed, during the fourth quarter of 1998, the bank reported having switched to a "historical simulation" model that better accounts for fat tails. This episode illustrates how backtesting can lead to improved models.

This binomial distribution can be used to test whether the number of exceptions is acceptably small. Figure 4.2 describes the distribution when the model is calibrated correctly, that is, when p = 0.01 and with 1 year of data, T = 250. The graph shows that under the null, we would observe more than four exceptions 10.8 percent of the time. The 10.8 percent number describes the probability of committing a type 1 error, that is, rejecting a correct model.

Figure 4.2 Distribution of exceptions when model is correct (p = 0.01, T = 250 observations).

Next, Figure 4.3 describes the distribution of the number of exceptions when the model is calibrated incorrectly, that is, when p = 0.03 instead of 0.01. The graph shows that we will not reject the incorrect model more than 12.8 percent of the time. This describes the probability of committing a type 2 error, that is, not rejecting an incorrect model.
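These calculations are easy to verify; a short sketch using scipy.stats.binom (ours, not Jorion's) reproduces the Box 4.1 z-statistic and the error rates behind Figures 4.2 and 4.3:

```python
from math import sqrt
from scipy.stats import binom

def z_stat(x, T, p):
    """Normal approximation to the exception count, Equation (4.2)."""
    return (x - p * T) / sqrt(p * (1 - p) * T)

# Box 4.1: 20 exceptions in 252 days against a 95 percent VaR (p = 0.05)
print(z_stat(20, 252, 0.05))        # about 2.14 > 1.96, so reject the model

# Figure 4.2: type 1 error, P(more than 4 exceptions | p = 0.01, T = 250)
print(binom.sf(4, 250, 0.01))       # about 0.108

# Figure 4.3: type 2 error, P(4 or fewer exceptions | p = 0.03, T = 250)
print(binom.cdf(4, 250, 0.03))      # about 0.128
```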

Figure 4.3 Distribution of exceptions when model is incorrect (p = 0.03, T = 250 observations).

When designing a verification test, the user faces a trade-off between these two types of error. Table 4.1 summarizes the two states of the world, correct versus incorrect model, and the decision. For backtesting purposes, users of VaR models need to balance type 1 errors against type 2 errors. Ideally, one would want to set a low type 1 error rate and then have a test that creates a very low type 2 error rate, in which case the test is said to be powerful. It should be noted that the choice of the confidence level for the decision rule is not related to the quantitative level p selected for VaR. This confidence level refers to the decision rule to reject the model.

Table 4.1 Decision Errors

                        Model
Decision      Correct            Incorrect
Accept        OK                 Type 2 error
Reject        Type 1 error       OK

Kupiec (1995) develops approximate 95 percent confidence regions for such a test, which are reported in Table 4.2. These regions are defined by the tail points of the log-likelihood ratio:

LR_{uc} = -2\ln\left[(1-p)^{T-N}p^{N}\right] + 2\ln\left[\left(1-\frac{N}{T}\right)^{T-N}\left(\frac{N}{T}\right)^{N}\right]   (4.3)

which is asymptotically (i.e., when T is large) distributed chi-square with one degree of freedom under the null hypothesis that p is the true probability. Thus we would reject the null hypothesis if LR > 3.841. This test is equivalent to Equation (4.2) because a chi-square variable is the square of a normal variable.

In the JPM example, we had N = 20 exceptions over T = 252 days, using a 95 percent VaR confidence level (p = 0.05). Setting these numbers into Equation (4.3) gives LR_uc = 3.91. Therefore, we reject unconditional coverage, as expected.

For instance, with 2 years of data (T = 510), we would expect to observe N = pT = 1 percent times 510 = 5 exceptions. But the VaR user will not be able to reject the null hypothesis as long as N is within the [1 < N < 11] confidence interval. Values of N greater than or equal to 11 indicate that the VaR is too low or that the model understates the probability of large losses. Values of N less than or equal to 1 indicate that the VaR model is overly conservative.

The table also shows that this interval, expressed as a proportion N/T, shrinks as the sample size increases. Select, for instance, the p = 0.05 row. The interval for T = 252 is [6/252 = 0.024, 20/252 = 0.079]; for T = 1000, it is [37/1000 = 0.037, 65/1000 = 0.065].

Table 4.2 Model Backtesting, 95 Percent Nonrejection Test Confidence Regions

                                             Nonrejection Region for Number of Failures N
Probability Level p    VaR Confidence Level c    T = 252 Days     T = 510 Days     T = 1000 Days
0.01                   99%                       N < 7            1 < N < 11       4 < N < 17
0.025                  97.5%                     2 < N < 12       6 < N < 21       15 < N < 36
0.05                   95%                       6 < N < 20       16 < N < 36      37 < N < 65
0.075                  92.5%                     11 < N < 28      27 < N < 51      59 < N < 92
0.10                   90%                       16 < N < 36      38 < N < 65      81 < N < 120

Note: N is the number of failures that could be observed in a sample size T without rejecting the null hypothesis that p is the correct probability at the 95 percent level of test confidence.

Source: Adapted from Kupiec (1995).
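Equation (4.3) can be checked directly; a minimal sketch (the function name is ours) reproduces the LR statistic for the JPM example:

```python
import numpy as np

def lr_uc(N, T, p):
    """Kupiec unconditional coverage likelihood ratio, Equation (4.3)."""
    phat = N / T
    log_null = (T - N) * np.log(1 - p) + N * np.log(p)
    log_obs = (T - N) * np.log(1 - phat) + N * np.log(phat)
    return -2.0 * log_null + 2.0 * log_obs

# JPM example: N = 20 exceptions in T = 252 days against a 95 percent VaR (p = 0.05)
print(lr_uc(20, 252, 0.05))   # about 3.91 > 3.841, so reject unconditional coverage
```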



Note how the interval shrinks as the sample size extends. With more data, we should be able to reject the model more easily if it is false.

The table, however, points to a disturbing fact. For small values of the VaR parameter p, it becomes increasingly difficult to confirm deviations. For instance, the nonrejection region under p = 0.01 and T = 252 is [N < 7]. Therefore, there is no way to tell if N is abnormally small or whether the model systematically overestimates risk. Intuitively, detection of systematic biases becomes increasingly difficult for low values of p because the exceptions in these cases are very rare events.

This explains why some banks prefer to choose a higher VaR confidence level, such as c = 95 percent, in order to be able to observe sufficient numbers of deviations to validate the model. A multiplicative factor then is applied to translate the VaR figure into a safe capital cushion number. Too often, however, the choice of the confidence level appears to be made without regard for the issue of VaR backtesting.

The Basel Rules

This section now turns to a detailed analysis of the Basel Committee rules for backtesting. While we can learn much from the Basel framework, it is important to recognize that regulators operate under different constraints from financial institutions. Since they do not have access to every component of the models, the approach is perforce implemented at a broader level. Regulators are also responsible for constructing rules that are comparable across institutions.

The Basel (1996a) rules for backtesting the internal-models approach are derived directly from this failure rate test. To design such a test, one has to choose first the type 1 error rate, which is the probability of rejecting the model when it is correct. When this happens, the bank simply suffers bad luck and should not be penalized unduly. Hence one should pick a test with a low type 1 error rate, say, 5 percent (depending on its cost). The heart of the conflict is that, inevitably, the supervisor also will commit type 2 errors for a bank that willfully cheats on its VaR reporting.

The current verification procedure consists of recording daily exceptions of the 99 percent VaR over the last year. One would expect, on average, 1 percent of 250, or 2.5 instances of exceptions over the last year. The Basel Committee has decided that up to four exceptions are acceptable, which defines a "green light" zone for the bank. If the number of exceptions is five or more, the bank falls into a "yellow" or "red" zone and incurs a progressive penalty whereby the multiplicative factor k is increased from 3 to 4, as described in Table 4.3. An incursion into the "red" zone generates an automatic penalty.

Table 4.3 The Basel Penalty Zones

Zone       Number of Exceptions     Increase in k
Green      0 to 4                   0.00
Yellow     5                        0.40
           6                        0.50
           7                        0.65
           8                        0.75
           9                        0.85
Red        10+                      1.00

Within the "yellow" zone, the penalty is up to the supervisor, depending on the reason for the exception. The Basel Committee uses the following categories:

• Basic integrity of the model. The deviation occurred because the positions were reported incorrectly or because of an error in the program code.

• Model accuracy could be improved. The deviation occurred because the model does not measure risk with enough precision (e.g., has too few maturity buckets).

• Intraday trading. Positions changed during the day.

• Bad luck. Markets were particularly volatile or correlations changed.

The description of the applicable penalty is suitably vague. When exceptions are due to the first two reasons, the penalty "should" apply. With the third reason, a penalty "should be considered." When the deviation is traced to the fourth reason, the Basel document gives no guidance except that these exceptions should "be expected to occur at least some of the time." These exceptions may be excluded if they are the "result of such occurrences as sudden abnormal changes in interest rates or exchange rates, major political events, or natural disasters." In other words, bank supervisors want to keep the flexibility to adjust the rules in turbulent times as they see fit.

The crux of the backtesting problem is separating bad luck from a faulty model, or balancing type 1 errors against type 2 errors. Table 4.4 displays the probabilities of obtaining a given number of exceptions for a correct model (with 99 percent coverage) and an incorrect model (with only 97 percent coverage). With five exceptions or more, the cumulative probability, or type 1 error rate, is 10.8 percent. This is rather high to start with. In the current framework, one bank out of 10 could be penalized even with a correct model.

Even worse, the type 2 error rate is also very high. Assuming a true 97 percent coverage, the supervisor will give passing grades to 12.8 percent of banks that have an incorrect model.
Table 4.4 Basel Rules for Backtesting, Probabilities of Obtaining Exceptions (T = 250)

All figures in percent. Under 99 percent coverage the model is correct; under 97 percent coverage it is incorrect.

                        Coverage = 99%                     Coverage = 97%
Zone     N      P(X = N)    P(X ≥ N)            P(X = N)    P(X < N)              P(X ≥ N)
                            (Type 1, reject)                (Type 2, do not reject) (Power, reject)
Green    0      8.1         100.0               0.0         0.0                   100.0
         1      20.5        91.9                0.4         0.0                   100.0
         2      25.7        71.4                1.5         0.4                   99.6
         3      21.5        45.7                3.8         1.9                   98.1
         4      13.4        24.2                7.2         5.7                   94.3
Yellow   5      6.7         10.8                10.9        12.8                  87.2
         6      2.7         4.1                 13.8        23.7                  76.3
         7      1.0         1.4                 14.9        37.5                  62.5
         8      0.3         0.4                 14.0        52.4                  47.6
         9      0.1         0.1                 11.6        66.3                  33.7
Red      10     0.0         0.0                 8.6         77.9                  22.1
         11     0.0         0.0                 5.8         86.6                  13.4

The framework therefore is not very powerful. And this 99 versus 97 percent difference in VaR coverage is economically significant. Assuming a normal distribution, the true VaR would be 23.7 percent greater than officially reported, which is substantial.

The lack of power of this framework is due to the choice of the high VaR confidence level (99 percent) that generates too few exceptions for a reliable test. Consider instead the effect of a 95 percent VaR confidence level. (To ensure that the amount of capital is not affected, we could use a larger multiplier k.) We now have to decide on the cutoff number of exceptions to have a type 1 error rate similar to the Basel framework. With an average of 13 exceptions per year, we choose to reject the model if the number of exceptions exceeds 17, which corresponds to a type 1 error of 12.5 percent. Here we controlled the error rate so that it is close to the 10.8 percent for the Basel framework. But now the probability of a type 2 error is lower, at 7.4 percent only.³ Thus, simply changing the VaR confidence level from 99 to 95 percent sharply reduces the probability of not catching an erroneous model.

Another method to increase the power of the test would be to increase the number of observations. With T = 1000, for instance, we would choose a cutoff of 14 exceptions, for a type 1 error rate of 13.4 percent and a type 2 error rate of 0.03 percent, which is now very small. Increasing the number of observations drastically improves the test.

Conditional Coverage Models

So far the framework focuses on unconditional coverage because it ignores conditioning, or time variation in the data. The observed exceptions, however, could cluster or "bunch" closely in time, which also should invalidate the model.

With a 95 percent VaR confidence level, we would expect to have about 13 exceptions every year. In theory, these occurrences should be evenly spread over time. If, instead, we observed that 10 of these exceptions occurred over the last 2 weeks, this should raise a red flag. The market, for instance, could experience increased volatility that is not captured by VaR. Or traders could have moved into unusual positions or risk "holes." Whatever the explanation, a verification system should be designed to measure proper conditional coverage, that is, conditional on current conditions. Management then can take the appropriate action.

3 Assuming again a normal distribution and a true VaR that is 23.7 percent greater than the reported VaR, for an alternative coverage of 90.8 percent.
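The unconditional error rates used in this section (and tabulated in Table 4.4) come straight from the binomial distribution; a brief SciPy sketch (ours, not Jorion's) reproduces the headline numbers before we turn to Christoffersen's conditional test:

```python
from scipy.stats import binom

# Basel framework: T = 250 days, 99 percent VaR (p = 0.01), yellow zone starts at N = 5
print(binom.sf(4, 250, 0.01))     # type 1 error: P(N >= 5 | correct model) ~ 0.108
print(binom.cdf(4, 250, 0.03))    # type 2 error: P(N <= 4 | true coverage 97%) ~ 0.128

# Larger samples sharpen the test: T = 1000 with a cutoff of 14 exceptions
print(binom.sf(13, 1000, 0.01))   # type 1 error ~ 0.134
print(binom.cdf(13, 1000, 0.03))  # type 2 error ~ 0.0003, i.e., about 0.03 percent
```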



Such a test has been developed by Christoffersen (1998), who extends the LR_uc statistic to specify that the deviations must be serially independent. The test is set up as follows: Each day we set a deviation indicator to 0 if VaR is not exceeded and to 1 otherwise. We then define T_ij as the number of days in which state j occurred in one day while it was at i the previous day, and π_i as the probability of observing an exception conditional on state i the previous day. Table 4.5 shows how to construct a table of conditional exceptions.

Table 4.5 Building an Exception Table: Expected Number of Exceptions

                        Day Before
Current Day             No Exception             Exception               Unconditional
No exception            T_00 = T_0(1 - π_0)      T_10 = T_1(1 - π_1)     T(1 - π)
Exception               T_01 = T_0 π_0           T_11 = T_1 π_1          Tπ
Total                   T_0                      T_1                     T = T_0 + T_1

If today's occurrence of an exception is independent of what happened the previous day, the entries in the second and third columns should be identical. The relevant test statistic is

LR_{ind} = -2\ln\left[(1-\pi)^{T_{00}+T_{10}}\,\pi^{T_{01}+T_{11}}\right] + 2\ln\left[(1-\pi_0)^{T_{00}}\,\pi_0^{T_{01}}\,(1-\pi_1)^{T_{10}}\,\pi_1^{T_{11}}\right]   (4.4)

Here, the first term represents the maximized likelihood under the hypothesis that exceptions are independent across days, or π = π_0 = π_1 = (T_01 + T_11)/T. The second term is the maximized likelihood for the observed data.

The combined test statistic for conditional coverage then is

LR_{cc} = LR_{uc} + LR_{ind}   (4.5)

Each component is independently distributed as χ²(1), asymptotically. The sum is distributed as χ²(2). Thus we would reject at the 95 percent test confidence level if LR > 5.991. We would reject independence alone if LR_ind > 3.841.

As an example, assume that JPM observed the following pattern of exceptions during 1998. Of 252 days, we have 20 exceptions, which is a fraction of π = 7.9 percent. Of these, 6 exceptions occurred following an exception the previous day. Alternatively, 14 exceptions occurred when there was none the previous day. This defines conditional probability ratios of π_0 = 14/232 = 6.0 percent and π_1 = 6/20 = 30.0 percent. We seem to have a much higher probability of having an exception following another one. Setting these numbers into Equation (4.4), we find LR_ind = 9.53. Because this is higher than the cutoff value of 3.84, we reject independence. Exceptions do seem to cluster abnormally. As a result, the risk manager may want to explore models that allow for time variation in risk. The corresponding exception table is:

                        Day Before
Current Day             No Exception     Exception     Unconditional
No exception            218              14            232
Exception               14               6             20
Total                   232              20            252
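Equation (4.4) applied to these counts can be verified in a few lines; a sketch (the function name is ours) reproducing LR_ind for the JPM example:

```python
import numpy as np

def lr_ind(T00, T01, T10, T11):
    """Christoffersen independence statistic, Equation (4.4)."""
    T = T00 + T01 + T10 + T11
    pi = (T01 + T11) / T                # unconditional exception frequency
    pi0 = T01 / (T00 + T01)             # P(exception | no exception yesterday)
    pi1 = T11 / (T10 + T11)             # P(exception | exception yesterday)
    log_null = (T00 + T10) * np.log(1 - pi) + (T01 + T11) * np.log(pi)
    log_obs = (T00 * np.log(1 - pi0) + T01 * np.log(pi0)
               + T10 * np.log(1 - pi1) + T11 * np.log(pi1))
    return -2.0 * log_null + 2.0 * log_obs

# JPM 1998 transition counts from the exception table above
print(lr_ind(218, 14, 14, 6))   # about 9.5 > 3.841, so reject independence
```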
commonly nowadays.
Each com ponent is independently distributed as ^2(1). asym p­
totically. The sum is distributed as ;^2(2). Thus we would reject
at the 95 percent test confidence level if LR > 5.991. We would
4.3 APPLICATIONS
reject independence alone if LRind > 3.841.

As an exam p le, assum e th at JP M observed the fo llo w ­ Berkowitz and O 'Brien (2002) provide the first em pirical study
ing pattern of excep tio n s during 1998. O f 252 days, we of the accuracy of internal VaR m odels, using data reported
have 20 e xce p tio n s, which is a fraction of 77 = 7.9 p ercen t. to U.S. regulators. They describe the distributions of P&L,
O f th ese, 6 excep tio n s occurred follow ing an excep tio n which are com pared with the VaR forecasts. G enerally, the P&L
the previous day. A lte rn a tive ly, 14 excep tio n s occurred distributions are sym m etric, although they display fatter tails
when there w as none the previous day. This d efines co n d i­ than the normal. Stahl et. al. (2006) also report that, although
tional p ro b ab ility ratios of 77q = 14/232 = 6.0 p ercent and the com ponents of a trading portfolio could be strongly

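To make the mechanics concrete, the following Python sketch computes the statistics in Equations (4.4) and (4.5) from the counts above. It is an illustration only; the unconditional statistic LR_uc additionally requires the VaR coverage being tested, which is assumed here to be 95 percent (p = 0.05).

```python
import math

# Conditional-exception counts from the JPM example (252 days, 20 exceptions)
T00, T01 = 218, 14   # today: no exception / exception, given no exception yesterday
T10, T11 = 14, 6     # today: no exception / exception, given an exception yesterday

T = T00 + T01 + T10 + T11          # 252 observations
pi = (T01 + T11) / T               # unconditional exception rate, 20/252
pi0 = T01 / (T00 + T01)            # 14/232, rate after a no-exception day
pi1 = T11 / (T10 + T11)            # 6/20, rate after an exception day

# LR_ind, Equation (4.4): tests independence of exceptions across days
loglik_indep = (T00 + T10) * math.log(1 - pi) + (T01 + T11) * math.log(pi)
loglik_obs = (T00 * math.log(1 - pi0) + T01 * math.log(pi0)
              + T10 * math.log(1 - pi1) + T11 * math.log(pi1))
LR_ind = -2 * loglik_indep + 2 * loglik_obs

# LR_uc for unconditional coverage, assuming a 95 percent VaR (p = 0.05)
p = 0.05
N = T01 + T11
LR_uc = (-2 * ((T - N) * math.log(1 - p) + N * math.log(p))
         + 2 * ((T - N) * math.log(1 - pi) + N * math.log(pi)))

LR_cc = LR_uc + LR_ind             # Equation (4.5)
print(f"LR_uc = {LR_uc:.2f}, LR_ind = {LR_ind:.2f}, LR_cc = {LR_cc:.2f}")
# Compare LR_ind with 3.841 and LR_cc with 5.991, the chi-squared critical values
```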
Extensions

We have seen that the standard exception tests often lack power, especially when the VaR confidence level is high and when the number of observations is low. This has led to a search for improved tests.

The problem, however, is that statistical decision theory has shown that this exception test is the most powerful among its class. More effective tests would have to focus on a different hypothesis or use more information.

For example, Crnkovic and Drachman (1996) developed a test focusing on the entire probability distribution, based on the Kuiper statistic. This test is still nonparametric but is more powerful. However, it uses other information than the VaR forecast at a given confidence level. Another approach is to focus on the time period between exceptions, called duration. Christoffersen and Pelletier (2004) show that duration-based tests can be more powerful than the standard test when risk is time-varying.

Finally, backtests could use parametric information instead. If the VaR is obtained from a multiple of the standard deviation, the risk manager could test the fit between the realized and forecast volatility. This would lead to more powerful tests because more information is used. Another useful avenue would be to backtest the portfolio components as well. From the viewpoint of the regulator, however, the only information provided is the daily VaR, which explains why exception tests are used most commonly nowadays.

4.3 APPLICATIONS

Berkowitz and O'Brien (2002) provide the first empirical study of the accuracy of internal VaR models, using data reported to U.S. regulators. They describe the distributions of P&L, which are compared with the VaR forecasts. Generally, the P&L distributions are symmetric, although they display fatter tails than the normal. Stahl et al. (2006) also report that, although the components of a trading portfolio could be strongly nonnormal, aggregation to the highest level of a bank typically produces symmetric distributions that resemble the normal.

Figure 4.4 plots the time series of P&L along with the daily VaR (the lower lines) for a sample of six U.S. commercial banks. With approximately 600 observations, we should observe on average 6 violations, given a VaR confidence level of 99 percent.
Figure 4.4 Bank VaR and trading profits.

It is striking to see the abnormally small number of exceptions, even though the sample includes the turbulent 1998 period. Bank 4, for example, has zero exceptions over this sample. Its VaR is several times greater than the magnitude of extreme fluctuations in its P&L. Indeed, for banks 3 to 6, the average VaR is at least 60 percent higher than the actual 99th percentile of the P&L distribution. Thus banks report VaR measures that are conservative, or too large relative to their actual risks. These results are surprising because they imply that the banks' VaR and hence their market-risk charges are too high. Banks therefore allocate too much regulatory capital to their trading activities. Box 4.2 describes a potential, if simplistic, explanation.

Perhaps these observations could be explained by the use of actual instead of hypothetical returns.4 Or maybe the models are too simple, for example failing to account for diversification effects. Yet another explanation is that capital requirements are currently not binding. The amount of economic capital U.S. banks currently hold is in excess of their regulatory capital. As a result, banks may prefer to report high VaR numbers to avoid the possibility of regulatory intrusion. Still, these practices impoverish the informational content of VaR numbers.

4 Including fees increases the P&L, reducing the number of violations. Using hypothetical income, as currently prescribed in the European Union, could reduce this effect. Jaschke, Stahl, and Stehle (2003) compare the VaRs for 13 German banks and find that VaR measures are, on average, less conservative than for U.S. banks. Even so, VaR forecasts are still too high.

4.4 CONCLUSIONS

Model verification is an integral component of the risk management process. Backtesting VaR numbers provides valuable feedback to users about the accuracy of their models. The procedure also can be used to search for possible improvements.



BOX 4.2 NO EXCEPTIONS

The CEO of a large bank receives a daily report of the bank's VaR and P&L. Whenever there is an exception, the CEO calls in the risk officer for an explanation.

Initially, the risk officer explained that a 99 percent VaR confidence level implies an average of 2 to 3 exceptions per year. The CEO is never quite satisfied, however. Later, tired of going "upstairs," the risk officer simply increases the confidence level to cut down on the number of exceptions.

Annual reports suggest that this is frequently the case. Financial institutions routinely produce plots of P&L that show no violation of their 99 percent confidence VaR over long periods, proclaiming that this supports their risk model.

Verification tests usually are based on "exception" counts, defined as the number of exceedences of the VaR measure. The goal is to check if this count is in line with the selected VaR confidence level. The method also can be modified to pick up bunching of deviations.

Backtesting involves balancing two types of errors: rejecting a correct model versus accepting an incorrect model. Ideally, one would want a framework that has very high power, or high probability of rejecting an incorrect model. The problem is that the power of exception-based tests is low. The current framework could be improved by choosing a lower VaR confidence level or by increasing the number of data observations.

Due thought should be given to the choice of VaR quantitative parameters for backtesting purposes. First, the horizon should be as short as possible in order to increase the number of observations and to mitigate the effect of changes in the portfolio composition. Second, the confidence level should not be too high because this decreases the effectiveness, or power, of the statistical tests.

Adding to these statistical difficulties, we have to recognize other practical problems. Trading portfolios do change over the horizon. Models do evolve over time as risk managers improve their risk modeling techniques. All this may cause further structural instability.

Despite all these issues, backtesting has become a central component of risk management systems. The methodology allows risk managers to improve their models constantly. Perhaps most important, backtesting should ensure that risk models do not go astray.

VaR Mapping
Learning Objectives
A fter com pleting this reading you should be able to:

Explain the principles underlying VaR mapping, and describe the mapping process.

Explain how the mapping process captures general and specific risks.

Differentiate among the three methods of mapping portfolios of fixed income securities.

Summarize how to map a fixed income portfolio into positions of standard instruments.

Describe how mapping of risk factors can support stress testing.

Explain how VaR can be used as a performance benchmark.

Describe the method of mapping forwards, forward rate agreements, interest rate swaps, and options.

Excerpt is Chapter 11 of Value at Risk: The New Benchmark for Managing Financial Risk, Third Edition, by Philippe Jorion.

The second [principle], to divide each of the difficulties under examination into as many parts as possible, and as might be necessary for its adequate solution.

— René Descartes

BOX 5.1 WHY MAPPING?

"J.P. Morgan Chase's VaR calculation is highly granular, comprising more than 2.1 million positions and 240,000 pricing series (e.g., securities prices, interest rates, foreign exchange rates)." (Annual report, 2004)

Whichever value-at-risk (VaR) method is used, the risk measurement process needs to simplify the portfolio by mapping the positions on the selected risk factors. Mapping is the process by which the current values of the portfolio positions are replaced by exposures on the risk factors.

Mapping arises because of the fundamental nature of VaR, which is portfolio measurement at the highest level. As a result, this is usually a very large-scale aggregation problem. It would be too complex and time-consuming to model all positions individually as risk factors. Furthermore, this is unnecessary because many positions are driven by the same set of risk factors and can be aggregated into a small set of exposures without loss of risk information. Once a portfolio has been mapped on the risk factors, any of the three VaR methods can be used to build the distribution of profits and losses.

This chapter illustrates the mapping process for major financial instruments. First we review the basic principles behind mapping for VaR. We then proceed to illustrate cases where instruments are broken down into their constituent components. We will see that the mapping process is instructive because it reveals useful insights into the risk drivers of derivatives. The next sections deal with fixed-income securities and linear derivatives. We cover the most important instruments: forward contracts, forward rate agreements, and interest-rate swaps. Then we describe nonlinear derivatives, or options.

5.1 MAPPING FOR RISK MEASUREMENT

Why Mapping?

The essence of VaR is aggregation at the highest level. This generally involves a very large number of positions, including bonds, stocks, currencies, commodities, and their derivatives. As a result, it would be impractical to consider each position separately (see Box 5.1). Too many computations would be required, and the time needed to measure risk would slow to a crawl.

Fortunately, mapping provides a shortcut. Many positions can be simplified to a smaller number of positions on a set of elementary, or primitive, risk factors. Consider, for instance, a trader's desk with thousands of open dollar/euro forward contracts. The positions may differ owing to different maturities and delivery prices. It is unnecessary, however, to model all these positions individually. Basically, the positions are exposed to a single major risk factor, which is the dollar/euro spot exchange rate. Thus they could be summarized by a single aggregate exposure on this risk factor. Such aggregation, of course, is not appropriate for the pricing of the portfolio. For risk measurement purposes, however, it is perfectly acceptable. This is why risk management methods can differ from pricing methods.

Mapping is also the only solution when the characteristics of the instrument change over time. The risk profile of bonds, for instance, changes as they age. One cannot use the history of prices on a bond directly. Instead, the bond must be mapped on yields that best represent its current profile. Similarly, the risk profile of options changes very quickly. Options must be mapped on their primary risk factors. Mapping provides a way to tackle these practical problems.

Mapping as a Solution to Data Problems

Mapping is also required in many common situations. Often a complete history of all securities may not exist or may not be relevant. Consider a mutual fund with a strategy of investing in initial public offerings (IPOs) of common stock. By definition, these stocks have no history. They certainly cannot be ignored in the risk system, however. The risk manager would have to replace these positions by exposures on similar risk factors already in the system.

Another common problem with global markets is the time at which prices are recorded. Consider, for instance, a portfolio of mutual funds invested in international stocks. As much as 15 hours can elapse from the time the market closes in Tokyo at 1:00 a.m. EST (3:00 p.m. in Japan) to the time it closes in the United States at 4:00 p.m. As a result, prices from the Tokyo close ignore intervening information and are said to be stale. This led to the mutual-fund scandal of 2003, which is described in Box 5.2.

For risk managers, stale prices cause problems. Because returns are not synchronous, daily correlations across markets are too low, which will affect the measurement of portfolio risk.

BOX 5.2 MARKET TIMING AND STALE PRICES

In September 2003, New York Attorney General Eliot Spitzer accused a number of investment companies of allowing market timing into their funds. Market timing is a short-term trading strategy of buying and selling the same funds.

Consider, for example, our portfolio of Japanese and U.S. stocks, for which prices are set in different time zones. The problem is that U.S. investors can trade up to the close of the U.S. market. Market timers could take advantage of this discrepancy by rapid trading. For instance, if the U.S. market moves up following good news, it is likely the Japanese market will move up as well the following day. Market timers would buy the fund at the stale price and resell it the next day.

Such trading, however, creates transactions costs that are borne by the other investors in the fund. As a result, fund companies usually state in their prospectus that this practice is not allowed. In practice, Eliot Spitzer found out that many mutual-fund companies had encouraged market timers, which he argued was fraudulent. Eventually, a number of funds settled by paying more than USD 2 billion.

This practice can be stopped in a number of ways. Many mutual funds now impose short-term redemption fees, which make market timing uneconomical. Alternatively, the cutoff time for placing trades can be moved earlier.

One possible solution is mapping. For instance, prices at the close of the U.S. market can be estimated from a regression of Japanese returns on U.S. returns and using the forecast value conditional on the latest U.S. information. Alternatively, correlations can be measured from returns taken over longer time intervals, such as weekly. In practice, the risk manager needs to make sure that the data-collection process will lead to meaningful risk estimates.
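A minimal sketch of this regression-based repair follows; the return series and variable names below are hypothetical and serve only to illustrate the mechanics.

```python
import numpy as np

# Hypothetical daily returns; in practice these would be aligned historical series
us_returns = np.array([0.012, -0.008, 0.005, 0.010, -0.004])  # U.S. close-to-close
jp_returns = np.array([0.009, -0.005, 0.004, 0.007, -0.002])  # stale Tokyo closes

# Slope and intercept from an OLS regression of Japanese on U.S. returns
beta, alpha = np.polyfit(us_returns, jp_returns, 1)

# Forecast the Japanese return conditional on the latest U.S. information
latest_us_return = 0.015
jp_forecast = alpha + beta * latest_us_return
print(f"estimated beta = {beta:.2f}, conditional Japanese return = {jp_forecast:.4f}")
```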

The Mapping Process

Figure 5.1 illustrates a simple mapping process, where six instruments are mapped on three risk factors. The first step in the analysis is marking all positions to market in current dollars or whatever reference currency is used. The market value for each instrument then is allocated to the three risk factors.

Table 5.1 shows that the first instrument has a market value of V1, which is allocated to three exposures, x11, x12, and x13. If the current market value is not fully allocated to the risk factors, it must mean that the remainder is allocated to cash, which is not a risk factor because it has no risk.

Table 5.1 Mapping Exposures

                                  Exposure on Risk Factor
                  Market Value       1        2        3
Instrument 1          V1            x11      x12      x13
Instrument 2          V2            x21      x22      x23
...                   ...           ...      ...      ...
Instrument 6          V6            x61      x62      x63
Total portfolio       V             x1       x2       x3

where xj = Σ(i=1..6) xij for each risk factor j.
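In code, the aggregation step of Table 5.1 is simply a column sum. The sketch below uses a hypothetical 6 x 3 exposure matrix; all figures are invented for illustration.

```python
import numpy as np

# Hypothetical mapping of six instruments on three risk factors (Table 5.1 layout):
# row i holds the exposures x_i1, x_i2, x_i3 of instrument i, in currency units
exposures = np.array([
    [40.0, 10.0,  0.0],
    [ 0.0, 25.0,  5.0],
    [15.0,  0.0, 20.0],
    [ 5.0,  5.0,  5.0],
    [ 0.0, 30.0,  0.0],
    [20.0,  0.0, 10.0],
])

# Aggregate down each column: x_j = sum_i x_ij, the net exposure per risk factor
x = exposures.sum(axis=0)
print("exposure vector fed to the risk engine:", x)
```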



Next, the system allocates the position for instrument 2 and so on. At the end of the process, positions are summed for each risk factor. For the first risk factor, the dollar exposure is x1 = Σ(i=1..6) xi1. This creates a vector x of three exposures that can be fed into the risk measurement system.

Mapping can be of two kinds. The first provides an exact allocation of exposures on the risk factors. This is obtained for derivatives, for instance, when the price is an exact function of the risk factors. As we shall see in the rest of this chapter, the partial derivatives of the price function generate analytical measures of exposures on the risk factors.

Alternatively, exposures may have to be estimated. This occurs, for instance, when a stock is replaced by a position in the stock index. The exposure then is estimated by the slope coefficient from a regression of the stock return on the index return.

General and Specific Risk

This brings us to the issue of the choice of the set of primitive risk factors. This choice should reflect the trade-off between better quality of the approximation and faster processing. More factors lead to tighter risk measurement but also require more time devoted to the modeling process and risk computation.

The choice of primitive risk factors also influences the size of specific risks. Specific risk can be defined as risk that is due to issuer-specific price movements, after accounting for general market factors. Hence the definition of specific risk depends on that of general market risk. The Basel rules have a separate charge for specific risk.1

To illustrate this decomposition, consider a portfolio of N stocks. We are mapping each stock on a position in the stock market index, which is our primitive risk factor. The return on a stock Ri is regressed on the return on the stock market index Rm, that is,

Ri = αi + βi Rm + εi   (5.1)

which gives the exposure βi. In what follows, ignore α, which does not contribute to risk. We assume that the specific risk owing to ε is not correlated across stocks or with the market. The relative weight of each stock in the portfolio is given by wi. Thus the portfolio return is

Rp = Σ(i=1..N) wi Ri = Σ(i=1..N) wi βi Rm + Σ(i=1..N) wi εi   (5.2)

These exposures are aggregated across all the stocks in the portfolio. This gives

βp = Σ(i=1..N) wi βi   (5.3)

If the portfolio value is W, the mapping on the index is x = W βp.

Next, we decompose the variance of Rp in Equation (5.2) and find

V(Rp) = (βp)² V(Rm) + Σ(i=1..N) wi² V(εi)   (5.4)

The first component is the general market risk. The second component is the aggregate of specific risk for the entire portfolio. This decomposition shows that with more detail on the primitive or general-market risk factors, there will be less specific risk for a fixed amount of total risk V(Rp).

As another example, consider a corporate bond portfolio. Bond positions describe the distribution of money flows over time by their amount, timing, and credit quality of the issuer. This creates a continuum of risk factors, going from overnight to long maturities for various credit risks.

In practice, we have to restrict the number of risk factors to a small set. For some portfolios, one risk factor may be sufficient. For others, 15 maturities may be necessary. For portfolios with options, we need to model movements not only in yields but also in their implied volatilities.

Our primitive risk factors could be movements in a set of J government bond yields zj and in a set of K credit spreads sk sorted by credit rating. We model the movement in each corporate bond yield dyi by a movement in z at the closest maturity and in s for the same credit rating. The remaining component is εi. The movement in value W then is

dW = Σ(i=1..N) DVBPi dyi = Σ(j=1..J) DVBPj dzj + Σ(k=1..K) DVBPk dsk + Σ(i=1..N) DVBPi dεi   (5.5)

where DVBP is the total dollar value of a basis point for the associated risk factor. The values for DVBPj then represent the summation of the DVBP across all individual bonds for each maturity.

This leads to a total risk decomposition of

V(dW) = general risk + Σ(i=1..N) DVBPi² V(dεi)   (5.6)

A greater number of general risk factors should create less residual risk. Even so, we need to ascertain the size of the second, specific risk term. In practice, there may not be sufficient history to measure the specific risk of individual bonds, which is why it is often assumed that all issuers within the same risk class have the same risk.

1 Typically, the charge is 4 percent of the position value for equities and unrated debt, assuming that the banks' models do not incorporate specific risks.
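The sketch below illustrates Equations (5.3) and (5.4) on a hypothetical three-stock portfolio; the weights, betas, and volatilities are invented for illustration.

```python
import numpy as np

# Hypothetical three-stock portfolio mapped on a single index factor
w = np.array([0.5, 0.3, 0.2])       # portfolio weights
beta = np.array([1.2, 0.8, 1.0])    # slopes from regressions as in Equation (5.1)
sig_e = np.array([0.10, 0.08, 0.12])  # specific volatilities of the residuals
sig_m = 0.05                        # index volatility
W = 100e6                           # portfolio value in USD

beta_p = w @ beta                   # Equation (5.3): portfolio beta
x = W * beta_p                      # mapped position on the index

# Equation (5.4): variance = general market risk + aggregate specific risk
var_general = beta_p**2 * sig_m**2
var_specific = np.sum(w**2 * sig_e**2)
print(f"beta_p = {beta_p:.2f}, index exposure = USD {x:,.0f}")
print(f"general risk share = {var_general / (var_general + var_specific):.1%}")
```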

5.2 MAPPING FIXED-INCOME PORTFOLIOS

Mapping Approaches

Once the risk factors have been selected, the question is how to map the portfolio positions into exposures on these risk factors. We can distinguish three mapping systems for fixed-income portfolios: principal, duration, and cash flows. With principal mapping, one risk factor is chosen that corresponds to the average portfolio maturity. With duration mapping, one risk factor is chosen that corresponds to the portfolio duration. With cash-flow mapping, the portfolio cash flows are grouped into maturity buckets. Mapping should preserve the market value of the position. Ideally, it also should preserve its market risk.

As an example, Table 5.2 describes a two-bond portfolio consisting of a USD 100 million 5-year 6 percent issue and a USD 100 million 1-year 4 percent issue. Both issues are selling at par, implying a market value of USD 200 million. The portfolio has an average maturity of 3 years and a duration of 2.733 years. The table lays out the present value of all portfolio cash flows discounted at the appropriate zero-coupon rate.

Principal mapping considers the timing of redemption payments only. Since the average maturity of this portfolio is 3 years, the VaR can be found from the risk of a 3-year maturity, which is 1.484 percent from Table 5.3. VaR then is USD 200 × 1.484/100 = USD 2.97 million. The only positive aspect of this method is its simplicity. This approach overstates the true risk because it ignores intervening coupon payments.

The next step in precision is duration mapping. We replace the portfolio by a zero-coupon bond with maturity equal to the duration of the portfolio, which is 2.733 years. Table 5.3 shows VaRs of 0.987 and 1.484 for the 2- and 3-year maturities, respectively. Using a linear interpolation, we find a risk of 0.987 + (1.484 − 0.987) × (2.733 − 2) = 1.351 percent for this hypothetical zero. With a USD 200 million portfolio, the duration-based VaR is USD 200 × 1.351/100 = USD 2.70 million, slightly less than before.

Table 5.2 Mapping for a Bond Portfolio (USD millions)

                       Cash Flows                            Mapping (PV)
Term (Year)     5-Year     1-Year     Spot Rate    Principal     Duration      Cash Flow
1               USD 6      USD 104    4.000%       0.00          0.00          USD 105.77
2               USD 6      0          4.618%       0.00          0.00          USD 5.48
2.733                                              0.00          USD 200.00
3               USD 6      0          5.192%       USD 200.00    0.00          USD 5.15
4               USD 6      0          5.716%       0.00          0.00          USD 4.80
5               USD 106    0          6.112%       0.00          0.00          USD 78.79
Total                                              USD 200.00    USD 200.00    USD 200.00

Table 5.3 Computing VaR from Change in Prices of Zeroes

Term (Year)   Cash Flows   Old Zero Value   Old PV of Flows   Risk (%)   New Zero Value   New PV of Flows
1             USD 110      0.9615           USD 105.77        0.4696     0.9570           USD 105.27
2             USD 6        0.9136           USD 5.48          0.9868     0.9046           USD 5.43
3             USD 6        0.8591           USD 5.15          1.4841     0.8463           USD 5.08
4             USD 6        0.8006           USD 4.80          1.9714     0.7848           USD 4.71
5             USD 106      0.7433           USD 78.79         2.4261     0.7252           USD 76.88
Total                                       USD 200.00                                    USD 197.37
Loss                                                                                      USD 2.63
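The principal- and duration-mapping VaRs computed above reduce to a few lines of arithmetic, sketched here with the risk figures from Table 5.3.

```python
# Principal and duration mapping for the USD 200 million portfolio
portfolio_value = 200.0                # USD millions
risk = {2: 0.9868, 3: 1.4841}          # monthly VaR (%) of 2- and 3-year zeroes

# Principal mapping: use the risk of the 3-year (average-maturity) zero
var_principal = portfolio_value * risk[3] / 100        # about USD 2.97 million

# Duration mapping: interpolate the risk at the 2.733-year duration
duration = 2.733
risk_duration = risk[2] + (risk[3] - risk[2]) * (duration - 2)
var_duration = portfolio_value * risk_duration / 100   # about USD 2.70 million

print(f"principal-mapping VaR = USD {var_principal:.2f}M")
print(f"duration-mapping VaR  = USD {var_duration:.2f}M")
```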



Finally, the cash-flow mapping method consists of grouping all cash flows on term-structure "vertices" that correspond to maturities for which volatilities are provided. Each cash flow is represented by the present value of the cash payment, discounted at the appropriate zero-coupon rate.

The diversified VaR is computed as

VaR = α√(x′Σx) = √[(x × V)′ R (x × V)]   (5.7)

where x is the vector of dollar positions, V = ασ is the vector of VaRs for the zero-coupon bond returns, and R is the correlation matrix.

Table 5.4 shows how to compute the portfolio VaR using cash-flow mapping. The second column reports the cash flows x from Table 5.2. Note that the current value of USD 200 million is fully allocated to the five risk factors. The third column presents the product of these cash flows with the risk of each vertex, x × V, which represents the individual VaRs.

With perfect correlation across all zeroes, the VaR of the portfolio is

Undiversified VaR = Σ(i=1..N) |xi| Vi

which is USD 2.63 million. This number is close to the VaR obtained from the duration approximation, which was USD 2.70 million.

The right side of the table presents the correlation matrix of zeroes for maturities ranging from 1 to 5 years. To obtain the portfolio VaR, we premultiply and postmultiply the matrix by the dollar amounts (xV) at each vertex. Taking the square root, we find a diversified VaR measure of USD 2.57 million.

Note that this is slightly less than the duration VaR of USD 2.70 million. This difference is due to two factors. First, risk measures are not perfectly linear with maturity, as we have seen in a previous section. Second, correlations are below unity, which reduces risk even further. Thus, of the USD 130,000 difference in these measures, USD 70,000 (USD 2.70 million − USD 2.63 million) is due to differences in yield volatility, and USD 60,000 (USD 2.63 million − USD 2.57 million) is due to imperfect correlations. The last column presents the component VaR using computations as explained earlier.

Stress Test

Table 5.3 presents another approach to VaR that is directly derived from movements in the value of zeroes. This is an example of stress testing.

Assume that all zeroes are perfectly correlated. Then we could decrease all zeroes' values by their VaR. For instance, the 1-year zero is worth 0.9615. Given the VaR in Table 5.3 of 0.4696, a 95 percent probability move would be for the zero to fall to 0.9615 × (1 − 0.4696/100) = 0.9570. If all zeroes are perfectly correlated, they should all fall by their respective VaR. This generates a new distribution of present-value factors that can be used to price the portfolio. Table 5.3 shows that the new value is USD 197.37 million, which is exactly USD 2.63 million below the original value. This number is exactly the same as the undiversified VaR just computed.

The two approaches illustrate the link between computing VaR through matrix multiplication and through movements in underlying prices. Computing VaR through matrix multiplication is much more direct, however, and more appropriate because it allows for less-than-perfect correlations across different sectors of the yield curve.
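The stress-test repricing of Table 5.3 can be sketched as follows, shifting every zero down by its VaR and repricing the cash flows; small differences from the table reflect rounding of the inputs.

```python
import numpy as np

# Portfolio cash flows and zero prices from Tables 5.2 and 5.3, years 1 to 5
cash_flows = np.array([110.0, 6.0, 6.0, 6.0, 106.0])              # USD millions
old_zeros = np.array([0.9615, 0.9136, 0.8591, 0.8006, 0.7433])
risk_pct = np.array([0.4696, 0.9868, 1.4841, 1.9714, 2.4261])     # monthly VaR of zeros, %

old_value = cash_flows @ old_zeros                 # about USD 200 million
new_zeros = old_zeros * (1 - risk_pct / 100)       # shift every zero down by its VaR
new_value = cash_flows @ new_zeros                 # about USD 197.37 million

print(f"old value = USD {old_value:.2f}M, stressed value = USD {new_value:.2f}M")
print(f"loss = USD {old_value - new_value:.2f}M")  # equals the undiversified VaR
```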
Benchmarking

Next, we provide a practical fixed-income example in which VaR is computed in relative terms, that is, relative to a performance benchmark. Table 5.5 presents the cash-flow decomposition of the J.P. Morgan U.S. bond index, which has a duration of 4.62 years.

Table 5.4 Computing the VaR of a USD 200 Million Bond Portfolio (Monthly VaR at 95 Percent Level)

Term (Year)   PV Cash Flows, x   Individual VaR, x × V   Correlation Matrix R (1Y-5Y)       Component VaR, xΔVaR
1             USD 105.77         0.4966                  1                                   USD 0.45
2             USD 5.48           0.0540                  0.897  1                            USD 0.05
3             USD 5.15           0.0765                  0.886  0.991  1                     USD 0.08
4             USD 4.80           0.0947                  0.866  0.976  0.994  1              USD 0.09
5             USD 78.79          1.9115                  0.855  0.966  0.988  0.998  1       USD 1.90
Total         USD 200.00         2.6335
Undiversified VaR   USD 2.63
Diversified VaR     USD 2.57
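Equation (5.7) can be verified directly from the figures in Table 5.4, as in the short sketch below.

```python
import numpy as np

# Individual VaRs x * V per vertex (USD millions) from Table 5.4
xv = np.array([0.4966, 0.0540, 0.0765, 0.0947, 1.9115])

# Correlation matrix R of 1- to 5-year zero returns (lower triangle of Table 5.4)
R = np.array([
    [1.000, 0.897, 0.886, 0.866, 0.855],
    [0.897, 1.000, 0.991, 0.976, 0.966],
    [0.886, 0.991, 1.000, 0.994, 0.988],
    [0.866, 0.976, 0.994, 1.000, 0.998],
    [0.855, 0.966, 0.988, 0.998, 1.000],
])

undiversified = xv.sum()             # about USD 2.63 million
diversified = np.sqrt(xv @ R @ xv)   # Equation (5.7): about USD 2.57 million
print(f"undiversified VaR = USD {undiversified:.2f}M, "
      f"diversified VaR = USD {diversified:.2f}M")
```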

Table 5.5 Benchmarking a USD 100 Million Bond Index (Monthly Tracking Error VaR at 95 Percent Level)

Vertex   Risk (%)   Position: Index (USD)   Portfolio 1 (USD)   Portfolio 2 (USD)   Portfolio 3 (USD)   Portfolio 4 (USD)   Portfolio 5 (USD)

<1m 0.022 1.05 0.0 0.0 0.0 0.0 84.8

3m 0.065 1.35 0.0 0.0 0.0 0.0 0.0

6m 0.163 2.49 0.0 0.0 0.0 0.0 0.0

1Y 0.470 13.96 0.0 0.0 0.0 59.8 0.0

2Y 0.987 24.83 0.0 0.0 62.6 0.0 0.0

3Y 1.484 15.40 0.0 59.5 0.0 0.0 0.0

4Y 1.971 11.57 38.0 0.0 0.0 0.0 0.0

5Y 2.426 7.62 62.0 0.0 0.0 0.0 0.0

7Y 3.192 6.43 0.0 40.5 0.0 0.0 0.0

9Y 3.913 4.51 0.0 0.0 37.4 0.0 0.0

10Y 4.250 3.34 0.0 0.0 0.0 40.2 0.0

15Y 6.234 3.00 0.0 0.0 0.0 0.0 0.0

20Y 8.146 3.15 0.0 0.0 0.0 0.0 0.0

30Y 11.119 1.31 0.0 0.0 0.0 0.0 15.2

Total 100.00 100.0 100.0 100.0 100.0 100.0

Duration 4.62 4.62 4.62 4.62 4.62 4.62

Absolute VaR USD 1.99 USD 2.25 USD 2.16 USD 2.04 USD 1.94 USD 1.71

Tracking error VaR USD 0.00 USD 0.43 USD 0.29 USD 0.16 USD 0.20 USD 0.81

Assume that we are trying to benchmark a portfolio of USD 100 million. Over a monthly horizon, the VaR of the index at the 95 percent confidence level is USD 1.99 million. This is about equivalent to the risk of a 4-year note.

Next, we try to match the index with two bonds. The rightmost columns in the table display the positions of two-bond portfolios with duration matched to that of the index. Since no zero-coupon has a maturity of exactly 4.62 years, the closest portfolio consists of two positions, each in a 4- and a 5-year zero. The respective weights for this portfolio are USD 38 million and USD 62 million.

Define the new vector of positions for this portfolio as x and for the index as x0. The VaR of the deviation relative to the benchmark is

Tracking Error VaR = α√[(x − x0)′ Σ (x − x0)]   (5.8)

After performing the necessary calculations, we find that the tracking error VaR (TE-VaR) of this duration-hedged portfolio is USD 0.43 million. Thus the maximum deviation between the index and the portfolio is at most USD 0.43 million under normal market conditions. This potential shortfall is much less than the USD 1.99 million absolute risk of the index. The remaining tracking error is due to nonparallel moves in the term structure.

Relative to the original index, the tracking error can be measured in terms of variance reduction, similar to an R² in a regression. The variance improvement is

1 − (0.43/1.99)² = 95.4 percent

which is in line with the explanatory power of the first factor in the variance decomposition.

Next, we explore the effect of altering the composition of the tracking portfolio. Portfolio 2 widens the bracket of cash flows in years 3 and 7. The TE-VaR is USD 0.29 million, which is an improvement over the previous number. Next, portfolio 3 has positions in years 2 and 9. This comes the closest to approximating the cash-flow positions in the index, which has the greatest weight on the 2-year vertex. The TE-VaR is reduced further to USD 0.16 million. Portfolio 4 has positions in years 1 and 10. Now the TE-VaR increases to USD 0.20 million. This mistracking is even more pronounced for a portfolio consisting of 1-month bills and 30-year zeroes, for which the TE-VaR increases to USD 0.81 million.
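The variance improvement is a one-line calculation; with the rounded VaR figures quoted above it comes out marginally below the 95.4 percent reported in the text.

```python
te_var = 0.43     # tracking-error VaR of the duration-matched portfolio, USD millions
index_var = 1.99  # absolute VaR of the index, USD millions

improvement = 1 - (te_var / index_var) ** 2
print(f"variance improvement = {improvement:.1%}")
# prints about 95.3% with these rounded inputs; the text reports 95.4 percent
```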



Among the portfolios considered here, the lowest tracking error is obtained with portfolio 3. Note that the absolute risk of these portfolios is lowest for portfolio 5. As correlations decrease for more distant maturities, we should expect that a duration-matched portfolio should have the lowest absolute risk for the combination of most distant maturities, such as a barbell portfolio of cash and a 30-year zero. However, minimizing absolute market risk is not the same as minimizing relative market risk.

This example demonstrates that duration hedging only provides a first approximation to interest-rate risk management. If the goal is to minimize tracking error relative to an index, it is essential to use a fine decomposition of the index by maturity.

5.3 MAPPING LINEAR DERIVATIVES

Forward Contracts

Forward and futures contracts are the simplest types of derivatives. Since their value is linear in the underlying spot rates, their risk can be constructed easily from basic building blocks. Assume, for instance, that we are dealing with a forward contract on a foreign currency. The basic valuation formula can be derived from an arbitrage argument.

To establish notations, define

St = spot price of one unit of the underlying cash asset
K = contracted forward price
r = domestic risk-free rate
y = income flow on the asset
τ = time to maturity.

When the asset is a foreign currency, y represents the foreign risk-free rate r*. We will use these two notations interchangeably. For convenience, we assume that all rates are compounded continuously.

We seek to find the current value of a forward contract ft to buy one unit of foreign currency at K after time τ. To do this, we consider the fact that investors have two alternatives that are economically equivalent: (1) Buy e^(−yτ) units of the asset at the price St and hold for one period, or (2) enter a forward contract to buy one unit of the asset in one period. Under alternative 1, the investment will grow, with reinvestment of dividend, to exactly one unit of the asset after one period. Under alternative 2, the contract costs ft upfront, and we need to set aside enough cash to pay K in the future, which is Ke^(−rτ). After 1 year, the two alternatives lead to the same position, one unit of the asset. Therefore, their initial cost must be identical. This leads to the following valuation formula for outstanding forward contracts:

ft = St e^(−yτ) − K e^(−rτ)   (5.9)

Note that we can repeat the preceding reasoning to find the current forward rate Ft that would set the value of the contract to zero. Setting K = Ft and ft = 0 in Equation (5.9), we have

Ft = (St e^(−yτ)) e^(rτ)   (5.10)

This allows us to rewrite Equation (5.9) as

ft = Ft e^(−rτ) − K e^(−rτ) = (Ft − K) e^(−rτ)   (5.11)

In other words, the current value of the forward contract is the present value of the difference between the current forward rate and the locked-in delivery rate. If we are long a forward contract with contracted rate K, we can liquidate the contract by entering a new contract to sell at the current rate Ft. This will lock in a profit of (Ft − K), which we need to discount to the present time to find ft.

Let us examine the risk of a 1-year forward contract to purchase 100 million euros in exchange for USD 130.086 million. Table 5.6 displays pricing information for the contract (current spot, forward, and interest rates), risk, and correlations. The first step is to find the market value of the contract. We can use Equation (5.9), accounting for the fact that the quoted interest rates are discretely compounded, as

ft = USD 1.2877 × 1/(1 + 2.2810/100) − USD 1.3009 × 1/(1 + 3.3304/100)
   = USD 1.2589 − USD 1.2589 = 0
one unit of foreign currency at K after tim e r . To do this, we

Table 5.6 Risk and Correlations for Forward Contract Risk Factors (Monthly VaR at 95 Percent Level)

                                               Correlations
Risk Factor      Price or Rate   VaR (%)   EUR Spot   EUR 1Y     USD 1Y
EUR spot         USD 1.2877      4.5381    1          0.1289     0.0400
Long EUR bill    2.2810%         0.1396    0.1289     1          −0.0583
Short USD bill   3.3304%         0.2121    0.0400     −0.0583    1
EUR forward      USD 1.3009

Thus the initial value of the contract is zero. This value, however, may change, creating market risk.

Among the three sources of risk, the volatility of the spot contract is the highest by far, with a 4.54 percent VaR (corresponding to 1.65 standard deviations over a month for a 95 percent confidence level). This is much greater than the 0.14 percent VaR for the EUR 1-year bill or even the 0.21 percent VaR for the USD bill. Thus most of the risk of the forward contract is driven by the cash EUR position.

But risk is also affected by correlations. The positive correlation of 0.13 between the EUR spot and bill positions indicates that when the EUR goes up in value against the dollar, the value of a 1-year EUR investment is likely to appreciate. Therefore, higher values of the EUR are associated with lower EUR interest rates.

This positive correlation increases the risk of the combined position. On the other hand, the position is also short a 1-year USD bill, which is correlated with the other two legs of the transaction. The issue is, what will be the net effect on the risk of the forward contract?

VaR provides an exact answer to this question, which is displayed in Table 5.7. But first we have to compute the positions x on each of the three building blocks of the contract. By taking the partial derivative of Equation (5.9) with respect to the risk factors, we have

df = (∂f/∂S) dS + (∂f/∂r*) dr* + (∂f/∂r) dr
   = e^(−r*τ) dS − S e^(−r*τ) τ dr* + K e^(−rτ) τ dr   (5.12)

Here, the building blocks consist of the spot rate and interest rates. Alternatively, we can replace interest rates by the price of bills. Define these as P = e^(−rτ) and P* = e^(−r*τ). We then replace dr with dP using dP = (−τ)e^(−rτ) dr and dP* = (−τ)e^(−r*τ) dr*. The risk of the forward contract becomes

df = (S e^(−r*τ)) dS/S + (S e^(−r*τ)) dP*/P* − (K e^(−rτ)) dP/P   (5.13)

This shows that the forward position can be separated into three cash flows: (1) a long spot position in EUR, worth EUR 100 million = USD 130.09 million in a year, or (Se^(−r*τ)) = USD 125.89 million now, (2) a long position in a EUR investment, also worth USD 125.89 million now, and (3) a short position in a USD investment, worth USD 130.09 million in a year, or (Ke^(−rτ)) = USD 125.89 million now. Thus a position in the forward contract has three building blocks:

Long forward contract = long foreign currency spot + long foreign currency bill + short U.S. dollar bill

Considering only the spot position, the VaR is USD 125.89 million times the risk of 4.538 percent, which is USD 5.713 million. To compute the diversified VaR, we use the risk matrix from the data in Table 5.7 and pre- and postmultiply by the vector of positions (PV of flows column in the table). The total VaR for the forward contract is USD 5.735 million. This number is about the same size as that of the spot contract because exchange-rate volatility dominates the volatility of 1-year bonds.

More generally, the same methodology can be used for long-term currency swaps, which are equivalent to portfolios of forward contracts. For instance, a 10-year contract to pay dollars and receive euros is equivalent to a series of 10 forward contracts to exchange a set amount of dollars into euros. To compute the VaR, the contract must be broken down into a currency-risk component and a string of USD and EUR fixed-income components. As before, the total VaR will be driven primarily by the currency component.

Table 5.7 Computing VaR for a EUR 100 Million Forward Contract (Monthly VaR at 95 Percent Level)

Position         Present-Value Factor   Cash Flows (CF)   PV of Flows, x   Individual VaR, x × V   Component VaR, xΔVaR
EUR spot                                                  USD 125.89       USD 5.713               USD 5.704
Long EUR bill    0.977698               EUR 100.00        USD 125.89       USD 0.176               USD 0.029
Short USD bill   0.967769               −USD 130.09       −USD 125.89      USD 0.267               USD 0.002
Undiversified VaR                                                          USD 6.156
Diversified VaR                                                            USD 5.735
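The sketch below reproduces both the zero initial value of the contract and the VaR figures in Table 5.7, using the inputs from Tables 5.6 and 5.7; residual differences from the table are rounding.

```python
import numpy as np

# Market data from Table 5.6 (1-year EUR forward, discrete compounding)
S, K = 1.2877, 1.3009          # spot and contracted rate, USD per EUR
r_eur = 2.2810 / 100           # 1-year EUR rate
r_usd = 3.3304 / 100           # 1-year USD rate

# Equation (5.9) with discrete discounting: value of the outstanding forward
f = S / (1 + r_eur) - K / (1 + r_usd)
print(f"contract value per EUR = USD {f:.4f}")   # zero at inception

# Building-block positions (PV of flows, USD millions) and risks from Table 5.7
x = np.array([125.89, 125.89, -125.89])          # EUR spot, long EUR bill, short USD bill
V = np.array([4.5381, 0.1396, 0.2121]) / 100     # monthly VaR of each risk factor
R = np.array([[1.0000, 0.1289, 0.0400],
              [0.1289, 1.0000, -0.0583],
              [0.0400, -0.0583, 1.0000]])

xv = x * V
undiversified = np.abs(xv).sum()                 # about USD 6.16 million
diversified = np.sqrt(xv @ R @ xv)               # about USD 5.73 million
print(f"undiversified VaR = USD {undiversified:.3f}M, "
      f"diversified VaR = USD {diversified:.3f}M")
```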



Table 5.8 Risk of Commodity Contracts (Monthly VaR at 95 Percent Level)

Energy Products
Maturity     Natural Gas   Heating Oil   Unleaded Gasoline   Crude Oil-WTI
1 month      28.77         22.07         20.17               19.20
3 months     22.79         20.60         18.29               17.46
6 months     16.01         16.67         16.26               15.87
12 months    12.68         14.61         —                   14.05

Base Metals
Maturity     Aluminum   Copper   Nickel   Zinc
Cash         11.34      13.09    18.97    13.49
3 months     11.01      12.34    18.41    13.18
15 months    8.99       10.51    15.44    11.95
27 months    7.27       9.57     11.59    —

Precious Metals
Maturity     Gold   Silver   Platinum
Cash         6.18   14.97    7.70

Things are not so simple for commodities, such as metals, agricultural products, or energy products. Most products do not make monetary payments but instead are consumed, thus creating an implied benefit. This flow of benefit, net of storage cost, is loosely called convenience yield to represent the benefit from holding the cash product. This convenience yield, however, is not tied to another financial variable, such as the foreign interest rate for currency futures. It is also highly variable, creating its own source of risk.

As a result, the risk measurement of commodity futures uses Equation (5.11) directly, where the main driver of the value of the contract is the current forward price for this commodity.

Table 5.8 illustrates the term structure of volatilities for selected energy products and base metals. First, we note that monthly VaR measures are very high, reaching 29 percent for near contracts. In contrast, currency and equity market VaRs are typically around 6 percent. Thus commodities are much more volatile than typical financial assets.

Second, we observe that volatilities decrease with maturity. The effect is strongest for less storable products such as energy products and less so for base metals. It is actually imperceptible for precious metals, which have low storage costs and no convenience yield. For financial assets, volatilities are driven primarily by spot prices, which implies basically constant volatilities across contract maturities.

Let us now say that we wish to compute the VaR for a 12-month forward position on 1 million barrels of oil priced at USD 45.2 per barrel. Using a present-value factor of 0.967769, this translates into a current position of USD 43,743,000.

Differentiating Equation (5.11), we have

df = e^(−rτ) dF = (F e^(−rτ)) dF/F   (5.14)

The term between parentheses therefore represents the exposure. The contract VaR is

VaR = USD 43,743,000 × 14.05/100 = USD 6,146,000

In general, the contract cash flows will fall between the maturities of the risk factors, and present values must be apportioned accordingly.

Forward Rate Agreements

Forward rate agreements (FRAs) are forward contracts that allow users to lock in an interest rate at some future date. The buyer of an FRA locks in a borrowing rate; the seller locks in a lending rate. In other words, the "long" receives a payment if the spot rate is above the forward rate.

Define the timing of the short leg as τ1 and of the long leg as τ2, both expressed in years. Assume linear compounding for simplicity. The forward rate can be defined as the implied rate that equalizes the return on a τ2-period investment with a τ1-period investment rolled over, that is,

(1 + R2 τ2) = (1 + R1 τ1)(1 + F1,2 (τ2 − τ1))   (5.15)

Table 5.9 Computing the VaR of a USD 100 Million FRA (Monthly VaR at 95 Percent Level)

Position    PV of Flows, x   Risk (%), V   Correlation Matrix, R   Individual VaR, x × V   Component VaR, xΔVaR
180 days    −USD 97.264      0.1629        1        0.8738         USD 0.158               −USD 0.116
360 days    USD 97.264       0.4696        0.8738   1              USD 0.457               USD 0.444
Undiversified VaR                                                  USD 0.615
Diversified VaR                                                    USD 0.327

For instance, suppose that you sold a 6 × 12 FRA on USD 100 million. This is equivalent to borrowing USD 100 million for 6 months and investing the proceeds for 12 months. When the FRA expires in 6 months, assume that the prevailing 6-month spot rate is higher than the locked-in forward rate. The seller then pays the buyer the difference between the spot and forward rates applied to the principal. In effect, this payment offsets the higher return that the investor otherwise would receive, thus guaranteeing a return equal to the forward rate. Therefore, an FRA can be decomposed into two zero-coupon building blocks.

Long 6 × 12 FRA = long 6-month bill + short 12-month bill

Table 5.9 provides a worked-out example. If the 360-day spot rate is 5.8125 percent and the 180-day rate is 5.6250 percent, the forward rate must be such that

(1 + F1,2/2) = (1 + 5.8125/100)/(1 + 5.6250/200)

or F = 5.836 percent. The present value of the notional USD 100 million in 6 months is x = USD 100/(1 + 5.625/200) = USD 97.264 million. This amount is invested for 12 months. In the meantime, what is the risk of this FRA?

Table 5.9 displays the computation of VaR for the FRA. The VaRs of 6- and 12-month zeroes are 0.1629 and 0.4696, respectively, with a correlation of 0.8738. Applied to the principal of USD 97.26 million, the individual VaRs are USD 0.158 million and USD 0.457 million, which gives an undiversified VaR of USD 0.615 million. Fortunately, the correlation substantially lowers the FRA risk. The largest amount the position can lose over a month at the 95 percent level is USD 0.327 million.
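The forward rate and the two VaR figures can be reproduced as follows; small differences from Table 5.9 are rounding.

```python
import numpy as np

# Spot rates (percent) for the 6 x 12 FRA example
r180, r360 = 5.6250, 5.8125

# Implied forward rate from Equation (5.15) with linear compounding
F = ((1 + r360 / 100) / (1 + r180 / 200) - 1) * 2 * 100
print(f"6x12 forward rate = {F:.3f} percent")        # about 5.836 percent

# Two zero-coupon building blocks (Table 5.9): short 6-month, long 12-month bill
x = np.array([-97.264, 97.264])                      # PV of flows, USD millions
V = np.array([0.1629, 0.4696]) / 100                 # monthly VaR of each zero
rho = 0.8738

xv = x * V
undiversified = np.abs(xv).sum()                     # about USD 0.615 million
diversified = np.sqrt(xv[0]**2 + xv[1]**2 + 2 * rho * xv[0] * xv[1])
print(f"undiversified VaR = USD {undiversified:.3f}M, "
      f"diversified VaR = USD {diversified:.3f}M")   # close to USD 0.327 million
```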

Interest-Rate Swaps

Interest-rate swaps are the most actively used derivatives. They create exchanges of interest-rate flows from fixed to floating or vice versa. Swaps can be decomposed into two legs, a fixed leg and a floating leg. The fixed leg can be priced as a coupon-paying bond; the floating leg is equivalent to a floating-rate note (FRN).

To illustrate, let us compute the VaR of a USD 100 million 5-year interest-rate swap. We enter a dollar swap that pays 6.195 percent annually for 5 years in exchange for floating-rate payments indexed to London Interbank Offer Rate (LIBOR). Initially, we consider a situation where the floating-rate note is about to be reset. Just before the reset period, we know that the coupon will be set at the prevailing market rate. Therefore, the note carries no market risk, and its value can be mapped on cash only. Right after the reset, however, the note becomes similar to a bill with maturity equal to the next reset period.

Interest-rate swaps can be viewed in two different ways: as (1) a combined position in a fixed-rate bond and in a floating-rate bond or (2) a portfolio of forward contracts. We first value the swap as a position in two bonds using risk data from Table 5.4. The analysis is detailed in Table 5.10.

The second and third columns lay out the payments on both legs. Assuming that this is an at-the-market swap, that is, that its coupon is equal to prevailing swap rates, the short position in the fixed-rate bond is worth USD 100 million. Just before reset, the long position in the FRN is also worth USD 100 million, so the market value of the swap is zero. To clarify the allocation of current values, the FRN is allocated to cash, with a zero maturity. This has no risk.

The next column lists the zero-coupon swap rates for maturities going from 1 to 5 years. The fifth column reports the present value of the net cash flows, fixed minus floating. The last column presents the component VaR, which adds up to a total diversified VaR of USD 2.152 million. The undiversified VaR is obtained from summing all individual VaRs. As usual, the USD 2.160 million value somewhat overestimates risk.



Table 5.10 Computing the VaR of a USD 100 Million Interest-Rate Swap (Monthly VaR at 95 Percent Level)

                      Cash Flows
Term (Year)   Fixed           Float        Spot Rate   PV of Net Cash Flows   Individual VaR   Component VaR
0             USD 0           +USD 100                 +USD 100.000           USD 0            USD 0
1             −USD 6.195      USD 0        5.813%      −USD 5.855             USD 0.027        USD 0.024
2             −USD 6.195      USD 0        5.929%      −USD 5.521             USD 0.054        USD 0.053
3             −USD 6.195      USD 0        6.034%      −USD 5.196             USD 0.077        USD 0.075
4             −USD 6.195      USD 0        6.130%      −USD 4.883             USD 0.096        USD 0.096
5             −USD 106.195    USD 0        6.217%      −USD 78.546            USD 1.905        USD 1.905
Total                                                  USD 0.000
Undiversified VaR   USD 2.160
Diversified VaR     USD 2.152
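As a check, the sketch below recomputes the swap's undiversified and diversified VaR from the individual VaRs in Table 5.10. Applying the 1- to 5-year zero-coupon correlations of Table 5.4 here is an assumption that follows the text's use of Table 5.4 risk data.

```python
import numpy as np

# Individual VaRs of the net (fixed-leg) cash flows, years 1-5, from Table 5.10
v = np.array([0.027, 0.054, 0.077, 0.096, 1.905])    # USD millions

# Zero-coupon correlation matrix for 1- to 5-year maturities, from Table 5.4
R = np.array([
    [1.000, 0.897, 0.886, 0.866, 0.855],
    [0.897, 1.000, 0.991, 0.976, 0.966],
    [0.886, 0.991, 1.000, 0.994, 0.988],
    [0.866, 0.976, 0.994, 1.000, 0.998],
    [0.855, 0.966, 0.988, 0.998, 1.000],
])

undiversified = v.sum()                 # about USD 2.16 million
diversified = np.sqrt(v @ R @ v)        # about USD 2.15 million
print(f"undiversified VaR = USD {undiversified:.2f}M, "
      f"diversified VaR = USD {diversified:.2f}M")
```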

This swap can be viewed as the sum of five forward contracts, as shown in Table 5.11. The 1-year contract promises payment of USD 100 million plus the coupon of 6.195 percent; discounted at the spot rate of 5.813 percent, this yields a present value of −USD 100.36 million. This is in exchange for USD 100 million now, which has no risk.

The next contract is a 1 × 2 forward contract that promises to pay the principal plus the fixed coupon in 2 years, or −USD 106.195 million; discounted at the 2-year spot rate, this yields −USD 94.64 million. This is in exchange for USD 100 million in 1 year, which is also USD 94.50 million when discounted at the 1-year spot rate. And so on until the fifth contract, a 4 × 5 forward contract.

Table 5.11 An Interest-Rate Swap Viewed as Forward Contracts (Monthly VaR at 95 Percent Level)

                           PV of Flows: Contract
Term (Year)    1              1×2            2×3            3×4            4×5
1              −USD 100.36    USD 94.50
2                             −USD 94.64     USD 89.11
3                                            −USD 89.08     USD 83.88
4                                                           −USD 83.70     USD 78.82
5                                                                          −USD 78.55
VaR            USD 0.471      USD 0.571      USD 0.488      USD 0.446      USD 0.425
Undiversified VaR   USD 2.401
Diversified VaR     USD 2.152

Table 5.11 shows the VaR of each contract. The undiversified VaR of USD 2.401 million is the result of a simple summation of the five VaRs. The fully diversified VaR is USD 2.152 million, exactly the same as in the preceding table. This demonstrates the equivalence of the two approaches.

Finally, we examine the change in risk after the first payment has just been set on the floating-rate leg. The FRN then becomes a 1-year bond initially valued at par but subject to fluctuations in rates. The only change in the pattern of cash flows in Table 5.10 is to add USD 100 million to the position on year 1 (from −USD 5.855 to USD 94.145). The resulting VaR then decreases from USD 2.152 million to USD 1.763 million. More generally, the swap's VaR will converge to zero as the swap matures, dipping each time a coupon is set.

5.4 MAPPING OPTIONS

We now consider the mapping process for nonlinear derivatives, or options. Obviously, this nonlinearity may create problems for risk measurement systems based on the delta-normal approach, which is fundamentally linear.

To simplify, consider the Black-Scholes (BS) model for European options.2

2 For a systematic approach to pricing derivatives, see the excellent book by Hull (2005).

Table 5.12 Derivatives for a European Call

Parameters: S = USD 100, σ = 20%, r = 5%, r* = 3%, τ = 3 months

                                           Exercise Price
Variable                     Unit       K = 90    K = 100   K = 110
c                            Dollars    11.01     4.20      1.04
Change per:
Δ      Spot price            Dollar     0.869     0.536     0.195
Γ      Spot price            Dollar     0.020     0.039     0.028
Λ      Volatility            (% pa)     0.102     0.197     0.138
ρ      Interest rate         (% pa)     0.190     0.123     0.046
ρ*     Asset yield           (% pa)     −0.217    −0.133    −0.049
Θ      Time                  Day        −0.014    −0.024    −0.016

The model assumes, in addition to perfect capital markets, that the underlying spot price follows a continuous geometric Brownian motion with constant volatility σ(dS/S). Based on these assumptions, the Black-Scholes (1973) model, as expanded by Merton (1973), gives the value of a European call as

c = c(S, K, τ, r, r*, σ) = S e^(−r*τ) N(d1) − K e^(−rτ) N(d2)   (5.16)

where N(d) is the cumulative normal distribution function with arguments

d1 = ln[S e^(−r*τ)/(K e^(−rτ))]/(σ√τ) + σ√τ/2,   d2 = d1 − σ√τ

where K is now the exercise price at which the option holder can, but is not obligated to, buy the asset.

Changes in the value of the option can be approximated by taking partial derivatives, that is,

dc = (∂c/∂S) dS + (1/2)(∂²c/∂S²) dS² + (∂c/∂r*) dr* + (∂c/∂r) dr + (∂c/∂σ) dσ + (∂c/∂t) dt
   = Δ dS + (1/2)Γ dS² + ρ* dr* + ρ dr + Λ dσ + Θ dt   (5.17)

The advantage of the BS model is that it leads to closed-form solutions for all these partial derivatives. Table 5.12 gives typical values for 3-month European call options with various exercise prices.

The first partial derivative, or delta, is particularly important. For a European call, this is

Δ = e^(−r*τ) N(d1)   (5.18)

This is related to the cumulative normal density function. Figure 5.2 displays its behavior as a function of the underlying spot price and for various maturities.

The figure shows that delta is not a constant, which may make linear methods inappropriate for measuring the risk of options. Delta increases with the underlying spot price. The relationship becomes more nonlinear for short-term options, for example, with an option maturity of 10 days. Linear methods approximate delta by a constant value over the risk horizon. The quality of this approximation depends on parameter values.

For instance, if the risk horizon is 1 day, the worst down move in the spot price is −αSσ√T = −1.645 × USD 100 × 0.20 × √(1/252) = −USD 2.08, leading to a worst price of USD 97.92. With a 90-day option, delta changes from 0.536 to 0.452 only. With such a small change, the linear effect will dominate the nonlinear effect. Thus linear approximations may be acceptable for options with long maturities when the risk horizon is short.

It is instructive to consider only the linear effects of the spot rate and two interest rates, that is,

dc = Δ dS + ρ* dr* + ρ dr
   = [e^(−r*τ) N(d1)] dS + [−S e^(−r*τ) τ N(d1)] dr* + [K e^(−rτ) τ N(d2)] dr
   = [S e^(−r*τ) N(d1)] dS/S + [S e^(−r*τ) N(d1)] dP*/P* − [K e^(−rτ) N(d2)] dP/P   (5.19)

This formula bears a striking resemblance to that for foreign currency forwards, as in Equation (5.13). The only difference is that the position on the spot foreign currency and on the foreign currency bill x1 = x2 now involves N(d1), and the position on the dollar bill x3 involves N(d2). In the extreme case, where the option is deep in the money, both N(d1) and N(d2) are equal to unity, and the option behaves exactly like a position in a forward contract. In this case, the BS model reduces to c = S e^(−r*τ) − K e^(−rτ), which is indeed the valuation formula for a forward contract, as in Equation (5.9).

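The sketch below reproduces the at-the-money column of Table 5.12, implementing Equations (5.16) and (5.18) directly, and computes the delta-equivalent positions discussed next; the printed values match the text up to rounding of delta.

```python
from math import log, sqrt, exp, erf

def N(d):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(d / sqrt(2)))

# At-the-money parameters from Table 5.12
S, K, sigma, r, rstar, tau = 100.0, 100.0, 0.20, 0.05, 0.03, 0.25

d1 = (log(S * exp(-rstar * tau) / (K * exp(-r * tau))) / (sigma * sqrt(tau))
      + sigma * sqrt(tau) / 2)
d2 = d1 - sigma * sqrt(tau)

c = S * exp(-rstar * tau) * N(d1) - K * exp(-r * tau) * N(d2)   # Equation (5.16)
delta = exp(-rstar * tau) * N(d1)                               # Equation (5.18)

# Delta-equivalent mapping: long delta*S of the asset, financed by a bill
asset_position = delta * S        # about USD 53.60
bill_position = delta * S - c     # about USD 49.40
print(f"c = {c:.2f}, delta = {delta:.3f}")
print(f"long asset {asset_position:.2f}, short bill {bill_position:.2f}")
```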


Also note that the position on the dollar bill, K e^(−rτ) N(d2), is equivalent to S e^(−r*τ) N(d1) − c = SΔ − c. This shows that the call option is equivalent to a position of Δ in the underlying asset plus a short position of (ΔS − c) in a dollar bill, that is,

Long option = long Δ asset + short (ΔS − c) bill

For instance, assume that the delta for an at-the-money call option on an asset worth USD 100 is Δ = 0.536. The option itself is worth USD 4.20. This option is equivalent to a ΔS = USD 53.60 position in the underlying asset financed by a loan of ΔS − c = USD 53.60 − USD 4.20 = USD 49.40.

The next step in the risk measurement process is the aggregation of exposures across the portfolio. Thus all options on the same underlying risk factor are decomposed into their delta equivalents, which are summed across the portfolio. This generalizes to movements in the implied volatility, if necessary. The option portfolio would be characterized by its net vega, or Λ. This decomposition also can take into account second-order derivatives using the net gamma, or Γ. These exposures can be combined with simulations of the underlying risk factors to generate a risk distribution.

5.5 CONCLUSIONS

Risk measurement at financial institutions is a top-level aggregation problem involving too many positions to be modeled individually. As a result, instruments have to be mapped on a smaller set of primitive risk factors.

Choosing the appropriate set of risk factors, however, is part of the art of risk management. Too many risk factors would be unnecessary, slow, and wasteful. Too few risk factors, in contrast, could create blind spots in the risk measurement system.

The mapping process consists of replacing the current values of all instruments by their exposures on these risk factors. Next, exposures are aggregated across the portfolio to create a net exposure to each risk factor. The risk engine then combines these exposures with the distribution of risk factors to generate a distribution of portfolio values.

For some instruments, the allocation into general-market risk factors is exhaustive. In other words, there is no specific risk left. This is typically the case with derivatives, which are tightly priced in relation to their underlying risk factor. For other positions, such as individual stocks or corporate bonds, there remains some risk, called specific risk. In large, well-diversified portfolios, this remaining risk tends to wash away. Otherwise, specific risk needs to be taken into account.
Messages from the
Academic Literature
on Risk Management
for the Trading Book
Learning Objectives
A fter com pleting this reading you should be able to:

Explain the following lessons on VaR implementation: time horizon over which VaR is estimated, the recognition of time-varying volatility in VaR risk factors, and VaR backtesting.

Describe exogenous and endogenous liquidity risk and explain how they might be integrated into VaR models.

Compare VaR, expected shortfall, and other relevant risk measures.

Compare unified and compartmentalized risk measurement.

Compare the results of research on "top-down" and "bottom-up" risk aggregation methods.

Describe the relationship between leverage, market value of assets, and VaR within an active balance sheet management framework.

Excerpt is reprinted by permission from the Basel Committee on Banking Supervision.
6.1 INTRODUCTION

This report summarises the findings of a working group (the "group") that surveyed the academic literature that is relevant to a fundamental review of the regulatory framework of the trading book. This joint working group embraced members of the Trading Book Group and of the Research Task Force of the Basel Committee on Banking Supervision. This report summarises its main findings. It reflects the views of individual contributing authors, and should not be construed as representing specific recommendations or guidance by the Basel Committee for national supervisors or financial institutions.

The report builds on and extends previous work by the Research Task Force on the interaction of market and credit risk (see Basel Committee on Banking Supervision (2009a)). The literature review was complemented by feedback from academic experts at a workshop hosted by the Deutsche Bundesbank in April 2010 and reflects the state of the literature at this point in time.

The key findings of the group are presented in the executive summary. The structure of the remaining report is as follows: We address fundamental issues of a sometimes highly technical nature in current VaR-based approaches to risk measurement. More specifically, we give an overview of implementation issues including questions on the necessity of including time-variation in volatility, the appropriate time horizon over which risk is measured, and backtesting of VaR. Capturing market liquidity in a VaR framework is the key question addressed in the second section. Then, we look at the pros and cons of VaR as a metric for risk and consider alternative metrics put forward in the literature. Important aspects for the future evolution of stress tests are addressed next.

The last two sections include management aspects, such as inter-risk aggregation and the borderline between the banking and trading books (which is discussed only briefly). They also expand the scope of this review by including macro-prudential aspects, such as systemic risk and pro-cyclicality. This section is concerned with an integrated versus a compartmentalised approach to risk measurement, which has become particularly important since the recent financial crisis revealed that a focus on market risk alone may provide distorted results for a trading book. This topic draws heavily on the findings of the former working group of the Research Task Force on the interaction of market and credit risk (see Basel Committee on Banking Supervision (2009a)). The last section looks at the relations between and among risk measurement, systemic risk, and potential pro-cyclical effects of risk measurement.

6.2 SELECTED LESSONS ON VaR IMPLEMENTATION

Overview

In this section we review the academic and industry literature on VaR implementation issues, as it pertains to regulatory capital calculation. The three categories of implementation issues reviewed are: (1) time horizon over which VaR is estimated; (2) the recognition of time-varying volatility in VaR risk factors; and (3) VaR backtesting. With respect to (1), we find that the appropriate VaR horizon varies across positions and depends on the position's nature and liquidity. For regulatory capital purposes, the horizon should be long, and yet the common square-root of time scaling approach for short horizon VaR (e.g., one-day VaR) may generate biased long horizon VaR (e.g., ten-day VaR) estimates. Regarding (2), we find that while many trading book risk factors exhibit time-varying volatility, there are some concerns that regulatory VaR may suffer from instability and pro-cyclicality if VaR models incorporate time-varying volatility. We also sketch several approaches to incorporate time-varying volatility in VaR. As for (3), we survey the literature on VaR backtesting and discuss several regulatory issues including whether VaR should be backtested using actual or hypothetical P&L, and whether the banks' common practice of backtesting one-day VaR provides sufficient support for their ten-day, regulatory VaR.

It is worthwhile to note that some issues related to time horizons and time-varying volatility, and to a lesser extent backtesting, also pertain to risk measures other than VaR, such as Expected Shortfall (ES). A discussion of these alternative risk measures is contained in this chapter.

Time Horizon for Regulatory VaR

One of the fundamental issues in using VaR for regulatory capital is the horizon over which VaR is calculated. The 1998 Market Risk Amendment (MRA) sets this horizon to be ten days, and it allows ten-day VaR to be estimated using square-root of time scaling of one-day VaR. This approach raises three questions: (1) Is ten days an appropriate horizon? (2) Does VaR estimation based on time scaling of daily VaRs produce accurate risk measures? (3) What role do intra-horizon risks (i.e., P&L fluctuations within ten days) play, and should such risks be taken into account in the capital framework?
Is Ten Days an Appropriate Horizon?
vision (2009a)). The last section looks at the relations between There seem s to be consensus among academ ics and the indus­
and among risk m easurem ent, system ic risk, and potential pro­ try that the appropriate horizon for VaR should depend on
cyclical effects of risk m easurem ent. the characteristics of the position. In the academ ic literature,

74 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
Christoffersen and Diebold (2000) and Christoffersen, Diebold and Schuermann (1998) both assert that the relevant horizon will likely depend on where the portfolio lies in the firm (e.g., trading desk vs. CFO) and asset class (e.g., equity vs. fixed income), and the appropriate horizon should be assessed on an application-by-application basis. From this perspective, it appears that an across-the-board application of a ten-day VaR horizon is not optimal. Indeed, one of the motivations for the Incremental Risk Charge (IRC) is to capture certain risks of credit related products at a longer horizon than ten days.

Although the literature suggests that it may be preferable to allow the risk horizon to vary across positions, Finger (2009), for instance, points out that there is no conceptual or statistical framework to justify the aggregation of a ten-day VaR and a one-year IRC. Danielsson (2002) adds that, if the purpose of VaR is to protect against losses during a liquidity crisis, the ten-day horizon at 99% refers to an event that happens roughly 25 times a decade, while a liquidity crisis is "unlikely to happen even once a decade. Hence the probability and problem are mismatched." In addition, even for the same financial product, the appropriate horizon may not be constant, because trade execution strategies depend on time-varying parameters, like transaction costs, expected price volatility, and risk aversion (Almgren and Chriss (2001), Engle and Ferstenberg (2006), Huberman and Stanzl (2005)). In addition, variation in risk aversion over the business cycle can be especially important in shortening the optimal trading horizon, potentially generating larger losses than those observable under more favourable conditions.

Danielsson (2002) questions the suitability of a ten-day horizon if VaR is to protect against a liquidity crisis, because a ten-day horizon implies a higher frequency of liquidity crises than is observable in the data. Other authors have similarly suggested that the appropriate VaR horizon should depend on the economic purpose of VaR.1 Smithson and Minton (1996), for instance, claim that nearly all risk managers believe a one-day horizon is valid for trading purposes but disagree on the appropriate horizon for long-term solvency or capital. Finger (2009) notes that there is "a tension between the regulatory risk horizon and the horizon at which banks manage their trading portfolios," and that the Market Risk Amendment (MRA) rules represent a compromise between regulatory and trading horizons through the introduction of the sixty-day moving average and backtesting multiplier mechanisms.

1 For example, if VaR is expected to reduce the probability of bankruptcy, the horizon would line up with the time a bank needs to raise additional capital. If the focus is on losses while a position is being offloaded, the appropriate horizon would be more strictly related to asset characteristics.

The computation of VaR over longer horizons introduces the issue of how to account for time variation in the composition of the portfolios, especially for institutions that make markets for actively traded assets like currencies (Diebold, Hickman, Inoue and Schuermann (1998)). A common solution is to sidestep the problem of changes to portfolio composition by calculating VaR at short horizons and scaling up the results to the desired time period using the square-root of time. While simple to implement, this choice may compromise the accuracy of VaR because, as discussed in the next section, tail risk is likely to be underestimated (Bakshi and Panayotov (2010)). A second way to tackle the problem is to focus directly on calculating the portfolio VaR over the relevant horizon of interest (Hallerbach (2003)). These approaches may have limited value if the composition of the portfolio changes rapidly. Furthermore, data limitations make it challenging to study the P&L of newly traded assets. A third solution is to extend VaR models by incorporating a prediction of future trading activity, as noted by Diebold et. al. (1998): "To understand the risk over a longer horizon, we need not only robust statistical models for the underlying market price volatility, but also robust behavioural models for changes in trading positions."

Christoffersen and Diebold (2000) aptly characterised the issue of the optimal VaR horizon as "an obvious question with no obvious answer." Voices from the industry have suggested that a horizon longer than ten days may be necessary for regulatory capital purposes. It was also suggested that combining the liquidity horizon of individual positions with a constant level of risk may be an appropriate avenue.

Is Square-Root of Time Scaling a Good Idea?

Under a set of restrictive assumptions on risk factors,2 long horizon VaR can be calculated as short horizon VaR scaled by the square root of time, if the object of interest is unconditional VaR (Kaufman (2004), McNeil, Frey and Embrechts (2005) and Danielsson and Zigrand (2006)). Unfortunately, the assumptions that justify square root of time scaling are rarely verified for financial risk factors, especially at high frequencies. Furthermore, risk management and capital computation are more often interested in assessing potential losses conditional on current information, and scaling today's VaR by the square root of time ignores time variation in the distribution of losses. We have not found any evidence in support of square-root of time scaling for conditional VaRs.

2 Specifically, the risk factors have to be normally distributed with zero mean, and be independently and identically distributed ("IID") across time.
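As a reference point, the sketch below applies this scaling under exactly the restrictive conditions of footnote 2 (IID, zero-mean, normally distributed risk factors), where it is valid for unconditional VaR. The volatility, position size, and quantile are illustrative assumptions.

```python
import numpy as np

# Square-root-of-time scaling of a one-day VaR to ten days, valid only
# under the IID-normal assumptions of footnote 2.
z = 2.326                # approximate 99% standard normal quantile
sigma_1d = 0.01          # one-day return volatility (assumption)
position = 1_000_000     # position value in USD (assumption)

var_1d = position * z * sigma_1d
var_10d = var_1d * np.sqrt(10)   # MRA-style scaled ten-day VaR
print(round(var_1d), round(var_10d))
```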
The accuracy of square-root of time scaling depends on the statistical properties of the data generating process of the risk factors. Diebold et. al. (1998) show that, if risk factors follow a
GARCH(1,1) process, scaling by the square-root of time over-estimates long horizon volatility and consequently VaR is over-estimated. Similar conclusions are drawn by Provizionatou, Markose and Menkens (2005). In contrast to the results that assume that risk factors exhibit time-varying volatility, Danielsson and Zigrand (2006) find that, when the underlying risk factor follows a jump diffusion process, scaling by the square root of time systematically under-estimates risk and the downward bias tends to increase with the time horizon. While these results argue against square-root of time scaling, it is important to acknowledge that we were not able to find immediate alternatives to square-root of time scaling in the literature. Therefore, the practical usefulness of square-root of time scaling should be recognised.3

3 A concept related to square-root of time scaling is the scaling of VaR to higher confidence levels. Although we were unable to find literature on this topic, we recognize that this is an important issue particularly in situations when there are inadequate data points for one to accurately estimate risks deep into the tail. Some banks use certain reference densities (e.g., Student's t with six degrees of freedom) to conduct such scaling.

Is Intra-Horizon Risk Important?

Bakshi and Panayotov (2010) discuss intra-horizon VaR (VaR-I), a risk measure that combines VaR over the regulatory horizon with P&L fluctuations over the short term, with a particular focus on models that incorporate jumps in the price process. The rationale behind intra-horizon VaR is that the maximum cumulative loss, as distinct from the end-of-period P&L, exerts a distinct effect on the capital of a financial institution. Bakshi and Panayotov (2010) suggest that VaR-I "can be important when traders operate under mark-to-market constraints and, hence, sudden losses may trigger margin calls and otherwise adversely affect the trading positions." Daily VaR does carry information on high frequency P&L but, as noted by Kritzman and Rich (2002), "Knowledge of the VaR on a daily basis does not reveal the extent to which losses may accumulate over time." Bakshi and Panayotov (2010) find that taking intra-horizon risk into account generates risk measures consistently higher than standard VaR, up to multiples of VaR, and the divergence is larger for derivative exposures.

Time-Varying Volatility in VaR

It is a stylised fact that certain asset classes, such as equities and interest rates, exhibit time-varying volatility. Accounting for time-varying volatility in VaR models has been one of the most actively studied VaR implementation issues. This section explores this topic, focusing on large and complex trading portfolios.

Is It Necessary to Incorporate Time-Varying Volatilities and Correlations?

The industry seems to think so, since many firms advocate the use of fast reacting measures of risk such as exponential time-weighted measures of volatility. The reason given is that such VaR models provide early warnings of changing market conditions and may perform better in backtesting. The academic literature has also observed that time-varying volatility in financial risk factors is important to VaR, dating back to the 1996 RiskMetrics Technical document (J.P. Morgan (1996)). Pritsker (2006) showed theoretically that using historical simulation VaR without incorporating time-varying volatility can dangerously under-estimate risk, when the true underlying risk factors exhibit time-varying volatility.

In contrast, some have argued that, depending on the purpose of VaR, capturing time-varying volatility in VaR may not be necessary, or may even be inappropriate. Christoffersen and Diebold (2000) observe that volatility forecastability decays quickly with time horizon for most equity, fixed income and foreign exchange assets. The implication is that capturing time-varying volatility may not be as important when the VaR horizon is long, compared to when the VaR horizon is relatively short. There are also concerns about pro-cyclicality and instability implications associated with regulatory VaRs that capture time-varying volatility. Dunn (2009), for instance, states that there is a "contradiction between the requirement for a risk sensitive metric to capture variations in volatility and correlation, and the regulatory requirement for a stable and forward looking basis for computing capital, that is not pro-cyclical." In reference to modelling time-varying volatility in VaR, it wrote, "Some firms mentioned a trade-off in this issue, and that for some purposes such as capital allocation, risk measures with more stable properties that reflected longer historical norms were desirable."

In summary, incorporating time-varying volatility in VaR appears to be necessary given that it is prevalent in many financial risk factors. Furthermore, many financial instruments are now priced with models with stochastic volatility features. It is logical that VaR models are constructed to account for these statistical properties. However, using VaR with time-varying volatility for regulatory capital raises the concern of volatile and potentially pro-cyclical regulatory standards.

Methods to Incorporate Time-Varying Volatility in VaR for Large, Complex Portfolios

Beginning with J.P. Morgan (1996), the Exponentially Weighted Moving Average (EWMA) approach has been regarded as one of the industry standards for incorporating time-varying volatility in VaR. EWMA is a constrained version of an IGARCH(1,1)
model, and in the case of RiskMetrics the parameter in IGARCH was set to 0.97.
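A minimal sketch of the EWMA recursion follows. Only the decay parameter 0.97 comes from the text; the variance seed, the simulated returns, and the use of a normal quantile for VaR are assumptions.

```python
import numpy as np

# EWMA volatility: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}**2,
# the constrained IGARCH(1,1) described above.
def ewma_variance(returns, lam=0.97):
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns[:20].var()   # seed with a short sample variance (assumption)
    for t in range(1, len(returns)):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return sigma2

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, 500)        # simulated daily returns (illustrative)
sigma_today = np.sqrt(ewma_variance(r))[-1]
print(round(2.326 * sigma_today, 4))  # normal 99% one-day VaR per unit of value
```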
An alternative and simpler approach is to weight historical data according to the weights introduced by Boudoukh, Richardson and Whitelaw (1998), where an observation from i days ago receives a weight of

θ^(i−1) (1 − θ) / (1 − θ^n)

Here n is the total number of days in the historical window, and θ is a number between zero and one which controls the rate of memory decay.
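A sketch of the resulting age-weighted historical simulation follows; the decay parameter θ, the convention that the first observation is the most recent one, and the simulated loss series are all assumptions.

```python
import numpy as np

# Age-weighted historical-simulation VaR with the weights above:
# an observation from i days ago gets theta**(i-1) * (1-theta) / (1-theta**n).
def brw_var(losses, theta=0.98, alpha=0.99):
    n = len(losses)                        # losses[0] = most recent observation
    i = np.arange(1, n + 1)
    w = theta ** (i - 1) * (1 - theta) / (1 - theta ** n)   # weights sum to one
    order = np.argsort(losses)[::-1]       # largest losses first
    cum_w = np.cumsum(w[order])
    return losses[order][cum_w >= 1 - alpha][0]  # first loss reaching the tail mass

rng = np.random.default_rng(1)
losses = rng.normal(0.0, 1.0, 250)         # illustrative daily losses
print(round(brw_var(losses), 3))
```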
An even simpler approach is to compute VaR with historical simulation using a short and frequently updated time series. Dunn (2009) has suggested that this method captures time-varying volatility quite well. Using simulations, Pritsker (2006) has shown that the approach of Boudoukh et. al. (1998) is not sensitive enough to pick up volatility changes. He advocated the use of Filtered Historical Simulation (FHS), first introduced by Barone-Adesi, Giannopoulos and Vosper (1999). Broadly speaking, FHS is based on the idea that risk factors should first be filtered through a GARCH model. The volatility is then updated using the model and applied to the filtered risk factors to construct VaR.
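The following single-risk-factor sketch illustrates the FHS idea. The GARCH(1,1) parameters are fixed assumptions here; in practice they are estimated from the data and the filtering is applied to each risk factor.

```python
import numpy as np

# Filtered historical simulation: filter returns with a GARCH(1,1) volatility,
# keep the standardised residuals, and rescale them by today's forecast.
omega, a, b = 1e-6, 0.08, 0.90             # assumed GARCH(1,1) parameters

def fhs_var(returns, alpha=0.99):
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = returns.var()
    for t in range(len(returns)):
        sigma2[t + 1] = omega + a * returns[t] ** 2 + b * sigma2[t]
    z = returns / np.sqrt(sigma2[:-1])     # filtered (standardised) residuals
    scenarios = z * np.sqrt(sigma2[-1])    # rescaled by the current forecast
    return -np.quantile(scenarios, 1 - alpha)

rng = np.random.default_rng(2)
r = rng.normal(0.0, 0.01, 1000)            # illustrative return history
print(round(fhs_var(r), 4))
```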
Naturally, considerations should be given to how the above method can be applied to portfolios with large numbers of positions or risk factors. Barone-Adesi et. al. (1999) outlined a position-by-position FHS approach. They recommended filtering each risk factor separately, and building volatility forecasts for each factor. Analogously, EWMA and the weights introduced by Boudoukh et. al. (1998) can be applied the same way. However, weighting or filtering risk factors separately implicitly assumes that the correlation structure across risk factors does not change over time. Pritsker (2006) has pointed out that time-varying correlation is an important source of risk. Indeed, the recent crisis has highlighted the fact that correlations among many risk factors change significantly over time. One would need to be careful in handling time-varying volatilities as well as correlations.

Multivariate GARCH models such as the BEKK model of Engle and Kroner (1995), or the DCC model of Engle (2002), can be used to estimate time-varying volatilities as well as correlations. However, such multivariate GARCH models are difficult to estimate when there are a large number of risk factors. Some recent advances in the literature allow one to estimate a multivariate GARCH-type model when there are a large number of risk factors. For instance, Engle, Shephard and Sheppard (2007) proposed to average likelihoods before estimating the GARCH model with maximum likelihood. Engle and Kelly (2009) impose a restriction on the correlation structure that helps facilitate estimation in large dimensions, but still allow correlations to change over time. Finally, Aramonte, Rodriguez and Wu (2010) estimate VaR for large portfolios comprising stocks and bonds by first reducing the dimension of risk factors using dynamic factor models, and then estimating a time-varying volatility model. The resulting VaR estimates are shown to out-perform historical simulation and FHS based on filtering risk factors one-by-one.

All in all, incorporating time-varying volatility in VaR measures is not straightforward when there are many risk factors. Time-varying correlations should be taken into account. Rather than using more involved methods, the industry appears to be taking less burdensome alternatives, such as using simple weighting of observations, or shortening the data window used to estimate VaR. These approaches compromise on accuracy, but are computationally attractive for large and complex portfolios. The recent academic literature offers promise that some of the sophisticated empirical methodologies may soon become practical for large complex portfolios.

Backtesting VaR Models

As with any type of modelling, a VaR model must be validated. In particular, backtesting has been the industry standard for validating VaR models. This section reviews some backtesting methodologies suggested by the literature, and some issues pertaining to the application of such methodologies.

Backtesting Approaches

Banks typically draw inference on the performance of VaR models using backtesting exceptions (sometimes also known as backtesting "breaches" or "violations"). For regulatory capital, the MRA imposes a multiplier on VaR depending on the number of backtesting exceptions the bank experiences.

While the MRA does not require banks to statistically test whether VaR has the correct number of exceptions, formal statistical inference is always desirable and many alternatives have been proposed in the literature. Kupiec (1995) introduced the unconditional coverage likelihood ratio tests as inference tools for whether the VaR model generated the correct number of exceptions. This methodology is simple to implement, but has two drawbacks. First, as pointed out by Kupiec (1995 and 2005), when the number of trading days used in VaR evaluation is limited (e.g., one year or approximately 250 trading days), or when the confidence level is high (e.g., 99% as in regulatory VaR), such tests have low power. This is not surprising, since one would expect only a small number of backtesting exceptions in most cases. Building a statistic out of a handful of exceptions, then, may induce high variance in the test statistic itself and the result may be sensitive to an incremental exception. Second, given that this test only counts exceptions, its power may be improved
by considering other aspects of the data such as the grouping of exceptions in time.
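A compact sketch of Kupiec's unconditional coverage statistic follows; the one-year window and the exception count are illustrative, and 3.84 is the standard 95% critical value of a chi-square(1) distribution.

```python
import numpy as np

# Kupiec (1995) test: with T days, x exceptions and target coverage p = 1 - alpha,
# the likelihood ratio LR_uc is asymptotically chi-square with one degree of freedom.
def kupiec_lr(T, x, p=0.01):
    pi = x / T                             # observed exception rate
    log_l0 = x * np.log(p) + (T - x) * np.log(1 - p)
    log_l1 = x * np.log(pi) + (T - x) * np.log(1 - pi) if 0 < x < T else 0.0
    return -2 * (log_l0 - log_l1)

lr = kupiec_lr(T=250, x=6)                 # e.g., six exceptions in one year
print(round(lr, 2), "reject" if lr > 3.84 else "accept")
```

With only 250 observations the test indeed has little power: even six exceptions against an expected 2.5 do not produce a clear rejection at the 95% level.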
Christoffersen (1998) has proposed a conditional backtesting exception test that accounts for the timing as well as the number of exceptions. The test is based on the fact that when the VaR model has conditionally the correct number of exceptions, then indicator variables representing the exceptions are IID4 Bernoulli random variables. This test, however, may still be exposed to the low power problem. To this end, Berkowitz, Christoffersen and Pelletier (2010) provided a suite of conditional tests that have good power. These tests are based on the intuition of Christoffersen (1998) (i.e., correct conditional exceptions result in IID Bernoulli exception indicators) but derive inferences from autocorrelation, spectral, and hazard rate tests.

4 IID: independently and identically distributed.

Aside from backtesting based on the number of exceptions, a natural measure of VaR performance is the magnitude of the exceptions. Lopez (1999), for instance, formalised this idea by introducing a quadratic loss function where loss is the difference between actual P&L and VaR, when an exception occurs. Some papers, including Pritsker (2006) and Shang (2009), also consider the use of Mean-Squared-Error (MSE) as a measure of VaR performance in backtesting. Typically, one would measure the MSE between the 'true VaR' and the VaR estimate based on the model. Clearly, this method is not directly applicable to observed portfolio P&Ls, since the true VaR is never known. Nonetheless, it can be a useful validation method prior to putting a VaR model into production: one can define data generating processes mimicking those imposed by front office pricing models, simulate position P&L enough times to construct a P&L distribution, and find the 'true VaR' based on this simulated distribution. Then, the VaR model can be applied to the generated data, and the difference between 'true VaR' and estimated VaR can be analysed.

Backtesting Issues

An important and yet ambiguous issue for backtesting is which P&L series to compare to VaR. Broadly speaking, the estimated VaR can be compared to either actual P&L (i.e., the actual portfolio P&L at the VaR horizon), or hypothetical P&L (i.e., P&L constructed based on the portfolio for which VaR was estimated). To complicate matters further, actual P&L may sometimes contain commissions and fees, which are not directly related to trading and trading risk. Franke, Hardle and Hafner (2008) and Berry (2009) described the relative merits of actual and hypothetical backtesting: actual backtesting has little value if the portfolio has changed drastically since VaR was estimated, but is simple to implement; hypothetical backtesting would make an 'apples-to-apples' comparison, but it comes with significant implementation burden given that hypothetical portfolios need to be constructed.

Another issue is the appropriate backtesting horizon. Banks typically backtest one-day ahead VaR and use it as a validation of the regulatory VaR, which is ten-day. The problem here is clear: a good one-day VaR (as validated by backtesting) does not necessarily imply a good ten-day VaR, and vice versa. Ten-day backtesting may not be ideal either, given the potentially large portfolio shifts that may take place within ten days. In that case, actual P&L backtesting in particular may not be very informative. While we were unable to find literature on this particular issue, it remains an important policy question.

Conclusions

We have reviewed the literature on a number of VaR implementation issues, including the appropriate time horizon, time-variation in the volatility of risk factors, and backtesting. We find that the optimal way of addressing these points is idiosyncratic to the problem under consideration. For instance, when estimating long horizon VaR by scaling the short horizon counterpart by the square root of time, one may overestimate VaR if the underlying P&L process exhibits time-varying volatility, but underestimate VaR if the process has jumps.

Incorporating time-varying volatility in VaR measures appears to be important to make models more realistic, although it is not straightforward when there are many risk factors. The recent academic literature offers promise in this direction. While many trading book risk factors have time-varying volatility, models that incorporate this feature may, however, generate pro-cyclical VaR and also be unstable, not least because of estimation issues.

In addition, the choice of whether to evaluate a VaR model on the basis of hypothetical or actual backtesting may be affected by the characteristics of the portfolio. Indeed, actual backtesting is less informative when the composition of the portfolio has recently changed. On the other hand, while hypothetical backtesting provides a more consistent comparison, it may impose substantial computational burdens because it requires reconstructing the history of the portfolio on the basis of its current composition.

6.3 INCORPORATING LIQUIDITY

Overview

Discussing the challenging issue of how to incorporate market liquidity into a VaR model requires first of all a distinction between exogenous and endogenous liquidity. This distinction
is made from the point of view of the bank, rather than in general equilibrium terms (Bangia, Diebold, Schuermann and Stroughair (1999a) and Bervas (2006)). More specifically, exogenous liquidity refers to the transaction cost for trades of average size, while endogenous liquidity is related to the cost of unwinding portfolios large enough that the bid-ask spread cannot be taken as given, but is affected by the trades themselves.

Bangia et. al. (1999a) give a graphical representation of exogenous and endogenous liquidity that is reproduced in Figure 6.1. Below a certain size, transactions may be traded at the bid/ask price quoted in the market (exogenous liquidity), and above this size, the transaction will be done at a price below the initial bid or above the initial ask, depending on the sign of the trade (endogenous liquidity).

Figure 6.1 Effect of position size on liquidation value. Source: Bangia et. al. (1999a).

The exogenous component of liquidity risk corresponds to the average transaction costs set by the market for standard transaction sizes. The endogenous component corresponds to the impact on prices of the liquidation of a position in a relatively tight market, or more generally when all market participants react in the same way, and therefore applies to orders that are large enough to move market prices (Bangia et. al. (1999a), Bervas (2006)). Exogenous liquidity risk, corresponding to the normal variation of bid/ask spreads across instruments, can be, from a theoretical point of view, easily integrated into a VaR framework. Endogenous risk, corresponding to the impact on market prices of the liquidation of a position, or of collective portfolio adjustments, is more difficult to include in a VaR computation. Its impact, however, may be very significant, especially for many complex derivatives held in trading books of large institutions.

One way to incorporate liquidity risk into VaR measures is to include new VaR risk factors that can be used to model liquidity risks. This approach is feasible only when the parameters can be deduced from market data. Liquidity reserves taken by banks on their trading portfolio according to accounting standards correspond, more or less, to reserves for exogenous liquidity. In order to integrate this risk in the VaR computation, Bangia et. al. (1999a) propose to integrate the variability of the bid/offer spread for average size transactions as a risk factor.

Taking endogenous liquidity into account in the value-at-risk is more difficult, as it is not even really taken into account in the valuation of trading portfolios, but its impact on both valuation and VaR should be significant. Academic literature on the subject— portfolio valuation and VaR computation— is quite rich, but very little application has been made, in particular because endogenous liquidity reserves could be considered as not compliant with accounting standards.

In the following section, we first describe how, following existing literature, exogenous liquidity might be integrated into VaR measures. We then review several aspects of endogenous liquidity risk, and detail how this risk could be integrated in portfolio valuation and VaR computation. Finally, we discuss the choice of the VaR horizon when taking liquidity risk into account.

Exogenous Liquidity

For the trading portfolio, following IAS rules, only exogenous liquidity risk will be taken into account in the valuation of cash assets and derivatives. Bangia et. al. (1999a) propose adding the bid/offer spread to characterise exogenous liquidity as a risk factor.

Their method posits that the relative spread, S = (Ask − Bid)/Mid-price, has sample mean and variance μ and σ². If the 99% quantile of the normalised distribution of S is q_0.99, then the Cost of Liquidity is defined as

CoL_t = ½ P_t (μ + q_0.99 σ)

where P_t is today's value of the position. CoL_t is added to VaR to form a liquidity-adjusted VaR.
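A minimal numerical sketch of this adjustment follows. The spread statistics, the position value, and the stand-alone VaR are illustrative assumptions; q_0.99 is set to 2.33, the value that obtains if the normalised spread is roughly normal.

```python
# Bangia et. al. (1999a) exogenous cost of liquidity added to market VaR.
P_t = 1_000_000      # today's position value (assumption)
mu = 0.0040          # mean relative spread (Ask - Bid)/Mid (assumption)
sigma = 0.0015       # standard deviation of the relative spread (assumption)
q99 = 2.33           # 99% quantile of the normalised spread distribution
market_var = 35_000  # stand-alone 99% VaR of the position (assumption)

col = 0.5 * P_t * (mu + q99 * sigma)   # cost of liquidity CoL_t
lvar = market_var + col                # liquidity-adjusted VaR
print(round(col), round(lvar))
```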

Endogenous Liquidity: Motivation

Adverse market conditions can generate a flight to liquid and high-quality assets, which reduces the ability to unwind positions in thinly-traded, low-quality assets. The effect can be compounded when the inventory of market makers becomes
imbalanced, thus reducing their willingness to further accommodate sell trades, and when risk management standards for traders become tighter, reducing the probability of finding a counterparty.

Margin requirements are also a source of variation in the response of assets' prices and liquidity to fundamental shocks, because higher margins increase the probability of binding funding constraints. While the choice of margin requirements is endogenous to a security's liquidity, assets with identical payoffs can have different prices depending on margin requirements and the opportunity cost of capital.

The trading activities associated with hedging may also have an impact on the dynamics of the underlying assets. For example, delta hedging an option position entails buying the asset when its price goes up, and selling it when the price goes down: if the size of these adjustments is not negligible with respect to the volumes traded on the underlying, this strategy will increase upward and downward price movements.

Such effects will be particularly important when:

• the underlying asset is not very liquid,
• the size of the positions of the investors hedging an option is important with respect to the market,
• large numbers of small investors follow the same hedging strategy,
• the market for the underlying of the derivative is subject to asymmetric information, which magnifies the sensitivity of prices to clusters of similar trades (Gennotte and Leland (1990)).

In particular, on some specific markets driven by exotic options (e.g., Power Reverse Dual Callable, some CPPI5 strategies, etc.), even if a bank's trading book positions are small with respect to the market, this bank may be exposed to losses due to endogenous liquidity. When many other banks have the same kind of positions and none has an opposite position,6 all these banks will have to adjust their hedging portfolios in the same way at the same time, and will then influence the market dynamics; thus even a small position may be exposed to a significant liquidity cost.

5 CPPI: Constant Proportion Portfolio Insurance.

6 The clients of these banks may be investors who do not dynamically hedge their positions.

The implications of derivative hedging have been extensively studied, and derivative hedging has been identified as a potential explanation for the relation between implied volatilities and strike prices that can be observed on option markets (the so-called volatility smile). This literature includes the work of Platen and Schweizer (1998), Sircar and Papanicolaou (1998), Schonbucher and Wilmott (2000) and Subramanian (2008).

Endogenous Liquidity and Market Risk for Trading Portfolios

Several authors have studied the implications of endogenous liquidity risk for portfolio valuation and on value-at-risk measures (Jarrow and Protter (2005), Rogers and Singh (2005)). In general, these authors define an optimal liquidation strategy in a finite (or infinite) time horizon model and deduce from this strategy the market value of the portfolio, which is equal to the expectation of its liquidation price. The associated VaR measure, defined as a confidence interval around this expected price, implicitly incorporates market and liquidity risks.

Some studies suggest that endogenous liquidity costs should be added to position returns before carrying out VaR calculations. To that end, Bervas (2006) suggests to incorporate Kyle's Lambda or Amihud's (2002) illiquidity ratio in returns. Both measures are based on the relationship between returns and volume. Wu (2009) applies the illiquidity cost of Amihud (2002) to stock returns and calculates the sum as "liquidity-adjusted returns." VaR is then estimated by applying a GARCH type model to the adjusted returns. Francois-Heude and Van Wynendaele (2001) suggest an approach that modifies the model of Bangia et. al. (1999a) by using average weighted bid-ask spreads, with weights based on volume. Berkowitz (2000b) proposes to incorporate the price impact of an immediate liquidation via the concept of elasticity of demand. Jarrow and Subramanian (2001) modify the mean and variance that appear in the standard parametric VaR formula to incorporate means and variances of liquidation time and liquidity discount. Botha (2008) extended Jarrow and Subramanian (2001) to the two-asset portfolio level. Other notable papers include Le Saout (2002) and Hisata and Yamai (2000). Finally, Acerbi and Scandolo (2008) explore the impact of market and funding liquidity on portfolio prices and risk measures. The authors revisit the coherent measures of risk criteria introduced by Artzner et. al. (1999). They explain how these criteria should be interpreted; in particular they study liquidity models that lead to solutions for the valuation of portfolios constituted of analytically tractable assets.
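As an illustration of the return-volume measures mentioned above, the sketch below computes the Amihud (2002) illiquidity ratio on simulated data. How the ratio is then folded into "liquidity-adjusted returns" (as in Wu (2009)) is not reproduced here.

```python
import numpy as np

# Amihud (2002) illiquidity ratio: average of |return| / dollar volume.
rng = np.random.default_rng(4)
returns = rng.normal(0.0, 0.01, 250)         # illustrative daily returns
dollar_volume = rng.uniform(5e6, 2e7, 250)   # illustrative daily dollar volume

illiq = np.mean(np.abs(returns) / dollar_volume)
print(illiq)   # higher values indicate a larger price impact per unit traded
```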

The liquidity risk adjustments proposed in the academic literature, for the most part, have not been applied to the trading books of banks. One reason for this may be that the suggested valuation methods are not necessarily compliant with actual
accounting standards.7 Another reason academic proposals have been slow to be adopted may be the difficulty of estimating model liquidity parameters, especially for OTC products. Indeed, the necessary data are not always available, and some of these parameters may be subjective. But recent discussions in academic circles regarding OTC transaction reporting could contribute to solving this problem.

7 For example, IAS 39 specifies in AG72: "The appropriate quoted market price for an asset held or liability to be issued is usually the current bid price. . . . The fair value of a portfolio of financial instruments is the product of the number of units of the instrument and its quoted market price," and in AG75: "The objective of using a valuation technique is to establish what the transaction price would have been on the measurement date in an arm's length exchange motivated by normal business considerations."

A study of the impact of endogenous liquidity on the valuation of exotic derivatives, similar to the contributions on exogenous liquidity, would be especially welcome. When significant market movements materialise, traders will adjust their hedging strategies, which may have an impact on the market dynamics if the volumes they have to trade are significant. Such an effect has been suggested as a possible explanation for the significant trading losses that some banks have experienced during the last financial crisis.

Some authors have integrated liquidity risk with market and credit risk. For example, in order to evaluate a portfolio, Zheng (2006) studies optimal liquidation strategies, taking into account market and liquidity risk, together with the probability of default of an issuer or of a counterparty. Stange and Kaserer (2008) suggest calculating liquidity-adjusted VaR conditional on the market value of a position by incorporating bid-ask spread liquidity adjustments in returns; Qi and Ng (2009) discuss intraday liquidity risk and its impact on VaR.

Adjusting the VaR Time Horizon to Account for Liquidity Risk

The recent financial crisis has provided examples where a change in market liquidity conditions alters the liquidity horizon, i.e., the time required to unwind a position without unduly affecting the underlying instrument prices (including in a stressed market). This finding was already addressed in previous work of the Research Task Force (see Basel Committee on Banking Supervision (2009a)) and it is consistent with the literature.

Lawrence and Robinson (1997), for example, suggest that the application of a unique horizon to all positions, ignoring their size and level of liquidity, is undesirable. They suggest determining the temporal horizon by the size of the position and the liquidity of the market. Haberle and Persson (2000) propose a method based on the fraction of daily volume that can be liquidated without significant impact on the market price, which can be interpreted as holding the horizon fixed and determining how much can be liquidated during that horizon. The method of Jarrow and Subramanian (2001) is also relevant in this context as it requires an estimate of the average liquidation time.

Previous work of the Research Task Force suggests an interdependence between risk assessment and liquidity horizon: On the one hand, the exposures of banks to market risk and credit risk may vary with a risk horizon that is set dependent on market liquidity. If liquidity decreases, for example, the risk horizon lengthens and the exposure to credit risk typically increases. On the other hand, liquidity conditions are also affected by perceptions of market and credit risk. A higher estimate of credit risk, for example, may adversely affect the willingness to trade and thereby market liquidity (see Basel Committee on Banking Supervision (2009a)).

Liquidation horizons vary over the business cycle, increasing during times of market stress. Besides transaction costs or the size of the position relative to the market, a trade execution strategy also depends on factors like expected price volatility and risk aversion (Huberman and Stanzl (2005)). If, for instance, risk aversion increases during a crisis, an investor may choose to trade more rapidly than during normal times, thus generating higher losses than those observable under favourable economic conditions.

Conclusions

Both exogenous and endogenous liquidity risks are important; endogenous liquidity risk is particularly relevant for exotic/complex trading positions. While exogenous liquidity is partially incorporated in the valuation of trading portfolios, endogenous liquidity is typically not, even though its impact may be substantial. Although endogenous liquidity risk is especially relevant under stress conditions, portfolios may be subject to significant endogenous liquidity costs under all market conditions, depending on their size or on the positions of other market participants.

The academic literature suggests as a first step to adjust valuation methods in order to take endogenous liquidity risk into account. Then a VaR integrating liquidity risk could be computed. Notwithstanding academic findings on this topic, in practice, the ability to model exogenous and endogenous liquidity may be constrained by limited data availability, especially for OTC instruments.

6.4 RISK MEASURES

Overview

This section compares selected risk measures that appear to be relevant for risk management purposes either today or in the future. The alternative measures considered include VaR, expected shortfall and spectral measures of risk. The key features used to decide among alternative risk measurement approaches include ease of calculation, numerical stability, the possibility to calculate risk contributions of individual assets to portfolio risk, backtesting possibilities, incentives created for risk managers, and, linked to the latter, the relation between risk measures and regulators' objectives. Although few financial institutions currently make use of VaR alternatives, those that do are often considered as technologically leading in the industry.

In the literature, risk measures are usually defined as functions of random variables (portfolio losses or returns in most cases). This seems to be a trivial aspect but is actually a substantial restriction because it binds the analysis to one point of time; while this time horizon can be varied, a joint analysis of a portfolio's losses at several times, which may be important for asset/liabilities management, is excluded. Risk measures being a function of random loss variables also means that these variables are not an attribute of risk measures; the probability distributions of the variables are specified in a preceding step, and the analysis of risk measures is not an analysis of whether the random variables are correctly specified.

In our discussion of alternative measures we focus on VaR because of its high relevance to the industry today, and on Expected Shortfall and Spectral Measures because of their advantages and hence a potentially growing importance in the future. Other risk measures, such as variance or upper-tail moments, are briefly sketched for completeness.

VaR

Concept of VaR and Its Problems

VaR has become a standard measure used in financial risk management due to its conceptual simplicity, computational facility, and ready applicability. Given some random loss L and a confidence level α, VaR_α(L) is defined as the quantile of L at the probability α. The quantile is not necessarily unique if there are regions where the loss distribution function F_L does not grow. For these cases, McNeil et. al. (2005) define the VaR as the smallest, i.e., most optimistic quantile:

VaR_α(L) = inf{l : F_L(l) ≥ α}

Despite its prevalence in risk management and regulation, VaR has several conceptual problems. Artzner, Delbaen, Eber and Heath (1999) point out that VaR measures only quantiles of losses, and thus disregards any loss beyond the VaR level. As a consequence, a risk manager who strictly relies on VaR as the only risk measure may be tempted to avoid losses within the confidence level while increasing losses beyond the VaR level. This incentive sharply contrasts with the interests of regulators since losses beyond the VaR level are associated with cases where regulators or deposit insurers have to step in and bear some of the bank's losses. Hence, VaR provides the risk manager with incentives to neglect the severity of those losses that regulators are most interested in.

Neglecting the severity of losses in the tail of the distribution also has a positive flipside: it makes back-testing easier or possible in the first place, simply because empirical quantiles are per se robust to extreme outliers, unlike typical estimators of the expected shortfall, for example (see below).

VaR is criticised for not being a coherent risk measure, which means that VaR lacks an axiomatic foundation as proposed by Artzner et. al. (1999). They set out the following four consistency rules. A risk measure R is called coherent if it satisfies the following axioms:

• Subadditivity (diversification): R(L1 + L2) ≤ R(L1) + R(L2)
• Positive homogeneity (scaling): R(λL) = λR(L), for every λ > 0
• Monotonicity: R(L1) ≤ R(L2) if L1 ≤ L2
• Translation property: R(L + a) = R(L) + a, for every constant a

VaR is not coherent because it may violate the subadditivity criterion. For why subadditivity indeed makes sense we quote from McNeil et. al. (2005):

• "Subadditivity reflects the idea that risk can be reduced by diversification, . . . the use of non-subadditive risk measures in a Markowitz-type portfolio optimisation problem may lead to optimal portfolios that are very concentrated and that would be deemed quite risky by normal economic standards.
• If a regulator uses a non-subadditive risk measure in determining the regulatory capital for a financial institution, that institution has an incentive to legally break up into various subsidiaries in order to reduce its regulatory capital requirements . . . .
• Subadditivity makes decentralisation of risk-management systems possible. Consider as an example two trading desks with positions leading to losses L1 and L2. Imagine that a risk manager wants to ensure that R(L), the risk of the overall loss L = L1 + L2, does not exceed some number M. If he uses a subadditive risk measure R, he may simply choose bounds M1 and M2 such that M1 + M2 ≤ M and impose on each of the desks the constraint that R(Li) ≤ Mi; subadditivity then ensures automatically that R(L) ≤ M1 + M2 ≤ M."

Remark 1: Related to the non-coherency of VaR, Basak and Shapiro (2001) create an example where VaR-based risk management may possibly be problematic. They analyse optimal, dynamic portfolio and wealth/consumption policies of utility maximising investors who must also manage market-risk exposure using VaR. They find that VaR risk managers often optimally choose a larger exposure to risky assets than non-VaR risk managers and consequently incur larger losses when losses occur.

Remark 2: At first glance, subadditivity and positive homogeneity may not appear as meaningful concepts when risk measures are applied to counterparty credit risk (CCR) or other types of credit risk. For example, assume there is CCR involved with some position in the trading book. Doubling the position can more than double the CCR simply because not only the exposure doubles but also because the position becoming extremely profitable can make the counterparty go bankrupt. This appears to contradict the postulate of positive homogeneity which claims R(2L) = 2R(L). However, it is not that positive homogeneity is wrong for CCR but rather that this idea reflects a misunderstanding of risk measures as functions of positions. Generally, the risk of the doubled position will not be 2L but rather a random variable with a wider probability distribution. Similar effects are possible for subadditivity; the issue is related to the Unified versus Compartmentalised Risk Measurement section on whether a compartmentalised measurement of risk is appropriate. For instance, there may exist two positions which cause individual risks L1 and L2, respectively, if held alone, but the risk of holding both positions may be more severe than L1 + L2 for similar reasons as in the above example. Knowing this, one might question the subadditivity property as such because it requires R(L1 + L2) ≤ R(L1) + R(L2). However, not subadditivity is to blame but a potential misunderstanding of L1 + L2 as the risk of holding both positions together. These considerations imply two lessons:

• It may be problematic to assume that a vector of assets linearly maps into the associated vector of random losses.
• If a "risk measure" is defined as a composed mapping from positions (via the loss variable) to numbers, this mapping is generally not coherent. Assuming coherence can lead to an underestimation of risk.

Is VaR Failing Subadditivity Relevant in Practice?

The favourite textbook example of VaR violating subadditivity is constructed with the help of two large losses whose individual probabilities are lower than the tail probability (one minus the confidence level) of the VaR. When measured separately, each loss can have zero VaR, but when aggregated, the probability that either of the losses occurs may exceed that tail probability, so that the VaR of the aggregated loss is positive.
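A numerical instance, with assumed default probabilities, makes the failure concrete.

```python
# Two independent positions, each losing 100 with probability 0.8% and zero
# otherwise (illustrative assumptions); the confidence level is 99%.
p = 0.008

# Stand-alone: P(loss >= 100) = 0.8% < 1%, so each 99%-VaR is 0.
var_single = 0.0

# Aggregated: P(at least one loss) = 1 - (1 - p)**2 = 1.594% > 1%, while
# P(loss = 200) = p**2 is negligible, so the 99% quantile of the sum is 100.
p_any = 1 - (1 - p) ** 2
var_portfolio = 100.0

print(round(p_any, 6), var_portfolio > var_single + var_single)  # True: superadditive
```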
As the textbook example relies on jumps in the loss distribution, one might conjecture that VaR works properly if loss distributions are smooth or if discrete losses are superimposed by sufficiently large smooth ones. Whether this intuition is correct ultimately depends on the situation and particularly on the tail thickness of the loss distributions:

• McNeil et. al. (2005) present an example of a continuous two-dimensional loss distribution in which VaR violates subadditivity. While this is alarming in that it does not build on the abovementioned textbook ingredients, the example is still rather artificial.
• If the joint distribution of risk factors is elliptical (multivariate normal, e.g.), VaR is subadditive; see McNeil et. al. (2005), Theorem 6.8.
• Gourier, Farkas, and Abbate (2009) give an example where the sum of some fat-tailed, continuously distributed, and independent (!) random variables has a larger VaR than the sum of individual VaRs. The example is rather exotic as one of the random variables has infinite mean. While VaR fails in that case, it must be conceded that there is no coherent and practicable alternative at all because any coherent risk measure must be infinite then.8
• Danielsson, Jorgensen, Samorodnitsky, Sarma, and de Vries (2005) prove that VaR is subadditive for a sufficiently high confidence level if the total loss has finite mean. Note, however, that this is not an "all clear" signal but an asymptotic result only. Generally it may happen that subadditivity is only achieved for impracticably high confidence levels.
• Degen, Embrechts, and Lambrigger (2007) restrict their analysis to a parametric class of distributions but gain valuable insight into the interplay of tail thickness, confidence level, and subadditivity. For example, they find the 99%-VaR to be superadditive even for very moderate tail indices above 6, which means that moments of order 6 and lower may exist.9 These are realistic cases in market risk. The dependence structure between the individual losses generally aggravates the problem but has surprisingly low impact in the cases considered.

8 Gourier et. al. (2009) refer to Delbaen (2002) who shows in Theorem 13 that, given a continuous distribution and some continuity of the risk measure, any coherent risk measure larger or equal than the α-VaR cannot fall short of the α-expected shortfall; the latter is already infinite in Gourier's example so that no useful coherent measure of that risk can exist.

9 The higher the tail index, the thinner is the tail. Degen et. al. (2007) consider random variables the distribution tail of which is as thick as that of a transform exp(gZ + 0.5hZ²) of a standard normal Z; e.g., g = 2.3 and h = 0.25 make the VaR super-additive; the tail index is 4 in this example.

To sum up, while the literature provides us with conditions that assure VaR is subadditive (and thus coherent), these conditions are generally not fulfilled in the market risk context; for example, Balaban, Ouenniche and Politou (2005) estimate tail indices between 1 and 2 for UK stock index returns over holding periods between 1 and 10 days, meaning that these tails are substantially heavier than necessary for assuring the subadditivity of VaR in general.

Expected Shortfall

Expected shortfall (ES) is the most well-known risk measure following VaR. It is conceptually intuitive and has firm theoretical backgrounds; see, e.g., Dunn (2009), Artzner et. al. (1999), Acerbi and Tasche (2002), Sy (2006), and Yamai and Yoshiba (2005). Therefore, it is now preferred to VaR by an increasing number of risk managers in the industry.

ES corrects three shortcomings of VaR. First, ES does account for the severity of losses beyond the confidence threshold. This property is especially important for regulators, who are, as discussed above, concerned about exactly these losses. Second, it is always subadditive and coherent. Third, it mitigates the impact that the particular choice of a single confidence level may have on risk management decisions, while there is seldom an objective reason for this choice.

To define ES, let L be a random loss with distribution function F_L and α ∈ (0,1) a confidence level (close to 1). Recall that the α-VaR is defined as the α-quantile of F_L. The ES at level α is defined by

ES_α = (1/(1 − α)) ∫_α^1 VaR_u(L) du    (6.1)

and can thus be understood as an average of all VaRs from level α up to 1. ES is a coherent risk measure, and so subadditive. It is continuous in α and thus avoids cliff effects that may appear when the distribution has discrete components.

If the loss distribution is continuous, there is an even more intuitive representation:

ES_α = E(L | L ≥ VaR_α),    (6.2)

i.e., ES is then the expected loss conditional on this loss belonging to the 100(1 − α) percent worst losses. This measure has several other names, like tail conditional expectation (TCE) or conditional VaR (CVaR). It is the key to simulations-based calculations of ES, but care has to be taken as it does not always coincide with ES, and it is also not necessarily subadditive. The technical problem arises if the distribution function jumps from a value below the VaR confidence level to a value above it. Then, a correction term must be introduced into (6.2) to reconcile it with the correct ES from (6.1); see Acerbi and Tasche (2002).
E S a = E (L | L > V a R J, (6.2) guarantee the correctness of the ES calculation, and this would
be true even if the VaR backtest were always right.
i.e ., ES is then the exp ected loss conditional on this loss
belonging to the 100(1 — a ) percent w orst losses. This m ea­ Some backtests verify if the VaR correctly adjusts for changes in
sure has several other nam es like tail conditional expectation risk dynamics ("conditional coverage"; see Berkowitz and O 'Brien
(TC E) or conditional VaR (CVaR). It is the key to simulations- (2002)). According to Pritsker (2006), they exploit the fact that
based calculations of ES but care has to be taken as it does exceedances of a correctly calculated VaR "should not help fore­
not always coincide with ES, and it is also not necessarily cast future exceedances. Therefore, the autocorrelation function
subadditive. The technical problem arises if the distribution of the VaR exceedances should be equal to 0 at all lags." It is
function jum ps from a value below the VaR confidence level to hard to decide whether ES or VaR is verified with these backtests
a value above it. Th en , a correction term must be introduced because VaR exceedances are the very constituents of ES.
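As a sketch of the by-product approach (again not from the original text; the data-generating process and the specific test choices are assumptions), the following checks both the frequency of VaR exceedances against its Binomial benchmark and the independence point made by Pritsker (2006), via the lag-1 autocorrelation of the hit sequence:

import numpy as np
from scipy import stats

def var_backtest(losses, var_forecasts, alpha=0.99):
    """Unconditional coverage and lag-1 independence checks on VaR exceedances."""
    hits = (np.asarray(losses) > np.asarray(var_forecasts)).astype(int)
    n, x = hits.size, int(hits.sum())
    p_coverage = stats.binomtest(x, n, 1 - alpha).pvalue  # violations ~ Binomial(n, 1 - alpha)
    rho1 = np.corrcoef(hits[:-1], hits[1:])[0, 1] if 0 < x < n else float("nan")
    return x, p_coverage, rho1

# A correctly specified model: i.i.d. N(0,1) losses, VaR fixed at the true 99% quantile
rng = np.random.default_rng(0)
losses = rng.standard_normal(1_000)
forecasts = np.full(1_000, stats.norm.ppf(0.99))
print(var_backtest(losses, forecasts))   # roughly 10 violations, large p-value, rho1 near 0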

Because backtests that are strictly focused on some historical estimator of the risk measure, like the number of VaR violations, often have low power, several authors propose to backtest the whole distribution (or at least the tail), for instance by transforming loss realisations with the forecasted loss distribution: if the latter is correct, the transformed sample must be uniformly distributed on [0,1]. This hypothesis can be tested (Berkowitz (2001)). While not all backtests of this kind could be used in regulation,[10] Kerkhof and Melenberg (2004) follow this approach to develop test statistics directly applicable to VaR and ES. The test statistic for the ES involves, besides the forecasted ES and VaR, also the calculation of the ES of the squared loss, which would be a tolerable extra effort in practice.

[10] Some of these tests require that the bank fully specifies the loss distribution in the tail, not just the risk measure (ES or VaR). While this should not be a problem for bank internal purposes, a fully specified tail distribution would entail a fairly complex interface between bank and supervisor.
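A minimal sketch of the transform idea (not from the original text; the true and forecast distributions are assumed purely for illustration): apply the forecast distribution function to the realised losses and test the result for uniformity, or, as in Berkowitz (2001), map it to a normal score first:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
realised = rng.normal(0.0, 1.2, size=500)            # true losses have sigma = 1.2
forecast_cdf = stats.norm(loc=0.0, scale=1.0).cdf    # mis-specified forecast: sigma = 1.0

pit = forecast_cdf(realised)                         # should be U(0,1) if the forecast is right
print(stats.kstest(pit, "uniform"))                  # low p-value flags the mis-specification

z = stats.norm.ppf(pit)                              # Berkowitz: z should be i.i.d. N(0,1)
print("mean(z) =", round(z.mean(), 3), " var(z) =", round(z.var(ddof=1), 3))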

Kerkhof and Melenberg (2004) show that their backtest statistics for ES perform better than those for VaR. They also derive regulatory multiplication factors for their backtests and conclude that "the resulting regulatory capital scheme using expected shortfall compares favourably to the current Basel Accord backtesting scheme." It is important to notice that, according to Kerkhof and Melenberg (2004), a comparison of an α-ES with an α-VaR is not "fair" in the context of economic or regulatory capital. Since ES_α > VaR_α for the same confidence level α, they lower the confidence level α′ for the ES such that ES(α′) ≈ VaR(α). The intuition is that a regulator would require roughly the same amount of capital for a fixed portfolio, irrespective of the risk measure in use.

This aspect is important not only in the context of backtesting but also when estimation errors for ES and VaR are compared. Yamai and Yoshiba (2005) find ES estimates of fat-tailed (generalised Pareto distributed) losses to be much more volatile than their VaR counterparts, but they compare ES and VaR at the same confidence level. A comparison in the spirit of Kerkhof and Melenberg (2004) seems not to have been conducted so far but could easily be done.

Wong (2008) suggests another backtest statistic for ES that accounts for the small samples of VaR exceedances. The statistic is derived for normally distributed losses and turns out to perform very well under these assumptions. The test is also powerful in detecting non-normal VaR exceedances. For the case that a bank models non-normal losses when calculating the ES, Wong suggests deriving adapted saddle-point approximations for the estimator's distribution or using the sample transform as used by Berkowitz (2001) and Kerkhof and Melenberg (2004).

These results are promising, but in the context of banking regulation it must be taken into account that Wong's backtest would require banks to provide more information than they currently do for regulatory backtests. At present, past returns are compared with reported VaRs. With Wong's backtest, the bank would also have to report its estimates of tail thickness, which potentially creates perverse incentives. For instance, banks might, keeping minimum capital constant, be tempted to rely on certain tail distributions under which Wong's backtest has particularly low power, so that it is difficult to provide firm evidence of wrong risk reporting. Whether such concerns are substantial is left to future research.

Spectral Risk Measures

Spectral risk measures (SRM) are a promising generalisation of ES (Acerbi (2002)). While the α-ES assigns equal weight to all β-VaRs with β > α but zero to all others, an SRM allows these weights to be chosen more freely. This is implemented by a weight function w: [0,1] → [0, ∞) that integrates to 1. An SRM is formally defined as

$\mathrm{SRM} = \int_0^1 w(u)\,\mathrm{VaR}_u(L)\,du$

Expected shortfall is the special case of a spectral measure where $w(u) = (1-\alpha)^{-1}\,\mathbf{1}_{\{\alpha < u \le 1\}}$. The definition of SRM is restricted to functions w that increase over [0,1], which ensures that the risk measure is coherent. This restriction also implies that larger losses are taken more seriously than smaller losses, and thus the function w establishes a relationship to risk aversion. The intuition is that a financial institution is not very risk averse to small losses, which can be absorbed by income, but becomes increasingly risk averse to larger losses. As there may be a level of loss where employing additional capital to absorb yet higher losses is no longer desirable, such losses should be given the highest weights from a regulator's angle, because often the public would have to bear such losses. Intuitively, a weight function that increases can also be thought of as marginal costs that rise while losses become increasingly rare, i.e., large.

Another advantage of SRMs over ES (and VaR, a fortiori) is that they are not bound to a single confidence level. Rather, one can choose w to grow continuously with losses and thereby make the risk measure react to changes in the loss distribution more smoothly than the ES, and avoid the risk that an atom in the distribution lying slightly above or below the confidence level has large effects.

If the underlying risk model is simulation-based, the additional effort to calculate an SRM as opposed to the ES seems negligible; the simulated VaR realisations are just differently weighted (Acerbi (2002)).
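The weighting idea can be sketched in a few lines (an illustration, not from the original text; the exponential weight function is one common choice in the SRM literature, and the grid size and loss sample are assumptions). The same quantile grid evaluates both the ES weight function and a smoothly increasing one:

import numpy as np

def spectral_risk_measure(losses, weight_fn, n_grid=10_000):
    """Approximate SRM = int_0^1 w(u) VaR_u(L) du by a Riemann sum on a quantile grid."""
    u = (np.arange(n_grid) + 0.5) / n_grid       # midpoints of a uniform grid on (0, 1)
    return np.mean(weight_fn(u) * np.quantile(losses, u))

alpha, k = 0.99, 25.0
w_es = lambda u: np.where(u > alpha, 1.0 / (1.0 - alpha), 0.0)     # ES weights
w_exp = lambda u: k * np.exp(-k * (1.0 - u)) / (1.0 - np.exp(-k))  # increasing, integrates to 1

losses = np.random.default_rng(1).standard_t(df=4, size=200_000)
print("ES via SRM weights:    ", spectral_risk_measure(losses, w_es))
print("exponential-weight SRM:", spectral_risk_measure(losses, w_exp))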

In spite of their theoretical advantages, SRMs other than ES are still seldom used in practice.[11] However, insurers use the closely related concept of distortion measures (see the next section). Prominent examples such as the measure based on the Wang transformation (see next page) are also SRMs.

[11] At least one reputable risk consulting company reports it is currently implementing an SRM-based risk management system for some of its clients.

Remark 4: Leaving aside that w must be increasing to meet the definition of SRM, VaR is a limiting case of spectral risk measures: for instance, the sequence of SRMs based on the weight functions $w_n(u) = 0.5\,n\,\mathbf{1}_{\{\alpha - n^{-1} < u < \alpha + n^{-1}\}}$ converges to the α-VaR.

Other Risk Measures

There also are a number of other risk measures, which are briefly introduced in this subsection.

Distortion risk measures: These measures are used in actuarial risk measurement. The definition is very general; both spectral risk measures (including ES) and the VaR are nested. To define distortion risk measures, let D be any distribution function on [0,1] that is right-continuous and increasing with D(0) = 0 and D(1) = 1. This D is called the distortion function. A distortion risk measure of loss L is defined as

$\mathrm{DM}(L) = \int_0^1 \mathrm{VaR}_u(L)\,dD(u)$

Each spectral risk measure is clearly a distortion risk measure; to see this, recall that the weight function w integrates to 1 and observe that the SRM and the distortion measure defined by the antiderivative $D(u) = \int_0^u w(s)\,ds$ are identical.

Distortion risk measures are not necessarily coherent; the definition allows for distortion functions with a non-monotonous derivative (this derivative is just the weight function of the corresponding SRM), whereas Acerbi (2002) has shown that the monotonicity of w is also necessary for the risk measure to be coherent.[12]

[12] Wang (2001) claims all smooth distortion measures are coherent. This is wrong, as subadditivity is missing in general. Wang (2001) means to build on Wang, Young and Panjer (1997) which, however, state that a distortion measure is subadditive if it is convex (in our notation). The latter is correct and conforms to Acerbi (2002).

The VaR has a representation as a distortion risk measure by $D_{\mathrm{VaR}}(u) = \mathbf{1}_{\{u \ge \alpha\}}$.

The Wang transform (Wang (2001)) $D^{\mathrm{Wang}}(u) = \Phi(\Phi^{-1}(u) + \log\theta)$, where Φ denotes the Gaussian distribution function and θ < 1, is an interesting distortion function. The corresponding risk measure is also a spectral risk measure because the first derivative of $D^{\mathrm{Wang}}$ is strictly increasing. Hence the Wang transform indeed implements risk aversion over the whole range of losses, but particularly in the tail. It has been applied to the pricing of catastrophe insurance contracts and to exotic option pricing where Black-Scholes assumptions cannot be applied.
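A small numerical sketch of the Wang transform as a distortion measure (illustrative only; the parameter value, grid size and loss sample are assumptions): the measure is evaluated as a discrete approximation of the integral of VaR_u(L) against dD(u), with θ < 1 tilting weight towards the tail:

import numpy as np
from scipy import stats

def wang_distortion_measure(losses, theta=0.5, n_grid=10_000):
    """Approximate DM(L) = int_0^1 VaR_u(L) dD(u) for D(u) = Phi(Phi^{-1}(u) + log(theta))."""
    u = np.linspace(1e-6, 1.0 - 1e-6, n_grid)
    D = stats.norm.cdf(stats.norm.ppf(u) + np.log(theta))  # distortion function, theta < 1
    q = np.quantile(losses, u)                             # VaR_u(L) on the grid
    return np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(D))     # Riemann-Stieltjes sum

losses = np.random.default_rng(3).standard_t(df=4, size=200_000)
print("Wang-transform measure:", wang_distortion_measure(losses, theta=0.5))
print("mean loss:", losses.mean())                         # the distortion loads the tail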
Variance: The variance is historically the most important risk measure and is widely used in practice. It has many desirable properties but at least two drawbacks from a regulatory perspective. McNeil et. al. (2005) state: "if we want to work with variance, we have to assume that the second moment of the loss distribution exists. . . . [V]ariance is a good measure of risk only for distributions which are (approximately) symmetric. . . . However, in many areas of risk management, we deal with highly skewed distributions."

The mean deviation, defined as MD(L) = E|L − EL|, can do without second moments but suffers from the same problems with skewed distributions as the variance. It is less accessible to analytical treatment than the variance and therefore rarely used as a risk measure.

Upper partial moments (see McNeil et. al. (2005)): Given a loss distribution F_L, an exponent k > 0 and a reference point q, which could be some VaR, the upper partial moment UPM(k,q) is defined as

$\mathrm{UPM}(k, q) = \int_q^\infty (l - q)^k\,dF_L(l)$

Hence, for k > 1 an UPM measures losses beyond the threshold q with increasing weight. It is therefore related to spectral risk measures in spirit but not equivalent in analytic terms. The higher k is, the more conservative the UPM is. For k = 1 and continuous loss distributions, there is a close relationship with expected shortfall:

$\mathrm{UPM}(1, \mathrm{VaR}_\alpha) = (1 - \alpha)(\mathrm{ES}_\alpha - \mathrm{VaR}_\alpha)$
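Numerically, the identity is easy to check (an illustrative sketch, not from the original text; the distribution and confidence level are assumptions), using the fact that UPM(1, q) is the expected loss excess E[(L − q)+]:

import numpy as np

losses = np.random.default_rng(5).standard_t(df=4, size=1_000_000)
alpha = 0.99
var_a = np.quantile(losses, alpha)
es_a = losses[losses >= var_a].mean()

upm_1 = np.maximum(losses - var_a, 0.0).mean()   # UPM(1, VaR_alpha) = E[(L - q)^+]
print(upm_1, (1 - alpha) * (es_a - var_a))       # the two sides agree closely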
Left-tail measure: In a similar vein to the mean deviation and lower (upper) partial moments, Wu and Xiao (2002) propose a left-tail measure, defined as the conditional standard deviation of VaR exceedances, i.e.,

$\mathrm{LTM}_\alpha = \left(E\left[(L - \mathrm{VaR}_\alpha)^2 \mid L \ge \mathrm{VaR}_\alpha\right]\right)^{1/2}$

Wu and Xiao (2002) show that the left-tail measure is useful particularly for the measurement of non-normal tail risks. This risk measure has several undesirable features, such as a lack of coherency and a heavy burden of calculation.

Conclusions

While VaR has been criticised for its lack of coherence, until recently it was unclear whether this flaw is relevant for real
asset portfolios, particularly for risks in the trading book. Degen et. al. (2007) have shown that the lack of coherence can be an important problem for trading book risk measurement. A risk measurement based on VaR is thus not necessarily conservative.

The ES avoids the major flaws of VaR, but its fundamental difference from VaR, that it accounts for the magnitude of losses beyond a threshold, is an equally important advantage. By this, it aligns the interests of bank managers and owners to those of the public much better than VaR.

Much of the criticism of ES that has been brought forward in defence of VaR could be refuted. Advanced simulation techniques have helped to make ES calculations stable enough, and ES and VaR backtests have similar power if compared on the basis that both risk measures have roughly the same value.

Spectral risk measures are a promising generalisation of expected shortfall. The main advantages are improved smoothness and the intuitive link to risk aversion. If the underlying risk model is simulations-based, the additional calculation effort as opposed to ES seems negligible.

6.5 STRESS TESTING PRACTICES FOR MARKET RISK

Overview

VaR limitations have been highlighted by the recent financial turmoil. The financial industry and regulators now regard stress tests as no less important than VaR methods for assessing a bank's risk exposure. A new emphasis on stress testing exercises derives also from the amended Basel II framework, which requires banks to compute a valid stressed VaR number.

A stress test can be defined as a risk management tool used to evaluate the potential impact on portfolio values of unlikely, although plausible, events or movements in a set of financial variables (Lopez (2005)). Stress tests are designed to explore the tails of the distribution of losses beyond the threshold (typically 99%) used in value-at-risk (VaR) analysis.

However, stress testing exercises are often designed and implemented on an ad hoc, compartmentalised basis, and the results of stress tests are not integrated with the results of traditional market risk (or VaR) models. The absence of an integrated framework creates problems for risk managers, who have to choose which set of risk exposures is more reliable. There is also the related problem that traditional stress testing exercises typically remain silent on the likelihood of stress-test scenarios.

A survey of stress testing practices conducted by the Basel Committee in 2005 showed that most stress tests are designed around a series of scenarios based either on historical events, hypothetical events, or some combination of the two. Such methods have been criticised by Berkowitz (2000a). Without using a risk model, the probability of each scenario is unknown, making its importance difficult to evaluate. There is also the possibility that many extreme yet plausible scenarios are not even considered.

Berkowitz proposed the integration of stress testing into formal risk modelling by assigning probabilities to stress-test scenarios. The resulting risk estimates incorporate both traditional market risk estimates and the outcomes of stress tests, as well as the probabilities of each. Therefore, they provide an integrated set of risk indicators and estimates to work with.

Incorporating Stress Testing into Market-Risk Modelling

Traditional stress testing exercises can be classified into three main types, which differ in how the scenarios are constructed:

1. historical scenarios;
2. predefined or set-piece scenarios, where the impact on P/L of adverse changes in a series of given risk factors is simulated;
3. mechanical-search stress tests, based on automated routines to cover prospective changes in risk factors; the P/L is then evaluated under each set of risk-factor changes, and the worst-case results are reported.

All these approaches depend critically on the choice of scenarios. A related problem is that the results of stress tests are difficult to interpret because they give no idea of the probabilities of the events concerned (Berkowitz (2000a)). These criticisms can be addressed by integrating stress testing into the market risk modelling process and assigning probabilities to the scenarios used in stress testing. Once scenarios are put in probabilistic form, a unified and coherent risk measurement system is obtained rather than two incompatible ones, and backtesting procedures can be applied to impose some (albeit limited) check on scenarios. Inevitably, the choice of scenarios will remain subjective, but even there, the need to assign probabilities to scenarios will impose some discipline on risk management.

Several authors have developed an integrated approach to stress testing, including Kupiec (1998), who examines cross-market effects resulting from a market shock, and
Aragones et. al. (2001), who incorporated hypothetical stress events into an Extreme Value Theory (EVT) framework.

Alexander and Sheedy (2008) analysed the problem of determining the most suitable risk model in which to conduct a stress test. Obviously, if the model is mis-specified, their approach is vulnerable to a considerable degree of model risk. Hence a significant part of their research is supported through backtests, which are designed to reduce the model risk in risk models that are used for stress testing. They conduct backtests for eight risk models, including both conditional and unconditional models and four possible return distributions. Their backtesting experiment suggests that unconditional historical simulation, currently the most popular VaR methodology in the industry according to Perignon and Smith (2006), is likely to be mis-specified and is therefore unsuited for stress testing purposes.

Breuer et. al. (2009) give an operational definition of three requirements which the Basel Committee specifies for stress tests: plausibility and severity of stress scenarios, as well as suggestiveness of risk-reducing actions. The basic idea of their approach is to define a suitable region of plausibility in terms of the risk-factor distribution and to search systematically for the scenario with the worst portfolio loss over this region. One key innovation of their approach compared with the existing literature is the solution of two open problems. They suggest a measure of plausibility that is not subject to the problem of dimensional dependence of maximum loss, and they derive a way to deal consistently with situations where some but not all risk factors are stressed. They show that setting the non-stressed risk factors to their conditional expected value given the value of the stressed risk factors, the procedure first suggested by Kupiec (1998), maximises plausibility among the various approaches used in the literature. Furthermore, Breuer et. al. (2010b) propose a new method for analysing multi-period stress scenarios for portfolio credit risk more systematically than in the current practice of macro stress testing. This method quantifies the plausibility of scenarios by considering the distance of the stress scenario from an average scenario. For a given level of plausibility, their method searches systematically for the most adverse scenario for the given portfolio.

Finally, as a general point, it must be underlined that for the purposes of calculating the P&L impact of stress shock-factors, it is generally assumed that the shock occurs instantaneously, i.e., that traders have no opportunity to re-hedge or adjust their positions, and the impact of declining tenors for, for example, futures and options contracts is ignored. Apart from simplifying the calculations, such an assumption could be unreasonable in some cases given the practical experience of the actions of traders during historical events, and it may generate inconsistent results by amplifying the magnitude of the losses. Such issues have not yet been addressed in the literature.

Stressed VaR

The pressing technical issue now facing financial institutions that intend to comply with the amended Basel II framework is to understand how to calculate a valid stressed VaR number. After the revisions of July 2009, banks have to calculate a VaR using the risk engine they normally use, but "with model inputs calibrated to historical data from a continuous 12-month period of significant financial stress relevant to the bank's portfolio" (Basel Committee on Banking Supervision (2009b)).

An over-simplistic interpretation of this specification might be to increase the assumed volatilities of the securities in a portfolio. This would have the effect of lengthening the tails of the Gaussian (normal) loss distributions that underlie all standard VaR calculations.

However, in order to calculate stressed VaR accurately, it is also necessary to stress the correlation matrix used in all VaR methodologies. It is a repeated observation that during times of extreme volatility, such as occurs during every market crash, correlations are dramatically perturbed relative to their 'normal' historical values. In general, most correlations tend to increase during market crises, asymptotically approaching 1.0 during periods of complete meltdown, such as occurred in 1987, 1998 and 2008.
One possibility is to adopt the conditional stress test approach of Kupiec (1998). In this approach, the risk factor distributions are conditional on an extreme value realisation of one or more of the risk factors. Conditional on a large move of at least one factor, the conditional factor covariance matrix exhibits much higher correlations among the remaining factors. In this approach, the apparent shift in the correlation structure is a consequence of conditioning the distribution on a large factor shock. The unconditional correlations remain unchanged. Analysing a large number of stress test results for currency portfolios over the Asian currency crisis period, Kupiec shows that the conditional stress test process performs extremely well, as very few stress test violations are recorded during this crisis period.
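The effect can be reproduced in a short simulation (illustrative only; the factor loadings and the conditioning threshold are assumptions, not values from Kupiec (1998)): two factors that load on a common factor show a much higher correlation once one conditions on a large move of that factor, even though the unconditional correlations are left untouched:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.standard_normal(n)                   # the factor receiving the shock
x2 = 0.6 * x1 + 0.8 * rng.standard_normal(n)  # corr(x1, x2) = 0.6
x3 = 0.6 * x1 + 0.8 * rng.standard_normal(n)  # corr(x2, x3) = 0.36 unconditionally

crisis = np.abs(x1) > np.quantile(np.abs(x1), 0.975)  # large move in factor 1
uncond = np.corrcoef(x2, x3)[0, 1]
cond = np.corrcoef(x2[crisis], x3[crisis])[0, 1]
print(f"corr(x2, x3): unconditional {uncond:.2f}, given a large factor-1 move {cond:.2f}")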
it is generally assum ed that the shock occurs instantaneously,
i.e ., th at trad ers have no opportunity to re-hedge or adjust An alternative approach to conditional correlation is to stress
th eir positions, and it is ignored the im pact of declining tenors the unconditional correlation m atrix of the risk factors. Unfor­
for, for exam p le, futures and options contracts. A p a rt from tunately, this approach is not as straightforw ard as the co ndi­
sim plifying the calculations, such an assum ption could be tional correlation approach or stretching the tails of the loss
unreasonable in som e cases given the practical exp erien ce distributions. The VaR calculation engine requires a correlation

matrix that satisfies the mathematical property of positive definiteness, which is a way of saying that all of the correlations are internally consistent with each other. Noisy or erroneous historical price data can result in matrices that are not positive definite. Perturbing the correlation matrix, which is necessary for a true stressed VaR calculation, may result in correlation matrices that also violate this internal consistency requirement. If the matrix is not positive definite, the VaR calculation will fail, so methods have to be devised to modify the stressed matrix until it becomes positive definite. Kupiec (1998) discusses some practical methods that can be used to address this problem.
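One simple repair, sketched below for illustration (eigenvalue clipping; it is not the method prescribed by the text above, and Higham (2002) develops the proper nearest-correlation-matrix algorithm), replaces negative eigenvalues and rescales the diagonal back to ones:

import numpy as np

def clip_to_correlation(C, eps=1e-8):
    """Clip negative eigenvalues and rescale to unit diagonal (a simple PSD repair)."""
    C = 0.5 * (C + C.T)                          # enforce symmetry
    vals, vecs = np.linalg.eigh(C)
    C_psd = vecs @ np.diag(np.clip(vals, eps, None)) @ vecs.T
    d = np.sqrt(np.diag(C_psd))
    return C_psd / np.outer(d, d)                # restore ones on the diagonal

# A stressed matrix whose pairwise entries are mutually inconsistent
C_stressed = np.array([[1.0, 0.9, -0.3],
                       [0.9, 1.0, 0.9],
                       [-0.3, 0.9, 1.0]])
print(np.linalg.eigvalsh(C_stressed))                        # one negative eigenvalue
print(np.linalg.eigvalsh(clip_to_correlation(C_stressed)))   # all strictly positive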
Besides these technical issues, one may also more fundamentally consider concepts that are not covered by the current regulatory definition of stressed VaR. A more sophisticated approach might not only include linear transforms of multivariate normal risk factors but also employ 'fat-tailed' distributions to model the extreme loss events more accurately. Examples of those 'extreme value theory' distributions are the Gumbel, Generalised Pareto, Weibull, Frechet, and the Tukey g&h distributions.

However, one should keep in mind that the stressed VaR is, from a theoretical perspective, an imperfect solution: its purpose is to reflect that current market conditions may not lead to an accurate assessment of the risk in a more stressful environment. Extreme value theory distributions may already incorporate extreme market conditions and could in principle make a stressed VaR redundant. In general, these distributions are flexible enough to obtain very good fits, but serious robustness issues arise instead, as regulators and risk managers had to learn in the context of operational risk, for instance.

Conclusions

More recent research advocates the integration of stress testing into the risk modelling framework. This would overcome the drawbacks of reconciling stand-alone stress test results with standard VaR model output.

Progress has also been achieved in theoretical research on the selection of stress scenarios. In one approach, for example, the "optimal" scenario is defined by the maximum loss event in a certain region of plausibility of the risk factor distribution.

The regulatory "stressed VaR" approach is still too recent to have been analysed in the academic literature. Certain methods that could be meaningful in this context can be identified in the earlier literature on stress testing. Employing fat-tailed distributions for the risk factors and replacing the standard correlation matrix with a stressed one are two examples.

6.6 UNIFIED VERSUS COMPARTMENTALISED RISK MEASUREMENT

Overview

In this section, we survey the academic literature on the implications of modelling the aggregate risks present across a bank's trading and banking books using either a compartmentalised approach, namely the sum of risks measured separately, or a unified approach that considers the interaction between these risks explicitly. Finally, we survey the recent literature on the systemic implications of the current regulatory capital requirements that aggregate capital requirements across risk types.

In many financial institutions, aggregate economic capital needs are calculated using a two-step procedure. First, capital is calculated for individual risk types, most prominently for credit, market and operational risk. In a second step, the stand-alone economic capital requirements are added up to obtain the overall capital requirement for the bank.

The Basel framework for regulatory capital uses a similar idea. As discussed by Cuenot, Masschelein, Pritsker, Schuermann and Siddique (2006), the Basel framework is based on a "building block" approach such that a bank's regulatory capital requirement is the sum of the capital requirements for each of the defined risk categories (i.e., market, credit and operational risk), which are calculated separately within the formulas and rules that make up Pillar 1. Capital requirements for other risk categories are determined by the supervisory process that fits within Pillar 2; see Figure 6.2, which is reproduced from Cuenot
et. al. (2006).

[Figure 6.2 Overview of risk categories relevant for banking book and trading book in Pillar 1 and Pillar 2. Pillar 1 covers credit risk, counterparty credit risk, interest rate risk (general and specific), equity risk (general and specific), foreign exchange risk, commodity risk and operational risk; Pillar 2 covers interest rate risk, concentration risk, stress tests and other risks (liquidity, residual, business . . .). Source: Cuenot et. al. (2006).]

This approach is therefore often referred to as a non-integrated approach to risk measurement. An integrated approach would, by contrast, calculate capital for all the risks borne by a bank simultaneously in one single step, accounting for possible correlations and interactions, as opposed to adding up compartmentalised risk calculations.

Pressure to reconsider the regulatory compartmentalised approach came mainly from the financial industry, where it has been frequently argued that a procedure that simply adds up economic capital estimates across portfolios ignores diversification benefits. These alleged benefits have been estimated to be between 10 and 30% for banks (see Brockmann and Kalkbrener (2010)).

Capital diversification arguments and estimates of potential capital savings are partially supported in the academic literature. More recently, this view and the estimates have been fundamentally challenged by the Basel Committee (Basel Committee on Banking Supervision (2009)) and by Breuer et. al. (2010a). These papers have pointed out that nonlinear interaction between risk categories may even lead to compounding effects. This fact questions whether the compartmentalised approach will in general give a conservative and prudent upper bound for economic capital.

Is this a merely academic debate, or does it have practical implications for reform considerations related to the trading book? In this section, we survey the main arguments and give a brief review of the main papers and their findings. We then discuss policy implications that might be relevant for a discussion of potential future reform related to the trading book.

Aggregation of Risk: Diversification versus Compounding Effects

Diversification is a term from portfolio theory referring to the mix of a variety of investments within a portfolio. Since different investments will develop differently in the future, with value losses in some investments offset by value gains in others, the overall portfolio risk is reduced through the spreading of risk. In a similar way, the assets of a bank can be thought of as an overall portfolio that can be divided into subportfolios. If risk analysis is done by looking at risk measures at the level of the subportfolios and the risk measures are added up, the intuition of diversification suggests that we should arrive at a conservative risk measure for the bank as a whole.

So, what is wrong with this straightforward intuition about diversification between market, credit and other risk categories? The flaw in the intuition lies in the fact that it is usually not possible to divide the overall portfolio of a bank into subportfolios purely consisting of market, credit and operational risk; these risk categories are too intertwined in a modern financial institution to possibly separate in a meaningful way. In short, we cannot construct a subportfolio of risk factors. It is therefore incorrect to think of the banking book as a subportfolio of the overall bank portfolio for which only credit risk is relevant. It is also incorrect to view the trading book as another subportfolio related solely to market risk.

A simple way to summarise this argument is to consider a portfolio of loans. The interest rate risk related to such a portfolio is usually counted as a market risk, and this risk affects the bank's refinancing costs and the revaluation of these loans. If the interest rate risk is borne by the creditors in some way, this market risk may suddenly transform into a credit risk for the bank. So, do assets with a value that fluctuates with interest rates belong in a subportfolio for market risk or in a subportfolio of credit risk? They clearly belong to both, because each loan has a market risk component as well as a credit risk component simultaneously. Trading book positions with counterparty risks or positions related to carry trades fall into the same category.

Breuer et. al. (2010a) consider portfolios of foreign currency loans, which are loans denominated in a foreign currency extended to domestic borrowers with income in domestic currency. The credit risk in these portfolios is always a function of the market risk (i.e., exchange rate movements), and the risk of each position in a foreign currency loan portfolio has simultaneously a credit and a market risk component. Adding up capital and hoping for an upper bound amounts to ignoring possible "malign risk interactions", as they are called in Breuer et. al. (2010a). This issue has been known in the market risk literature for a long time as "wrong way risk": the risk arising from the problem that the value of a trading position is inversely correlated with the default risk of some counterparty.

From these examples, we see that a formation of subportfolios along the lines of risk factors, and for that matter across banking and trading books, is usually not possible. Breuer et. al. (2010a) indeed show that the ability to form subportfolios along the lines of risk categories is a sufficient condition for diversification effects to occur. Since we can in general not form such subportfolios, we must anticipate the possibility that there can be risk compounding effects between the banking and the trading book. In short, while the intuition of diversification is inviting, it does not apply to the interaction of banking and trading books, since there are in general no pure subportfolios of market, credit or operational risks.

This insight is important because it demonstrates that the "diversification effects" derived in papers using a so-called "top-down" approach often follow from assuming what those papers want to derive.
By construction, the assumption of splitting up the bank portfolio into subportfolios according to market, credit and operational risk presumes that this can indeed be done. If such a split were possible, it follows from the results in Breuer et. al. (2010a) that diversification effects must necessarily occur.

To estimate the quantitative dimension of the problem, we therefore must focus on papers working with a "bottom-up" approach. We also need to examine the results of papers based on the "top-down" approach, which assumes risk separability at the beginning of the analysis. In this section, we survey several key papers that use either of these risk aggregation methods. As part of this literature survey, we provide a summary of recent papers that estimate the range and magnitude of the differences between compartmentalised and unified risk measures. Our proposed measure is a simple ratio of these two measures, as used in other papers such as Breuer et. al. (2010a). In that paper, the authors adopt the term "inter-risk diversification index" for the ratio; see also the related measure in Alessandri and Drehmann (2010). Ratio values greater than one indicate risk compounding, and values less than one indicate risk diversification. In the summary tables later in this chapter, we list the various papers, the portfolio analysed, the risk measures used, the horizon over which the risks are measured, and these risk ratios.
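As a toy illustration of the index (all numbers invented for exposition, not taken from any of the surveyed papers):

# Inter-risk diversification index: unified measure over the sum of stand-alone measures
market_es, credit_es = 40.0, 100.0   # stand-alone ES figures (illustrative units)
unified_es = 150.0                   # ES with the risk interactions modelled jointly

index = unified_es / (market_es + credit_es)
print(index)   # about 1.07 > 1: compounding; a value below 1 would indicate diversification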

Papers Using the "Bottom-Up" Approach

As mentioned above, a common assumption of most current risk measurement models is that market and credit risks are separable and can be addressed independently. Yet, as noted as early as Jarrow and Turnbull (2000), economic theory clearly does not support this simplifying assumption.

While the reasons behind this common assumption are mostly operational in nature, some studies have used numerical simulation techniques to generate results. For example, Barnhill and Maxwell (2002) examine the economic value of a portfolio of risky fixed income securities, which they define as a function of changes in the risk-free interest rate, bond spreads, exchange rates, and the credit quality of the bond issuers. They develop a numerical simulation methodology for assessing the VaR of such a portfolio when all of these risks are correlated. Barnhill et. al. (2000) use this methodology to examine capital ratios for a representative South African bank. However, in these studies, the authors do not examine the differing values of their chosen risk measures using a unified risk measurement approach versus a compartmentalised approach that sums the independent risk measures.

The study by Jobst, Mitra and Zenios (2006) provides some analysis along these lines. The authors construct a simulation model, based on Jobst and Zenios (2001), in which the risk underlying the future value of a bond portfolio is decomposed into:

• the risk of a borrower's rating change (including default);
• the risk that credit spreads will change; and
• the risk that risk-free interest rates will change.

Note that the first item is more narrowly defined to represent the portfolio's credit risk, while the last item is more narrowly defined to represent the portfolio's market risk. However, the middle item is sensitive to both risks and challenges the notion that market and credit risk can be readily separated in this analysis. The authors use portfolios of US corporate bonds and one-year VaR and CVaR risk measures at the 95%, 99% and 99.9% confidence levels for their analysis.

In their analysis, the authors generate risk measures under three sets of assumptions. To concentrate on the pure credit risk contributions to portfolio losses, they simulate only rating migration and default events as well as recovery rates, while assuming that future interest rates and credit spreads are deterministic. The authors then allow future credit spreads to be stochastically determined, and finally, they allow future interest rates to be stochastically determined. Note that the latter case provides an integrated or unified risk measurement, according to our definition for this survey.[13]

[13] Note, however, that the authors do not conduct an analysis of a market risk scenario (i.e., deterministic ratings and stochastic credit spreads and interest rates). Thus, we cannot examine their ratio of unified to compartmentalised risk measures as discussed above.

The authors' results are quite strong regarding the magnitude of the risk measures across risk types and credit ratings. For AAA-rated bonds, the authors find that the unified risk measures at all three tail percentiles are on the order of ten times the pure credit risk measures, since highly-rated bonds are unlikely to default. As the credit quality of the portfolio declines, the ratio between the unified risk measures and the risk measures for pure credit risk drops to just above one for C-rated bonds.

Table 6.1 presents a short summary of several papers for which we can directly examine the ratio of unified to compartmentalised risk measures for bottom-up models. As mentioned earlier, the recent work of Breuer et. al. (2008, 2010a) provides a leading example of how market and credit risk cannot be readily separated in a portfolio, a fact that complicates risk measurement and works to undermine the simple assumptions underlying additive risk measures.

In Breuer et. al. (2010a), the authors present an analysis of hypothetical loan portfolios for which the impact of market and credit risk fluctuations is not linearly separable. They argue that the changes in aggregate portfolio value caused by market and credit risk fluctuations in isolation only very rarely sum up to the integrated change that incorporates all risk interactions. The magnitude and direction of the discrepancy between these two types of risk assessment can vary broadly. For example, the authors examine a portfolio of foreign currency loans for which exchange rate fluctuations (i.e., market risk) affect the size of loan payments and hence the ability of the borrowers to repay the loan (i.e., credit risk). For their empirically calibrated example, they use expected shortfall at various tail percentiles as their risk measure and examine portfolios of BBB+ and B+ rated loans. Their analysis shows that changes in market and credit risks can cause compounding losses such that the sum of value changes from the individual risk factors is smaller than the value change due to accounting for integrated risk factors.

In particular, their reported inter-risk diversification index for expected shortfall increased sharply as the tail quantile decreased, which suggests that the sum of the two separate risk measures becomes much less useful as an approximation of the total integrated risk in the portfolio as we go further into the tail. These index values also increase for all but the most extreme tail percentiles as the original loan rating is lowered. The authors argue that this example presents evidence of a "malign interaction of market and credit risk which cannot be captured by providing separately for market risk and credit risk capital." The authors show a similar qualitative outcome for domestic currency loans (i.e., loans for which default probabilities are simply a function of interest rates), although the index values are much lower.

In Breuer et. al. (2008), the authors use a similar analytical framework to examine variable rate loans, in which the interaction between market and credit risk can be analysed. In particular, they model the dependence of credit risk factors, such as the loans' default probabilities (PD), exposure at default (EAD), and loss-given-default (LGD), on the interest rate environment. A key risk of variable rate loans is the danger of increased defaults triggered by adverse rate moves. For these loans, market and credit risk factors cannot be readily separated, and their individual risk measures cannot be readily aggregated back to a unified risk measure. They conduct a simulation study based on portfolios of 100 loans of equal size by borrowers rated B+ or BBB+ over a one-year horizon using the expected shortfall measure at various tail percentiles. They find that the ratio of unified expected shortfall to the sum of the separate expected shortfalls is slightly greater than one, suggesting that risk compounding effects can occur. Furthermore, these compounding effects are more pronounced for lower-rated loans and higher loan-to-value ratios.

In contrast to this work, the paper by Grundke (2005) lays out a bottom-up model that assumes the separability of interest rate risk (i.e., market risk) and credit spread risk (i.e., credit risk). The author examines a calibrated multi-factor credit risk model that accommodates various asset value correlations, correlations between credit spreads and other model factors, and distributional assumptions for innovations. The author examines hypothetical loan portfolios of varying credit quality over a three-year horizon, both with and without the joint modelling of interest rates and credit spreads. To assess the joint impact of interest rate and credit risk, the author uses forward market interest rates instead of separate interest rate and credit spread processes. Interestingly, the reported VaR measures at various tail percentiles lead to ratios of unified VaR measures to summed VaR measures that range widely from near zero to one, which seems to be due mainly to the separability of the interest rate risk (i.e., market risk) and credit spread risk (i.e., credit risk) in the model.

Kupiec (2007) proposes a single-factor, migration-style credit risk model that accounts for market risk. This modelling approach generates a portfolio loss distribution that accounts for the non-diversifiable elements of the interactions between market and credit risks. The integrated exposure distribution of the model is used to examine capital allocations at various thresholds. These integrated capital allocations are compared to the separated assessments. The results show that capital allocations derived from a unified risk measure importantly alter the estimates of the minimum capital needed to achieve a given target solvency margin. The capital amount could be larger or smaller than capital allocations estimated from compartmentalised risk measures. Regarding specifically the Basel II AIRB approach, the author argues that the results show that no further diversification benefit is needed for banking book positions, since no market risk capital is required. Thus, Basel II AIRB capital requirements fall significantly short of the capital required by a unified risk measure.

Numerically speaking, the risk measure used in this study is the amount of capital that the unified and the compartmentalised capital approaches generate as the appropriate value to assure funding costs of a certain magnitude, calibrated to historical funding rates for specific credit ratings. The hypothetical portfolios of interest are corporate loans with various rating categories represented in proportion to historical data. The author examines a wide variety of alternative separated approaches with which to calculate economic capital measures, ranging from three different alternative credit risk models to several methods for measuring market risk. Correspondingly, the range of inter-risk diversification index values is quite wide for the AAA- and BBB-rated portfolios, ranging from about 0.60
to almost 4.00. In summary, the author's capital calculations show that capital allocations derived from a unified market and credit risk measure can be larger or smaller than capital allocations that are estimated from aggregated compartmentalised risk measures.

The studies discussed above examine the different risk implications of a unified risk measurement approach relative to a compartmentalised approach for specific portfolios. In contrast, Drehmann et. al. (2010) examine a hypothetical bank calibrated to be representative of the UK banking system as a whole. Within their analytical framework, they do not explicitly assume that market and credit risk are separable. The authors decompose the total risk in their bank scenario analysis into:

• the impact of credit risk from non-interest rate factors,
• the impact of interest rate risk (excluding the effect of changes in interest rates on credit risk), and
• the impact of the interaction of credit risk and interest rate risk.

The latter is calculated as the difference between the total impact of the scenario shock and the sum of the first two components.

Their simulations confirm that interest rate risk and credit risk must be assessed jointly for the whole portfolio to gauge overall risk correctly. In particular, the authors find in their simulations that if a bank gauged credit risk by solely monitoring its write-offs, aggregate risk would be underestimated in the short term, since a rate increase would also lower the bank's net interest income and profits. Correspondingly, the bank's aggregate risk would be overestimated in the long run as net interest income and profits recover while write-offs continue to rise.

Their main variable of interest is net profits over twelve quarters after their macroeconomic stress scenario hits their representative bank, although they also report separate measures of write-offs and net interest income. They report that the interaction between interest rate and credit risk accounts for about 60% of the decline in capital adequacy for their calibrated bank. While the decline in capital adequacy does not perfectly match our other risk measures, we can still think of the diversification index here as the ratio of the capital decline for the unified risk framework relative to the capital decline that would come from separate identification of market and credit risks. Given their reported numbers, that ratio here is 100%/(100% − 60%) = 2.5, which suggests a very clear contribution of this interaction to risk management concerns.

Following up on the work of Drehmann et. al. (2010), Alessandri and Drehmann (2010) develop an integrated economic capital model that jointly accounts for credit and interest rate risk in the banking book, i.e., where all exposures are held to maturity. Note that they explicitly examine repricing mismatches (and thus market and credit risks) that typically arise between a bank's assets and liabilities.

For a hypothetical, average UK bank with exposures to only the UK and US, they find that the difference between aggregated and unified economic capital levels is often significant but depends on various bank features, such as the granularity of assets, the funding structure or bank pricing behaviour. They derive capital for the banking book over a one-year horizon. For credit and interest rate risk, they define unexpected losses, and thus economic capital, as the difference between VaR at the specified 99% confidence level and expected losses. Note that their measures of economic capital for just credit risk and just interest rate risk do not fully disentangle these risks, as the credit risk measure incorporates the effects of higher interest rates on default probabilities, and the interest rate risk measure incorporates the effect of higher credit risk on income. The key point is that the framework represents a plausible description of how current capital models for the banking book capture these risks.

The authors examine the ratio of unified economic capital to the sum of the component measures at three VaR quantiles. For the 95th percentile of portfolio losses, the unified capital measure is near zero, and thus the ratio is nearly zero as well. For the 99th percentile, the ratio is quite small at 0.03, but the ratio rises quickly to just over 50% for the 99.9th percentile. Note, however, that this result still suggests that the compartmentalised approach is more conservative than the unified approach. The authors examine certain modifications of their assumptions, such as infinitely fine-grained portfolios to increase the correlation of portfolio credit risk with the macroeconomic factors, and bank funding scenarios ranging from all short-term debt that is frequently repriced to all long-term debt that is repriced only on a yearly basis, and find some interesting differences from the base case scenario. However, the finding of a lower integrated capital charge continues to hold.

On balance, these authors conclude that the bank's capital is mismeasured if risk interdependencies are ignored. In particular, the addition of economic capital for interest rate and credit risk derived separately provides an upper bound relative to the integrated capital level. Two key factors determine this outcome. First, the credit risk in this bank is largely idiosyncratic and thus less dependent on the macroeconomic environment; and second, bank assets that are frequently repriced lead to a reduction in bank risk. Given that these conditions may be viewed as special cases, the authors recommend that "As a consequence, risk managers and regulators should work on the presumption that interactions between risk types may be such that the overall level of capital is higher than the sum of capital derived from risks independently."
Papers Using the "Top-Down" Approach

An alternative method for determining total firm risk, primarily for enterprise-wide risk management, is to aggregate risks calculated for different business lines or different risk types using so-called "top-down" approaches. An important difference is that top-down approaches always reference an institution as a whole, whereas bottom-up approaches can range from the portfolio level up to the institutional level. With respect to market and credit risk, the top-down approach explicitly assumes that the risks are separable and can be aggregated in some way. As outlined by Cuenot et. al. (2006), firms may compute their market and credit risk capital separately and aggregate the two risk types by imposing some form of correlation between them. The top-down approach thus does not require a common scenario across risk types, but because the correct form of aggregation is not known, the approach "loses the advantages of logical coherence." In addition, as suggested by Breuer et. al. (2008, 2010a), the assumption of separable risks will generally prevent the ability to gauge the degree of risk compounding that might be present and instead typically provide support for risk diversification.

The literature is unclear on whether the combination of financial business lines within one organisation leads to an increase or decrease in risk. The literature as surveyed by Saunders and Walters (1994) and Stiroh (2004) suggests mixed results. However, as surveyed by Kuritzkes, Schuermann and Weiner (2003), several studies, including their own, suggest that reductions in economic capital arise from the combination of banking and insurance firms. The papers surveyed in Table 6.2 and below find this result as well for various risk combinations at the firm level.

For example, Dimakos and Aas (2004) decompose the joint risk distribution for a Norwegian bank with an insurance subsidiary into a set of conditional probabilities and impose sufficient conditional independence that only pair-wise dependence remains; the total risk is then just the sum of the conditional marginals (plus the unconditional credit risk, which serves as their anchor). Their simulations indicate that total risk measured using near tails (95%-99%) is about 10%-12% less than the sum of the individual risks. In terms of our proposed ratio, the value ranges from 0.88 to 0.90. Using the far tail (99.97%), they find that total risk is often overestimated by more than 20% using the additive method. In terms of our proposed ratio of the unified risk measure to the sum of the compartmentalised risk measures, its value would be 0.80.

Similarly, Kuritzkes et. al. (2003) examine the unified risk profile of a "typical banking-insurance conglomerate" using the simplifying assumption of joint normality across the risk types, which allows for a closed-form solution. They use a broad set of parameters to arrive at a range of risk aggregation and diversification results for a financial conglomerate. Based on survey data for Dutch banks on the correlations between losses within specific risk categories, their calculated economic capital at the 99.9% level is lower for the unified, firm-level calculation than for the sum of the risk-specific, compartmentalised calculations. The ratio of these two quantities ranges from 0.72 through 0.85, based on correlation assumptions across market, credit and operational risk.

Rosenberg and Schuermann (2006) conduct a more detailed, top-down analysis of a representative large, internationally active bank that uses copulas to construct the joint distribution of losses. The copula technique combines the marginal loss distributions for different business lines or risk types into a joint distribution for all risk types and takes account of the interactions across risk types based on assumptions. Using a copula, parametric or nonparametric marginals with different tail shapes can be combined into a joint risk distribution that can span a range of dependence types beyond correlation, such as tail dependence. The aggregation of market, credit and operational risk requires knowledge of the marginal distributions of the risk components as well as their relative weights. Rosenberg and Schuermann assign inter-risk correlations and specify a copula, such as the Student-t copula, which captures tail dependence as a function of the degrees of freedom. They impose correlations of 50% for market and credit risk, and 20% for the other two correlations with operational risk, all based on triangulation with existing studies and surveys.[14]

[14] Note that different correlation values could lead to risk compounding, but it is not clear what those values might be and what values would be implied by the bottom-up exercises discussed here.

Rosenberg and Schuermann find several interesting results, such as that changing the inter-risk correlation between market and credit risk has a relatively small impact on total risk compared to changes in the correlation of operational risk with the other risk types. The authors examine the sensitivity of their risk estimates to business mix, dependence structure, risk measure, and estimation method. Overall, they find that "assumptions about operational exposures and correlations are much more important for accurate risk estimates than assumptions about relative market and credit exposures or correlations." Comparing their VaR measure for the 0.1% tail to the sum of the three different VaR measures for the three risk types, they find diversification benefits in all cases. For our benchmark measure of the ratio between the unified risk measure and the compartmentalised risk measure, their results suggest values ranging from 0.42 to 0.89. They found similar results when the expected shortfall (ES) measure was used.

94 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
Note that the authors state that the sum of the separate risk of com partm entalised measures may not be conservative and,
measures is always the most conservative and overestim ates in fact, may understate the total risk. Such an outcom e would
risk, "since it fixes the correlation m atrix at unity, when in fact clearly be undesirable as the necessary amount of capital could
the em pirical correlations are much lower." W hile the state­ be underestim ated by a significant margin.
ment of imposing unit correlation is m athem atically correct, it is
These conclusions seem to directly question w hether separate
based on the assumption that the risk categories can be linearly
capital requirem ents for the trading and banking books provide
separated. If that assumption were not correct, as suggested
a reasonable path to setting the appropriate level of capital for
by papers cited above, the linear correlations could actually be
the entire firm. If we retained the different capital treatm ents,
greater than one and lead to risk com pounding.
attem pts could be made to fully detail each type of risk within
Finally, Kuritzkes and Schuermann (2007) exam ine the distribu­ each book, and the subsequent aggregation might then be
tion of earnings volatility for US bank holding com panies with considered conservative. However, performing such an analysis
at least USD 1 billion in assets over the period from 1986.Q 2 to within the current and traditional separation between a trad­
2 0 0 5 .Q 1 ; specially, they exam ine the 99.9% tail of this distribu­ ing and a banking book would require im portant changes in
tion. Using a decom position m ethodology based on the defini­ operational procedures. An alternative approach might be to
tion of net incom e, the authors find that market risk accounts develop a system of book keeping and risk allocation that does
for just 5% of total risk at the 99.9% level, while operational risk not artificially assign positions into different books when its risk
accounts for 12% of total risk. Using their risk measure of the characteristics are interrelated.
lower tail of the earnings distribution, as measured by the return
on risk-weighted assets, their calculations suggest that the ratio
of the integrated risk measure to the sum of the disaggregated
6.7 RISK M ANAGEM ENT AND
risk m easures ranges from 0.53 through 0.63. VALUE-AT-RISK IN A SYSTEM IC
CO N TEXT
Conclusions
Overview
Academ ic studies have generally found that at a high level of
aggregation, such as at the holding com pany level, the ratio of In this section, we survey the research literature on the sys­
the risk measures for the unified approach to that of the sep a­ tem ic consequences of individual risk m anagem ent system s
rated approach is often less than one, i.e., risk diversification and regulatory capital charges that rely on them . A t the time
is prevalent and ignored by the separated approach. However, when the Basel Com m ittee im plem ented the M RA in 1996, risk
this approach often assumes that diversification is present. A t m anagem ent and banking regulation still was a subject that
a lower level of aggregation, such as at the portfolio level, this had received relatively little attention in the academ ic litera­
ratio is also often found to be less than one, but im portant ture. Perhaps the most im portant change brought to the Basel
exam ples arise in which risk com pounding (i.e., a ratio greater fram ework by the M RA was the ability for banks to use their
than one) is found. These results suggest, at a minimum, that own quantitative risk models for determ ining the capital require­
the assumption of risk diversification cannot be applied without ments for m arket risk.
questioning, especially for portfolios subject to both market and
Both conceptually and procedurally, this am endm ent was a
credit risk, regardless of where they reside on the balance sheet.
significant departure from the previous regulatory approaches
Recent literature on the system ic im plications of the current reg­ to determ ine bank capital. The conceptual innovation was that
ulatory capital requirem ents that aggregate capital requirem ents the notion of risk on which the new regulation relied was much
across risk types suggests that this com partm entalised approach closer to the notions of risk that were in use in the financial,
can— at least in general— be argued to contribute to the am plifi­ econom ic and statistical research literature. Procedurally the
cation of system ic risk, which is counter to its intentions. am endm ent am ounted to an official recognition that financial
institutions them selves are in the best positions to assess their
In term s of policy im plications, the academ ic literature sug­
risk exposures. The new regulatory approach seem ed to suggest
gests that if we are able to divide risk types easily across the
that using and relying on this knowledge might be the best way
trading book and the banking book (as is assumed in the top-
to cope with m ethodological problem s of risk assessm ent in a
down studies), diversification benefits appear to be certain,
rapidly changing econom ic environm ent.
and aggregation of capital requirem ents across the books is
conservative. However, recent studies have shown that if this A t the time of the am endm ent and in the years after, the aca­
risk separation cannot be done com pletely, sim ple aggregation dem ic literature on risk m anagem ent and regulation largely

Chapter 6 Messages from the Academic Literature on Risk Management for the Trading Book ■ 95
accepted the conceptual reasoning behind the am endm ent and however, that some of the new regulatory initiatives will likely
confined itself mostly to developing the technology of quantita­ dampen procyclical effects in the future. The stre sse d VaR intro­
tive risk m anagem ent itself. The discussion in the econom ics duced by the Ju ly 2009 revisions of the M arket Risk Fram ework
community remained sparse and largely sceptical. is a case in point: its calculation is based on estim ates from bad
historical periods of the econom y and so acts rather "through
Hellwig (1995, 1996) raised several im portant issues related
the cycle." Adm ittedly, the stressed VaR is only one addend of
to this new regulatory approach that did not take hold very
total trading book capital.
much in the regulatory community but sound very modern in
the current debate about the recent financial crises: Hellwig Although the literature for this section generally refers to VaR as
discussed incentive problem s. Banks may find it desirable to the risk measure at issue, it is im portant to bear in mind that the
bias their model developm ent towards the goal of minimising term VaR should be interpreted here in a wide sense since the
capital. With hindsight, we know that the practice of determ in­ results generally do not depend on this specific risk measure.
ing capital based on VaR models helped large and international
In the following we give a brief outline of the main arguments
active banks to reduce greatly the amount of capital to be held
and explain the boom and bust am plification mechanism identi­
against any given asset during the pre-crisis boom years. He also
fied in this literature. We then go through some of the policy
pointed out the difficulties related to using statistical techniques
conclusions suggested by this analysis.
which work under the assumption of a stationary world in a non-
stationary environm ent like financial m arkets. He also criticised
the separation between m arket and credit risk while he acknowl­ Intermediation, Leverage and
edged that quantitative models of integrated risk m easurem ent Value-at-Risk: Empirical Evidence
are subject to the general problem s outlined above.
Adrian and Shin (2010) em pirically investigated the relationship
During the discussion of the new Basel II fram ework, in May between leverage and balance sheet size of the five major US
2001, a group of academ ics at the Financial M arkets Group
investm ent banks shortly before the financial crises. All these
(FM G) of the London School of Econom ics wrote a paper that
institutions meanwhile left the broker-dealer sector, either
raised a concern with respect to the use of value-at-risk that is
because they were taken over or went bankrupt or were con­
more fundam ental.15 In the report's executive summary, there is
verted to bank holding com panies. A major reason why these
a conclusion that calls into question the conceptual construction institutions are particularly interesting is because they all show
of the 1996 am endm ent: "The proposed regulations fail to con­
a very clear picture of how financial interm ediation works in a
sider the fact that risk is endogenous. Value-at-risk can destabi­ capital m arkets-based financial system with active balance sheet
lise and induce crashes when they would not otherwise occur." m anagem ent through risk m anagem ent system s.
In the current practice of risk m anagem ent and regulation, these W hen an interm ediary actively m anages its balance sheet,
conclusions so far have only partly lead to a serious reconsidera­
leverage becom es procyclical because risk models and eco­
tion of the fram ework initiated and extended more than a nomic capital require balance sheet adjustm ents as a response
decade ago. In the current regulatory discussion, the general
to changes in financial m arket prices and measured risks. This
view seem s to be that the conclusions from the financial crisis relationship follows from sim ple balance sheet m echanics. The
call for suitable expansions and am endm ents to the prevailing following exam ple is taken from Shin (2008a, pp. 24 ff.) Assum e
fram ework. In the m eantim e, the conclusions derived in the
a balance sheet is given with 100 in assets and a liability side
FM G paper have received more substantive underpinnings from which consists of 90 in debt claims and 10 in equity shares.
academ ic research, both em pirically and theoretically. The
Leverage is defined as the ratio of total assets to equity, 10 in
papers of Adrian and Shin (2008), the book of Shin (2008a) and our exam ple. If we assume more generally that the market value
joint work by Danielsson, Shin and Zigrand (2009) suggest that
of assets is A and make the simplifying assumption that the
the use of value-at-risk models in regulation intended to func­ value of debt stays roughly constant at 90 for small changes in
tion as a "fire extinguisher," function in practice rather like a A , we see that total leverage is given by:
"fire a cce le ra n t."16 Rather than suggesting improving the VaR-
based capital regulations by various refinem ents and am end­
A - 90
ments to the concepts in place, this literature suggests to
abandon this approach and remove a VaR-based capital require­ Leverage is thus related inversely to the m arket value of total
ment from the regulatory fram ework. It should not be ignored, assets. W hen net worth increases, because A is rising, leverage

15 See Danielsson et. al. (2001). 16 See Hellwig (2009).

96 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
goes down, when net worth decreases, because A is falling, where A is the proportion of capital to be held per total value-
leverage increases. at-risk. This proportion may vary with tim e. Leverage is thus

Consider now what happens if an interm ediary actively manages A _ 1 A


its balance sheet to maintain a constant leverage of 10. If asset “ K ~ A X VaR
prices rise by 1%, the bank can take on an additional amount
Since VaR per value of assets is countercyclical, it directly follows
of 9 in debt, its assets have grown to 110, its equity is 11, and
that leverage is procyclical as the data in Adrian and Shin (2008)
the debt is 99. If asset values shrink by 1%, leverage rises. The 1 7
indeed show.
bank can adjust its leverage by selling securities worth 9 and pay
down a value 9 of debt to bring the balance sheet back to the The system ic consequences of this built-in risk limiting technol­

targeted leverage ratio. ogy at the level of individual institutions works in the aggregate
as an am plifier of financial boom and bust cycles. The m echa­
This kind of behaviour leads to a destabilising feedback loop,
nism by which the system ic am plification works is risk perception
because it induces an increase in asset purchases as asset prices
and the pricing of risk, even if all network effects and com plex
are rising and a sale o f assets when prices are falling. W hereas 1Q
interconnectedness patterns in the financial system are absent.
the textbook m arket mechanism is self stabilising because the
reaction to a price increase is a reduction in quantity dem anded Consider interm ediaries who run a VaR-based risk m anagem ent

and an expansion in quantity supplied, and to a price decreases system and start with a balance sheet consisting of risk-free

an expansion in quantity dem anded and a contraction in quan­ debt and equity. Now an asset boom takes place, leading to an

tity supplied, active balance sheet m anagem ent reverses this expansion in the values of securities. Since debt was risk-free

self stabilising mechanism into a destabilising positive feedback to begin with, without any balance sheet adjustm ent, this leads

loop. to a pure expansion in equity. The VaR constraint is relaxed


through the asset boom and creates new balance sheet capacity
Adrian and Shin (2010) docum ent this positive relationship
to take on more risky securities or increase its debt. The boom
between total assets and leverage for all of the (former) big Wall
gets am plified by the portfolio decisions of the leveraged bank­
Street investm ent banks. Furtherm ore, they produce econom et­
ing system .
ric evidence that the balance sheet adjustm ents brought about
by active risk m anagem ent of financial institutions indeed has Put differently, in a system of investors driven by a VaR con­

an im pact on risk premiums and aggregate volatility in financial straint, investors' demand follows and am plifies the most recent

m arkets. price changes in the financial m arket. Price increases and bal­
ance sheet effects becom e intertwined through the active VaR-
driven risk m anagem ent of financial institutions.
What Has All This to Do with VaR-Based
O f course, the described mechanism also works on the way
Regulation? down. A negative shock drives down m arket values, tighten­
ing the VaR constraints of leveraged investors. These investors
W hy would a bank target a constant leverage and what is the
have to sell assets to reduce leverage to the new VaR constraint.
role of value-at-risk in all of this? The book of Shin (2008a) and
By hardwiring VaR-driven capital m anagem ent in banking
the papers by Shin (2008b) and Adrian and Shin (2008) as well as
regulation, a positive feedback loop with potent destabilising
by Danielsson, Shin and Zigrand (2009) explore this role in more
force both in booms and busts has been built into the financial
detail.
system .
If we consider the future value of bank assets A as a random
The mechanisms described in this section have been theoreti­
variable, the value-at-risk (VaR) at a confidence level c is
cally analysed in Shin (2008a), Danielsson, Shin, Zigrand (2009)
defined by
theoretically and with explicit reference to value-at-risk. They are
pr(A < A 0 - VaR) < 1 - c also central in the work of Geanakoplos (2009), although there
the connection with VaR is not made explicit.1
8
7
The VaR is equal to the equity capital the firm must hold to be
solvent with probability c. The econom ic capital is tied to the
overall value-at-risk. 17 This formal derivation of the procyclicality of VaR is taken directly from
Shin (2008a).
If a bank adjusts its balance sheet to target a ratio of value-at-
18 For this point, see also Geanakoplos (2009), who has shown in a
risk to econom ic capital then bank capital to m eet VaR is theoretical model how risk-free debt may nevertheless give rise to fluc­
tuations in leverage and risk pricing and thus create systemic spillover
K = A X VaR, effects.

Chapter 6 Messages from the Academic Literature on Risk Management for the Trading Book ■ 97
Conclusions bank capital but it has yet to develop an alternative approach
that sim ultaneously satisfies all the (som etimes conflicting) regu­
A literature stream on the system ic consequences of individual latory policy objectives.
risk m anagem ent system s as the basis of regulatory capital
charges has found that the mechanical link between measured
risks derived from risk models and historical data and regulatory
References
capital charges can work as a system ic am plifier of boom and
A cerb i, C (2002): "Spectral measures of risk: a coherent
bust cycles.
representation of subjective risk aversion," Jo u rn al o f Banking
The central mechanism that leads to this feedback loop works and Finance, vol 26, no 7, pp 1505-1518.
through the pricing of risk. In good tim es, when measured risks
A cerb i, C and G Scandolo (2008): "Liquidity risk theory and
look benign, a financial institution that targets a regulatory
coherent measures of risk," Q uantitative Finance, vol 8, no 7,
capital requirem ent as a function of a m odel-based risk measure
pp 681-692.
has slack capacity in its balance sheet that it can either use to
buy additional risky assets or to increase its debt. This means A cerb i, C and D Tasche (2002): "O n the coherence of
that we have a mechanism where institutions are buying more expected shortfall," Journal o f Banking and Finance, vol 26,
risky assets when the price of these assets is rising and where pp 1487-1503.
they are buying less of these assets when prices are falling. The Adrian, T and H S Shin (2010): "Liquidity and leverage," Journal
stabilising properties of the m arket mechanism are turned on o f Financial Interm ediation, vol 19, no 3, pp 418-437.
their head. By this m echanic link of measured risk to regulatory
-------- (2008), "Financial interm ediary leverage and value at
capital a powerful am plifier of booms and busts is created at
risk," Federal Reserve Bank of New York, staff report no 338.
the system level counteracting the intention of the regulation to
make the system as a whole safer. A lessandri, P and M Drehmann (2010): "An econom ic capital
model integrating credit and interest rate risk in the banking
It is im portant to recognise that while the current system may
book," Journal o f Banking and Finance, vol 34, pp 730-742.
im plem ent a set of rules that limit the risk taken at the level of
individual institutions, the system may also enable institutions Alexander, C and E Sheedy (2008): "D eveloping a stress testing
to take on more risk when tim es are good and thereby lay the fram ework based on m arket risk m odels," Journal o f Banking
foundations for a subsequent crisis. The very actions that are and Finance, vol 32, no 10, pp 2220-2236.
intended to make the system safer may have the potential to Alm gren, R and N Chriss (2001): "O ptim al execution of portfolio
generate system ic risk in the system. transactions," Journal o f Risk, vol 3, pp 5-39.
These results question a regulatory approach that accepts Am ihud, Y (2002): "Illiquidity and stock returns: cross-section
industry risk models as an input to determ ine regulatory capital and tim e-series effects," Journal o f Financial M arkets, pp 31-56.
charges. This critique applies in particular to the use of VaR to
A ragones, J , C Blanco and K Dowd (2001): "Incorporating
determ ine regulatory capital for the trading book but it ques­
stress tests into m arket risk m odeling," D erivatives Q uarterly,
tions also an overall trend in recent regulation.
pp 44-49.
The amplifying mechanism identified in this section will be at
Aram onte, S, M Rodriguez and J Wu (2010): "Portfolio value-at-
work no m atter how sophisticated VaR becom es, w hether it is
risk: a dynamic factor approach," Federal Reserve Board.
replaced by more sophisticated risk m easures, like expected
shortfall, or w hether it goes beyond the naive categorisation Artzner, P F, J D elbaen, J Eber and D Heath (1999): "C oherent
of risk classes (m arket, credit and operational) towards a more measures of risk," M athem atical Finance, 203-228.
integrated risk m easurem ent. These changes generally do not Bakshi, G and G Panayotov (2010): "First-passage probability,
address the problem s raised by the papers reviewed in this jum p m odels, and intra-horizon risk," Journal o f Financial
section. One exception is the stre sse d VaR introduced in July Econom ics, vol 95, pp 20-40.
2009. This new com ponent of trading book capital acts more
Balaban, E, J O uenniche and D Politou (2005), "A note on return
"through the cycle" than the "norm al" VaR. Still some argue
distribution of UK stock indices," A p p lie d Econom ics Letters, vol
that what may be needed is a less mechanical approach to capi­
12, pp 573-576.
tal adequacy that takes into account a system -wide perspective
on endogenous risk. The academ ic literature has identified many Bangia, A , F X Diebold, T Schuermann and J D Stroughair
potential shortcom ings in the currently regulatory approach for (1999a): "M odeling liquidity risk, with implication for traditional

98 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
m arket risk m easurem ent and m anagem ent," W harton working Breuer, T, M Jan d acka, K Rheinberger and M Sum m er (2008):
paper. "Com pounding effects between m arket and credit risk: the
case of variable rate loans," in A Resti (ed), The S e co n d Pillar in
-------- (1999b): "Liquidity on the outside," Risk, Decem ber,
Basel II and the C hallenge o f Eco n o m ic Capital, London: Risk
pp 68-73.
Books.
Barnhill, T and W Maxwell (2002): "M odeling correlated interest
-------- (2009): "H ow to find plausible, severe, and useful stress
rate, exchange rate, and credit risk in fixed income portfolios,"
scenarios," International Journal o f Central Banking, Septem ber.
Journ al o f Banking and Finance, vol 26, pp 347-374.
-------- (2010a): "D oes adding up of econom ic capital for market
Barnhill, T M, P Papapanagiotou and L Schum acher (2000):
and credit risk amount to a conservative risk estim ate?," Journal
"M easuring integrated m arket and credit risks in bank
o f Banking and Finance, vol 34, pp 703-712.
portfolios: an application to a set of hypothetical banks
operating in South A frica," IMF Working Paper #2000-212. Breuer, T, M Jand acka, J Mencia and M Summ er (2010b): "A
system atic approach to multi-period stress testing of portfolio
Barone-Adesi, G , F Bourgoin and K Giannopoulos (1998): "D o n't
credit risk," Bank of Spain Working Paper, Ju n e.
look back," Risk, November, pp 100-103.
Brockm ann, M and M Kalkbrener (2010): "O n the aggregation of
Basak, S and A Shapiro (2001): "Value-at-risk-based risk
risk," Journal o f Risk, vol 12, no 3.
m anagem ent: optimal policies and asset prices," The R eview o f
Financial Stu d ies, pp 371-405. C am pbell, S D (2005): "A review of backtesting and backtesting
procedures," FED S Working Paper Series.
Basel Com m ittee on Banking Supervision (2009a): Findings on
the interaction o f m arket and cred it risk, Working Paper no 16, Christoffersen, P (1998): "Evaluating interval forecasts,"
Basel. International Eco n o m ic Review , pp 841-862.

-------- (2009b): Revisions to the Basel II m arket risk fram ew ork, Christoffersen, P and F Diebold (2000): "H ow relevant is
http://w w w .bis.org/publ/bcbs158.pdf, July. volatility forecasting for financial risk m anagem ent?," The
R eview o f Econ om ics and Statistics, vol 82, no 1, pp 12-22.
Berkowitz, J (2000a): "A coherent fram ework for stress-testing,"
Journ al o f Risk, vol 2, pp 1-11. Christoffersen, P, F Diebold and T Schuermann (1998): "Horizon
problem s and extrem e events in financial risk m anagem ent,"
-------- (2000b): "Incorporating liquidity risk into value-at-risk
F R B N Y Econ om ic Policy Review , O ctober 1998, pp 109-118.
m odels," Working Paper, University of Houston, Septem ber.
Crouhy, M, D Galai and R Mark (2003): Risk m anagem ent,
-------- (2001): "Testing density forecasts, with applications to
M cGraw-Hill.
risk m anagem ent," Journ al o f Business and Eco n o m ic Statistics,
vol 19, no 4, pp 465-474. Cuenot, S, N M asschelein, M Pritsker, T Schuermann and A
Siddique (2006): "Interaction of m arket and credit risk: fram e­
Berkowitz, J and J O 'Brien (2002): "H ow accurate are value-at-
work and literature review ," m anuscript, Basel Com m ittee
risk m odels at commercial banks?," Journ al o f Finance, vol 57,
Research Task Force Working Group.
pp 1093-1111.
Danielsson, J (2002): "The em peror has no clothes: limits to
Berkowitz, J , P F Christoffersen and D Pelletier (2010):
risk m odelling," Journal o f Banking and Finance, vol 26, pp
"Evaluating value-at-risk models with desk-level data," M anage­
1273-1296.
m ent Scien ce.
Danielsson, J , P Em brechts, C G oodhart, C Keating, F
Berry, R P (2009): "B a ck te stin g value-at-risk," Investm ent
Muennich, O Renault and H Shin (2001): An academ ic resp o n se
Analytics and Consulting, Septem ber.
to Basel II.
Bervas, A (2006): "M arket liquidity and its incorporation
Danielsson, J , B N Jo rg ensen, G Sam orodnitsky, M Sarma
into risk m anagem ent," Banque de France Financial Stability
and C G de Vries (2005): "Subadditivity re-exam ined: the case
Review.
for value-at-risk," FM G Discussion Papers, London School of
Botha, A (2008): "Portfolio liquidity-adjusted value at risk," Econom ics.
S A JE M S NS, pp 203-216.
Danielsson, J and J Zigrand (2006): "O n time-scaling of risk and
Boudoukh, J , M Richardson and R W hitelaw (1998): "The best of the square-root-of-time rule," Journal o f Banking and Finance,
both w o rld s," Risk, May, pp 64-67. vol 30, pp 2701-2713.

Chapter 6 Messages from the Academic Literature on Risk Management for the Trading Book ■ 99
Danielsson, J , H Shin and J Zigrand (2009): "Risk appetite and G ennotte, G and H Leland (1990): "M arket liquidity, hedging,
endogenous risk," mimeo, http://www.princeton.edu/~hsshin/ and crashes," Am erican Econ om ic Review , vol 80, no 5,
w w w /riskappetite.pdf. pp 999-1021.

D egen, M, P Em brechts and D Lam brigger (2007): "The Gourier, E, W Farkas and D Abbate (2009): "Operational risk quan­
quantitative modeling of operational risk: between g-and-h and tification using extreme value theory and copulas: from theory to
EVT," A STIN Bulletin, vol 37, no 2, pp 265-291. practice," Journal o f Operational Risk, vol 4, no 3, pp 3-26.

D elbaen, F (2002), "C oherent risk m easures," lecture notes for G rundke, P (2005): "Risk m easurem ent with integrated market
University of Pisa lectures, draft. and credit portfolio m odels," Journal o f Risk, vol 7, pp 63-94.

Diebold, F, A Hickman, A Inoue and T Schuermann (1998), Haberle, R and P Persson (2000): "Incorporating m arket liqudity
"Scale m odels," Risk, no 11, pp 104-107. constraints in VaR," Bankers M arkets & Investors, vol 44,
01/ 01/ 2000.
Dimakos, X and K Aas (2004): "Integrated risk m odeling,"
Statistical M odelling, vol 4, no 4, pp 265-277. Hallerbach, W G (2003): "D ecom posing portfolio value-at-risk: a
general analysis," Journal o f Risk, vol 5, no 2, pp 1-18.
Drehm ann, M, S Sorensen and M Stringa (2010): "The
integrated im pact of credit and interest rate risk on banks: a Hellwig, M (1995), "System ic aspects of risk m anagem ent in
dynam ic fram ework and stress testing application," Journ al o f banking and finance," Sw iss Journal o f Econ om ics and Statistics,
Banking and Finance, vol 34, pp 713-742. vol 131, no 4/2, pp 723-737.

Dunn, G (2009), "M odelling m arket risk," UK FSA. -------- (1996), "Capital adequacy rules as instruments for the
regulation of banks," Sw iss Journal o f Econ o m ics and Statistics,
Egloff, D, M Leippold and S Johri (2005): "O ptim al im portance
vol 132, no 4/2, pp 609-612.
sampling for credit portfolios with stochastic approxim ation,"
working paper, http://ssrn.com /abstract=1002631. -------- (2009): "Brandbeschleuniger im Finanzsystem ," M ax
Planck Research, vol 2, pp 10-15.
Engle, R F (2002): "D ynam ic conditional correlation— a simple
class of m ultivariate G A R C H m odels," Jo u rn al o f Business and Hisata, Y and Y Yamai (2000): "Research toward the practical
Econ om ic Statistics, pp 339-350. application of liquidity risk evaluation m ethods," M onetary and
Econ om ic Stu dies, pp 83-128.
Engle, R F and R Ferstenberg (2006): "Execution risk," N BER
working paper 12165. Huberm an, G and W Stanzl (2005), "O ptim al liquidity trading,"
R eview o f Finance, vol 9, no 2, pp 165-200.
Engle, R F and B Kelly (2009): "D ynam ic equicorrelation," Stern
School of Business. International Accounting Standard 39, Financial instrum ents:
recognition and m easurem ent, last version 31 D ecem ber 2008.
Engle, R F and K F Kroner (1995): "M ultivariate sim ultaneous
generalized A R C H ," Eco n o m etric Theory, pp 122-150. Jarrow , R and S Turnbull (2000): "The intersection of market
and credit risk," Journal o f Banking and Finance, vol 24, no 1-2,
Engle, R F, N Shephard and K Sheppard (2007): "Fitting and
pp 271-299.
testing vast dimensional time-varying covariance m odels," New
York University. Jarrow , R and A Subramanian (2001): "The liquidity discount,"
M athem atical Finance, pp 447-474.
Finger, C (2009), "IR C com m ents," RiskM etrics G roup Research
M onthly, February. Jarrow , R and Ph Protter (2005): "Liquidity risk and risk measure
com putation," working paper.
Francois-Heude, A and P Van W ynendaele (2001): "Integrating
liquidity risk in a param etric intraday VaR fram ew ork," Universite Jo b st, N J , G Mitra and S A Zenios (2006): "Integrating m arket
de Perpignan, France, Facultes Universitaires Catholiques de and credit risk: a simulation and optimisation p ersp ective,"
Mons, Belgium . Journal o f Banking and Finance, vol 30, pp 717-742.

Franke, J , W K Hardle and C M Hafner (2008): "Value at risk and Jo b st, N J S A Zenios (2001): "The tail that wags the dog:
backtesting," in Statistics o f Financial M arkets, pp 321-332, integrating credit risk in asset portfolios," Jo u rn al o f Risk
Berlin, H eidelberg: Springer. Finance, vol 3, pp 31-43.

G eanakoplos, J (2009): "The leverage cycle," Cow les J.P. Morgan (1996): RiskM etrics Technical D ocum ent,
Foundation Discussion Paper no 1715R. http://w w w .riskm etrics.com /system /files/private/td4e.pdf.

100 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
Kalkbrener, M, H Lotter and L O verbeck (2004): "Sensible and Platen, E and M Schweizer (1998): "O n feedback effects from
efficient capital allocation for credit portfolios," Risk, January, hedging derivatives," M athem atical Finance, vol 8, pp 67-84.
pp 19-24.
Pritsker, M (2006), "The hidden dangers of historical sim ulation,"
Kalkbrener, M, A Kennedy and M Popp (2007): "Efficient calcula­ Journal o f Banking and Finance, vol 30, no 2, pp 561-582.
tion of expected shortfall contributions in large credit portfo­ Provizionatou, V, S Markose and O M enkens (2005): "Em pirical
lios," Journ al o f Com putational Finance, vol 11, no 2, pp 45-77. scaling rules for value-at-risk," University of Essex.
Kaufm an, R (2004), Long-term risk m anagem ent, PhD thesis, Q i, J and W L Ng (2009): "Liquidity adjusted intraday value at
ETH Zurich. risk," P ro ceedin g s o f the W orld C on gress o f Engineering.
Kerkhof, J and B M elenberg (2004): "Backtesting for risk-based Rogers, L C G and S Singh (2005): "O ption pricing in an illiquid
regulatory capital," Journ al o f Banking and Finance, vol 28, no m arket," Technical Report, University of Cam bridge.
8, pp 1845-1865.
Rosenberg, J and T Schuermann (2006): "A general approach
Kritzm an, M and D Rich (2002): "The m ism easurem ent of risk," to integrated risk m anagem ent with skew ed, fat-tailed risks,"
Financial Analysts Journal, vol 58, no 3, pp 91-99. Journal o f Financial Econom ics, vol 79, pp 569-614.
Kupiec, P (1998), "Stress testing in a value-at-risk fram ew ork," Saunders, A and I W alter (1994): Universal banking in the U nited
Journ al o f D erivatives, pp 7-24. S ta tes: what cou ld we gain? W hat cou ld we lose?, Oxford

-------- (1995), "Techniques for verifying the accuracy of risk University Press, New York.

m easurem ent m odels," Journal o f D erivatives, pp 73-84. Schonbucher, P J and P W ilm ott (2000): "The feedback effects
of hedging in illiquid m arkets," S IA M Journal on A p p lie d
-------- (2007), "An integrated structural model for portfolio
M athem atics, vol 61, pp 232-272.
m arket and credit risk," m anuscript, Federal D eposit Insurance
Corporation. Shang, D (2009): "A RCH -based value-at-risk with heavy-tailed
errors," London School of Econom ics.
Kuritzkes, A and T Schuermann (2007): "W hat we know, don't
know and can't know about bank risk: a view from the trenches," Shin, H S (2008a): "Risk and liquidity," Clarendon Lectures,
forthcom ing in F X Diebold, N Doherty and R J Herring (eds), O xford University Press, forthcom ing.
The Known, The Unknown and The Unknowable in Financial Risk
-------- (2008b), "Risk and liquidity in a system co n text," Journal
M anagem ent, Princeton University Press.
o f Financial Interm ediation, vol 17, no 3, pp 315-329.
Kuritzkes, A , T Schuermann and S M W einer (2003): "Risk
Sircar, K R and G Papanicolaou (1998): "G eneralized Black-
m easurem ent, risk m anagem ent and capital adequacy in Scholes models accounting for increased market volatility from
financial conglom erates," W harton Financial Institutions Center
hedging strategies," A p p lie d M athem atical Finance, vol 5, no 1,
Working Paper #2003-02. pp 45-82.
Lawrence, C and G Robinson (1997): "Liquidity, dynamic Smithson, C and L Minton (1996): "Value-at-risk," Risk, Septem ­
hedging and VaR," in: Risk m anagem ent fo r financial institutions, ber, pp 38-39.
Risk Publications, London, pp 63-72.
Stange, S and C Kaserer (2008): "W hy and how to integrate
Le Saout, E (2002): "Integration du risque de liquidite dans les liquidity risk into a VaR fram ew ork," C E F S Working Paper.
m odeles de valeur en risque," Bankers M arkets & Investors,
Stiroh, K J (2004): "Diversification in banking: Is noninterest
vol 61, N ovem ber-D ecem ber, pp 15-25.
income the answ er?," Journal o f M oney, C red it and Banking, vol
Lopez, J (1999): "M ethods for evaluating value-at-risk estim ates," 36, no 5, pp 853-82.
Federal R eserve Bank o f San Francisco Review, 2, pp 3-17.
Subram anian, A (2008): "O ptim al liquidation by a large
-------- (2005), "Stress tests: useful com plem ents to financial risk investor," S IA M Journal o f A p p lie d M athem atics, vol 68, no 4,
m odels," F R B S F Econ om ic Letter, pp 119-124. pp 1168-1201.

M cNeil, A , R Frey and P Em brechts (2005): Q uantitative risk Sy, W (2006): "O n the coherence of VaR risk measures for Levy
m anagem ent, Princeton. Stable distributions," Australian Prudential Regulation Authority.

Perignon, C and D Smith (2006): "The level and quality of value- W ang, S (2001): "A risk measure that goes beyond coherence,"
at-risk disclosure by commercial banks," Simon Fraser University. University of W aterloo, Institute of Insurance and Pension.

Chapter 6 Messages from the Academic Literature on Risk Management for the Trading Book ■ 101
W ang, S, V Young and H Panjer (1997): "A xiom atic characteriza­ W ylie, J , Q Zhang and T Siu (2010): "Can expected shortfall and
tion of insurance p rices," Insurance: M athem atics and Eco n o m ­ value-at-risk be used to statistically hedge options?," Q uantita­
ics, vol 21, pp 173-183. tive Finance, vol 10, no 6, pp 575-583.

W ong, W K (2008): "Backtesting trading risk of com m er­ Yamai, Y and Y Yoshiba (2005): "Value-at-risk versus expected
cial banks using expected shortfall," Journ al o f Banking and shortfall: a practical p ersp ective," Jo u rn al o f Banking and
Finance, vol 32, no 7, pp 1404-1415. Finance, vol 29, pp 997-1015.

W u, G and Z Xiao (2002): "An analysis of risk m easures," Journal Zheng, H (2006): "Interaction of credit and liquidity risks: m odel­
o f Risk, vol 4, no 4, pp 53-75. ling and valuation," Journal o f Banking and Finance, vol 30, pp
391-407.
W u, L (2009): "Incorporating liquidity risk in value-at-risk based
on liquidity adjusted returns," Southwestern University of Eco ­
nomics and Finance.

102 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
ANNEX
Table 6.1 Summary of "Bottom-Up" Risk Aggregation Papers in the Survey

Ratio of Unified Risk


Measure to Sum of
Compartmentalised Risk
Research Paper Portfolio Analysed Horizon Risk Measure Used Measures

Breuer, Jand acka, Hypothetical portfolios O ne year Expected shortfall at:


Rheinberger and Summer of foreign-exchange the 1% level 1.94
(2010a) denom inated loans of the 0.1% level 8.22
rating: the 1% level 3.54
BBB+ the 0.1% level 7.59

B+

Breuer, Jand acka, Hypothetical portfolios of O ne year Expected shortfall at:


Rheinberger and Summer variable rate loans of rating: the 1% level 1.11
(2008) the 0.1% level 1.16
BBB+
the 1% level 1.06
B+
the 0.1% level 1.10

Grundke (2005) Hypothetical portfolios Three years VaR at:


of loans with various the 1% level 0.07-0.97
credit ratings, asset value the 0.1% level 0 .0 9 -1 .0 0
correlations, distributional
assum ptions, and
correlations between the
risk-free rate, credit spreads
and firm asset returns

Kupiec (2007) Hypothetical portfolio of Six months Portfolio losses at funding


corporate loans with various cost levels consistent with:
rating categories calibrated A A A rating 0 .6 0 -3 .6 5
to historical data BBB rating 0.61-3.81

Drehm ann, Sorensen and Hypothetical UK bank Three years Decline in capital over the 2.5
Stringa (2010) horizon

Alessandri and Drehmann Hypothetical UK bank O ne year Value-at-risk at:


(2008) the 1% level 0.03
the 0.1% level 0.50

Chapter 6 Messages from the Academic Literature on Risk Management for the Trading Book 103
Table 6.2 Summary of "Top-Down" Risk Aggregation Papers in the Survey

Ratio of Unified Risk


Measure to Sum of
Compartmentalised Risk
Research Paper Portfolio Analysed Horizon Risk Measure Used Measures

Dimakos and Aas (2004) Norwegian financial Total risk exposure at:
conglom erate the 1% level 0.90
the 0.1% level 0.80

Rosenberg and Hypothetical, One year Value-at-risk based on a 0.42-0.89


Schuermann (2008) internationally-active normal copula at the 0.1% based on different
financial conglom erate level. (Note: similar results correlation assumptions
using expected shortfall.) between m arket, credit and
operational risk.

Kuritzkes, Schuermann Representative Dutch One year Econom ic capital 0.72-0.85


and W einer (2003) bank based on different
correlation assumptions
between m arket, credit and
operational risk.

Kuritzkes and US banking system Tail quantile of the


Schuermann (2007) from 1986.Q 2 through earnings distribution at:
2005.Q1 the 1% level 0.63
the 0.1% level 0.63

104 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
Correlation Basics:
Definitions,
Applications, and
Terminology
Learning Objectives
A fter com pleting this reading you should be able to:

Describe financial correlation risk and the areas in which it Estim ate the im pact of different correlations between
appears in finance. assets in the trading book on the VaR capital charge.

Explain how correlation contributed to the global financial Explain the role of correlation risk in m arket risk and
crisis of 2007 to 2009. credit risk.

• Describe the structure, uses, and payoffs of a Relate correlation risk to system ic and concentration risk.
correlation swap.

E x c e rp t is C hapter 1 o f Correlation Risk Modeling and M anagem ent, 2nd Edition, by G unter M eissner.
Exam ples are:
"Behold the fool saith, 'Put not all thine eggs in the one 1. Correlating bond prices and their respective yields at a certain
basket'"
point in time, which will result in a negative association.
— Mark Twain 2 . The classic VaR (value-at-risk) m odel, which answers the
question: what is the maximum loss of correlated assets in
a portfolio with a certain probability for a given tim e period
In this introductory chapter, we define correlation and correla­
(see "Risk m anagem ent and correlation" below).
tion risk, and show that correlations are critical in many areas
of finance such as investm ents, trading, and risk m anagem ent, 3 . The copula approach for C D O s (collateralised debt
as well as in financial crises and in financial regulation. We also obligations). It m easures the default correlations between
show how correlation risk relates to other risks in finance such all assets in the C D O , typically 125, for a certain time
as m arket risk, credit risk, system ic risk, and concentration risk. period.
Before we do, let's see how it all started. 4 . The binomial default correlation model of Lucas (1995),
which is a special case of the Pearson correlation model. It
measures the probability of two assets defaulting together
7.1 A SHORT HISTORY within a short tim e period.

O F CORRELATION Besides the static correlation concept, there are dynamic


correlations:
A s with many groundbreaking d isco veries, there is a bit of a (b) Definition: dynam ic financial correlations measure how two
controversy as to who the creator of the co ncep t of co rrela­ or more financial assets move together in tim e.
tion is. Foundations on the behaviour of error term s w ere laid
Exam ples are:
in 1846 by the French m athem atician A ug uste Bravais, who
essentially derived w hat is today term ed the "reg ression lin e ". 1 . In practice, "pairs trading" - where one asset is purchased
and another is sold - is perform ed. Let's assume that the
H owever, Helen W alker (1929) d escrib es Bravais nicely as "a
asset returns x and y have moved highly correlated in tim e.
kind of C olum bus, discovering correlation w ithout fully realis­
ing that he had done s o ". Further sig nificant theo retical and If now asset X perform s poorly with respect to Y, then asset
X is bought and asset Y is sold with the expectation that the
em pirical w ork on correlation was done by Sir W alter G alton
in 1886, who created a sim ple linear regression and in terest­ gap will narrow.

ingly also discovered the statistical property of "R eg ressio n to 2 . Within the determ inistic correlation approaches, the
M ed io crity", which today w e call "M ean -R eversio n ". Heston model (1993) correlates the Brownian motions
dz -1 and dz2 of assets 1 and 2. The core equation is
A student of W alter G alto n , Karl Pearson, w hose w ork on
dz-|(t) = pdz 2 (t) + V (1 — p2)dz 3(t) where dz-1 and dz2 are
relativity, antim atter and the fourth dim ension inspired A lb ert
correlated in tim e with correlation param eter p.
Einstein, expanded the theory of correlation significantly.
Starting in 1900, Pearson defined the correlation coefficient 3 . Correlations behave random and unpredictable. Th ere­
as a product m om ent co efficient, introduced the m ethod of fore, it is a good idea to model them as a stochastic
m om ents and principal com ponent analysis, and founded the process. Stochastic correlation processes are by con­
co ncep t of statistical hypothesis testing , applying P-Values and struction tim e-dependent and can replicate correlation
Chi-squared distances. properties well.
"Suddenly everything was highly correlated"
Financial Tim es, April 2009
7.2 WHAT ARE FINANCIAL
CORRELATIONS? 7.3 WHAT IS FINANCIAL
CORRELATION RISK?
Heuristically (meaning non-mathematically), we can define two
types of financial correlations, static and dynamic:
Financial correlation risk is defined as the risk of financial loss
(a) Definition: static financial correlations measure how two or due to adverse movements in correlation between two or more
more financial assets are associated at a certain point in tim e or variables. These variables can com prise any financial variables.
within a certain tim e period. For exam ple, the positive correlation between M exican bonds

106 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
Fixed CDS spread s risk is transferred from the investor (or C D S buyer) to a
------------------------ ► Counterparty C counterparty (CD S seller). Let's assume an investor has
Investor and
ie, credit default
credit default bought USD 1 million in a bond from Spain. They are
swap seller
swap buyer i •*-------------------------- now worried about Spain defaulting and have purchased
(BNP Paribas)
Payout of USD 1 million
a CD S from a French bank, BNP Paribas. Graphically this
in case of default of r
is displayed in Figure 7.1.

USD 1 million coupon k The investor is protected against a default from Spain
since, in case of default, the counterparty BNP Paribas
▼ will pay the originally invested USD 1 million to the
investor. For sim plicity, let's assume the recovery rate
and accrued interest are zero.

The value of the CD S, ie, the fixed CD S spread s,1 is mainly


determined by the default probability of the reference
entity Spain. However, the spread s is also determined by
Figure 7.1 An investor hedging their Spanish bond
the joint default correlation of BNP Paribas and Spain. If
exposure with a CDS.
the correlation between Spain and BNP Paribas increases,
the present value of the CD S for the investor will decrease and
and G reek bonds can hurt Mexican bond investors, if G reek
they will suffer a paper loss. Worst-case scenario is the joint default
bond prices decrease, which happened in 2012 during the
of Spain and BNP Paribas, in which case the investor will lose their
G reek crisis. O r the negative correlation between com m odity
entire investment in the Spanish bond of USD 1 million.
prices and interest rates can hurt com m odity investors if interest
rates rise. A further exam ple is the correlation between a bond In other words, the investor is exposed to default correlation risk
issuer and a bond insurer, which can hurt the bond investor (see between the reference asset r (Spain) and the counterparty c (BNP
the exam ple displayed in Figure 7.1). Paribas). Since both Spain and BNP Paribas are in Europe, let's
assume that there is a positive default correlation between the
Correlation risk is especially critical in risk m anagem ent. An
two. In this case, the investor has "Wrong-Way Correlation Risk"
increase in the correlation of asset returns increases the risk
or, for short, "Wrong-Way Risk" (WWR). Let's assume the default
of financial loss, which is often measured by the VaR concept.
probabilities of Spain and BNP Paribas both increase. This means
For details see "Risk m anagem ent and correlation" below.
that the credit exposure to the reference entity Spain increases
An increase in correlation is typical in a severe, system ic crisis.
(since the CD S has a higher present value for the investor) and the
For exam ple, during the great recession from 2007 to 2009,
credit risk increases, since it is more unlikely that the counterparty
financial assets and financial m arkets w orldwide becam e highly
BNP Paribas can pay the default insurance.
correlated. Risk m anagers who had negatively or low correlated
assets in their portfolio suddenly witnessed many of them The m agnitude of the correlation risk is expressed graphically in
decline together, hence asset correlations increased sharply. Figure 7.2.
For more on system ic risk, see "The global financial crises 2007 From Figure 7.2, we observe that for a correlation of —0.3 and
to 2009 and correlation" below as well as C hapter 8, which dis­ higher, the higher the correlation, the lower is the CDS spread. This
plays em pirical findings of correlations. is because an increasing p means a higher probability of the refer­
Correlation risk can also involve variables that are non-financial ence asset and the counterparty defaulting together. In the extreme
as econom ic or political events. For exam ple, the correlation case of a perfect correlation of 1, the CDS is worthless. This is
between the increasing sovereign debt and currency value can because, if Spain defaults, so will the insurance seller BNP Paribas.
hurt an exporter, as in Europe in 2012, where a decreasing euro We also observe from Figure 7.2 that, for a correlation from
hurt US exporters. G eopolitical tensions, as for exam ple in the about - 0 .3 to - 1 , the CD S spread decreases slightly. This seems
Middle East, can hurt airline com panies due to the increasing oil counterintuitive at first. However, an increase in the negative
price, or a slowing G D P in the US can hurt Asian and European
exporters and investors, since econom ies and financial markets
1 The CDS spread s is the premium or fee that the CDS buyer pays
are correlated w orldw ide. for getting protection. It is called a spread since it is approximately
the spread between the yield of the risky bond (the bond of Spain in
Let's look at correlation risk via an exam ple of a credit default Figure 7.1) in the CDS minus the yield of a riskless bond. See Meissner
swap (CD S). A C D S is a financial product in which the credit 2005, for details.

Chapter 7 Correlation Basics: Definitions, Applications, and Terminology ■ 107


(4) the global financial crisis and (5) regulation.
Naturally, if an entity is exposed to correlation, this
means that the entity has correlation risk, ie, the risk
of a change in the correlation.

Investments and Correlation


From our studies of the Nobel Prize-rewarded C ap i­
tal A sset Pricing Model (M arkowitz (1952), Sharpe
(1964)), we rem em ber that an increase in diversifi­
cation increases the return/risk ratio. Im portantly,
high diversification is related to low correlation.
Let's show this in an exam ple. Let's assume we
have a portfolio of two assets, X and Y. They have
Fiaure 7.2 CDS spread s of a hedged2 bond purchase perform ed as in Table 7.1.
(as displayed in Figure 7.1) with respect to the default correlation
between the reference entity r and the counterparty c. Let's define the return of asset X at time t as x t, and
the return of asset Y at tim e t as y t. A return is cal­

correlation means a higher probability of either Spain or BNP culated as a percentage change, (St — St_ i) / S t_-|, where S is a

Paribas defaulting. Hence we have two scenarios: (a) in the case price or a rate. The average return of asset X for the tim efram e

of Spain defaulting (and BNP Paribas surviving) the CD S buyer 2014 to 2018 is /xx = 29.03% ; for asset Y the average return is

will get compensated by BNP Paribas; (b) if the insurance seller /xy = 20.07% . If we assign a w eight to asset X, wx, and a weight

BNP Paribas defaults (and Spain survives), the CD S buyer will lose to asset Y, wY, the portfolio return is:

his insurance and will have to repurchase it. This may have to be fXp — WxfXx + W y /X y (7.1)
done at a higher cost. The cost will be higher if the credit quality
where wx + wy = 1
of Spain has decreased since inception of the original C D S. For
exam ple, the CD S spread may have been 3% in the original CDS, The standard deviation of returns, called volatility, is derived for
but may have increased to 6% due to a credit deterioration of asset X with equation:
Spain. The scenarios (a) and (b) combined lead to a slight decrease 1 n
of the CD S spread. For more details on pricing CDSs with counter­ "x -n 4- -1 t±
2 i^ - Ax)' (7.2)

party risk and the reference asset - counterparty correlation - see


Kettunen and Meissner (2006). where x t is the return of asset X at tim e t and n is the number
of observed points in tim e. The volatility of asset Y is derived
We observe from Figure 7.2 that the dependencies between accordingly. Equation 7.2 can be com puted with = stdev in
a variable (here the C D S spread) and correlation may be non­ Excel and std in M A TLA B . From our exam ple in Table 7.1, we
m onotonic, ie, the C D S spread som etim es increases and som e­
find that <rx = 44.51% and oy = 47.58% .
tim e decreases if correlation increases.

Table 7.1 Performance of a Portfolio with Two Assets


7.4 MOTIVATION: CORRELATIONS Return of Return of
AND CORRELATION RISK ARE Year Asset X Asset Y Asset X Asset Y
EVERYW HERE IN FINANCE 2013 100 200

W hy study financial correlations? That's an easy one. Financial 2014 120 230 20.00% 15.00%

correlations appear in many areas in finance. W e will briefly dis­ 2015 108 460 -1 0 .0 0 % 100.00%
cuss five areas: (1) investm ents, (2) trading, (3) risk m anagem ent,
2016 190 410 75.93% -1 0 .8 7 %

2017 160 480 -1 5 .7 9 % 17.07%

2018 280 380 75.00% -2 0 .8 3 %


2 To hedge means to protect More precisely, hedging means to enter
A verage 29.03% 20.07%
into a second trade to protect against the risk of an orginal trade.

108 ■ Financial Risk Manager Exam Part II: Market Risk Measurement and Management
With equal weights, ie, wx = wY = 0.5, the exam ple in
Table 7.1 results in op = 16.66% .

Importantly, the standard deviation (or its square, the


variance) is interpreted in finance as risk. The higher the
standard deviation, the higher the risk of an asset or a
portfolio. Is standard deviation a good measure of risk?
The answer is: it's not great, but it's one of the best
there are. A high standard deviation may mean high
upside potential of the asset in question! So it penalises
possible profits! But high standard deviation naturally
also means high downside risk. In particular, risk-averse
investors will not like a high standard deviation, ie, high
fluctuation of their returns.

An informative perform ance measure of an asset or a


portfolio is the risk-adjusted return, ie, the return/risk
ratio. For a portfolio it is /zP/<xP, which we derived in
Equations (7.1) and (7.5). In Figure 7.3 we observe one
F ia u re 7 .3 The negative relationship of the portfolio return/
of the few "free lunches" in finance: the lower (prefera­
portfolio risk ratio iip/o-p with respect to the correlation p of the
bly negative) the correlation of the assets in a portfolio,
assets in the portfolio (input data are from Table 7.1).
the higher the return/risk ratio. For a rigorous proof,
see Markowitz (1952) and Sharpe (1964).

Let's now look at the covariance. The covariance measures how Figure 7.3 shows the high im pact of correlation on the portfolio
two variables "co-vary", ie, move together. More precisely, return/risk ratio. A high negative correlation results in a return/
the covariance measures the strength of the linear relationship risk ratio of close to 250% , whereas a high positive correlation
between two variables. The covariance of returns for assets X results in a 50% ratio. The equations (7.1) to (7.5) are derived
and V is derived with equation: within the fram ework of the Pearson correlation approach.

"O nly by great risks can great results be achieved"


COVxy = — — 2 ( x , - Mx)(y. - Mr) (7.3)
n 1 t=1 Xerxes
For our exam ple in Table 7.1 we derive CO V x y = —0.1567.
Equation (7.3) is = Covariance. S in Excel and cov in M A TLA B .
The covariance is not easy to interpret, since it takes values 7.5 TRAD IN G A N D CO RRELA TIO N
between — and + °°. Therefore, it is more convenient to use
the Pearson correlation coefficient pXY, which is a standardised In finance every risk is also an opportunity. Th erefo re, at every
covariance, ie, it takes values between —1 and +1. The Pearson m ajor investm ent bank and hedge fund, correlation desks
correlation coefficient is: exist. The traders try to forecast changes in correlation and
try to financially gain from these changes in correlation. We
CO V x y
Px y = (7.4) already m entioned the correlation strategy "pairs trading "
°x °Y
above. G en erally, correlation trading means trading assets,
For our exam ple in Table 1, pXY = —0.7403, showing that the w hose price is determ ined at least in part by the co-m ovem ent
returns of assets X and Y are highly negatively correlated. of one asset or more in tim e. Many types of correlation assets
Equation (7.4) is "co rrel" in Excel and "corrcoef" in M A TLA B. exist.
Figure 7.3 shows the high impact of correlation on the portfolio return/risk ratio. A high negative correlation results in a return/risk ratio of close to 250%, whereas a high positive correlation results in a 50% ratio. Equations (7.1) to (7.5) are derived within the framework of the Pearson correlation approach.

"Only by great risks can great results be achieved"
Xerxes

7.5 TRADING AND CORRELATION

In finance every risk is also an opportunity. Therefore, correlation desks exist at every major investment bank and hedge fund. The traders try to forecast changes in correlation and try to gain financially from these changes in correlation. We already mentioned the correlation strategy "pairs trading" above. Generally, correlation trading means trading assets whose price is determined at least in part by the co-movement of one or more assets in time. Many types of correlation assets exist.

Multi-asset options, also termed rainbow options or mountain range options, are one example. Many different types are traded. The most popular ones are listed below. S₁ is the price of asset 1 and S₂ is the price of asset 2 at option maturity; K is the strike price, ie, the price determined at option start at which the underlying asset can be bought in the case of a call, or the price at which the underlying asset can be sold in the case of a put.



• Option on the better of two. Payoff = max(S₁, S₂).
• Option on the worse of two. Payoff = min(S₁, S₂).
• Call on the maximum of two. Payoff = max[0, max(S₁, S₂) − K].
• Exchange option (such as a convertible bond). Payoff = max(0, S₂ − S₁).
• Spread call option. Payoff = max[0, (S₂ − S₁) − K].
• Option on the better of two or cash. Payoff = max(S₁, S₂, cash).
• Dual strike call option. Payoff = max(0, S₁ − K₁, S₂ − K₂).
• Basket option. Payoff = max[Σ_i n_i S_i − K, 0], where n_i is the weight of asset i and the sum runs over i = 1, …, n.
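As an illustration, these payoffs translate directly into code. The following minimal sketch evaluates each payoff for hypothetical terminal prices and strikes (S₁ = 105, S₂ = 95, K = 100, and hypothetical dual strikes K₁ = 90, K₂ = 110); it is not a pricing model, it only computes payoffs at maturity.

```python
# Multi-asset option payoffs at maturity (all inputs hypothetical)
s1, s2, K, cash = 105.0, 95.0, 100.0, 20.0
k1, k2 = 90.0, 110.0

better_of_two  = max(s1, s2)                 # max(S1, S2)
worse_of_two   = min(s1, s2)                 # min(S1, S2)
call_on_max    = max(0.0, max(s1, s2) - K)   # max[0, max(S1, S2) - K]
exchange       = max(0.0, s2 - s1)           # max(0, S2 - S1)
spread_call    = max(0.0, (s2 - s1) - K)     # max[0, (S2 - S1) - K]
better_or_cash = max(s1, s2, cash)           # max(S1, S2, cash)
dual_strike    = max(0.0, s1 - k1, s2 - k2)  # max(0, S1 - K1, S2 - K2)

weights = [0.5, 0.5]                         # basket weights n_i
basket = max(sum(w * s for w, s in zip(weights, (s1, s2))) - K, 0.0)
```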

Importantly, the price of these correlation options is highly sensitive to the correlation between the asset prices S₁ and S₂. In the list above, except for the option on the worse of two and the basket option, the lower the correlation, the higher is the option price. This makes sense since a low, preferably negative, correlation means that, if one asset decreases (on average), the other increases. So one of the two assets is likely to result in a high price and therefore in a high payoff. Multi-asset options can be conveniently priced analytically with extensions of the Black-Scholes-Merton option model (1973).

Let's look at the evaluation of an exchange option with a payoff of max(0, S₂ − S₁). The payoff shows that the option buyer has the right to give away asset 1 and receive asset 2 at option maturity. Hence, the option buyer will exercise their right if S₂ > S₁. The price of the exchange option can be easily derived. We first rewrite the payoff equation max(0, S₂ − S₁) as S₁ max(0, (S₂/S₁) − 1). We then input the covariance between assets S₁ and S₂ into the implied volatility function of the exchange option using a variation of equation (7.5):

σ_E = √(σ_A² + σ_B² − 2 COV_AB)   (7.5a)

where σ_E is the implied volatility of S₂/S₁, which is input into the standard Black-Scholes-Merton option pricing model (1973). For an exchange option pricing model and further discussion, see the model at www.dersoft.com/exchangeoption.xlsm.

Importantly, the exchange option price is highly sensitive to the correlation between the asset prices S₁ and S₂, as seen in Figure 7.4.

Figure 7.4 Exchange option price with respect to the correlation of the assets in the portfolio.

From Figure 7.4 we observe the strong impact of the correlation on the exchange option price. The price is close to 0 for a high correlation and USD 15.08 for a negative correlation of −1. As in Figures 7.2 and 7.3, the correlation approach underlying Figure 7.4 is the Pearson correlation model.
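To make equation (7.5a) concrete, here is a minimal Python sketch of an exchange option priced in the Black-Scholes-Merton framework (the Margrabe approach). The inputs are hypothetical, not the ones behind Figure 7.4, and the function names are ours.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exchange_option(s1, s2, vol1, vol2, rho, t):
    """Price of the right to exchange asset 1 for asset 2, payoff
    max(0, S2 - S1) at maturity t (in years). The implied volatility
    of S2/S1 follows equation (7.5a) with COV_AB = rho * vol1 * vol2."""
    sigma_e = math.sqrt(vol1**2 + vol2**2 - 2.0 * rho * vol1 * vol2)
    d1 = (math.log(s2 / s1) + 0.5 * sigma_e**2 * t) / (sigma_e * math.sqrt(t))
    d2 = d1 - sigma_e * math.sqrt(t)
    return s2 * norm_cdf(d1) - s1 * norm_cdf(d2)

# Hypothetical inputs: the price falls as the correlation rises
for rho in (-1.0, 0.0, 0.9):
    print(rho, round(exchange_option(100, 100, 0.2, 0.2, rho, 1.0), 2))
```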
Another interesting correlation option is the quanto option. This is an option that allows a domestic investor to exchange their potential option payoff in a foreign currency back into their home currency at a fixed exchange rate. A quanto option therefore protects an investor against currency risk. Let's assume an American investor believes the Nikkei will increase, but they are worried about a decreasing yen, which would reduce or eliminate their profits from the Nikkei call option. The investor can buy a quanto call on the Nikkei, with the yen payoff being converted into dollars at a fixed (usually the spot) exchange rate.

Originally, the term quanto comes from the word "quantity", meaning that the amount that is re-exchanged to the home currency is unknown, because it depends on the future payoff of the option. Therefore the financial institution that sells a quanto call does not know two things:

1. How deep will the call be in the money at option maturity, ie, which yen amount has to be converted into dollars?

2. What is the exchange rate at option maturity at which the stochastic yen payoff will be converted into dollars?

The correlation between (1) and (2), ie, between the price of the underlying S and the exchange rate X, significantly influences the quanto call option price. Let's consider a call on the Nikkei S and an exchange rate X defined as domestic currency per unit of foreign currency (so USD per yen for a domestic American) at maturity. If the correlation is positive, an increasing Nikkei will also mean an increasing yen. That is in favour of the call seller.

They have to settle the payoff, but need only a small yen amount to achieve the dollar payment. Therefore, the more positive the correlation coefficient, the lower is the price of the quanto option. If the correlation coefficient is negative, the opposite applies: if the Nikkei increases, the yen decreases in value. Therefore more yen are needed to meet the dollar payment. As a consequence, the lower the correlation coefficient, the more expensive is the quanto option. Hence we have a similar negative relationship between the option price and correlation as in Figure 7.4.

Quanto options can be conveniently priced analytically with an extension of the Black-Scholes-Merton model (1973). For a pricing model and a more detailed discussion of the quanto option, see www.dersoft.com/quanto.xls.

The correlation between assets can also be traded directly with a correlation swap. In a correlation swap, a fixed (ie, known) correlation is exchanged with the correlation that will actually occur, called realised or stochastic (ie, unknown) correlation, as seen in Figure 7.5.

Figure 7.5 A correlation swap with a fixed 10% correlation rate: the correlation fixed-rate payer pays a fixed percentage (eg, ρ = 10%) to the correlation fixed-rate receiver and receives the realised correlation ρ in return.

Paying a fixed rate in a correlation swap is also called "buying correlation". This is because the present value of the correlation swap will increase for the correlation buyer if the realised correlation increases. Naturally, the fixed-rate receiver is "selling correlation".

The realised correlation ρ in Figure 7.5 is the correlation between the assets that actually occurs during the time of the swap. It is calculated as:

ρ_realised = 2/(n² − n) × Σ_{i>j} ρ_{i,j}   (7.6)

where ρ_{i,j} is the Pearson correlation between assets i and j, and n is the number of assets in the portfolio. The payoff of a correlation swap for the correlation fixed-rate payer at maturity is:

N(ρ_realised − ρ_fixed)   (7.7)
where N is the notional amount. Let's look at an example of a correlation swap.

Example 7.1

What is the payoff of a correlation swap with three assets, a fixed rate of 10%, a notional amount of USD 1,000,000 and a 1-year maturity?

First, the daily log-returns ln(S_t/S_{t−1}) of the three assets are calculated for one year.³ Let's assume the realised pairwise correlations of the log-returns at maturity are as displayed in Table 7.2.

Table 7.2 Pairwise Pearson Correlation Coefficients at Swap Maturity

         S_j=1    S_j=2    S_j=3
S_i=1    1        0.5      0.1
S_i=2    0.5      1        0.3
S_i=3    0.1      0.3      1

The average correlation between the three assets is derived by equation (7.6). We only apply the correlations below the diagonal of Table 7.2 (the shaded area in the original table), since these satisfy i > j. Hence we have

ρ_realised = 2/(3² − 3) × (0.5 + 0.1 + 0.3) = 0.3

Following equation (7.7), the payoff for the correlation fixed-rate payer at swap maturity is USD 1,000,000 × (0.3 − 0.1) = USD 200,000.
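Example 7.1 can be reproduced with a minimal Python sketch of equations (7.6) and (7.7):

```python
import numpy as np

corr = np.array([[1.0, 0.5, 0.1],
                 [0.5, 1.0, 0.3],
                 [0.1, 0.3, 1.0]])              # Table 7.2

n = corr.shape[0]
# Equation (7.6): average the pairwise correlations with i > j,
# ie, the entries below the diagonal
rho_realised = 2.0 / (n**2 - n) * corr[np.tril_indices(n, k=-1)].sum()

notional, rho_fixed = 1_000_000, 0.10
payoff = notional * (rho_realised - rho_fixed)  # equation (7.7)
print(rho_realised, payoff)                     # 0.3 and 200000.0
```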

Correlation swaps can indirectly protect against decreasing stock prices. As we will see in this chapter in "How does correlation risk fit into the broader picture of risks in finance?", Figure 7.8, as well as in Chapter 8, when stock prices decrease, the correlation between the stocks typically increases. Hence a correlation fixed-rate payer protects themselves indirectly against a stock market decline.

At the time of writing there is no industry-standard valuation model for correlation swaps. Traders often use historical data to anticipate ρ_realised. To apply swap valuation techniques, we require a term structure of correlation in time. However, no correlation term structure currently exists. We can also apply stochastic correlation models to value a correlation swap. Stochastic correlation models are currently emerging.

Another way of buying correlation (ie, benefiting from an increase in correlation) is to buy put options on an index such as the Dow Jones Industrial Average (Dow) and sell put options on individual stocks of the Dow. As we will see in Chapter 8, there is a positive relationship between correlation and volatility.

³ Log-returns ln(S₁/S₀) are an approximation of percentage returns (S₁ − S₀)/S₀. We typically use log-returns in finance since they are additive in time, whereas percentage returns are not. For details see Appendix A2.



Therefore, if the correlation between the stocks of the Dow increases, for example in a market downturn, so will the implied volatility⁴ of the put on the Dow. This increase is expected to outperform the potential loss from the increase in the short put positions on the individual stocks.

Creating exposure on an index and hedging with exposure on individual components is exactly what the "London whale", JP Morgan's London trader Bruno Iksil, did in 2012. Iksil was called the London whale because of his enormous positions in CDSs.⁵ He had sold CDSs on an index of bonds, the CDX.NA.IG.9, and "hedged" it by buying CDSs on individual bonds. In a recovering economy this is a promising trade: volatility and correlation typically decrease in a recovering economy. Therefore, the sold CDSs on the index should outperform (decrease more than) the losses on the CDSs of the individual bonds.

But what can be a good trade in the medium and long term can be disastrous in the short term. The positions of the London whale were so large that hedge funds "short squeezed" Iksil: they started to aggressively buy the CDS index CDX.NA.IG.9. This increased the CDS values in the index and created a huge (paper) loss for the whale. JP Morgan was forced to buy back the CDS index positions at a loss of over USD 2 billion.

Risk Management and Correlation

Since the global financial crisis of 2007 to 2009, financial markets have become more risk-averse. Commercial banks and investment banks as well as nonfinancial institutions have increased their risk-management efforts. As in the investment and trading environment, correlation plays a vital part in risk management. Let's first clarify what risk management means in finance.

Definition: Financial risk management is the process of identifying, quantifying and, if desired, reducing financial risk.

The main types of financial risk are:

1. market risk;
2. credit risk; and
3. operational risk.

Additional types of risk may include systemic risk, liquidity risk, volatility risk and correlation risk. We will concentrate in this chapter on market risk. Market risk consists of four types of risk: (1) equity risk, (2) interest-rate risk, (3) currency risk and (4) commodity risk.

There are several concepts to measure the market risk of a portfolio, such as VaR, expected shortfall (ES), enterprise risk management (ERM) and more. VaR is currently (year 2018) the most widely applied risk-management measure. Let's show the impact of asset correlation on VaR.⁶

First, what is value-at-risk (VaR)? VaR measures the maximum loss of a portfolio with respect to market risk for a certain probability level and for a certain time frame. The equation for VaR is:

VaR_P = σ_P α √x   (7.8)

where VaR_P is the value-at-risk for portfolio P, and

α: abscissa value of a standard normal distribution, corresponding to a certain confidence level. It can be derived as =normsinv(confidence level) in Excel or norminv(confidence level) in MATLAB; α takes values −∞ < α < +∞;

x: time horizon for the VaR, typically measured in days;

σ_P: volatility of the portfolio P, which includes the correlation between the assets in the portfolio. We calculate σ_P via:

σ_P = √(β_h C β_v)   (7.9)

where β_h is the horizontal β vector of invested amounts (price times quantity), β_v is the vertical β vector of invested amounts (also price times quantity)⁷ and C is the covariance matrix of the returns of the assets.

Let's calculate VaR for a two-asset portfolio and then analyse the impact of different correlations between the two assets on VaR.

Example 7.2

What is the 10-day VaR for a two-asset portfolio with a correlation coefficient of 0.7, daily standard deviation of returns of asset 1 of 2%, asset 2 of 1%, and USD 10 million invested in asset 1 and USD 5 million invested in asset 2, on a 99% confidence level?

⁴ Implied volatility is volatility derived (implied) from option prices. The higher the implied volatility, the higher the option price.

⁵ Simply put, a CDS is an insurance against default of an underlying (eg, a bond). However, if the underlying is not owned, a long CDS is a speculative instrument on the default of the underlying (just like a naked put on a stock is a speculative position on the stock going down). See Meissner (2005) for more.

⁶ We will use a "variance-covariance VaR" approach in this book to derive VaR. Another way to derive VaR is the "non-parametric VaR". This approach derives VaR from simulated historical data. See Markovich (2007) for details.

⁷ More mathematically, the vector β_v is the transpose of the vector β_h and vice versa: β_h^T = β_v and β_v^T = β_h. Hence we can also write equation (7.9) as σ_P = √(β_h C β_h^T). See www.dersoft.com/matrixprimer.xlsx, sheet "Matrix Transpose", for more.

First, we derive the covariances Cov:

Cov₁₁ = ρ₁₁ σ₁ σ₁ = 1 × 0.02 × 0.02 = 0.0004⁸
Cov₁₂ = ρ₁₂ σ₁ σ₂ = 0.7 × 0.02 × 0.01 = 0.00014
Cov₂₁ = ρ₂₁ σ₂ σ₁ = 0.7 × 0.01 × 0.02 = 0.00014
Cov₂₂ = ρ₂₂ σ₂ σ₂ = 1 × 0.01 × 0.01 = 0.0001   (7.10)

Hence our covariance matrix is

C = ( 0.0004    0.00014 )
    ( 0.00014   0.0001  )

Let's calculate σ_P following equation (7.9). We first derive β_h C:

(10  5) ( 0.0004    0.00014 ) = (10 × 0.0004 + 5 × 0.00014   10 × 0.00014 + 5 × 0.0001)
        ( 0.00014   0.0001  )
                              = (0.0047   0.0019)

and then

(β_h C) β_v = (0.0047   0.0019) (10) = 10 × 0.0047 + 5 × 0.0019 = 0.0565
                                ( 5)

Hence we have

σ_P = √(β_h C β_v) = √0.0565 = 23.77%

We find the value for α in equation (7.8) from Excel as =normsinv(0.99) = 2.3264, or from MATLAB as norminv(0.99) = 2.3264.

Following equation (7.8), we now calculate the VaR_P as 0.2377 × 2.3264 × √10 = 1.7486.⁹

Interpretation: We are 99% certain that we will not lose more than USD 1.7486 million in the next 10 days due to correlated market price changes of assets 1 and 2.

The number USD 1.7486 million is the 10-day VaR at a 99% confidence level. This means that on average once in a hundred 10-day periods (so once every 1,000 days), this VaR number of USD 1.7486 million will be exceeded. If we have roughly 250 trading days in a year, the company is expected to exceed the VaR about once every four years.
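The whole calculation in Example 7.2 can be reproduced with a minimal Python sketch of equations (7.8) to (7.10), assuming numpy and scipy are available; scipy's norm.ppf plays the role of Excel's =normsinv.

```python
import numpy as np
from scipy.stats import norm

sigma = np.array([0.02, 0.01])      # daily return volatilities of assets 1 and 2
rho = 0.7                           # correlation between the two assets
beta = np.array([10.0, 5.0])        # invested amounts in USD million

corr = np.array([[1.0, rho], [rho, 1.0]])
C = np.outer(sigma, sigma) * corr   # covariance matrix, equation (7.10)

sigma_p = np.sqrt(beta @ C @ beta)  # equation (7.9)
alpha = norm.ppf(0.99)              # 2.3264, as =normsinv(0.99) in Excel
var_10d = sigma_p * alpha * np.sqrt(10)  # equation (7.8) with x = 10 days

print(round(var_10d, 4))            # 1.7486 (USD million)
```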

Let's now analyse the impact of different correlations between asset 1 and asset 2 on VaR. Figure 7.6 shows the impact.

As expected, we observe from Figure 7.6 that the lower the correlation, the lower is the risk, measured by VaR. Preferably the correlation is negative. In this case, if one asset decreases, the other asset on average increases, hence reducing the overall risk. The impact of correlation on VaR is strong: for a perfect negative correlation of −1, VaR is USD 1.1 million; for a perfect positive correlation, VaR is close to USD 1.9 million. A spreadsheet for calculating two-asset VaRs can be found at www.dersoft.com/2assetVaR.xlsx (case-sensitive).

Figure 7.6 VaR of the two-asset portfolio of Example 7.2 with respect to correlation ρ.

"There are no toxic assets, just toxic people."

The Global Financial Crisis 2007 to 2009 and Correlation

Currently, in 2018, the global financial crisis of 2007 to 2009 seems like a distant memory. The Dow Jones Industrial Average has recovered from its low of 6,547 points in March 2009 and has almost quadrupled to over 25,000 as of October 2018. World economic growth is at a moderate 2.5%. The US unemployment rate as of October 2018 is historically low at 3.7%. However, to fight the crisis, governments engaged in huge stimulus packages to revive their faltering economies. As a result, enormous sovereign deficits are plaguing the world economy. The US debt is also far from benign, with a total gross-debt-to-GDP ratio of about 107%. One of the few nations that are enjoying these enormous debt levels is China, which is happy buying the debt and taking in the proceeds.

⁸ The attentive reader realises that we calculated the covariance differently in equation (7.3). In equation (7.3) we derived the covariance "from scratch", inputting the return values and means. In equation (7.10) we are assuming that we already know the correlation coefficient ρ and the standard deviation σ.

⁹ This calculation, including Excel matrix multiplication, can be found at www.dersoft.com/2assetVaR.xlsx.



A crisis that brought the financial and economic system worldwide to a standstill is naturally not mono-causal, but has many reasons. Here are the main ones.

(a) An extremely benign economic and risk environment from 2003 to 2006, with record low credit spreads, low volatility and low interest rates.

(b) Increasing risk-taking and speculation of traders and investors who tried to benefit in these presumably calm times. This led to a bubble in virtually every market segment, like the housing market, the mortgage market (especially the subprime mortgage market), the stock market and the commodity market. In 2007, US investors had borrowed 470% of the US national income to invest and speculate in the real-estate, financial and commodity markets.

(c) A new class of structured investment products such as CDOs, CDO squared, CPDOs (constant-proportion debt obligations) and CPPI (constant proportion portfolio insurance), as well as new products such as options on CDSs, credit indices etc.

(d) The new copula correlation model, which was trusted naively by many investors and which could presumably correlate the n(n − 1)/2 assets in a structured product. Most CDOs contained 125 assets. Hence there are 125(125 − 1)/2 = 7,750 asset correlation pairs to be quantified and managed.

(e) A moral hazard of rating agencies, who were paid by the same companies whose assets they rated. As a consequence, many structured products received AAA ratings and gave the illusion of low price and default risk.

(f) Risk managers and regulators who lowered their standards in light of the greed and profit frenzy. We recommend an excellent - anonymous - paper in The Economist: "Confessions of a Risk Manager".

The topic of this book is correlation risk, so let's concentrate on the correlation aspect of the crisis. Around 2003, two years after the Internet bubble burst, the risk appetite of the financial markets increased, and investment banks, hedge funds and private investors began to speculate and invest in the stock markets, commodities and especially in the real-estate market.

In particular, residential mortgages became an investment object. The mortgages were packaged in CDOs and then sold off to investors locally and globally. The CDOs typically consist of several tranches, ie, the investor can choose a particular degree of default risk. The equity tranche holder is exposed to the first 3% of mortgage defaults, the mezzanine tranche holder is exposed to the 3-7% of defaults and so on. The new copula correlation model, derived by Abe Sklar in 1959 and transferred to finance by David Li in 2000, could presumably manage the default correlations in the CDOs.

Figure 7.7 CDO tranche spreads (for the 0%-3%, 3%-7%, 7%-10%, 10%-15% and 15%-30% tranches) with respect to correlation between the assets in the CDO.

A first correlation-related crisis, which was a forerunner of the major one to come in 2007 to 2009, occurred in May 2005. General Motors was downgraded to BB and Ford was downgraded to BB+, so both companies were now in "junk status". A downgrade to junk typically leads to a sharp bond price decline, since many mutual funds and pension funds are not allowed to hold junk bonds.

Importantly, the correlation of the bonds in CDOs (which originally were only investment-grade bonds) decreased, since bonds of different credit qualities are typically lower-correlated. This led to huge losses of hedge funds, which had put on a strategy where they were long the equity tranche of the CDO and short the mezzanine tranche of the CDO. Figure 7.7 shows the dilemma. Hedge funds had invested in the equity tranche¹⁰ (0% to 3% in Figure 7.7) to collect the high equity tranche spread. They had then presumably hedged¹¹ the risk by going short the mezzanine tranche¹² (3% to 7% in Figure 7.7). However, as we can see from Figure 7.7, this "hedge" is flawed.

When the correlations between the assets in the CDO decreased, the hedge funds lost on both positions.

¹⁰ Investing in the equity tranche means "assuming credit risk", since a credit deterioration hurts the investor. This is similar to a bond, where the investor assumes the credit risk. Investors in the equity tranche receive the high equity tranche contract spread.

¹¹ To hedge means to protect or to reduce risk.

¹² Going short the mezzanine tranche means being "short credit", ie, benefiting from a credit deterioration. Going short the mezzanine tranche means paying the (fairly low) mezzanine tranche contract spread.

1. The equity tranche spread increased sharply (see Arrow 1). Hence the spread that the hedge fund received in the original transaction was now significantly lower than the current market spread, resulting in a paper loss.

2. In addition, the hedge funds lost on their short mezzanine tranche position, since a lower correlation lowers the mezzanine tranche spread (see Arrow 2). Hence the spread that the hedge fund paid in the original transactions was now higher than the market spread, resulting in another paper loss.

As a result of the huge losses, several hedge funds such as Marin Capital, Aman Capital and Baily Coates Cromwell filed for bankruptcy. It is important to point out that the losses resulted from a lack of understanding of the correlation properties of the tranches in the CDO. The CDOs themselves can hardly be blamed or called toxic for their correlation properties.

From 2003 to 2006 the CDO market, mainly referencing residential mortgages, had exploded and increased from USD 64 billion to USD 455 billion. To fuel the CDOs, more and more questionable subprime mortgages were given, named NINJA loans, standing for "no income, no job or assets". When housing prices started levelling off in 2006, the first mortgages started to default. In 2007 more and more mortgages defaulted, finally leading to a real-estate market collapse. With it the huge CDO market collapsed, leading to the stock market and commodity market crash and a freeze in the credit markets. The financial crisis spread to the world economies, creating a severe global recession, now called the "great recession".

In a systemic crash like this, naturally many types of correlations increase; see also Figure 7.8. From 2007 to 2009, default correlations between the mortgages in the CDOs increased. This actually helped equity tranche investors, as we can see from Figure 7.7. If default correlations between the assets in the CDO increase, the equity tranche spread decreases, leading to an increase in the value of the equity tranche. However, this increase was overcompensated by a strong increase in the default probability of the mortgages; as a consequence, tranche spreads increased sharply, resulting in a huge loss for the equity tranche investors as well as investors in the other tranches.

Correlations between the tranches of the CDOs also increased during the crisis. This had a devastating effect on the super-senior tranches. In normal times, these tranches were considered extremely safe since (a) they were AAA-rated and (b) they were protected by the lower tranches. But, with the increased tranche correlation and the generally deteriorating credit market, these super-senior tranches were suddenly considered risky and lost up to 20% of their value.

Figure 7.8 Relationship between the Dow (graph with triangles, numerical values on left axis) and the correlation between the stocks in the Dow (numerical values on right axis) before and during the systemic 2007-09 global financial crisis; one-year moving average of monthly correlations.

To make things worse, many investors had leveraged the super-senior tranches, termed LSS (leveraged super-senior tranches), to receive a higher spread. This leverage was typically 10 or 20 times, meaning an investor paid USD 10,000,000 but had a risk exposure of USD 100,000,000 or USD 200,000,000. What made things technically even worse was that these LSSs came with an option for the investors to unwind the super-senior tranche if the spread had widened (increased). So many investors started to sell the LSS at low prices, realising a loss and increasing the LSS tranche spread even further.

In addition to the overinvestment in CDOs, the CDS market had also exploded from its beginnings in the mid-1990s, from about USD 8 trillion in 2004 to almost USD 60 trillion in 2007. CDSs are typically used as insurance to protect against the default of a debtor, as we discussed in Figure 7.1. No one will argue that an insurance contract is toxic. On the contrary, it is the principle of insurance to spread the risk to a wider audience and hence reduce individual risk, as we can see from health insurance or life insurance contracts.

CDSs, though, can also be used as a speculative instrument. For example, the CDS seller (ie, the insurance seller) hopes that the insured event (eg, default or credit deterioration of the company) will not occur. In this case, the CDS seller keeps the CDS spread (ie, the insurance premium) as income, as AIG tried to do in the crisis. A CDS buyer who does not own the underlying asset speculates on the credit deterioration of the underlying asset, just like a naked put option holder speculates on the decline of the underlying asset.

So who can we blame for the 2007-09 global financial crisis?



The quants, who created the new products such as CDSs and CDOs and the models to value them? The upper management and the traders, who authorised and conducted the overinvesting and extreme risk-taking? The rating agencies, who gave an AAA rating to many CDOs? The regulators, who approved the overinvestments? The risk managers, who allowed the excessive risk-taking?

The answer is: all of them. The whole global financial crisis can be summed up in one word: greed! It was the upper management, the traders and investors who engaged in excessive trading and irresponsible risk-taking to receive high returns, huge salaries and generous bonuses. And most risk managers and regulators turned a blind eye.

For example, the London unit of the insurance company AIG had sold close to USD 500 billion in CDSs without much reinsurance! Their main hedging strategy seemed to have been: pray that the insured contracts don't deteriorate. The investment banks of Iceland, a small country in Northern Europe, had borrowed 10 times Iceland's national GDP and invested it. With this leverage, Iceland naturally went de facto into bankruptcy in 2008, when the credit markets deteriorated. Lehman Brothers, before filing for bankruptcy in September 2008, reported a leverage of 30.7, ie, USD 691 billion in assets and only USD 22 billion in stockholders' equity. The true leverage was even higher, since Lehman tried to hide their leverage with materially misleading repo transactions.¹³ In addition, Lehman had 1.5 million derivatives transactions with 8,000 different counterparties on their books.

Did the upper management and traders of hedge funds and investment banks admit to their irresponsible leverage, excessive trading and risk-taking? No. Instead they created the myth of the "toxic asset", which is absurd. It is like a murderer saying: "I did not shoot that person - it was my gun!" Toxic are not the financial products, but humans and their greed.

Most traders were well aware of the risks that they were taking. In the few cases where traders did not understand the risks, the asset itself cannot be blamed; rather, the incompetence of the trader is the reason for the loss. While it is ethically disappointing that the investors and traders did not admit to their wrongdoing, at the same time it is understandable. If they admitted to irresponsible trading and risk-taking, they would immediately be prosecuted.

Naturally, risk managers and regulators have to take part of the blame for allowing the irresponsible risk-taking. The moral hazard of the rating agencies, being paid by the same companies whose assets they rate, also needs to be addressed.

Regulation and Correlation

Correlations are critical inputs in regulatory frameworks such as the Basel accords, especially in regulations for market risk and credit risk. We will discuss the correlation approaches of the Basel accords in this book. First, let's clarify.

What are Basel I, II and III?

Basel I, implemented in 1988, Basel II, implemented in 2006, and Basel III, which is currently being developed and implemented until 2019, are regulatory guidelines to ensure the stability of the banking system.

The term Basel comes from the beautiful city of Basel in Switzerland, where the honourable regulators meet. None of the Basel accords has legal authority. However, most countries (about 100 for Basel II) have created legislation to enforce the Basel accords for their banks.

Why Basel I, II and III?

The objective of the Basel accords is to provide incentives for banks to enhance their risk measurement and management systems and to contribute to a higher level of safety and soundness in the banking system. In particular, Basel III addresses the deficiencies of the banking system during the financial crisis of 2007 to 2009. Basel III introduces many new ratios to ensure liquidity and adequate leverage of banks. In addition, new correlation models are implemented that deal with double defaults in insured risk transactions, as displayed in Figure 7.1. Correlated defaults in a multi-asset portfolio quantified with the Gaussian copula, correlations in derivatives transactions termed credit value adjustment (CVA) and correlations in what is called "wrong-way risk" (WWR) have been proposed.

7.6 HOW DOES CORRELATION RISK FIT INTO THE BROADER PICTURE OF RISKS IN FINANCE?

As already mentioned, we differentiate three main types of risks in finance: market risk, credit risk and operational risk. Additional types of risk may include systemic risk, concentration risk, liquidity risk, volatility risk, legal risk, reputational risk and more. Correlation risk plays an important part in market risk and credit risk and is closely related to systemic risk and concentration risk. Let's discuss it.

¹³ Repo stands for repurchase transaction. It can be viewed as a short-term collateralised loan.

Correlation Risk and Market Risk

Correlation risk is an integral part of market risk. Market risk comprises equity risk, interest-rate risk, currency risk and commodity risk. Market risk is typically measured with the VaR concept. Since VaR has a covariance matrix of the assets in the portfolio as an input, VaR implicitly incorporates correlation risk, ie, the risk that the correlations in the covariance matrix change. We have already studied the impact of different correlations on VaR in "Risk management and correlation" above.

Market risk is also quantified with expected shortfall (ES), also termed "conditional VaR" or "tail risk". Expected shortfall measures market risk for extreme events, typically for the worst 0.1%, 1% or 5% of possible future scenarios. A rigorous valuation of expected shortfall naturally includes the correlation between the asset returns in the portfolio, as VaR does.¹⁴

Correlation Risk and Credit Risk

Correlation risk is also a critical part of credit risk. Credit risk comprises (a) migration risk and (b) default risk. Migration risk is the risk that the credit quality of a debtor decreases, ie, migrates to a lower credit state. A lower credit state typically results in a lower asset price, so a paper loss for the creditor occurs. We already studied the effect of correlation risk for an investor who has hedged their bond exposure with a CDS, earlier in the section titled "What is financial correlation risk?". We derived that the investor is exposed to changes in the correlation between the reference asset and the counterparty, ie, the CDS seller. The higher the default correlation, the higher is the CDS paper loss for the investor and, importantly, the higher is the probability of a total loss of their investment.

The degree to which defaults occur together (ie, default correlation) is critical for financial lenders such as commercial banks, credit unions, mortgage lenders and trusts, which give many types of loans to companies and individuals. Default correlations are also critical for insurance companies, which are exposed to the credit risk of numerous debtors. Naturally, a low default correlation of debtors is desired to diversify the credit risk. Table 7.3 shows the default correlation from 1981 to 2001 of 6,907 companies, of which 674 defaulted.

The default correlations in Table 7.3 are one-year default correlations averaged over the time period 1981 to 2001. For example, the number 3.8% in the upper left corner means that, if a certain bond in the auto industry defaulted, there is a 3.8% probability that another bond in the auto industry will default. The number −2.5% in the column named "Fin" in the fourth row means that, if a bond in the energy sector defaulted, this actually decreases the probability that a bond in the financial sector defaults, by 2.5%, and vice versa.

From Table 7.3 we also observe that default correlations between industries are mostly positive, with the exception of the energy sector. This sector is typically viewed as a recession-resistant, stable sector with no or low correlation to other sectors. We also observe that the default correlation within sectors is higher than between sectors. This suggests that systematic factors (such as a recession, or a structural weakness such as the general decline of a sector) impact defaults more than idiosyncratic factors. Hence if General Motors defaults, it is more likely that Ford defaults, rather than Ford benefiting from the default of its rival GM.

Since the intra-sector default correlations are higher than inter-sector default correlations, a lender is advised to have a sector-diversified loan portfolio to reduce default correlation risk.

Defaults are binomial events: either default or no default. Therefore, to model defaults, often a simple binomial model is applied. However, we can also analyse defaults in more detail and look at the term structure of defaults. Let's assume a creditor has given loans to two debtors. One debtor is A-rated and one is CC-rated. A historical default term structure of these bonds is displayed in Table 7.4.

To clarify, the number 0.15% in the column corresponding to the fifth year and the second row means that an A-rated bond has a 0.15% probability of defaulting in year 5. For most investment-grade bonds, the term structure of default probabilities increases in time, as we see from Table 7.4 for the A-rated bond. This is because the longer the time horizon, the higher the probability of adverse internal events, such as mismanagement, or external events, such as increased competition or a recession. For bonds in distress, however, the default term structure is typically inverse, as seen for the CC-rated bond in Table 7.4. This is because for a distressed company the immediate future is critical. If the company survives the coming problematic years, the probability of default decreases.

For a creditor, the default correlation of his debtors is critical. As mentioned, a creditor will benefit from a low default correlation of their debtors, which spreads the default correlation risk. We can correlate the default term structures in Table 7.4 with the famous (now infamous) copula model, which will be discussed in Chapter 7. This will allow us to answer questions such as "What is the joint probability of Debtor 1 defaulting in Year 3 and Debtor 2 defaulting in Year 5?"

¹⁴ See the original ES paper by Artzner (1997), an educational paper by Yamai and Yoshia (2002), as well as Acerbi and Tasche (2001), and McNeil et al (2005).



Table 7.3 Default Correlation of 674 Defaulted Companies by Industry

        Auto    Cons    Ener    Fin     Build   Chem    HiTec   Insur   Leis    Tele    Trans   Util
Auto    3.8%    1.3%    1.2%    0.4%    1.1%    1.6%    2.8%    -0.5%   1.0%    3.9%    1.3%    0.5%
Cons    1.3%    2.8%    -1.4%   1.2%    2.8%    1.6%    1.8%    1.1%    1.3%    3.2%    2.7%    1.9%
Ener    1.2%    -1.4%   6.4%    -2.5%   -0.5%   0.4%    -0.1%   -1.6%   -1.0%   -1.4%   -0.1%   0.7%
Fin     0.4%    1.2%    -2.5%   5.2%    2.6%    0.1%    0.4%    3.0%    1.6%    3.7%    1.5%    4.5%
Build   1.1%    2.8%    -0.5%   2.6%    6.1%    1.2%    2.3%    1.8%    2.3%    6.5%    4.2%    1.3%
Chem    1.6%    1.6%    0.4%    0.1%    1.2%    3.2%    1.4%    -1.1%   1.1%    2.8%    1.1%    1.0%
HiTec   2.8%    1.8%    -0.1%   0.4%    2.3%    1.4%    3.3%    0.0%    1.4%    4.7%    1.9%    1.0%
Insur   -0.5%   1.1%    -1.6%   3.0%    1.8%    -1.1%   0.0%    5.6%    1.2%    -2.6%   2.3%    1.4%
Leis    1.0%    1.3%    -1.0%   1.6%    2.3%    1.1%    1.4%    1.2%    2.3%    4.0%    2.3%    0.6%
Tele    3.9%    3.2%    -1.4%   3.7%    6.5%    2.8%    4.7%    -2.6%   4.0%    10.7%   3.2%    0.8%
Trans   1.3%    2.7%    -0.1%   1.5%    4.2%    1.1%    1.9%    2.3%    2.3%    3.2%    4.3%    0.2%
Util    0.5%    1.9%    0.7%    4.5%    1.3%    1.0%    1.0%    1.4%    0.6%    -0.8%   -0.2%   9.4%

Note: One-year US default correlations, non-investment-grade bonds, 1981-2001. Correlations above 5% are in bold in the source table.

Table 7.4 Term Structure of Default Probabilities for an A-rated Bond and a CC-rated Bond in 2002

Year    1        2        3        4       5       6       7       8       9       10
A       0.02%    0.07%    0.13%    0.14%   0.15%   0.17%   0.18%   0.21%   0.24%   0.25%
CC      23.83%   13.29%   10.31%   7.62%   5.04%   5.13%   4.04%   4.62%   2.62%   2.04%

Source: Moody's.
"Correlations always increase in stressed markets"
John Hull

7.7 CORRELATION RISK AND SYSTEMIC RISK

So far, we have analysed correlation risk with respect to market risk and credit risk and have concluded that correlations are a critical input when quantifying market risk and credit risk. Correlations are also closely related to systemic risk, which we define as the risk that a financial market or an entire financial system collapses.

An example of systemic risk is the collapse of the entire credit market in 2008. At the height of the crisis in September 2008, when Lehman Brothers filed for bankruptcy, the credit markets were virtually frozen with essentially no lending activities. Even as the Federal Reserve guaranteed interbank loans, lending resumed only very gradually and slowly.

The stock market crash starting in October 2007, with the Dow (Dow Jones Industrial Average) at 14,093 points and then falling by 53.54% to 6,547 points by March 2009, is also a systemic market collapse. All but one of the Dow 30 stocks had declined. Walmart was the lone stock that was up during the crisis. Of the S&P 500 stocks, 489 declined during this timeframe. The 11 stocks that were up were:

• Apollo Group (APOL), educational sector; provides educational programmes for working adults and is the parent company of the University of Phoenix;
• AutoZone (AZO), auto industry; provides auto replacement parts;
• CF Industries (CF), agricultural industry; provides fertiliser;
• DeVry Inc. (DV), educational sector; holding company of several universities;
• Edwards Lifesciences (EW), pharmaceutical industry; provides products to treat cardiovascular diseases;
• Family Dollar (FDO), consumer staples;
• Gilead Sciences (GILD), pharmaceutical industry; provides HIV and hepatitis medication;
• Netflix (NFLX), entertainment industry; provides an Internet subscription service;
• Ross Stores (ROST), consumer staples;
• Southwestern Energy (SWN), energy sector; and
• Walmart (WMT), consumer staples.

From this list we can see that the consumer staples sector (which provides basic necessities such as food and basic household items) fared well during the crisis. The educational sector also typically thrives in a crisis, since many unemployed seek to further their education.

Importantly, systemic financial failures such as the one from 2007 to 2009 typically spread to the economy with a decreasing GDP, increasing unemployment and, therefore, a decrease in the standard of living.

Systemic risk and correlation risk are highly dependent. Since a systemic decline in stocks involves almost the entire stock market, correlations between the stocks increase sharply. Figure 7.8 shows the relationship between the percentage change of the Dow and the correlation between the stocks in the Dow before the crisis, from May 2004 to October 2007, and during the crisis, from October 2007 to March 2009.

For Figure 7.8 we downloaded daily closing prices of all 30 stocks in the Dow and put them into monthly bins. We then derived monthly 30 × 30 correlation matrices using the Pearson correlation measure and averaged the matrices. We then smoothed the graph by taking the one-year moving average.

From Figure 7.8 we can observe a somewhat stable correlation from 2004 to 2006, when the Dow increased moderately. In the time period from January 2007 to February 2008 we observe that the correlation in the Dow increases when the Dow increases more strongly. Importantly, in the time of the severe decline of the Dow from August 2008 to March 2009 we observe a sharp increase in the correlation, from non-crisis levels of an average 27% to over 50%. In Chapter 8, we will observe empirical correlations in detail and we will find that, at the height of the crisis in February 2009, the correlation of the stocks in the Dow reached a high of 96.97%. Hence, portfolios that were considered well diversified in benign times experienced a sharp increase in correlation and hence unexpected losses due to the combined, highly correlated decline of many stocks during the crisis.
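The averaging and smoothing procedure just described can be written compactly in code. The following is a minimal pandas sketch, assuming a hypothetical DataFrame `prices` of daily closing prices (one column per Dow stock, indexed by trading date); it illustrates the method, not the author's original calculation.

```python
import numpy as np
import pandas as pd

def monthly_average_correlation(prices: pd.DataFrame) -> pd.Series:
    """Average pairwise Pearson correlation of daily returns per monthly
    bin, smoothed with a one-year moving average (the Figure 7.8 method).
    `prices` is assumed to have a DatetimeIndex."""
    returns = prices.pct_change().dropna()
    avg = {}
    for month, r in returns.groupby(pd.Grouper(freq="M")):
        c = r.corr().to_numpy()                              # eg, 30 x 30 for the Dow
        avg[month] = c[np.triu_indices_from(c, k=1)].mean()  # mean of off-diagonals
    return pd.Series(avg).rolling(12).mean()                 # one-year moving average
```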
7.8 CORRELATION RISK AND CONCENTRATION RISK

Concentration risk is a fairly new risk category and therefore not yet uniquely defined. A sensible definition is the risk of financial loss due to a concentrated exposure to a specific group of counterparties.

Concentration risk can be quantified with the concentration ratio. For example, if a creditor has 10 loans of equal size, the concentration ratio would be 1/10 = 0.1.



If a creditor has only one loan to one counterparty, the concentration ratio would be 1. Naturally, the lower the concentration ratio, the more diversified is the default risk of the creditor, assuming the default correlation between the counterparties is smaller than 1.

We can also categorise counterparties into groups - for example, sectors. We can then analyse sector concentration risk. The higher the number of different sectors a creditor has lent to, the higher is their sector diversification. High sector diversification reduces default risk, since intra-sector defaults are higher correlated than counterparties in different sectors, as seen in Table 7.3.

Naturally, concentration and correlation risk are closely related. Let's verify this in an example.

Example 7.3

Case (a) The commercial bank C has lent USD 10,000,000 to a single company W. So C's concentration ratio is 1. Company W has a default probability P_W of 10%. Hence the expected loss (EL) for bank C is USD 10,000,000 × 0.1 = USD 1,000,000. Graphically, we have Figure 7.9.

Figure 7.9 Probability space for the default probability of a single loan to W.

Case (b) The commercial bank C has lent USD 5,000,000 to company X and USD 5,000,000 to company Y. Both X and Y have a 10% default probability. So C's concentration ratio is reduced to 1/2.

If the default correlation between X and Y is bigger than 0 and smaller than 1, we derive that the worst-case scenario, ie, the default of X and Y, P(X ∩ Y), with a loss of USD 1,000,000, is reduced, as seen in Figure 7.10.

Figure 7.10 Probability space for loans to companies X and Y.

The exact joint default probability depends on the correlation model and correlation parameter values, which will be discussed in Chapters 4 to 8. For any model, though, if the default correlation between X and Y is 1, then there is no benefit from the lower concentration ratio. The probability space would be as in Figure 7.9.

Case (c) If we further decrease the concentration ratio, the worst-case scenario, ie, the expected loss of 10%, decreases further. Let's assume the lender C gives loans to three companies X, Y and Z, of USD 3.33 million each. The default probability of X, Y and Z is 10% each. Therefore, the concentration ratio decreases to a third. The probabilities are displayed in Figure 7.11.

Figure 7.11 Probability space for loans to companies X, Y and Z.

Hence, from Figures 7.9 to 7.11 we observe the benefits of a lower concentration ratio. The worst-case scenario, an expected loss of USD 1,000,000, reduces with a decreasing concentration ratio.

A decreasing concentration ratio is closely related to a decreasing correlation coefficient. Let's show this. The defaults of companies X and Y are expressed as two binomial variables that take the value 1 if in default, and 0 otherwise. Equation (7.11) gives the joint probability of default for the two binomial events:

P(X ∩ Y) = ρ_XY √(P_X(1 − P_X)) × √(P_Y(1 − P_Y)) + P_X P_Y   (7.11)

where ρ_XY is the correlation coefficient and

√(P_X(1 − P_X))   (7.12)

is the standard deviation of the binomially distributed variable X.

Let's assume again that the lender C has given loans to X and Y of USD 5,000,000 each. Both X and Y have a default probability of 10%. Following equation (7.12), this means that the standard deviation for X and Y is √(0.1 × (1 − 0.1)) = 0.3.

Let's first look at the case where the default correlation is ρ_XY = 1. This means that X and Y cannot default individually. They can only default together or survive together. The probability that they default together is 10%. Hence the expected loss is the same as in case (a): EL = (USD 5,000,000 + USD 5,000,000) × 0.1 = USD 1,000,000. We can verify this with equation (7.11) for the joint probability of two binomial events:

P(X ∩ Y) = 1 × √(0.1(1 − 0.1)) × √(0.1(1 − 0.1)) + 0.1 × 0.1 = 10%

The probability space is graphically the same as Figure 7.9, with P_X = P_Y = 10% as the probability event.

If we now decrease the correlation coefficient, we can see from equation (7.11) that the worst-case scenario, the joint default probability of X and Y, P(X ∩ Y), will decrease. For example, ρ_XY = 0.5 results in P(X ∩ Y) = 5.5% and ρ_XY = 0 results in P(X ∩ Y) = 1%. Interestingly, even a slightly negative correlation coefficient can result in a positive joint default probability, if the standard deviation of the binomial events is fairly low and the default probabilities are high. In our example, the standard deviation of both entities is 30% and the default probability of both entities is 10%. Together with a negative correlation coefficient of −0.1, equation (7.11) leads to a joint default probability of 0.1%.
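Equation (7.11) is easy to check numerically. The following minimal Python sketch (the function name is ours) reproduces the joint default probabilities quoted above:

```python
import math

def joint_default_prob(p_x: float, p_y: float, rho: float) -> float:
    """Joint default probability of two binomial default events,
    equation (7.11): rho * sd(X) * sd(Y) + P(X) * P(Y)."""
    sd_x = math.sqrt(p_x * (1.0 - p_x))   # equation (7.12)
    sd_y = math.sqrt(p_y * (1.0 - p_y))
    return rho * sd_x * sd_y + p_x * p_y

for rho in (1.0, 0.5, 0.0, -0.1):
    print(rho, joint_default_prob(0.10, 0.10, rho))
# rho =  1.0 -> 0.100 (10%, the perfect-correlation case)
# rho =  0.5 -> 0.055 (5.5%)
# rho =  0.0 -> 0.010 (1%)
# rho = -0.1 -> 0.001 (0.1%)
```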
In conclusion, we have shown the beneficial aspect of a lower concentration ratio, which is closely related to a lower correlation coefficient. In particular, both a lower concentration ratio and a lower correlation coefficient reduce the worst-case scenario for a creditor, the joint probability of default of his debtors.

We will verify this result and find that a higher (copula) correlation between assets results in a higher credit value-at-risk (CVaR). CVaR measures the maximum loss of a portfolio of correlated debt with a certain probability for a certain timeframe. Hence CVaR measures correlated default risk and is analogous to the VaR concept for correlated market risk, which we discussed earlier.

7.9 A WORD ON TERMINOLOGY

As mentioned in the section "Trading and correlation" above, we find the terms "correlation desks" and "correlation trading" in trading practice. Correlation trading means that traders trade assets or execute trading strategies whose value is at least in part determined by the co-movement of two or more assets in time. We already mentioned the strategy "pairs trading", the exchange option and the quanto option as examples of correlation trading. In trading practice, the term "correlation" is typically applied quite broadly, referring to any co-movement of asset prices in time.

However, in financial theory, especially in recent publications, the term "correlation" is often defined more narrowly, referring only to the linear Pearson correlation model, as in Cherubini et al (2004), Nelsen (2006) and Gregory (2010). These authors refer to coefficients other than Pearson correlation coefficients as dependence measures or measures of association. However, in financial theory the term "correlation" is also often applied to describe dependencies generally, as in the terms "credit correlation", "default correlation" and "volatility-asset return correlation", which are quantified by non-Pearson models as in Heston (1993), Lucas (1995) and Li (2000).

In this book, we will refer to the Pearson coefficient as the correlation coefficient and to coefficients derived by non-Pearson models as dependency coefficients. In accordance with most literature, we will refer to all methodologies that measure some form of dependency as correlation models or dependency models.

SUMMARY

There are two types of financial correlations: (1) static correlations, which measure how two or more financial assets are associated within a certain time period, for example a year; and (2) dynamic financial correlations, which measure how two or more financial assets move together in time.

Correlation risk can be defined as the risk of financial loss due to adverse movements in correlation between two or more variables. These variables can be financial variables, such as correlated defaults between two debtors, or nonfinancial, such as the correlation between political tensions and an exchange rate. Correlation risk can be non-monotonic, meaning that the dependent variable, for example the CDS spread, can increase or decrease when the correlation parameter value increases.

Correlations and correlation risk are critical in many areas in finance such as investments, trading and especially risk management, where different correlations result in very different degrees of risk. Correlations also play a key role in a systemic crisis, where correlations typically increase and can lead to high unexpected losses. As a result, the Basel III accord has introduced several correlation concepts and measures to reduce correlation risk.

Correlation risk can be categorised as its own type of risk.



However, correlation parameters and correlation matrices are critical inputs and hence a part of market risk and credit risk. Market risk and credit risk are highly sensitive to changing correlations. Correlation risk is also closely related to concentration risk, as well as systemic risk, since correlations typically increase in a systemic crisis.

The term "correlation" is not uniquely defined. In trading practice, "correlation" is applied quite broadly and refers to the co-movements of assets in time, which may be measured by different correlation concepts. In financial theory, the term "correlation" is often defined more narrowly, referring only to the linear Pearson correlation coefficient. Non-Pearson correlation measures are termed "dependence measures" or "measures of association".

APPENDIX A1

Dependence and Correlation

Dependence

In statistics, two events are considered dependent if the occurrence of one affects the probability of another. Conversely, two events are considered independent if the occurrence of one does not affect the probability of another. Formally, two events A and B are independent if and only if the joint probability equals the product of the individual probabilities:

P(A ∩ B) = P(A) P(B)   (A1)

Solving equation (A1) for P(A), we get P(A) = P(A ∩ B)/P(B). Following the Kolmogorov definition P(A ∩ B)/P(B) = P(A|B), we derive

P(A) = P(A ∩ B)/P(B) = P(A|B)   (A2)

where P(A|B) is the conditional probability of A with respect to B. P(A|B) reads "probability of A given B". In equation (A2) the probability of A, P(A), is not affected by B, since P(A) = P(A|B); hence the event A is independent from B.

From equation (A2), we also derive

P(B) = P(A ∩ B)/P(A) = P(B|A)   (A3)

Hence from equation (A1) it follows that A is independent from B and B is independent from A.

Example A1: Statistical Independence

The historical default probability of company A is P(A) = 3%, the historical default probability of company B is P(B) = 4%, and the historical joint probability of default is 3% × 4% = 0.12%. In this case P(A) and P(B) are independent. This is because, from equation (A2), we have

P(A|B) = P(A ∩ B)/P(B) = (3% × 4%)/4% = 3% = P(A)

Since P(A) = P(A|B), the event A is independent from the event B. Using equation (A3), we can do the same exercise for event B, which is independent from event A.

Correlation

As mentioned in the section on terminology above, the term "correlation" is not uniquely defined. In trading practice, the term "correlation" is used quite broadly, referring to any co-movement of asset prices in time. In statistics, correlation is typically defined more narrowly and typically refers to the linear dependency derived in the Pearson correlation model. Let's look at the Pearson covariance and relate it to the dependence discussed above.

A covariance measures how strong the linear relationship between two variables is. These variables can be deterministic (which means their outcome is known), as the historical default probabilities in example A1 above. For random variables (variables with an unknown outcome, such as flipping a coin), the Pearson covariance is derived with expectation values:

COV(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)   (A4)

where E(X) and E(Y) are the expected values of X and Y respectively, also known as the means. E(XY) is the expected value of the product of the random variables X and Y. The covariance in equation (A4) is not easy to interpret. Therefore, often a normalised covariance, the correlation coefficient, is used. The Pearson correlation coefficient ρ(X,Y) is defined as

ρ(X,Y) = COV(X,Y)/(σ(X)σ(Y))   (A5)

where σ(X) and σ(Y) are the standard deviations of X and Y respectively. While the covariance takes values between −∞ and +∞, the correlation coefficient conveniently takes values between −1 and +1.

Independence and Uncorrelatedness

From equation (A1) above we find that the condition for independence for two random variables is E(XY) = E(X)E(Y).

Independence and Uncorrelatedness

From equation (A1) above we find that the condition for independence for two random variables is E(XY) = E(X)E(Y). From equation (A4) we see that E(XY) = E(X)E(Y) is equal to a covariance of zero. Therefore, if two variables are independent, their covariance is zero.

Is the reverse also true? Does a zero covariance mean independence? The answer is no. Two variables can have a zero covariance even when they are dependent! Let's show this with an example. For the parabola Y = X², Y is clearly dependent on X, since Y changes when X changes. However, the correlation of the function Y = X² derived by equations (A4) or (A5) is zero! This can be shown numerically and algebraically. For a numerical derivation, see the simple spreadsheet www.dersoft.com/dependenceandcorrelation.xlsm, sheet 1. Algebraically, we have from equation (A4):

COV(X, Y) = E(XY) − E(X)E(Y)

Inputting Y = X², we derive

= E(X X²) − E(X)E(X²)
= E(X³) − E(X)E(X²)

Let X be a uniform variable bounded in [−1, +1]. Then the means E(X) and E(X³) are zero and we have

= 0 − 0 × E(X²)
= 0

For a numerical example, see www.dersoft.com/dependenceandcorrelation.xlsm, sheet 2.

In conclusion, the Pearson covariance or correlation coefficient can give values of zero, ie, tell us the variables are uncorrelated, even if the variables are dependent! This is because the Pearson correlation concept measures only linear dependence. It fails to capture nonlinear relationships. This shows the limitation of the Pearson correlation concept for finance, since most financial relationships are nonlinear. See Chapter 3 for a more detailed discussion on the Pearson correlation model.
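The zero-covariance result for Y = X² can also be checked in a few lines of Python. The following sketch is ours, not from the text (the spreadsheet referenced above performs the same check in Excel); it samples a uniform X on [−1, +1] and applies equations (A4) and (A5):

import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2  # Y is perfectly dependent on X

# Sample version of equation (A4): COV(X, Y) = E(XY) - E(X)E(Y)
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
# Equation (A5): normalise by the standard deviations
corr_xy = cov_xy / (np.std(x) * np.std(y))

print(f"COV(X, Y) = {cov_xy:.5f}")   # close to 0
print(f"rho(X, Y) = {corr_xy:.5f}")  # close to 0, despite full dependence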
APPENDIX A2

On Percentage and Logarithmic Changes

In finance, growth rates are expressed as relative changes, (St − St−1)/St−1, where St and St−1 are the prices of an asset at times t and t − 1, respectively. For example, if St = 110 and St−1 = 100, the relative change is (110 − 100)/100 = 0.1 = 10%.

We often approximate relative changes with the help of the natural logarithm:

(St − St−1)/St−1 ≈ ln(St/St−1)   (A6)

This is a good approximation for small differences between St and St−1. ln(St/St−1) is called a log-return. The advantage of using log-returns is that they can be added over time. Relative changes are not additive over time. Let's show this in two examples.

Example 1

A stock price at t0 is USD 100. From t0 to t1, the stock increases by 10%. Hence the stock increases to USD 110. From t1 to t2, the stock increases again by 10%. So the stock price increases to USD 110 × 1.1 = USD 121. This increase of 21% is higher than adding the percentage increases of 10% + 10% = 20%. Hence percentage changes are not additive over time.

Let's look at the log-returns. The log-return from t0 to t1 is ln(110/100) = 9.531%. From t1 to t2 the log-return is ln(121/110) = 9.531%. When adding these returns, we get 9.531% + 9.531% = 19.062%. This is the same as the log-return from t0 to t2, ie, ln(121/100) = 19.062%. Hence log-returns are additive in time.¹⁵

Let's now look at another, more extreme example.

Example 2

A stock price in t0 is USD 100. It moves to USD 200 in t1 and back to USD 100 in t2. The percentage change from t0 to t1 is (USD 200 − USD 100)/USD 100 = 100%. The percentage change from t1 to t2 is (USD 100 − USD 200)/(USD 200) = −50%. Adding the percentage changes, we derive +100% − 50% = +50%, although the stock has not increased from t0 to t2! Naturally this type of performance measure is incorrect and not allowed in accounting.

Log-returns give the correct answer: the log-return from t0 to t1 is ln(200/100) = 69.31%. The log-return from t1 to t2 is ln(100/200) = −69.31%. Adding these log-returns in time, we get the correct return of the stock price from t0 to t2 of 69.31% − 69.31% = 0%.

These examples are displayed in a simple spreadsheet at www.dersoft.com/logreturns.xlsx.

¹⁵ We could have also solved for the absolute value 121, which matches a logarithmic growth rate of 9.531%: ln(x/110) = 9.531%, or ln(x) − ln(110) = 9.531%, or ln(x) = ln(110) + 9.531%. Taking the power of e, we get e^ln(x) = x = e^(ln(110) + 0.09531) = 121.
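Both examples can be verified in a few lines of Python (our sketch; the text's spreadsheet at www.dersoft.com/logreturns.xlsx shows the same computation in Excel):

import math

prices = [100.0, 200.0, 100.0]  # S at t0, t1, t2, as in Example 2

pct_returns = [(prices[i] - prices[i - 1]) / prices[i - 1] for i in (1, 2)]
log_returns = [math.log(prices[i] / prices[i - 1]) for i in (1, 2)]

print(sum(pct_returns))  # +0.5, although the price is unchanged from t0 to t2
print(sum(log_returns))  # 0.0, the correct two-period return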



The following questions are intended to help candidates understand the material. They are not actual FRM exam questions.

QUESTIONS
7.1 What two types of financial correlations exist?

7.2 What is "wrong-way correlation risk" or, for short, "wrong-way risk"?

7.3 Correlations can be non-monotonous. What does this mean?

7.4 Correlations are critical in many areas in finance. Name five.

7.5 High diversification is related with low correlation. Why is this considered one of the few "free lunches" in finance?

7.6 Create a numerical example and show why a lower correlation results in a higher return/risk ratio.

7.7 What is "correlation trading"?

7.8 What is "pairs trading"?

7.9 Name three correlation options in which a lower correlation results in a higher option price.

7.10 Name one correlation option where a lower correlation results in a lower option price.

7.11 Create a numerical example of a two-asset portfolio and show that a lower correlation coefficient leads to a lower VaR number.

7.12 Why do correlations typically increase in a systemic market crash?

7.13 In 2005, a correlation crisis with respect to CDOs occurred that led to the default of several hedge funds. What happened?

7.14 In the global financial crisis 2007-09, many investors in the presumably safe super-senior tranches got hurt. What exactly happened?

7.15 What is the main objective of the Basel III accord?

7.16 The Basel accords have no legal authority. So why do most developed countries implement them?

7.17 How is correlation risk related to market risk and credit risk?

7.18 How is correlation risk related to systemic risk and concentration risk?

7.19 How can we measure the joint probability of occurrence of a binomial event such as default or no-default?

7.20 Can it be that two binomial events are negatively correlated but they have a positive probability of joint default?

7.21 What is value-at-risk (VaR) and credit value-at-risk (CVaR)? How are they related?

7.22 Correlation risk is quite broadly defined in trading practice, referring to any co-movement of assets in time. How is the term "correlation" defined in statistics?

7.23 What do the terms "measure of association" and "measure of dependence" refer to in statistics?

Empirical Properties of Correlation: How Do Correlations Behave in the Real World?
Learning Objectives

After completing this reading you should be able to:

• Describe how equity correlations and correlation volatilities behave throughout various economic states.
• Calculate a mean reversion rate using standard regression and calculate the corresponding autocorrelation.
• Identify the best-fit distribution for equity, bond, and default correlations.

Excerpt is Chapter 2 of Correlation Risk Modeling and Management, 2nd Edition, by Gunter Meissner.

"Anything that relies on correlation, is charlatanism."

— Nassim Taleb

In this chapter we show that, contrary to common beliefs, financial correlations display statistically significant and expected properties.

8.1 HOW DO EQUITY CORRELATIONS BEHAVE IN A RECESSION, NORMAL ECONOMIC PERIOD OR STRONG EXPANSION?

In our study, we observed daily closing prices of the 30 stocks in the Dow Jones Industrial Average (Dow) from January 1972 to July 2017. This resulted in 11,214 daily observations of the Dow stocks and hence 11,214 × 30 = 336,420 closing prices. We built monthly bins and derived 900 correlation values (30 × 30) for each month, applying the Pearson correlation approach. Since we had 534 months in the study, altogether we derived 534 × 900 = 480,600 correlation values.

The composition of the Dow is changing in time, with successful stocks being input into the Dow and unsuccessful stocks being removed. Our study comprises the Dow stocks that represent the Dow at each particular point in time.

Figure 8.1 shows the 534 monthly averaged correlation levels: we created monthly 30 by 30 bins of the Dow stock returns from 1972 to 2017, derived the Pearson correlation between the Dow stock returns, eliminated the unit correlation on the diagonal and averaged the remaining correlation values. We then differentiated three states: an expansionary period with GDP (gross domestic product) growth rates of 3.5% or higher, a normal economic period with growth rates between 0% and 3.49%, and a recession with two consecutive quarters of negative growth rates.

Figure 8.2 shows the volatility of the averaged monthly correlations. For the calculation of volatility, see Chapter 7.

From Figures 8.1 and 8.2, we observe the somewhat erratic behaviour of Dow correlation levels and volatility. However, Table 8.1 reveals some expected results.


Figure 8.1 Average correlation of monthly 30 × 30 Dow stock return bins. The light grey background displays an expansionary economic period, the medium grey background a normal economic period and the white background represents a recession. The horizontal line shows the polynomial trendline of order 4.



Figure 8.2 Correlation volatility of the average correlation of monthly 30 × 30 Dow stock return bins with respect to the state of the economy. The horizontal line shows the polynomial trendline of order 4.

Table 8.1 Correlation Level and Correlation Volatility with Respect to the State of the Economy

State of the Economy        Correlation Level    Correlation Volatility
Expansionary period         27.46%               71.17%
Normal economic period      33.06%               83.06%
Recession                   36.96%               80.48%

From Table 8.1, we observe that correlation levels are lowest in strong economic growth times. The reason may be that in strong growth periods equity prices react primarily to idiosyncratic, not to macroeconomic, factors. In recessions, correlation levels are typically high, as shown in Table 8.1. In addition, we had already displayed in Chapter 7, Figure 7.8, that correlation levels increased sharply in the great recession from 2007 to 2009. In a recession, macroeconomic factors seem to dominate idiosyncratic factors, leading to a downturn across multiple stocks.

A further expected result in Table 8.1 is that correlation volatility is lowest in an economic expansion and higher in worse economic states. We did expect a higher correlation volatility in a recession compared with a normal economic state. However, it seems that high correlation levels in a recession remain high without much additional volatility. We will analyse whether the correlation volatility is an indicator for future recessions below. Altogether, Table 8.1 displays the higher correlation risk in bad economic times, which traders and risk managers should consider in their trading and risk management.

From Table 8.1, we observe a generally positive relationship between correlation level and correlation volatility. This is verified in more detail in Figure 8.3.

Figure 8.3 Positive relationship between correlation level and correlation volatility with a polynomial trendline of order 2 (data from 1972 to 2017).

8.2 DO EQUITY CORRELATIONS EXHIBIT MEAN REVERSION?

Mean reversion is the tendency of a variable to be pulled back to its long-term mean. In finance, many variables, such as bonds, interest rates, volatilities, credit spreads and more, are assumed to exhibit mean reversion. Fixed coupon bonds, which do not default, exhibit strong mean reversion: a bond is typically issued at par, for example at USD 100. If the bond does not default, at maturity it will revert to exactly that price of USD 100, which is typically close to its long-term mean.

Interest rates are also assumed to be mean-reverting: in an economic expansion, typically, demand for capital is high and interest rates rise. These high interest rates will eventually lead to a cooling off of the economy, possibly leading to a recession. In this process capital demand decreases and interest rates decline from their high levels towards their long-term mean, eventually falling below it. Being in a recession, economic activity eventually increases again, often supported by monetary and fiscal policy. In this reviving economy, demand for capital increases, in turn increasing interest rates to their long-term mean.

How Can We Quantify Mean Reversion?

Mean reversion is present if there is a negative relationship between the change of a variable, St − St−1, and the variable at t − 1, St−1. Formally, mean reversion exists if

∂(St − St−1)/∂St−1 < 0   (8.1)

where

St: price at time t
St−1: price at the previous point in time t − 1
∂: partial derivative coefficient

Equation (8.1) tells us: if St−1 increases by a very small amount, St − St−1 will decrease by a certain amount, and vice versa. In particular, if St−1 has decreased (in the denominator), then at the next point in time t, mean reversion will "pull up" St−1 to St, therefore increasing St − St−1. Conversely, if St−1 has increased (in the denominator) and is high in t − 1, then at the next point in time t, mean reversion will "pull down" St−1 to St, therefore decreasing St − St−1. The degree of the "pull" is the degree of the mean reversion, also called mean reversion rate, mean reversion speed, or gravity.

Let's quantify the degree of mean reversion. Let's start with the discrete Vasicek 1987 process, which goes back to Ornstein-Uhlenbeck 1930:

St − St−1 = a(μS − St−1)Δt + σS ε √Δt   (8.2)

where

St: price at time t
St−1: price at the previous point in time t − 1
a: degree of mean reversion, also called mean reversion rate or gravity, 0 ≤ a ≤ 1
μS: long-term mean of S
σS: volatility of S
ε: random drawing from a standardised normal distribution at time t, ε(t) ~ n(0, 1). We can compute ε as =normsinv(rand()) in Excel/VBA and norminv(rand) in MATLAB. See www.dersoft.com/epsilon.xlsx for details.

We are currently interested only in mean reversion, so for now we will ignore the stochastic part in equation (8.2), σS ε √Δt.

For ease of explanation, let's assume Δt = 1. Then, from equation (8.2), we see that a mean reversion parameter of a = 1 will pull St−1 to the long-term mean μS completely at every time step, assuming St−1 was below the mean. For example, if St−1 is 80 and μS is 100, then 1 × (100 − 80) = 20, so the St−1 of 80 is "mean-reverted up" to its long-term mean of 100 in one time step. Naturally, a mean-reversion parameter a of 0.5 will lead to a mean reversion of 50% at each time step, and a mean-reversion parameter a of 0 will result in no mean reversion.

Let's now quantify mean reversion. Setting Δt = 1, equation (8.2) without stochasticity reduces to

St − St−1 = a(μS − St−1)   (8.3)

or

St − St−1 = aμS − aSt−1   (8.4)

To find the mean reversion rate a, we can run a standard regression analysis of the form y = α + βx. Following equation (8.4), we are regressing St − St−1 with respect to St−1:

St − St−1 = aμS − aSt−1
    y     =  α  +  βx        (8.5)

Importantly, from equation (8.5), we observe that the regression coefficient β is equal to the negative mean-reversion parameter a.

We now run a regression of equation (8.5) to find the empirical mean reversion of our correlation data. Hence S represents the 30 × 30 Dow stock monthly average correlations from 1972 to 2017. The regression analysis is displayed in Figure 8.4.

The regression function in Figure 8.4 displays a strong mean reversion of 79.03%. This means that, on average in every month, a deviation from the long-term correlation mean (32.38% in our study) is pulled back to that long-term mean by 79.03%.
Figure 8.4 Regression function (8.5) for 534 monthly average Dow stock return correlations from 1972 to 2017.

We can observe this strong mean reversion also by looking at Figure 8.1. An upward spike in correlation is typically followed by a sharp decline in the next time period, and vice versa.

Let's look at an example of modelling correlation with mean reversion.

Example 8.1: The long-term mean of the correlation data is 32.38%. In February 2017, the averaged correlation of the 30 × 30 Dow correlation matrices was 26.15%. From the regression function from 1972 to 2017, we find that the average mean reversion is 79.03%. What is the expected correlation for March 2017 following equation (8.3) or (8.4)?

Solving equation (8.3) for St, we have St = a(μS − St−1) + St−1. Hence the expected correlation in March is

St = 0.7903 × (0.3238 − 0.2615) + 0.2615 = 0.3107

As a result, when applying equation (8.3) with the mean reversion rate of 79.03%, we expect the correlation in March 2017 to be 31.07%.¹

8.3 DO EQUITY CORRELATIONS EXHIBIT AUTOCORRELATION?

Autocorrelation is the degree to which a variable is correlated to its past values. Autocorrelation can be quantified with the Nobel prize-rewarded ARCH (Autoregressive Conditional Heteroscedasticity) model of Robert Engle (1982) or its extension GARCH (Generalized Autoregressive Conditional Heteroscedasticity) by Tim Bollerslev (1988). However, we can also regress the time series of a variable to its past time series values to derive autocorrelation. This is the approach we will take here.

In finance, positive autocorrelation is also termed "persistence". In mutual-fund or hedge-fund performance analysis, an investor typically wants to know if an above-market performance of a fund has persisted for some time, ie, is positively correlated to its past strong performance.

Autocorrelation is the "reverse property" to mean reversion: the stronger the mean reversion, ie, the stronger a variable is pulled back to its long-term mean, the lower is the autocorrelation, ie, the lower is its correlation to its past values, and vice versa.

For our empirical correlation analysis, we derive the autocorrelation AC for a time lag of one period with the equation

AC(ρt, ρt−1) = COV(ρt, ρt−1)/(σ(ρt)σ(ρt−1))   (8.6)

where

AC: autocorrelation
ρt: correlation values for time period t (in our study, the monthly average of the 30 × 30 Dow stock return correlation matrices from 1972 to 2017, after eliminating the unity correlation on the diagonal)
ρt−1: correlation values for time period t − 1 (ie, the monthly correlation values starting and ending one month prior to period t)
COV: covariance, see equation (1.3) for details

Equation (8.6) is algebraically identical with the Pearson correlation coefficient equation (1.4). The autocorrelation just uses the correlation values of time period t and time period t − 1 as inputs.

Following equation (8.6), we find the one-period lag autocorrelation of the correlation values from 1972 to 2017 to be 20.97%. As mentioned above, autocorrelation is the "opposite property" of mean reversion. Therefore, not surprisingly, the autocorrelation of 20.97% and the mean reversion in our study of 79.03% (see the above section "Do equity correlations exhibit mean reversion?") add up to 1.

¹ Note that we have omitted any stochasticity, which is typically included when modelling financial variables, as shown in equation (8.2).

Figure 8.5 shows the autocorrelation with respect to different time lags. From Figure 8.5, we observe that 2-month lag autocorrelation, so autocorrelation with respect to two months prior, produces the highest autocorrelation. Altogether we observe the expected decay in autocorrelation with respect to time lags of earlier periods.

Figure 8.5 Autocorrelation of monthly average 30 × 30 Dow stock correlations from 1972 to 2017; the time period of the lags is months.

8.4 HOW ARE EQUITY CORRELATIONS DISTRIBUTED?

The input data of our distribution tests are daily correlation values between all 30 Dow stocks from 1972 to 2017. This resulted in 464,580 correlation values. The distribution is shown in Figure 8.6. From Figure 8.6, we observe that most correlations between the stocks in the Dow are positive. In fact, 77.23% of all 464,580 correlation values were positive.

Figure 8.6 Histogram of 464,580 correlations between the Dow 30 stocks from 1972 to 2017; the continuous line shows the Johnson SB distribution, which provided the best fit.

We tested 61 distributions for fitting the histogram in Figure 8.6, applying three standard fitting tests: (a) Kolmogorov-Smirnov, (b) Anderson-Darling and (c) Chi-Squared. Not surprisingly, the versatile Johnson SB distribution with four parameters (γ and δ for the shape, μ for location and σ for scale) provided the best fit. Standard distributions such as the normal distribution, lognormal distribution or beta distribution provided a poor fit.

We also tested the correlation distribution between the Dow stocks for different states of the economy. The results were slightly but not significantly different; see www.dersoft.com/correlationfitting.docx.

8.5 IS EQUITY CORRELATION VOLATILITY AN INDICATOR FOR FUTURE RECESSIONS?

In our study from 1972 to 2017, six recessions occurred: (1) a severe recession in 1973-74 following the first oil price shock, (2) a short recession in 1980, (3) a severe recession in 1981-82 following the second oil price shock, (4) a mild recession in 1990-91, (5) a mild recession in 2001 after the Internet bubble burst and (6) the "great recession" 2007-09, following the global financial crisis. Table 8.2 displays the relationship of a change in the correlation volatility preceding the start of a recession.

From Table 8.2, we observe the severity of the 2007-09 "great recession", which exceeded the severity of the oil price shock induced recessions in 1973-74 and 1981-82.

From Table 8.2, we also notice that, except for the mild recession in 1990-91, before every recession a downturn in correlation volatility occurred. This coincides with the fact that correlation volatility is low in an expansionary period (see Table 8.1), which often precedes a recession. However, the relationship between a decline in volatility and the severity of the

Table 8.2 Decrease in Correlation Volatility Preceding a Recession. The decrease in correlation volatility is measured as a 6-month change of the 6-month moving average correlation volatility. The severity of the recession is measured as the total GDP decline during the recession.

Recession    % Change in Correlation Volatility Before Recession    Severity of Recession (% change of GDP)
1973-74      −7.22%                                                 −11.93%
1980         −10.12%                                                −6.53%
1981-82      −4.65%                                                 −12.00%
1990-91      0.06%                                                  −4.05%
2001         −5.55%                                                 −1.80%
2007-09      −2.64%                                                 −14.75%

recession is statistically non-significant. The regression function is almost horizontal and the R² is close to zero. Studies with more data, going back to 1920, are currently being conducted.

8.6 PROPERTIES OF BOND CORRELATIONS AND DEFAULT PROBABILITY CORRELATIONS

Our preliminary studies of 7,645 bond correlations and 4,655 default probability correlations display similar properties as equity correlations. Correlation levels were higher for bonds (41.67%) and slightly lower for default probabilities (30.43%) compared with equity correlation levels (34.83%). Correlation volatility was lower for bonds (63.74%) and slightly higher for default probabilities (87.74%) compared with equity correlation volatility (79.73%).

Mean reversion was present in bond correlations (25.79%) and in default probability correlations (29.97%). These levels were lower than the very high equity correlation mean reversion of 77.51%.

The default probability correlation distribution is similar to the equity correlation distribution (see Figure 8.6) and can be replicated best by the Johnson SB distribution. However, the bond correlation distribution shows a more normal shape and can be best fitted with the generalised extreme value distribution and quite well with the normal distribution. Some fitting results are at www.dersoft.com/correlationfitting.docx. The bond correlation and default probability results are currently being verified with a larger sample database.

SUMMARY

The following are the main findings of our empirical analysis:

(a) Our study confirmed that the worse the state of the economy, the higher are equity correlations. Equity correlations were extremely high during the great recession of 2007-09 and reached 96.97% in February 2009.

(b) Equity correlation volatility is lowest in an expansionary period and higher in normal and recessionary economic periods. Traders and risk managers should take these higher correlation levels and higher correlation volatility that markets exhibit during economic distress into consideration.

(c) Equity correlation levels and equity correlation volatility are positively related.

(d) Equity correlations show very strong mean reversion. The Dow correlations from 1972 to 2017 showed a monthly mean reversion of 79.03%. Hence, when modelling correlation, mean reversion should be included in the model.

(e) Since equity correlations display strong mean reversion, they display low autocorrelation. The degree of autocorrelation shows the typical decrease with respect to time (ie, the autocorrelation is higher for more recent time lags).

(f) The equity correlation distribution can be replicated well with the Johnson SB distribution. Other distributions such as the normal, lognormal and beta distributions do not provide a good fit.

(g) First results show that bond correlations display similar properties as equity correlations. Bond correlation levels and bond correlation volatilities are generally higher in bad economic times. In addition, bond correlations exhibit mean reversion, although lower mean reversion than equity correlations exhibit.

(h) First results show that default correlations also exhibit properties seen in equity correlations. Default probability correlation levels are slightly lower than equity correlation levels, and default probability correlation volatilities are slightly higher than equity correlations. Studies with more data are currently being conducted.

The following questions are intended to help candidates understand the material. They are not actual FRM exam questions.

QUESTIONS

8.1 In which state of the economy are equity correlations the highest?

8.2 In which state of the economy is equity correlation volatility high?

8.3 What follows from Questions 1 and 2 for risk management?

8.4 What is mean reversion?

8.5 How can we quantify mean reversion?

8.6 What is autocorrelation? Name two approaches for how to quantify autocorrelation.

8.7 For equity correlations, we see the typical decrease of autocorrelation with respect to time lags. What does that mean?

8.8 How are mean reversion and autocorrelation related?

8.9 What is the distribution of equity correlations?

8.10 When modelling stocks, bonds, commodities, exchange rates, volatilities and other financial variables, we typically assume a normal or lognormal distribution. Can we do this for equity correlations?

Financial Correlation Modeling—Bottom-Up Approaches
Learning Objectives

After completing this reading you should be able to:

• Explain the purpose of copula functions and the translation of the copula equation.
• Describe the Gaussian copula and explain how to use it to derive the joint probability of default of two assets.
• Summarize the process of finding the default time of an asset correlated to all other assets in a portfolio using the Gaussian copula.

Excerpt is taken from Chapter 5 of Correlation Risk Modeling and Management, 2nd Edition, by Gunter Meissner.
9.1 COPULA CORRELATIONS

A fairly recent and famous as well as infamous correlation approach applied in finance is the copula approach. Copulas go back to Abe Sklar (1959). Extensions are provided by Schweizer and Wolff (1981) and Schweizer and Sklar (1983). One-factor copulas were introduced to finance by Oldrich Vasicek in 1987. More versatile, multivariate copulas were applied to finance by David Li in 2000.

When flexible copula functions were introduced to finance in 2000, they were enthusiastically embraced but then fell into disgrace when the global financial crisis hit in 2007. Copulas became popular because they could presumably solve a complex problem in an easy way: it was assumed that copulas could correlate multiple assets, for example the 125 assets in a CDO, with a single, although multidimensional, function. Let's first look at the maths of the copula correlation concept.

Copula functions are designed to simplify statistical problems. They allow the joining of multiple univariate distributions to a single multivariate distribution. Formally, a copula function C transforms an n-dimensional function on the interval [0, 1] into a unit-dimensional one:

C: [0, 1]ⁿ → [0, 1]   (9.1)

More explicitly, let Gi(ui) be a univariate, uniform distribution with ui = u1, . . . , un, and i ∈ N. Then there exists a copula function C such that

C[G1(u1), . . . , Gn(un)] = Fn[F1⁻¹(G1(u1)), . . . , Fn⁻¹(Gn(un)); ρF]   (9.2)

where Gi(ui) are called marginal distributions and Fn is the joint cumulative distribution function. Fi⁻¹ is the inverse of Fi, and ρF is the correlation structure of Fn.

Equation (9.2) reads: given are the marginal distributions G1(u1) to Gn(un). There exists a copula function that allows the mapping of the marginal distributions G1(u1) to Gn(un) via Fi⁻¹ and the joining of the (abscise values) Fi⁻¹(Gi(ui)) to a single, n-variate function Fn[F1⁻¹(G1(u1)), . . . , Fn⁻¹(Gn(un))] with correlation structure ρF.

If the mapped values Fi⁻¹(Gi(ui)) are continuous, it follows that C is unique. For detailed properties and proofs of equation (9.2), see Sklar (1959) and Nelsen (2006). A short proof is given in Appendix A2.

Numerous types of copula functions exist. They can be broadly categorised in one-parameter copulas, such as the Gaussian copula¹ and the Archimedean copula family, the most popular being the Gumbel, Clayton and Frank copulas. Often cited two-parameter copulas are Student's t, Frechet, and Marshall-Olkin. Figure 9.1 shows an overview of popular copula functions.

Figure 9.1 Popular copula functions in finance. One-factor copulas: Gaussian (ρ = correlation); Archimedean (φ(t) = generator). Two-factor copulas: Student's t (ρ = correlation, ν = d.o.f.); Frechet (p, q = linear combination); Marshall-Olkin (m, n = weighting factors).

The Gaussian Copula

Due to its convenient properties, the Gaussian copula CG is among the most applied copulas in finance. In the n-variate case, it is defined as

CG[G1(u1), . . . , Gn(un)] = Mn[N⁻¹(G1(u1)), . . . , N⁻¹(Gn(un)); ρM]   (9.3)

where Mn is the joint, n-variate cumulative standard normal distribution with ρM the n × n symmetric, positive-definite correlation matrix of the n-variate normal distribution Mn. N⁻¹ is the inverse of a univariate standard normal distribution.

If the Gx(ux) are uniform, then the N⁻¹(Gx(ux)) are standard normal and Mn is standard multivariate normal. For a proof, see Cherubini et al 2005.

It was David Li (2000) who transferred the copula approach of equation (9.3) to finance. He defined the cumulative default probabilities Q for entity i at a fixed time t, Qi(t), as marginal distributions. Hence we derive the Gaussian default time copula CGD:

CGD[Q1(t), . . . , Qn(t)] = Mn[N⁻¹(Q1(t)), . . . , N⁻¹(Qn(t)); ρM]   (9.4)

¹ Strictly speaking, only the bivariate Gaussian copula is a one-parameter copula, the parameter being the copula correlation coefficient. A multivariate Gaussian copula may incorporate a correlation matrix, containing various correlation coefficients.

Equation (9.4) reads: given are the marginal distributions, ie, the cumulative default probabilities Q of entities i = 1 to n at times t, Qi(t). There exists a Gaussian copula function CGD, which allows the mapping of the marginal distributions Qi(t) via N⁻¹ to standard normal and the joining of the (abscise values) N⁻¹(Qi(t)) to a single n-variate standard normal distribution Mn with the correlation structure ρM.

More precisely, in equation (9.4) the term N⁻¹ maps the cumulative default probabilities Q of asset i for time t, Qi(t), percentile to percentile to a univariate standard normal distribution. So the 5th percentile of Qi(t) is mapped to the 5th percentile of the standard normal distribution; the 10th percentile of Qi(t) is mapped to the 10th percentile of the standard normal distribution, etc. As a result, the N⁻¹(Qi(t)) in equation (9.4) are abscise (x-axis) values of the standard normal distribution. For a numerical example see example 9.1 and Figure 9.2 below. The N⁻¹(Qi(t)) are then joined to a single n-variate distribution Mn by applying the correlation structure of the multivariate normal distribution with correlation matrix ρM. The probability of n correlated defaults at time t is given by Mn.

We will now look at the Gaussian copula in an example.

Example 9.1: Let's assume we have two companies, B and Caa, with their estimated default probabilities for years 1 to 10 as displayed in Table 9.1.

Default probabilities for investment-grade companies typically increase in time, since uncertainty increases with time. However, in Table 9.1 both companies are in distress. For these companies the next years are the most difficult. If they survive these next years, their default probability decreases.

Let's now find the joint default probabilities of the companies B and Caa for any time t with the Gaussian copula function (9.4). First, we map the cumulative default probabilities Q(t), which are in columns 3 and 5 in Table 9.1, to the standard normal distribution via N⁻¹(Q(t)). Computationally, this can be done with =normsinv(Q(t)) in Excel or norminv(Q(t)) in MATLAB. Graphically, the mapping can be represented in two steps, which are displayed in Figure 9.2. In the lower graph of Figure 9.2, the cumulative default probability of asset B, QB(t), is displayed. We first map these cumulative probabilities percentile to percentile to a cumulative standard normal distribution in the upper graph of Figure 9.2 (up arrows). In a second step, the abscise (x-axis) values of the cumulative normal distribution are found (down arrows).

Figure 9.2 Graphical representation of the copula mapping N⁻¹(Q(t)); the upper graph shows the cumulative standard normal distribution.

The same mapping procedure is done for company Caa, ie, the cumulative default probabilities of company Caa, which are displayed in column 5 of Table 9.1, are mapped percentile to percentile to a cumulative standard normal distribution via N⁻¹(QCaa(t)).

We have now derived the percentile to percentile mapped cumulative default probability values of our companies to a cumulative standard normal distribution. These values are displayed in Table 9.2, columns 3 and 5.

We can now use the derived values and apply them to equation (9.4). Since we have only n = 2 companies, B and Caa, in our example, equation (9.4) reduces to

M2[N⁻¹(QB(t)), N⁻¹(QCaa(t)); ρ]   (9.5)

From equation (9.5) we see that, since we have only two assets in our example, we have only one correlation coefficient ρ, not a correlation matrix ρM.



Table 9.1 Default Probability and Cumulative Default Probability of Companies B and Caa

Default    Company B Default    Company B Cumulative          Company Caa Default    Company Caa Cumulative
Time t     Probability          Default Probability QB(t)     Probability            Default Probability QCaa(t)
1          6.51%                6.51%                         23.83%                 23.83%
2          7.65%                14.16%                        13.29%                 37.12%
3          6.87%                21.03%                        10.31%                 47.43%
4          6.01%                27.04%                        7.62%                  55.05%
5          5.27%                32.31%                        5.04%                  60.09%
6          4.42%                36.73%                        5.13%                  65.22%
7          4.24%                40.97%                        4.04%                  69.26%
8          3.36%                44.33%                        4.62%                  73.88%
9          2.84%                47.17%                        2.62%                  76.50%
10         2.84%                50.01%                        2.04%                  78.54%

Table 9.2 Cumulative Default Probabilities Mapped Percentile to Percentile to Standard Normal. For example, using Excel, the value −1.5133 is derived using =normsinv(0.0651).

Default    Company B Cumulative          Company B Standard Normal    Company Caa Cumulative          Company Caa Standard Normal
Time t     Default Probability QB(t)     Percentiles N⁻¹(QB(t))       Default Probability QCaa(t)     Percentiles N⁻¹(QCaa(t))
1          6.51%                         −1.5133                      23.83%                          −0.7118
2          14.16%                        −1.0732                      37.12%                          −0.3287
3          21.03%                        −0.8054                      47.43%                          −0.0645
4          27.04%                        −0.6116                      55.05%                          0.1269
5          32.31%                        −0.4590                      60.09%                          0.2557
6          36.73%                        −0.3390                      65.22%                          0.3913
7          40.97%                        −0.2283                      69.26%                          0.5032
8          44.33%                        −0.1426                      73.88%                          0.6397
9          47.17%                        −0.0710                      76.50%                          0.7225
10         50.01%                        0.0003                       78.54%                          0.7906
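The mapping N⁻¹(Q(t)) of Table 9.2 can be reproduced with any inverse standard normal CDF. The following Python sketch is ours; it uses scipy's norm.ppf in place of Excel's normsinv or MATLAB's norminv:

from scipy.stats import norm

q_b = [0.0651, 0.1416, 0.2103, 0.2704, 0.3231,
       0.3673, 0.4097, 0.4433, 0.4717, 0.5001]    # QB(t), Table 9.1
q_caa = [0.2383, 0.3712, 0.4743, 0.5505, 0.6009,
         0.6522, 0.6926, 0.7388, 0.7650, 0.7854]  # QCaa(t), Table 9.1

for t, (qb, qc) in enumerate(zip(q_b, q_caa), start=1):
    # norm.ppf is the percentile-to-percentile map to standard normal
    print(t, round(norm.ppf(qb), 4), round(norm.ppf(qc), 4))
# t = 1 gives -1.5133 and -0.7118, matching Table 9.2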

Importantly, the copula model now assumes that we can apply the correlation structure ρM or ρ of the multivariate distribution (in our case the Gaussian multivariate distribution M) to the transformed marginal distributions N⁻¹(QB(t)) and N⁻¹(QCaa(t)). This is done for mathematical and computational convenience.

The bivariate normal distribution M2 is displayed in Figure 9.3. The code for the bivariate cumulative normal distribution M can be found on the Internet. It is also displayed at www.dersoft.com/2assetdefaulttimecopula.xlsm in Module 1.

Figure 9.3 Bivariate (non-cumulative) normal distribution.

We now have all necessary ingredients to find the joint default probabilities of our companies B and Caa. For example, we

can answer the question: what is the joint default probability Q of companies B and Caa in the next year, assuming a one-year Gaussian default correlation of 0.4? The solution is

Q(tB ≤ 1 ∩ tCaa ≤ 1) = M(xB ≤ −1.5133 ∩ xCaa ≤ −0.7118, ρ = 0.4) = 3.44%   (9.6)

where tB is the default time of company B and tCaa is the default time of company Caa. xB and xCaa are the mapped abscise values of the bivariate normal distribution, which are derived from Table 9.2.

In another example, we can answer the question: what is the joint probability of company B defaulting in year 3 and company Caa defaulting in year 5? It is

Q(tB ≤ 3 ∩ tCaa ≤ 5) = M(xB ≤ −0.8054 ∩ xCaa ≤ 0.2557, ρ = 0.4) = 16.93%   (9.7)

Equations (9.6) and (9.7) show why this type of copula is also called a "default-time copula". We are correlating the default times τi of two or more assets. A spreadsheet that correlates the default times of two assets can be found at www.dersoft.com/2assetdefaulttimecopula.xlsm. The numerical value of 3.44% of equation (9.6) is in cell Q17.
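Equations (9.6) and (9.7) can be checked with any bivariate cumulative standard normal function. The Python sketch below is ours (using scipy's multivariate_normal in place of the spreadsheet's VBA code); it evaluates M at the mapped abscise values from Table 9.2 with ρ = 0.4:

from scipy.stats import multivariate_normal

def joint_default_prob(x_b, x_caa, rho):
    # M(x_B, x_Caa; rho): bivariate cumulative standard normal
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([x_b, x_caa])

print(joint_default_prob(-1.5133, -0.7118, 0.4))  # ~0.0344, equation (9.6)
print(joint_default_prob(-0.8054,  0.2557, 0.4))  # ~0.1693, equation (9.7)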

Simulating the Correlated Default Time for Multiple Assets

The preceding example considers only two assets. We will now find the default time for an asset that is correlated to the default times of all other assets in a portfolio using the Gaussian copula. To derive the default time τ of asset i, τi, which is correlated to the default times of all other assets i = 1, . . . , n, we first derive a sample Mn(•) from a multivariate copula (the right-hand side of equation (9.5) in the Gaussian case), Mn(•) ∈ [0, 1]. This is done via Cholesky decomposition, which is explained in Appendix A1 of this chapter. The sample includes the default correlation via the default correlation matrix ρM of the n-variate standard normal distribution Mn. We equate the sample (•) from Mn, Mn(•), with the cumulative individual default probability Q of asset i at time τi, Qi(τi). Therefore,

Mn(•) = Qi(τi)   (9.8)

or

τi = Qi⁻¹(Mn(•))   (9.9)

There is no closed-form solution for equation (9.8) or (9.9). To find the solution, we first take the sample Mn(•) and use equation (9.8) to equate it to Qi(τi). This can be done with a search procedure such as Newton-Raphson. We can also use a simple lookup function in Excel.

Let's assume the random drawing from Mn(•) was 35%. We now equate 35% with the market-given function Qi(τi) and find the expected default time of asset i, τi. This is displayed in Figure 9.4, where τi = 5.5 years. We repeat this procedure numerous times, for example 100,000 times, and average the τi of every simulation to find our estimate for τi. Importantly, the estimated default time of asset i, τi, includes the default correlation with the other assets in the portfolio, since the correlation matrix ρM is an input of the n-variate standard normal distribution Mn.

Figure 9.4 Finding the default time τ of 5.5 years from equation (9.8) for a random sample of the n-variate normal distribution Mn(•) of 35%.
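A minimal Python sketch of this simulation follows. It is our illustration, not the chapter's spreadsheet: correlated standard normals are generated via Cholesky decomposition, mapped to [0, 1] with the normal CDF, and inverted through the market-given curve QB(t) by linear interpolation between the yearly nodes of Table 9.1 (a simple stand-in for the Newton-Raphson or lookup search described above):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
rho_m = np.array([[1.0, 0.4], [0.4, 1.0]])  # default correlation matrix
chol = np.linalg.cholesky(rho_m)

# Cumulative default probability curve Q_B(t) of company B, Table 9.1
years = np.arange(1, 11)
q_b = np.array([.0651, .1416, .2103, .2704, .3231,
                .3673, .4097, .4433, .4717, .5001])

n_sims = 100_000
z = chol @ rng.standard_normal((2, n_sims))  # correlated standard normals
u = norm.cdf(z)                              # samples Mn(.) in [0, 1]

# Invert Q_B: find tau with Q_B(tau) = u; samples outside the curve's
# range are clipped to the 1- and 10-year endpoints by np.interp
tau_b = np.interp(u[0], q_b, years)
print(f"mean default time of asset B: {tau_b.mean():.2f} years")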



Empirical Approaches to Risk Metrics and Hedging

Learning Objectives

After completing this reading you should be able to:

• Explain the drawbacks to using a DV01-neutral hedge for a bond position.
• Describe a regression hedge and explain how it can improve a standard DV01-neutral hedge.
• Calculate the regression hedge adjustment factor, beta.
• Calculate the face value of an offsetting position needed to carry out a regression hedge.
• Calculate the face value of multiple offsetting swap positions needed to carry out a two-variable regression hedge.
• Compare and contrast level and change regressions.
• Describe principal component analysis and explain how it is applied to constructing a hedging portfolio.

Excerpt is Chapter 6 of Fixed Income Securities: Tools for Today's Markets, Third Edition, by Bruce Tuckman and Angel Serrat.

Central to the DV01-style metrics and hedges and the multi-factor metrics and hedges are implicit assumptions about how rates of different term structures change relative to one another. In this chapter, the necessary assumptions are derived directly from data on rate changes.

The chapter begins with single-variable hedging based on regression analysis. In the example of the section, a trader tries to hedge the interest rate risk of U.S. nominal versus real rates. This example shows that empirical models do not always describe the data very precisely and that this imprecision expresses itself in the volatility of the profit and loss of trades that depend on the empirical analysis.

The chapter continues with two-factor hedging based on multiple regression. The example for this section is that of an EUR swap market maker who hedges a customer trade of 20-year swaps with 10- and 30-year swaps. The quality of this hedge is shown to be quite a bit better than that of nominal versus real rates. Before concluding the discussion of regression techniques, the chapter comments on level versus change regressions.

The final section of the chapter introduces principal component analysis, which is an empirical description of how rates move together across the curve. In addition to its use as a hedging tool, the analysis provides an intuitive description of the empirical behavior of the term structure. The data illustrations for this section are taken from USD, EUR, GBP, and JPY swap markets. Considerable effort has been made to present this material at as low a level of mathematics as possible.

A theme across the illustrations of the chapter is that empirical relationships are far from static and that hedges estimated over one period of time may not work very well over subsequent periods.

10.1 SINGLE-VARIABLE REGRESSION-BASED HEDGING

This section considers the construction of a relative value trade in which a trader sells a U.S. Treasury bond and buys a U.S. Treasury TIPS (Treasury Inflation Protected Securities). As mentioned in the Overview, TIPS make real or inflation-adjusted payments by regularly indexing their principal amount outstanding for inflation. Investors in TIPS, therefore, require a relatively low real rate of return. By contrast, investors in U.S. Treasury bonds (called nominal bonds when distinguishing them from TIPS) require a real rate of return plus compensation for expected inflation plus, perhaps, an inflation risk premium. Thus the spread between rates of nominal bonds and TIPS reflects market views about inflation. In the relative value trade of this section, a trader bets that this inflation-induced spread will increase.

The trader plans to short USD 100 million of the (nominal) 3 5/8s of August 15, 2019, and, against that, to buy some amount of the TIPS 1 7/8s of July 15, 2019. Table 10.1 shows representative yields and DV01s of the two bonds. The TIPS sells at a relatively low yield, or high price, because its cash flows are protected from inflation, while the DV01 of the TIPS is relatively high because its yield is low. In any case, what face amount of the TIPS should be bought so that the trade is hedged against the level of interest rates, i.e., to both rates moving up or down together, and exposed only to the spread between nominal and real rates?

One choice is to make the trade DV01-neutral, i.e., to buy F^R face amount of TIPS such that

F^R × (.081/100) = 100mm × (.067/100)
F^R = 100mm × (.067/.081) = USD 82.7mm   (10.1)

This hedge ensures that if the yield on the TIPS and the nominal bond both increase or decrease by the same number of basis points, the trade will neither make nor lose money. But the trader has doubts about this choice because changes in yields on TIPS and nominal bonds may very well not be one-for-one. To investigate, the trader collects data on daily changes in yield of these two bonds from August 17, 2009, to July 2, 2010, which are then graphed in Figure 10.1, along with a regression line, to be discussed shortly. It is immediately apparent from the graph that, for example, a five basis-point change in the yield of the TIPS does not imply, with very high confidence, a unique change in the nominal yield, nor even an average change of five basis points. In fact, while the daily change in the real yield was about five basis points several times over the study period, the change in the nominal yield over those particular days ranged from 2.2 to 8.4 basis points. This lack of a one-to-one yield relationship calls the DV01 hedge into question. For context, by the way, it should be noted that graphing the changes in the yield of one nominal Treasury against changes in the yield of another, of similar maturity, would result in data points much more tightly surrounding the regression line.

Table 10.1 Yields and DV01s of a TIPS and a Nominal U.S. Treasury as of May 28, 2010

Bond                     Yield (%)    DV01
TIPS 1 7/8s of 7/15/19   1.237        .081
3 5/8s of 8/15/19        3.275        .067
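The DV01-neutral hedge of equation (10.1) is simple arithmetic; the following short Python rendering is ours, using the Table 10.1 inputs:

# DV01-neutral face amount of TIPS against a USD 100mm short nominal position
face_nominal = 100e6      # USD, the 3 5/8s of 8/15/19
dv01_nominal = 0.067      # per 100 face, Table 10.1
dv01_tips = 0.081         # per 100 face, Table 10.1

face_tips = face_nominal * dv01_nominal / dv01_tips
print(f"DV01-neutral TIPS face: USD {face_tips / 1e6:.1f}mm")  # ~82.7mm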

Figure 10.1 Regression of changes in the yield of the Treasury 3 5/8s of August 15, 2019, on changes in the yield of the TIPS 1.875s of July 15, 2019, from August 17, 2009, to July 2, 2010.

With respect to improving on the DV01 hedge, there is not much the trader can do about the dispersion of the change in the nominal yield for a given change in the real yield. That is part of the risk of the trade and will be discussed later. But the trader can estimate the average change in the nominal yield for a given change in the real yield and adjust the DV01 hedge accordingly. For example, were it to turn out (as it will) that the nominal yield in the data changes by 1.0189 basis points per basis-point change in the real yield, the trader could adjust the hedge such that

F^R × (.081/100) = 100mm × (.067/100) × 1.0189
F^R = USD 100mm × (.067/.081) × 1.0189 = USD 84.3mm   (10.2)

Relative to the DV01 hedge of USD 82.7 million in (10.1), the hedge in (10.2) increases the amount of TIPS to compensate for the empirical fact that, on average, the nominal yield changes by more than one basis point for every basis-point change in the real yield.

The next subsection introduces regression analysis, which is used both to estimate the coefficient 1.0189, used in Equation (10.2), and to assess the properties of the resulting hedge.

Least-Squares Regression Analysis

Let Δy_t^N and Δy_t^R be the changes in the yields of the nominal and real bonds, respectively, and assume that

Δy_t^N = α + βΔy_t^R + ε_t   (10.3)

According to Equation (10.3), changes in the real yield, the independent variable, are used to predict changes in the nominal yield, the dependent variable. The intercept, α, and the slope, β, need to be estimated from the data. The error term ε_t is the deviation of the nominal yield change on a particular day from the change predicted by the model. Least-squares estimation of (10.3), to be discussed presently, requires that the model be a true description of the dynamics in question and that the errors have the same probability distribution, are independent of each other, and are uncorrelated with the independent variable.¹

As an example of the relationship between the nominal and real yields in (10.3), say that the parameters estimated with the data, denoted α̂ and β̂, are 0 and 1.02 respectively. Then, if Δy_t^R is 5 basis points on a particular day, the predicted change in the nominal yield, written Δŷ_t^N, is

Δŷ_t^N = α̂ + β̂Δy_t^R = 0 + 1.02 × 5 = 5.1   (10.4)

Furthermore, should it turn out that the nominal yield changes by 5.5 basis points on that day, then the realized error that day, written ε̂_t, following Equation (10.3), is defined as

ε̂_t = Δy_t^N − α̂ − β̂Δy_t^R = Δy_t^N − Δŷ_t^N   (10.5)

In this example,

ε̂_t = 5.5 − 5.1 = .4   (10.6)

Least-squares estimation of α and β finds the estimates α̂ and β̂ that minimize the sum of the squares of the realized error terms over the observation period,

Σ_t ε̂_t² = Σ_t (Δy_t^N − α̂ − β̂Δy_t^R)²   (10.7)

¹ Since the nominal rate is the real rate plus the inflation rate, the error term in Equation (10.3) contains the change in the inflation rate. Therefore, the assumption that the independent variable be uncorrelated with the error term requires here that the real rate be uncorrelated with the inflation rate. This is a tolerable, though far from ideal, assumption: the inflation rate can have effects on the real economy and, consequently, on the real rate. If the regression were specified such that the real rate were the dependent variable and the nominal rate the independent variable, the requirement that the error and the independent variable be uncorrelated would certainly not be met. In that case, the error term contains the inflation rate and there is no credible argument that the nominal rate is even approximately uncorrelated with the inflation rate. Consequently, a more advanced estimation procedure would be required, like that of instrumental variables.
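Least-squares estimation of equation (10.3) takes only a few lines in Python. The sketch below is ours and uses simulated yield changes with the chapter's parameter values, since the underlying bond data are not reproduced here; run on the actual data it would recover the estimates reported in Table 10.2:

import numpy as np

rng = np.random.default_rng(3)
dy_real = rng.normal(0.0, 3.0, size=229)       # daily bp changes, stand-in data
dy_nominal = 1.0189 * dy_real + rng.normal(0.0, 3.82, size=229)  # model (10.3)

# Regress nominal yield changes on a constant and the real yield changes
x = np.column_stack([np.ones_like(dy_real), dy_real])
(alpha_hat, beta_hat), res, *_ = np.linalg.lstsq(x, dy_nominal, rcond=None)

n = len(dy_real)
sigma_hat = np.sqrt(res[0] / (n - 2))  # standard error of the regression
print(beta_hat, sigma_hat)             # close to 1.0189 and 3.82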



where the equality follows from (10.5). The squaring of the errors ensures that offsetting positive and negative errors are not considered as acceptable as zero errors and that large errors in absolute value are penalized substantially more than smaller errors.

Least-squares estimation is available through many statistical packages and spreadsheet add-ins. A typical summary of the regression output from estimating Equation (10.3) using the data in Figure 10.1 is given in Table 10.2. The β̂ reported in the table is 1.0189, which says that, over the sample period, the nominal yield increased by 1.0189 basis points per basis-point increase in real yields. The constant term of the regression, α̂, is not very different from zero, which is typically the case in regressions of changes in a yield on changes in a comparable yield. The economic interpretation of this regularity is that a yield does not usually trend up or down while a comparable yield is not changing.

Table 10.2  Regression Analysis of Changes in the Yield of the 3 5/8s of August 15, 2019, on the Changes in Yield of the TIPS 1 7/8s of July 15, 2019, from August 17, 2009, to July 2, 2010

No. of Observations    229
R-Squared              56.3%
Standard Error         3.82

Regression Coefficients      Value     Std. Error
Constant (α̂)                0.0503    .2529
Change in Real Yield (β̂)    1.0189    .0595

Table 10.2 reports standard errors of α̂ and β̂ of .2529 and .0595, respectively. Under the assumptions of least squares and the availability of sufficient data, the parameters α̂ and β̂ are normally distributed with means equal to the true model values, α and β respectively, and with standard deviations that can be estimated as the standard errors given in the table. Therefore, relying on the properties of the normal distribution, the confidence interval .0503 ± 2 × .2529, or (−.4555, .5561), has a 95% chance of falling around the true value α. And since this confidence interval does include the value zero, one cannot reject the statistical hypothesis that α = 0. Similarly, the 95% confidence interval with respect to β is 1.0189 ± 2 × .0595, or (.8999, 1.1379). So, while regression hedging makes heavy use of the point estimate β̂ = 1.0189, the true value of β may very well be somewhat higher or lower.

Substituting the estimated coefficients from Table 10.2 into the predicted regression equation in the first line of (10.4),

Δŷ_t^N = α̂ + β̂ Δy_t^R
Δŷ_t^N = .0503 + 1.0189 × Δy_t^R    (10.8)

This relationship is known as the fitted regression line and is the straight line through the data that appears in Figure 10.1.

Table 10.2 reports two other useful statistics, the R-squared and the standard error of the regression. The R-squared in this case is 56.3%, which means that 56.3% of the variance of changes in the nominal yield can be explained by the model. In a one-variable regression, the R-squared is just the square of the correlation of the two changes, so the correlation between changes in the nominal and real yields is the square root of 56.3%, or about 75%. This is a relatively low number compared with typical correlations between changes in two nominal yields, echoing the comment made in reference to the relatively wide dispersion of the points around the regression line in Figure 10.1.

The second useful statistic reported in Table 10.2 is the standard error of the regression, denoted here by σ̂ and given as 3.82 basis points. Algebraically, σ̂ is essentially the standard deviation of the realized error terms ε̂_t,² defined in Equation (10.5). Graphically, each ε̂_t is the vertical line from a data point directly down or up to the regression line, and σ̂ is essentially the standard deviation of these distances. Either way, σ̂ measures how well the model fits the data in the same units as the dependent variable, which, in this case, are basis points.

The Regression Hedge

The use of the regression coefficient in the hedging example of this section was discussed in the development of Equation (10.2). More formally, denoting the face amounts of the real and nominal bonds by F^R and F^N and their DV01s by DV01^R and DV01^N, the regression-based hedge, characterized earlier as the DV01 hedge adjusted for the average change of nominal yields relative to real yields, can be written as follows:

F^R = −F^N × (DV01^N / DV01^R) × β̂    (10.9)

It turns out, however, that this regression hedge has an even stronger justification. The profit and loss (P&L) of the hedged position over a day is

−F^R × (DV01^R / 100) Δy_t^R − F^N × (DV01^N / 100) Δy_t^N    (10.10)

² If the number of observations is n, the standard error of the regression is actually defined as the square root of Σ_t ε̂_t² / (n − 2). The average of the ε̂_t in a regression with a constant is zero by construction, so the standard error of the regression differs from the standard deviation of the errors only because of the division by n − 2 instead of n − 1.
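To make the estimation concrete, here is a minimal Python sketch (assuming numpy is available) that runs the least-squares regression of Equation (10.3) on simulated yield changes and reports the same summary statistics as Table 10.2. The data are synthetic, generated under assumed parameter values close to the table's estimates, so the output only approximates the table.

import numpy as np

np.random.seed(0)
n = 229                                  # number of daily observations, as in Table 10.2
dy_real = np.random.normal(0, 4.7, n)    # simulated daily changes in the real (TIPS) yield, bps
eps = np.random.normal(0, 3.8, n)        # simulated regression errors, bps
dy_nom = 0.05 + 1.02 * dy_real + eps     # assumed "true" relationship for the nominal yield

X = np.column_stack([np.ones(n), dy_real])             # regressors: a constant and the real-yield change
coef, _, _, _ = np.linalg.lstsq(X, dy_nom, rcond=None)
alpha_hat, beta_hat = coef

resid = dy_nom - X @ coef
ser = np.sqrt(resid @ resid / (n - 2))                 # standard error of the regression, divides by n - 2
r_squared = 1 - (resid @ resid) / ((dy_nom - dy_nom.mean()) ** 2).sum()
print(alpha_hat, beta_hat, ser, r_squared)             # compare with Table 10.2

With actual TIPS and nominal yield series in place of the simulated ones, the same few lines reproduce the table's statistics.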

Appendix A in this chapter shows that the hedge of Equation (10.9) minimizes the variance of the P&L in (10.10) over the data set shown in Figure 10.1 and used to estimate the regression parameters of Table 10.2.

In the example of this section, F^N = −USD 100mm, β̂ = 1.0189, DV01^N = .067, and DV01^R = .081, so, from (10.9), as derived before, F^R = USD 84.279mm. Because the estimated β̂ happens to be close to one, the regression hedge of about USD 84.3 million is not very different from the DV01 hedge of USD 82.7 million calculated earlier. In fact, some practitioners would describe this hedge in terms of the DV01 hedge. Rearranging the terms of (10.9),

−F^R × DV01^R / (F^N × DV01^N) = 101.89%    (10.11)

In words, the risk of the (TIPS) hedging portfolio, measured by DV01, is 101.89% of the risk of the underlying (nominal) position, measured by DV01. Alternatively, the risk weight of the hedge portfolio is 101.89%. This terminology does connect the hedge to the common DV01 benchmark but is somewhat misleading, because the whole point of the regression-based hedge is that the risks of the two securities cannot properly be measured by the DV01 alone. It should also be noted at this point that the regression-based and DV01 hedges are certainly not always this close in magnitude, even in other cases of hedging TIPS versus nominals, as will be illustrated in the next subsection.

An advantage of the regression framework for hedging is that it automatically provides an estimate of the volatility of the hedged portfolio. To see this, substitute F^R from (10.9) into the P&L expression (10.10) and rearrange terms to get the following expression for the P&L of the hedged position:

−F^N × (DV01^N / 100) (Δy_t^N − β̂ Δy_t^R)    (10.12)

From the definition of ε̂_t in (10.5), the term in parentheses equals ε̂_t + α̂. But since α̂ is typically not very important, the standard error of the regression σ̂ can be used to approximate the standard deviation of Δy_t^N − β̂ Δy_t^R. Hence, the standard deviation of the P&L in (10.12) is approximately

F^N × (DV01^N / 100) × σ̂    (10.13)

In the present example, recalling that the standard error of the regression can be found in Table 10.2, the daily volatility of the P&L of the hedged portfolio is approximately

USD 100mm × (.067 / 100) × 3.82 = USD 255,940    (10.14)

The trader would have to compare this volatility with an expected gain to decide whether or not the risk-return profile of the trade is attractive.

The Stability of Regression Coefficients over Time

An important difficulty in using regression-based hedging in practice is that the hedger can never be sure that the hedge coefficient, β, is constant over time. Put another way, the errors around the regression line might be random outcomes around a stable relationship, as described by Equation (10.3), or they might be manifestations of a changing relationship. In the former situation a hedger can safely continue to use a previously estimated β̂ for hedging while, in the latter situation, the hedger should re-estimate the hedge coefficient with more recent data, if available, or with data from a past, more relevant time period. But how can the hedger know which situation prevails?

A useful start for thinking about the stability of an estimated regression coefficient is to estimate that coefficient over different periods of time and then observe whether the result is stable or not. To this end, with the same data as before, Figure 10.2 graphs β̂ for regressions over rolling 30-day windows. This means that the full data set of changes from August 18, 2009, to July 2, 2010, is used in 30-day increments, as follows: the first β̂ comes from a regression of changes from August 18, 2009, to September 28, 2009; the second β̂ from the regression from August 19, 2009, to September 29, 2009, etc.; and the last β̂ from May 24, 2010, to July 2, 2010. The estimates of β̂ in the figure certainly do vary over time, but the range of .75 to 1.29 is not extremely surprising given the previously computed 95% confidence interval with respect to β of (.8999, 1.1379). More troublesome, perhaps, is the fact that the most recent values of β̂ have been trending up, which may indicate a change in regime in which even higher values of β characterize the relationship between nominal and real rates.

Figure 10.2  Rolling 30-day regression coefficient for the change in yield of the Treasury 3 5/8s of August 15, 2019, on the change in yield of the TIPS 1 7/8s of July 15, 2019.
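The hedging arithmetic of Equations (10.9), (10.11), (10.13), and (10.14) can be checked in a few lines of Python using the chapter's stated inputs; nothing here is estimated.

# Inputs from the example: short USD 100mm of the nominal bond, hedge with TIPS.
F_N = -100_000_000            # face amount of the nominal position
beta_hat = 1.0189             # regression coefficient from Table 10.2
dv01_N, dv01_R = .067, .081   # DV01s per 100 face amount
sigma_hat = 3.82              # standard error of the regression, basis points

# Equation (10.9): face amount of the real (TIPS) hedge.
F_R = -F_N * (dv01_N / dv01_R) * beta_hat
print(F_R)                                     # about 84,279,000, i.e., buy roughly USD 84.3mm of TIPS

# Equation (10.11): risk weight of the hedge relative to the nominal DV01.
print(-F_R * dv01_R / (F_N * dv01_N))          # 1.0189, i.e., 101.89%

# Equations (10.13) and (10.14): approximate daily P&L volatility of the hedged position.
print(abs(F_N) * (dv01_N / 100) * sigma_hat)   # about USD 255,940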



Table 10.3  Regression Analysis of Changes in the Yield of the 6 1/2s of February 15, 2010, on the Changes in Yield of the TIPS 4 1/4s of January 15, 2010, from February 15, 2000, to February 15, 2002

No. of Observations    519
R-Squared              43.0%
Standard Error         4.70

Regression Coefficients      Value     Std. Error
Constant (α̂)                −.0267    .2067
Change in Real Yield (β̂)    1.5618    .0790

For a bit more perspective before closing this subsection, the period February 15, 2000, to February 15, 2002, when rates were substantially higher, was characterized by significantly higher levels of β̂ and higher levels of uncertainty with respect to the regression relationship. The two bonds used in this analysis are the TIPS 4 1/4s of January 15, 2010, and the Treasury 6 1/2s of February 15, 2010. Summary statistics for the regression of changes in yields of the nominal 6 1/2s on the real 4 1/4s are given in Table 10.3.

Compared with Table 10.2, the estimated β̂ here is 50% larger, and the precision of this regression, measured by the R-squared or the standard error of the regression, is substantially worse. The contrast across periods again emphasizes the potential pitfalls of relying on estimated relationships persisting over time. This does not imply, of course, that blindly assuming a β of one, as in DV01 hedging, is a generally superior approach.

10.2 TWO-VARIABLE REGRESSION-BASED HEDGING

To illustrate regression hedging with two independent variables, this section considers the case of a market maker in EUR interest rate swaps. An algebraic introduction is followed by an empirical analysis.

The market maker in question has bought or received fixed in relatively illiquid 20-year swaps from a customer and needs to hedge the resulting interest rate exposure. Immediately paying fixed or selling 20-year swaps would sacrifice too much if not all of the spread paid by the customer, so the market maker chooses instead to sell a combination of 10- and 30-year swaps. Furthermore, the market maker is willing to rely on a two-variable regression model to describe the relationship between changes in 20-year swap rates and changes in 10- and 30-year swap rates:

Δy_t^20 = α + β_10 Δy_t^10 + β_30 Δy_t^30 + ε_t    (10.15)

Equation (10.15) can be estimated by least squares, analogously to the single-variable case, by minimizing

Σ_t (Δy_t^20 − α − β_10 Δy_t^10 − β_30 Δy_t^30)²    (10.16)

with respect to the parameters α, β_10, and β_30. The estimation of these parameters then provides a predicted change for the 20-year swap rate:

Δŷ^20 = α̂ + β̂_10 Δy^10 + β̂_30 Δy^30    (10.17)

To derive the notional face amounts of the 10- and 30-year swaps, F^10 and F^30, respectively, required to hedge F^20 face amount of the 20-year swaps, generalize the reasoning given in the single-variable case as follows. Write the P&L of the hedged position as

−F^20 (DV01^20 / 100) Δy^20 − F^10 (DV01^10 / 100) Δy^10 − F^30 (DV01^30 / 100) Δy^30    (10.18)

Then substitute the predicted change in the 20-year rate from (10.17) into (10.18), retaining only the terms depending on Δy^10 and Δy^30, to obtain

−[F^20 (DV01^20 / 100) β̂_10 + F^10 (DV01^10 / 100)] Δy^10 − [F^20 (DV01^20 / 100) β̂_30 + F^30 (DV01^30 / 100)] Δy^30    (10.19)

Finally, choose F^10 and F^30 to set the terms in brackets equal to zero, i.e., to eliminate the dependence of the predicted P&L on changes in the 10- and 30-year rates. This leads to two equations with the following solutions:

F^10 = −F^20 (DV01^20 / DV01^10) β̂_10    (10.20)

F^30 = −F^20 (DV01^20 / DV01^30) β̂_30    (10.21)

As in the single-variable case, this 10s-30s hedge of the 20-year can be expressed in terms of risk weights. More specifically, the DV01 risk in the 10-year part of the hedge and the DV01 risk in the 30-year part of the hedge can both be expressed as a fraction of the DV01 risk of the 20-year. Mathematically, these risk weights can be found by rearranging (10.20) and (10.21):

−F^10 × DV01^10 / (F^20 × DV01^20) = β̂_10    (10.22)

−F^30 × DV01^30 / (F^20 × DV01^20) = β̂_30    (10.23)
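A sketch of this two-variable hedge arithmetic follows. The betas are those reported below in Table 10.4; the swap DV01s are not given at this point in the chapter, so the values below are hypothetical placeholders chosen only to illustrate Equations (10.20) through (10.23).

# Betas as reported in Table 10.4; the DV01s are hypothetical placeholders,
# since the chapter does not list swap DV01s at this point.
F_20 = 100_000_000
dv01_10, dv01_20, dv01_30 = .0842, .1431, .1731   # assumed values for illustration only
beta_10, beta_30 = .2221, .7765

# Equations (10.20) and (10.21): face amounts that zero out the brackets in (10.19).
F_10 = -F_20 * (dv01_20 / dv01_10) * beta_10
F_30 = -F_20 * (dv01_20 / dv01_30) * beta_30

# Equations (10.22) and (10.23): the risk weights recover the betas whatever the DV01s.
print(-F_10 * dv01_10 / (F_20 * dv01_20))   # 0.2221
print(-F_30 * dv01_30 / (F_20 * dv01_20))   # 0.7765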

Proceeding now to the empirical analysis, the market maker, as of July 2006, performs an initial regression analysis using data on changes in the 10-, 20-, and 30-year EUR swap rates from July 2, 2001, to July 3, 2006. Summary statistics for the regression of changes in the 20-year EUR swap rate on changes in the 10- and 30-year EUR swap rates are given in Table 10.4. The statistical quality of these results, characteristic of all regressions of like rates, is far superior to that of the nominal against real yields of the previous section: the R-squared or percent variance explained by the regression is 99.8%; the standard error of the regression is only .14 basis points; and the 95% confidence intervals with respect to the two coefficients are extremely narrow, i.e., (.2153, .2289) for the 10-year and (.7691, .7839) for the 30-year. Lastly, in a result similar to those of the regressions of the previous section, the constant is insignificantly different from zero.

Table 10.4  Regression Analysis of Changes in the Yield of the 20-Year EUR Swap Rate on Changes in the 10- and 30-Year EUR Swap Rates from July 2, 2001, to July 3, 2006

No. of Observations    1281
R-Squared              99.8%
Standard Error         .14

Regression Coefficients                Value     Std. Error
Constant (α̂)                          −.0014    .0040
Change in 10-Year Swap Rate (β̂_10)    .2221     .0034
Change in 30-Year Swap Rate (β̂_30)    .7765     .0037

Applying the risk-weight interpretation of the regression coefficients given in Equations (10.22) and (10.23), the results in Table 10.4 say that 22.21% of the DV01 of the 20-year swap should be hedged with a 10-year swap and 77.65% with a 30-year swap. The sum of these weights, 99.86%, happens to be very close to one, meaning that the DV01 of the regression hedge very nearly matches the DV01 of the 20-year swap, although this certainly need not be the case: minimizing the variance of the P&L of a hedged position, when rates are not assumed to move in parallel, need not result in a DV01-neutral portfolio.

Tight as the in-sample regression relationship seems to be, the real test of the hedge is whether it works out-of-sample.³ To this end, Figure 10.3 tracks the errors of the hedge over time. All of these errors are computed as the realized change in the 20-year yield minus the predicted change for that yield based on the estimated regression in Table 10.4:

ε̂_t = Δy_t^20 − (−.0014 + .2221 Δy_t^10 + .7765 Δy_t^30)    (10.24)

The errors to the left of the vertical dotted line are in-sample in that the same Δy_t used to compute ε̂_t in (10.24) were also used to compute the coefficient estimates −.0014, .2221, and .7765. In other words, it is not that surprising that the ε̂_t to the left of the dotted line are small, because the regression coefficients were estimated to minimize the sum of squares of these errors. By contrast, the errors to the right of the dotted line are out-of-sample: these ε̂_t are computed from realizations of the Δy_t after July 3, 2006, but using the regression coefficients estimated over the period from July 2, 2001, to July 3, 2006. It is, therefore, the size and behavior of these out-of-sample errors that provide evidence as to the stability of the estimated coefficients over time.

Figure 10.3  In- and out-of-sample errors for a regression of changes of 20-year on 10- and 30-year EUR swap rates with estimation period July 2, 2001, to July 3, 2006.

From inspection of Figure 10.3 the out-of-sample errors are indeed small, for the most part, until August and September 2008, a peak in the financial crisis of 2007-2009. After then, the daily errors ran as high as about four basis points and as low as about −5.3 basis points. And while the accuracy of the relationship seems to have recovered somewhat by the summer of 2009, at the far right-end of the graph, the errors there are not nearly so well behaved as at the start of the out-of-sample period.

³ The phrase in-sample refers to behavior within the period of estimation, in this case July 2, 2001, to July 3, 2006. The phrase out-of-sample refers to behavior outside the period of estimation, usually after but possibly before that period as well.
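The in-sample/out-of-sample distinction is easy to operationalize in code: fit the coefficients on the estimation window only, then apply them, frozen, to later data, as in Equation (10.24). The following Python sketch does this on synthetic data; the data-generating process and its parameters are assumptions for illustration only.

import numpy as np

np.random.seed(1)
T, split = 2300, 1281            # total days; the first 1281 mimic the estimation window
dy10 = np.random.normal(0, 3.0, T)
dy30 = np.random.normal(0, 2.5, T)
dy20 = .22 * dy10 + .78 * dy30 + np.random.normal(0, .14, T)   # assumed data-generating process

# Fit coefficients on the in-sample window only, as in Table 10.4.
X_in = np.column_stack([np.ones(split), dy10[:split], dy30[:split]])
coef, _, _, _ = np.linalg.lstsq(X_in, dy20[:split], rcond=None)

# Equation (10.24): apply the frozen coefficients to every day, in- and out-of-sample.
X_all = np.column_stack([np.ones(T), dy10, dy30])
errors = dy20 - X_all @ coef
print(errors[:split].std(), errors[split:].std())   # the out-of-sample errors are the real test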



It is obvious and easy to say that the market maker, during the turbulence of a financial crisis, should have replaced the regression of Table 10.4 and the resulting hedging rule. But replace these with what? What does the market maker do at that time, before there exist sufficient post-crisis data points? And what does the market maker do after the worst of the crisis: estimate a regression from data during the crisis or revert to some earlier, more stable period? These are the kinds of issues that make regression hedging an art rather than a science. In any case, it should again be emphasized that avoiding these issues by blindly resorting to a one-security DV01 hedge, or a two-security DV01 hedge with arbitrarily assigned risk weights, like 50%-50%, is even less satisfying.

10.3 LEVEL VERSUS CHANGE REGRESSIONS

When estimating regression-based hedges, some practitioners regress changes in yields on changes in yields, as in the previous sections, while others prefer to regress yields on yields. Mathematically, in the single-variable case, the level-on-level regression with dependent variable y and independent variable x is

y_t = α + β x_t + ε_t    (10.25)

while the change-on-change regression is⁴

y_t − y_{t−1} = Δy_t = β Δx_t + Δε_t    (10.26)

By theory that is beyond the scope of this book, if the error terms ε_t are independently and identically distributed random variables with mean zero and are uncorrelated with the independent variable, then so are the Δε_t, and least squares on either (10.25) or (10.26) will result in coefficient estimators that are unbiased,⁵ consistent,⁶ and efficient, i.e., of minimum variance, in the class of linear estimators. If the error terms of either specification are not independent of each other, however, then the least-squares coefficients of that specification are not necessarily efficient, but retain their unbiasedness and consistency.

To illustrate the economics behind the assumption that error terms are independent of each other, say that α = 0, that β = 1, that y is the yield on a coupon bond, and that x is the yield on another, near-maturity coupon bond. Say further that the yield on the x-bond was 5% yesterday and 5% again today, while the yield on the y-bond was 1% yesterday. Because the yield on the x-bond is 5% today, the level Equation (10.25) predicts that the yield on the y-bond will be 5% today, despite its being 1% yesterday. But if the market yield was so far off yesterday's prediction, with a realized error of −4%, then it is more likely that the error today will be not far from −4% and that the yield of the y-bond will be closer to 1% than to the 5% predicted by (10.25). Put another way, the errors in (10.25) are not likely to be independent of each other, as assumed, but rather persistent, or correlated over time.

The change regression (10.26) assumes the opposite extreme with respect to the errors, i.e., that they are completely persistent. Continuing with the example of the previous paragraph, with the yield on the y-bond at 1% yesterday and the yield on the x-bond unchanged from yesterday, the change regression predicts that the y-bond will remain at 1%. But, as reasoned above, it is more likely that the y-bond yield will move some of the way back from 1% to 5%. Hence, the error terms in (10.26) are also unlikely to be independent of each other.

The first lesson to be drawn from this discussion is that because the error terms in both (10.26) and (10.25) are likely to be correlated over time, i.e., serially correlated, their estimated coefficients are not efficient. But, with nothing to gainsay the validity of the other assumptions concerning the error terms, the estimated coefficients of both the level and change specifications are still unbiased and consistent.

The second lesson to be drawn from the discussion of this section is that there is a more sensible way to model the relationship between two bond yields than either (10.26) or (10.25). In particular, model the behavior that the y-bond's yield will, on average, move somewhat closer from 1% to 5%. Mathematically, assume (10.25) with the error dynamics

ε_t = ρ ε_{t−1} + ν_t    (10.27)

for some constant ρ < 1. Assumption (10.27) says that today's error consists of some portion of yesterday's error plus a new random fluctuation. In terms of the numerical example, if ρ = 75%, then yesterday's error of −4% would generate an average error today of 75% × (−4%), or −3%, and, therefore, an expected y-bond yield of 5% − 3%, or 2%. In this way the error structure (10.27) has the yield of the y-bond converging to its predicted value of 5% given the yield of the x-bond at 5%. While beyond the scope of this book, the procedure for estimating (10.25) with the error structure (10.27) is presented in many statistical texts.

⁴ It is usual to include a constant term in the change-on-change regression, but for the purposes of this section, to maintain consistency across the two specifications, this constant term is omitted.

⁵ An unbiased estimator of a parameter is such that its expectation equals the true value of that parameter.

⁶ A consistent estimator of a parameter, with enough data, becomes arbitrarily close to the true value of the parameter.
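The convergence implied by the error dynamics of Equation (10.27) can be verified with the chapter's own numbers in a few lines of Python; this is a pure arithmetic check, not an estimation procedure.

rho = 0.75                         # assumed error persistence, as in the text's example
prev_error = 0.01 - 0.05           # yesterday: a 1% yield against a 5% prediction, i.e., -4%

# Equation (10.27): today's expected error is rho times yesterday's error.
expected_error = rho * prev_error  # -3%
predicted = 0.05                   # level-regression prediction with x at 5%, alpha = 0, beta = 1
print(predicted + expected_error)  # 0.02, the 2% expected y-bond yield of the text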

10.4 PRINCIPAL COMPONENTS ANALYSIS

Overview

Regression analysis tries to explain the changes in the yield of one bond relative to changes in the yields of a small number of other bonds. It is often useful, however, to have a single, empirical description of the behavior of the term structure that can be applied across all bonds. Principal Components (PCs) provide such an empirical description.

To fix ideas, consider the set of swap rates from 1 to 30 years at annual maturities. One way to describe the time series fluctuations of these rates is through the variances of the rates and their pairwise covariances or correlations. Another way to describe the data, however, is to create 30 interest rate factors or components, where each factor describes a change in each of the 30 rates. So, for example, one factor might be a simultaneous change of 5 basis points in the 1-year rate, 4.9 basis points in the 2-year rate, 4.8 basis points in the 3-year rate, etc. Principal Components Analysis (PCA) sets up these 30 such factors with the following properties:

1. The sum of the variances of the PCs equals the sum of the variances of the individual rates. In this sense the PCs capture the volatility of this set of interest rates.

2. The PCs are uncorrelated with each other. While changes in the individual rates are, of course, highly correlated with each other, the PCs are constructed so that they are uncorrelated.

3. Subject to these two properties or constraints, each PC is chosen to have the maximum possible variance given all earlier PCs. In other words, the first PC explains the largest fraction of the sum of the variances of the rates; the second PC explains the next largest fraction, etc.

PCs of rates are particularly useful because of an empirical regularity: the sum of the variances of the first three PCs is usually quite close to the sum of the variances of all the rates. Hence, rather than describing movements in the term structure by describing the variance of each rate and all pairs of correlations, one can simply describe the structure and volatility of each of only three PCs.

The next subsections illustrate PCs and their uses in the context of USD and then global swap markets. For interested readers, Appendix B in this chapter describes the construction of PCs with slightly more mathematical detail, using the simpler context of three interest rates and three PCs. Fully general and more mathematical descriptions are available in numerous other books and articles.

PCAs for USD Swap Rates

Figure 10.4 graphs the first three principal components from daily data on USD swap rates, while Table 10.5 provides a selection of the same information in tabular form. Thirty different data series are used, one series for each annual maturity from one to 30 years, and the observation period spans from October 2001 to October 2008. (Data from more recent dates will be presented and discussed later in this section.)

Figure 10.4  The first three principal components from USD swap rates from October 2001 to October 2008.

Table 10.5  Selected Results of Principal Components for the USD Swap Curve from October 1, 2001, to October 2, 2008. Units are basis points or percentages. Columns are numbered (1) through (10) from left to right: columns (2) to (4) are the PCs, and columns (7) to (9) are the percentages of total PC variance.

Term    Level   Slope   Short Rate   PC Vol   Total Vol   Level   Slope   Short Rate   PC Vol/Total Vol (%)
1       3.80    −2.74   1.48         4.91     4.96        59.8    31.0    9.1          99.05
2       5.86    −3.09   0.59         6.65     6.67        77.7    21.5    0.8          99.74
5       6.85    −1.53   −0.57        7.04     7.06        94.7    4.7     0.7          99.85
10      6.35    0.06    −0.34        6.36     6.37        99.7    0.0     0.3          99.83
20      5.69    0.82    0.14         5.75     5.75        97.9    2.0     0.1          99.95
30      5.38    1.09    0.39         5.51     5.52        95.6    3.9     0.5          99.79
Total   32.47   6.74    2.28         33.25    33.29       95.4    4.1     0.5          99.87

Columns (2) to (4) in Table 10.5 correspond to the three PC curves in Figure 10.4. These components can be interpreted as follows. A one standard-deviation increase in the "Level" PC, given in column (2), is a simultaneous 3.80 basis-point increase in the one-year swap rate, a 5.86 basis-point increase in the 2-year, etc., and a 5.38 basis-point increase in the 30-year. This PC is said to represent a "level" change in rates because rates of all maturities move up or down together by, very roughly, the same amount. A one standard-deviation increase in the "Slope" PC, given in column (3), is a simultaneous 2.74 basis-point drop in the 1-year rate, a 3.09 basis-point drop in the 2-year rate, etc., and a 1.09 basis-point increase in the 30-year rate. This PC is said to represent a "slope" change in rates because short-term rates fall while longer-term rates increase, or vice versa. Finally, a one standard-deviation increase in the "Short Rate" PC, given in column (4), is made up of simultaneous increases in short-term rates (e.g., one- and two-year terms), small decreases in intermediate-term rates (e.g., 5- and 10-year terms), and small increases in long-term rates (e.g., 20- and 30-year terms). While this PC is often called a "curvature" change, because intermediate-term rates move in the opposite direction from short- and long-term rates, the short-term rate moves dominate. Hence, the third PC is interpreted here as an additional factor to describe movements in short-term rates.

One feature of the shape of the level PC warrants additional discussion. Short-term rates might be expected to be more volatile than longer-term rates because changes in short-term rates are determined by current economic conditions, which are relatively volatile, while longer-term rates are determined mostly by expectations of future economic conditions, which are relatively less volatile. But since the Board of Governors of the Federal Reserve System, like many other central banks, anchors the very short-term rate at some desired level, the volatility of very short-term rates is significantly dampened. The level factor, which, as will be discussed shortly, explains the vast majority of term structure movements, reflects this behavior on the part of central banks: very short-term rates move relatively little. Then, at longer maturities, the original effect prevails and longer-term rates move less than intermediate and shorter-term rates.

Column (5) of Table 10.5 gives the combined standard deviation or volatility of the three principal components for a given rate, and column (6) gives the total or empirical volatility of that rate. For the one-year rate, for example, recalling that the principal components are uncorrelated, the combined volatility, in basis points, from the three components is

√(3.80² + (−2.74)² + 1.48²) = 4.91    (10.28)

The total or empirical volatility of the one-year rate, however, computed directly from the time series data, is 4.96 basis points. Column (10) of the table gives the ratio of columns (5) and (6), which, for the 1-year rate, is 4.9099/4.9572 or 99.05%. (For readability, many of the entries of Table 10.5 are rounded, although calculations are carried out to higher precision.)

Columns (7) through (9) of Table 10.5 give the ratios of the variance of each PC component to the total PC variance. For the 1-year rate, these ratios are 3.80²/4.91² = 59.9%; (−2.74)²/4.91² = 31.1%; and 1.48²/4.91² = 9.1%.

Finally, the last row of the table gives statistics on the square root of the sum of the variances across rates of different maturities. The sum of the variances is not a particularly interesting economic quantity (it does not, for example, represent the variance of any interesting portfolio) but, as mentioned in the overview of PCA, this sum is used to ensure that the PCs capture all of the volatility of the underlying interest rate series.

Having explained the calculations of Figure 10.4 and Table 10.5, the text can turn to interpretation. First and foremost, column (10) of Table 10.5 shows that, for rates of all maturities, the three principal components explain over 99% of rate volatility. And, across all rates, the three PCs explain 99.87% of the sum of the variability of these rates. While these findings represent relatively recent data on U.S. swap rates, similarly high explanatory powers characterize the first three components of other kinds of rates, like U.S. government bond yields and rates in fixed income markets in other countries. These results provide a great deal of comfort to hedgers: while in theory many factors (and, therefore, securities) might be required to hedge the interest rate risk of a particular portfolio, in practice three factors cover the vast majority of the risk.

Columns (7) through (9) of Table 10.5 show that the level component is far and away the most important in explaining the volatility of the term structure. The construction of principal components, described in the overview, does ensure that the first component is the most important component, but the extreme dominance of this component is a feature of the data. This finding is useful for thinking about the costs and benefits of adding a second or third factor to a one-factor hedging framework. Interestingly too, the dominance of the first factor is significantly muted in the very short end of the curve. This implies that hedging one short-term bond with another will not be as effective as hedging one longer-term bond with another. Or, put another way, relatively more factors or hedging securities are needed to hedge portfolios that are concentrated at the short end of the curve. This makes intuitive sense in the context of the extensive information market participants have about near-term events and their effects on rates relative to the information they have on events further into the future.

Hedging with PCA and an Application to Butterfly Weights

A PCA-based hedge for a portfolio would proceed along the lines of the multi-factor approaches. Start with the current price of the portfolio under the current term structure. Then, shift each principal component in turn to obtain new term structures and new portfolio prices. Next, calculate an '01 with respect to each principal component using the difference between the respective shifted price and the original price. Finally, using these portfolio '01s and analogously constructed '01s for a chosen set of hedging securities, find the portfolio of hedging securities that neutralizes the risk of the portfolio to the movement of each PC.

PCA is particularly useful for constructing empirically-based hedges for large portfolios; it is impractical to perform and assess individual regressions for every security in a large portfolio. For illustration purposes, however, this subsection will show how PCA is used, in practice, to hedge a butterfly trade. Most typically, butterfly trades use three securities and either buy the security of intermediate maturity and short the wings or short the intermediate security and buy the wings.

To take a relatively common butterfly, consider a trader who believes that the 5-year swap rate is too high relative to the 2- and 10-year swap rates and is, therefore, planning to receive in the 5-year and pay in the 2- and 10-year. As of May 28, 2010, the par swap rates and DV01s of the swaps of relevant terms are listed in Table 10.6. (The 30-year data will be used shortly.)

Table 10.6  Par Swap Rates and DV01s as of May 28, 2010

Term    Rate      DV01
2       1.235%    .0197
5       2.427%    .0468
10      3.388%    .0842
30      4.032%    .1731

To calculate the PCA hedge ratios, assume that the trader will receive on 100 notional amount of 5-year swaps and will trade F² and F¹⁰ notional amount of 2- and 10-year swaps. Using the data from Tables 10.5 and 10.6, the equation that neutralizes the overall portfolio's exposure to the level PC is

−F² × (.0197/100) × 5.86 − F¹⁰ × (.0842/100) × 6.35 − 100 × (.0468/100) × 6.85 = 0    (10.29)

Similarly, the equation that neutralizes the overall exposure to the slope PC is

−F² × (.0197/100) × (−3.09) − F¹⁰ × (.0842/100) × .06 − 100 × (.0468/100) × (−1.53) = 0    (10.30)

Solving, F² = −120.26 and F¹⁰ = −34.06 or, in terms of risk weights relative to the DV01 of the five-year swap,

(120.26 × .0197/100) / (100 × .0468/100) = 50.6%    (10.31)

(34.06 × .0842/100) / (100 × .0468/100) = 61.3%    (10.32)

In words, the DV01 of the five-year swap is hedged 50.6% by the two-year swap and 61.3% by the 10-year swap. Note that the sum of the risk weights is not 100%: the hedge neutralizes exposures to the level and slope PCs, not exposures to parallel shifts. To the extent that the term structure changes as assumed, i.e., as some combination of the first two PCs, the hedge will work exactly. On the other hand, to the extent that the actual change deviates from a combination of these two PCs, the hedge will not, ex post, have fully hedged interest rate risk.

Hedging the interest rate risk of the five-year swap with two other swaps is not uncommon, a practice supported by the large fraction of rate variance explained by the first two PCs. A trader might also decide, however, to hedge the third PC as well. A hedge against the first three PCs, found by generalizing the two-security hedge just discussed, gives rise to risk weights of 28.1%, 139.1%, and −67.4% in the 2-, 10-, and 30-year swaps, respectively, i.e., pay in the 2- and 10-year, but receive in the 30-year.
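Equations (10.29) and (10.30) are two linear equations in F² and F¹⁰. The following Python sketch solves them and recomputes the risk weights of (10.31) and (10.32); because the inputs are the rounded table entries, the outputs land close to, but not exactly on, the figures in the text.

import numpy as np

# DV01s from Table 10.6 and level/slope PC entries from Table 10.5.
dv01 = {2: .0197, 5: .0468, 10: .0842}
level = {2: 5.86, 5: 6.85, 10: 6.35}
slope = {2: -3.09, 5: -1.53, 10: .06}

# Equations (10.29) and (10.30), rewritten as A @ (F2, F10) = b.
A = np.array([[dv01[2] / 100 * level[2], dv01[10] / 100 * level[10]],
              [dv01[2] / 100 * slope[2], dv01[10] / 100 * slope[10]]])
b = np.array([-100 * dv01[5] / 100 * level[5],
              -100 * dv01[5] / 100 * slope[5]])
F2, F10 = np.linalg.solve(A, b)
print(F2, F10)                              # close to the -120.26 and -34.06 of the text

# Equations (10.31) and (10.32): risk weights relative to the five-year DV01.
print(-F2 * dv01[2] / (100 * dv01[5]))      # about 50.6%
print(-F10 * dv01[10] / (100 * dv01[5]))    # about 61.3%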



Say that the trader hedges the first two com ponents
alone and then the third com ponent experiences a one
standard-deviation decrease. The P&L of the trade, per
100 face am ount of the 5-year swap, would be

.0197 .0468
-1 2 0 .2 6 X X .59 + 100 X X (-.5 7 )
100 100

.0842
- 34.06 X X (-.3 4 ) -.0 3 1 (10.33)
100

or, for a two standard-deviation move, a loss of a bit


more than 6 cents per 100 face amount of the 5-year
swap. As these two standard deviations of short rate Term

risk equates to not even 1.5 basis points of conver­ Fiaure 10.5 The first three principal components from EUR
gence of the 5-year swap, a trader might very well not swap rates from October 2001 to October 2008.
bother with this third leg of the hedge.

Principal Component Analysis of 9.00


EUR, GBP, and JPY Swap Rates
6.75
Figures 10.5 to 10.7 show the first three PCs for the EUR,
GBP, and JP Y swap rate curves over the same sample 4.50
C
o

period as the USD PCs in Figure 10.4. The striking fact


r) 2.25
<
about these graphs is that the shape of the PCs are very
much the same across USD, EUR, and GBP. The only sig­ “ 0.00
nificant difference is in m agnitudes, with the USD level
-2.25
com ponent entailing larger-sized moves than the level
com ponents of EUR and GBP. The PCs of the JP Y curve -4.50
0 5 10 15 20 25 30
are certainly sim ilar to those of these other countries, Term
but the level com ponent in JP Y does not have the same
Fiaure 10.6 The first three principal components from
hump: in JP Y the first PC does not peak at the five-year
GBP swap rates from October 2001 to October 2008.
maturity point as do the other curves, but increases mono-
tonically with maturity before ultimately leveling off. The
significance of this difference in shape will be discussed in
the next subsection.

The Shape of PCs over Time


As with any em pirically based hedging m ethodology, a
decision has to be made about the relevant tim e period
over which to estim ate param eters. This is an issue for
regression-based m ethods, as discussed in this chapter,
and it is no less an issue for PC A . As will be discussed in
this subsection, the qualitative shapes of PCs have, until
very recently, remained rem arkably stable. This does not
imply, however, that differences in PCs estim ated over
different tim e periods can be ignored in the sense that The first three principal components from JPY
Fiqure 10.7
they have no im portant effects on the quality of hedges. swap rates from October 2001 to October 2008.

But having made this point, the text focuses on the relatively recent changes in the shapes of PCs around the world.

Figure 10.4 showed the first three USD PCs computed over the period 2001 to 2008 but, for quite some time, the qualitative shapes of these PCs were pretty much the same.⁷ The volatility of rates has changed over time, and with it the magnitude or height of the PC curves, but the qualitative shapes have not changed much. Most recently, however, there has been a qualitative change in the shape of the first PC in USD, EUR, and GBP. In fact, these shapes have become more like the past shape of the first PC in JPY!

Figures 10.8 and 10.9 contrast the level PC over the historical period October 2001 to October 2008 with that of the post-crisis period, October 2008 to October 2010. Figure 10.8 makes the comparison for USD and EUR, while Figure 10.9 does the same for GBP and JPY. The historical maximum of the level PC at a term of about five years in USD, EUR, and GBP has been pushed out dramatically to 10 years and beyond. In fact, these shapes now more closely resemble the level PC of JPY over the earlier estimation period. One explanation for this is the increasing certainty that central banks will maintain easy monetary conditions and low rates for an extended period of time. This dampens the volatility of short- and intermediate-term rates relative to that of longer-term rates, lowers the absolute volatility of short-term rates, and increases the volatility of long-term rates, reflecting the uncertainty of the ultimate results of central bank policy. Meanwhile, the level PC for JPY in the most recent period has become even more pronouncedly upward-sloping, consistent with an even longer period of central-bank control over the short-term rate.

Figure 10.8  The first principal component in USD and EUR swap rates estimated from October 2001 to October 2008 and from October 2008 to October 2010.

Figure 10.9  The first principal component in GBP and JPY swap rates estimated from October 2001 to October 2008 and from October 2008 to October 2010.

⁷ See, for example, Figure 2 of Bulent Baygun, Janet Showers, and George Cherpelis, Salomon Smith Barney, "Principles of Principal Components," January 31, 2000. The shapes of the three PCs in that graph, covering the period from January 1989 to February 1998, are qualitatively extremely similar to those of Figure 10.4 in this chapter.

APPENDIX A

The Least-Squares Hedge Minimizes the Variance of the P&L of the Hedged Position

The P&L of the hedged position, given in (10.10) and repeated here, is

−F^R × (DV01^R / 100) Δy_t^R − F^N × (DV01^N / 100) Δy_t^N    (10.34)



Let V(·) and Cov(·, ·) denote the variance and covariance functions. The variance of the P&L expression in (10.34) is

(F^R × DV01^R / 100)² V(Δy_t^R) + (F^N × DV01^N / 100)² V(Δy_t^N) + 2 (F^R × DV01^R / 100)(F^N × DV01^N / 100) Cov(Δy_t^R, Δy_t^N)    (10.35)

To find the face amount F^R that minimizes this variance, differentiate (10.35) with respect to F^R and set the result to zero:

2 F^R (DV01^R / 100)² V(Δy_t^R) + 2 (DV01^R / 100)(F^N × DV01^N / 100) Cov(Δy_t^R, Δy_t^N) = 0    (10.36)

Then, rearranging terms,

F^N × DV01^N × Cov(Δy_t^R, Δy_t^N) / V(Δy_t^R) = −F^R × DV01^R    (10.37)

But, by the properties of least squares, not derived in this text,

β̂ = Cov(Δy_t^R, Δy_t^N) / V(Δy_t^R)    (10.38)

Therefore, substituting (10.38) into (10.37) gives the regression hedging rule (10.9) of the text.

APPENDIX B

Constructing Principal Components from Three Rates

The goal of this appendix is to demonstrate the construction and properties of PCs with a minimum of mathematics. To this end, consider three swap rates, the 10-year, 20-year, and 30-year. Over some sample period, the volatilities of these rates, in basis points per day, are 4.25, 4.20, and 4.15. Furthermore, the correlations among these rates are given in the correlation matrix of Table 10.7.

Table 10.7  Correlation Matrix for Swap Rate Example

Term       10-Year   20-Year   30-Year
10-Year    1.00      0.95      0.90
20-Year    0.95      1.00      0.99
30-Year    0.90      0.99      1.00

The data on volatilities and correlations are usefully combined into a variance-covariance matrix, denoted by V, where the element in the ith row and jth column gives the covariance of the rate of term i with the rate of term j, or, the correlation of i and j times the standard deviation of i times the standard deviation of j. For example, the covariance of the 20-year swap rate with the 30-year swap rate is .99 × 4.20 × 4.15, or 17.26. The variance-covariance matrix for the example of this appendix is

V = ⎛18.06  16.96  15.87⎞
    ⎜16.96  17.64  17.26⎟    (10.39)
    ⎝15.87  17.26  17.22⎠

One use of a variance-covariance matrix is to write succinctly the variance of a particular portfolio of the relevant securities. Consider a portfolio with a total DV01 of .50 in the 10-year swap, −1.0 in the 20-year swap, and .60 in the 30-year swap. Without matrix notation, the dollar variance of the portfolio, denoted by σ², would be given by

σ² = .5² × 4.25² + (−1)² × 4.20² + .6² × 4.15²
   + 2 × .5 × (−1) × .95 × 4.25 × 4.20
   + 2 × .5 × .6 × .90 × 4.25 × 4.15
   + 2 × (−1) × .6 × .99 × 4.20 × 4.15
   = .4642²    (10.40)

With matrix notation, letting the transpose of the vector w be w′ = (.5, −1, .6), the dollar variance of the portfolio is given more compactly by

σ² = w′Vw = (.5, −1, .6) ⎛18.06  16.96  15.87⎞ ⎛ .5⎞
                         ⎜16.96  17.64  17.26⎟ ⎜−1 ⎟    (10.41)
                         ⎝15.87  17.26  17.22⎠ ⎝ .6⎠

Finally, note that the sum of the variances of the rates is 4.25² + 4.20² + 4.15² = 52.925, or, for a measure of total volatility, take the square root of that sum to get 7.27 basis points.

Returning now to principal components, the idea is to create three factors that capture the same information as the variance-covariance matrix. The procedure is as follows. Denote the first principal component by the vector a = (a₁, a₂, a₃)′. Then find the elements of this vector by maximizing a′Va such that a′a = 1. As mentioned in the PCA overview, this maximization ensures that, among the three PCs to be found, the first PC explains the largest fraction of the variance. The constraint a′a = 1, along with a similar constraint placed on the other PCs, will ensure that the total variance of the PCs equals the total variance of the underlying data. Performing this maximization, which can be done with the solver in Excel, a = (.5758, .5866, .5696)′. Note that the variance of this first component is a′Va = 51.041, which is 51.041/52.925 or 96.44% of the total variance of the rates.
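The matrix arithmetic of Equations (10.39) through (10.41) takes only a few lines in Python; the sketch below rebuilds V from Table 10.7 and the stated volatilities and checks the portfolio volatility and the total variance.

import numpy as np

vols = np.array([4.25, 4.20, 4.15])    # 10-, 20-, and 30-year daily volatilities, bps
corr = np.array([[1.00, .95, .90],
                 [ .95, 1.00, .99],
                 [ .90, .99, 1.00]])   # Table 10.7
V = corr * np.outer(vols, vols)        # Equation (10.39), up to rounding

w = np.array([.5, -1.0, .6])           # portfolio DV01s
print(np.sqrt(w @ V @ w))              # about .46, Equations (10.40) and (10.41)
print(vols @ vols)                     # 52.925, the sum of the rate variances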

The second principal component, denoted by the vector b = (b₁, b₂, b₃)′, is found by maximizing b′Vb such that b′b = 1 and b′a = 0. The maximization and the first constraint are analogous to those for finding the first principal component. The second constraint requires that the PC b is uncorrelated with the first PC, a. Solving gives b = (−.7815, .1902, .5941)′. Note that b′Vb = 1.867, which explains 1.867/52.925 or 3.53% of the total variance of the rates.

Finally, the third PC, denoted by c = (c₁, c₂, c₃)′, is found by solving the three equations c′c = 1, c′a = 0, and c′b = 0. The solution is c = (.2402, −.7872, .5680)′.

As will be clear in a moment, it turns out to be more intuitive to work with a different scaling of the PCs, namely, multiplying each by its volatility. In the example, this means multiplying the first PC by √51.041 or 7.14; the second PC by √1.867 or 1.37; and the third by √.017 or .13. This gives the PCs, to be denoted ã, b̃, c̃, as recorded in Table 10.8.

Table 10.8  Transformed PCs for the Swap Rate Example

Term       1st PC   2nd PC   3rd PC
10-Year    4.114    −1.068   .032
20-Year    4.191    .260     −.103
30-Year    4.069    .812     .075

Under this scaling the PCs have a very intuitive interpretation: a one standard-deviation increase of the first PC or factor is a 4.114 basis-point increase in the 10-year rate, a 4.191 basis-point increase in the 20-year rate, and a 4.069 basis-point increase in the 30-year rate. Similarly, a one standard-deviation increase of the second PC is a 1.068 basis-point drop in the 10-year rate, a .260 basis-point increase in the 20-year rate, and a .812 basis-point increase in the 30-year rate. Finally, a one standard-deviation increase of the third PC constitutes changes of .032, −.103, and .075 basis points in each of the rates, respectively.

To appreciate the scaling of the PCs in Table 10.8, note the following implications:

• By construction, the PCs are uncorrelated. Hence, the volatility of the 10-year rate can be recovered from Table 10.8 as

√(4.114² + (−1.068)² + .032²) = 4.25    (10.42)

And the volatilities of the 20- and 30-year rates can be recovered equivalently.

• The variance of each PC is the sum of squares of its elements, or, its volatility is the square root of that sum of squares. For the three PCs,

√(4.114² + 4.191² + 4.069²) = 7.14    (10.43)

√((−1.068)² + .260² + .812²) = 1.37    (10.44)

√(.032² + (−.103)² + .075²) = .13    (10.45)

• The square root of the sum of the variances of the PCs is the square root of the sum of the variances of the rates, which quantity was given above as 7.27 basis points:

√(7.14² + 1.37² + .13²) = √52.925 = 7.27    (10.46)

• The volatility of any portfolio can be found by computing its volatility with respect to each of the PCs and then taking the square root of the sum of the resulting variances. Returning to the portfolio with DV01 weights of w′ = (.5, −1, .6), its volatility with respect to each of the PCs can be computed as in Equations (10.47) through (10.49). Then, summing these squares and taking the square root gives a portfolio volatility of .464, as computed earlier from the variances and covariances.

√(w′ã)² = √(.5 × 4.114 − 1 × 4.191 + .6 × 4.069)² = .3074    (10.47)

√(w′b̃)² = √(.5 × (−1.068) − 1 × .260 + .6 × .812)² = .3068    (10.48)

√(w′c̃)² = √(.5 × .032 − 1 × (−.103) + .6 × .075)² = .1640    (10.49)

In summary, the PCs in Table 10.8 contain the same information as the variances and covariances, but have the interpretation of one standard-deviation changes in the level, slope, and short rate factors. Of course, the power of the methodology is evident not in a simple example like this, but when, as in the text, changes in 30 rates can be adequately expressed with changes in three factors.
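The constrained maximizations of this appendix are equivalent to an eigendecomposition of V: the unit-length eigenvectors are the PCs and the eigenvalues are their variances, which is how PCA is usually computed in practice. This Python sketch reproduces Table 10.8 up to rounding (and up to sign, since the sign of an eigenvector is arbitrary):

import numpy as np

vols = np.array([4.25, 4.20, 4.15])
corr = np.array([[1.00, .95, .90],
                 [ .95, 1.00, .99],
                 [ .90, .99, 1.00]])
V = corr * np.outer(vols, vols)

eigvals, eigvecs = np.linalg.eigh(V)       # eigendecomposition of the symmetric matrix V
order = eigvals.argsort()[::-1]            # sort descending: level, slope, short rate
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals / eigvals.sum())             # about .9644, .0353, .0003

scaled = eigvecs * np.sqrt(eigvals)        # scale each PC by its volatility, as in Table 10.8
print(scaled)                              # columns close to (4.114, 4.191, 4.069), etc.

# Portfolio volatility via the PCs, as in Equations (10.47) to (10.49).
w = np.array([.5, -1.0, .6])
loads = w @ scaled                         # exposure to each scaled PC
print(np.sqrt((loads ** 2).sum()))         # about .46, matching the direct calculation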



The Science of Term Structure Models

Learning Objectives

After completing this reading you should be able to:

• Calculate the expected discounted value of a zero-coupon security using a binomial tree.
• Construct and apply an arbitrage argument to price a call option on a zero-coupon security using replicating portfolios.
• Define risk-neutral pricing and apply it to option pricing.
• Distinguish between true and risk-neutral probabilities, and apply this difference to interest rate drift.
• Explain how the principles of arbitrage pricing of derivatives on fixed income securities can be extended over multiple periods.
• Define option-adjusted spread (OAS) and apply it to security pricing.
• Describe the rationale behind the use of recombining trees in option pricing.
• Calculate the value of a constant maturity Treasury swap, given an interest rate tree and the risk-neutral probabilities.
• Evaluate the advantages and disadvantages of reducing the size of the time steps on the pricing of derivatives on fixed-income securities.
• Evaluate the appropriateness of the Black-Scholes-Merton model when valuing derivatives on fixed income securities.

Excerpt is Chapter 7 of Fixed Income Securities: Tools for Today's Markets, Third Edition, by Bruce Tuckman and Angel Serrat.

This chapter uses a very simple setting to show how to price interest rate contingent claims relative to a set of underlying securities by arbitrage arguments. Unlike the arbitrage pricing of securities with fixed cash flows, the techniques of this chapter require strong assumptions about how interest rates evolve in the future. This chapter also introduces option-adjusted spread (OAS) as the most popular measure of deviations of market prices from those predicted by models.

11.1 RATE AND PRICE TREES

Assume that the six-month and one-year spot rates are 5% and 5.15%, respectively. Taking these market rates as given is equivalent to taking the prices of a six-month bond and a one-year bond as given. Securities with assumed prices are called underlying securities to distinguish them from the contingent claims priced by arbitrage arguments.

Next, assume that six months from now the six-month rate will be either 4.50% or 5.50% with equal probability. This very strong assumption is depicted by means of a binomial tree, where "binomial" means that only two future values are possible:

          5.50%
         /
    5%
         \
          4.50%

Note that the columns in the tree represent dates. The six-month rate is 5% today, which will be called date 0. On the next date six months from now, which will be called date 1, there are two possible outcomes or states of the world. The 5.50% state will be called the up-state, while the 4.50% state will be called the down-state.

Given the current term structure of spot rates (i.e., the current six-month and one-year rates), trees for the prices of six-month and one-year zero-coupon bonds may be computed. The price tree for USD 1,000 face value of the six-month zero is

              1,000
             /
    975.61
             \
              1,000

since USD 1,000/(1 + .05/2) = USD 975.61. (For easy readability, currency symbols are not included in price trees.)

Note that in a tree for the value of a particular security, the maturity of the security falls with the date. On date 0 of the preceding tree the security is a six-month zero, while on date 1 the security is a maturing zero.

The price tree for USD 1,000 face value of a one-year zero is the following:

                        1,000
                       /
             973.24
            /          \
    950.42              1,000
            \          /
             978.00
                       \
                        1,000

The three date 2 prices of USD 1,000 are, of course, the maturity values of the one-year zero. The two date 1 prices come from discounting this certain USD 1,000 at the then-prevailing six-month rate. Hence, the date 1 up-state price is USD 1,000/(1 + .055/2), or USD 973.2360, and the date 1 down-state price is USD 1,000/(1 + .045/2), or USD 977.9951. Finally, the date 0 price is computed using the given date 0 one-year rate of 5.15%: USD 1,000/(1 + .0515/2)², or USD 950.423.

The probabilities of moving up or down the tree may be used to compute average or expected values. As of date 0, the expected value of the one-year zero's price on date 1 is

½ × USD 973.24 + ½ × USD 978.00 = USD 975.62    (11.1)

Discounting this expected value to date 0 at the date 0 six-month rate gives an expected discounted value of

(½ × USD 973.24 + ½ × USD 978.00) / (1 + .05/2) = USD 951.82    (11.2)

Note that the one-year zero's expected discounted value of USD 951.82 does not equal its given market price of USD 950.42. These two numbers need not be equal because investors do not price securities by expected discounted value. Over the next six months the one-year zero is a risky security, worth USD 973.24 half of the time and USD 978 the other half of the time, for an average or expected value of USD 975.62. If investors do not like this price uncertainty, they would prefer a security worth USD 975.62 on date 1 with certainty. More specifically, a security worth USD 975.62 with certainty after six months would sell for USD 975.62/(1 + .05/2), or USD 951.82, as of date 0. By contrast, investors penalize the risky one-year zero-coupon bond with an average price of USD 975.62 after six months by pricing it at USD 950.42. The next chapter elaborates further on investor risk aversion and how large an impact it might be expected to have on bond prices.¹

¹ Over one period, discounting the expected value and taking the expectation of discounted values are the same. But over many periods the two are different and, with the approach taken by the short rate models, taking the expectation of discounted values is correct; hence the choice of the term "expected discounted value."
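The date-0 arithmetic for the one-year zero, Equations (11.1) and (11.2) and the comparison with the market price, can be reproduced in a few lines of Python:

# Date 1 prices of the one-year zero: USD 1,000 discounted at the two possible six-month rates.
up = 1000 / (1 + .055 / 2)       # 973.2360 in the 5.50% up-state
down = 1000 / (1 + .045 / 2)     # 977.9951 in the 4.50% down-state

expected = .5 * up + .5 * down          # Equation (11.1): 975.62
print(expected / (1 + .05 / 2))         # Equation (11.2): 951.82

market = 1000 / (1 + .0515 / 2) ** 2    # price from the one-year spot rate of 5.15%
print(market)                           # 950.42, below the expected discounted value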

11.2 A R B ITR A G E PRICING replicated by buying about USD 630.25 face value of one-year
zeros and simultaneously shorting about USD 613.39 face amount
O F DERIVATIVES of six-month zeros. Since this is the case, the law of one price
requires that the price of the option equal the price of the repli­
The te xt now turns to the pricing of a derivative security. W hat
cating portfolio. But this portfolio's price is known and is equal to
is the price of a call option, maturing in six months, to purchase
USD 1,000 face value of a then six-month zero at USD 975? Begin with the price tree for this call option:

[Price tree for the call option: the unknown date 0 price, followed by USD 0 in the up-state and USD 3 in the down-state on date 1.]

If on date 1 the six-month rate is 5.50% and a six-month zero sells for USD 973.24, the right to buy that zero at USD 975 is worthless. On the other hand, if the six-month rate turns out to be 4.50% and the price of a six-month zero is USD 978, then the right to buy the zero at USD 975 is worth USD 978 - USD 975, or USD 3. This description of the option's terminal payoffs emphasizes the derivative nature of the option: its value depends on the value of an underlying security.

A security is priced by arbitrage by finding and pricing its replicating portfolio. When, as in that context, cash flows do not depend on the levels of rates, the construction of the replicating portfolio is relatively simple. The derivative context is more difficult because cash flows do depend on the levels of rates, and the replicating portfolio must replicate the derivative security for any possible interest rate scenario.

To price the option by arbitrage, construct a portfolio on date 0 of underlying securities, namely six-month and one-year zero-coupon bonds, that will be worth USD 0 in the up-state on date 1 and USD 3 in the down-state. To solve this problem, let F.5 and F1 be the face values of six-month and one-year zeros in the replicating portfolio, respectively. Then, these values must satisfy the following two equations:

F.5 + .97324 F1 = USD 0 (11.3)

F.5 + .97800 F1 = USD 3 (11.4)

Equation (11.3) may be interpreted as follows. In the up-state, the value of the replicating portfolio's now maturing six-month zero is its face value. The value of the once one-year zeros, now six-month zeros, is .97324 per dollar face value. Hence, the left-hand side of Equation (11.3) denotes the value of the replicating portfolio in the up-state. This value must equal USD 0, the value of the option in the up-state. Similarly, Equation (11.4) requires that the value of the replicating portfolio in the down-state equal the value of the option in the down-state.

Solving Equations (11.3) and (11.4), F.5 = -USD 613.3866 and F1 = USD 630.2521. In words, on date 0 the option can be replicated by buying about USD 630.25 face value of one-year zeros and shorting about USD 613.39 face value of six-month zeros. The price of this replicating portfolio, and hence the price of the option, is

.97561 F.5 + .95042 F1 = -.97561 × USD 613.3866 + .95042 × USD 630.2521 = USD .58 (11.5)

Therefore, the price of the option must be USD .58.

Recall that pricing based on the law of one price is enforced by arbitrage. If the price of the option were less than USD .58, arbitrageurs could buy the option, short the replicating portfolio, keep the difference, and have no future liabilities. Similarly, if the price of the option were greater than USD .58, arbitrageurs could short the option, buy the replicating portfolio, keep the difference, and, once again, have no future liabilities. Thus, ruling out profits from riskless arbitrage implies an option price of USD .58.

It is important to emphasize that the option cannot be priced by expected discounted value. Under that method, the option price would appear to be

(.5 × USD 0 + .5 × USD 3) / (1 + .05/2) = USD 1.46 (11.6)

The true option price is less than this value because investors dislike the risk of the call option and, as a result, will not pay as much as its expected discounted value. Put another way, the risk penalty implicit in the call option price is inherited from the risk penalty of the one-year zero, that is, from the property that the price of the one-year zero is less than its expected discounted value. Once again, the magnitude of this effect is discussed in the next chapter.

This section illustrates arbitrage pricing with a call option, but it should be clear that arbitrage can be used to price any security with cash flows that depend on the six-month rate. Consider, for example, a security that, in six months, requires a payment of USD 200 in the up-state but generates a payment of USD 1,000 in the down-state. Proceeding as in the option example, find the portfolio of six-month and one-year zeros that replicates these two terminal payoffs, price this replicating portfolio as of date 0, and conclude that the price of the hypothetical security equals the price of the replicating portfolio.
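The arithmetic of Equations (11.3) through (11.5) is easy to check by machine. The following minimal Python sketch, with illustrative variable names, solves the two replication equations and prices the resulting portfolio; it assumes only the chapter's date 0 and date 1 zero prices.

```python
import numpy as np

# Date 1 prices of a then six-month zero per dollar face value
up, down = 0.97324, 0.97800
# Option payoffs: USD 0 in the up-state, USD 3 in the down-state
payoffs = np.array([0.0, 3.0])

# Solve F.5 + 0.97324*F1 = 0 and F.5 + 0.97800*F1 = 3
A = np.array([[1.0, up],
              [1.0, down]])
f_half, f_one = np.linalg.solve(A, payoffs)

# Price the replicating portfolio using the date 0 discount factors
price = 0.97561 * f_half + 0.95042 * f_one
print(f"F.5 = {f_half:.4f}, F1 = {f_one:.4f}, option price = {price:.2f}")
# F.5 = -613.3866, F1 = 630.2521, option price = 0.58
```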



A remarkable feature of arbitrage pricing is that the probabilities of up and down moves never enter into the calculation of the arbitrage price. See Equations (11.3) to (11.5). The explanation for this somewhat surprising observation follows from the principles of arbitrage. Arbitrage pricing requires that the value of the replicating portfolio match the value of the option in both the up and the down-states. Therefore, the composition of the replicating portfolio is the same whether the probability of the up-state is 20%, 50%, or 80%. But if the composition of the portfolio does not depend directly on the probabilities, and if the prices of the securities in the portfolio are given, then the price of the replicating portfolio, and hence the price of the option, cannot depend directly on the probabilities either.

Despite the fact that the option price does not depend directly on the probabilities, these probabilities must have some impact on the option price. After all, as it becomes more and more likely that rates will rise to 5.50% and that bond prices will be low, the value of options to purchase bonds must fall. The resolution of this apparent paradox is that the option price depends indirectly on the probabilities through the price of the one-year zero. Were the probability of an up move to increase suddenly, the current value of a one-year zero would decline. And since the replicating portfolio is long one-year zeros, the value of the option would decline as well. In summary, a derivative like an option depends on the probabilities only through current bond prices. Given bond prices, however, probabilities are not needed to derive arbitrage-free prices.

11.3 RISK-NEUTRAL PRICING

Risk-neutral pricing is a technique that modifies an assumed interest rate process, like the one assumed at the start of this chapter, so that any contingent claim can be priced without having to construct and price its replicating portfolio. Since the original interest rate process has to be modified only once, and since this modification requires no more effort than pricing a single contingent claim by arbitrage, risk-neutral pricing is an extremely efficient way to price many contingent claims under the same assumed rate process.

In the example of this chapter, the price of a one-year zero does not equal its expected discounted value. The price of the one-year zero is USD 950.42, computed from the given one-year spot rate of 5.15%. At the same time, the expected discounted value of the one-year zero is USD 951.82, as derived in Equation (11.2) and reproduced here:

(.5 × USD 973.24 + .5 × USD 978.00) / (1 + .05/2) = USD 951.82 (11.7)

The probabilities of .5 for the up and down-states are the assumed true or real-world probabilities. But there are other probabilities, called risk-neutral probabilities, that do cause the expected discounted value to equal the market price. To find these probabilities, let the risk-neutral probabilities in the up and down-states be p and (1 - p), respectively. Then, solve the following equation:

(USD 973.24 p + USD 978.00 (1 - p)) / (1 + .05/2) = USD 950.42 (11.8)

The solution is p = .8024. In words, under the risk-neutral probabilities of .8024 and .1976 the expected discounted value equals the market price.

In later chapters the difference between true and risk-neutral probabilities is described in terms of the drift in interest rates. Under the true probabilities there is a 50% chance that the six-month rate rises from 5% to 5.50% and a 50% chance that it falls from 5% to 4.50%. Hence the expected change in the six-month rate, or the drift of the six-month rate, is zero. Under the risk-neutral probabilities there is an 80.24% chance of a 50-basis-point increase in the six-month rate and a 19.76% chance of a 50-basis-point decline, for an expected change of 30.24 basis points. Hence the drift of the six-month rate under these probabilities is 30.24 basis points.
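The risk-neutral probability in Equation (11.8) and the 30.24-basis-point drift can be verified in a few lines of Python. This is a sketch using the chapter's numbers; the variable names are illustrative.

```python
# Minimal check of Equation (11.8) and the risk-neutral drift.
price_up, price_down = 973.24, 978.00   # date 1 prices of the one-year zero
market_price = 950.42                   # price implied by the 5.15% spot rate
r_half = 0.05                           # six-month rate, semiannually compounded

# Solve (price_up*p + price_down*(1 - p)) / (1 + r/2) = market_price for p
p = (market_price * (1 + r_half / 2) - price_down) / (price_up - price_down)
print(f"risk-neutral p = {p:.4f}")      # 0.8024

# Expected change (drift) of the six-month rate under p, in basis points
drift_bp = p * 50 + (1 - p) * (-50)
print(f"drift = {drift_bp:.2f} bp")     # 30.24
```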
As pointed out in the previous section, the expected discounted value of the option payoff is USD 1.46, while the arbitrage price is USD .58. But what if expected discounted value is computed using the risk-neutral probabilities? The resulting option value would be

(.8024 × USD 0 + .1976 × USD 3) / (1 + .05/2) = USD .58 (11.9)

The fact that the arbitrage price of the option equals its expected discounted value under the risk-neutral probabilities is not a coincidence. In general, to value contingent claims by risk-neutral pricing, proceed as follows. First, find the risk-neutral probabilities that equate the prices of the underlying securities with their expected discounted values. (In the simple example of this chapter the only risky, underlying security is the one-year zero.) Second, price the contingent claim by expected discounted value under these risk-neutral probabilities. The remainder of this section will describe intuitively why risk-neutral pricing works. Since the argument is a bit complex, it is broken up into four steps.

Step 1: Given trees for the underlying securities, the price of a security that is priced by arbitrage does not depend on investors' risk preferences. This assertion can be supported as follows.

A security is priced by arbitrage if one can construct a portfolio that replicates its cash flows. Under the assumed process for the interest rates in this chapter, for example, the sample bond option is priced by arbitrage. By contrast, it is unlikely that a specific common stock can be priced by arbitrage because no portfolio of underlying securities can mimic the idiosyncratic fluctuations in a single common stock's market value.
these probabilities, let the risk-neutral probabilities in the up fluctuations in a single common stock's market value.

If a security is priced by arbitrage and everyone agrees on the price evolution of the underlying securities, then everyone will agree on the replicating portfolio. In the option example, both an extremely risk-averse, retired investor and a professional gambler would agree that a portfolio of USD 630.25 face of one-year zeros and -USD 613.39 face of six-month zeros replicates the option. And since they agree on the composition of the replicating portfolio and on the prices of the underlying securities, they must also agree on the price of the derivative.

Step 2: Imagine an economy identical to the true economy with respect to current bond prices and the possible values of the six-month rate over time but different in that the investors in the imaginary economy are risk-neutral. Unlike investors in the true economy, investors in the imaginary economy do not penalize securities for risk and, therefore, price securities by expected discounted value. It follows that, under the probabilities in the imaginary economy, the expected discounted value of the one-year zero equals its market price. But these probabilities satisfy Equation (11.8), namely the risk-neutral probabilities of .8024 and .1976.

Step 3: The price of the option in the imaginary economy, like that of any other security in that economy, is computed by expected discounted value. Since the probability of the up-state in that economy is .8024, the price of the option in that economy is given by Equation (11.9) and is, therefore, USD .58.

Step 4: Step 1 implies that, given the prices of the six-month and one-year zeros, as well as possible values of the six-month rate, the price of an option does not depend on investor risk preferences. It follows that since the real and imaginary economies have the same bond prices and the same possible values for the six-month rate, the option price must be the same in both economies. In particular, the option price in the real economy must equal USD .58, the option price in the imaginary economy. More generally, the price of a derivative in the real economy may be computed by expected discounted value under the risk-neutral probabilities.

11.4 ARBITRAGE PRICING IN A MULTI-PERIOD SETTING

Maintaining the binomial assumption, the tree of the previous section might be extended for another six months as follows:

[Tree: 5% on date 0; 5.50% and 4.50% on date 1; on date 2, 6.00% and 5.00% from the 5.50% node, and 4.95% and 4.05% from the 4.50% node.]

When, as in this tree, an up move followed by a down move does not give the same rate as a down move followed by an up move, the tree is said to be nonrecombining. From an economic perspective, there is nothing wrong with this kind of tree. To justify this particular tree, for example, one might argue that when short rates are 5% or higher they tend to change in increments of 50 basis points. But when rates fall below 5%, the size of the change starts to decrease. In particular, at a rate of 4.50%, the short rate may change by only 45 basis points. A volatility process that depends on the level of rates exhibits state-dependent volatility.

Despite the economic reasonableness of nonrecombining trees, practitioners tend to avoid them because such trees are difficult or even impossible to implement. After six months there are two possible states, after one year there are four, and after N semiannual periods there are 2^N possibilities. So, for example, a tree with semiannual steps large enough to price 10-year securities will, in its rightmost column alone, have over 500,000 nodes, while a tree used to price 20-year securities will in its rightmost column have over 500 billion nodes. Furthermore, as discussed later in the chapter, it is often desirable to reduce substantially the time interval between dates. In short, even with modern computers, trees that grow this quickly are computationally unwieldy. This doesn't mean, by the way, that the effects that give rise to nonrecombining trees, like state-dependent volatility, have to be abandoned. It simply means that these effects must be implemented in a more efficient way.

Trees in which the up-down and down-up-states have the same value are called recombining trees. An example of this type of tree that builds on the two-date tree of the previous sections is

[Tree: 5% on date 0; 5.50% and 4.50% on date 1; 6%, 5%, and 4% on date 2.]

Note that there are two nodes after six months, three after one year, and so on. A tree with weekly rather than semiannual steps capable of pricing a 30-year security would have only 52 × 30 + 1, or 1,561, nodes in its rightmost column. Evidently, recombining trees are much more manageable than nonrecombining trees from a computational viewpoint.

As trees grow it becomes convenient to develop a notation with which to refer to particular nodes. One convention is as follows. The dates, represented by columns of the tree, are numbered from left to right starting with 0. The states, represented by rows of the tree, are numbered from bottom to top, also starting from 0. For example, in the preceding tree the six-month rate on date 2, state 0 is 4%. The six-month rate on state 1 of date 1 is 5.50%.
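The computational gap between the two kinds of trees is easy to quantify. The short sketch below, an illustration rather than anything from the original text, counts rightmost-column nodes for nonrecombining (2^n) and recombining (n + 1) trees with semiannual steps.

```python
# Rightmost-column node counts: nonrecombining trees grow as 2**n,
# recombining trees as n + 1, for n semiannual steps.
for years in (10, 20, 30):
    n = 2 * years  # number of semiannual steps
    print(f"{years}y: nonrecombining {2**n:,} nodes, recombining {n + 1} nodes")
# 10y: nonrecombining 1,048,576 nodes, recombining 21 nodes
# 20y: nonrecombining 1,099,511,627,776 nodes, recombining 41 nodes
```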



Continuing where the option example left off, having derived the risk-neutral tree for the pricing of a one-year zero, the goal is to extend the tree for the pricing of a 1.5-year zero, assuming that the 1.5-year spot rate is 5.25%. Ignoring the probabilities for a moment, several nodes of the 1.5-year zero price tree can be written down immediately:

[Partial price tree for the 1.5-year zero: USD 925.21 on date 0; the unknown prices P1,1 and P1,0 on date 1; USD 970.87, USD 975.61, and USD 980.39 on date 2; USD 1,000 on date 3.]

On date 3, the zero with an original term of 1.5 years matures and is worth its face value of USD 1,000. On date 2, the value of the then six-month zero equals its face value discounted for six months at the then-prevailing spot rates of 6%, 5%, and 4% in states 2, 1, and 0, respectively:

USD 1,000 / (1 + .06/2) = USD 970.87 (11.10)

USD 1,000 / (1 + .05/2) = USD 975.61 (11.11)

USD 1,000 / (1 + .04/2) = USD 980.39 (11.12)

Finally, on date 0, the 1.5-year zero equals its face value discounted at the given 1.5-year spot rate:

USD 1,000 / (1 + .0525/2)^3 = USD 925.21 (11.13)

The prices of the zero on date 1 in states 1 and 0 are denoted P1,1 and P1,0, respectively. The then one-year zero prices are not known because, at this point in the development, possible values of the one-year rate in six months are not available.

The previous section showed that the risk-neutral probability of an up move on date 0 is .8024. Letting q be the risk-neutral probability of an up move on date 1,2 the tree becomes

[The same tree with probabilities attached: .8024 and .1976 for the up and down moves from date 0, and q and (1 - q) for the up and down moves from each date 1 node.]

2 For simplicity alone, this example assumes that the probability of moving up from state 0 equals the probability of moving up from state 1.

By definition, expected discounted value under risk-neutral probabilities must produce market prices. With respect to the 1.5-year zero price on date 0, this requires that

(.8024 P1,1 + .1976 P1,0) / (1 + .05/2) = USD 925.21 (11.14)

With respect to the prices of a then one-year zero on date 1,

P1,1 = (USD 970.87 q + USD 975.61 (1 - q)) / (1 + .055/2) (11.15)

P1,0 = (USD 975.61 q + USD 980.39 (1 - q)) / (1 + .045/2) (11.16)

While Equations (11.14) through (11.16) may appear complicated, substituting (11.15) and (11.16) into (11.14) results in a linear equation in the one unknown, q. Solving this resulting equation reveals that q = .6489. Therefore, the risk-neutral interest rate process may be summarized by the following tree:

[Risk-neutral rate tree: 5% on date 0; 5.50% (probability .8024) and 4.50% (probability .1976) on date 1; 6%, 5%, and 4% on date 2, with probability .6489 of an up move from each date 1 node.]

Furthermore, any derivative security that depends on the six-month rate in six months and in one year may be priced by computing its discounted expected value along this tree. An example appears in the next section.

The difference between the true and risk-neutral probabilities may once again be described in terms of drift. From dates 1 to 2, the drift under the true probabilities is zero. Under the risk-neutral probabilities the drift is computed from a 64.89% chance of a 50-basis-point increase in the six-month rate and a 35.11% chance of a 50-basis-point decline in the rate. These numbers give a drift or expected change of 14.89 basis points.

Substituting q = .6489 back into Equations (11.15) and (11.16) completes the tree for the price of the 1.5-year zero:

[Completed price tree for the 1.5-year zero: USD 925.21 on date 0; USD 946.51 and USD 955.78 on date 1; USD 970.87, USD 975.61, and USD 980.39 on date 2; USD 1,000 on date 3.]
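Because the substitution of (11.15) and (11.16) into (11.14) is linear in q, two evaluations of the pricing function pin it down. A minimal sketch, assuming the chapter's prices and probabilities; the function and variable names are illustrative.

```python
# Solve for the date 1 risk-neutral probability q in Eqs. (11.14)-(11.16).
def date0_price(q):
    p11 = (970.87 * q + 975.61 * (1 - q)) / (1 + 0.055 / 2)   # Eq. (11.15)
    p10 = (975.61 * q + 980.39 * (1 - q)) / (1 + 0.045 / 2)   # Eq. (11.16)
    return (0.8024 * p11 + 0.1976 * p10) / (1 + 0.05 / 2)     # Eq. (11.14)

# date0_price(q) = a + b*q is linear, so recover a and b and solve for q
a = date0_price(0.0)
b = date0_price(1.0) - a
q = (925.21 - a) / b
print(f"q = {q:.3f}")   # 0.649, matching the text's .6489 up to input rounding
```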

It follows immediately from this tree that the one-year spot rate six months from now may be either 5.5736% or 4.5743%, since

USD 946.51 = USD 1,000 / (1 + .055736/2)^2 (11.17)

USD 955.78 = USD 1,000 / (1 + .045743/2)^2 (11.18)

The fact that the possible values of the one-year spot rate can be extracted from the tree is at first surprising. The starting point of the example is the date 0 values of the .5-, 1-, and 1.5-year spot rates as well as an assumption about the evolution of the six-month rate over the next year. But since this information, in combination with arbitrage or risk-neutral arguments, is sufficient to determine the price tree of the 1.5-year zero, it is sufficient to determine the possible values of the one-year spot rate in six months. Considering this fact from another point of view, having specified initial spot rates and the evolution of the six-month rate, a modeler may not make any further assumptions about the behavior of the one-year rate.

The six-month rate process completely determines the one-year rate process because the model presented here has only one factor. Writing down a tree for the evolution of the six-month rate alone implicitly assumes that the prices of all fixed income securities can be determined by the evolution of that rate.

Just as some replicating portfolio can reproduce the cash flows of a security from date 0 to date 1, some other replicating portfolios can reproduce the cash flows of a security from date 1 to date 2. The composition of these replicating portfolios depends on the date and state. More specifically, the replicating portfolios held on date 0, on state 0 of date 1, and on state 1 of date 1 are usually different. From the trading perspective, the replicating portfolio must be adjusted as time passes and as interest rates change. This process is known as dynamic replication, in contrast to static replication strategies. As an example of static replication, the portfolio of zero-coupon bonds that replicates a coupon bond does not change over time nor with the level of rates.

Having built a tree out to date 2, it should be clear how to extend the tree to any number of dates. Assumptions about the future possible values of the short-term rate have to be extrapolated further into the future, and risk-neutral probabilities have to be calculated to recover a given set of bond prices.

11.5 EXAMPLE: PRICING A CONSTANT-MATURITY TREASURY SWAP

Equipped with the last tree of interest rates in the previous section, this section prices a particular derivative security, namely USD 1,000,000 face value of a stylized constant-maturity Treasury (CMT) swap struck at 5%. This swap pays

USD 1,000,000 × (y_CMT - 5%) / 2 (11.19)

every six months until it matures, where y_CMT is a semiannually compounded yield, of a predetermined maturity, on the payment date. The text prices a one-year CMT swap on the six-month yield. In practice, CMT swaps trade most commonly on the yields of the most liquid maturities, i.e., on 2-, 5-, and 10-year yields.

Since six-month semiannually compounded yields equal six-month spot rates, rates from the tree of the previous section can be substituted into (11.19) to calculate the payoffs of the CMT swap. On date 1, the state 1 and state 0 payoffs are, respectively,

USD 1,000,000 × (5.50% - 5%) / 2 = USD 2,500 (11.20)

USD 1,000,000 × (4.50% - 5%) / 2 = -USD 2,500 (11.21)

Similarly, on date 2, the state 2, 1, and 0 payoffs are, respectively,

USD 1,000,000 × (6% - 5%) / 2 = USD 5,000 (11.22)

USD 1,000,000 × (5% - 5%) / 2 = USD 0 (11.23)

USD 1,000,000 × (4% - 5%) / 2 = -USD 5,000 (11.24)

The possible values of the CMT swap at maturity, on date 2, are given by Equations (11.22) through (11.24). The possible values on date 1 are given by the expected discounted value of the date 2 payoffs under the risk-neutral probabilities plus the date 1 payoffs given by (11.20) and (11.21). The resulting date 1 values in states 1 and 0, respectively, are

(.6489 × USD 5,000 + .3511 × USD 0) / (1 + .055/2) + USD 2,500 = USD 5,657.66 (11.25)

(.6489 × USD 0 + .3511 × (-USD 5,000)) / (1 + .045/2) - USD 2,500 = -USD 4,216.87 (11.26)

Finally, the value of the swap on date 0 is the expected discounted value of the date 1 values, given by (11.25) and (11.26), under the risk-neutral probabilities:

(.8024 × USD 5,657.66 + .1976 × (-USD 4,216.87)) / (1 + .05/2) = USD 3,616.05 (11.27)
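The whole valuation, Equations (11.20) through (11.27), amounts to one step of backward induction per date. Here is a compact Python sketch of that calculation; the helper name pay and the other identifiers are illustrative.

```python
# Backward induction for the stylized CMT swap, Eqs. (11.20)-(11.27).
notional, strike = 1_000_000, 0.05
q01, q1 = 0.8024, 0.6489            # risk-neutral up probabilities, dates 0 and 1

pay = lambda y: notional * (y - strike) / 2   # Eq. (11.19)

# Date 2 payoffs in states 2, 1, 0
v2 = [pay(0.06), pay(0.05), pay(0.04)]        # 5,000; 0; -5,000

# Date 1 values: discounted expectation of date 2 payoffs plus date 1 payoff
v1_up = (q1 * v2[0] + (1 - q1) * v2[1]) / (1 + 0.055 / 2) + pay(0.055)
v1_dn = (q1 * v2[1] + (1 - q1) * v2[2]) / (1 + 0.045 / 2) + pay(0.045)

# Date 0 value: discounted expectation of the date 1 values
v0 = (q01 * v1_up + (1 - q01) * v1_dn) / (1 + 0.05 / 2)
print(f"{v1_up:,.2f}  {v1_dn:,.2f}  {v0:,.2f}")
# 5,657.66  -4,216.87  3,616.05
```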



The following tree summarizes the value of the stylized CMT swap over dates and states:

[Value tree for the CMT swap: 3,616.05 on date 0; 3,157.66 + 2,500 in state 1 and -1,716.87 - 2,500 in state 0 on date 1; 5,000, 0, and -5,000 on date 2.]

A value of USD 3,616.05 for the CMT swap might seem surprising at first. After all, the cash flows of the CMT swap are zero at a rate of 5%, and 5% is, under the real probabilities, the average rate on each date. The explanation, of course, is that the risk-neutral probabilities, not the real probabilities, determine the arbitrage price of the swap. The expected discounted value of the swap under the real probabilities can be computed by following the steps leading to (11.25) through (11.27) but using .5 for all up and down moves. The result of these calculations does give a value close to zero, namely -USD 5.80.

The expected cash flow of the CMT swap on both dates 1 and 2, under the real probabilities, is zero. It follows immediately that the discounted value of these expected cash flows is zero. At the same time, the expected discounted value of the CMT swap is -USD 5.80.

11.6 OPTION-ADJUSTED SPREAD

Option-adjusted spread (OAS) is a widely used measure of the relative value of a security, that is, of its market price relative to its model value. OAS is defined as the spread such that the market price of a security equals its model price when discounted values are computed at risk-neutral rates plus that spread. To illustrate, say that the market price of the CMT swap in the previous section is USD 3,613.25, USD 2.80 less than the model price. In that case, the OAS of the CMT swap turns out to be 10 basis points. To see this, add 10 basis points to the discounting rates of 5.5% and 4.5% in Equations (11.25) and (11.26), respectively, to get new swap values of

(.6489 × USD 5,000 + .3511 × USD 0) / (1 + .056/2) + USD 2,500 = USD 5,656.13 (11.28)

(.6489 × USD 0 + .3511 × (-USD 5,000)) / (1 + .046/2) - USD 2,500 = -USD 4,216.03 (11.29)

Note that, when calculating value with an OAS spread, rates are shifted only for the purpose of discounting. Rates are not shifted for the purposes of computing cash flows. In the CMT swap example, cash flows are still computed using Equations (11.20) through (11.24).

Completing the valuation with an OAS of 10 basis points, use the results of (11.28) and (11.29) and a discount rate of 5% plus the OAS spread of 10 basis points, or 5.10%, to obtain an initial CMT swap value of

(.8024 × USD 5,656.13 + .1976 × (-USD 4,216.03)) / (1 + .051/2) = USD 3,613.25 (11.30)

Hence, as claimed, discounting at the risk-neutral rates plus an OAS of 10 basis points produces a model price equal to the given market price of USD 3,613.25.

If a security's OAS is positive, its market price is less than its model price, so the security trades cheap. If the OAS is negative, the security trades rich.

Another perspective on the relative value implications of an OAS spread is the fact that the expected return of a security with an OAS, under the risk-neutral process, is the short-term rate plus the OAS per period. Very simply, discounting a security's expected value by a particular rate per period is equivalent to that security's earning that rate per period. In the example of the CMT swap, the expected return of the fairly priced swap under the risk-neutral process over the six months from date 0 to date 1 is

(.8024 × USD 5,657.66 - .1976 × USD 4,216.87 - USD 3,616.05) / USD 3,616.05 = 2.5% (11.31)

which is six months' worth of the initial rate of 5%. On the other hand, the expected return of the cheap swap, with an OAS of 10 basis points, is

(.8024 × USD 5,656.13 - .1976 × USD 4,216.03 - USD 3,613.25) / USD 3,613.25 = 2.55% (11.32)

which is six months' worth of the initial rate of 5% plus the OAS of 10 basis points, or half of 5.10%.
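Finding an OAS in practice is a one-dimensional root search: shift the discount rates by a trial spread, reprice, and adjust. The sketch below recovers the roughly 10-basis-point OAS of the example by bisection; it hard-codes the chapter's cash flows and probabilities and is illustrative only.

```python
# Find the OAS of the CMT swap by bisection: the spread added to the
# discounting rates that reproduces the market price of USD 3,613.25.
def model_price(oas):
    d = lambda r: 1 + (r + oas) / 2                   # shift discounting only
    v1_up = (0.6489 * 5000 + 0.3511 * 0) / d(0.055) + 2500
    v1_dn = (0.6489 * 0 + 0.3511 * -5000) / d(0.045) - 2500
    return (0.8024 * v1_up + 0.1976 * v1_dn) / d(0.05)

target, lo, hi = 3613.25, 0.0, 0.01
for _ in range(60):
    mid = (lo + hi) / 2
    # the model price decreases as the OAS increases
    lo, hi = (mid, hi) if model_price(mid) > target else (lo, mid)
print(f"OAS = {(lo + hi) / 2 * 1e4:.1f} bp")          # about 10 bp
```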


11.7 PROFIT AND LOSS ATTRIBUTION WITH AN OAS

We introduced profit and loss (P&L) attribution earlier in this book. This section gives a mathematical description of attribution in the context of term structure models and of securities that trade with an OAS.

By the definition of a one-factor model, and by the definition of OAS, the market price of a security at time t and a factor value of x can be written as Pt(x, OAS). Using a first-order Taylor approximation, the change in the price of the security is

dP = (∂P/∂x) dx + (∂P/∂t) dt + (∂P/∂OAS) dOAS (11.33)

Dividing by the price and taking expectations,

E[dP/P] = (1/P)(∂P/∂x) E[dx] + (1/P)(∂P/∂t) dt (11.34)

Since the OAS calculation assumes that the OAS is constant over the life of the security, moving from (11.33) to (11.34) assumes that the expected change in the OAS is zero.

As mentioned in the previous section, if expectations are taken with respect to the risk-neutral process,3 then, for any security priced according to the model,

E[dP/P] = r dt (11.35)

3 Taking expected values with respect to the true probabilities would add a risk premium term to the right-hand side of this equation. See Chapter 12.

But Equation (11.35) does not apply to securities that are not priced according to the model, that is, to securities with an OAS not equal to zero. For these securities, by definition, the cash flows are discounted not at the short-term rate but at the short-term rate plus the OAS. Equivalently, as argued in the previous section, the expected return under the risk-neutral probabilities is not the short-term rate but the short-term rate plus the OAS. Hence, the more general form of (11.35) is

E[dP/P] = (r + OAS) dt (11.36)

Combining these pieces, substitute (11.34) and (11.36) into (11.33) and rearrange terms to break down the return of a security into its component parts:

dP/P = (r + OAS) dt + (1/P)(∂P/∂x)(dx - E[dx]) + (1/P)(∂P/∂OAS) dOAS (11.37)

Finally, multiplying through by P,

dP = (r + OAS) P dt + (∂P/∂x)(dx - E[dx]) + (∂P/∂OAS) dOAS (11.38)

In words, the return of a security, or its P&L, may be divided into a component due to the passage of time, a component due to changes in the factor, and a component due to the change in the OAS. The terms on the right-hand side of (11.38) represent, in order, carry-roll-down,4 gains or losses from rate changes, and gains or losses from spread change. For models with predictive power, the OAS converges or tends to zero or, equivalently, the security price converges or tends toward its fair value according to the model.

4 For expositional simplicity, no explicit coupon or other direct cash flows have been included in this discussion.

The decompositions (11.37) and (11.38) highlight the usefulness of OAS as a measure of the value of a security with respect to a particular model. According to the model, a long position in a cheap security earns superior returns in two ways. First, it earns the OAS over time intervals in which the security does not converge to its fair value. Second, it earns its sensitivity to OAS times the extent of any convergence.

The decomposition equations also provide a framework for thinking about relative value trading. When a cheap or rich security is identified, a relative value trader buys or sells the security and hedges out all interest rate or factor risk. In terms of the decompositions, ∂P/∂x = 0. In that case, the expected return or P&L depends only on the short-term rate, the OAS, and any convergence. Furthermore, if the trader finances the trade at the short-term rate, i.e., borrows P at a rate r to purchase the security, the expected return is simply equal to the OAS plus any convergence return.
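The bookkeeping of Equation (11.38) can be illustrated numerically. In the sketch below every input (price, sensitivities, realized and expected factor changes) is hypothetical, chosen only to show how the three components add up to the total P&L.

```python
# Numerical sketch of the P&L decomposition in Eq. (11.38):
# dP = (r + OAS)*P*dt + dP/dx*(dx - E[dx]) + dP/dOAS*dOAS.
# All inputs are hypothetical and purely illustrative.
P       = 100.0            # market price
r, oas  = 0.05, 0.0010     # short-term rate and OAS
dt      = 0.5              # six months, in years
dPdx    = -450.0           # price sensitivity to the factor
dPdoas  = -48.0            # price sensitivity to the OAS
dx, Edx = 0.0040, 0.0015   # realized and expected factor change
doas    = -0.0005          # spread convergence of 5 bp

carry_roll_down = (r + oas) * P * dt
rates_pnl       = dPdx * (dx - Edx)
spread_pnl      = dPdoas * doas
total = carry_roll_down + rates_pnl + spread_pnl
print(f"carry {carry_roll_down:.3f}, rates {rates_pnl:.3f}, "
      f"spread {spread_pnl:.3f}, total {total:.3f}")
```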
11.8 REDUCING THE TIME STEP

To this point, this chapter has assumed that the time elapsed between dates of the tree is six months. The methodology outlined previously, however, can be easily adapted to any time step of Δt years. For monthly time steps, for example, Δt = 1/12 or .0833, and one-month rather than six-month interest rates appear on the tree. Furthermore, discounting must be done over the appropriate time interval. If the rate of term Δt is r, then discounting means dividing by 1 + rΔt. In the case of monthly time steps, discounting with a one-month rate of 5% means dividing by 1 + .05/12.



In practice, there are two reasons to choose time steps smaller than six months. First, a security or portfolio of securities rarely makes all of its payments in even six-month intervals from the starting date. Reducing the time step to a month, a week, or even a day can ensure that all cash flows are sufficiently close in time to some date in the tree. Second, assuming that the six-month rate can take on only two values in six months, three values in one year, and so on, produces a tree that is too coarse for many practical pricing problems. Reducing the step size can fill the tree with enough rates to price contingent claims with sufficient accuracy. Figure 11.1 illustrates this point by showing a relatively realistic-looking probability distribution of the six-month rate in six months from a tree with daily time steps, a drift of zero, and a horizon standard deviation of 65 basis points.

[Figure 11.1 Sample probability distribution of the six-month rate in six months with daily time steps. Probability is plotted against rates ranging from 3.00% to 7.00%.]

While smaller time steps generate more realistic interest rate distributions, it is not the case that smaller time steps are always desirable. First, the greater the number of computations in pricing a security, the more attention must be paid to numerical issues like round-off error. Second, since decreasing the time step increases computation time, practitioners requiring quick results cannot make the time step too small. Customers calling market makers in options on swaps, or swaptions, for example, expect price quotations within minutes if not sooner. Hence, the time step in a model used to price swaptions must be consistent with the market maker's required response time.

The best choice of step size ultimately depends on the problem at hand. When pricing a 30-year callable bond, for example, a model with monthly time steps may provide a realistic enough interest rate distribution to generate reliable prices. The same monthly steps, however, will certainly be inadequate to price a one-month bond option: that tree would imply only two possible rates on the option expiration date.

While the trees in this chapter assume that the step size is the same throughout the tree, this need not be the case. Sophisticated implementations of trees allow the step size to vary across dates in order to achieve a balance between realism and computational concerns.
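The distribution in Figure 11.1 can be approximated directly: with zero drift, the terminal rate of a binomial tree is a scaled binomial random variable. The sketch below assumes 125 daily steps and an illustrative 5% starting rate; both parameters are ours, chosen to echo the figure's setup.

```python
from math import comb, sqrt

# Terminal distribution of the six-month rate after daily steps with zero
# drift and a 65 bp horizon standard deviation (the setup of Figure 11.1).
n, r0, horizon_sd = 125, 0.05, 0.0065
step = horizon_sd / sqrt(n)          # per-step move so the sd scales to 65 bp

# After k up moves the rate is r0 + (2k - n)*step, with binomial probability
dist = {r0 + (2 * k - n) * step: comb(n, k) / 2**n for k in range(n + 1)}
mean = sum(r * p for r, p in dist.items())
var = sum((r - mean) ** 2 * p for r, p in dist.items())
print(f"mean {mean:.4%}, sd {sqrt(var) * 1e4:.1f} bp")   # 5.0000%, 65.0 bp
```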
11.9 FIXED INCOME VERSUS EQUITY DERIVATIVES

While the ideas behind pricing fixed income and equity derivatives are similar in many ways, there are important differences as well. In particular, it is worth describing why models created for the stock market cannot be adopted without modification for use in fixed income markets.

The famous Black-Scholes-Merton pricing analysis of stock options can be summarized as follows. Under the assumption that the stock price evolves according to a particular random process and that the short-term interest rate is constant, it is possible to form a portfolio of stocks and short-term bonds that replicates the payoffs of an option. Therefore, by arbitrage arguments, the price of the option must equal the known price of the replicating portfolio.

Say that an investor wants to price an option on a five-year bond by a direct application of this logic. The investor would have to begin by making an assumption about how the price of the five-year bond evolves over time. But this is considerably more complicated than making assumptions about how the price of a stock evolves over time. First, the price of a bond must converge to its face value at maturity, while the random process describing the stock price need not be constrained in any similar way. Second, because of the maturity constraint, the volatility of a bond's price must eventually get smaller as the bond approaches maturity. The simpler assumption that the volatility of a stock is constant is not so appropriate for bonds. Third, since stock volatility is very large relative to short-term rate volatility, it may be relatively harmless to assume that the short-term rate is constant. By contrast, it can be difficult to defend the assumption that a bond price follows some random process while the short-term interest rate is constant.5

5 Because these three objections are less important in the case of short-term options on long-term bonds, practitioners do use stock-like models in this fixed income context. Also, it is often sufficient to assume, somewhat more satisfactorily, that the relevant discount factor is uncorrelated with the price of the underlying bond.

These objections led researchers to make assumptions about the random evolution of the interest rate rather than of the bond price. In that way bond prices would naturally approach par, price volatilities would naturally approach zero, and the interest rate would not be assumed to be constant. But this approach raises another set of questions. Which interest rate is assumed to evolve in a particular way? Making assumptions about the 5-year rate over time is not particularly helpful for two reasons. First, 5-year coupon bond prices depend on shorter-term rates as well. Second, pricing an option on a 5-year bond requires assumptions about the bond's future possible prices. But knowing the 5-year rate over time is insufficient because, in a very short time, the option's underlying security will no longer be a 5-year bond. Therefore, one must often make assumptions about the evolution of the entire term structure of interest rates to price bond options and other derivatives. In the one-factor case described in this chapter it has been shown that modeling the evolution of the short-term rate is sufficient, combined with arbitrage arguments, to build a model of the entire term structure. In short, despite the enormous importance of the Black-Scholes-Merton analysis, the fixed income context does demand special attention.

Despite the conclusion of the previous paragraph, there are some contexts in which practitioners invoke simplifying assumptions so that the Black-Scholes-Merton models can be applied in place of more difficult-to-implement term structure models.


The Evolution of Short Rates and the Shape of the Term Structure

Learning Objectives

After completing this reading you should be able to:

• Explain the role of interest rate expectations in determining the shape of the term structure.
• Apply a risk-neutral interest rate tree to assess the effect of volatility on the shape of the term structure.
• Estimate the convexity effect using Jensen's inequality.
• Evaluate the impact of changes in maturity, yield, and volatility on the convexity of a security.
• Calculate the price and return of a zero-coupon bond incorporating a risk premium.

Excerpt is Chapter 8 of Fixed Income Securities: Tools for Today's Markets, Third Edition, by Bruce Tuckman and Angel Serrat.

This chapter presents a framework for understanding the shape of the term structure. In particular, it is shown how spot or forward rates are determined by expectations of future short-term rates, the volatility of short-term rates, and an interest rate risk premium. To conclude the chapter, this framework is applied to swap curves in the United States and Japan.

12.1 INTRODUCTION

From assumptions about the interest rate process for the short-term rate and from an initial term structure implied by market prices, Chapter 11 showed how to derive a risk-neutral process that can be used to price all fixed income securities by arbitrage. Models that follow this approach, i.e., models that take the initial term structure as given, are called arbitrage-free models. A different approach, however, is to start with assumptions about the interest rate process and about the risk premium demanded by the market for bearing interest rate risk and then derive the risk-neutral process. Models of this sort do not necessarily match the initial term structure and are called equilibrium models.1

1 This nomenclature is somewhat misleading. Equilibrium models, in the context of their assumptions, which do not include market prices for the initial term structure, are also arbitrage-free.

This chapter describes how assumptions about the interest rate process and about the risk premium determine the level and shape of the term structure. For equilibrium models, an understanding of the relationships between the model assumptions and the shape of the term structure is important in order to make reasonable assumptions in the first place. For arbitrage-free models, an understanding of these relationships reveals the assumptions implied by the market through the observed term structure.

Many economists might find this chapter remarkably narrow. An economist asked about the shape of the term structure would undoubtedly make reference to such macroeconomic factors as the marginal productivity of capital, the propensity to save, and expected inflation. The more modest goal of this chapter is to connect the dynamics of the short-term rate of interest and the risk premium with the shape of the term structure. While this goal does fall short of the answers that an economist might provide, it is more ambitious than the derivation of arbitrage restrictions on bond and derivative prices given underlying bond prices.

12.2 EXPECTATIONS

The word expectations implies uncertainty. Investors might expect the one-year rate to be 10% but know there is a good chance it will turn out to be 8% or 12%. For the purposes of this section alone, the text assumes away uncertainty so that the statement that investors expect or forecast a rate of 10% means that investors assume that the rate will be 10%. The sections to follow reintroduce uncertainty.

To highlight the role of interest rate forecasts in determining the shape of the term structure, consider the following simple example. The one-year interest rate is currently 10%, and all investors forecast that the one-year interest rate next year and the year after will also be 10%. In that case, investors will discount cash flows using forward rates of 10%. In particular, the prices of one-, two-, and three-year zero-coupon bonds per dollar face value (using annual compounding) will be

1 / 1.10 (12.1)

1 / ((1.10)(1.10)) = 1 / 1.10^2 (12.2)

1 / ((1.10)(1.10)(1.10)) = 1 / 1.10^3 (12.3)

From inspection of Equations (12.1) through (12.3), the term structure of spot rates in this example is flat at 10%. Very simply, investors are willing to lock in 10% for two or three years because they assume that the one-year rate will always be 10%.

Now assume that the one-year rate is still 10%, but that all investors forecast the one-year rate next year to be 12% and the one-year rate in two years to be 14%. In that case, the one-year spot rate is still 10%. The two-year spot rate, r(2), is such that

1 / ((1.10)(1.12)) = 1 / (1 + r(2))^2 (12.4)

Solving, r(2) = 10.995%. Similarly, the three-year spot rate, r(3), is such that

1 / ((1.10)(1.12)(1.14)) = 1 / (1 + r(3))^3 (12.5)

Solving, r(3) = 11.988%. Hence, the evolution of the one-year rate from 10% to 12% to 14% generates an upward-sloping term structure of spot rates: 10%, 10.995%, and 11.988%. In this case, investors require rates above 10% when locking up their money for two or three years because they assume one-year rates will be higher than 10%. No investor, for example, would buy a two-year zero at a yield of 10% when it is possible to buy a one-year zero at 10% and, when it matures, buy another one-year zero at 12%.
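The spot rates in Equations (12.4) and (12.5) follow mechanically from the forecasted one-year rates. A minimal Python sketch of that calculation, with variable names of our choosing:

```python
# Spot rates implied by forecasted one-year rates, Eqs. (12.4)-(12.5).
forwards = [0.10, 0.12, 0.14]    # expected one-year rates in years 0, 1, 2

df = 1.0
for n, f in enumerate(forwards, start=1):
    df /= 1 + f                  # discount along the forecasted path
    spot = df ** (-1 / n) - 1    # n-year spot rate
    print(f"r({n}) = {spot:.3%}")
# r(1) = 10.000%, r(2) = 10.995%, r(3) = 11.988%
```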

Finally, assume that the one-year rate is 10%, but that investors forecast that it will fall to 8% in one year and to 6% in two years. In that case, it is easy to show that the term structure of spot rates will be downward-sloping. In particular, r(1) = 10%, r(2) = 8.995%, and r(3) = 7.988%.

These simple examples reveal that expectations can cause the term structure to take on any of a myriad of shapes. Over short horizons, the financial community can have very specific views about future short-term rates. Over longer horizons, however, expectations cannot be so granular. It would be difficult, for example, to defend the position that the expectation for the one-year rate 29 years from now is substantially different from the expectation of the one-year rate 30 years from now. On the other hand, an argument can be made that the long-run expectation of the short-term rate is 5%: 3% due to the long-run real rate of interest and 2% due to long-run inflation. Hence, forecasts can be very useful in describing the shape and level of the term structure over short-term horizons and the level of rates at very long horizons. This conclusion has important implications for extracting expectations from observed interest rates (see the application at the end of this chapter) and for choosing among term structure models.

12.3 VOLATILITY AND CONVEXITY

This section drops the assumption that investors believe that their forecasts will be realized and assumes instead that investors understand the volatility around their expectations. To isolate the implications of volatility on the shape of the term structure, this section assumes that investors are risk-neutral so that they price securities by expected discounted value. The next section drops this assumption.

Assume that the following tree gives the true process for the one-year rate:

[Tree: 10% on date 0; 12% and 8% on date 1; 14%, 10%, and 6% on date 2, with probability .5 for each move.]

Note that the expected interest rate on date 1 is .5 × 8% + .5 × 12%, or 10%, and that the expected rate on date 2 is .25 × 14% + .5 × 10% + .25 × 6%, or 10%. In the previous section, with no volatility around expectations, flat expectations of 10% imply a flat term structure of spot rates. That is not the case in the presence of volatility.

The price of a one-year zero is, by definition, 1/1.10 or .909091, implying a one-year spot rate of 10%. Under the assumption of risk-neutrality, the price of a two-year zero may be calculated by discounting the terminal cash flow using the preceding interest rate tree:

[Price tree for the two-year zero: .826720 on date 0; .892857 (= 1/1.12) and .925926 (= 1/1.08) on date 1; 1 at maturity.]

Hence, the two-year spot rate is such that .82672 = (1 + r(2))^(-2), implying that r(2) = 9.982%.

Even though the one-year rate is 10% and the expected one-year rate in one year is 10%, the two-year spot rate is 9.982%. The 1.8-basis-point difference between the spot rate that would obtain in the absence of uncertainty, 10%, and the spot rate in the presence of volatility, 9.982%, is the effect of convexity on that spot rate. This convexity effect arises from the mathematical fact, a special case of Jensen's Inequality, that

E[1 / (1 + r)] > 1 / E[1 + r] = 1 / (1 + E[r]) (12.6)
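Before turning to the graphical interpretation of Equation (12.6), the convexity effect on the two-year spot rate can be confirmed in a couple of lines. The sketch below assumes the 200-basis-point tree just described.

```python
# Convexity check for the two-year zero: discounted expected value
# along the tree, then back out the two-year spot rate.
price = (0.5 / 1.12 + 0.5 / 1.08) / 1.10
spot2 = price ** -0.5 - 1
print(f"price {price:.6f}, two-year spot {spot2:.3%}")
# price 0.826720, two-year spot 9.982%
```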

[Figure 12.1 An illustration of convexity.]

Figure 12.1 graphically illustrates this equation. There are two possible values of r and, consequently, of the function 1/(1 + r) in the figure,2 shown as points A and D. The height or vertical-axis coordinate of point B is the average of these two function values. Under the assumption that the two possible values of r occur with equal probability, this average can be thought of as E[1/(1 + r)] in (12.6). And under the same assumption, the horizontal-axis coordinates of the points B and C can be thought of as E[r], so that the height of point C can be thought of as 1/(1 + E[r]). Clearly, the height of B is greater than that of C, or E[1/(1 + r)] > 1/(1 + E[r]). To summarize, Equation (12.6) is true because the pricing function of a zero-coupon bond, 1/(1 + r), is convex rather than concave.

2 The curve shown is actually a power of 1/(1 + r), i.e., the price of a longer-term zero-coupon bond, so that the curvature is more visible.

Returning to the example of this section, Equation (12.6) may be used to show why the two-year spot rate is less than 10%. The spot rate one year from now may be 12% or 8%. According to (12.6),

.5 × 1/1.12 + .5 × 1/1.08 > 1 / (.5 × 1.12 + .5 × 1.08) = 1/1.10 (12.7)

Dividing both sides by 1.10,

(1/1.10) × (.5 × 1/1.12 + .5 × 1/1.08) > 1/1.10^2 (12.8)

The left-hand side of (12.8) is the price of the two-year zero-coupon bond today. In words, then, Equation (12.8) says that the price of the two-year zero is greater than the result of discounting the terminal cash flow by 10% over the first period and by the expected rate of 10% over the second period. It follows immediately that the yield of the two-year zero, or the two-year spot rate, is less than 10%.

The tree presented at the start of this section may also be used to price a three-year zero. The resulting price tree is

[Price tree for the three-year zero: .752309 on date 0 and 1 at each maturity node.]

The three-year spot rate, such that .752309 = (1 + r(3))^(-3), is 9.952%. Therefore, the value of convexity in this spot rate is 10% - 9.952%, or 4.8 basis points, whereas the value of convexity in the two-year spot rate was only 1.8 basis points.

It is generally true that, all else equal, the value of convexity increases with maturity. This will become evident shortly. For now, suffice it to say that the convexity of the pricing function of a zero maturing in N years, (1 + r)^(-N), increases with N. In terms of Figure 12.1, the longer the maturity of the illustrated pricing function, the more convex the curve.

Securities with greater convexity perform better when yields change a lot and perform worse when yields do not change by much. The discussion in this section shows that convexity does, in fact, lower bond yields. The mathematical development in a later section ties these observations together by showing exactly how the advantages of convexity are offset by lower yields.

The previous section assumes no interest rate volatility and, consequently, yields are completely determined by forecasts. In this section, with the introduction of volatility, yield is reduced by the value of convexity. So it may be said that the value of convexity arises from volatility. Furthermore, the value of convexity increases with volatility. In the tree introduced at the start of the section, the standard deviation of rates is 200 basis points a year.3 Now consider a tree with a standard deviation of 400 basis points:

[Tree with 400-basis-point volatility: 10% on date 0; 14% and 6% on date 1; 18%, 10%, and 2% on date 2, with probability .5 for each move.]

3 Chapter 13 describes the computation of the standard deviation of rates implied by an interest rate tree.

The expected one-year rate in one year and in two years is still 10%. Spot rates and convexity values for this case may be derived along the same lines as before. Figure 12.2 graphs three term structures of spot rates: one with no volatility around the expectation of 10%; one with a volatility of 200 basis points a year (the tree of the first example); and one with a volatility of 400 basis points per year (the tree preceding this paragraph). Note that the value of convexity, measured by the distance between the rates assuming no volatility and the rates assuming volatility, increases with volatility. Figure 12.2 also illustrates that the value of convexity increases with maturity.

[Figure 12.2 Volatility and the shape of the term structure in three-date binomial models. Spot rates for volatilities of 0, 200, and 400 basis points over terms of one to three years.]

For very short terms and realistic levels of volatility, the value of convexity is quite small. But since simple examples must rely on short terms, convexity effects would hardly be discernible without raising volatility to unrealistic levels. Therefore, this section had to make use of unrealistically high volatility. The application at the end of this chapter uses realistic volatility levels to present typical convexity values.
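The three-year zero price of .752309 quoted above comes from backward induction on the same tree. A short sketch, with the tree hard-coded and names of our choosing:

```python
# Backward induction for the three-year zero on the 200 bp tree
# (10% -> 12%/8% -> 14%/10%/6%), with probabilities of 1/2.
rates = [[0.10], [0.12, 0.08], [0.14, 0.10, 0.06]]
values = [1.0] * 4                        # face value at maturity
for date in (2, 1, 0):
    values = [0.5 * (values[i] + values[i + 1]) / (1 + r)
              for i, r in enumerate(rates[date])]
price = values[0]
print(f"price {price:.5f}, three-year spot {price ** (-1/3) - 1:.3%}")
# price 0.75231 (the text's .752309), three-year spot 9.952%
```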

12.4 RISK PREMIUM

To illustrate the effect of risk premium on the term structure, consider again the second interest rate tree presented in the preceding section, with a volatility of 400 basis points per year. Risk-neutral investors would price a two-year zero by the following calculation:

.827541 = (.5 × 1/1.14 + .5 × 1/1.06) / 1.10 = .5 × (.877193 + .943396) / 1.10 (12.9)

By discounting the expected future price by 10%, Equation (12.9) implies that the expected return from owning the two-year zero over the next year is 10%. To verify this statement, calculate this expected return directly:

.5 × (.877193 - .827541) / .827541 + .5 × (.943396 - .827541) / .827541 = .5 × 6% + .5 × 14% = 10% (12.10)

Would investors really invest in this two-year zero offering an expected return of 10% over the next year? The return will, in fact, be either 6% or 14%. While these two returns do average to 10%, an investor could, instead, buy a one-year zero with a certain return of 10%. Presented with this choice, any risk-averse investor would prefer an investment with a certain return of 10% to an investment with a risky return that averages 10%. In other words, investors require compensation for bearing interest rate risk.4

4 This is an oversimplification. See the discussion at the end of the section.

Risk-averse investors demand a return higher than 10% for the two-year zero over the next year. This return can be effected by pricing the zero-coupon bond one year from now at less than the prices of 1/1.14 and 1/1.06. Equivalently, future cash flows could be discounted at rates higher than the possible rates of 14% and 6%. The next section shows that adding, for example, 20 basis points to each of these rates is equivalent to assuming that investors demand an extra 20 basis points for each year of duration risk. Assuming this is indeed the fair market risk premium, the price of the two-year zero would be computed as follows:

.826035 = (.5 × 1/1.142 + .5 × 1/1.062) / 1.10 (12.11)

The price in (12.11) is below the value obtained in (12.9), which assumes that investors are risk-neutral. Put another way, the increase in the discounting rates has increased the expected return of the two-year zero. In one year, if the interest rate is 14%, then the price of a one-year zero will be 1/1.14 or .877193. If the rate is 6%, then the price will be 1/1.06 or .943396. Therefore, the expected return of the two-year zero priced at .826035 is

(.5 × (.877193 + .943396) - .826035) / .826035 = 10.20% (12.12)

Hence, recalling that the one-year zero has a certain return of 10%, the risk-averse investors in this example demand 20 basis points in expected return to compensate them for the one year of duration risk inherent in the two-year zero.5

5 The reader should keep in mind that a two-year zero has one year of interest rate risk only in this stylized example: it has been assumed that rates can move only once a year. In reality, rates can move at any time, so a two-year zero has two full years of interest rate risk.

Continuing with the assumption that investors require 20 basis points for each year of duration risk, the three-year zero, with its approximately two years of duration risk,6 needs to offer an expected return 40 basis points above the one-year zero's certain return. The next section shows that this return can be effected by pricing the three-year zero as if rates next year are 20 basis points above their true values and as if rates the year after next are 40 basis points above their true values. To summarize, consider trees (a) and (b) below. If tree (a) depicts the actual or true interest rate process, then pricing with tree (b) provides investors with a risk premium of 20 basis points for each year of duration risk. If this risk premium is, in fact, embedded in market prices, then, by definition, tree (b) is the risk-neutral interest rate process.

6 A three-year zero has two years of interest rate risk only in this stylized example. See the previous footnote.

[Trees (a) and (b): tree (a) is the true process, with 10% on date 0, 14% and 6% on date 1, and 18%, 10%, and 2% on date 2; tree (b) adds 20 basis points to the date 1 rates (14.20% and 6.20%) and 40 basis points to the date 2 rates (18.40%, 10.40%, and 2.40%).]

The text now verifies that pricing the three-year zero with the risk-neutral process does offer an expected return of 10.4%, assuming that rates actually move according to the true process.

The price of the three-year zero can be computed by discounting using the risk-neutral tree:

[Price tree for the three-year zero under tree (b): .751184 on date 0 and 1 at each maturity node.]

To find the expected return of the three-year zero over the next year, proceed as follows. Two years from now the three-year zero will be a one-year zero with no interest rate risk.7 Therefore, its price will be determined by discounting at the actual interest rate at that time: 1/1.18 or .847458, 1/1.10 or .909091, and 1/1.02 or .980392. One year from now, however, the three-year zero will be a two-year zero with one year of duration risk. Therefore, its price at that time will be determined by using the risk-neutral rates of 14.20% and 6.20%. In particular, the two possible prices of the three-year zero in one year are

.769067 = .5 × (.847458 + .909091) / 1.142 (12.13)

and

.889587 = .5 × (.909091 + .980392) / 1.062 (12.14)

7 Once again, this is an artifact of this example in which rates change only once a year.

Finally, then, the expected return of the three-year zero over the next year is

(.5 × (.769067 + .889587) - .751184) / .751184 = 10.40% (12.15)

To summarize, in order to compensate investors for two years of duration risk, the return on the three-year zero is 40 basis points above a one-year zero's certain return of 10%.
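The same check works for the three-year zero: price it along tree (b), then compute its expected one-year return using the date 1 prices from Equations (12.13) and (12.14). A sketch with the chapter's rates hard-coded:

```python
# Verify Eqs. (12.13)-(12.15): price the three-year zero along tree (b),
# then check its expected return under the true process.
p0_leaves = [1 / 1.184, 1 / 1.104, 1 / 1.024]         # date 2, tree (b) rates
p0_mid = [0.5 * (p0_leaves[0] + p0_leaves[1]) / 1.142,
          0.5 * (p0_leaves[1] + p0_leaves[2]) / 1.062]
p0 = 0.5 * (p0_mid[0] + p0_mid[1]) / 1.10             # .751184

# Date 1 prices, Eqs. (12.13)-(12.14): true date 2 prices, rates + 20 bp
p1_up = 0.5 * (1 / 1.18 + 1 / 1.10) / 1.142           # .769067
p1_dn = 0.5 * (1 / 1.10 + 1 / 1.02) / 1.062           # .889587

exp_ret = (0.5 * (p1_up + p1_dn) - p0) / p0           # Eq. (12.15)
print(f"p0 = {p0:.6f}, expected return = {exp_ret:.2%}")   # 0.751184, 10.40%
```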

Continuing with the assumption of 400-basis-point volatility, Figure 12.3 graphs the term structure of spot rates for three cases: no risk premium, a risk premium of 20 basis points per year of duration risk, and a risk premium of 40 basis points. In the case of no risk premium, the term structure of spot rates is downward-sloping due to convexity. A risk premium of 20 basis points pushes up spot rates while convexity pulls them down. In the short end, the risk premium effect dominates and the term structure is mildly upward-sloping. In the long end, the convexity effect dominates and the term structure is mildly downward-sloping. The next section clarifies why risk premium tends to dominate in the short end while convexity tends to dominate in the long end. Finally, a risk premium as large as 40 basis points dominates the convexity effect and the term structure of spot rates is upward-sloping. The convexity effect is still evident, however, from the fact that the curve increases more rapidly from one to two years than from two to three years.

Just as the section on volatility uses unrealistically high levels of volatility to illustrate its effects, this section uses unrealistically high levels of the risk premium to illustrate its effects. The application at the end of this chapter focuses on reasonable magnitudes for the various effects in the context of the USD and JPY swap markets.

Before closing this section, a few remarks on the sources of an interest rate risk premium are in order. Asset pricing theory (e.g., the Capital Asset Pricing Model, or CAPM) teaches that assets whose returns are positively correlated with aggregate wealth or consumption will earn a risk premium. Consider, for example, a traded stock index. That asset will almost certainly do well if the economy is doing well and poorly if the economy is doing poorly. But investors, as a group, already have a lot of exposure to the economy. To entice them to hold a little more of the economy in the form of a traded stock index requires the payment of a risk premium; i.e., the index must offer an expected return greater than the risk-free rate of return. On the other hand, say that there exists an asset that is negatively correlated with the economy. Holdings in that asset allow investors to reduce their exposure to the economy. As a result, investors would accept an expected return on that asset below the risk-free rate of return. That asset, in other words, would have a negative risk premium.

[Figure 12.3 Volatility, risk premium, and the shape of the term structure in three-date binomial models. Spot rates for risk premiums of 0, 20, and 40 basis points per year of duration risk.]

This section assumes that bonds with interest rate risk earn a risk premium. In terms of asset pricing theory, this is equivalent to assuming that bond returns are positively correlated with the economy or, equivalently, that falling interest rates are associated with good times. One argument supporting this assumption is that interest rates fall when inflation and expected inflation fall and that low inflation is correlated with good times.

The concept of a risk premium in fixed income markets has probably gained favor more for its empirical usefulness than for its theoretical solidity. On average, over the past 75 years, the term structure of interest rates has sloped upward.8 While the market may from time to time expect that interest rates will rise, it is hard to believe that the market expects interest rates to rise on average. Therefore, expectations cannot explain a term structure of interest rates that, on average, slopes upward. Convexity, of course, leads to a downward-sloping term structure. Hence, of the three effects described in this chapter, only a positive risk premium can explain a term structure that, on average, slopes upward.

An uncomfortable fact, however, is that over earlier time periods the term structure has, on average, been flat.9 Whether this means that an interest rate risk premium is a relatively recent phenomenon that is here to stay, or that the experience of persistently upward-sloping curves is only partially due to a risk premium, is a question beyond the scope of this book. In short, the theoretical and empirical questions with respect to the existence of an interest rate risk premium have not been settled.

8 See, for example, Homer, S., and Richard Sylla, A History of Interest Rates, 3rd Edition, Revised, Rutgers University Press, 1996, pp. 394-409.

9 Ibid.

The Art of Term Structure Models: Drift

Learning Objectives
After completing this reading you should be able to:

• Construct and describe the effectiveness of a short-term interest rate tree assuming normally distributed rates, both with and without drift.
• Calculate the short-term rate change and standard deviation of the rate change using a model with normally distributed rates and no drift.
• Describe methods for addressing the possibility of negative short-term rates in term structure models.
• Construct a short-term rate tree under the Ho-Lee Model with time-dependent drift.
• Describe uses and benefits of the arbitrage-free models and assess the issue of fitting models to market prices.
• Describe the process of constructing a simple and recombining tree for a short-term rate under the Vasicek Model with mean reversion.
• Calculate the Vasicek Model rate change, standard deviation of the rate change, expected rate in T years, and half-life.
• Describe the effectiveness of the Vasicek Model.

Excerpt is Chapter 9 of Fixed Income Securities: Tools for Today's Markets, Third Edition, by Bruce Tuckman and Angel Serrat.

Chapters 11 and 12 show that assumptions about the true and risk-neutral short-term rate processes determine the term structure of interest rates and the prices of fixed income derivatives. The goal of this chapter is to describe the most common building blocks of short-term rate models. Selecting and rearranging these building blocks to create suitable models for the purpose at hand is the art of term structure modeling.

This chapter begins with an extremely simple model with no drift and normally distributed rates. The next sections add and discuss the implications of alternate specifications of the drift: a constant drift, a time-deterministic shift, and a mean-reverting drift.

13.1 MODEL 1: NORMALLY DISTRIBUTED RATES AND NO DRIFT

The particularly simple model of this section will be called Model 1. The continuously compounded, instantaneous rate rt is assumed to evolve according to the following equation:

dr = σdw (13.1)

The quantity dr denotes the change in the rate over a small time interval, dt, measured in years; σ denotes the annual basis-point volatility of rate changes; and dw denotes a normally distributed random variable with a mean of zero and a standard deviation of √dt.1

Say, for example, that the current value of the short-term rate is 6.18%, that volatility equals 113 basis points per year, and that the time interval under consideration is one month or 1/12 years. Mathematically, r0 = 6.18%; σ = 1.13%; and dt = 1/12. A month passes and the random variable dw, with its zero mean and its standard deviation of √(1/12) or .2887, happens to take on a value of .15. With these values, the change in the short-term rate given by (13.1) is

dr = 1.13% × .15 = .17% (13.2)

or 17 basis points. Since the short-term rate started at 6.18%, the short-term rate after a month is 6.35%.

Since the expected value of dw is zero, (13.1) says that the expected change in the rate, or the drift, is zero. Since the standard deviation of dw is √dt, the standard deviation of the change in the rate is σ√dt. For the sake of brevity, the standard deviation of the change in the rate will be referred to as simply the standard deviation of the rate. Continuing with the numerical example, the process (13.1) says that the drift is zero and that the standard deviation of the rate is σ√dt, which is 1.13% × √(1/12) = .326% or 32.6 basis points per month.

A rate tree may be used to approximate the process (13.1). A tree over dates 0 to 2, with probability ½ on each branch, takes the following form:

                    r0 + 2σ√dt
      r0 + σ√dt
r0                  r0
      r0 − σ√dt
                    r0 − 2σ√dt

In the case of the numerical example, substituting the sample values into the tree gives the following:

                    6.832%
      6.506%
6.180%              6.180%
      5.854%
                    5.528%

To understand why these trees are representations of the process (13.1), consider the transition from date 0 to date 1. The change in the interest rate in the up-state is σ√dt and the change in the down-state is −σ√dt. Therefore, with the probabilities given in the tree, the expected change in the rate, often denoted E[dr], is

E[dr] = .5 × σ√dt + .5 × (−σ√dt) = 0 (13.3)

The variance of the rate, often denoted V[dr], from date 0 to date 1 is computed as follows:

V[dr] = E[dr²] − {E[dr]}²
      = .5 × (σ√dt)² + .5 × (−σ√dt)² − 0
      = σ²dt (13.4)

Note that the first line of (13.4) follows from the definition of variance. Since the variance is σ²dt, the standard deviation, which is the square root of the variance, is σ√dt.

Equations (13.3) and (13.4) show that the drift and volatility implied by the tree match the drift and volatility of the interest rate process (13.1). The process and the tree are not identical because the random variable in the process, having a normal distribution, can take on any value while a single step in the tree leads to only two possible values. In the example, when dw takes on a value of .15, the short rate changes from 6.18% to 6.35%. In the tree, however, the only two possible rates are 6.506% and 5.854%. Nevertheless, as shown in Chapter 9, after a sufficient number of time steps the branches of the tree used to approximate the process (13.1) will be numerous enough to approximate a normal distribution.

1 It is beyond the mathematical scope of the text to explain why the random variable dw is denoted as a change. But the text uses this notation since it is the convention of the field.
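The claim that the tree matches the moments of process (13.1) can be verified numerically. Below is a minimal Python sketch using the parameter values of this example (r0 = 6.18%, σ = 1.13%, dt = 1/12); the variable names are hypothetical.

import math

r0, sigma, dt = 0.0618, 0.0113, 1 / 12
step = sigma * math.sqrt(dt)                      # 32.6 bp per month

# Date-1 and date-2 nodes of the tree approximating dr = sigma * dw.
date1 = [r0 + step, r0 - step]                    # 6.506%, 5.854%
date2 = [r0 + 2 * step, r0, r0 - 2 * step]        # 6.832%, 6.180%, 5.528%

# With up/down probabilities of .5, the tree matches the process moments.
e_dr = 0.5 * step + 0.5 * (-step)                 # E[dr] = 0, as in (13.3)
v_dr = 0.5 * step**2 + 0.5 * step**2 - e_dr**2    # sigma^2 * dt, as in (13.4)

print([f"{r:.3%}" for r in date1 + date2])
print(e_dr, math.isclose(v_dr, sigma**2 * dt))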

Figure 13.1 shows the distribution of short rates after one year, or the terminal distribution after one year, in Model 1 with r0 = 6.18% and σ = 1.13%. The tick marks on the horizontal axis are one standard deviation apart from one another.

Figure 13.1 Distribution of short rates after one year, Model 1.

Models in which the terminal distribution of interest rates has a normal distribution, like Model 1, are called normal or Gaussian models. One problem with these models is that the short-term rate can become negative. A negative short-term rate does not make much economic sense because people would never lend money at a negative rate when they can hold cash and earn a zero rate instead.2 The distribution in Figure 13.1, drawn to encompass three standard deviations above and below the mean, shows that over a horizon of one year the interest rate process will almost certainly not exhibit negative interest rates. The probability that the short-term rate in the process (13.1) becomes negative, however, increases with the horizon. Over 10 years, for example, the standard deviation of the terminal distribution in the numerical example is 1.13% × √10 or 3.573%. Starting with a short-term rate of 6.18%, a random negative shock of only 6.18%/3.573% or 1.73 standard deviations would push rates below zero.

The extent to which the possibility of negative rates makes a model unusable depends on the application. For securities whose value depends mostly on the average path of the interest rate, like coupon bonds, the possibility of negative rates typically does not rule out an otherwise desirable model. For securities that are asymmetrically sensitive to the probability of low interest rates, however, using a normal model could be dangerous. Consider the extreme example of a 10-year call option to buy a long-term coupon bond at a yield of 0%. The model of this section would assign that option much too high a value because the model assigns too much probability to negative rates.

The challenge of negative rates for term structure models is much more acute, of course, when the current level of rates is low, as it is at the time of this writing. Changing the distribution of interest rates is one solution. To take but one of many examples, lognormally distributed rates, as will be seen in Chapter 14, cannot become negative. As will become clear later in that chapter, however, building a model around a probability distribution that rules out negative rates or makes them less likely may result in volatilities that are unacceptable for the purpose at hand.

Another popular method of ruling out negative rates is to construct rate trees with whatever distribution is desired, as done in this section, and then simply set all negative rates to zero.3 In this methodology, rates in the original tree are called the shadow rates of interest while the rates in the adjusted tree could be called the observed rates of interest. When the observed rate hits zero, it can remain there for a while until the shadow rate crosses back to a positive rate. The economic justification for this framework is that the observed interest rate should be constrained to be positive only because investors have the alternative of investing in cash. But the shadow rate, the result of aggregate savings, investment, and consumption decisions, may very well be negative. Of course, the probability distribution of the observed interest rate is not the same as that of the originally postulated shadow rate. The change, however, is localized around zero and negative rates. By contrast, changing the form of the probability distribution changes dynamics across the entire range of rates.

Returning now to Model 1, the techniques of Chapter 11 may be used to price fixed coupon bonds. Figure 13.2 graphs the semiannually compounded par, spot, and forward rate curves for the numerical example along with data from U.S. dollar swap par rates. The initial value of the short-term rate in the example, 6.18%, is set so that the model and market 10-year, semiannually compounded par rates are equal at 6.086%. All of the other data points shown are quite different from their model values. The desirability of fitting market data exactly is discussed in its own section, but Figure 13.2 clearly demonstrates that the simple model of this section does not have enough flexibility to capture the simplest of term structure shapes.

2 Actually, the interest rate can be slightly negative if a security or bank account were safer or more convenient than holding cash.
3 Fischer Black, "Interest Rates as Options," Journal of Finance, Vol. 50, 1995, pp. 1371-1376.
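Because the terminal distribution of Model 1 is normal with mean r0 and standard deviation σ√T, the probability of negative rates at any horizon can be computed directly from the normal distribution function. A minimal sketch, using only the standard library:

import math

def prob_negative(r0, sigma, horizon):
    """P(short rate < 0) when the terminal distribution is normal
    with mean r0 and standard deviation sigma * sqrt(horizon)."""
    sd = sigma * math.sqrt(horizon)
    z = -r0 / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Model 1 example: r0 = 6.18%, sigma = 113 bp per year.
for horizon in (1, 10):
    print(horizon, f"{prob_negative(0.0618, 0.0113, horizon):.2%}")

At the 10-year horizon this gives roughly 4%, consistent with the 1.73 standard deviations computed above; at one year the probability is negligible.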



Figure 13.2 Rate curves from Model 1 and selected market swap rates, February 16, 2001. (Legend: Par; Spot; Forward; Market.)

The model term structure is downward-sloping. As the model has no drift, rates decline with term solely because of convexity. Table 13.1 shows the magnitudes of the convexity effect on par rates of selected terms.4 The numbers are realistic in the sense that a volatility of 113 basis points a year is reasonable. In fact, the volatility of the 10-year swap rate on the data date, as implied by options markets, was 113 basis points. The convexity numbers are not necessarily realistic, however, because, as this chapter will demonstrate, the magnitude of the convexity effect depends on the model and Model 1 is almost certainly not the best model of interest rate behavior.

Table 13.1 Convexity Effects on Par Rates in a Parameterization of Model 1

Term (years)    Convexity (bps)
 2                  -0.8
 5                  -5.1
10                 -18.8
30                -135.3

The term structure of volatility in Model 1 is constant at 113 basis points per year. In other words, the standard deviation of changes in the par rate of any maturity is 113 basis points per year. As shown in Figure 13.3, this implication fails to capture the implied volatility structure in the market. The volatility data in Figure 13.3 show that the term structure of volatility is humped, i.e., that volatility initially rises with term but eventually declines. As this shape is a feature of fixed income markets, it will be revisited again in this chapter and in Chapter 14.

Figure 13.3 Par rate volatility from Model 1 and selected implied volatilities, February 16, 2001. (Legend: Par Rate Volatility; Market Implied Volatility.)

The last aspect of this model to be analyzed is its factor structure. The model's only factor is the short-term rate. If this rate increases by 10 semiannually compounded basis points, how would the term structure change? In this simple model, the answer is that all rates would increase by 10 basis points. (See the closed-form solution for spot rates in Model 1 in the Appendix in Chapter 14.) Therefore, Model 1 is a model of parallel shifts.

13.2 MODEL 2: DRIFT AND RISK PREMIUM

The term structures implied by Model 1 always look like Figure 13.2: relatively flat for early terms and then downward sloping. Chapter 12 pointed out that the term structure tends to slope upward and that this behavior might be explained by the existence of a risk premium. The model of this section, to be called Model 2, adds a drift to Model 1, interpreted as a risk premium, in order to obtain a richer model in an economically coherent way.

The dynamics of the risk-neutral process in Model 2 are written as

dr = λdt + σdw (13.5)

The process (13.5) differs from that of Model 1 by adding a drift to the short-term rate equal to λdt. For this section, consider the values r0 = 5.138%, λ = .229%, and σ = 1.10%.

4 The convexity effect is the difference between the par yield in the model with its assumed volatility and the par yield in the same structural model but with a volatility of zero.

If the realization of the random variable dw is again .15 over a month, then the change in rate is

dr = .229% × (1/12) + 1.10% × .15 = .1841% (13.6)

Starting from 5.138%, the new rate is 5.322%.

The drift of the rate is .229% × (1/12) or 1.9 basis points per month, and the standard deviation is 1.10% × √(1/12) or 31.75 basis points per month. As discussed in Chapter 12, the drift in the risk-neutral process is a combination of the true expected change in the interest rate and of a risk premium. A drift of 1.9 basis points per month may arise because the market expects the short-term rate to increase by 1.9 basis points a month, because the short-term rate is expected to increase by one basis point with a risk premium of .9 basis points, or because the short-term rate is expected to fall by .1 basis points with a risk premium of two basis points.

The tree approximating this model is

                       r0 + 2λdt + 2σ√dt
      r0 + λdt + σ√dt
r0                     r0 + 2λdt
      r0 + λdt − σ√dt
                       r0 + 2λdt − 2σ√dt

It is easy to verify that the drift and standard deviation of the tree match those of the process (13.5).

The terminal distribution of the numerical example of this process after one year is normal with a mean of 5.138% + 1 × .229% or 5.367% and a standard deviation of 110 basis points. After 10 years, the terminal distribution is normal with a mean of 5.138% + 10 × .229% or 7.428% and a standard deviation of 1.10% × √10 or 347.9 basis points. Note that the constant drift, by raising the mean of the terminal distribution, makes it less likely that the risk-neutral process will exhibit negative rates.

Figure 13.4 shows the rate curves in this example along with par swap rate data. The values of r0 and λ are calibrated to match the 2- and 10-year par swap rates, while the value of σ is chosen to be the average implied volatility of the 2- and 10-year par rates. The results are satisfying in that the resulting curve can match the data much more closely than did the curve of Model 1 shown in Figure 13.2.

Figure 13.4 Rate curves from Model 2 and selected market swap rates, February 16, 2001. (Legend: Par; Spot; Forward; Market.)

Slightly unsatisfying is the relatively high value of λ required. Interpreted as a risk premium alone, a value of .229% with a volatility of 110 basis points implies a relatively high Sharpe ratio of about .21. On the other hand, interpreting λ as a combination of true drift and risk premium is difficult in the long end where, as argued in Chapter 12, it is difficult to make a case for rising expected rates. These interpretive difficulties arise because Model 2 is still not flexible enough to explain the shape of the term structure in an economically meaningful way. In fact, the use of r0 and λ to match the 2- and 10-year rates in this relatively inflexible model may explain why the model curve overshoots the 30-year par rate by about 25 basis points.

Moving from Model 1 with zero drift to Model 2 with a constant drift does not qualitatively change the term structure of volatility, the magnitude of convexity effects, or the parallel-shift nature of the model.

Models 1 and 2 would be called equilibrium models because no effort has been made to match the initial term structure closely. The next section presents a generalization of Model 2 that is in the class of arbitrage-free models.
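A minimal sketch of the terminal distributions just described, assuming the parameter values of this section; it reproduces the means and standard deviations at the 1- and 10-year horizons:

import math

r0, lam, sigma = 0.05138, 0.00229, 0.0110

for T in (1, 10):
    mean = r0 + lam * T        # 5.367% at 1 year, 7.428% at 10 years
    sd = sigma * math.sqrt(T)  # 110 bp at 1 year, 347.9 bp at 10 years
    print(T, f"{mean:.3%}", f"{sd:.2%}")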
13.3 THE HO-LEE MODEL: TIME-DEPENDENT DRIFT

The dynamics of the risk-neutral process in the Ho-Lee model are written as

dr = λt dt + σdw (13.7)

In contrast to Model 2, the drift here depends on time. In other words, the drift of the process may change from date to date. It might be an annualized drift of −20 basis points over the first month, of 20 basis points over the second month, and so on. A drift that varies with time is called a time-dependent drift. Just as with a constant drift, the time-dependent drift over each time period represents some combination of the risk premium and of expected changes in the short-term rate.



The flexibility of the Ho-Lee model is easily seen from its corresponding tree:

                        r0 + (λ1 + λ2)dt + 2σ√dt
      r0 + λ1dt + σ√dt
r0                      r0 + (λ1 + λ2)dt
      r0 + λ1dt − σ√dt
                        r0 + (λ1 + λ2)dt − 2σ√dt

The free parameters λ1 and λ2 may be used to match the prices of securities with fixed cash flows. The procedure may be described as follows. With dt = 1/12, set r0 equal to the one-month rate. Then find λ1 such that the model produces a two-month spot rate equal to that in the market. Then find λ2 such that the model produces a three-month spot rate equal to that in the market. Continue in this fashion until the tree ends. The procedure is very much like that used to construct the trees in Chapter 11. The only difference is that Chapter 11 adjusts the probabilities to match the spot rate curve while this section adjusts the rates. As it turns out, the two procedures are equivalent so long as the step size is small enough.
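The bootstrapping procedure just described can be sketched in a few lines of Python. The code below is illustrative only: the market inputs are hypothetical zero-coupon prices from a flat 6.2% curve, simple per-period compounding stands in for the text's conventions, bisection stands in for a production root-finder, and the function names (zero_price, calibrate) are the author's of this sketch, not the original text's.

import math

def zero_price(r0, lams, sigma, dt):
    """Price a zero maturing len(lams) + 1 steps out on a Ho-Lee tree
    with 50/50 probabilities, by backward induction."""
    n = len(lams) + 1
    drift = [0.0]
    for lam in lams:                               # cumulative drift by date
        drift.append(drift[-1] + lam * dt)
    values = [1.0] * (n + 1)                       # payoff of 1 at date n
    for i in range(n - 1, -1, -1):
        rates = [r0 + drift[i] + (i - 2 * j) * sigma * math.sqrt(dt)
                 for j in range(i + 1)]
        values = [0.5 * (values[j] + values[j + 1]) / (1 + rates[j] * dt)
                  for j in range(i + 1)]
    return values[0]

def calibrate(zero_prices, r0, sigma, dt):
    """Bootstrap lambda_1, lambda_2, ... so the model matches each
    successive zero price, adjusting rates rather than probabilities."""
    lams = []
    for target in zero_prices:
        lo, hi = -0.5, 0.5
        for _ in range(60):                        # bisection on one lambda
            mid = 0.5 * (lo + hi)
            if zero_price(r0, lams + [mid], sigma, dt) > target:
                lo = mid                           # price too high: raise drift
            else:
                hi = mid
        lams.append(0.5 * (lo + hi))
    return lams

dt, r0, sigma = 1 / 12, 0.0618, 0.0113
# Hypothetical market zero prices from a flat 6.2% curve.
targets = [math.exp(-0.062 * (k + 2) * dt) for k in range(10)]
print([f"{lam:.4%}" for lam in calibrate(targets, r0, sigma, dt)])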
priced fairly. For exam ple, when trading an option on a 10-year
The rate curves resulting from this model match all the rates bond, many practitioners assume that the 10-year bond is itself
that are input into the m odel. Ju st as adding a constant drift priced fairly. (An analysis of the fairness of the bond can always
to Model 1 to obtain Model 2 does not affect the shape of the be done separately.) Since arbitrage-free models match the
term structure of volatility nor the parallel-shift characteristic prices of many traded securities by construction, these models
of the m odel, adding a tim e-dependent drift does not change are ideal for the purpose of pricing derivatives given the prices
these features either. of underlying securities.

That a model matches m arket prices does not necessarily imply


that it provides fair values and accurate hedges for derivative
13.4 D ESIRABILITY O F FITTING TO
securities. The argum ent for fitting models to m arket prices is
TH E TERM STRU CTU RE that a good deal of information about the future behavior of
interest rates is incorporated into m arket prices, and, th ere­
The desirability of matching m arket prices is the central issue in
fore, a model fitted to those prices captures that interest rate
deciding between arbitrage-free and equilibrium m odels. Not
behavior. W hile this is a perfectly reasonable argum ent, two
surprisingly, the choice depends on the purpose of building the
warnings are appropriate. First, a m ediocre or bad model can­
model in the first place.
not be rescued by calibrating it to match m arket prices. If, for
O ne im portant use of arbitrage-free m odels is for quoting the exam ple, the parallel shift assumption is not a good enough
prices of securities that are not actively traded based on the description of reality for the application at hand, adding a time-
prices of more liquid securities. A custom er might ask a swap dependent drift to a parallel shift model so as to match a set
desk to quote a rate on a swap to a particular date, say three of m arket prices will not make the model any more suitable for
years and four months away, while liquid m arket prices might that application. Second, the argum ent for fitting to market
be observed only for three- and four-year swaps, or som etim es prices assumes that those m arket prices are fair in the context
only for two- and five-year swaps. In this situation, the swap of the m odel. In many situations, however, particular securities,

In many situations, however, particular securities, particular classes of securities, or particular maturity ranges of securities have been distorted due to supply and demand imbalances, taxes, liquidity differences, and other factors unrelated to interest rate models. In these cases, fitting to market prices will make a model worse by attributing these outside factors to the interest rate process. If, for example, a large bank liquidates its portfolio of bonds or swaps with approximately seven years to maturity and, in the process, depresses prices and raises rates around that maturity, it would be incorrect to assume that expectations of rates seven years in the future have risen. Being careful with the word fair, the seven-year securities in this example are fair in the sense that liquidity considerations at a particular time require their prices to be relatively low. The seven-year securities are not fair, however, with respect to the expected evolution of interest rates and the market risk premium. For this reason, in fact, investors and traders might buy these relatively cheap bonds or swaps and hold them past the liquidity event in the hope of selling at a profit.

Another way to express the problem of fitting the drift to the term structure is to recognize that the drift of a risk-neutral process arises only from expectations and risk premium. A model that assumes one drift from years 15 to 16 and another drift from years 16 to 17 implicitly assumes one of two things. First, the expectation today of the one-year rate in 15 years differs from the expectation today of the one-year rate in 16 years. Second, the risk premium in 15 years differs in a particular way from the risk premium in 16 years. Since neither of these assumptions is particularly plausible, a fitted drift that changes dramatically from one year to the next is likely to be erroneously attributing non-interest rate effects to the interest rate process.

If the purpose of a model is to value bonds or swaps relative to one another, then taking a large number of bond or swap prices as given is clearly inappropriate: arbitrage-free models, by construction, conclude that all of these bond or swap prices are fair relative to one another. Investors wanting to choose among securities, market makers looking to pick up value by strategically selecting hedging securities, or traders looking to profit from temporary mispricings must, therefore, rely on equilibrium models.

Having starkly contrasted arbitrage-free and equilibrium models, it should be noted that, in practice, there need not be a clear line between the two approaches. A model might posit a deterministic drift for a few years to reflect relatively short-term interest rate forecasts and posit a constant drift from then on. Another model might take the prices of 2-, 5-, 10- and 30-year bond or swap rates as given, thus assuming that the most liquid securities are fair while allowing the model to value other securities. The proper blending of the arbitrage-free and equilibrium approaches is an important part of the art of term structure modeling.

13.5 THE VASICEK MODEL: MEAN REVERSION

Assuming that the economy tends toward some equilibrium based on such fundamental factors as the productivity of capital, long-term monetary policy, and so on, short-term rates will be characterized by mean reversion. When the short-term rate is above its long-run equilibrium value, the drift is negative, driving the rate down toward this long-run value. When the rate is below its equilibrium value, the drift is positive, driving the rate up toward this value. In addition to being a reasonable assumption about short rates,5 mean reversion enables a model to capture several features of term structure behavior in an economically intuitive way.

The risk-neutral dynamics of the Vasicek model6 are written as

dr = k(θ − r)dt + σdw (13.8)

The constant θ denotes the long-run value or central tendency of the short-term rate in the risk-neutral process and the positive constant k denotes the speed of mean reversion. Note that in this specification, the greater the difference between r and θ, the greater the expected change in the short-term rate toward θ.

Because the process (13.8) is the risk-neutral process, the drift combines both interest rate expectations and risk premium. Furthermore, market prices do not depend on how the risk-neutral drift is divided between the two. Nevertheless, in order to understand whether or not the parameters of a model make sense, it is useful to make assumptions sufficient to separate the drift and the risk premium.

5 While reasonable, mean reversion is a strong assumption. Long time series of interest rates from relatively stable markets might display mean reversion because there happened to be no catastrophe over the time period, that is, precisely because a long time series exists. Hyperinflation, for example, is not consistent with mean reversion and results in the destruction of a currency and its associated interest rates. When mean reversion ends, the time series ends. In short, the most severe critics of mean reversion would say that interest rates mean revert until they don't.
6 O. Vasicek, "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 1977, pp. 177-188. It is appropriate to add that this paper started the literature on short-term rate models. The particular dynamics of the model described in this section, which is commonly known as the Vasicek model, is a very small part of the contribution of that paper.



Assuming, for example, that the true interest rate process exhibits mean reversion to a long-term value r∞ and, as assumed previously, that the risk premium enters into the risk-neutral process as a constant drift, the Vasicek model takes the following form:

dr = k(r∞ − r)dt + λdt + σdw
   = k(r∞ + λ/k − r)dt + σdw (13.9)

The process in (13.8) is identical to that in (13.9) so long as

θ = r∞ + λ/k (13.10)

Note that very many combinations of r∞ and λ give the same θ and, through the risk-neutral process (13.8), the same market prices.

For the purposes of this section, let k = .025, σ = 126 basis points per year, r∞ = 6.179%, and λ = .229%. According to (13.10), then, θ = 15.339%. With these parameters, the process (13.8) says that over the next month the expected change in the short rate is

.025 × (15.339% − 5.121%) × (1/12) = .0213% (13.11)

or 2.13 basis points. The volatility over the next month is 126 × √(1/12) or 36.4 basis points.

Representing this process with a tree is not quite so straightforward as the simpler processes described previously because the most obvious representation leads to a nonrecombining tree. Over the first time step,

5.121% + .025(15.339% − 5.121%)/12 + .0126/√12 = 5.5060%
5.121%
5.121% + .025(15.339% − 5.121%)/12 − .0126/√12 = 4.7786%

To extend the tree from date 1 to date 2, start from the up state of 5.5060%. The tree branching from there is

5.5060% + .025(15.339% − 5.5060%)/12 + .0126/√12 = 5.8902%
5.5060%
5.5060% + .025(15.339% − 5.5060%)/12 − .0126/√12 = 5.1628%

while the tree branching from the date 1 down-state of 4.7786% is

4.7786% + .025(15.339% − 4.7786%)/12 + .0126/√12 = 5.1643%
4.7786% + .025(15.339% − 4.7786%)/12 − .0126/√12 = 4.4369%

To summarize, the most straightforward tree representation of (13.8) takes the following form:

                 5.8902%
        5.5060%
                 5.1628%
5.121%
                 5.1643%
        4.7786%
                 4.4369%

This tree does not recombine since the drift increases with the difference between the short rate and θ. Since 4.7786% is further from θ than 5.5060%, the drift from 4.7786% is greater than the drift from 5.5060%. In this model, the volatility component of an up move followed by a down move does perfectly cancel the volatility component of a down move followed by an up move. But since the drift from 4.7786% is greater, the move up from 4.7786% produces a larger short-term rate than a move down from 5.5060%.

There are many ways to represent the Vasicek model with a recombining tree. One method is presented here, but it is beyond the scope of this book to discuss the numerical efficiency of the various possibilities.

The first time step of the tree may be taken as shown previously:

        5.5060%
5.121%
        4.7786%

Next, fix the center node of the tree on date 2. Since the expected perturbation due to volatility over each time step is zero, the drift alone determines the expected value of the process after each time step. After the first time step, the expected value is

5.121% + .025(15.339% − 5.121%)(1/12) = 5.1423% (13.12)

After the second time step, the expected value is

5.1423% + .025(15.339% − 5.1423%)(1/12) = 5.1635% (13.13)

Take this value as the center node on date 2 of the recombining tree:

        5.5060%   with probability p to r_uu and (1 − p) to 5.1635%
5.121%
        4.7786%   with probability q to 5.1635% and (1 − q) to r_dd
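A minimal Python sketch of the first steps of this construction, assuming the parameter values of this section (k = .025, θ = 15.339%, σ = 1.26%, r0 = 5.121%); it reproduces (13.11) through (13.13) and the date 1 nodes:

k, theta, sigma, r0, dt = 0.025, 0.15339, 0.0126, 0.05121, 1 / 12

drift0 = k * (theta - r0) * dt        # .0213%, as in (13.11)
vol = sigma * dt ** 0.5               # 36.4 bp per month

up = r0 + drift0 + vol                # 5.5060%
down = r0 + drift0 - vol              # 4.7786%

# Expected value of the process after one and two time steps,
# as in (13.12) and (13.13).
e1 = r0 + k * (theta - r0) * dt       # 5.1423%
e2 = e1 + k * (theta - e1) * dt       # 5.1635%, the date 2 center node
print(f"{up:.4%} {down:.4%} {e1:.4%} {e2:.4%}")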

The parts of the tree to be solved for, namely, the missing probabilities and interest rate values, are given variable names.

According to the process (13.8) and the parameter values set in this section, the expected rate and standard deviation of the rate from 5.5060% are, respectively,

5.5060% + .025(15.339% − 5.5060%)(1/12) = 5.5265% (13.14)

and

1.26% × √(1/12) = .3637% (13.15)

For the recombining tree to match this expectation and standard deviation, it must be the case that

p × r_uu + (1 − p) × 5.1635% = 5.5265% (13.16)

and, by the definition of standard deviation,

√(p(r_uu − 5.5265%)² + (1 − p)(5.1635% − 5.5265%)²) = .3637% (13.17)

Solving Equations (13.16) and (13.17), r_uu = 5.8909% and p = .4990.

The same procedure may be followed to compute r_dd and q. The expected rate from 4.7786% is

4.7786% + .025(15.339% − 4.7786%)(1/12) = 4.8006% (13.18)

and the standard deviation is again 36.37 basis points. Starting from 4.7786%, then, it must be the case that

q × 5.1635% + (1 − q) × r_dd = 4.8006% (13.19)

and

√(q(5.1635% − 4.8006%)² + (1 − q)(r_dd − 4.8006%)²) = .3637% (13.20)

Solving Equations (13.19) and (13.20), r_dd = 4.4361% and q = .5011.

Putting the results from the up- and downstates together, a recombining tree approximating the process (13.8) with the parameters of this section is

                 5.8909%
        5.5060%
5.121%           5.1635%
        4.7786%
                 4.4361%

To extend the tree to the next date, begin again at the center. From the center node of date 2, the expected rate of the process is

5.1635% + .025 × (15.339% − 5.1635%)(1/12) = 5.1847% (13.21)

As in constructing the tree for date 1, adding and subtracting the standard deviation of .3637% to the average value 5.1847% (obtaining 5.5484% and 4.8210%) and using probabilities of 50% for up and down movements satisfy the requirements of the process at the center of the tree:

        5.5484%
5.1635%
        4.8210%

The outer nodes of date 3, r_uuu and r_ddd, and their associated probabilities remain unknown. These parameters can be solved for in the same manner as described in building the tree on date 2.

The text now turns to the effects of mean reversion on the term structure. Figure 13.5 illustrates the impact of mean reversion on the terminal, risk-neutral distributions of the short rate at different horizons. The expectation or mean of the short-term rate as a function of horizon gradually rises from its current value of 5.121% toward its limiting value of θ = 15.339%. Because the mean-reverting parameter k = .025 is relatively small, the horizon expectation rises very slowly toward 15.339%. While mathematically beyond the scope of this book, it can be shown that the distance between the current value of a factor and its goal decays exponentially at the mean-reverting rate.

Figure 13.5 Mean reversion and the terminal distribution of short rates. (Legend: Theta; Mean; No MR ±1 sd; k = .025 ±1 sd.)
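Equations (13.16) and (13.17) can be solved without a numerical root-finder. Writing d = r_uu − 5.1635%, the mean condition gives p × d = 5.5265% − 5.1635%, and the variance condition reduces to p(1 − p)d² = (.3637%)², so (1 − p) × d follows by division. A minimal sketch:

c = 0.051635          # center node on date 2
m = 0.055265          # expected rate from 5.5060%, as in (13.14)
s = 0.003637          # standard deviation, as in (13.15)

pd = m - c            # p * d from the mean condition (13.16)
qd = s ** 2 / pd      # (1 - p) * d from the variance condition (13.17)
d = pd + qd
p = pd / d
r_uu = c + d
print(f"p = {p:.4f}, r_uu = {r_uu:.4%}")   # p = .4990, r_uu = 5.8909%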



Since the interest rate is currently 15.339% − 5.121% or 10.218% away from its goal, the distance between the expected rate at a 10-year horizon and the goal is

10.2180% × e^(−.025×10) = 7.9578% (13.22)

Therefore, the expectation of the rate in 10 years is 15.3390% − 7.9578% or 7.3812%.

For completeness, the expectation of the rate in the Vasicek model after T years is

r0e^(−kT) + θ(1 − e^(−kT)) (13.23)

In words, the expectation is a weighted average of the current short rate and its long-run value, where the weight on the current short rate decays exponentially at a speed determined by the mean-reverting parameter.

The mean-reverting parameter is not a particularly intuitive way of describing how long it takes a factor to revert to its long-term goal. A more intuitive quantity is the factor's half-life, defined as the time it takes the factor to progress half the distance toward its goal. In the example of this section, the half-life of the interest rate, τ, is given by the following equation:

(15.339% − 5.121%)e^(−.025τ) = ½(15.339% − 5.121%) (13.24)

Solving,

e^(−.025τ) = ½
τ = ln(2)/.025
τ = 27.73 (13.25)

where ln(•) is the natural logarithm function. In words, the interest rate factor takes 27.73 years to cover half the distance between its starting value and its goal. This can be seen visually in Figure 13.5 where the expected rate 30 years from now is about halfway between its current value and θ. Larger mean-reverting parameters produce shorter half-lives.

Figure 13.5 also shows one-standard deviation intervals around expectations both for the mean-reverting process of this section and for a process with the same expectation and the same σ but without mean reversion ("No MR"). The standard deviation of the terminal distribution of the short rate after T years in the Vasicek model is

σ√((1 − e^(−2kT))/(2k)) (13.26)

In the numerical example, with a mean-reverting parameter of .025 and a volatility of 126 basis points, the short rate in 10 years is normally distributed with an expected value of 7.3812%, derived earlier, and a standard deviation of

√((.0126²/(2 × .025))(1 − e^(−2×.025×10))) (13.27)

or 353 basis points. Using the same expected value and σ but no mean reversion the standard deviation is σ√T = 1.26%√10 or 398 basis points. Pulling the interest rate toward a long-term goal dampens volatility relative to processes without mean reversion, particularly at long horizons.

To avoid confusion in terminology, note that the mean-reverting model in this section sets volatility equal to 126 basis points "per year." Because of mean reversion, however, this does not mean that the standard deviation of the terminal distribution after T years increases with the square root of time. Without mean reversion, this is the case, as mentioned in the previous paragraph. With mean reversion, the standard deviation increases with horizon more slowly than that, producing a standard deviation of only 353 basis points after 10 years.

Figure 13.6 graphs the rate curves in this parameterization of the Vasicek model. The values of r0 and θ were calibrated to match the 2- and 10-year par rates in the market. As a result, Figure 13.6 qualitatively resembles Figure 13.4. The mean reversion parameter might have been used to make the model fit the observed term structure more closely, but, as discussed in the next paragraph, this parameter was used to produce a particular term structure of volatility. In conclusion, Figure 13.6 shows that the model as calibrated in this section is probably not flexible enough to produce the range of term structures observed in practice.

A model with mean reversion and a model without mean reversion result in dramatically different term structures of volatility.

Figure 13.6 Rate curves from the Vasicek model and selected market swap rates, February 16, 2001. (Legend: Par; Spot; Forward; Market.)
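A minimal sketch collecting the Vasicek horizon formulas of this section, (13.23), (13.25), and (13.26); the parameter values are those of the numerical example:

import math

k, theta, sigma, r0, T = 0.025, 0.15339, 0.0126, 0.05121, 10

# Expected rate after T years, as in (13.23).
expected = r0 * math.exp(-k * T) + theta * (1 - math.exp(-k * T))  # 7.3812%

# Half-life, as in (13.25).
half_life = math.log(2) / k                                        # 27.73 years

# Terminal standard deviation with and without mean reversion,
# as in (13.26) and (13.27).
sd_mr = sigma * math.sqrt((1 - math.exp(-2 * k * T)) / (2 * k))    # ~353 bp
sd_no_mr = sigma * math.sqrt(T)                                    # ~398 bp

print(f"{expected:.4%}", f"{half_life:.2f}", f"{sd_mr:.2%}", f"{sd_no_mr:.2%}")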

Figure 13.7 shows that the volatilities of par rates decline with term in the Vasicek model. In this example the mean reversion and volatility parameters are chosen to fit the implied 10- and 30-year volatilities. As a result, the model matches the market at those two terms but overstates the volatility for shorter terms. While Figure 13.7 certainly shows an improvement relative to the flat term structure of volatility shown in Figure 13.3, mean reversion in this model generates a term structure of volatility that slopes downward everywhere.

Figure 13.7 Par rate volatility from the Vasicek model and selected implied volatilities, February 16, 2001. (Legend: Par Rate Volatility; Market Implied Volatility.)

Since mean reversion lowers the volatility of longer-term par rates, it must also lower the impact of convexity on these rates. Table 13.2 reports the convexity effect at several terms. Recall that the convexity effects listed in Table 13.1 are generated from a model with no mean reversion and a volatility of 113 basis points per year. Since this section sets volatility equal to 126 basis points per year and since mean reversion is relatively slow, the convexity effects for terms up to 10 years are slightly larger in Table 13.2 than in Table 13.1. But by a term of 30 years the dampening effect of mean reversion on volatility manifests itself, and the convexity effect in the Vasicek model of about 75 basis points is substantially below the 135 basis points in the model without mean reversion.

Table 13.2 Convexity Effects on Par Rates in a Parameterization of the Vasicek Model

Term (years)    Convexity (bps)
 2                  -1.0
 5                  -5.8
10                 -19.1
30                 -74.7

Figure 13.8 shows the shape of the interest rate factor in a mean-reverting model, that is, how the spot rate curve is affected by a 10-basis-point increase in the short-term rate. By definition, short-term rates rise by about 10 basis points but longer-term rates are impacted less. The 30-year spot rate, for example, rises by only 7 basis points. Hence a model with mean reversion is not a parallel shift model.

Figure 13.8 Sensitivity of spot rates in the Vasicek model to a 10-basis-point change in the factor.

The implications of mean reversion for the term structure of volatility and factor shape may be better understood by reinterpreting the assumption that short rates tend toward a long-term goal. Assuming that short rates move as a result of some news or shock to the economic system, mean reversion implies that the effect of this shock eventually dissipates. After all, regardless of the shock, the short rate is assumed to arrive ultimately at the same long-term goal. Economic news is said to be long-lived if it changes the market's view of the economy many years in the future. For example, news of a technological innovation that raises productivity would be a relatively long-lived shock to the system. Economic news is said to be short-lived if it changes the market's view of the economy in the near but not far future. An example of this kind of shock might be news that retail sales were lower than expected due to excessively cold weather over the holiday season. In this interpretation, mean reversion measures the length of economic news in a term structure model. A very low mean reversion parameter, i.e., a very long half-life, implies that news is long-lived and that it will affect the short rate for many years to come. On the other hand, a very high mean reversion parameter, i.e., a very short half-life, implies that news is short-lived and that it affects the short rate for a relatively short period of time. In reality, of course, some news is short-lived while other news is long-lived, a feature captured by the multi-factor Gauss+ model.

Interpreting mean reversion as the length of economic news explains the factor structure and the downward-sloping term structure of volatility in the Vasicek model.
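For reference, in the Vasicek model the change in the T-year spot rate for a small change dr in the factor is given by the loading (1 − e^(−kT))/(kT) × dr, a closed form that this chapter does not derive; treat the following as a sketch under that assumption. With k = .025 it reproduces the roughly 7-basis-point move in the 30-year spot rate cited above.

import math

def spot_rate_shift(k, T, dr=0.0010):
    """Approximate change in the T-year spot rate for a 10 bp shock dr
    to the short-rate factor, using the Vasicek loading."""
    return dr * (1 - math.exp(-k * T)) / (k * T)

for T in (1, 10, 30):
    print(T, f"{spot_rate_shift(0.025, T) * 1e4:.1f} bp")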



Rates of every term are combinations of current economic conditions, as measured by the short-term rate, and of long-term economic conditions, as measured by the long-term value of the short rate (i.e., θ). In a model with no mean reversion, rates are determined exclusively by current economic conditions. Shocks to the short-term rate affect all rates equally, giving rise to parallel shifts and a flat term structure of volatility. In a model with mean reversion, short-term rates are determined mostly by current economic conditions while longer-term rates are determined mostly by long-term economic conditions. As a result, shocks to the short rate affect short-term rates more than longer-term rates and give rise to a downward-sloping term structure of volatility and a downward-sloping factor structure.

The Art of Term Structure Models: Volatility and Distribution

Learning Objectives
After completing this reading you should be able to:

• Describe the short-term rate process under a model with time-dependent volatility.
• Calculate the short-term rate change and determine the behavior of the standard deviation of the rate change using a model with time-dependent volatility.
• Assess the efficacy of time-dependent volatility models.
• Describe the short-term rate process under the Cox-Ingersoll-Ross (CIR) and lognormal models.
• Calculate the short-term rate change and describe the basis point volatility using the CIR and lognormal models.
• Describe lognormal models with deterministic drift and mean reversion.

Excerpt is Chapter 10 of Fixed Income Securities: Tools for Today's Markets, Third Edition, by Bruce Tuckman and Angel Serrat.
This chapter continues the presentation of the elements of term structure modeling, focusing on the volatility of interest rates and on models in which rates are not normally distributed.

14.1 TIME-DEPENDENT VOLATILITY: MODEL 3

Just as a time-dependent drift may be used to fit many bond or swap rates, a time-dependent volatility function may be used to fit many option prices. A particularly simple model with a time-dependent volatility function might be written as follows:

dr = λ(t)dt + σ(t)dw (14.1)

Unlike the models presented in Chapter 13, the volatility of the short rate in Equation (14.1) depends on time. If, for example, the function σ(t) were such that σ(1) = 1.26% and σ(2) = 1.20%, then the volatility of the short rate in one year is 126 basis points per year while the volatility of the short rate in two years is 120 basis points per year.

To illustrate the features of time-dependent volatility, consider the following special case of (14.1) that will be called Model 3:

dr = λ(t)dt + σe^(−αt)dw (14.2)

In (14.2), the volatility of the short rate starts at the constant σ and then exponentially declines to zero. Volatility could have easily been designed to decline to another constant instead of zero, but Model 3 serves its pedagogical purpose well enough.

Setting σ = 126 basis points and α = .025, Figure 14.1 graphs the standard deviation of the terminal distribution of the short rate at various horizons.1 Note that the standard deviation rises rapidly with horizon at first but then rises more slowly. The particular shape of the curve depends, of course, on the volatility function chosen for (14.2), but very many shapes are possible with the more general volatility specification in (14.1).

Deterministic volatility functions are popular, particularly among market makers in interest rate options. Consider the example of caplets. At expiration, a caplet pays the difference between the short rate and a strike, if positive, on some notional amount. Furthermore, the value of a caplet depends on the distribution of the short rate at the caplet's expiration. Therefore, the flexibility of the deterministic functions λ(t) and σ(t) may be used to match the market prices of caplets expiring on many different dates.

Figure 14.1 Standard deviation of terminal distributions of short rates in Model 3.

The behavior of standard deviation as a function of horizon in Figure 14.1 resembles the impact of mean reversion on horizon standard deviation in Figure 13.5. In fact, setting the initial volatility and decay rate in Model 3 equal to the volatility and mean reversion rate of the numerical example of the Vasicek model, the standard deviations of the terminal distributions from the two models turn out to be identical. Furthermore, if the time-dependent drift in Model 3 matches the average path of rates in the numerical example of the Vasicek model, then the two models produce exactly the same terminal distributions.

While these parameterizations of the two models give equivalent terminal distributions, the models remain very different in other ways. As is the case for any model without mean reversion, Model 3 is a parallel shift model. Also, the term structure of volatility in Model 3 is flat. Since the volatility in Model 3 changes over time, the term structure of volatility is flat at levels that change over time, but it is still always flat.

The arguments for and against using time-dependent volatility resemble those for and against using a time-dependent drift. If the purpose of the model is to quote fixed income options prices that are not easily observable, then a model with time-dependent volatility provides a means of interpolating from known to unknown option prices. If, however, the purpose of the model is to value and hedge fixed income securities, including options, then a model with mean reversion might be preferred for two reasons.

First, while mean reversion is based on the economic intuitions outlined earlier, time-dependent volatility relies on the difficult argument that the market has a forecast of short-term volatility in the distant future. A modification of the model that addresses this objection, by the way, is to assume that volatility depends on time in the near future and then settles at a constant.

Second, the downward-sloping factor structure and term structure of volatility in mean-reverting models capture the behavior of interest rate movements better than parallel shifts and a flat term structure of volatility.

1 This result is presented without derivation.
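The equivalence of the terminal standard deviations in Model 3 and the Vasicek example, stated above without derivation, can be checked numerically: the terminal variance in Model 3 is the integral of σ²e^(−2αt) from 0 to T, which matches the Vasicek expression (13.26) when α equals the mean reversion rate k. A minimal sketch:

import math

sigma, alpha, T = 0.0126, 0.025, 10

# Terminal variance: integrate sigma(t)^2 = sigma^2 * exp(-2*alpha*t) numerically.
n = 100_000
dt = T / n
var_numeric = sum(sigma ** 2 * math.exp(-2 * alpha * (i + 0.5) * dt) * dt
                  for i in range(n))

# Closed form, identical to the Vasicek expression (13.26) with k = alpha.
var_closed = sigma ** 2 * (1 - math.exp(-2 * alpha * T)) / (2 * alpha)

print(f"{math.sqrt(var_numeric):.4%} {math.sqrt(var_closed):.4%}")  # both ~3.53%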

It may very well be that the Vasicek model does not capture the behavior of interest rates sufficiently well to be used for a particular valuation or hedging purpose. But in that case it is unlikely that a parallel shift model calibrated to match caplet prices will be better suited for that purpose.

14.2 THE COX-INGERSOLL-ROSS AND LOGNORMAL MODELS: VOLATILITY AS A FUNCTION OF THE SHORT RATE

The models presented so far assume that the basis-point volatility of the short rate is independent of the level of the short rate. This is almost certainly not true at extreme levels of the short rate. Periods of high inflation and high short-term interest rates are inherently unstable and, as a result, the basis-point volatility of the short rate tends to be high. Also, when the short-term rate is very low, its basis-point volatility is limited by the fact that interest rates cannot decline much below zero.

Economic arguments of this sort have led to specifying the basis-point volatility of the short rate as an increasing function of the short rate. The risk-neutral dynamics of the Cox-Ingersoll-Ross (CIR) model are

dr = k(θ − r)dt + σ√r dw (14.3)

Since the first term on the right-hand side of (14.3) is not a random variable and since the standard deviation of dw equals √dt by definition, the annualized standard deviation of dr (i.e., the basis-point volatility) is proportional to the square root of the rate. Put another way, in the CIR model the parameter σ is constant, but basis-point volatility is not: annualized basis-point volatility equals σ√r and increases with the level of the short rate.

Another popular specification is that the basis-point volatility is proportional to rate. In this case the parameter σ is often called yield volatility. Two examples of this volatility specification are the Courtadon model,

dr = k(θ − r)dt + σr dw (14.4)

and the simplest lognormal model, to be called Model 4, a variation of which will be discussed in the next section:

dr = ar dt + σr dw (14.5)

In these two specifications, yield volatility is constant but basis-point volatility equals σr and increases with the level of the rate.

Figure 14.2 graphs the basis-point volatility as a function of rate for the cases of the constant, square root, and proportional specifications. For comparison purposes, σ is set in all three cases such that basis-point volatility equals 100 at a short rate of 8%. Mathematically,

σ_bp = .01 (14.6)
σ_CIR × √8% = 1% ⟹ σ_CIR = .0354 (14.7)
σ_y × 8% = 1% ⟹ σ_y = 12.5% (14.8)

Note that the units of these volatility measures are somewhat different. Basis-point volatility is in the units of an interest rate (e.g., 100 basis points), while yield volatility is expressed as a percentage of the short rate (e.g., 12.5%).

Figure 14.2 Three volatility specifications. (Legend: Constant; Square Root; Proportional.)

As shown in Figure 14.2, the CIR and proportional volatility specifications have basis-point volatility increasing with rate but at different speeds. Both models have the basis-point volatility equal to zero at a rate of zero.

The property that basis-point volatility equals zero when the short rate is zero, combined with the condition that the drift is positive when the rate is zero, guarantees that the short rate cannot become negative. In some respects this is an improvement over models with constant basis-point volatility that allow interest rates to become negative. It should be noted again, however, that choosing a model depends on the purpose at hand. Consider a trader who believes the following. One, the assumption of constant volatility is best in the current economic environment. Two, the possibility of negative rates has a small impact on the pricing of the securities under consideration. And three, the computational simplicity of constant volatility models has great value. This trader might very well opt for a model that allows some probability of negative rates.

Figure 14.3 graphs terminal distributions of the short rate after 10 years under the CIR, normal, and lognormal volatility specifications.
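A minimal sketch of the calibrations (14.6) through (14.8), printing the basis-point volatility of each specification at several rate levels, as in Figure 14.2:

# Set each sigma so that basis-point volatility is 100 bp at a short rate of 8%.
r = 0.08
sigma_bp = 0.01               # constant: 100 bp everywhere
sigma_cir = 0.01 / r ** 0.5   # .0354, since sigma_cir * sqrt(r) = 1%
sigma_y = 0.01 / r            # 12.5%, since sigma_y * r = 1%

for rate in (0.0, 0.04, 0.08, 0.12):
    bp_const = sigma_bp                  # constant specification
    bp_cir = sigma_cir * rate ** 0.5     # square root specification
    bp_prop = sigma_y * rate             # proportional specification
    print(f"{rate:.0%}: {bp_const:.2%} {bp_cir:.2%} {bp_prop:.2%}")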

shape of the three distributions, the parameters have been chosen so that all of the distributions have an expected value of 5% and a standard deviation of 2.32%. The figure illustrates the advantage of the CIR and lognormal models with respect to not allowing negative rates. The figure also indicates that out-of-the-money option prices could differ significantly under the three models. Even if, as in this case, the mean and volatility of the three distributions are the same, the probabilities of outcomes away from the means are different enough to generate significantly different options prices. More generally, the shape of the distribution used in an interest rate model is an important determinant of that model's performance.

Figure 14.3 Terminal distributions of the short rate after ten years in CIR, normal, and lognormal models.

14.3 TREE FOR THE ORIGINAL SALOMON BROTHERS MODEL

This section shows how to construct a binomial tree to approximate the dynamics for a lognormal model with a deterministic drift, a model attributed here to researchers at Salomon Brothers in the '80s. The dynamics of the model are as follows:

dr = ã(t)r dt + σr dw   (14.9)

By Ito's Lemma, which is beyond the mathematical scope of this book,

d[ln(r)] = dr/r − (1/2)σ² dt   (14.10)

Substituting (14.9) into (14.10),

d[ln(r)] = (ã(t) − (1/2)σ²)dt + σ dw   (14.11)

Redefining the notation of the time-dependent drift so that a(t) = ã(t) − (1/2)σ², Equation (14.11) becomes

d[ln(r)] = a(t)dt + σ dw   (14.12)

Equation (14.12) says that the natural logarithm of the short rate is normally distributed. Furthermore, by definition, a random variable has a lognormal distribution if its natural logarithm has a normal distribution. Therefore, (14.12) implies that the short rate has a lognormal distribution.

Equation (14.12) may be described as the Ho-Lee model based on the natural logarithm of the short rate instead of on the short rate itself. Adapting the tree for the Ho-Lee model accordingly, the tree for the first three dates is

date 0: ln r0
date 1: ln r0 + a1·dt + σ√dt and ln r0 + a1·dt − σ√dt
date 2: ln r0 + (a1 + a2)dt + 2σ√dt, ln r0 + (a1 + a2)dt, and ln r0 + (a1 + a2)dt − 2σ√dt

To express this tree in rate, as opposed to the natural logarithm of the rate, exponentiate each node. The date-2 nodes, for example, become

r0e^((a1 + a2)dt + 2σ√dt), r0e^((a1 + a2)dt), and r0e^((a1 + a2)dt − 2σ√dt)

This tree shows that the perturbations to the short rate in a lognormal model are multiplicative as opposed to the additive perturbations in normal models. This observation, in turn, reveals why the short rate in this model cannot become negative. Since e^x is positive for any value of x, so long as r0 is positive every node of the lognormal tree results in a positive rate.

The tree also reveals why volatility in a lognormal model is expressed as a percentage of the rate. Recall the mathematical fact that, for small values of x, e^x ≈ 1 + x. Setting a1 = 0 and dt = 1, for example, the top node of date 1 may be approximated as

r0e^σ ≈ r0(1 + σ)   (14.13)
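To make the multiplicative structure concrete, the following minimal Python sketch (an illustration added here, not part of the original text) generates the rates at each date of the tree just described:

```python
import math

def lognormal_tree(r0, drifts, sigma, dt):
    """Rates of the binomial tree for d[ln r] = a(t)dt + sigma dw.
    drifts = [a1, a2, ...]; the date-n nodes are
    r0 * exp(cumulative drift * dt + j * sigma * sqrt(dt)), j = n, n-2, ..., -n."""
    tree = [[r0]]
    for n in range(1, len(drifts) + 1):
        cum = sum(drifts[:n]) * dt
        tree.append([r0 * math.exp(cum + j * sigma * math.sqrt(dt))
                     for j in range(n, -n - 1, -2)])
    return tree

# With a1 = 0, dt = 1, and sigma = 12.5%, the top date-1 node is
# r0 * e^0.125, roughly r0 * (1 + 0.125), as in Equation (14.13).
print(lognormal_tree(0.05, [0.0, 0.01], 0.125, 1.0))
```

Because each node is r0 times a positive exponential, every rate in the output is positive, which is the point the text makes.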

Volatility is clearly a percentage of the rate in equation (14.13). If, for example, σ = 12.5%, then the short rate in the up-state is 12.5% above the initial short rate.

As in the Ho-Lee model, the constants that determine the drift (i.e., a1 and a2) may be used to match market bond prices.

14.4 THE BLACK-KARASINSKI MODEL: A LOGNORMAL MODEL WITH MEAN REVERSION

The final model to be presented in this chapter is a lognormal model with mean reversion called the Black-Karasinski model. The model allows volatility, mean reversion, and the central tendency of the short rate to depend on time, firmly placing the model in the arbitrage-free class. A user may, of course, use or remove as much time dependence as desired.

The dynamics of the model are written as

dr = k(t)(ln θ(t) − ln r)r dt + σ(t)r dw   (14.14)

or, equivalently,2

d[ln r] = k(t)(ln θ(t) − ln r)dt + σ(t)dw   (14.15)

2 This derivation is similar to that of moving from Equation (14.9) to Equation (14.12).

In words, Equation (14.15) says that the natural logarithm of the short rate is normally distributed. It reverts to ln θ(t) at a speed of k(t) with a volatility of σ(t). Viewed another way, the natural logarithm of the short rate follows a time-dependent version of the Vasicek model.

As in the previous section, the corresponding tree may be written in terms of the rate or the natural logarithm of the rate. Choosing the former, the process over the first date is

r0e^(k(1)(ln θ(1) − ln r0)dt + σ(1)√dt) = r1e^(σ(1)√dt)
r0e^(k(1)(ln θ(1) − ln r0)dt − σ(1)√dt) = r1e^(−σ(1)√dt)

The variable r1 is introduced for readability. The natural logarithms of the rates in the up and down-states are

ln r1 + σ(1)√dt   (14.16)

and

ln r1 − σ(1)√dt   (14.17)

respectively. It follows that the step down from the up-state requires a rate of

r1e^(σ(1)√dt) · e^(k(2)[ln θ(2) − {ln r1 + σ(1)√dt}]dt − σ(2)√dt)   (14.18)

while the step up from the down-state requires a rate of

r1e^(−σ(1)√dt) · e^(k(2)[ln θ(2) − {ln r1 − σ(1)√dt}]dt + σ(2)√dt)   (14.19)

A little algebra shows that the tree recombines only if

k(2) = (σ(1) − σ(2)) / (σ(1)dt)   (14.20)

Imposing the restriction (14.20) would require that the mean reversion speed be completely determined by the time-dependent volatility function. But these elements of a term structure model serve two distinct purposes. As demonstrated in this chapter, mean reversion controls the term structure of volatility while time-dependent volatility controls the future volatility of the short-term rate (and the prices of options that expire at different times). To create a model flexible enough to control mean reversion and time-dependent volatility separately, the model has to construct a recombining tree without imposing (14.20). To do so it allows the length of the time step, dt, to change over time.

Rewriting Equations (14.18) and (14.19) with the time steps labeled dt1 and dt2 gives the following values for the up-down and down-up rates:

r1e^(σ(1)√dt1) · e^(k(2)[ln θ(2) − {ln r1 + σ(1)√dt1}]dt2 − σ(2)√dt2)   (14.21)

r1e^(−σ(1)√dt1) · e^(k(2)[ln θ(2) − {ln r1 − σ(1)√dt1}]dt2 + σ(2)√dt2)   (14.22)

A little algebra now shows that the tree recombines if

k(2) = (σ(1)√dt1 − σ(2)√dt2) / (σ(1)√dt1 · dt2)   (14.23)

The length of the first time step can be set arbitrarily. The length of the second time step is set to satisfy (14.23), allowing the user freedom in choosing the mean reversion and volatility functions independently.
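A minimal sketch (added for illustration; the function name and example inputs are hypothetical) of how the second time step can be backed out of condition (14.23), treating it as a quadratic in √dt2:

```python
import math

def second_time_step(sigma1, sigma2, k2, dt1):
    """Solve (14.23), k(2) = (s1*sqrt(dt1) - s2*sqrt(dt2)) / (s1*sqrt(dt1)*dt2),
    for dt2. With x = sqrt(dt2) and A = s1*sqrt(dt1), this is
    k2*A*x**2 + s2*x - A = 0; the positive root is taken. Assumes k2 != 0."""
    A = sigma1 * math.sqrt(dt1)
    x = (-sigma2 + math.sqrt(sigma2**2 + 4.0 * k2 * A * A)) / (2.0 * k2 * A)
    return x * x

# Example: volatility falling from 20% to 18%, k(2) = 0.5, first step one month
print(second_time_step(0.20, 0.18, 0.5, 1.0 / 12))
```

Taking the positive root guarantees a positive time step, so the procedure can be iterated date by date.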

14.5 APPENDIX

Closed-Form Solutions for Spot Rates

This appendix lists formulas for spot rates, without derivation, in various models mentioned in the text. These can be useful for some applications and also to gain intuition about applying term structure models. The spot rates of term T, r(T), are continuously compounded rates.

Model 1

r(T) = r0 − σ²T²/6   (14.24)

Model 2

r(T) = r0 + λT/2 − σ²T²/6   (14.25)

Vasicek

r(T) = θ + (r0 − θ)·(1 − e^(−kT))/(kT) − (σ²/(2k²))·[1 − 2(1 − e^(−kT))/(kT) + (1 − e^(−2kT))/(2kT)]   (14.26)

Model 3

r(T) = r0 + λT/2 − σ²·(2a²T² − 2aT + 1 − e^(−2aT))/(8a³T)   (14.27)

Cox-Ingersoll-Ross

Let P(T) be the price of a zero-coupon bond maturing at time T (from which the spot rate can be easily calculated). Then,

P(T) = A(T)e^(−B(T)r0)   (14.28)

where

A(T) = [2h·e^((k + h)T/2) / (2h + (k + h)(e^(hT) − 1))]^(2kθ/σ²)   (14.29)

B(T) = 2(e^(hT) − 1) / (2h + (k + h)(e^(hT) − 1))   (14.30)

h = √(k² + 2σ²)   (14.31)
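For intuition, here is a small Python sketch (added here, not part of the original appendix) that evaluates the Cox-Ingersoll-Ross spot rate from Equations (14.28) through (14.31); the parameter values in the call are illustrative:

```python
import math

def cir_spot_rate(r0, k, theta, sigma, T):
    """Continuously compounded spot rate implied by the CIR bond price
    P(T) = A(T) * exp(-B(T) * r0)."""
    h = math.sqrt(k * k + 2.0 * sigma * sigma)              # Equation (14.31)
    denom = 2.0 * h + (k + h) * (math.exp(h * T) - 1.0)
    A = (2.0 * h * math.exp((k + h) * T / 2.0) / denom) ** (2.0 * k * theta / sigma**2)
    B = 2.0 * (math.exp(h * T) - 1.0) / denom               # Equation (14.30)
    return -math.log(A * math.exp(-B * r0)) / T

print(cir_spot_rate(r0=0.05, k=0.2, theta=0.06, sigma=0.07, T=10.0))
```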

Volatility Smiles
Learning Objectives
After completing this reading you should be able to:

Define volatility smile and volatility skew.

Explain the implications of put-call parity on the implied volatility of call and put options.

Compare the shape of the volatility smile (or skew) to the shape of the implied distribution of the underlying asset price and to the pricing of options on the underlying asset.

Describe characteristics of foreign exchange rate distributions and their implications on option prices and implied volatility.

Describe the volatility smile for equity options and foreign currency options and provide possible explanations for its shape.

Describe alternative ways of characterizing the volatility smile.

Describe volatility term structures and volatility surfaces and how they may be used to price options.

Explain the impact of the volatility smile on the calculation of the "Greeks".

Explain the impact of a single asset price jump on a volatility smile.

Excerpt is Chapter 20 of Options, Futures, and Other Derivatives, Tenth Edition, by John C. Hull.

How close are the market prices of options to those predicted by the Black-Scholes-Merton model? Do traders really use the Black-Scholes-Merton model when determining a price for an option? Are the probability distributions of asset prices really lognormal? This chapter answers these questions. It explains that traders do use the Black-Scholes-Merton model, but not in exactly the way that Black, Scholes, and Merton originally intended. This is because they allow the volatility used to price an option to depend on its strike price and time to maturity.

A plot of the implied volatility of an option with a certain life as a function of its strike price is known as a volatility smile. This chapter describes the volatility smiles that traders use in equity and foreign currency markets. It explains the relationship between a volatility smile and the risk-neutral probability distribution being assumed for the future asset price. It also discusses how option traders use volatility surfaces as pricing tools.

15.1 WHY THE VOLATILITY SMILE IS THE SAME FOR CALLS AND PUTS

This section shows that the implied volatility of a European call option is the same as that of a European put option when they have the same strike price and time to maturity. This means that the volatility smile for European calls with a certain maturity is the same as that for European puts with the same maturity. This is a particularly convenient result. It shows that when talking about a volatility smile we do not have to worry about whether the options are calls or puts.

As explained in earlier chapters, put-call parity provides a relationship between the prices of European call and put options when they have the same strike price and time to maturity. With a dividend yield on the underlying asset of q, the relationship is

p + S0e^(−qT) = c + Ke^(−rT)   (15.1)

As usual, c and p are the European call and put price. They have the same strike price, K, and time to maturity, T. The variable S0 is the price of the underlying asset today, and r is the risk-free interest rate for maturity T.

A key feature of the put-call parity relationship is that it is based on a relatively simple no-arbitrage argument. It does not require any assumption about the probability distribution of the asset price in the future. It is true both when the asset price distribution is lognormal and when it is not lognormal.

Suppose that, for a particular value of the volatility, p_BS and c_BS are the values of European put and call options calculated using the Black-Scholes-Merton model. Suppose further that p_mkt and c_mkt are the market values of these options. Because put-call parity holds for the Black-Scholes-Merton model, we must have

p_BS + S0e^(−qT) = c_BS + Ke^(−rT)

In the absence of arbitrage opportunities, put-call parity also holds for the market prices, so that

p_mkt + S0e^(−qT) = c_mkt + Ke^(−rT)

Subtracting these two equations, we get

p_BS − p_mkt = c_BS − c_mkt   (15.2)

This shows that the dollar pricing error when the Black-Scholes-Merton model is used to price a European put option should be exactly the same as the dollar pricing error when it is used to price a European call option with the same strike price and time to maturity.

Suppose that the implied volatility of the put option is 22%. This means that p_BS = p_mkt when a volatility of 22% is used in the Black-Scholes-Merton model. From equation (15.2), it follows that c_BS = c_mkt when this volatility is used. The implied volatility of the call is, therefore, also 22%. This argument shows that the implied volatility of a European call option is always the same as the implied volatility of a European put option when the two have the same strike price and maturity date. To put this another way, for a given strike price and maturity, the correct volatility to use in conjunction with the Black-Scholes-Merton model to price a European call should always be the same as that used to price a European put. This means that the volatility smile (i.e., the relationship between implied volatility and strike price for a particular maturity) is the same for European calls and European puts. More generally, it means that the volatility surface (i.e., the implied volatility as a function of strike price and time to maturity) is the same for European calls and European puts. These results are also true to a good approximation for American options.

Example 15.1

The value of a foreign currency is USD 0.60. The risk-free interest rate is 5% per annum in the United States and 10% per annum in the foreign country. The market price of a European call option on the foreign currency with a maturity of 1 year and a strike price of USD 0.59 is 0.0236. DerivaGem shows that the implied volatility of the call is 14.5%. For there to be no arbitrage, the put-call parity relationship in equation (15.1) must apply with q equal to the foreign risk-free rate. The price p of a European put option with a strike price of USD 0.59 and maturity of 1 year therefore satisfies

p + 0.60e^(−0.10×1) = 0.0236 + 0.59e^(−0.05×1)

so that p = 0.0419. DerivaGem shows that, when the put has this price, its implied volatility is also 14.5%. This is what we expect from the analysis just given.
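A quick numerical check of Example 15.1, using only the standard library (a sketch added here for illustration):

```python
import math

S0, K, T = 0.60, 0.59, 1.0
r, q = 0.05, 0.10            # q is the foreign risk-free rate
c_mkt = 0.0236

# Rearranging put-call parity (15.1): p = c + K*exp(-rT) - S0*exp(-qT)
p = c_mkt + K * math.exp(-r * T) - S0 * math.exp(-q * T)
print(round(p, 4))           # 0.0419, as in the example
```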

15.2 FOREIGN CURRENCY OPTIONS

The volatility smile used by traders to price foreign currency options has the general form shown in Figure 15.1. The implied volatility is relatively low for at-the-money options. It becomes progressively higher as an option moves either into the money or out of the money.

Figure 15.1 Volatility smile for foreign currency options (K = strike price, S0 = current exchange rate).

In the appendix at the end of this chapter, we show how to determine the risk-neutral probability distribution for an asset price at a future time from the volatility smile given by options maturing at that time. We refer to this as the implied distribution. The volatility smile in Figure 15.1 corresponds to the implied distribution shown by the solid line in Figure 15.2. A lognormal distribution with the same mean and standard deviation as the implied distribution is shown by the dashed line in Figure 15.2. It can be seen that the implied distribution has heavier tails than the lognormal distribution.1

1 This is known as kurtosis. Note that, in addition to having a heavier tail, the implied distribution is more "peaked." Both small and large movements in the exchange rate are more likely than with the lognormal distribution. Intermediate movements are less likely.

To see that Figures 15.1 and 15.2 are consistent with each other, consider first a deep-out-of-the-money call option with a high strike price of K2 (K2/S0 well above 1.0). This option pays off only if the exchange rate proves to be above K2. Figure 15.2 shows that the probability of this is higher for the implied probability distribution than for the lognormal distribution. We therefore expect the implied distribution to give a relatively high price for the option. A relatively high price leads to a relatively high implied volatility, and this is exactly what we observe in Figure 15.1 for the option. The two figures are therefore consistent with each other for high strike prices. Consider next a deep-out-of-the-money put option with a low strike price of K1 (K1/S0 well below 1.0). This option pays off only if the exchange rate proves to be below K1. Figure 15.2 shows that the probability of this is also higher for the implied probability distribution than for the lognormal distribution. We therefore expect the implied distribution to give a relatively high price, and a relatively high implied volatility, for this option as well. Again, this is exactly what we observe in Figure 15.1.

Empirical Results

We have just shown that the volatility smile used by traders for foreign currency options implies that they consider that the lognormal distribution understates the probability of extreme movements in exchange rates. To test whether they are right, Table 15.1 examines the daily movements in 10 different exchange rates over a 10-year period between 2005 and 2015. The exchange rates are those between the U.S. dollar and the following currencies: Australian dollar, British pound, Canadian dollar, Danish krone, euro, Japanese yen, Mexican peso, New Zealand dollar, Swedish krona, and Swiss franc. The first step in the production of the table is to calculate the standard deviation of daily percentage change in each exchange rate. The next stage is to note how often the actual percentage change exceeded 1 standard deviation, 2 standard deviations, and so on. The final stage is to calculate how often this would have happened if the percentage changes had been normally distributed. (The lognormal model implies that percentage changes are almost exactly normally distributed over a one-day time period.)

Table 15.1 Percentage of Days When Daily Exchange Rate Moves Are Greater than 1, 2, . . . , 6 Standard Deviations (SD = Standard Deviation of Daily Change)

         Real World    Lognormal Model
>1 SD    23.32         31.73
>2 SD    4.67          4.55
>3 SD    1.30          0.27
>4 SD    0.49          0.01
>5 SD    0.24          0.00
>6 SD    0.13          0.00

Daily changes exceed 3 standard deviations on 1.30% of days. The lognormal model predicts that this should happen on only 0.27% of days. Daily changes exceed 4, 5, and 6 standard deviations on 0.49%, 0.24%, and 0.13% of days, respectively. The lognormal model predicts that we should hardly ever observe this happening.
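The mechanics behind Table 15.1 can be replicated along the following lines (a sketch assuming numpy and scipy are available; `returns` is a hypothetical array holding one currency's daily percentage changes):

```python
import numpy as np
from scipy.stats import norm

def exceedance_percentages(returns, max_sd=6):
    """Compare how often |daily change| exceeds k standard deviations
    with the frequency a normal model predicts (cf. Table 15.1)."""
    sd = np.std(returns)
    for k in range(1, max_sd + 1):
        observed = 100.0 * np.mean(np.abs(returns) > k * sd)
        predicted = 100.0 * 2.0 * norm.sf(k)    # two-sided normal tail
        print(f">{k} SD: observed {observed:.2f}%, normal {predicted:.2f}%")
```

The normal column reproduces the 31.73%, 4.55%, 0.27%, . . . figures in the table exactly; the observed column depends on the data used.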



The table therefore provides evidence to support the existence of heavy tails (Figure 15.2) and the volatility smile used by traders (Figure 15.1). Business Snapshot 15.1 shows how you could have made money if you had done the analysis in Table 15.1 ahead of the rest of the market.

BUSINESS SNAPSHOT 15.1 Making Money from Foreign Currency Options

Black, Scholes, and Merton in their option pricing model assume that the underlying asset price has a lognormal distribution at future times. This is equivalent to the assumption that asset price changes over a short period of time, such as one day, are normally distributed. Suppose that most market participants are comfortable with the Black-Scholes-Merton assumptions for exchange rates. You have just done the analysis in Table 15.1 and know that the lognormal assumption is not a good one for exchange rates. What should you do?

The answer is that you should buy deep-out-of-the-money call and put options on a variety of different currencies and wait. These options will be relatively inexpensive and more of them will close in the money than the lognormal model predicts. The present value of your payoffs will on average be much greater than the cost of the options.

In the mid-1980s, a few traders knew about the heavy tails of foreign exchange probability distributions. Everyone else thought that the lognormal assumption of Black-Scholes-Merton was reasonable. The few traders who were well informed followed the strategy we have described, and made lots of money. By the late 1980s everyone realized that foreign currency options should be priced with a volatility smile and the trading opportunity disappeared.

Figure 15.2 Implied and lognormal distribution for foreign currency options.

Reasons for the Smile in Foreign Currency Options

Why are exchange rates not lognormally distributed? Two of the conditions for an asset price to have a lognormal distribution are:

1. The volatility of the asset is constant.
2. The price of the asset changes smoothly with no jumps.

In practice, neither of these conditions is satisfied for an exchange rate. The volatility of an exchange rate is far from constant, and exchange rates frequently exhibit jumps, sometimes in response to the actions of central banks. It turns out that both a nonconstant volatility and jumps will have the effect of making extreme outcomes more likely.

The impact of jumps and nonconstant volatility depends on the option maturity. As the maturity of the option is increased, the percentage impact of a nonconstant volatility on prices becomes more pronounced, but its percentage impact on implied volatility usually becomes less pronounced. The percentage impact of jumps on both prices and the implied volatility becomes less pronounced as the maturity of the option is increased.2 The result of all this is that the volatility smile becomes less pronounced as option maturity increases.

2 When we look at sufficiently long-dated options, jumps tend to get "averaged out," so that the exchange rate distribution when there are jumps is almost indistinguishable from the one obtained when the exchange rate changes smoothly.

15.3 EQUITY OPTIONS

Prior to the crash of 1987, there was no marked volatility smile for equity options. Since 1987, the volatility smile used by traders to price equity options (both on individual stocks and on stock indices) has had the general form shown in Figure 15.3. This is sometimes referred to as a volatility skew. The volatility decreases as the strike price increases. The volatility used to price a low-strike-price option (i.e., a deep-out-of-the-money put or a deep-in-the-money call) is significantly higher than that used to price a high-strike-price option (i.e., a deep-in-the-money put or a deep-out-of-the-money call).

The volatility smile for equity options corresponds to the implied probability distribution given by the solid line in Figure 15.4.

A lognormal distribution with the same mean and standard deviation as the implied distribution is shown by the dotted line. It can be seen that the implied distribution has a heavier left tail and a less heavy right tail than the lognormal distribution.

Figure 15.4 Implied distribution and lognormal distribution for equity options.

To see that Figures 15.3 and 15.4 are consistent with each other, we proceed as for Figures 15.1 and 15.2 and consider options that are deep out of the money. From Figure 15.4, a deep-out-of-the-money call with a strike price of K2 (K2/S0 well above 1.0) has a lower price when the implied distribution is used than when the lognormal distribution is used. This is because the option pays off only if the stock price proves to be above K2, and the probability of this is lower for the implied probability distribution than for the lognormal distribution. Therefore, we expect the implied distribution to give a relatively low price for the option. A relatively low price leads to a relatively low implied volatility, and this is exactly what we observe in Figure 15.3 for the option. Consider next a deep-out-of-the-money put option with a strike price of K1. This option pays off only if the stock price proves to be below K1 (K1/S0 well below 1.0). Figure 15.4 shows that the probability of this is higher for the implied probability distribution than for the lognormal distribution. We therefore expect the implied distribution to give a relatively high price, and a relatively high implied volatility, for this option. Again, this is exactly what we observe in Figure 15.3.

Figure 15.3 Volatility smile for equities (K = strike price, S0 = current equity price).

The Reason for the Smile in Equity Options

There is a negative correlation between equity prices and volatility. As prices move down (up), volatilities tend to move up (down). There are several possible reasons for this. One concerns leverage. As equity prices move down (up), leverage increases (decreases) and as a result volatility increases (decreases). Another is referred to as the volatility feedback effect. As volatility increases (decreases) because of external factors, investors require a higher (lower) return and as a result the stock price declines (increases). A further explanation is crashophobia (see Business Snapshot 15.2).

Whatever the reason for the negative correlation, it means that stock price declines are accompanied by increases in volatility, making even greater declines possible. Stock price increases are accompanied by decreases in volatility, making further stock price increases less likely. This explains the heavy left tail and thin right tail of the implied distribution in Figure 15.4.

BUSINESS SNAPSHOT 15.2 Crashophobia

It is interesting that the pattern in Figure 15.3 for equities has existed only since the stock market crash of October 1987. Prior to October 1987, implied volatilities were much less dependent on strike price. This has led Mark Rubinstein to suggest that one reason for the equity volatility smile may be "crashophobia." Traders are concerned about the possibility of another crash similar to October 1987, and they price options accordingly.

There is some empirical support for this explanation. Declines in the S&P 500 tend to be accompanied by a steepening of the volatility skew. When the S&P increases, the skew tends to become less steep.



15.4 ALTERNATIVE WAYS OF CHARACTERIZING THE VOLATILITY SMILE

There are a number of ways of characterizing the volatility smile. Sometimes it is shown as the relationship between implied volatility and strike price K. However, this relationship depends on the price of the asset. As the price of the asset increases (decreases), the central at-the-money strike price increases (decreases) so that the curve relating the implied volatility to the strike price moves to the right (left).3 For this reason the implied volatility is often plotted as a function of the strike price divided by the current asset price, K/S0. This is what we have done in Figures 15.1 and 15.3.

3 Research by Derman suggests that this adjustment is sometimes "sticky" in the case of exchange-traded options. See E. Derman, "Regimes of Volatility," Risk, April 1999: 55-59.

A refinement of this is to calculate the volatility smile as the relationship between the implied volatility and K/F0, where F0 is the forward price of the asset for a contract maturing at the same time as the options that are considered. Traders also often define an "at-the-money" option as an option where K = F0, not as an option where K = S0. The argument for this is that F0, not S0, is the expected stock price on the option's maturity date in a risk-neutral world.

Yet another approach to defining the volatility smile is as the relationship between the implied volatility and the delta of the option. This approach sometimes makes it possible to apply volatility smiles to options other than European and American calls and puts. When the approach is used, an at-the-money option is then defined as a call option with a delta of 0.5 or a put option with a delta of −0.5. These are referred to as "50-delta options."

15.5 THE VOLATILITY TERM STRUCTURE AND VOLATILITY SURFACES

Traders allow the implied volatility to depend on time to maturity as well as strike price. Implied volatility tends to be an increasing function of maturity when short-dated volatilities are historically low. This is because there is then an expectation that volatilities will increase. Similarly, volatility tends to be a decreasing function of maturity when short-dated volatilities are historically high. This is because there is then an expectation that volatilities will decrease.

Volatility surfaces combine volatility smiles with the volatility term structure to tabulate the volatilities appropriate for pricing an option with any strike price and any maturity. An example of a volatility surface that might be used for foreign currency options is given in Table 15.2.

Table 15.2 Volatility Surface

                         K/S0
          0.90   0.95   1.00   1.05   1.10
1 Month   14.2   13.0   12.0   13.1   14.5
3 Month   14.0   13.0   12.0   13.1   14.2
6 Month   14.1   13.3   12.5   13.4   14.3
1 Year    14.7   14.0   13.5   14.0   14.8
2 Year    15.0   14.4   14.0   14.5   15.1
5 Year    14.8   14.6   14.4   14.7   15.0

One dimension of Table 15.2 is K/S0; the other is time to maturity. The main body of the table shows implied volatilities calculated from the Black-Scholes-Merton model. At any given time, some of the entries in the table are likely to correspond to options for which reliable market data are available. The implied volatilities for these options are calculated directly from their market prices and entered into the table. The rest of the table is typically determined using interpolation. The table shows that the volatility smile becomes less pronounced as the option maturity increases. As mentioned earlier, this is what is observed for currency options. (It is also what is observed for options on most other assets.)

When a new option has to be valued, financial engineers look up the appropriate volatility in the table. For example, when valuing a 9-month option with a K/S0 ratio of 1.05, a financial engineer would interpolate between 13.4 and 14.0 in Table 15.2 to obtain a volatility of 13.7%. This is the volatility that would be used in the Black-Scholes-Merton formula or a binomial tree. When valuing a 1.5-year option with a K/S0 ratio of 0.925, a two-dimensional (bilinear) interpolation would be used to give an implied volatility of 14.525%.
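Both interpolations just described can be reproduced with a short bilinear-interpolation routine (a sketch added here; the grid is Table 15.2 with maturities expressed in years):

```python
import bisect

def bilinear_vol(t, m, times, moneyness, surface):
    """Bilinear interpolation in a volatility surface such as Table 15.2.
    times and moneyness are the grid axes; surface[i][j] is the vol (%)."""
    i = min(max(bisect.bisect_right(times, t) - 1, 0), len(times) - 2)
    j = min(max(bisect.bisect_right(moneyness, m) - 1, 0), len(moneyness) - 2)
    wt = (t - times[i]) / (times[i + 1] - times[i])
    wm = (m - moneyness[j]) / (moneyness[j + 1] - moneyness[j])
    top = surface[i][j] * (1 - wm) + surface[i][j + 1] * wm
    bot = surface[i + 1][j] * (1 - wm) + surface[i + 1][j + 1] * wm
    return top * (1 - wt) + bot * wt

times = [1 / 12, 0.25, 0.5, 1.0, 2.0, 5.0]
moneyness = [0.90, 0.95, 1.00, 1.05, 1.10]
surface = [[14.2, 13.0, 12.0, 13.1, 14.5],
           [14.0, 13.0, 12.0, 13.1, 14.2],
           [14.1, 13.3, 12.5, 13.4, 14.3],
           [14.7, 14.0, 13.5, 14.0, 14.8],
           [15.0, 14.4, 14.0, 14.5, 15.1],
           [14.8, 14.6, 14.4, 14.7, 15.0]]

print(bilinear_vol(0.75, 1.05, times, moneyness, surface))   # 13.7
print(bilinear_vol(1.5, 0.925, times, moneyness, surface))   # 14.525
```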
The shape of the volatility smile depends on the option maturity. As illustrated in Table 15.2, the smile tends to become less pronounced as the option maturity increases. Define T as the time to maturity and F0 as the forward price of the asset for a contract maturing at the same time as the option. Some financial engineers choose to define the volatility smile as the relationship between implied volatility and

(1/√T)·ln(K/F0)

rather than as the relationship between the implied volatility and K. The smile is then usually much less dependent on the time to maturity.

15.6 MINIMUM VARIANCE DELTA

The formulas for delta and other Greek letters assume that the implied volatility remains the same when the asset price changes. This is not what is usually expected to happen. Consider, for example, a stock or stock index option. The volatility smile has the shape shown in Figure 15.3. Two phenomena can be identified:

1. As the equity price increases (decreases), K/S0 decreases (increases) and the volatility increases (decreases). In other words, the option moves up the curve in Figure 15.3 when the equity price increases and down the curve when the equity price decreases.

2. There is a negative correlation between equity prices and their volatilities. When the equity price increases, the whole curve in Figure 15.3 tends to move down; when the equity price decreases, the whole curve in Figure 15.3 tends to move up.

It turns out that the second effect dominates the first, so that implied volatilities tend to move down (up) when the equity price moves up (down). The delta that takes this relationship between implied volatilities and equity prices into account is referred to as the minimum variance delta. It is:

Δ_MV = ∂f_BSM/∂S + (∂f_BSM/∂σ_imp)·(∂E(σ_imp)/∂S)

where f_BSM is the Black-Scholes-Merton price of the option, σ_imp is the option's implied volatility, and E(σ_imp) denotes the expectation of σ_imp as a function of the equity price, S. This gives

Δ_MV = Δ_BSM + ν_BSM·(∂E(σ_imp)/∂S)

where Δ_BSM and ν_BSM are the delta and vega calculated from the Black-Scholes-Merton (constant volatility) model. Because ν_BSM is positive and, as we have just explained, ∂E(σ_imp)/∂S is negative, the minimum variance delta is less than the Black-Scholes-Merton delta.4

4 For a further discussion of this, see, for example, J. C. Hull and A. White, "Optimal Delta Hedging of Options," Working paper, University of Toronto, 2016.
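A minimal numerical illustration of the minimum variance delta (all inputs below are hypothetical, chosen only to show the direction of the effect):

```python
delta_bsm = 0.52      # Black-Scholes-Merton delta of an equity option
vega_bsm = 12.0       # BSM vega (value change per unit of volatility)
slope = -0.004        # assumed dE(sigma_imp)/dS; negative for equities

delta_mv = delta_bsm + vega_bsm * slope
print(delta_mv)       # 0.472, below the BSM delta, as the text argues
```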
15.7 THE ROLE OF THE MODEL

How important is the option-pricing model if traders are prepared to use a different volatility for every option? It can be argued that the Black-Scholes-Merton model is no more than a sophisticated interpolation tool used by traders for ensuring that an option is priced consistently with the market prices of other actively traded options. If traders stopped using Black-Scholes-Merton and switched to another plausible model, then the volatility surface and the shape of the smile would change, but arguably the dollar prices quoted in the market would not change appreciably. Greek letters and therefore hedging strategies do depend on the model used. An unrealistic model is liable to lead to poor hedging.

Models have most effect on the pricing of derivatives when similar derivatives do not trade actively in the market. For example, the pricing of many of the nonstandard exotic derivatives is model-dependent.

15.8 WHEN A SINGLE LARGE JUMP IS ANTICIPATED

Let us now consider an example of how an unusual volatility smile might arise in equity markets. Suppose that a stock price is currently USD 50 and an important news announcement due in a few days is expected either to increase the stock price by USD 8 or to reduce it by USD 8. (This announcement could concern the outcome of a takeover attempt or the verdict in an important lawsuit.) The probability distribution of the stock price in, say, 1 month might then consist of a mixture of two lognormal distributions, the first corresponding to favorable news, the second to unfavorable news. The situation is illustrated in Figure 15.5. The solid line shows the mixture-of-lognormals distribution for the stock price in 1 month; the dashed line shows a lognormal distribution with the same mean and standard deviation as this distribution.

The true probability distribution is bimodal (certainly not lognormal). One easy way to investigate the general effect of a bimodal stock price distribution is to consider the extreme case where there are only two possible future stock prices. This is what we will now do.

Suppose that the stock price is currently USD 50 and that it is known that in 1 month it will be either USD 42 or USD 58. Suppose further that the risk-free rate is 12% per annum. The situation is illustrated in Figure 15.6. Options can be valued using the binomial model. In this case u = 1.16, d = 0.84, a = 1.0101, and p = 0.5314. The results from valuing a range of different options are shown in Table 15.3.



Figure 15.5 Effect of a single large jump. The solid line is the true distribution; the dashed line is the lognormal distribution.

Figure 15.6 Change in stock price in 1 month.

The first column shows alternative strike prices; the second column shows prices of 1-month European call options; the third column shows prices of 1-month European put options; the fourth column shows implied volatilities. (The implied volatility of a European put option is the same as that of a European call option when they have the same strike price and maturity.) Figure 15.7 displays the volatility smile from Table 15.3. It is actually a "frown" (the opposite of that observed for currencies) with volatilities declining as we move out of or into the money. The volatility implied from an option with a strike price of 50 will overprice an option with a strike price of 44 or 56.

Table 15.3 Implied Volatilities in Situation Where It Is Known that the Stock Price Will Move from USD 50 to Either USD 42 or USD 58

Strike Price (USD)   Call Price (USD)   Put Price (USD)   Implied Volatility (%)
42                   8.42               0.00              0.0
44                   7.37               0.93              58.8
46                   6.31               1.86              66.6
48                   5.26               2.78              69.5
50                   4.21               3.71              69.2
52                   3.16               4.64              66.1
54                   2.10               5.57              60.0
56                   1.05               6.50              49.0
58                   0.00               7.42              0.0
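The call and put prices in Table 15.3 can be checked with the one-step binomial parameters quoted above (a sketch; the implied-volatility column would additionally require inverting the Black-Scholes-Merton formula, for example with DerivaGem):

```python
import math

S0, r, T = 50.0, 0.12, 1.0 / 12
u, d = 58.0 / 50.0, 42.0 / 50.0
a = math.exp(r * T)                      # 1.0101
p = (a - d) / (u - d)                    # 0.5314, risk-neutral up probability
disc = math.exp(-r * T)

for K in range(42, 59, 2):
    call = disc * (p * max(S0 * u - K, 0) + (1 - p) * max(S0 * d - K, 0))
    put = disc * (p * max(K - S0 * u, 0) + (1 - p) * max(K - S0 * d, 0))
    print(K, round(call, 2), round(put, 2))   # e.g., 50 -> 4.21, 3.71
```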
SUMMARY

The Black-Scholes-Merton model and its extensions assume that the probability distribution of the underlying asset at any given future time is lognormal. This assumption is not the one made by traders. They assume the probability distribution of an equity price has a heavier left tail and a less heavy right tail than the lognormal distribution. They also assume that the probability distribution of an exchange rate has a heavier right tail and a heavier left tail than the lognormal distribution.

Traders use volatility smiles to allow for nonlognormality. The volatility smile defines the relationship between the implied volatility of an option and its strike price. For equity options, the volatility smile tends to be downward sloping. This means that out-of-the-money puts and in-the-money calls tend to have high implied volatilities whereas out-of-the-money calls and in-the-money puts tend to have low implied volatilities. For foreign currency options, the volatility smile is U-shaped. Both out-of-the-money and in-the-money options have higher implied volatilities than at-the-money options.

Often traders also use a volatility term structure. The implied volatility of an option then depends on its life. When volatility smiles and volatility term structures are combined, they produce

a volatility surface. This defines implied volatility as a function of both the strike price and the time to maturity.

APPENDIX

Determining Implied Risk-Neutral Distributions from Volatility Smiles

The price of a European call option on an asset with strike price K and maturity T is given by

c = e^(−rT) ∫ from S_T = K to ∞ of (S_T − K)g(S_T) dS_T

where r is the interest rate (assumed constant), S_T is the asset price at time T, and g is the risk-neutral probability density function of S_T. Differentiating once with respect to K gives

∂c/∂K = −e^(−rT) ∫ from S_T = K to ∞ of g(S_T) dS_T

Differentiating again with respect to K gives

∂²c/∂K² = e^(−rT)g(K)

This shows that the probability density function g is given by

g(K) = e^(rT)·∂²c/∂K²   (15A.1)

This result, which is from Breeden and Litzenberger (1978), allows risk-neutral probability distributions to be estimated from volatility smiles.5 Suppose that c1, c2, and c3 are the prices of T-year European call options with strike prices of K − δ, K, and K + δ, respectively. Assuming δ is small, an estimate of g(K), obtained by approximating the partial derivative in equation (15A.1), is

e^(rT)·(c1 + c3 − 2c2)/δ²

5 See D. T. Breeden and R. H. Litzenberger, "Prices of State-Contingent Claims Implicit in Option Prices," Journal of Business, 51 (1978), 621-51.

For another way of understanding this formula, suppose you set up a butterfly spread with strike prices K − δ, K, and K + δ, and maturity T. This means that you buy a call with strike price K − δ, buy a call with strike price K + δ, and sell two calls with strike price K. The value of your position is c1 + c3 − 2c2. The value of the position can also be calculated by integrating the payoff over the risk-neutral probability distribution, g(S_T), and discounting at the risk-free rate. The payoff is shown in Figure 15A.1. Since δ is small, we can assume that g(S_T) = g(K) in the whole of the range K − δ < S_T < K + δ, where the payoff is nonzero. The area under the "spike" in Figure 15A.1 is 0.5 × 2δ × δ = δ². The value of the payoff (when δ is small) is therefore e^(−rT)g(K)δ². It follows that

e^(−rT)g(K)δ² = c1 + c3 − 2c2

which leads directly to

g(K) = e^(rT)·(c1 + c3 − 2c2)/δ²   (15A.2)

Figure 15A.1 Payoff from butterfly spread.

Example 15A.1

Suppose that the price of a non-dividend-paying stock is USD 10, the risk-free interest rate is 3%, and the implied volatilities of 3-month European options with strike prices of USD 6, USD 7, USD 8, USD 9, USD 10, USD 11, USD 12, USD 13, USD 14 are 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, respectively. One way of applying the above results is as follows. Assume that g(S_T) is constant between S_T = 6 and S_T = 7, constant between S_T = 7 and S_T = 8, and so on. Define:

g(S_T) = g1 for 6 ≤ S_T < 7
g(S_T) = g2 for 7 ≤ S_T < 8
g(S_T) = g3 for 8 ≤ S_T < 9
g(S_T) = g4 for 9 ≤ S_T < 10
g(S_T) = g5 for 10 ≤ S_T < 11
g(S_T) = g6 for 11 ≤ S_T < 12
g(S_T) = g7 for 12 ≤ S_T < 13
g(S_T) = g8 for 13 ≤ S_T < 14

The value of g1 can be calculated by interpolating to get the implied volatility for a 3-month option with a strike price of USD 6.5 as 29.5%. This means that options with strike prices of USD 6, USD 6.5, and USD 7 have implied volatilities of 30%, 29.5%, and 29%, respectively. From DerivaGem their prices are USD 4.045, USD 3.549, and USD 3.055, respectively. Using equation (15A.2), with K = 6.5 and δ = 0.5, gives

g1 = e^(0.03×0.25)·(4.045 + 3.055 − 2 × 3.549)/0.5² = 0.0057

Similar calculations show that

g2 = 0.0444, g3 = 0.1545, g4 = 0.2781
g5 = 0.2813, g6 = 0.1659, g7 = 0.0573, g8 = 0.0113



Figure 15A.2 Implied probability distribution for Example 15A.1.

Figure 15A.2 displays the implied distribution. (Note that the area under the probability distribution is 0.9985. The probability that S_T < 6 or S_T > 14 is therefore 0.0015.) Although not obvious from Figure 15A.2, the implied distribution does have a heavier left tail and less heavy right tail than a lognormal distribution. For the lognormal distribution based on a single volatility of 26%, the probability of a stock price between USD 6 and USD 7 is 0.0031 (compared with 0.0057 in Figure 15A.2) and the probability of a stock price between USD 13 and USD 14 is 0.0167 (compared with 0.0113 in Figure 15A.2).
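The calculations in Example 15A.1 can be reproduced as follows (a sketch assuming scipy is available; Black-Scholes-Merton prices stand in for the DerivaGem values, so the g values should match up to rounding):

```python
import math
from scipy.stats import norm

S0, r, T, delta = 10.0, 0.03, 0.25, 0.5
vols = dict(zip(range(6, 15),
                [0.30, 0.29, 0.28, 0.27, 0.26, 0.25, 0.24, 0.23, 0.22]))

def bs_call(K, sigma):
    d1 = (math.log(S0 / K) + (r + sigma**2 / 2.0) * T) / (sigma * math.sqrt(T))
    return S0 * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d1 - sigma * math.sqrt(T))

def smile(K):
    # linear interpolation between the whole-dollar strikes above
    lo = int(math.floor(K))
    w = K - lo
    return vols[lo] if w == 0 else vols[lo] * (1 - w) + vols[lo + 1] * w

for i in range(8):                # g1 ... g8 at K = 6.5, 7.5, ..., 13.5
    K = 6.5 + i
    c1, c2, c3 = (bs_call(K + s, smile(K + s)) for s in (-delta, 0.0, delta))
    g = math.exp(r * T) * (c1 + c3 - 2.0 * c2) / delta**2   # Equation (15A.2)
    print(f"g{i + 1} = {g:.4f}")
```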

Fundamental Review of the Trading Book

Learning Objectives

After completing this reading you should be able to:

Describe the changes to the Basel framework for calculating market risk capital under the Fundamental Review of the Trading Book (FRTB), and the motivations for these changes.

Compare the various liquidity horizons proposed by the FRTB for different asset classes and explain how a bank can calculate its expected shortfall using the various horizons.

Explain the FRTB revisions to Basel regulations in the following areas:
■ Classification of positions in the trading book compared to the banking book
■ Backtesting, profit and loss attribution, credit risk, and securitizations

Excerpt is Chapter 18 of Risk Management and Financial Institutions, Fifth Edition, by John C. Hull.

Note: This chapter references the December 2014 proposal for the Fundamental Review of the Trading Book. The final version was published under the title "Minimum capital requirements for market risk" (Basel Committee on Banking Supervision Publication 352, January 2016). It is freely available on the GARP website.

In May 2012, the Basel Committee on Banking Supervision issued a consultative document proposing major revisions to the way regulatory capital for market risk is calculated.1 This is referred to as the "Fundamental Review of the Trading Book" (FRTB). The Basel Committee then followed its usual process of requesting comments from banks, revising the proposals, and carrying out Quantitative Impact Studies (QISs).2 The final version of the rules was published by the Basel Committee in January 2016.3 This required banks to implement the new rules in 2019, but in December 2017, the implementation year was revised to 2022.

1 See Bank for International Settlements, "Consultative Document: Fundamental Review of the Trading Book," May 2012.
2 QISs are calculations carried out by banks to estimate the impact of proposed regulatory changes on capital requirements.
3 See Bank for International Settlements, "Minimum Capital Requirements for Market Risk," January 2016.

FRTB's approach to determining capital for market risk is much more complex than the approaches previously used by regulators. The purpose of this chapter is to outline its main features.

16.1 BACKGROUND

The Basel I calculations of market risk capital were based on a value at risk (VaR) calculated for a 10-day horizon with a 99% confidence level. The VaR was "current" in the sense that calculations made on a particular day were based on the behavior of market variables during an immediately preceding period of time (typically, one to four years). Basel II.5 required banks to calculate a "stressed VaR" measure in addition to the current measure. This is VaR where calculations are based on the behavior of market variables during a 250-day period of stressed market conditions. To determine the stressed period, banks were required to go back through time searching for a 250-day period where the observed movements in market variables would lead to significant financial stress for the current portfolio.

FRTB changes the measure used for determining market risk capital. Instead of VaR with a 99% confidence level, it uses expected shortfall (ES) with a 97.5% confidence level. The measure is actually stressed ES with a 97.5% confidence. This means that, as in the case of stressed VaR, calculations are based on the way market variables have been observed to move during stressed market conditions.

For normal distributions, VaR with a 99% confidence and ES with a 97.5% confidence are almost exactly the same. Suppose losses have a normal distribution with a mean μ and standard deviation σ. The 99% VaR is μ + 2.326σ while the 97.5% expected shortfall is μ + 2.338σ.4 For non-normal distributions, they are not equivalent. When the loss distribution has a heavier tail than a normal distribution, the 97.5% ES can be considerably greater than the 99% VaR.

4 The ES for a normal distribution with mean μ and standard deviation σ is μ + σ exp(−Y²/2)/[√(2π)(1 − X)], where X is the confidence level and Y is the point on a normal distribution that has a probability of 1 − X of being exceeded. This can also be written μ + σ²f(VaR)/(1 − X), where f is the probability density function for the loss.
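The two numbers are easy to verify from the footnote formula (a sketch assuming scipy is available; μ = 0 and σ = 1):

```python
import math
from scipy.stats import norm

var_99 = norm.ppf(0.99)              # 2.326 standard deviations
X = 0.975
Y = norm.ppf(X)                      # point exceeded with probability 1 - X
es_975 = math.exp(-Y**2 / 2.0) / (math.sqrt(2.0 * math.pi) * (1.0 - X))
print(round(var_99, 3), round(es_975, 3))   # 2.326 and 2.338
```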
market conditions. To determ ine the stressed period, banks
standardized models to provide a floor for capital requirements.
were required to go back through tim e searching for a 250-day
period where the observed m ovem ents in m arket variables A difference between FRTB and previous market risk regulatory
would lead to significant financial stress for the current portfolio. requirem ents is that most calculations are carried out at the
trading desk level. Furtherm ore, permission to use the internal
FRTB changes the measure used for determ ining m arket risk
models approach is granted on a desk-by-desk basis. Therefore
capital. Instead of VaR with a 99% confidence level, it uses
it is possible that, at a particular point in tim e, a bank's foreign
expected shortfall (ES) with a 97.5% confidence level. The m ea­
currency trading desk has permission to use the internal models
sure is actually stressed ES with a 97.5% confidence. This means
approach while the equity trading desk does not.
that, as in the case of stressed VaR, calculations are based on
the way m arket variables have been observed to move during W e saw how the ways in which capital is calculated for the
stressed m arket conditions. trading book and the banking book are quite different. This
potentially gives rise to regulatory arbitrage where banks
For normal distributions, VaR with a 99% confidence and ES with
choose to allocate instruments to either the trading book or the
a 97.5% confidence are alm ost exactly the sam e. Suppose losses
banking book so as to minimize capital. In Basel II.5, the incre­
have a normal distribution with a mean /jl and standard deviation
mental risk charge made this less attractive. FRTB counteracts
a . The 99% VaR is /x + 2.326a while the 97.5% expected
regulatory arbitrage by defining more clearly than previously the
differences between the two books.
1 See Bank for International Settlements, "Consultative Document: Fun­
damental Review of the Trading Book," May 2012.
4 The ES for a normal distribution with mean /x and standard deviation a
2 QISs are calculations carried out by banks to estimate the impact of
is /x + a e x p (-Y 2/2)/[V27r(1 — X)] where X is the confidence level and
proposed regulatory changes on capital requirements.
Y is the point on a normal distribution that has a probability of 1 — X of
3 See Bank for International Settlements, "Minimum Capital Require­ being exceeded. This can also be written /x + o-2f(VaR)/(1 - X) where f
ments for Market Risk," January 2016. is the probability density function for the loss.

Table 16.1 Allocation of Market Variables to Liquidity Horizons

Risk Factor                                       Horizon (Days)
Interest rate (dependent on currency)             10-60
Interest rate volatility                          60
Credit spread: sovereign, investment grade        20
Credit spread: sovereign, non-investment grade    40
Credit spread: corporate, investment grade        40
Credit spread: corporate, non-investment grade    60
Credit spread: other                              120
Credit spread volatility                          120
Equity price: large cap                           10
Equity price: small cap                           20
Equity price: large cap volatility                20
Equity price: small cap volatility                60
Equity: other                                     60
Foreign exchange rate (dependent on currency)     10-40
Foreign exchange volatility                       40
Energy price                                      20
Precious metal price                              20
Other commodities price                           60
Energy price volatility                           60
Precious metal volatility                         60
Other commodities price volatility                120
Commodity (other)                                 120

16.2 STANDARDIZED APPROACH

Under the standardized approach, the capital requirement is the sum of three components: a risk charge calculated using a risk sensitivity approach, a default risk charge, and a residual risk add-on.

Consider the first component. Seven risk classes (corresponding to trading desks) are defined (general interest rate risk, foreign exchange risk, commodity risk, equity risk, and three categories of credit spread risk). Within each risk class, a delta risk charge, vega risk charge, and curvature risk charge are calculated.

The delta risk charge for a risk class is calculated using the risk weights and weighted sensitivity approach:

Risk Charge = √( Σi Σj ρij (Wiδi)(Wjδj) )   (16.1)

where the summations are taken over all risk factors in the risk class. The risk weights, Wi, and the correlations between risk factors, ρij, are determined by the Basel Committee.5 The sensitivities (deltas), δi, are determined by the bank. In the case of risk factors such as equity prices, exchange rates, or commodity prices, the deltas measure the sensitivity of the portfolio to percentage changes. For example, if a 1% increase in a commodity price would increase the value of a portfolio by USD 3,000, the delta would be 3,000/0.01 = 300,000. In the case of risk factors such as interest rates and credit spreads, the deltas are defined in terms of absolute changes. For example, if the effect of an interest rate increasing by one basis point (0.0001) is to reduce the value of a portfolio by USD 200, the delta with respect to that interest rate would be −200/0.0001 = −2,000,000.

5 Banks are required to test the effect of multiplying the correlations specified by the Basel Committee by 1.25, 1.00, and 0.75 and then set the capital charge equal to the greatest result obtained.

Consider how the risk weights, Wi, might be set by regulators. Suppose first that all risk factors are equity prices, exchange rates, or commodity prices, so the deltas are sensitivities to percentage changes. If Wi were set equal to the daily volatility of risk factor i for all i, the risk charge in Equation 16.1 would equal the standard deviation of the change in the value of the portfolio per day. If Wi were set equal to the daily volatility of risk factor i in stressed market conditions (the stressed daily volatility) for all i, Equation 16.1 would give the standard deviation of the daily change of the portfolio in stressed market conditions. In practice, the Wi are set equal to multiples of the stressed daily volatility to reflect the liquidity horizon and the confidence level that regulators wish to consider. Suppose that the stressed daily volatility of risk factor i is estimated as 2% and that the risk factor has a 20-day liquidity horizon. The risk weight might be set as 0.02 × √20 × 2.338 = 0.209. (Note that the 2.338 multiplier reflects the amount by which a standard deviation has to be multiplied to get ES with a 97.5% confidence when a normal distribution is assumed.)

Now suppose that the risk factors are interest rates and credit spreads so that deltas are sensitivities with respect to actual changes measured in basis points. The Wi for risk factor i is set equal to a multiple of the stressed daily standard deviation for all i. If the multiple were 1, the formula would give the standard deviation of the change in the value of the portfolio in one day. In practice the multiple is determined as just described to reflect the liquidity horizon and confidence level.
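Equation (16.1) is straightforward to evaluate once the inputs are in hand. The sketch below uses made-up sensitivities, weights, and correlations; the real Wi and ρij come from the Basel Committee's published tables:

```python
import numpy as np

def delta_risk_charge(deltas, weights, corr):
    """sqrt( sum_i sum_j rho_ij * (W_i * delta_i) * (W_j * delta_j) )."""
    ws = np.asarray(weights) * np.asarray(deltas)   # weighted sensitivities
    return float(np.sqrt(ws @ np.asarray(corr) @ ws))

deltas = [300_000.0, -150_000.0]     # cf. the commodity-price example above
weights = [0.209, 0.209]             # 0.02 * sqrt(20) * 2.338, as in the text
corr = [[1.0, 0.5], [0.5, 1.0]]      # illustrative rho_ij
print(delta_risk_charge(deltas, weights, corr))
```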



Vega risk is handled similarly to delta risk. A vega risk charge is calculated for each risk class using Equation 16.1. The risk factors (counted by the i and j) are now volatilities. The summation is taken over all volatilities in the risk class. The parameter δi is actually a vega. It is the sensitivity of the value of the portfolio to small changes in volatility i.6 The parameter ρij is the correlation between changes in volatility i and volatility j, and Wi is the risk weight for volatility i. The latter is determined similarly to the delta risk weights to reflect the volatility of volatility i, its liquidity horizon, and the confidence level.

6 Banks can choose whether it is percentage or actual changes in volatility that are considered.

There are assumed to be no diversification benefits between risk factors in different risk classes and between the vega risks and delta risks within a risk class. The end product of the calculations we have described so far is therefore the sum of the delta risk charges across the seven risk classes plus the sum of the vega risk charges across the seven risk classes.

Term Structures

In the case of risk factors such as interest rates, volatilities, and credit spreads, there is usually a term structure defined by a number of points. For example, an interest rate term structure is typically defined by 10 points. These are the zero-coupon interest rates for maturities of 3 months, 6 months, 1 year, 2 years, 3 years, 5 years, 10 years, 15 years, 20 years, and 30 years. Each vertex of the term structure is a separate risk factor for the purposes of using Equation 16.1. The delta of a portfolio with respect to a one-basis-point move in one of the vertices on the term structure is calculated by increasing the position of the vertex by one basis point while making no change to the other vertices. The Basel Committee defines risk weights for each vertex of the term structure and correlations between the vertices of the same term structure.

A simplification is used when correlations between points on different term structures are defined. The correlation between point A on term structure 1 and point B on term structure 2 is assumed to be the same for all A and B.

Curvature Risk Charge

The curvature risk charge is a capital charge for a bank's gamma risk exposure under the standardized approach. Consider the exposure of a portfolio to the ith risk factor. Banks are required to test the effect of increasing and decreasing the risk factor by its risk weight, Wi. If the portfolio is linearly dependent on the risk factor, the impact of an increase of Wi in the risk factor is Wiδi. Similarly, the impact of a decrease of Wi in the risk factor is −Wiδi. To evaluate the impact of curvature net of the delta effect, the standardized approach therefore calculates:

1. Wiδi minus the impact of an increase of Wi in the risk factor, and
2. −Wiδi minus the impact of a decrease of Wi in the risk factor.

The curvature risk charge for the risk factor is the greater of these two. If the impact of curvature net of delta is negative, it is counted as zero. The calculation is illustrated in Figure 16.1 and in the sketch below.

In Figure 16.1a, the portfolio value is currently given by point O. If there were no curvature, an increase of Wi in the risk factor would lead to the portfolio value at point C, whereas a decrease of Wi in the risk factor would lead to the portfolio value at point A. Because of curvature, an increase of Wi leads to the portfolio value at point D, and a decrease of Wi leads to the portfolio value at point B. Since AB > CD, the risk charge is AB. In Figure 16.1b, the risk charge is zero because curvature actually increases the value of the position (relative to what delta would suggest) for both increases and decreases in the risk factor. (Figure 16.1a could correspond to a short position in an option; Figure 16.1b could correspond to a long position in an option.)

Figure 16.1 Calculation of curvature risk charge for a risk factor. In Figure 16.1a, the curvature risk charge is AB; in Figure 16.1b, it is zero.
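The net-of-delta calculation for a single risk factor can be expressed compactly (a schematic sketch of the two items listed above, not the full FRTB aggregation; the quadratic portfolio below is hypothetical):

```python
def curvature_charge(value, x0, W, delta):
    """max of W*delta - [V(x0+W) - V(x0)] and -W*delta - [V(x0-W) - V(x0)],
    floored at zero, per the two items listed in the text."""
    v0 = value(x0)
    up = W * delta - (value(x0 + W) - v0)
    down = -W * delta - (value(x0 - W) - v0)
    return max(up, down, 0.0)

# Short-gamma exposure (cf. Figure 16.1a), with the delta effect set to zero:
gamma = -40.0
portfolio = lambda x: 0.5 * gamma * (x - 100.0) ** 2
print(curvature_charge(portfolio, 100.0, 5.0, 0.0))   # 500.0
```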

When there are several risk factors, each is handled similarly to Figure 16.1. When there is a term structure (e.g., for interest rates, credit spreads, and volatilities), all points are shifted by the same amount for the purpose of calculating the effect of curvature. The shift is the largest Wi for the points on the term structure. In the case of an interest rate term structure, the Wi corresponding to the three-month vertex is often the largest Wi, so this would define an upward and downward parallel shift in the term structure. The delta effect is removed for each point on the term structure by using the δi for that point.

The curvature risk charges for different risk factors are combined to determine a total curvature risk charge. When diversification benefits are allowed, aggregation formulas broadly similar to those used for deltas are used with correlations specified by the Basel Committee.

Default Risk Charge

Risks associated with counterparty credit spread changes are handled separately from risks associated with counterparty defaults in FRTB. In the standardized approach, credit spread risks are handled using the delta/vega/curvature approach described earlier. Default risks, sometimes referred to as jump-to-default (JTD) risks, are handled by a separate default risk charge. This is calculated by multiplying each exposure by a loss given default (LGD) and a default risk weight. Both the LGD and the risk weight are specified by the Basel Committee. For example, the LGD for senior debt is specified as 75% and the default risk weight for a counterparty rated A is 3%. Equity positions are subject to a default risk charge with an LGD of 100%. Rules for offsetting exposures are specified.
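As a minimal sketch of this calculation (ours, with illustrative positions; the LGD and risk-weight figures are those quoted above, and the offsetting rules are ignored):

# Jump-to-default charge: sum of exposure x LGD x default risk weight.
# Offsetting rules are ignored in this sketch.

def default_risk_charge(exposures):
    """exposures: iterable of (notional, lgd, risk_weight) tuples."""
    return sum(notional * lgd * weight for notional, lgd, weight in exposures)

positions = [
    (10_000_000, 0.75, 0.03),  # senior debt, A-rated counterparty
    (2_000_000, 1.00, 0.03),   # equity position: LGD = 100%
]
print(default_risk_charge(positions))  # 285000.0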
Residual Risk Add-On

The residual risk add-on considers risks that cannot be handled by the delta/vega/curvature approach described earlier. It includes exotic options when they cannot be considered as linear combinations of plain vanilla options. The add-on is calculated by multiplying the notional amount of the transaction by a risk weight that is specified by the Basel Committee. In the case of exotic options the risk weight is 1%.

A Simplified Approach

In this section, we have described the standardized approach that the Basel Committee requires all large banks to use. It is worth noting that in June 2017 the Basel Committee published a consultative document outlining a simplified standardized approach that it proposes for smaller banks.7 The full approach is simplified in a number of ways. For example, vega and gamma risk do not have to be considered. This should make FRTB more attractive to jurisdictions such as the United States that have many small banks that tend to enter into only relatively simple transactions.

7 See Basel Committee on Banking Supervision, "Simplified Alternative to the Standardized Approach to Market Risk Capital Requirements," June 2017.

16.3 INTERNAL MODELS APPROACH

The internal models approach requires banks to estimate stressed ES with a 97.5% confidence level. FRTB does not prescribe a particular method for doing this. Typically the historical simulation approach is likely to be used. Risk factors are allocated to liquidity horizons as indicated in Table 16.1. Define:

Category 1 Risk Factors: Risk factors with a time horizon of 10 days
Category 2 Risk Factors: Risk factors with a time horizon of 20 days
Category 3 Risk Factors: Risk factors with a time horizon of 40 days
Category 4 Risk Factors: Risk factors with a time horizon of 60 days
Category 5 Risk Factors: Risk factors with a time horizon of 120 days

As we shall see, all calculations are based on considering 10-day changes in the risk factors. In Basel I and Basel II.5, banks are allowed to deduce the impact of 10-day changes from the impact of one-day changes using a √10 multiplier. In FRTB, banks are required to consider changes over periods of 10 days that occurred during a stressed period in the past. Econometricians naturally prefer that non-overlapping periods be used when VaR or ES is being estimated using historical simulation, because they want observations on the losses to be independent. However, this is not feasible when 10-day changes are considered, because it would require a very long historical period. FRTB requires banks to base their estimates on overlapping 10-day periods: the first simulation trial assumes that the percentage changes in all risk factors over the next 10 days will be the same as their changes between Day 0 and Day 10 of the stressed period; the second simulation trial assumes that the percentage changes in all risk factors over the next 10 days will be the same as their changes between Day 1 and Day 11 of the stressed period; and so on.

Banks are first required to calculate ES when 10-day changes are made to all risk factors. (We will denote this by ES_1.) They are then required to calculate ES when 10-day changes are made to all risk factors in categories 2 and above, with risk factors in category 1 being kept constant. (We will denote this by ES_2.) They are then required to calculate ES when 10-day changes are made to all risk factors in categories 3, 4, and 5, with risk factors in categories 1 and 2 being kept constant. (We will denote this by ES_3.) They are then required to calculate ES when 10-day changes are made to all risk factors in categories 4 and 5, with risk factors in categories 1, 2, and 3 being kept constant. (We will denote this by ES_4.) Finally, they are required to calculate ES_5, which is the effect of making 10-day changes only to category 5 risk factors.

The liquidity-adjusted ES is calculated as

    ES = √( ES_1² + Σ_{j=2}^{5} ES_j² (LH_j − LH_{j−1}) / 10 )    (16.2)

where LH_j is the liquidity horizon for category j.
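A toy sketch of estimating one of these expected shortfalls from overlapping 10-day windows follows. It is ours, not part of the FRTB text: the price series is simulated rather than taken from a real stressed period, and a single linear position is assumed.

import numpy as np

# 97.5% ES from overlapping 10-day changes (toy, single risk factor).
def stressed_es_overlapping(prices, position_value, conf=0.975):
    p = np.asarray(prices, dtype=float)
    ten_day_changes = p[10:] / p[:-10] - 1.0        # overlapping windows
    losses = -position_value * ten_day_changes      # loss in each trial
    var = np.quantile(losses, conf)
    return losses[losses >= var].mean()             # mean loss beyond VaR

rng = np.random.default_rng(0)
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.02, 260))
print(stressed_es_overlapping(prices, position_value=1_000_000))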



To understand Equation 16.2, suppose first that all risk factors are in category 1 or 2, so that only ES_1 and ES_2 are calculated. It is assumed that the behavior of all risk factors during a 10-day period is independent of the behavior of category 2 risk factors during a further 10-day period. An extension of the square root rule then leads to the liquidity-adjusted ES being

    √( ES_1² + ES_2² )

Now suppose that there are also category 3 risk factors. The expression √( ES_1² + ES_2² ) would be correct if the category 3 risk factors had a 20-day instead of a 40-day liquidity horizon. We assume that the behavior of the category 3 risk factors over an additional 20 days is independent of the behavior of all the risk factors over the periods already considered. We also assume that the ES for the category 3 risk factors over 20 days is √2 times their ES over 10 days. This leads to a liquidity-adjusted ES of:

    √( ES_1² + ES_2² + 2 ES_3² )

Continuing in this way, we obtain Equation 16.2. This is referred to as the cascade approach to calculating ES (and can be used for VaR as well).
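Numerically, the cascade construction and Equation 16.2 are one and the same. A short sketch, with hypothetical ES_j inputs:

import math

LH = [10, 20, 40, 60, 120]  # liquidity horizons for categories 1-5

def liquidity_adjusted_es(es):
    """es = [ES_1, ..., ES_5]; implements Equation 16.2."""
    total = es[0] ** 2
    for j in range(1, 5):
        total += es[j] ** 2 * (LH[j] - LH[j - 1]) / 10.0
    return math.sqrt(total)

# With only categories 1 and 2 this reduces to sqrt(ES_1^2 + ES_2^2);
# adding category 3 contributes 2 * ES_3^2, exactly as derived above.
print(liquidity_adjusted_es([5.0, 3.0, 2.0, 1.5, 1.0]))  # about 7.25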
Calculations are carried out for each desk. If there are six desks, this means the internal models approach, as we have described it so far, requires 5 × 6 = 30 ES calculations. As mentioned, the use of overlapping time periods is less than ideal because changes in successive historical simulation trials are not independent. This does not bias the results, but it reduces the effective sample size, making results more noisy than they would otherwise be.

FRTB represents a movement away from basing calculations on one-day changes. Presumably the Basel Committee has decided that, in spite of the lack of independence of observations, a measure calculated from 10-day changes provides more relevant information than a measure calculated from one-day changes. This could be the case if changes on successive days are not independent, but changes in successive 10-day periods can reasonably be assumed to be independent.

The calculation of a stressed measure (VaR or ES) requires banks to search for the period in the past when market variable changes would be worst for their current portfolio. (The search must go back as far as 2007.) When Basel II.5 was implemented, a problem was encountered in that banks found that historical data were not available for some of their current risk factors. It was therefore not possible to know how these risk factors would have behaved during the 250-day periods in the past that were candidates for the reference stressed period. FRTB handles this by allowing the search for stressed periods to involve a subset of risk factors, provided that at least 75% of the current risk factors are used. The expected shortfalls that are calculated are scaled up by the ratio of ES for the most recent 12 months using all risk factors to ES for the most recent 12 months using the subset of risk factors. (This potentially doubles the number of ES calculations from 30 to 60.)

Banks are required to calculate ES for the whole portfolio as well as for each of six trading desks. The ES for a trading desk is referred to as a partial expected shortfall. It is determined by shocking the risk factors belonging to the trading desk while keeping all other risk factors fixed. The sum of the partial expected shortfalls is always greater than the ES for the whole portfolio. What we will refer to as the weighted expected shortfall (WES) is a weighted average of (a) the ES for the whole portfolio and (b) the sum of the partial expected shortfalls. Specifically:

    WES = λ × ES_T + (1 − λ) × Σ_j ESP_j

where ES_T is the expected shortfall calculated for the total portfolio and ESP_j is the jth partial expected shortfall. The parameter λ is set by the Basel Committee to be 0.5.

Some risk factors are categorized as non-modelable. Specifically, if there are fewer than 24 observations on a risk factor in a year, or more than one month between successive observations, the risk factor is classified as non-modelable. Such risk factors are handled by special rules involving stress tests.

The total capital requirement for day t is

    max( WES_{t−1} + NMC_{t−1}, m_c × WES_avg + NMC_avg )

where WES_{t−1} is the WES for day t − 1, NMC_{t−1} is the capital charge calculated for non-modelable risk factors on day t − 1, WES_avg is the average WES for the previous 60 days, and NMC_avg is the average capital charge calculated for the non-modelable risk factors over the previous 60 days. The parameter m_c is at minimum 1.5.
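The desk aggregation and the capital formula can be sketched as follows; this is our own illustration, with hypothetical inputs, λ = 0.5, and m_c = 1.5 as given above.

# Desk aggregation and the FRTB capital formula (hypothetical inputs).

def weighted_es(es_total, partial_es, lam=0.5):
    """WES = lambda * ES_T + (1 - lambda) * sum of partial ESs."""
    return lam * es_total + (1.0 - lam) * sum(partial_es)

def total_capital(wes_prev, nmc_prev, wes_avg, nmc_avg, m_c=1.5):
    return max(wes_prev + nmc_prev, m_c * wes_avg + nmc_avg)

wes_yesterday = weighted_es(7.3, [2.1, 1.8, 2.5, 1.2, 0.9, 1.6])
print(total_capital(wes_yesterday, nmc_prev=0.8, wes_avg=6.9, nmc_avg=0.7))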

Back-Testing

FRTB does not back-test the stressed ES measures that are used to calculate capital under the internal models approach, for two reasons. First, it is more difficult to back-test ES than VaR. Second, it is not possible to back-test a stressed measure at all. The stressed data upon which a stressed measure is based are extreme data that, statistically speaking, are not expected to be observed with the same frequency in the future as they were during the stressed period.

FRTB back-tests a bank's models by asking each trading desk to back-test a VaR measure calculated over a one-day horizon and the most recent 12 months of data. Both 99% and 97.5% confidence levels are to be used. If there are more than 12 exceptions for the 99% VaR or more than 30 exceptions for the 97.5% VaR, the trading desk is required to calculate capital using the standardized approach until neither of these two conditions holds.

Banks may be asked by regulators to carry out other back-tests. Some of these could involve calculating the p-value of the profit or loss on each day. This is the probability of observing a profit that is less than the actual profit or a loss that is greater than the actual loss. If the model is working perfectly, the p-values obtained should be uniformly distributed.

Profit and Loss Attribution

Another test used by the regulators is known as profit and loss attribution. Banks are required to compare the actual profit or loss in a day with that predicted by their models. Two measures must be calculated:

    Mean of U / Standard Deviation of V    and    Variance of U / Variance of V

where U denotes the difference between the actual and model profit/loss in a day and V denotes the actual profit/loss in a day.8 Regulators expect the first measure to be between −10% and +10%, and the second measure to be less than 20%. When there are four or more situations in a 12-month period where the ratios are outside these ranges, the desk must use the standardized approach for determining capital.

8 The "actual" profit/loss should be the profit and loss that would occur if there had been no trading in a day. This is sometimes referred to as the hypothetical profit and loss.
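A sketch of these desk-level tests, with simulated daily P&L standing in for real data; the thresholds are the ones quoted above.

import numpy as np

def fails_var_backtest(exceptions_99, exceptions_975):
    """More than 12 (99% VaR) or 30 (97.5% VaR) exceptions in 12 months."""
    return exceptions_99 > 12 or exceptions_975 > 30

def pl_attribution_breach(actual, model):
    u = actual - model                      # unexplained daily P&L
    first = u.mean() / actual.std()         # should lie in [-10%, +10%]
    second = u.var() / actual.var()         # should be below 20%
    return abs(first) > 0.10 or second > 0.20

rng = np.random.default_rng(1)
actual = rng.normal(0.0, 1.0, 250)          # hypothetical (no-trading) P&L
model = actual + rng.normal(0.0, 0.2, 250)  # model P&L with a small error
print(fails_var_backtest(10, 28), pl_attribution_breach(actual, model))
# Four or more attribution breaches in a 12-month period would force the
# desk onto the standardized approach.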
Credit Risk

As mentioned, FRTB distinguishes two types of credit risk exposure to a company:

1. Credit spread risk is the risk that the company's credit spread will change, causing the mark-to-market value of the instrument to change.

2. Jump-to-default risk is the risk that there will be a default by the company.

Under the internal models approach, the credit spread risk is handled in a similar way to other market risks. Table 16.1 shows that the liquidity horizon for credit spread varies from 20 to 120 days and that the liquidity horizon for a credit spread volatility is 120 days. The jump-to-default risk is handled in the same way as default risks in the banking book. In the internal models approach, the capital charge is based on a VaR calculation with a one-year time horizon and a 99.9% confidence level.

Securitizations

The comprehensive risk measure (CRM) charge was introduced in Basel II.5 to cover the risks in products created by securitizations, such as asset-backed securities and collateralized debt obligations. The CRM rules allow a bank (with regulatory approval) to use its own models. The Basel Committee has concluded that this is unsatisfactory because there is too much variation in the capital charges calculated by different banks for the same portfolio. It has therefore decided that under FRTB the standardized approach must be used for securitizations.

16.4 TRADING BOOK VS. BANKING BOOK

The FRTB addresses whether instruments should be put in the trading book or the banking book. Roughly speaking, the trading book consists of instruments that the bank intends to trade. The banking book consists of instruments that are expected to be held to maturity. Instruments in the banking book are subject to credit risk capital, whereas those in the trading book are subject to market risk capital. The two sorts of capital are calculated in quite different ways. This has in the past given rise to regulatory arbitrage. For example, banks have often chosen to hold credit-dependent instruments in the trading book because they are then subject to less regulatory capital than they would be if they had been placed in the banking book.

The FRTB attempts to make the distinction between the trading book and the banking book clearer and less subjective. To be in the trading book, it will no longer be sufficient for a bank to have an "intent to trade." It must be able to trade and manage the underlying risks on a trading desk. The day-to-day changes in value should affect equity and pose risks to solvency. The FRTB provides rules for determining, for different types of instruments, whether they should be placed in the trading book or the banking book.

An important point is that instruments are assigned to the banking book or the trading book when they are initiated, and there are strict rules preventing them from being subsequently moved between the two books. Transfers from one book to another can happen only in extraordinary circumstances. (Examples given of extraordinary circumstances are the closing of trading desks and a change in accounting standards with regard to the recognition


of fair value.) Any capital benefit as a result of moving items between the books will be disallowed.

SUMMARY

FRTB is a major change to the way capital is calculated for market risk. After 20 years of using VaR with a 10-day time horizon and 99% confidence to determine market risk capital, regulators are switching to using ES with a 97.5% confidence level and varying time horizons. The time horizons, which can be as long as 120 days, are designed to incorporate liquidity considerations into the capital calculations. The change that is considered to a risk factor when capital is calculated reflects movements in the risk factor, over a period of time equal to its liquidity horizon, in stressed market conditions.

The Basel Committee has specified a standardized approach and an internal models approach. Even when they have been approved by their supervisors to use the internal models approach, banks must also implement the standardized approach. Regulatory capital under the standardized approach is based on formulas involving the delta, vega, and gamma exposures of the trading book. Regulatory capital under the internal models approach is based on the calculation of stressed expected shortfall. Calculations are carried out separately for each trading desk.

Further Reading

Bank for International Settlements. "Minimum Capital Requirements for Market Risk," January 2016.

INDEX

A
actual return, 51
AGARCH model, 27
age-weighted historical simulation, 24-25
Aman Capital, 115
Anderson-Darling, 130
Apollo Group (APOL), 119
arbitrage-free models, 168
arbitrage pricing
    constant-maturity treasury swap, 161-162
    of derivatives, 157-158
    in multi-period setting, 159-161
ARCH model, see Autoregressive Conditional Heteroscedasticity (ARCH) model
arithmetic returns
    market risk measurement, 2
    normally distributed, 5-6
autocorrelation
    equity correlations, 129-130
Autoregressive Conditional Heteroscedasticity (ARCH) model, 129
AutoZone (AZO), 119
average quantile approach, 18
average tail-VaR method, 7
average VaR algorithm, 9

B
backtesting model
    applications, 56-57
    with exceptions
        Basel rules, 54-55
        conditional coverage models, 55-56
        extensions, 56
        model verification based on failure rates, 51-54
    FRTB, 208-209
    implementation, VaR, 77-78
    no exceptions, 58
    setup, 50-51
Baily Coates Cromwell, 115
banking book, 209-210
Barone-Adesi, G., 26, 27
Basel Committee on Banking Supervision, 50, 54, 74, 81, 87, 88, 90, 95, 204-210
Basel I, 116, 204, 207
Basel II, 87, 88, 92, 96, 116
Basel II.5, 204, 207-209
Basel III, 116, 121
Basel rules, 51, 54-55, 62
basis-point volatility, 176
BEKK model, 77
benchmarking, 64-66
binomial distribution, 52
Black-Karasinski model, 191
Black-Scholes-Merton
    option model, 110, 194
    pricing analysis, 164
Black-Scholes (BS) model, 70
BNP Paribas, 107, 108
Bollerslev, T., 129
bootstrap, 31
    historical simulation, 19
    and implementation, 31-33
    limitations of conventional sampling approaches, 31
    standard errors, 33-34
    time dependency and, 34
"bottom-up" approach, 91-93
Boudoukh, Richardson and Whitelaw (BRW) approach, 24, 25
Bravais, A., 106
BRW approach, see Boudoukh, Richardson and Whitelaw (BRW) approach
buying correlation, 111

C
Capital Asset Pricing Model (CAPM), 108, 172
capital diversification, 90
cash-flow mapping, 63
central limit theorem, 47, 52
CF Industries (CF), 119
chi-squared, 53, 106, 130
Choleski decomposition, 26
cleaned return, 51
coherent risk measures, 8-9
    expected shortfall, 7-8
    standard errors, 12-13
commodity risk, 112
comprehensive risk measure (CRM) charge, 209
concentration risk, 119-121
conditional EV, 46
conditional VaR (CVaR), 84, 117
confidence interval, 30
constant drift, 179
constant-maturity Treasury (CMT) swap, 161
constant volatility model, 199
convexity effect, 178
convexity/volatility, 168-171
copula correlations
    default time for multiple assets, 137
    Gaussian copula, 134-137
correlation risk
    and concentration risk, 119-121
    and credit risk, 117-119
    and market risk, 117
    and systemic risk, 119
correlations, 106
    buying, 111
    credit, 121
    default probability, 131
    and dependence, 122
    dynamic financial, 106
    equity (see equity correlations)
    global financial crises 2007 to 2009, 113-116
    independence and uncorrelatedness, 122-123
    investments and, 108-109
    properties of bond, 131
    and regulation, 116
    and risk management, 112-113
    selling, 111
    static financial, 106
    statistical independence, 122
    trading and, 109-116, 121
    volatility-asset return, 121
correlation swaps, 111
correlation-weighted historical simulation, 26
counterbalancing, 51
counterparty credit risk (CCR), 83
covariance matrix, 109
Cox-Ingersoll-Ross (CIR) model, 189-190, 192
credit correlation, 121
credit default swap (CDS), 107
credit deterioration, 114
credit risk, 112, 117-119
    assuming, 114 (see also equity tranche)
    FRTB, 209
credit value adjustment (CVA), 116
currency risk, 112
curvature risk charge, 206

D
daily price volatility, 50
dealing with dependent (or non-iid) data, 46
default correlation, 121
default probabilities (PD), 92
default risk, 117
    charge, 207
default-time copula, 137
dependent variable, 141
DeVry Inc. (DV), 119
distortion risk measures, 86
diversification, see "top-down" approach
Dow Jones Industrial Average (Dow), 111, 126
down-state prices, 156
downward-sloping, 178, 185, 200
drift, 160
    and risk premium, 178-179
    time-dependent, 179-180
DVBP, 62
dynamic financial correlation, 106
dynamic replication, 161

E
Edward Lifesciences (EW), 119
endogenous liquidity
    and market risk for trading portfolios, 80-81
    motivation, 79-80
equal-weight approach, 23-24
equilibrium models, 168
equity correlations, 126-127
    autocorrelation, 129-130
    distributed, 130
    mean reversion, 128-129
    volatility, 130-131
equity options, 196-197
equity risk, 112
equity tranche, 114-115
estimating VaR
    historical simulation (HS) approach, 3-4
    lognormal distribution, 6-7
    with normally distributed arithmetic returns, 5-6
    with normally distributed profits/losses, 4-5
exogenous liquidity, 79
expectations, 168
expected discounted value, 156
expected shortfall (ES)
    estimating coherent risk measures, 7-8
    risk measures, 84-85
Exponentially Weighted Moving Average (EWMA) approach, 76
exposure at default (EAD), 92
extreme-value theory (EVT), 36
    estimation parameters, 39-43

F
Family Dollar (FDO), 119
filtered historical simulation (FHS), 26-27, 77
financial correlations risk, 106-108
Financial Markets Group (FMG) of the London School of Economics, 96
financial risk management, 112
Fisher-Tippett theorem, 36
fitted regression line, 142
fixed income vs. equity derivatives, 164-165
foreign currency options, 195-196
forward rate agreements (FRAs), 68-69
Frechet distribution, 37, 38
Fundamental Review of the Trading Book (FRTB), 204-205
    internal models approach, 207-209
        back-testing, 208-209
        credit risk, 209
        profit and loss attribution, 209
        securitizations, 209
    standardized approach, 205-207
        curvature risk charge, 206
        default risk charge, 207
        residual risk add-on, 207
        simplified approach, 207
        term structures, 206
    trading book vs. banking book, 209-210

G
GARCH model, see Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model
Gaussian copula, 134-137
Gaussian distribution, 47, 177
generalised extreme-value (GEV) distribution, 36-39, 37, 45
    estimation of EV parameters, 39-43
    ML estimation methods, 40-43
    short-cut EV method, 39
generalised Pareto approach, 43
Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model, 27, 129
General Motors, 114
geometric return data, 2-3
Giannopoulos, K., 26, 27
Gilead Pharmaceuticals (GILD), 119
global financial crises 2007 to 2009, 113-116
Gnedenko-Pickands-Balkema-deHaan (GPBdH) theorem, 44
great recession, 115
Greenspan, A., 50
Gumbel distribution, 30, 37, 38

H
Heston model, 106
Hill estimator, 40, 41
historical simulation, VaR and ES
    basic historical simulation, 19
    bootstrapped historical simulation, 19
    curves and surfaces for, 21
    non-parametric density estimation, 19-21
Ho-Lee model, 179-180
Hull, J., 119
Hull-White lines, 26
hypothetical return, 51

I
Iksil, B., 112
implementation, VaR
    backtesting models, 77-78
    overview, 74
    time horizon for regulatory, 74-76
    time-varying volatility, 76-77
incorporating liquidity
    endogenous liquidity
        and market risk for trading portfolios, 80-81
        motivation, 79-80
    exogenous liquidity, 79
    overview, 78-79
    time horizon to account for liquidity risk, 81
Incremental Risk Charge (IRC), 75
independently and identically distributed ("IID"), 75
independent variable, 141
initial public offerings (IPOs), 60
interest-rate
    risk, 112
    swaps, 69-70
internal models approach, 207-209
    back-testing, 208-209
    credit risk, 209
    profit/loss attribution, 209
    securitizations, 209
intra-horizon risk, 76

J
jackknife/jackknifing, 32, 33
J.P. Morgan, 52, 60, 64, 76, 112
jump-to-default (JTD), 207, 209

K
Kolmogorov-Smirnov, 130
Kuiper statistic, 56

L
least-squares estimation, 141, 142
least-squares hedge, 151-152
left-tail measure, 86
Lehman Brothers, 119
level versus change regressions, 146
leveraged super-senior tranche (LSS), 115
Li, David, 114, 134
liquidity horizons, 204
lognormal model, 189
    Cox-Ingersoll-Ross and, 189-190
    estimating VaR, 6-7
long-term bonds, 164
loss-given-default (LGD), 92, 207

M
malign risk interactions, 90
mapping, 60
    cash-flow, 63
    fixed-income portfolios
        approaches, 63-64
        benchmarking, 64-66
        stress test, 64
    linear derivatives
        commodity forwards, 67-68
        forward and futures contracts, 66-67
        forward rate agreements (FRAs), 68-69
        interest-rate swaps, 69-70
        options, 70-72
    principal, 63
    for risk measurement
        general and specific risk, 62
        process, 61-62
    solution to data problems, 60-61
Marin Capital, 115
market risk, 112, 117
    for trading portfolios, 80-81
Market Risk Amendment (MRA) rules, 51, 74, 75
market risk measurement, 2
    arithmetic return data, 2
    core issues, 13
    evaluating summary statistics, 14
    geometric return data, 2-3
    plotting data, 14
    preliminary data analysis, 13-14
    profit/loss data, 2
    quantile-quantile (QQ) plot, 14-16
market timers, 61
maximum likelihood (ML) methods, 40
mean deviation, 86
mean excess function (MEF), 16
mean-reversion, 106
    equity correlations, 128-129
    lognormal model with, 191
    Vasicek model, 181-186
mean-squared-error (MSE), 43
mezzanine tranche, 114. see also short credit
migration risk, 117
minimum variance delta, 199
mountain range options, 109
multi-asset options, 109
multivariate extreme value theory (MEVT), 47
multivariate stochastic analysis, 13

N
naive estimators, 20
Netflix (NFLX), 119
non-parametric approaches
    advantages of, 28-29
    bootstrap, 31
        and implementation, 31-33
        limitations of conventional sampling approaches, 31
        standard errors, 33-34
        time dependency and, 34
    compiling historical simulation data, 18-19
    disadvantages of, 28-29
    estimation of historical simulation VaR and ES, 19-21
        confidence intervals for, 21-23
    order statistics
        estimate confidence intervals for VaR, 29-30
        estimating risk measures with, 29
    weighted historical simulation
        age-weighted, 24-25
        correlation-weighted, 26
        FHS, 26-27
        volatility-weighted, 25-26
non-parametric density estimation, 20
non-Pearson correlation, 122
normal distribution, 47
normally distributed
    arithmetic returns, 5-6
    profits/losses, 4-5
    rates, 176-178
normal models, 177

O
operational risk, 112
option-adjusted spread (OAS), 156, 162
    with profit and loss attribution, 162-163
order-statistics (OS) theory, 22

P
pairs trading, 121
parametric approaches
    generalised extreme-value theory, 36-39
        estimation of EV parameters, 39-43
        ML estimation methods, 40-43
        short-cut EV method, 39
    peaks-over-threshold (POT) approach
        estimation, 45
        vs. GEV, 45
    refinements to EV approaches
        conditional EV, 46
        dealing with dependent (or non-iid) data, 46
        multivariate EVT, 47
peaks-over-threshold (POT), 43. see also generalised Pareto approach
    estimation, 45
    vs. GEV, 45
Pearson correlation, 109, 126
persistence, 129
Pickands estimator, 40, 41
price trees, 156
principal components analysis (PCA)
    application to butterfly weights, 149-150
    EUR, GBP, and JPY swap rates, 150
    hedging with, 149-150
    overview, 146-147
    shape of PCs over time, 150-152
    for USD swap rates, 147-149
principal mapping, 63
profit/loss
    FRTB, 209
    market risk measurement, 2
    normally distributed, 4-5

Q
quantile estimators, 10-12
quantile-quantile (QQ) plot
    market risk measurement, 14-16

R
rainbow options, 109
rate trees, 156
recombining trees, 159
refinements to EV approaches
    conditional EV, 46
    dealing with dependent (or non-iid) data, 46
    multivariate EVT, 47
regression analysis, 141
regression coefficient, 143
regression hedge, 142-143
regression line, 106
    fitted, 142
regression methods, 40
residual risk add-on, 207
risk aggregation, 90-91
risk-aversion, 8, 9, 81
risk charge, 205
risk level, 13
risk management
    and correlation, 112-113
    intermediation and leverage, 96-97
    overview, 95-96
    regulation, 97
risk measures, 13
    expected shortfall (ES), 84-85
    other risk measures, 86
    overview, 82
    spectral risk measures (SRM), 85-86
    VaR, 82-84
risk-neutral distributions, 201-202
risk-neutral investors, 171
risk-neutral pricing, 158-159
risk-neutral probabilities, 158
risk-neutral process, 181
risk premium, 171-173
    and drift, 178-179
risk sensitivity approach, 205
Ross Stores (ROST), 119

S
Salomon Brothers model, 190-191
securitizations, FRTB, 209
selling correlation, 111
semi-parametric estimation methods, 40-43
short credit, 114
short-cut EV method, 39
simplified approach, 207
Simpson's rules, 10
single-variable regression-based hedging
    regression hedge, 142-143
    stability of regression coefficients over time, 143-144
Sklar, A., 114, 134
Southwestern Energy (SWN), 119
spectral risk measures (SRM), 85-86
Spitzer, E., 61
square-root scaling rule, 37
standard errors of estimators
    coherent risk measures, 12-13
    quantile estimators, 10-12
standardized approach, 205-207
    curvature risk charge, 206
    default risk charge, 207
    residual risk add-on, 207
    simplified approach, 207
    term structures, 206
state-dependent volatility, 159
static financial correlations, 106
stock market crash, 119
stressed market conditions, 204
stressed ES, 208
stressed VaR, 88-89, 204, 208
stress testing, 64
    incorporating into market-risk modelling, 87-88
    stressed VaR, 88-89
systemic risk, 119

T
tail conditional expectation (TCE), 84
tail risk, 117
Taleb, N., 126
term structure models
    desirability of fitting, 180-181
    drift and risk premium, 178-179
    Ho-Lee model, 179-180
    and no drift, 176-178
    normally distributed rates, 176-178
    standardized approach, 206
    Vasicek model, 181-186
time-dependent drift, 179-180
time-dependent volatility, 188-189
time step reducing, 163-164
time-varying volatility, 76-77
"top-down" approach, 94-95
tracking error VaR (TE-VaR), 65
trading book, 209-210
trapezoidal rules, 10
Treasury Inflation Protected Securities (TIPS), 140, 143
Twain, M., 106
two-variable regression-based hedging, 144-146

U
unified vs. compartmentalised risk measurement
    "bottom-up" approach, 91-93
    overview, 89-90
    risk aggregation, 90-91
    "top-down" approach, 94-95
univariate stochastic analysis, 13
upper partial moments, 86
up-state prices, 156
U.S. Treasury bond, 140

V
value-at-risk (VaR) models
    backtesting, 50
    risk measures, 82-84
variance, 86
Vasicek model, 181-186
Vasicek, O., 134, 181
volatility, 108
    and convexity, 168-171
    term structure, 198-199
    yield, 189
volatility-asset return correlation, 121
volatility smiles, 194
    characterizing ways, 198
    risk-neutral distributions from, 201-202
volatility-weighted historical simulation, 25-26

W
Walker, H., 106
Walmart (WMT), 119
Weibull distribution, 37
weighted average quantile method, 10
weighting function, 8
wrong-way risk (WWR), 116

Y
yield volatility, 189
