
Modelling and Forecasting Conditional Covariances: DCC and Multivariate GARCH


Massimo Guidolin
Dept. of Finance, Bocconi University
1. Introduction
In chapter 5 we have made additional progress in our stepwise distribution modeling (SDM) approach, i.e.:
1. Establish a variance forecasting model for each of the assets individually and introduce methods for evaluating the performance of these forecasts, which occurred in chapter 4;
2. Consider ways to model conditionally non-normal aspects of the return distribution of the assets in our portfolio, i.e., aspects that are not captured by time series models of conditional means and variances, which has been the focus of chapter 5.
The third and crucial step that we take in this chapter consists in:
3. Linking individual variance forecasts with correlation forecasts, possibly by modelling the process of conditional covariances themselves.
The simple fact is that most relevant (realistic) applications in empirical finance are actually multivariate: they involve $N \geq 2$ assets/securities/portfolios. If you collect returns on such assets or portfolios in an $N \times 1$ vector $\mathbf{R}_t \equiv [R_{1t}\; R_{2t}\; \ldots\; R_{Nt}]'$, then the variance of a random vector turns out to be a matrix of second moments, i.e., variances and covariances:¹
$$ Var[\mathbf{R}_t] \equiv E\big[(\mathbf{R}_t - E[\mathbf{R}_t])(\mathbf{R}_t - E[\mathbf{R}_t])'\big] = E\begin{bmatrix} R_{1t} - E[R_{1t}] \\ R_{2t} - E[R_{2t}] \\ \vdots \\ R_{Nt} - E[R_{Nt}] \end{bmatrix}\big[\,R_{1t} - E[R_{1t}] \;\; \cdots \;\; R_{Nt} - E[R_{Nt}]\,\big] $$
$$ = E\begin{bmatrix} (R_{1t}-E[R_{1t}])^2 & (R_{1t}-E[R_{1t}])(R_{2t}-E[R_{2t}]) & \cdots & (R_{1t}-E[R_{1t}])(R_{Nt}-E[R_{Nt}]) \\ (R_{1t}-E[R_{1t}])(R_{2t}-E[R_{2t}]) & (R_{2t}-E[R_{2t}])^2 & \cdots & (R_{2t}-E[R_{2t}])(R_{Nt}-E[R_{Nt}]) \\ \vdots & \vdots & \ddots & \vdots \\ (R_{1t}-E[R_{1t}])(R_{Nt}-E[R_{Nt}]) & (R_{2t}-E[R_{2t}])(R_{Nt}-E[R_{Nt}]) & \cdots & (R_{Nt}-E[R_{Nt}])^2 \end{bmatrix} $$

¹ It is immaterial whether you want to call this a variance, a covariance, or a variance-covariance matrix. In this chapter we shall express a preference for the second term, covariance matrix. Moreover, this definition is easily extended from the unconditional covariance matrix, $Var[\mathbf{R}_t]$, to the conditional covariance matrix, $Var_t[\mathbf{R}_{t+1}] \equiv Var[\mathbf{R}_{t+1}\,|\,\mathcal{F}_t]$.

$$ = \begin{bmatrix} Var[R_{1t}] & Cov[R_{1t},R_{2t}] & \cdots & Cov[R_{1t},R_{Nt}] \\ Cov[R_{1t},R_{2t}] & Var[R_{2t}] & \cdots & Cov[R_{2t},R_{Nt}] \\ \vdots & \vdots & \ddots & \vdots \\ Cov[R_{1t},R_{Nt}] & Cov[R_{2t},R_{Nt}] & \cdots & Var[R_{Nt}] \end{bmatrix} = \begin{bmatrix} \sigma^2_{1t} & \sigma_{12,t} & \cdots & \sigma_{1N,t} \\ \sigma_{12,t} & \sigma^2_{2t} & \cdots & \sigma_{2N,t} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1N,t} & \sigma_{2N,t} & \cdots & \sigma^2_{Nt} \end{bmatrix} $$

Clearly, all variances are collected on the main diagonal, while all covariances are collected off the main diagonal. Moreover, because $Cov[R_{it},R_{jt}] = Cov[R_{jt},R_{it}]$ from simple properties of expectations, $Var[\mathbf{R}_t]$ is by construction a symmetric matrix. For instance, any portfolio choice methodology is clearly based on knowledge or estimation of $Var[\mathbf{R}_t]$, as it is well known that optimal portfolio shares will also depend on the covariances of asset returns considered in pairs. Because risk management concerns either portfolios of securities or portfolios of investment projects, risk management is also intrinsically a multivariate application. Although in many courses pricing problems are mostly presented with reference to univariate applications only (i.e., we price one asset at a time, for instance a derivative written on an individual security), in reality this represents more the exception than the rule, as we are often called to price assets that concern several cash flows or underlying securities (think about compound or basket options). Also in this respect, one needs to develop useful multivariate time series methods to model and forecast quantities of interest and, among them, surely dynamic covariances and correlations.
In chapters 4 and 5 all of our attention has been directed to developing, estimating, testing, and forecasting univariate ($N = 1$) volatility models only. In this chapter, we broaden our interest to multivariate ($N \geq 2$) models that, as far as second moments are concerned, will necessarily also concern covariances and correlations besides variances. We therefore examine three approaches to multivariate estimation of conditional second moments. First, we deal with an approach that moves the core of the effort from the econometrics to the asset pricing, in the sense that covariances will be predicted off factor pricing models (such as, but not exclusively, the CAPM). The advantage of this way of proceeding is that some of us prefer to do more economics and less econometrics (and this seems to be a good idea also to the Author of these notes). Unfortunately, most of the asset pricing theory currently circulating tends to be rejected (sometimes rather obviously, think of the CAPM, on other occasions only marginally) by most data sets. As a result, the majority of users of financial econometrics (risk and asset managers, some quantsy types of asset pricers and structurers) prefer to derive forecasts from econometric models, vs. incorrect, commonly rejected asset pricing models. Second, we propose models that directly model conditional covariances following a logic similar to chapter 4: these are in practice multivariate extensions of ARCH and GARCH models. As we shall see, the idea is similar to when in chapter 3 you moved from univariate time series models for the conditional mean to multivariate, vector models (such as vector autoregressions). However, in the case of covariance matrices, we shall see that extending univariate GARCH models to their multivariate counterparts presents many practical difficulties, unless a smart approach is adopted. Therefore the corresponding material is presented only in the final, but rather important, Section 6. Third, such a smart approach, dynamic conditional correlation (DCC) models, represents the other important, key tool that is described in this chapter.


In spite of the difficulties we may encounter with a truly multivariate GARCH approach, its payoffs are obvious in terms of the questions such a framework makes it possible to answer, besides whether or not correlations do change over time: Is the volatility of one specific market (say, the U.S.) leading the volatility of other markets? Is the volatility of an asset transmitted to another asset directly (through its conditional variance) or indirectly (through its conditional covariance)?
Section 2 presents the important distinction between passive and active risk management that motivates the need for a multivariate approach to the time series analysis of volatility and covariance. Section 3 investigates the special case in which there is no difference between passive and active estimation strategies, i.e., in which the econometrics of portfolio returns gives forecasts of variance that automatically incorporate forecasts of covariances between assets in pairs. Unfortunately, such an interesting result, which could remarkably simplify variance forecasting, obtains only when we assume rather specific asset pricing models that have a linear factor structure. Section 4 deals with simple, one would say naive, models used to forecast covariances. Section 5 presents the most important and arguably best working set of methods to model and forecast dynamic correlations, Engle's (2002) DCC model. Section 6 finally extends our horizon to the full family of multivariate GARCH models, of which the DCC is one of the most recent and yet very successful members. Appendix A presents a few additional results concerning estimation methods, in particular the feasible GLS approach. Appendix B presents a fully worked out set of examples in Matlab concerning DCC modelling.

2. Motivation: Passive vs. Active Risk Management


Suppose you are a risk manager in charge of measuring and controlling risk for a given portfolio, whose return we call $R^{pf}_{t+1}$. Although it is easy to read what follows with reference to a portfolio of securities, more generally, this could be a portfolio of loans or other OTC positions/exposures. Such a portfolio is composed of $N$ positions such that:
$$ R^{pf}_{t+1} = w_1R_{1,t+1} + w_2R_{2,t+1} + \cdots + w_NR_{N,t+1} = \sum_{i=1}^{N} w_iR_{i,t+1} = \mathbf{w}'\mathbf{R}_{t+1} \tag{1} $$
where the vector $\mathbf{w} \equiv [w_1\; w_2\; \ldots\; w_N]'$ represents the weights that apply between time $t$ and $t+1$. Formally, one could even be more precise and write that $R^{pf}_{t+1} = \sum_{i=1}^{N} w_{i,t}R_{i,t+1} = \mathbf{w}_t'\mathbf{R}_{t+1}$ to emphasize that $\mathbf{w}_t \equiv [w_{1t}\; w_{2t}\; \ldots\; w_{Nt}]'$ has been selected at time $t$. The first, simplest choice is to ignore the underlying structure and origins of $R^{pf}_{t+1}$ in (1): because once the summation on the right-hand side of (1) has been performed, we are likely to have available a time series $\{R^{pf}_{t+1}\}_{t=0}^{T}$ of data on portfolio returns, one possibility is to just use such returns and apply the univariate methods covered in chapters 4 and 5. For instance, under the assumption of multivariate normality of the vector of returns $\mathbf{R}_{t+1}$, $\mathbf{R}_{t+1} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, one may compute
$$ VaR^{pf}_{t+1}(p) = -\sigma^{pf}_{t+1}\,\Phi^{-1}(p), $$
where $\sigma^{pf}_{t+1}$ represents a forecast of the quantity $\sqrt{Var_t[R^{pf}_{t+1}]}$, i.e., of portfolio volatility at time $t+1$, obtained from some conditional heteroskedastic model estimated on (applied to) the time series $\{R^{pf}_{t+1}\}_{t=0}^{T}$, for instance $\sigma^{pf}_{t+1} = \sqrt{\omega + \alpha(R^{pf}_t)^2 + \beta(\sigma^{pf}_t)^2}$. This is called a passive risk management approach: it is passive because all of these steps are just fine of course, but they have a considerable counterindication: they condition on the vector of portfolio weights between time $t$ and $t+1$ being given and constant, $\mathbf{w}_{t+1} = \mathbf{w}_t \equiv [w_1\; w_2\; \ldots\; w_N]'$. Formally, we should then write²
$$ VaR^{pf}_{t+1}(p; \mathbf{w}_t) = -\sigma^{pf}_{t+1}(\mathbf{w}_t)\,\Phi^{-1}(p), $$
because both $R^{pf}_{t+1}$ and $\sigma^{pf}_{t+1}$ condition on the time $t$ portfolio weights, $\mathbf{w}_t$.
Passive risk management implies that the quant steps (modelling, estimation, forecasting, etc.) are simple because they are based on univariate tools. However, the results produced by passive portfolio management are fragile by construction. For instance, even though we may be able to say that for all the assets ($i = 1, 2, \ldots, N$) $\sigma_{i,t+2} \simeq \sigma_{i,t+1}$, notice that $VaR^{pf}_{t+2}(p)$ may be completely different from $VaR^{pf}_{t+1}(p)$ for an obvious reason: if the weights $\mathbf{w}_{t+1}$ differ from the weights $\mathbf{w}_t$, then $\sigma^{pf}_{t+2}(\mathbf{w}_{t+1}) \neq \sigma^{pf}_{t+1}(\mathbf{w}_t)$ even though $\sigma_{i,t+2} \simeq \sigma_{i,t+1}$ for all assets in the portfolio. Probably, in practice, this means that a passive risk management user will be forced to repeat estimation and all calculations at each point in time, or at least every time the structure of the portfolio is modified. Moreover, suppose that your interest lies in understanding how your risk measures could change if you alter the structure of the portfolio weights, i.e., something like
$$ \frac{\partial VaR^{pf}_{t+1}(p; \mathbf{w}_t)}{\partial w_i} \quad \text{for some } i = 1, 2, \ldots, N. $$
Clearly, econometric methods simply applied to the aggregate time series $\{R^{pf}_{t+1}\}_{t=0}^{T}$ will be incapable of accomplishing that.
A risk manager may indeed resort to active instead of passive risk management methods. Adopting active methods is equivalent to using multivariate econometric modelling. The advantage of active methods is that the individual, asset- or security-specific contributions to risk (or to portfolio performance, in the case of asset allocation applications) can be estimated, like $\partial VaR^{pf}_{t+1}(p; \mathbf{w}_t)/\partial w_i$ in the example above. To adopt a multivariate model means to switch focus from modelling and forecasting $Var_t[R^{pf}_{t+1}]$ to $Var_t[\mathbf{R}_{t+1}]$, or
$$ \text{from } Var_t[R^{pf}_{t+1}] \text{ to } Var_t[\mathbf{w}_t'\mathbf{R}_{t+1}] = \mathbf{w}_t'\,Var_t[\mathbf{R}_{t+1}]\,\mathbf{w}_t, $$
where (as we have already noted) there is no difference between $Var_t[\mathbf{R}_{t+1}]$ and $Cov_t[\mathbf{R}_{t+1}]$. Moreover, remember that for any random vector $\mathbf{X}_{t+1}$, $Var[\mathbf{w}'\mathbf{X}_{t+1}] = \mathbf{w}'\,Var[\mathbf{X}_{t+1}]\,\mathbf{w}$, in a way similar
² From now on, we resume assuming that $E_t[R^{pf}_{t+1}] = 0$ and we omit it.

to the fact that when $N = 1$, $Var[wR_{t+1}] = w^2\,Var[R_{t+1}]$. However, to come up with models and estimation methods for $Var_t[\mathbf{R}_{t+1}]$ is a much more serious endeavour than for $Var_t[R^{pf}_{t+1}]$.

We have already stated in the Introduction that modelling $Var_t[\mathbf{R}_{t+1}]$ means to model covariances and/or correlations. Note that the statements are not equivalent, because³
$$ \rho_{ij,t+1} \equiv \frac{Cov_t[R_{i,t+1}, R_{j,t+1}]}{\sqrt{Var_t[R_{i,t+1}]}\sqrt{Var_t[R_{j,t+1}]}} = \frac{\sigma_{ij,t+1}}{\sigma_{i,t+1}\,\sigma_{j,t+1}}, $$
so that while modelling and forecasting correlations requires modelling and forecasting covariances, it also implies that one can model and forecast variances, which we can do already using the methods developed in chapters 4 and 5.
The true nature of active risk management and the fact that it involves correlations and covariances emerges from an example for the case $N = 2$:
$$ Var_t[\mathbf{w}'\mathbf{R}_{t+1}] = \mathbf{w}'\,Var_t[\mathbf{R}_{t+1}]\,\mathbf{w} = [w_1\; w_2]\begin{bmatrix} \sigma^2_{1,t+1} & \sigma_{12,t+1} \\ \sigma_{12,t+1} & \sigma^2_{2,t+1} \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} $$
$$ = (w_1)^2\sigma^2_{1,t+1} + 2w_1w_2\sigma_{12,t+1} + (w_2)^2\sigma^2_{2,t+1} $$
$$ = (w_1)^2\sigma^2_{1,t+1} + (w_2)^2\sigma^2_{2,t+1} + 2w_1w_2\rho_{12,t+1}\sigma_{1,t+1}\sigma_{2,t+1}. \tag{2} $$
The last line shows that under an active risk management approach ($Var_t[\mathbf{w}'\mathbf{R}_{t+1}]$), also dynamic forecasts of either the covariance ($\sigma_{12,t+1}$) or, equivalently, the correlation ($\rho_{12,t+1}$) are required. Obviously, the very last line derives from the definition of a correlation, $\rho_{12,t+1} = \sigma_{12,t+1}/(\sigma_{1,t+1}\sigma_{2,t+1})$. For a general $N \geq 2$, the expression in (2) generalizes to:
$$ Var_t[\mathbf{w}'\mathbf{R}_{t+1}] = \mathbf{w}'\,Var_t[\mathbf{R}_{t+1}]\,\mathbf{w} = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\sigma_{ij,t+1} = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\rho_{ij,t+1}\sigma_{i,t+1}\sigma_{j,t+1} $$
$$ = \sum_{i=1}^{N} (w_i)^2\sigma^2_{i,t+1} + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} w_iw_j\rho_{ij,t+1}\sigma_{i,t+1}\sigma_{j,t+1}, \tag{3} $$

which emphasizes that not only the $N$ variance forecasts will matter, but also the $N(N-1)/2$ correlation forecasts.⁴ For instance, in simple risk-management applications, under active risk management we shall have:
$$ VaR^{pf}_{t+1}(p) = -\{\mathbf{w}'\,Var_t[\mathbf{R}_{t+1}]\,\mathbf{w}\}^{1/2}\,\Phi^{-1}(p) = -\Big(\sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\sigma_{ij,t+1}\Big)^{1/2}\Phi^{-1}(p), $$

³ In this chapter $\sigma_{ij,t+1}$ and $\sigma_{ij\,t+1}$, and $\rho_{ij,t+1}$ and $\rho_{ij\,t+1}$, mean the same. Where one places the indices is irrelevant, provided a Reader is alerted of the meaning. You should also recall that correlations simply represent measures of linear dependence between pairs of random variables, meaning that more complex forms of dependence may exist that correlations will not necessarily capture.
⁴ Relevant correlation forecasts are only $N(N-1)/2$ because of the symmetry of the covariance matrix, i.e., $\sigma_{ij,t+1} = \sigma_{ji,t+1}$ for all $i \neq j$.

which clearly leads to expressions for the partial derivatives mentioned above, of the type
$$ \frac{\partial VaR^{pf}_{t+1}(p)}{\partial w_i} = -\Big(2w_i\sigma^2_{i,t+1} + 2\sum_{j=1,\,j\neq i}^{N} w_j\rho_{ij,t+1}\sigma_{i,t+1}\sigma_{j,t+1}\Big)\Phi^{-1}(p). $$
This gives the contribution of the second moments of the $i$-th asset to the $p$-percent VaR of the portfolio.
The expression in (3) makes it clear that, in general, an active risk management problem will involve the forecasts of $N$ variances and of $N(N-1)/2$ covariances (or correlations). While up to this point we have generally assumed that, given a conditional heteroskedastic model, we always have sufficient observations to proceed to estimation, we immediately note that when it comes to multivariate covariance matrix estimation and forecasting, the availability of sufficiently long time series may become an issue that requires attention.⁵ For instance, with only 15 assets in a portfolio (which is a rather sensible and commonly seen portfolio) you will need: (i) 15 variance forecasts; (ii) $(15 \times 14/2) = 105$ correlation forecasts, for a total of 120 parameters or moments to forecast. Suppose, for simplicity, that variances and covariances are constant over time. Then the 120 objects that you care for in this example simply become parameters to estimate, $\{\sigma_i\}_{i=1}^{15}$ and $\{\sigma_{ij}\}_{i,j=1,\,i\neq j}^{15}$. At this point, with 15 series of return data (because with 15 assets you will have at least these 15 time series), note that a total of 120 parameters to be estimated on 15 series gives you 120/15 = 8 parameters per series. Even though you may think that 15 time series are a lot, for each of them you will need at least 8 observations in order to proceed. However: would you ever estimate 120 parameters using exactly 120 observations? Hopefully not. In fact, time series econometricians normally use a simple rule-of-thumb by which one should always have 20 observations per parameter before proceeding to any econometric analysis. The ratio between the total number of observations and the number of parameters to be estimated is called the saturation ratio. In this case, $20 \times 120 = 2{,}400$ observations. This means that you should have 2400/15 = 160 observations per series before seriously thinking of tackling this problem. 160 observations per series mean that you should recover almost 14 years of monthly data, or 32 weeks of daily data. These requirements are moderate, but already not completely negligible when you deal with over-the-counter instruments or newly floated stocks in the aftermath of IPOs.
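The arithmetic of this example is easy to automate. A minimal Matlab sketch, reproducing the $N = 15$ numbers above ($N$ can be changed freely):

```matlab
% Back-of-the-envelope check of the saturation-ratio argument in the text
% (the numbers reproduce the N = 15 example).
N          = 15;
n_var      = N;                   % variance parameters
n_corr     = N*(N-1)/2;           % distinct correlations (symmetry)
n_param    = n_var + n_corr;      % 15 + 105 = 120 for N = 15
sat_ratio  = 20;                  % rule-of-thumb observations per parameter
T_needed   = sat_ratio * n_param; % total observations required
obs_series = T_needed / N;        % per-series length: 160 for N = 15
fprintf('N = %d: %d parameters, %d obs in total, %d per series\n', ...
        N, n_param, T_needed, obs_series);
```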
If you worked through this example afresh after having increased the number of assets to something even more realistic, such as 100 or so assets, you would come to realize that there is a new dimension to multivariate time series problems that was unknown before: because the size of the covariance matrix grows as a function of $N^2$ (formally, $N + N(N-1)/2 = O(N^2)$), the size of the estimation problem and the corresponding data requirements grow quadratically in the number of assets (i.e., very quickly).⁶ In fact, you can read most of the material that follows not only as an attempt to develop good multivariate econometric methods that may accurately forecast variances and covariances, but also as a way to deal with the issue of the excessively high number of parameters that estimating covariance matrices implies.
⁵ What follows assumes that you have realized prior to today that if you have $K$ parameters to estimate, then you will need at least $K$ observations. However, it is common to expect that you will have many more observations than parameters to estimate, say $T = sK$, where $s$ is at least 10 or 20 and will be called the saturation ratio in what follows.

3. Exposure Mapping (Factor-Based) Approach


A simple way to reduce the dimensionality of the problem of estimating and forecasting portfolio variance is to impose a factor structure using observed returns as factors. Although, as we shall see below, the method is considerably more general, let's start from an analysis of the CAPM. Assume that the CAPM holds exactly, i.e., that it perfectly describes portfolio returns:
$$ R^{pf}_{t+1} = R^f + \beta^{pf}\,[R^M_{t+1} - R^f] \tag{4} $$
where $R^f$ is the riskless rate (assumed to be constant, just to simplify) and $R^M_{t+1}$ is the return on the market portfolio (you know what this means from your asset pricing courses). Also suppose that you have already managed to estimate the beta of your portfolio, for instance, using simple OLS methods:
$$ \hat{\beta}^{pf} = \frac{\widehat{Cov}[R^{pf}_{t+1}, R^M_{t+1}]}{\widehat{Var}[R^M_{t+1}]}, $$
where hats refer to sample estimates, i.e., obtained from the data. Once you are through with that, then
$$ \widehat{Var}_t[R^{pf}_{t+1}] = (\hat{\beta}^{pf})^2\,\widehat{Var}_t[R^M_{t+1}], $$
which is simple enough. At this point, you should be confused (for a short time only) because the two previous formulas seemingly depend on portfolio returns only (besides the time series of market portfolio returns). In Section 2 we said that if you just use realized portfolio returns then your approach will be a passive one, with all its limitations. However, you would be incorrect in your confusion, because under the exposure mapping approach it turns out that the passive and active approaches are identical. Equivalently (and this is excellent news), it turns out that a passive exposure-based approach to variance forecasting gives the same result as an active approach; therefore the easier, passive approach is preferred. The reason for this surprising result is that, as you should recall from your asset pricing courses, CAPM (more generally, factor) betas are linear in portfolio weights, i.e.,
$$ \beta^{pf}(\mathbf{w}_t) = \sum_{i=1}^{N} w_i\beta_i, \tag{5} $$
⁶ Although a covariance matrix contains $N^2$ elements, it is symmetric and collects variances on its main diagonal. This means that the number of distinct elements (parameters) collected in $Var_t[\mathbf{R}_{t+1}]$ is $N(N+1)/2$, of which $N$ are variances and $N(N+1)/2 - N = N(N-1)/2$ covariances or correlations.


i.e., the beta of a portfolio is the weighted sum of the individual betas, with weights equal to the portfolio weights. (5) derives from a well-known property of covariances:
$$ \beta^{pf}(\mathbf{w}_t) = \frac{Cov[R^{pf}_{t+1}, R^M_{t+1}]}{Var[R^M_{t+1}]} = \frac{Cov[\sum_{i=1}^{N} w_iR_{i,t+1},\, R^M_{t+1}]}{Var[R^M_{t+1}]} = \frac{\sum_{i=1}^{N} w_i\,Cov[R_{i,t+1}, R^M_{t+1}]}{Var[R^M_{t+1}]} = \sum_{i=1}^{N} w_i\frac{Cov[R_{i,t+1}, R^M_{t+1}]}{Var[R^M_{t+1}]} = \sum_{i=1}^{N} w_i\beta_i. $$

Therefore, it is easy to see why passive and active risk management need to give the same result:
$$ Var_t[R^{pf}_{t+1}] = (\beta^{pf})^2\,Var_t[R^M_{t+1}] \quad \text{(passive)} $$
$$ = \Big[\sum_{i=1}^{N} w_i\beta_i\Big]^2 Var_t[R^M_{t+1}] = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\beta_i\beta_j\,Var_t[R^M_{t+1}] $$
$$ = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\,Cov_t\big[\beta_iR^M_{t+1},\,\beta_jR^M_{t+1}\big] = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\,Cov_t\big[R^f + \beta_i(R^M_{t+1}-R^f),\; R^f + \beta_j(R^M_{t+1}-R^f)\big] $$
$$ = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\,Cov_t\big[R_{i,t+1},\,R_{j,t+1}\big]. $$
At this point, exploiting our assumption that the CAPM holds exactly, we have:
$$ Var_t[R^{pf}_{t+1}] = \sum_{i=1}^{N} (w_i)^2\underbrace{\beta_i^2\,Var_t[R^M_{t+1}]}_{Var_t[R_{i,t+1}]\text{ from CAPM}} + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} w_iw_j\underbrace{\beta_i\beta_j\,Var_t[R^M_{t+1}]}_{Cov_t[R_{i,t+1},R_{j,t+1}]\text{ from CAPM}} $$
$$ = \sum_{i=1}^{N} (w_i)^2\sigma^2_{i,t+1} + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} w_iw_j\sigma_{ij,t+1} = \mathbf{w}'\,Var_t[\mathbf{R}_{t+1}]\,\mathbf{w} \quad \text{(active)}, $$
which shows that starting from a seemingly passive approach, one can get to an active one thanks to the properties of the CAPM.⁷ In this case, assuming normality,
$$ \frac{\partial VaR^{pf}_{t+1}(p)}{\partial w_i} = -\Big(2w_i\beta_i^2\,Var_t(R^M_{t+1}) + 2\sum_{j=1,\,j\neq i}^{N} w_j\beta_i\beta_j\,Var_t(R^M_{t+1})\Big)\Phi^{-1}(p), $$
i.e., the typical partial derivatives of interest in active risk management applications can all be re-expressed in terms of individual asset betas.
⁷ Make sure to understand why under the CAPM, $Cov_t[R_{i,t+1}, R_{j,t+1}] = \beta_i\beta_j\,Var_t[R^M_{t+1}]$.
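A quick numerical illustration of the passive/active equivalence may help; in this Matlab sketch all betas, weights, and the market variance are hypothetical, and the CAPM-implied covariance matrix is built as $\boldsymbol{\beta}\boldsymbol{\beta}'$ times the market variance (no idiosyncratic risk):

```matlab
% Numerical check, under hypothetical inputs, that the passive CAPM-based
% variance (beta_pf^2 * var_M) equals the active one (w'*Sigma*w) when
% returns contain systematic risk only, so that Sigma = (beta*beta')*var_M.
beta  = [0.8; 1.1; 1.4];            % hypothetical asset betas
w     = [0.2; 0.5; 0.3];            % portfolio weights
var_M = 0.04;                       % conditional market variance forecast

beta_pf = w' * beta;                % eq. (5): portfolio beta is weight-averaged
passive = beta_pf^2 * var_M;        % (beta_pf)^2 * Var_t[R_M]
Sigma   = (beta * beta') * var_M;   % CAPM-implied covariance matrix
active  = w' * Sigma * w;           % w' * Var_t[R] * w
fprintf('passive = %.6f, active = %.6f\n', passive, active);  % identical
```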

Of course, these results obtain only when the CAPM applies exactly or, equivalently, when the return of none of the assets in the portfolio contains any firm- or security-specific idiosyncratic risk. In other words, asset returns must be entirely explained by systematic risks. To see what happens in case there is any residual, idiosyncratic risk left in the process of individual asset returns, note that if
$$ R_{i,t+1} = R^f + \beta_i[R^M_{t+1} - R^f] + \varepsilon_{i,t+1} \quad (i = 1, 2, \ldots, N), $$
where $\varepsilon_{i,t+1}$ captures such idiosyncratic risk, then
$$ Var_t[R^{pf}_{t+1}] = \sum_{i=1}^{N} (w_i)^2\beta_i^2\,Var_t[R^M_{t+1}] + \sum_{i=1}^{N} (w_i)^2\,Var_t[\varepsilon_{i,t+1}] + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} w_iw_j\Big\{\beta_i\beta_j\,Var_t[R^M_{t+1}] + Cov_t[\varepsilon_{i,t+1},\varepsilon_{j,t+1}]\Big\} $$
$$ = \Big[\sum_{i=1}^{N} w_i\beta_i\Big]^2 Var_t[R^M_{t+1}] + \sum_{i=1}^{N} (w_i)^2\,Var_t[\varepsilon_{i,t+1}] + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} w_iw_j\,Cov_t[\varepsilon_{i,t+1},\varepsilon_{j,t+1}] $$
$$ = \text{passive mode} + \text{active mode applied to } \sum_{i=1}^{N} (w_i)^2\,Var_t[\varepsilon_{i,t+1}]. $$

Even though the definition of idiosyncratic risk implies that these risks must be uncorrelated across assets, so that $Cov_t[\varepsilon_{i,t+1},\varepsilon_{j,t+1}] = 0$, $i \neq j$, the fact remains that the terms $Var_t[\varepsilon_{i,t+1}]$ for $i = 1, 2, \ldots, N$ need to be predicted on the basis of some model that cannot be the CAPM itself (the CAPM is silent about idiosyncratic risk by construction, being only about systematic risk). This means that at this stage one will be forced to go back to her Matlab copy to perform estimation and forecasting of $\{Var_t[\varepsilon_{i,t+1}]\}_{i=1}^{N}$ using econometric methods, for instance different GARCH models for idiosyncratic risk,
$$ \sigma^2_{\varepsilon_i,t+1} = \omega_i + \alpha_i\varepsilon^2_{i,t} + \beta_i\sigma^2_{\varepsilon_i,t}, $$
one for each of the $N$ assets in our portfolio. This is of course very active but at the same time also rather painful, which goes to show that the equivalence between the active and passive approaches is lost because of the very existence of idiosyncratic risk.

3.1. Applications to risk management


Assuming the CAPM in (4) is subject to IID shocks from a Gaussian distribution with zero cross-sectional correlations (i.e., $Cov[\varepsilon_{i,t+1},\varepsilon_{j,t+1}] = 0$ for $i \neq j = 1, 2, \ldots, N$),
$$ R_{i,t+1} = R^f + \beta_i(R^M_{t+1} - R^f) + \varepsilon_{i,t+1}, \qquad \varepsilon_{i,t+1} \sim \text{IID } N(0, \sigma^2_{\varepsilon_i,t+1}), \tag{6} $$
we now derive an expression for the 1% VaR of a portfolio characterized by weights $\{w_1, w_2, \ldots, w_N\}$; we want to emphasize that such an expression involves only quantities that are specific to each of the assets, i.e., their portfolio weights, their betas, their (estimated) idiosyncratic risk levels, etc.

Because $R^{pf}_{t+1} = \sum_{i=1}^{N} w_iR_{i,t+1}$, and given the linear properties of the covariance (i.e., $Cov[aX + bY, Z] = a\,Cov[X,Z] + b\,Cov[Y,Z]$), we know that the estimate of $\beta^{pf}$ in
$$ R^{pf}_{t+1} = R^f + \beta^{pf}(R^M_{t+1} - R^f) + \varepsilon^{pf}_{t+1} $$
is $\hat{\beta}^{pf} = \sum_{i=1}^{N} w_i\hat{\beta}_i$, while
$$ Var_t[\varepsilon^{pf}_{t+1}] = Var_t\Big[\sum_{i=1}^{N} w_i\varepsilon_{i,t+1}\Big] = \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1}, $$
which requires assuming that $Cov[\varepsilon_{i,t+1},\varepsilon_{j,t+1}] = 0$ for $i \neq j = 1, 2, \ldots, N$ to also obtain $Cov_t[\varepsilon_{i,t+1},\varepsilon_{j,t+1}] = 0$. Therefore
$$ (\sigma^{pf}_{t+1})^2 = \Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1} $$
and, under normality,
$$ VaR^{pf}_{t+1}(p) = -\Bigg[\Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1}\Bigg]^{1/2}\Phi^{-1}(p) - \mu^{pf}_{t+1}, $$
where $\hat{\mu}_{M,t+1} - R^f$ can be simply estimated as the sample mean excess return on the market portfolio. Notice that if you compute VaRs assuming that the CAPM holds, then
$$ \mu^{pf}_{t+1} = R^f + \Big(\sum_{i=1}^{N} w_i\beta_i\Big)(\hat{\mu}_{M,t+1} - R^f) $$
will generally be non-zero, unless $\sum_{i=1}^{N} w_i\beta_i\,(\hat{\mu}_{M,t+1} - R^f) = 0$ and $R^f = 0$, which are rather special restrictions that in general will not be satisfied. In particular, in the case of a 1% VaR, we will have that under the assumption of $\varepsilon_{i,t+1} \sim \text{IID } N(0, \sigma^2_{\varepsilon_i})$,
$$ R^{pf}_{t+1} \sim \text{IID } N\Bigg(\mu^{pf}_{t+1},\; \Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1}\Bigg), \tag{7} $$
so that
$$ VaR^{pf}_{t+1}(0.01) = 2.33\Bigg[\Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1}\Bigg]^{1/2} - R^f - \Big(\sum_{i=1}^{N} w_i\beta_i\Big)(\hat{\mu}_{M,t+1} - R^f), $$
and clearly the risk exposure will entirely depend on the portfolio composition, the betas of the securities in the portfolio, and their idiosyncratic risk coefficients. Finally, the result in (7) shows

the key role played by the assumption that $\varepsilon_{i,t+1} \sim \text{IID } N(0, \sigma^2_{\varepsilon_i})$ for $i = 1, 2, \ldots, N$. As for the shocks, their importance needs little emphasizing: while the CAPM as an asset pricing model, $E[R_{i,t+1}] = R^f + \beta_i(E[R^M_{t+1}] - R^f)$, has no direct empirical implications for risk management because it just pins down expected returns, (6) can be used in risk management, but it requires us to make some assumptions, as we did, on the distribution of the $\varepsilon_{i,t+1}$ as well as on $Cov[\varepsilon_{i,t+1},\varepsilon_{j,t+1}]$.
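As an illustration of (7), the following Matlab sketch computes the 1% VaR from purely asset-specific inputs (weights, betas, idiosyncratic variances); all numerical values are hypothetical:

```matlab
% Sketch of the 1% VaR in (7) under assumed inputs: betas, idiosyncratic
% variances, a market variance forecast, and a mean excess market return.
w       = [0.25; 0.25; 0.25; 0.25];    % portfolio weights
beta    = [0.7; 0.9; 1.1; 1.3];        % asset betas
sig2_e  = [0.020; 0.030; 0.025; 0.040];% Var[eps_i], one per asset
var_M   = 0.03;                        % market variance forecast
mu_M_ex = 0.005;                       % estimated mean excess market return
Rf      = 0.001;                       % riskless rate

var_pf = (w'*beta)^2 * var_M + sum(w.^2 .* sig2_e);  % systematic + idiosyncratic
mu_pf  = Rf + (w'*beta) * mu_M_ex;                   % CAPM-implied portfolio mean
VaR1   = 2.33 * sqrt(var_pf) - mu_pf;                % eq. (7)-style 1% VaR
fprintf('1%% VaR = %.4f\n', VaR1);
```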
Because our result has emphasized the specific role of individual asset portfolio weights, we can also quantify the VaR (total risk exposure) reduction obtained by changing the portfolio weights in the direction of realizing an optimal degree of diversification, such that the condition
$$ \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1} \simeq 0 $$
eventually holds as $N$ grows larger and larger. In fact, note that if the portfolio is not well-diversified, then
$$ VaR^{pf}_{t+1}(p) = -\Bigg[\Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1}\Bigg]^{1/2}\Phi^{-1}(p) - \mu^{pf}_{t+1}. $$
If the portfolio is well diversified, then
$$ VaR^{pf,D}_{t+1}(p) = -\Big(\sum_{i=1}^{N} w_i\beta_i\Big)\sigma_{M,t+1}\,\Phi^{-1}(p) - \mu^{pf,D}_{t+1}. $$
Clearly, because $\Phi^{-1}(p) < 0$ for $p < 0.5$, the suspicion is that
$$ VaR^{pf,D}_{t+1}(p) < VaR^{pf}_{t+1}(p), $$
and the decline can be quantified as the disappearance of the term
$$ -\Big\{\sum_{i=1}^{N} w_i^2\,\sigma^2_{\varepsilon_i,t+1}\Big\}^{1/2}\Phi^{-1}(p) > 0. $$
However, to compute the actual difference is a bit of a mess.⁸
Define $\mu^{pf}_{t+1} \equiv R^f + \sum_{i=1}^{N} w_i\beta_i(\hat{\mu}_{M,t+1} - R^f)$ and let $\mu^{pf,D}_{t+1}$ denote its analogue for the well-diversified portfolio. Notice that
$$ [VaR^{pf}_{t+1}(p)]^2 = \Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1}[\Phi^{-1}(p)]^2 + \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}[\Phi^{-1}(p)]^2 + (\mu^{pf}_{t+1})^2 + 2\mu^{pf}_{t+1}\Bigg[\Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}\Bigg]^{1/2}\Phi^{-1}(p), $$
while
$$ [VaR^{pf,D}_{t+1}(p)]^2 = \Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1}[\Phi^{-1}(p)]^2 + (\mu^{pf,D}_{t+1})^2 + 2\mu^{pf,D}_{t+1}\Big(\sum_{i=1}^{N} w_i\beta_i\Big)\sigma_{M,t+1}\Phi^{-1}(p). $$
The difference is then
$$ [VaR^{pf}_{t+1}(p)]^2 - [VaR^{pf,D}_{t+1}(p)]^2 = \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}[\Phi^{-1}(p)]^2 + \big[(\mu^{pf}_{t+1})^2 - (\mu^{pf,D}_{t+1})^2\big] $$
$$ + 2\Phi^{-1}(p)\Bigg\{\mu^{pf}_{t+1}\Bigg[\Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}\Bigg]^{1/2} - \mu^{pf,D}_{t+1}\Big(\sum_{i=1}^{N} w_i\beta_i\Big)\sigma_{M,t+1}\Bigg\}. $$
Now notice that if $[VaR^{pf}_{t+1}(p)]^2 - [VaR^{pf,D}_{t+1}(p)]^2 > 0$ holds, then it must be that $VaR^{pf}_{t+1}(p) > VaR^{pf,D}_{t+1}(p)$. However, $[VaR^{pf}_{t+1}(p)]^2 - [VaR^{pf,D}_{t+1}(p)]^2 > 0$ would be guaranteed by the facts that
$$ \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}[\Phi^{-1}(p)]^2 > 0, $$
$$ \mu^{pf}_{t+1}\Bigg[\Big(\sum_{i=1}^{N} w_i\beta_i\Big)^2\sigma^2_{M,t+1} + \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}\Bigg]^{1/2} - \mu^{pf,D}_{t+1}\Big(\sum_{i=1}^{N} w_i\beta_i\Big)\sigma_{M,t+1} \leq 0, $$
and $(\mu^{pf}_{t+1})^2 - (\mu^{pf,D}_{t+1})^2 \geq 0$; but in general any combination of these three conditions may deliver the result. Importantly, even if it is intuitive to think that setting $\sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1} \simeq 0$ should bring a reduction in VaR, and this remains likely, formally it is possible that setting the weights in such a way may cause a reduction in the expected portfolio return so large as to overturn the effect. Indeed, observe that if we were able to say that $\mu^{pf}_{t+1} = \mu^{pf,D}_{t+1} \simeq 0$, as often assumed in our lectures as well as in chapters 4-5, then
$$ [VaR^{pf}_{t+1}(p)]^2 - [VaR^{pf,D}_{t+1}(p)]^2 = \sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1}[\Phi^{-1}(p)]^2 \geq 0, $$
which is likely to hold if the individual security betas are all finite, because $\sum_{i=1}^{N} w_i^2\sigma^2_{\varepsilon_i,t+1} \simeq 0$ must require that $\sum_{i=1}^{N} w_i^2 \to 0$ as the degree of diversification in the portfolio increases (i.e., as $N \to \infty$), while $\sum_{i=1}^{N} w_i\beta_i$ remains unrestricted.⁹
⁸ Those with a weak heart are advised to skip the algebra that follows and to go directly to the conclusions, to prevent permanent damage.
⁹ This proof is not sufficiently tight to be called a proof. However, to make the proof compelling would require imposing assumptions on the asymptotic behavior of the weights, which would just increase the formal burden. The statement is highly likely to hold in most realistic circumstances.

3.2. Multi-factor exposure mappings


As we have stated while introducing the topic, the exposure mapping approach also works beyond the simple case of the CAPM. As usual, although the intuition and the mechanics remain the same, the details are a bit more tedious. Consider the following (empirical) asset pricing model for a generic asset/security/portfolio $i = 1, 2, \ldots, N$ (due to Fama and French, 1992):
$$ E[R_{i,t+1}] = R^f + \beta_i(E[R^M_{t+1}] - R^f) + s_iE[SMB_{t+1}] + h_iE[HML_{t+1}], \tag{8} $$
where $R^f$ is the constant riskless rate of return, $SMB_{t+1}$ is the rate of return on a special long-short (zero net investment) portfolio that goes long in small capitalization stocks and short in large capitalization stocks, and $HML_{t+1}$ is the rate of return on a special long-short (zero net investment) portfolio that goes long in high book-to-market ratio (also called value) stocks and short in low book-to-market ratio (growth) stocks.¹⁰ Notice that this is an asset pricing model and not (yet) a ready-to-use econometric framework to be applied in risk management, because the model only imposes restrictions on expected returns, i.e., it is not a model for returns but of their expectations.¹¹
For instance, how does the expression of the 1% VaR of a portfolio characterized by weights $\{w_1, w_2, \ldots, w_N\}$ look in the case of (8), assuming the asset pricing model is subject to IID shocks from a Gaussian distribution with zero cross-sectional correlations (i.e., $Cov[\varepsilon_{i,t+1},\varepsilon_{j,t+1}] = 0$ for $i \neq j = 1, 2, \ldots, N$),
$$ R_{i,t+1} = R^f + \beta_i(R^M_{t+1} - R^f) + s_iSMB_{t+1} + h_iHML_{t+1} + \varepsilon_{i,t+1}, \qquad \varepsilon_{i,t+1} \sim \text{IID } N(0, \sigma^2_{\varepsilon_i})\,? \tag{9} $$
In this case it may be simpler to express (9) in matrix form as $\mathbf{R}_{t+1} = R^f\boldsymbol{\iota} + \mathbf{B}\mathbf{f}_{t+1} + \boldsymbol{\varepsilon}_{t+1}$, where $\mathbf{R}_{t+1}$ is an $N \times 1$ vector of asset returns, $\boldsymbol{\iota}$ is an $N \times 1$ vector of ones, $\mathbf{f}_{t+1} \equiv [(R^M_{t+1} - R^f)\; SMB_{t+1}\; HML_{t+1}]'$ is a $3 \times 1$ vector, and $\mathbf{B}$ is an $N \times 3$ matrix that collects in each of its rows the exposure coefficients $[\beta_i\; s_i\; h_i]$. Finally, $\boldsymbol{\varepsilon}_{t+1} \sim \text{IID } N(\mathbf{0}, \boldsymbol{\Sigma}_\varepsilon)$, where $\boldsymbol{\Sigma}_\varepsilon$ is a diagonal matrix that collects the variance coefficients $\sigma^2_{\varepsilon_1}, \sigma^2_{\varepsilon_2}, \ldots, \sigma^2_{\varepsilon_N}$.
We start by noting that $R^{pf}_{t+1} = \sum_{i=1}^{N} w_iR_{i,t+1}$ can be re-written as $R^{pf}_{t+1} = \mathbf{R}_{t+1}'\mathbf{w}$, and the $T \times 1$ vector $\mathbf{x}^{pf}$ that collects the excess return observations $x^{pf}_1, x^{pf}_2, \ldots, x^{pf}_T$ can be written as
$$ \mathbf{x}^{pf} = \mathbf{X}\mathbf{w}, $$
where $\mathbf{X}$ is a $T \times N$ matrix $[\mathbf{x}_1\; \mathbf{x}_2\; \ldots\; \mathbf{x}_N]$ and $\mathbf{w}$ an $N \times 1$ vector of portfolio weights. Hence the estimate of (the $3 \times 1$ column vector) $\mathbf{b}^{pf} \equiv [\beta^{pf}\; s^{pf}\; h^{pf}]'$ in the stacked regression
$$ \mathbf{x}^{pf} = \mathbf{F}\mathbf{b}^{pf} + \boldsymbol{\varepsilon} \tag{10} $$
¹⁰ SMB is the acronym for Small minus Big and HML is the acronym for High minus Low (referred to the book-to-market ratio). The fact that SMB and HML are two zero net investment portfolios explains why we do not need to subtract the constant riskless rate from $SMB_{t+1}$ and $HML_{t+1}$. You must have already encountered these factor models in at least three of your MSc. courses.
¹¹ Moreover, under the restrictions $s_i = h_i = 0$ for $i = 1, 2, \ldots, N$ (i.e., for all assets under consideration), this model becomes the CAPM used in the lectures, $E[R_{i,t+1}] = R^f + \beta_i(E[R^M_{t+1}] - R^f)$.

will be
$$ \hat{\mathbf{b}}^{pf} = (\mathbf{F}'\mathbf{F})^{-1}\mathbf{F}'\mathbf{x}^{pf} = (\mathbf{F}'\mathbf{F})^{-1}\mathbf{F}'(\mathbf{X}\mathbf{w}) = \big((\mathbf{F}'\mathbf{F})^{-1}\mathbf{F}'\mathbf{X}\big)\mathbf{w} = \hat{\mathbf{B}}'\mathbf{w}. $$
In (10), $T$ is the sample size, $\mathbf{F} \equiv [\mathbf{f}_1'\; \mathbf{f}_2'\; \ldots\; \mathbf{f}_T']'$ is a $T \times 3$ matrix, and $\boldsymbol{\varepsilon}$ is the $T \times 1$ vector that collects the observations $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T$.¹² However, the expression for $\hat{\mathbf{b}}^{pf}$ indicates that this vector of estimated coefficients may be written as a weighted combination of the columns of the $3 \times N$ matrix $\hat{\mathbf{B}}' \equiv ((\mathbf{F}'\mathbf{F})^{-1}\mathbf{F}'\mathbf{X})$, where each column has an interpretation similar to a vector of ratios of covariance-type terms between the returns on the factor portfolios and the returns on each of the assets ($\mathbf{F}'\mathbf{X}$), divided (loosely speaking) by a $3 \times 3$ matrix of variance-type terms concerning the returns on the three factor portfolios ($\mathbf{F}'\mathbf{F}$). The weights of the combination are provided by the portfolio weights. Moreover,
$$ Var[R^{pf}_{t+1}] = Var[\mathbf{R}_{t+1}'\mathbf{w}] = \mathbf{w}'\,Var[\mathbf{R}_{t+1}]\,\mathbf{w}. $$
Therefore
$$ (\sigma^{pf}_{t+1})^2 = Var_t[x^{pf}_{t+1}] = Var_t\big[(\hat{\mathbf{B}}\mathbf{f}_{t+1} + \boldsymbol{\varepsilon}_{t+1})'\mathbf{w}\big] = \mathbf{w}'\,Var_t\big[\hat{\mathbf{B}}\mathbf{f}_{t+1} + \boldsymbol{\varepsilon}_{t+1}\big]\,\mathbf{w} $$
$$ = \mathbf{w}'\hat{\mathbf{B}}\,Var_t[\mathbf{f}_{t+1}]\,\hat{\mathbf{B}}'\mathbf{w} + \mathbf{w}'\,Var_t[\boldsymbol{\varepsilon}_{t+1}]\,\mathbf{w} = \mathbf{w}'\Big\{\hat{\mathbf{B}}\,Var_t[\mathbf{f}_{t+1}]\,\hat{\mathbf{B}}' + \hat{\boldsymbol{\Sigma}}_\varepsilon\Big\}\mathbf{w}, $$
where $Var_t[\mathbf{f}_{t+1}]$ is the $3 \times 3$ covariance matrix of the returns on the factor portfolios, i.e.,
$$ Var_t[\mathbf{f}_{t+1}] = \begin{bmatrix} \sigma^2_{M,t+1} & Cov_t[x^M_{t+1}, SMB_{t+1}] & Cov_t[x^M_{t+1}, HML_{t+1}] \\ Cov_t[x^M_{t+1}, SMB_{t+1}] & \sigma^2_{SMB,t+1} & Cov_t[SMB_{t+1}, HML_{t+1}] \\ Cov_t[x^M_{t+1}, HML_{t+1}] & Cov_t[SMB_{t+1}, HML_{t+1}] & \sigma^2_{HML,t+1} \end{bmatrix}. $$
Finally, it is easy to see that, assuming normality,
$$ VaR^{pf}_{t+1}(0.01) = 2.33\Big\{\mathbf{w}'\big[\hat{\mathbf{B}}\,Var_t[\mathbf{f}_{t+1}]\,\hat{\mathbf{B}}' + \hat{\boldsymbol{\Sigma}}_\varepsilon\big]\mathbf{w}\Big\}^{1/2} - (R^f\boldsymbol{\iota} + \hat{\mathbf{B}}\hat{E}[\mathbf{f}_{t+1}])'\mathbf{w} $$
$$ = 2.33\Big\{\hat{\mathbf{b}}^{pf\prime}\,Var_t[\mathbf{f}_{t+1}]\,\hat{\mathbf{b}}^{pf} + \widehat{Var}[\varepsilon^{pf}_{t+1}]\Big\}^{1/2} - \hat{E}[R^{pf}_{t+1}], \tag{11} $$
where $\hat{E}[\mathbf{f}_{t+1}]$ can be simply estimated as the sample mean of (excess) returns on the factor portfolios. Once more, the risk exposure will entirely depend on the portfolio composition; the exposures of the securities in the portfolio vs. each of the three priced risk factors, as measured by $\hat{\mathbf{B}}$; the covariance matrix of the factors themselves, $Var_t[\mathbf{f}_{t+1}]$; and the idiosyncratic risk coefficients of all the securities in the portfolio, as captured by the diagonal matrix $\hat{\boldsymbol{\Sigma}}_\varepsilon$. Although it is algebraically more involved, also in this case we see that a seemingly passive expression (second line of (11)) gives the same answer as a perfectly active one (first line of (11)).
¹² In case this sounds unfamiliar, please review your notes from the first semester of Econometrics to see how the multivariate regression $x_{i,t+1} = \beta_i(R^M_{t+1} - R^f) + s_iSMB_{t+1} + h_iHML_{t+1} + \varepsilon_{i,t+1}$ can be written in stacked form.

4. Naive Models of Covariance Prediction


What if the asset pricing models discussed in Section 3 are rejected by the data? Unfortunately, you may have learned from your courses that this tends to be the case for most models, data sets, and sample periods. The key alternative idea that populates the financial econometrics literature is that the concepts and tools introduced in chapters 4 and 5 with reference to volatility forecasts can now be extended to covariances as well (and hence to correlations). This of course starts with the rather simple, naive techniques and models that we have already commented on in chapter 4. The simplest idea is to build time-varying estimates of covariances using rolling (moving) averages:
$$ \sigma_{ij,t+1} = \frac{1}{m}\sum_{\tau=1}^{m} R_{i,t+1-\tau}R_{j,t+1-\tau}, $$
where $m$ is the window length and we have assumed a zero mean for the returns on both assets $i$ and $j$. Clearly, when $i = j$ this becomes the rolling window variance estimator already analyzed in chapter 4. The problem with this rolling window covariance estimator/predictor remains the same one that we have already encountered in chapter 4: how should one pick the window parameter $m$? Obviously, its choice is critical for the estimator that can be obtained. Too long a window makes the estimator rather smooth, but it also risks including in the calculation returns that may have originated from a possibly different period or regime, in either a statistical or an economic sense. The choice of a small $m$ leads to a jagged and quickly changing estimator. Moreover, the box-shaped spurious effects already discussed in chapter 4 would also characterize this rolling window covariance estimator, with the risk of the covariance prediction at time $t$ changing not because of events recorded at time $t$, but because returns recorded $m$ periods ago drop out of the rolling window. Finally, the rolling window estimator attaches equal weights to past cross products of returns, which may be highly questionable. Figure 1 reports one such example. The appearance of some box-shaped effects is obvious.

Figure 1: S&P500 vs. USD/Yen return moving average covariance estimate, m = 25
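A minimal Matlab sketch of this rolling-window covariance estimator, applied to simulated (hypothetical) return series, would look as follows:

```matlab
% Rolling-window covariance estimator of the text (window m = 25),
% applied to two simulated return series (hypothetical data).
rng(2); T = 1000; m = 25;
R1 = 0.01*randn(T,1);
R2 = 0.6*R1 + 0.008*randn(T,1);
cov_roll = nan(T,1);
for t = m:T
    window = (t-m+1):t;                          % last m observations
    cov_roll(t) = mean(R1(window).*R2(window));  % zero-mean cross products
end
plot(cov_roll); title('Rolling covariance, m = 25');
```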

An alternative idea that has had some impact on the practice of risk and asset management consists of extending the RiskMetrics variance estimator to covariances, i.e., the idea is that of an exponential smoother applied to covariances:
$$ \sigma_{ij,t+1} = (1-\lambda)R_{i,t}R_{j,t} + \lambda\sigma_{ij,t}, \tag{12} $$
with $\lambda \in (0,1)$.¹³ As already discussed, JP Morgan had originally popularized the choice $\lambda = 0.94$, which turns out to work rather well also for covariances. However, the restriction that the coefficient $(1-\lambda)$ on the cross product of returns and the coefficient $\lambda$ on past covariance sum to one is not necessarily desirable. To understand this, consider the model $\sigma_{ij,t+1} = \omega + \alpha R_{i,t}R_{j,t} + \beta\sigma_{ij,t}$, of which (12) represents a special case. Because when means are zero $\bar{\sigma}_{ij} = E[R_{i,t}R_{j,t}]$, we have that
$$ E[\sigma_{ij,t+1}] = \omega + \alpha E[R_{i,t}R_{j,t}] + \beta E[\sigma_{ij,t}] \;\Longrightarrow\; \bar{\sigma}_{ij} = \omega + \alpha\bar{\sigma}_{ij} + \beta\bar{\sigma}_{ij} \;\Longrightarrow\; \bar{\sigma}_{ij} = \frac{\omega}{1-\alpha-\beta}. $$
However, because in (12) $\omega = 0$ and $\alpha + \beta = 1$, we have that under the RiskMetrics model $\bar{\sigma}_{ij} = 0/0$, i.e., the long-run, unconditional covariance actually fails to exist. This implies that there is no mean-reversion in covariance: based on the closing price today, if tomorrow's covariance is high then it will remain high, rather than revert back to its mean. Equivalently, we say that under (12) covariance follows a non-stationary, unit root process. Figure 2 shows an example of predicted covariance dynamics generated from a RiskMetrics model for the same data as in Figure 1. Clearly, the RiskMetrics covariance reacts more to shocks to return cross-products than a rolling window covariance estimator does.

Figure 2: S&P500 vs. USD/Yen return RiskMetrics covariance estimate (λ = 0.94)

In fact, (12) can be easily re-written in the equally familiar format
$$ \sigma_{ij,t+1} = (1-\lambda)\sum_{\tau=0}^{\infty}\lambda^{\tau}R_{i,t-\tau}R_{j,t-\tau}, $$
which shows that this forecast corresponds to an exponentially weighted, infinite moving average. To see this, we just need to re-write the model in recursive fashion, moving backwards in time:
$$ \sigma_{ij,t+1} = (1-\lambda)R_{i,t}R_{j,t} + \lambda\sigma_{ij,t} $$
$$ = (1-\lambda)R_{i,t}R_{j,t} + \lambda(1-\lambda)R_{i,t-1}R_{j,t-1} + \lambda^2\sigma_{ij,t-1} $$
$$ = (1-\lambda)[R_{i,t}R_{j,t} + \lambda R_{i,t-1}R_{j,t-1}] + \lambda^2(1-\lambda)R_{i,t-2}R_{j,t-2} + \lambda^3\sigma_{ij,t-2} $$
$$ = (1-\lambda)[R_{i,t}R_{j,t} + \lambda R_{i,t-1}R_{j,t-1} + \lambda^2R_{i,t-2}R_{j,t-2}] + \lambda^3\sigma_{ij,t-2} = \cdots = (1-\lambda)\sum_{\tau=0}^{\infty}\lambda^{\tau}R_{i,t-\tau}R_{j,t-\tau}, $$
as $\lim_{\tau\to\infty}\lambda^{\tau}\sigma_{ij,t-\tau+1} = 0$ for $\lambda \in (0,1)$.
¹³ As we shall explain below, we need to set $\lambda$ to be independent of $i$ and $j$ to ensure that the covariance matrix predicted from the model is semi-positive definite.

The model $\sigma_{ij,t+1} = \omega + \alpha R_{i,t}R_{j,t} + \beta\sigma_{ij,t}$ used above to work on the non-stationarity of the RiskMetrics estimator already shows an obvious direction in which we ought to be looking, i.e., extending GARCH(1,1)-style models to predict covariances besides variances:
$$ \sigma_{ij,t+1} = \omega_{ij} + \alpha_{ij}R_{i,t}R_{j,t} + \beta_{ij}\sigma_{ij,t}, \tag{13} $$
where $\alpha_{ij}$ and $\beta_{ij}$ in principle depend on the couple of assets $i$ and $j$ under examination. Similarly to a GARCH(1,1), one needs $\alpha_{ij} + \beta_{ij} < 1$ for the process to be stationary, as $\bar{\sigma}_{ij} = \omega_{ij}/(1-\alpha_{ij}-\beta_{ij})$ is finite if and only if $\alpha_{ij} + \beta_{ij} < 1$. However, because covariances can be negative, in this case one does not need to restrict any of the parameters to be estimated to be positive (or non-negative). Unfortunately, it is possible to show that unless $\alpha_{ij} = \alpha$ and $\beta_{ij} = \beta$ for all possible pairs $i \neq j$, even though $\sigma_{ij,t+1}$ can anyway be estimated/predicted, when one organizes such estimates/predictions into a covariance matrix predicted at time $t$ for time $t+1$,
$$ \widehat{Var}_t[\mathbf{R}_{t+1}] \equiv \hat{\boldsymbol{\Sigma}}_{t+1} = \begin{bmatrix} \sigma^2_{1,t+1} & \sigma_{12,t+1} & \cdots & \sigma_{1N,t+1} \\ \sigma_{12,t+1} & \sigma^2_{2,t+1} & \cdots & \sigma_{2N,t+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1N,t+1} & \sigma_{2N,t+1} & \cdots & \sigma^2_{N,t+1} \end{bmatrix}, $$
unfortunately the resulting $\hat{\boldsymbol{\Sigma}}_{t+1}$ is not guaranteed to be semi-positive definite (SPD), while it should be.¹⁴ Why do we need so desperately that $\hat{\boldsymbol{\Sigma}}_{t+1}$ be SPD? As often in this course, the reason is purely an economic one, not a statistical one. First, let's recall that a square symmetric $N \times N$ matrix $\mathbf{A}$ is SPD if and only if $\forall\,\mathbf{x} \in \mathbb{R}^N$, $\mathbf{x}'\mathbf{A}\mathbf{x} \geq 0$. Second, when applied to our problem, this definition implies that for any vector of portfolio weights $\mathbf{w} \in \mathbb{R}^N$, $\mathbf{w}'\hat{\boldsymbol{\Sigma}}_{t+1}\mathbf{w} \geq 0$. So far we
¹⁴ Notice that $\omega_{ij}$ instead is allowed to depend on the pair $i \neq j$.

have only applied a mathematical definition. The beauty of this result is that $\mathbf{w}'\hat{\boldsymbol{\Sigma}}_{t+1}\mathbf{w}$ has a very precise meaning to us:
$$ \widehat{Var}_t[R^{pf}_{t+1}] = \mathbf{w}'\hat{\boldsymbol{\Sigma}}_{t+1}\mathbf{w} \geq 0, $$
i.e., the SPD nature of $\hat{\boldsymbol{\Sigma}}_{t+1}$ is necessary (as well as sufficient) to ensure that the variance of any portfolio is non-negative, as it should be.
To get a feeling for why we need to prevent the GARCH-type coefficients from being a function of the pair of assets in order to ensure a good behavior of the covariance matrix, let's consider the RiskMetrics case which, as we know, is just a case of zero-intercept, non-stationary GARCH model. We deal with RiskMetrics because thinking about this problem with reference to one parameter only ($\lambda$) delivers ready intuition with less algebra. Assume the exponential smoothing model is applied to both variances and covariances in the case of two assets, $N = 2$, i.e.:
$$ \sigma_{ij,t+1} = (1-\lambda)R_{i,t}R_{j,t} + \lambda\sigma_{ij,t}, \qquad i, j = 1, 2, $$
so that the dynamic model also applies to variances when $i = j$. The exponential smoothing estimator of the entire conditional covariance matrix is then:
$$ \boldsymbol{\Sigma}_{t+1} = \begin{bmatrix} \sigma_{11,t+1} & \sigma_{12,t+1} \\ \sigma_{12,t+1} & \sigma_{22,t+1} \end{bmatrix} = \begin{bmatrix} (1-\lambda)R_{1,t}^2 + \lambda\sigma_{11,t} & (1-\lambda)R_{1,t}R_{2,t} + \lambda\sigma_{12,t} \\ (1-\lambda)R_{1,t}R_{2,t} + \lambda\sigma_{12,t} & (1-\lambda)R_{2,t}^2 + \lambda\sigma_{22,t} \end{bmatrix}. $$
Assume now, for the sake of contradiction, that the smoothing parameters that apply to conditional variances and covariances are allowed to be different:¹⁵
$$ \sigma_{ii,t+1} = (1-\lambda_\sigma)R_{i,t}^2 + \lambda_\sigma\sigma_{ii,t}, \qquad i = 1, 2, $$
$$ \sigma_{12,t+1} = (1-\lambda_c)R_{1,t}R_{2,t} + \lambda_c\sigma_{12,t}, $$
with $\lambda_\sigma \neq \lambda_c$ (but both of them still belonging to the interval $(0,1)$). It is relatively easy to use a simple example to show that $\lambda_\sigma \neq \lambda_c$ may lead to a conditional correlation between the returns on assets 1 and 2 that fails to be in the interval $[-1, 1]$. The idea is to work by focussing on a few returns, setting all other returns to zero, which may of course happen only by accident in reality; recall that you just need one example, not a general proof. Suppose you have available a sample of $\tau + 1$ paired returns, i.e., $\{R_{1,t-j}, R_{2,t-j}\}_{j=0}^{\tau}$. Then the RiskMetrics processes can be re-written in exponential smoothing form as:
$$ \sigma_{ii,t+1} = (1-\lambda_\sigma)\sum_{j=0}^{\tau-1}\lambda_\sigma^{j}R_{i,t-j}^2 + \lambda_\sigma^{\tau}\sigma_{ii,t-\tau+1}, \qquad i = 1, 2, $$
$$ \sigma_{12,t+1} = (1-\lambda_c)\sum_{j=0}^{\tau-1}\lambda_c^{j}R_{1,t-j}R_{2,t-j} + \lambda_c^{\tau}\sigma_{12,t-\tau+1}. $$
Assume that at time $t-\tau$, $\sigma_{ii,t-\tau} = \sigma_{12,t-\tau} = 0$ for $i = 1, 2$, and that, just by accident, while $R_{1,t-j} = R_{2,t-j} = 0$ for $j = 1, \ldots, \tau-1$, at time $t$ the returns are potentially non-zero; call them $\bar{R}_1$ and
¹⁵ Here $\sigma_{ii,t+1}$ is the same as $\sigma^2_{i,t+1}$. This derives from the fact that the covariance of a random variable with itself is the same as the variance. Therefore, this means that $\lambda$ is allowed to differ according to whether $i = j$ or not.

$\bar{R}_2$. Then
$$ \sigma_{ii,t+1} = (1-\lambda_\sigma)\bar{R}_i^2, \qquad i = 1, 2, $$
$$ \sigma_{12,t+1} = (1-\lambda_c)\bar{R}_1\bar{R}_2, $$
and the corresponding conditional correlation is
$$ \rho_{12,t+1} = \frac{(1-\lambda_c)\bar{R}_1\bar{R}_2}{\sqrt{(1-\lambda_\sigma)\bar{R}_1^2\,(1-\lambda_\sigma)\bar{R}_2^2}} = \frac{(1-\lambda_c)\bar{R}_1\bar{R}_2}{(1-\lambda_\sigma)|\bar{R}_1||\bar{R}_2|} = \frac{(1-\lambda_c)}{(1-\lambda_\sigma)}\,sgn(\bar{R}_1\bar{R}_2), $$
where the sign function $sgn(\bar{R}_1\bar{R}_2)$ takes a value of $+1$ when the sign of $\bar{R}_1\bar{R}_2$ is positive and $-1$ otherwise. At this point, note that if $\lambda_c < \lambda_\sigma$ then
$$ \rho_{12,t+1} \begin{cases} > 1 & \text{if } sgn(\bar{R}_1\bar{R}_2) = +1 \\ < -1 & \text{if } sgn(\bar{R}_1\bar{R}_2) = -1 \end{cases} $$
which is clearly inadmissible. As argued above, this shows (although it is just a very special example) that $\lambda_c = \lambda_\sigma$ is sufficient for $\rho_{12,t+1}$ to be in $[-1, 1]$, because in that case
$$ \rho_{12,t+1} = sgn(\bar{R}_1\bar{R}_2) = \begin{cases} +1 & \text{if } sgn(\bar{R}_1\bar{R}_2) = +1 \\ -1 & \text{if } sgn(\bar{R}_1\bar{R}_2) = -1 \end{cases} $$
which is again admissible. How does an example of $\rho_{12,t+1} \notin [-1, 1]$ map into our claim that when the coefficients are allowed to depend on either the assets or the moments (as in this case), then $\boldsymbol{\Sigma}_{t+1}$ may fail to be SPD, so that it cannot be a covariance matrix used in financial applications? Recall that if $\boldsymbol{\Sigma}_{t+1}$ is semi-positive definite, then $\det(\boldsymbol{\Sigma}_{t+1}) \geq 0$. In our example, we have
$$ \boldsymbol{\Sigma}_{t+1} = \begin{bmatrix} (1-\lambda_\sigma)\bar{R}_1^2 & (1-\lambda_c)\bar{R}_1\bar{R}_2 \\ (1-\lambda_c)\bar{R}_1\bar{R}_2 & (1-\lambda_\sigma)\bar{R}_2^2 \end{bmatrix}. $$
The determinant of this matrix is simply:
$$ \det(\boldsymbol{\Sigma}_{t+1}) = (1-\lambda_\sigma)^2\bar{R}_1^2\bar{R}_2^2 - (1-\lambda_c)^2\bar{R}_1^2\bar{R}_2^2 = \bar{R}_1^2\bar{R}_2^2\big[(1-\lambda_\sigma)^2 - (1-\lambda_c)^2\big], $$
which is non-negative if and only if $(1-\lambda_\sigma)^2 - (1-\lambda_c)^2 \geq 0$, or
$$ \frac{(1-\lambda_c)^2}{(1-\lambda_\sigma)^2} \leq 1 \;\Longleftrightarrow\; \lambda_c \geq \lambda_\sigma. $$
Once more, should we set $\lambda_c < \lambda_\sigma$, the outcome is that $\det(\boldsymbol{\Sigma}_{t+1}) < 0$, which would show that $\boldsymbol{\Sigma}_{t+1}$ is not positive semi-definite and hence cannot be a covariance matrix.
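The counterexample is easy to verify numerically. In this Matlab sketch the two smoothing parameters and the two non-zero returns are hypothetical choices with $\lambda_c < \lambda_\sigma$:

```matlab
% Numerical version of the counterexample: with lambda_c < lambda_sigma
% the implied "correlation" escapes [-1,1] and det(Sigma) turns negative.
lam_s = 0.97; lam_c = 0.90;        % smoothing for variances vs covariance
R1bar = 0.02; R2bar = 0.015;       % the two non-zero returns of the example
s11 = (1-lam_s)*R1bar^2;
s22 = (1-lam_s)*R2bar^2;
s12 = (1-lam_c)*R1bar*R2bar;
rho = s12 / sqrt(s11*s22);         % equals (1-lam_c)/(1-lam_s) = 3.33 here
Sigma = [s11 s12; s12 s22];
fprintf('rho = %.3f, det(Sigma) = %.3e\n', rho, det(Sigma));  % rho > 1, det < 0
```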
Therefore, going through a re-cap of the points above: only by setting $\alpha_{ij} = \alpha$ and $\beta_{ij} = \beta$ for all possible pairs $i \neq j$, i.e., setting the coefficients of (13) to be identical across all pairs of assets and therefore across models for conditional variances vs. covariances, does one guarantee that for any portfolio that can be formed, $\widehat{Var}_t[R^{pf}_{t+1}] \geq 0$. Unfortunately, empirically it is unclear that the persistence parameters $\alpha$ and $\beta$ should be the same for all variances and covariances across assets. Often the data reject this restriction. Moreover, this business of having to set $\alpha_{ij} = \alpha$ and $\beta_{ij} = \beta$ for all possible pairs $i \neq j$ has another critical implication. While in principle we would like to use data to proceed to the estimation of (13) for each possible pair of assets, imposing constraints such as the ones discussed above implies that all the dynamic processes underlying the elements of $\boldsymbol{\Sigma}_t$ should be estimated jointly, in one single pass. If this were not enough to scare you, this means that, given $N$ assets, we should jointly estimate the parameters of $N(N-1)/2$ different processes, one for the covariance of each possible pair of assets. For instance, in the case of (13), this implies the need to estimate $N(N-1)/2 + 2$ parameters (the $N(N-1)/2$ different constant coefficients $\omega_{ij}$ plus one single $\alpha$ and one single $\beta$) that enter $N(N-1)/2$ different econometric models. At this point, hoping that you are scared by the prospect of actually implementing this task for a large $N$, it becomes obvious that we need to develop better, multivariate econometric models of the conditional covariance matrix.

4.1. Comparing the properties of RiskMetrics and GARCH models for covariances
This is one of our traditional stops in the flow of our arguments. What are the effects of selecting a conditional model for covariance of a GARCH vs. a RiskMetrics type? In this subsection, we therefore compare the RiskMetrics-style model,
$$ \sigma^{ES}_{ij,t+1} = (1-\lambda)R_{i,t}R_{j,t} + \lambda\sigma^{ES}_{ij,t} $$
(that we know is easy to re-write as $\sigma^{ES}_{ij,t+1} = (1-\lambda)\sum_{\tau=0}^{\infty}\lambda^{\tau}R_{i,t-\tau}R_{j,t-\tau}$, where ES stands for exponential smoother) and
$$ \sigma_{ij,t+1} = \omega + \alpha R_{i,t}R_{j,t} + \beta\sigma_{ij,t}, \tag{14} $$
where $\omega$ has been simplified not to depend on $i$ and $j$ for the purposes of this subsection. As a
starting point, we show that the GARCH model in (14) can be written in a format similar to the ES one. We just need to re-write the GARCH in recursive fashion, moving backwards in time:
$$ \sigma_{ij,t+1} = \omega + \alpha R_{i,t}R_{j,t} + \beta\sigma_{ij,t} $$
$$ = \omega + \alpha R_{i,t}R_{j,t} + \beta[\omega + \alpha R_{i,t-1}R_{j,t-1} + \beta\sigma_{ij,t-1}] $$
$$ = \omega(1+\beta) + \alpha(R_{i,t}R_{j,t} + \beta R_{i,t-1}R_{j,t-1}) + \beta^2\sigma_{ij,t-1} $$
$$ = \omega(1+\beta+\beta^2) + \alpha(R_{i,t}R_{j,t} + \beta R_{i,t-1}R_{j,t-1} + \beta^2R_{i,t-2}R_{j,t-2}) + \beta^3\sigma_{ij,t-2} $$
$$ = \cdots = \omega\sum_{\tau=0}^{\infty}\beta^{\tau} + \alpha\sum_{\tau=0}^{\infty}\beta^{\tau}R_{i,t-\tau}R_{j,t-\tau} = \frac{\omega}{1-\beta} + \alpha\sum_{\tau=0}^{\infty}\beta^{\tau}R_{i,t-\tau}R_{j,t-\tau}, $$
which is the desired expression. Clearly, this process for $\sigma_{ij,t+1}$ simplifies to $\sigma^{ES}_{ij,t+1} = (1-\lambda)\sum_{\tau=0}^{\infty}\lambda^{\tau}R_{i,t-\tau}R_{j,t-\tau}$ when $\omega = 0$, $\alpha = 1-\lambda$, and $\beta = \lambda$. However, because you now perfectly understand that the persistence of a GARCH(1,1) process is measured by the sum $\alpha + \beta$, it is also clear that $\alpha = 1-\lambda$ and $\beta = \lambda$ implies that $\alpha + \beta = 1 - \lambda + \lambda = 1$, which means that an exponentially smoothed process for conditional covariance implies that covariance is integrated of order 1, i.e., that $\sigma^{ES}_{ij,t+1}$ has infinite memory, in the sense that any shock to $R_{i,t}R_{j,t}$ will have an impact on future, subsequent $\sigma^{ES}_{ij,t+\tau}$ that lasts forever ($\tau > 0$).
We call $\bar{\sigma}_{ij} \equiv E[R_{i,t}R_{j,t}] = \omega/(1-\alpha-\beta)$ the unconditional covariance implied by a stationary GARCH(1,1) model in which $\alpha + \beta < 1$. As usual, this is obtained as
$$ E[\sigma_{ij,t+1}] = \omega + \alpha E[R_{i,t}R_{j,t}] + \beta E[\sigma_{ij,t}] \;\Longrightarrow\; \bar{\sigma}_{ij} = \frac{\omega}{1-\alpha-\beta}. $$
Therefore (14) can be re-written as
$$ \sigma_{ij,t+1} = \bar{\sigma}_{ij} + \alpha(R_{i,t}R_{j,t} - \bar{\sigma}_{ij}) + \beta(\sigma_{ij,t} - \bar{\sigma}_{ij}). $$
This follows from
$$ \sigma_{ij,t+1} = \omega + \alpha R_{i,t}R_{j,t} + \beta\sigma_{ij,t} = \bar{\sigma}_{ij}(1-\alpha-\beta) + \alpha R_{i,t}R_{j,t} + \beta\sigma_{ij,t} = \bar{\sigma}_{ij} + \alpha(R_{i,t}R_{j,t} - \bar{\sigma}_{ij}) + \beta(\sigma_{ij,t} - \bar{\sigma}_{ij}), $$
as desired. This expression shows that forecasts of future covariance depend on three ingredients: (i) the baseline forecast is represented by the unconditional covariance $\bar{\sigma}_{ij}$, which depends on all the parameters $\omega$, $\alpha$, and $\beta$; (ii) the deviation of the current cross-product of asset returns from the unconditional covariance $\bar{\sigma}_{ij}$, weighted by the coefficient $\alpha$; (iii) the deviation of the current conditional covariance from the unconditional covariance $\bar{\sigma}_{ij}$, weighted by the coefficient $\beta$.
Interestingly, (14) can be equivalently re-written as
$$ \sigma_{ij,t+1} = \bar{\sigma}_{ij} + (\alpha+\beta)(\sigma_{ij,t} - \bar{\sigma}_{ij}) + \alpha(R_{i,t}R_{j,t} - \sigma_{ij,t}), $$
because (using a simple trick, i.e., adding and subtracting $\alpha(\sigma_{ij,t} - \bar{\sigma}_{ij})$ at the right stage):
$$ \sigma_{ij,t+1} = \bar{\sigma}_{ij} + \alpha(R_{i,t}R_{j,t} - \bar{\sigma}_{ij}) + \beta(\sigma_{ij,t} - \bar{\sigma}_{ij}) $$
$$ = \bar{\sigma}_{ij} + \alpha(R_{i,t}R_{j,t} - \bar{\sigma}_{ij}) - \alpha(\sigma_{ij,t} - \bar{\sigma}_{ij}) + \alpha(\sigma_{ij,t} - \bar{\sigma}_{ij}) + \beta(\sigma_{ij,t} - \bar{\sigma}_{ij}) $$
$$ = \bar{\sigma}_{ij} + (\alpha+\beta)(\sigma_{ij,t} - \bar{\sigma}_{ij}) + \alpha(R_{i,t}R_{j,t} - \sigma_{ij,t}). $$
At this point the $\tau$-step ahead forecast of the covariance can be found as:
$$ E_t[\sigma_{ij,t+\tau}] = \bar{\sigma}_{ij} + (\alpha+\beta)\big(E_t[\sigma_{ij,t+\tau-1}] - \bar{\sigma}_{ij}\big) + \alpha E_t\big[R_{i,t+\tau-1}R_{j,t+\tau-1} - \sigma_{ij,t+\tau-1}\big] $$
$$ = \bar{\sigma}_{ij} + (\alpha+\beta)\big(E_t[\sigma_{ij,t+\tau-1}] - \bar{\sigma}_{ij}\big) = \bar{\sigma}_{ij} + (\alpha+\beta)^2\big(E_t[\sigma_{ij,t+\tau-2}] - \bar{\sigma}_{ij}\big) $$
$$ = \cdots = \bar{\sigma}_{ij} + (\alpha+\beta)^{\tau-1}\big(\sigma_{ij,t+1} - \bar{\sigma}_{ij}\big), $$
where we have exploited the law of iterated expectations, by which $E_t[\sigma_{ij,t+\tau}] = E_t[E_{t+1}[\sigma_{ij,t+\tau}]]$, and
$$ E_t\big[R_{i,t+\tau-1}R_{j,t+\tau-1} - \sigma_{ij,t+\tau-1}\big] = E_t\big[E_{t+\tau-2}(R_{i,t+\tau-1}R_{j,t+\tau-1}) - \sigma_{ij,t+\tau-1}\big] = 0, $$
as $E_{t+\tau-2}(R_{i,t+\tau-1}R_{j,t+\tau-1}) = \sigma_{ij,t+\tau-1}$ and $\sigma_{ij,t+\tau-1}$ is, by construction, independent of the time-$(t+\tau-1)$ shocks (think of it: this is a GARCH process and the filtered GARCH covariance just depends on past shocks).¹⁶ Similarly to what we found in chapter 4, it is then rather simple to compute forecasts of future, $\tau$-step ahead covariances from $E_t[\sigma_{ij,t+\tau}] = \bar{\sigma}_{ij} + (\alpha+\beta)^{\tau-1}(\sigma_{ij,t+1} - \bar{\sigma}_{ij})$.
As for the comparison, notice that from
$$ \sigma^{ES}_{ij,t+1} = (1-\lambda)R_{i,t}R_{j,t} + \lambda\sigma^{ES}_{ij,t} $$
we have that (exploiting the fact that in a RiskMetrics model $\alpha + \beta = 1$ and $\omega = 0$)
$$ E_t[\sigma^{ES}_{ij,t+\tau}] = \underbrace{(\alpha+\beta)}_{=1}E_t[\sigma^{ES}_{ij,t+\tau-1}] = E_t[\sigma^{ES}_{ij,t+\tau-1}] = E_t[\sigma^{ES}_{ij,t+\tau-2}] = \cdots = \sigma^{ES}_{ij,t+1}. $$
This means that the $\tau$-step ahead forecast in an exponential smoothing model is simply the current estimate of covariance. This is what we alluded to early on when we claimed that under (12) covariance follows a non-stationary, unit root process. On the contrary, because
$$ E_t[\sigma_{ij,t+\tau}] = (\alpha+\beta)^{\tau-1}\sigma_{ij,t+1} + \big[1-(\alpha+\beta)^{\tau-1}\big]\bar{\sigma}_{ij}, $$
in the case of the GARCH(1,1) the current covariance receives a weight $(\alpha+\beta)^{\tau-1}$, which declines to zero as $\tau \to \infty$ under covariance stationarity, while the complement-to-one weight (which therefore increases to 1 as $\tau \to \infty$) is assigned to the unconditional covariance $\bar{\sigma}_{ij}$. In fact, because the RiskMetrics model can be obtained by setting $\alpha + \beta = 1$, in that case the GARCH forecast reduces to the exponential smoothing one.
¹⁶ Here you need to pay attention to the notation: $E_{t+\tau-2}(R_{i,t+\tau-1}R_{j,t+\tau-1})$ is the time $t+\tau-2$ predicted conditional covariance between $R_{i,t+\tau-1}$ and $R_{j,t+\tau-1}$, which we have denoted $\sigma_{ij,t+\tau-1}$. Notice that properties such as $E_t[R_{i,t+\tau}R_{j,t+\tau} - \sigma_{ij,t+\tau}] = 0$ for $\tau \geq 1$ have to be used repeatedly in the proof.
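The contrast between the two forecast rules can be visualized with a short Matlab sketch; the parameter values and the starting covariance are hypothetical:

```matlab
% Multi-step covariance forecasts: mean reversion of the GARCH(1,1) rule
% versus the flat RiskMetrics forecast (all parameter values hypothetical).
omega = 0.02; alpha = 0.05; beta = 0.90;     % alpha+beta < 1: stationary
sij_bar  = omega/(1-alpha-beta);             % unconditional covariance
sij_next = 2*sij_bar;                        % current forecast, above the mean
tau = (1:60)';
fc_garch = sij_bar + (alpha+beta).^(tau-1) * (sij_next - sij_bar);
fc_rm    = sij_next * ones(size(tau));       % E_t[sigma_ij,t+tau] = sigma_ij,t+1
plot(tau, [fc_garch fc_rm]); legend('GARCH(1,1)','RiskMetrics');
xlabel('horizon \tau'); ylabel('forecast covariance');
```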

5. Dynamic Conditional Correlation (DCC) Modelling


If you listen to trading desk and asset management lingo, practitioners will hardly talk about covariances: instead the focus will be on correlations, besides volatility. For instance, one interesting (worrisome) phenomenon is that all correlations tend to "skyrocket" during market crises (bear regimes), and this, as we shall see in detail in chapter 7, has recently attracted considerable attention.¹⁷ Even though it is obvious that, given models of any type (e.g., GARCH) for covariances ($\sigma_{ij,t+1}$) and variances ($\sigma^2_{i,t+1}$ and $\sigma^2_{j,t+1}$) separately, one can always compute the implied dynamic (prediction of) correlation,
$$ \rho_{ij,t+1} = \frac{\sigma_{ij,t+1}}{\sigma_{i,t+1}\,\sigma_{j,t+1}}, $$
one generation of financial econometricians has tried to actually write and estimate models for dynamic conditional correlations directly, i.e., in which the model directly concerns the behavior of $\rho_{ij,t+1}$ over time. However, that task appears to be far from trivial for one obvious reason: $\rho_{ij,t+1} \in [-1, 1]$, and so a dynamic estimator would imply a need to constrain parameter estimates to avoid that, for any time $t$, $|\rho_{ij,t+1}|$ exceeds 1. For instance, consider a model $\rho_{ij,t+1} = \omega_0 + \omega_1(R_{i,t}R_{j,t})/(\sigma_{i,t}\sigma_{j,t}) + \omega_2\rho_{ij,t}$ and try to ask yourself what kind of restrictions on $\omega_1$ and $\omega_2$ may keep $|\rho_{ij,t+1}|$ from exceeding 1, given that $(R_{i,t}R_{j,t})/(\sigma_{i,t}\sigma_{j,t}) \in [-1, +\infty)$.¹⁸

A more fruitful approach is the one that leads to specify and estimate DCC models. This approach is based on the idea that it is more appealing to model an appropriate auxiliary variable, $q_{ij,t+1}$. The DCC approach is based on a generalization to the vector/matrix case of the standard result that $\sigma_{ij,t+1} = \rho_{ij,t+1}\sigma_{i,t+1}\sigma_{j,t+1}$:
$$ \boldsymbol{\Sigma}_{t+1} \equiv \mathbf{D}_{t+1}\boldsymbol{\Gamma}_{t+1}\mathbf{D}_{t+1}, $$
where $\mathbf{D}_{t+1}$ is a matrix with the standard deviations $\sigma_{i,t+1}$ on the $i$-th position of its main diagonal and zeros everywhere else ($i = 1, 2, \ldots, N$), and $\boldsymbol{\Gamma}_{t+1}$ is a matrix of correlations $\rho_{ij,t+1}$ with ones on its main diagonal. For instance, in the $N = 2$ case:
$$ \begin{bmatrix} \sigma^2_{1,t+1} & \sigma_{12,t+1} \\ \sigma_{12,t+1} & \sigma^2_{2,t+1} \end{bmatrix} = \begin{bmatrix} \sigma_{1,t+1} & 0 \\ 0 & \sigma_{2,t+1} \end{bmatrix}\begin{bmatrix} 1 & \rho_{12,t+1} \\ \rho_{12,t+1} & 1 \end{bmatrix}\begin{bmatrix} \sigma_{1,t+1} & 0 \\ 0 & \sigma_{2,t+1} \end{bmatrix}. $$
The key step of the DCC approach is the ability to disentangle the estimation and prediction of $\mathbf{D}_{t+1}$, to obtain $\hat{\mathbf{D}}_{t+1}$, from the estimation and prediction of $\boldsymbol{\Gamma}_{t+1}$, which will give $\hat{\boldsymbol{\Gamma}}_{t+1}$. In particular, we proceed in two steps:
1. The volatilities of each asset are estimated/predicted through a GARCH or one of the other methods considered in chapters 4 and 5. For instance, one can think of $\sigma^2_{i,t+1} = \omega_i + \alpha_iR_{i,t}^2 + \beta_i\sigma^2_{i,t}$ for $i = 1, 2, \ldots, N$.
¹⁷ The verb "skyrocket" appears in quotation marks because you should recall that correlations are defined in $[-1, 1]$ and they cannot diverge to $+\infty$.
¹⁸ Please also ask yourself why $(R_{i,t}R_{j,t})/(\sigma_{i,t}\sigma_{j,t}) \in [-1, +\infty)$, i.e., why values below $-1$ are impossible.
2. Model the conditional covariances of the standardized returns, $z_{i,t+1} \equiv R_{i,t+1}/\sigma_{i,t+1}$, derived from the first step, using GARCH-type models. Here, we exploit the fact that the conditional covariance of the $z_{i,t+1}$ variables equals the conditional correlation of raw returns:
$$ Cov_t[z_{i,t+1}, z_{j,t+1}] = E_t[z_{i,t+1}z_{j,t+1}] = E_t\left[\frac{R_{i,t+1}}{\sigma_{i,t+1}}\frac{R_{j,t+1}}{\sigma_{j,t+1}}\right] = \frac{E_t[R_{i,t+1}R_{j,t+1}]}{\sigma_{i,t+1}\sigma_{j,t+1}} = \frac{\sigma_{ij,t+1}}{\sigma_{i,t+1}\sigma_{j,t+1}} = \rho_{ij,t+1}. $$
However, the GARCH-type modelling will not concern the standardized returns directly, but instead an auxiliary variable $q_{ij,t+1}$ to be estimated/predicted in order to compute conditional correlations. Typically, the most popular model used in this second DCC step is:
$$ q_{ij,t+1} = \bar{\rho}_{ij} + \alpha(z_{i,t}z_{j,t} - \bar{\rho}_{ij}) + \beta(q_{ij,t} - \bar{\rho}_{ij}), $$
which is a GARCH(1,1) for the auxiliary variable, written in deviations from the unconditional, long-run mean, usually set to be equal to the unconditional sample covariance of the standardized data. An alternative that has sometimes been used is of a RiskMetrics type:
$$ q_{ij,t+1} = (1-\lambda)z_{i,t}z_{j,t} + \lambda q_{ij,t}, $$
with $\lambda \in (0,1)$. Note that these models apply to all pairs $i$ and $j$, even when $i = j$, which is used in (15) below. How do you now go from a forecast of the auxiliary variable $q_{ij,t+1}$ to a forecast of correlations? Here one uses the transformation
$$ \rho_{ij,t+1} = \frac{q_{ij,t+1}}{\sqrt{q_{ii,t+1}\,q_{jj,t+1}}}. \tag{15} $$

Note that (15) guarantees by construction that $\rho_{ij,t+1} \in [-1, 1]$.¹⁹ Therefore, in the GARCH(1,1) case, $\omega_{ij} = \bar{\rho}_{ij}(1-\alpha-\beta)$. Interestingly, only the intercept parameter is allowed to differ across different pairs of assets: also in this case, $\alpha$ and $\beta$ are common across different pairs of assets; the same occurs for the single $\lambda$ parameter in the RiskMetrics-type model. This implies that the persistence of the correlation between any two assets in the portfolio is the same, which is obviously unrealistic.²⁰ With reference to the same two series of S&P 500 returns and USD/yen exchange rate log-changes featured in Figures 1 and 2, Figures 3 and 4 show the predicted, one-step correlations estimated from a DCC RiskMetrics and a DCC GARCH(1,1) model, respectively.

Figure 3: S&P500 vs. USD/Yen return DCC/RiskMetrics correlation estimate (λ = 0.94)

Figure 4: S&P500 vs. USD/Yen return DCC/GARCH(1,1) correlation estimate

Clearly, the two figures are quite different, even though the general dynamics of the correlation forecasts is qualitatively similar.
¹⁹ Other transformations have been explored in the literature, for instance the Fisher transformation of the correlation coefficient: $\rho_{12,t+1} = [\exp(2q_{12,t+1}) - 1]/[\exp(2q_{12,t+1}) + 1]$, where $q_{12,t+1}$ can be defined as any GARCH model using $z_{1,t+1}z_{2,t+1}$ as innovations. This model is easy to implement because the positive definiteness of the conditional correlation matrix is guaranteed by the Fisher transformation. However, it is only a bivariate model and it is not clear how to generalize it to any $N > 2$.
²⁰ In any event, because $\bar{\rho}_{ij}$ is allowed to differ across pairs of assets, the fact that $\alpha$ and $\beta$ or $\lambda$ must be common across pairs of assets does not imply that the level of the correlations at any time $t$ is the same across pairs of assets.
The RiskMetrics and GARCH/DCC models can also be written in matrix form as
$$ \mathbf{Q}_{t+1} = (1-\lambda)\mathbf{z}_t\mathbf{z}_t' + \lambda\mathbf{Q}_t $$
$$ \mathbf{Q}_{t+1} = \boldsymbol{\Omega} + \alpha\mathbf{z}_t\mathbf{z}_t' + \beta\mathbf{Q}_t, $$
where $\mathbf{z}_t \equiv [z_{1,t}\; z_{2,t}\; \ldots\; z_{N,t}]'$ and $\mathbf{Q}_{t+1}$ is a symmetric matrix that collects the values/predictions of the auxiliary variables $q_{ij,t+1}$ for $i, j = 1, 2, \ldots, N$,
$$ \mathbf{Q}_{t+1} = \begin{bmatrix} q_{11,t+1} & q_{12,t+1} & \cdots & q_{1N,t+1} \\ q_{12,t+1} & q_{22,t+1} & \cdots & q_{2N,t+1} \\ \vdots & \vdots & \ddots & \vdots \\ q_{1N,t+1} & q_{2N,t+1} & \cdots & q_{NN,t+1} \end{bmatrix}, $$
while $\boldsymbol{\Omega}$ is a symmetric SPD matrix.²¹ $\mathbf{Q}_{t+1}$ is SPD because it is a weighted average of positive semi-definite and positive definite matrices. This will in turn ensure that the correlation matrix $\boldsymbol{\Gamma}_{t+1}$ and the covariance matrix $\boldsymbol{\Sigma}_{t+1}$ will be positive semi-definite. One often used variation of the DCC GARCH model features a covariance targeting variant:
$$ \mathbf{Q}_{t+1} = (1-\alpha-\beta)\hat{E}[\mathbf{z}_t\mathbf{z}_t'] + \alpha\mathbf{z}_t\mathbf{z}_t' + \beta\mathbf{Q}_t, $$
which guarantees that the unconditional correlation will be identical to the sample unconditional correlation of the data.
DCC models are enjoying massive popularity because they are easy to implement in three steps:
1. All the individual variances are estimated one by one;
2. The returns are standardized and the unconditional correlation matrix is estimated;
3. The correlation persistence parameters $\alpha$ and $\beta$ are estimated.²²
One point that is worth making (see Bauwens et al., 2006, for additional details, as well as Section 6.2 that follows) is that the class of models based on non-linear combinations of univariate GARCH models, in which one can specify separately, on the one hand, the individual conditional variances and, on the other hand, the conditional correlation matrix (this is called a hierarchical specification strategy), is not composed of DCC models only. For instance, Tse and Tsui (2002) specify their multivariate DCC as $\boldsymbol{\Sigma}_{t+1} \equiv \mathbf{D}_{t+1}\boldsymbol{\Gamma}_{t+1}\mathbf{D}_{t+1}$, where $\boldsymbol{\Gamma}_{t+1}$ follows a GARCH-like process but is not necessarily a time-varying correlation matrix and has a factor structure. In practice, in this case the DCC formulates the conditional correlation directly as a weighted sum of past correlations.
DCC models are estimated by QMLE by construction: because the model is implemented in three different steps, even though in each of these stages a log-likelihood function is written and estimated, the overall outcome only represents a QMLE. In each of the stages, only a few parameters are estimated simultaneously using numerical optimization. This feature makes DCC models extremely tractable for risk management of large portfolios: in the first step, one only estimates univariate GARCH-type models; the resulting GARCH inferences are then used to compute the standardized returns, $z_{i,t+1} \equiv R_{i,t+1}/\sigma_{i,t+1}$; in the final step, one only estimates the two parameters $\alpha$ and $\beta$ that apply to all pairs of assets. For instance, in the case of $N = 2$, in the first stage one solves (in the simple GARCH(1,1) case)
$$ \max_{\omega_1,\,\alpha_1,\,\beta_1}\; -\frac{1}{2}\sum_{t=1}^{T}\log\sigma^2_{1,t} - \frac{1}{2}\sum_{t=1}^{T}\frac{R^2_{1,t}}{\sigma^2_{1,t}}, $$
where $\sigma^2_{1,0}$ is initialized, for instance, at the sample variance of the first return series (and symmetrically for the second asset). In the second step, given the pair of time series $z_{1,t+1} \equiv R_{1,t+1}/\sigma_{1,t+1}$ and $z_{2,t+1} \equiv R_{2,t+1}/\sigma_{2,t+1}$, one solves
$$ \max_{\alpha,\,\beta}\; -\frac{1}{2}\sum_{t=1}^{T}\ln(1-\rho^2_{12,t}) - \frac{1}{2}\sum_{t=1}^{T}\frac{z^2_{1,t} + z^2_{2,t} - 2\rho_{12,t}z_{1,t}z_{2,t}}{1-\rho^2_{12,t}}, $$
where $\rho_{12,t} = q_{12,t}/\sqrt{q_{11,t}\,q_{22,t}}$ and $q_{ij,t+1}$ follows either a GARCH or a RiskMetrics model, involving the parameters $\alpha$ and $\beta$ in the former case (under covariance targeting) and $\lambda$ in the latter. Notice that the variables that enter the log-likelihood are the standardized returns, $z_{i,t}$, and not the original raw returns themselves. We are essentially treating the standardized returns as actual observations here, and this is an alternative way to appreciate the loss of estimation efficiency that QMLE implies in this application. In theory, we could obtain more precise estimators by estimating all the volatility models and the correlation model simultaneously, which would yield a ML estimator. In practice, this is not feasible for large portfolios, i.e., for cases in which $N$ exceeds 3 or 4 different assets.
²¹ Note that $q_{ii,t+1}$ is in general different from the $\sigma^2_{i,t+1}$ obtained in the first step, $i = 1, 2, \ldots, N$. This is important and it represents the logical sacrifice on which the two-step DCC estimation is based: the use of first-step predictions for variances that potentially differ from those that are then used to go from predictions of the $q_{ij,t+1}$ to predictions of correlations constrained to be in $[-1, 1]$.
²² This last claim, and the fact that $\boldsymbol{\Omega}$ has not been mentioned, assumes that covariance targeting has been applied, as we shall see in our Matlab session.
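As a complement, here is a sketch of the second-step objective for the bivariate case, written as a negative log-likelihood to be minimized numerically; the function name and the initialization at unconditional moments are choices of this sketch, not part of the original text:

```matlab
% Second-step QMLE objective for the bivariate DCC (a sketch): the negative
% of the correlation log-likelihood written on standardized returns z1, z2.
function nll = dcc_step2_nll(theta, z1, z2)  % theta = [alpha; beta]
    alpha = theta(1); beta = theta(2);       % no stationarity constraint here;
    T = numel(z1);                           % enforce alpha+beta<1 in real use
    rbar = mean(z1.*z2);                     % correlation target
    q11 = 1; q22 = 1; q12 = rbar;            % initialize at unconditional values
    nll = 0;
    for t = 1:T
        rho = q12 / sqrt(q11*q22);
        nll = nll + 0.5*( log(1-rho^2) + ...
              (z1(t)^2 + z2(t)^2 - 2*rho*z1(t)*z2(t)) / (1-rho^2) );
        q11 = (1-alpha-beta)      + alpha*z1(t)^2     + beta*q11;
        q22 = (1-alpha-beta)      + alpha*z2(t)^2     + beta*q22;
        q12 = (1-alpha-beta)*rbar + alpha*z1(t)*z2(t) + beta*q12;
    end
end
% Example call:
% theta_hat = fminsearch(@(th) dcc_step2_nll(th, z1, z2), [0.05; 0.90]);
```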

5.1. One detailed bivariate example


Consider the GARCH(1,1) version of the dynamic conditional correlation model
$$ q_{ij,t+1} = \bar{\rho}_{ij} + \alpha(z_{i,t}z_{j,t} - \bar{\rho}_{ij}) + \beta(q_{ij,t} - \bar{\rho}_{ij}), \qquad \rho_{ij,t+1} = \frac{q_{ij,t+1}}{\sqrt{q_{ii,t+1}\,q_{jj,t+1}}}, $$
for the case of $N = 2$. We show that in the GARCH case the covariance matrix of returns can be written as:
$$ \boldsymbol{\Sigma}_{t+1} = \mathbf{D}_{t+1}\boldsymbol{\Gamma}_{t+1}\mathbf{D}_{t+1}, \qquad \mathbf{D}_{t+1} = \sqrt{diag(\mathbf{Q}_{t+1})}, \qquad \boldsymbol{\Gamma}_{t+1} = (\mathbf{D}_{t+1})^{-1}\mathbf{Q}_{t+1}(\mathbf{D}_{t+1})^{-1}, $$
$$ \mathbf{Q}_{t+1} \equiv \begin{bmatrix} q_{11,t+1} & q_{12,t+1} \\ q_{12,t+1} & q_{22,t+1} \end{bmatrix} = \hat{E}[\mathbf{z}_t\mathbf{z}_t'](1-\alpha-\beta) + \alpha\mathbf{z}_t\mathbf{z}_t' + \beta\mathbf{Q}_t, $$
where $\mathbf{z}_t \equiv [z_{1,t}\; z_{2,t}]'$ is the vector that collects the standardized residuals from a GARCH(1,1) model, under the special assumption that $q_{ii,t+1} = \sigma^2_{i,t+1}$, $i = 1, 2$.²³ The point simply consists in the patient verification of a few matrix expressions. Let's work backwards:
$$ \mathbf{Q}_{t+1} = (1-\alpha-\beta)\begin{bmatrix} 1 & \bar{\rho}_{12} \\ \bar{\rho}_{12} & 1 \end{bmatrix} + \alpha\begin{bmatrix} z^2_{1,t} & z_{1,t}z_{2,t} \\ z_{1,t}z_{2,t} & z^2_{2,t} \end{bmatrix} + \beta\begin{bmatrix} q_{11,t} & q_{12,t} \\ q_{12,t} & q_{22,t} \end{bmatrix} $$
²³ Notice that in this case the same matrix $\mathbf{Q}_{t+1}$ appears in the definitions of $\mathbf{D}_{t+1}$ and of $\boldsymbol{\Gamma}_{t+1}$. In general this does not need to be the case, i.e., as stated in the subsection title, this represents a special example.

+
"

"

2
1
1 2
2
1 2
2

"


11
12

12
22

2 ) + 2 +

(1--)(1 2 ) + 1 2 + 12
(1 )(1
1
11

2 ) + 2 +
(1--)(1 2 ) + 1 2 + 12
(1 )(2
2
22

which is a matrix collecting standard GARCH dynamic models for the +1 as defined above and
in which

) = (1 )(1 2 ) + (1 2 ) + (12
) = (1 2 )
(12+1

is the correlation matrix implied by the process:
Next, we need to show that

$$\Gamma_{t+1} = (\mathbf{D}^{*}_{t+1})^{-1}\mathbf{Q}_{t+1}(\mathbf{D}^{*}_{t+1})^{-1} = \begin{bmatrix} \frac{q_{11,t+1}}{\sqrt{q_{11,t+1}}\sqrt{q_{11,t+1}}} & \frac{q_{12,t+1}}{\sqrt{q_{11,t+1}}\sqrt{q_{22,t+1}}} \\ \frac{q_{12,t+1}}{\sqrt{q_{11,t+1}}\sqrt{q_{22,t+1}}} & \frac{q_{22,t+1}}{\sqrt{q_{22,t+1}}\sqrt{q_{22,t+1}}}\end{bmatrix} = \begin{bmatrix} 1 & \rho_{12,t+1} \\ \rho_{12,t+1} & 1\end{bmatrix},$$

where each $q_{ij,t+1}$ stands for the GARCH-type expression $(1-\alpha-\beta)E[z_iz_j]+\alpha z_{i,t}z_{j,t}+\beta q_{ij,t}$ derived above, so that, written out in full,

$$\rho_{12,t+1} = \frac{(1-\alpha-\beta)E[z_1z_2]+\alpha z_{1,t}z_{2,t}+\beta q_{12,t}}{\sqrt{\left[(1-\alpha-\beta)E[z^2_1]+\alpha z^2_{1,t}+\beta q_{11,t}\right]\left[(1-\alpha-\beta)E[z^2_2]+\alpha z^2_{2,t}+\beta q_{22,t}\right]}}.$$
Finally, if we re-assemble everything, we have:

$$\Sigma_{t+1} = \mathbf{D}_{t+1}\Gamma_{t+1}\mathbf{D}_{t+1} = \begin{bmatrix}\sqrt{q_{11,t+1}} & 0 \\ 0 & \sqrt{q_{22,t+1}}\end{bmatrix}\begin{bmatrix} 1 & \rho_{12,t+1} \\ \rho_{12,t+1} & 1\end{bmatrix}\begin{bmatrix}\sqrt{q_{11,t+1}} & 0 \\ 0 & \sqrt{q_{22,t+1}}\end{bmatrix}$$

$$= \begin{bmatrix} q_{11,t+1} & \rho_{12,t+1}\sqrt{q_{11,t+1}}\sqrt{q_{22,t+1}} \\ \rho_{12,t+1}\sqrt{q_{11,t+1}}\sqrt{q_{22,t+1}} & q_{22,t+1}\end{bmatrix} = \begin{bmatrix} \sigma_{11,t+1} & \sigma_{12,t+1} \\ \sigma_{12,t+1} & \sigma_{22,t+1}\end{bmatrix}.$$
Just one final remark: in this case we have assumed that $\alpha$ and $\beta$ are taken to characterize the dynamic process for $\mathbf{Q}_{t+1}$ as a whole under the assumption that $\sigma_{i,t+1} = \sqrt{q_{ii,t+1}}$ or, in matrix form, that $\mathbf{D}_{t+1} = \mathbf{F}_{t+1}$ in the more general structure:

$$\Sigma_{t+1} = \mathbf{D}_{t+1}\Gamma_{t+1}\mathbf{D}_{t+1}, \qquad \mathbf{F}_{t+1} = \sqrt{\mathrm{dg}(\mathbf{Q}_{t+1})}, \qquad \Gamma_{t+1} = (\mathbf{F}_{t+1})^{-1}\mathbf{Q}_{t+1}(\mathbf{F}_{t+1})^{-1}$$

$$\mathbf{Q}_{t+1} = E[\mathbf{z}_t\mathbf{z}_t'](1-\alpha-\beta) + \alpha\,\mathbf{z}_t\mathbf{z}_t' + \beta\,\mathbf{Q}_t.$$

As you know from the lectures, this need not be the case, and in applied work it is typical to first estimate some GARCH models (they do not need to be GARCH(1,1), as they could be N(A)GARCH, GJR, etc.) for the volatilities and then use the corresponding standardized residuals to estimate $\mathbf{Q}_{t+1}$. This implies additional complexity, but the advantage is that the $\alpha$ and $\beta$ that characterize the dynamic process for correlations need not be the same as those for volatility.
5.2. Asymmetric (GJR-type) DCC models

So far we have considered only symmetric correlation models, where the effect of two positive shocks (i.e., standardized returns) is the same as the effect of two negative shocks of the same magnitude. But, just as we modeled the asymmetry in volatility (the leverage effect) in the univariate case, we may want to allow for a down-market effect in correlations. This can be achieved using an asymmetric DCC model where, for instance (also imposing correlation targeting):

$$\mathbf{Q}_{t+1} = (1-\alpha-\beta)\hat E[\mathbf{z}_t\mathbf{z}_t'] + \alpha\,\mathbf{z}_t\mathbf{z}_t' + \beta\,\mathbf{Q}_t + \delta\left(\mathbf{g}_t\mathbf{g}_t' - \hat E[\mathbf{g}_t\mathbf{g}_t']\right),$$

where hatted expectations will be estimated using sample moments, for instance

$$\hat E[\mathbf{z}_t\mathbf{z}_t'] = \frac{1}{T}\sum_{t=1}^{T}\mathbf{z}_t\mathbf{z}_t',$$

and the vectors $\mathbf{g}_t$ are defined as the negative part of $\mathbf{z}_t$, as follows:

$$g_{i,t} = \begin{cases} z_{i,t} & \text{if } z_{i,t} < 0 \\ 0 & \text{if } z_{i,t} \geq 0 \end{cases}, \qquad i = 1,\ldots,n.$$

The term $\delta(\mathbf{g}_t\mathbf{g}_t' - \hat E[\mathbf{g}_t\mathbf{g}_t'])$ captures a leverage effect in correlations: when $\delta > 0$, the correlation for assets $i$ and $j$ will increase more when $z_{i,t}$ and $z_{j,t}$ are negative than in any other case. This captures a phenomenon often observed in markets for risky assets: their correlation increases more in down markets ($z_{i,t}$ and $z_{j,t}$ both negative) than in up markets ($z_{i,t}$ and $z_{j,t}$ both positive).
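The recursion is easy to code up. The following Matlab sketch runs one pass of the asymmetric update; the parameter values are illustrative placeholders, and the initialization at the long-run target is our own assumption.

```matlab
% One pass of the asymmetric-DCC recursion (a minimal sketch);
% z is a T-by-n matrix of first-step standardized residuals.
a = 0.04; b = 0.93; d = 0.02;        % candidate alpha, beta, delta
T = size(z, 1);
Zbar = (z' * z) / T;                 % sample moment E[z z']
g = min(z, 0);                       % negative parts of the standardized returns
Gbar = (g' * g) / T;                 % sample moment E[g g']
Q = Zbar;                            % initialize at the long-run target
for t = 1:T
    Q = (1 - a - b) * Zbar + a * (z(t,:)' * z(t,:)) + b * Q ...
        + d * (g(t,:)' * g(t,:) - Gbar);
    s = 1 ./ sqrt(diag(Q));
    Gamma = diag(s) * Q * diag(s);   % implied conditional correlation matrix
end
```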
6. Multivariate GARCH Models
In our introduction we have already emphasized that a full extension and generalization of simple,
univariate GARCH methods to the multivariate case presents many issues and problems related to
the large scale of the resulting models and their tendency to be over-parameterized. In this Section we take this task seriously and attempt to generalize the simple set-up of the first part of the course,

$$R_{t+1} = \sigma_{t+1}z_{t+1}, \qquad z_{t+1}\ \text{IID}\ \mathcal{N}(0,1),$$

to the case in which returns on $n$ assets collected in $\mathbf{R}_{t+1}$ are described by

$$\mathbf{R}_{t+1} = \Sigma^{1/2}_{t+1}\mathbf{z}_{t+1}, \qquad \mathbf{z}_{t+1}\ \text{IID}\ \mathcal{N}(\mathbf{0},\mathbf{I}_n), \qquad (16)$$

where $\mathbf{I}_n$ is a $n\times n$ identity matrix and (similarly to chapter 7) $\Sigma^{1/2}_{t+1}$ is the square-root, or Cholesky decomposition, of the covariance matrix, such that24

$$\Sigma^{1/2}_{t+1}\left(\Sigma^{1/2}_{t+1}\right)' = \Sigma_{t+1} \equiv Var[\mathbf{R}_{t+1}|\mathcal{F}_t].$$
Even though in (16) we have specified $\mathbf{z}_{t+1}$ IID $\mathcal N(\mathbf{0},\mathbf{I}_n)$, in certain situations it is desirable to search for a better distribution for the innovation process $\mathbf{z}_{t+1}$. A natural alternative to the multivariate Gaussian density is the multivariate Student density, of which skewed versions exist. Moreover, note that $\Sigma^{1/2}_{t+1}$ is in no way the matrix of square roots of the elements of the full covariance matrix $\Sigma_{t+1}$ (if so, how would you deal with potentially negative covariances?).25 Our problem is then to write and estimate appropriate dynamic time series models for $\Sigma_{t+1}$, knowing that this matrix contains $0.5n(n+1)$ distinct elements (because of symmetry these are fewer than $n^2$), which implies that in principle one would have to write and estimate dynamic models for each of these elements. However, as already discussed in Sections 4 and 5, constructing positive semi-definite (PSD) covariance matrix forecasts, which ensures that the portfolio variance is always non-negative, remains difficult: appropriate structure needs to be imposed to guarantee the PSDness of the resulting forecast $\hat\Sigma_{t+1}$. Here one thing needs to be appreciated: although much theoretical (econometrics) literature has focused on relatively small multivariate cases of (16), for instance with $n = 2$ or 3, practitioners need us to develop methods that apply to any value of the cross-sectional dimension $n$, including limit cases of $n$ being large. In this respect (possibly with the exception of the diagonal BEKK model presented in Section 6.3 below), DCC remains the best option available. Therefore the models presented in the following are rather interesting on paper and for small-scale applications (up to $n = 4$ or 5), but rapidly become unwieldy or even impossible to estimate for realistic applications with hundreds of assets or securities to be modelled simultaneously.

This point is easily understood through the case of the straightforward, plain-vanilla $n$-dimensional generalization of a GARCH(1,1) in VEC(H) form:

$$vech(\Sigma_{t+1}) = vech(\mathbf{C}) + \mathbf{A}\,vech(\mathbf{R}_t\mathbf{R}_t') + \mathbf{B}\,vech(\Sigma_t)$$
24 In this section, to make the distinction starker, we denote as $\Sigma^{1/2}_{t+1}$ the Cholesky factor of $\Sigma_{t+1}$, also for analogy with the factors that will appear in chapter 7.
25 In fact, $\Sigma^{1/2}_{t+1}$ is a lower triangular matrix appropriately defined according to an algorithm that is implemented in most software packages (sure enough, in Matlab). Section 10.1 of chapter 7 shows one example for the $n = 2$ case.
where $vech(\cdot)$ (vector half) is the operator that converts the unique upper triangular elements of a symmetric matrix into a $0.5n(n+1)\times 1$ column vector. For instance,

$$vech\left(\begin{bmatrix}\sigma^2_{1,t+1} & \sigma_{12,t+1} \\ \sigma_{12,t+1} & \sigma^2_{2,t+1}\end{bmatrix}\right) = \begin{bmatrix}\sigma^2_{1,t+1} \\ \sigma_{12,t+1} \\ \sigma^2_{2,t+1}\end{bmatrix}.$$
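Since the $vech$ operator is used throughout this Section, here is a minimal Matlab sketch of it and of its inverse; the helper names `vech` and `unvech` are our own (there is no such built-in), and for the $2\times 2$ example above the output is exactly $[\sigma^2_{1,t+1}\;\sigma_{12,t+1}\;\sigma^2_{2,t+1}]'$.

```matlab
% vech and its inverse for a symmetric n-by-n matrix (minimal sketches).
function v = vech(S)
    v = S(tril(true(size(S))));      % stack the lower-triangular elements
end

function S = unvech(v, n)
    S = zeros(n);
    S(tril(true(n))) = v;            % refill the lower triangle column-wise
    S = S + tril(S, -1)';            % symmetrize
end
```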

In this general VEC model, each element of $\Sigma_{t+1}$ is a linear function of the lagged squared errors and cross-products of errors and of the lagged values of the elements of $\Sigma_t$. Note that while the $vec(\cdot)$ of a symmetric matrix would simply be a $n^2\times 1$ vector, the $vech(\cdot)$ is instead a smaller, $0.5n(n+1)\times 1$ vector.26 In the vech-GARCH(1,1) model above, $\mathbf{A}$ and $\mathbf{B}$ are $[0.5n(n+1)]\times[0.5n(n+1)]$ square matrices, while $\mathbf{C}$ is a symmetric $n\times n$ matrix. In this vech-GARCH(1,1) framework, each element of $\Sigma_t$ may affect each element of $\Sigma_{t+1}$, and similarly for the outer product of past returns, $\mathbf{R}_t\mathbf{R}_t'$ (note that this is a $n\times n$ matrix because $\mathbf{R}_t$ is an $n$-dimensional vector). However, the structure of $\mathbf{C}$, $\mathbf{A}$ and $\mathbf{B}$ gives a total of27

$$0.5n(n+1) + 2[0.5n(n+1)]^2 = 0.5n(n+1)[n^2+n+1]$$
$$= 0.5n^4 + 0.5n^3 + 0.5n^2 + 0.5n^3 + 0.5n^2 + 0.5n = 0.5n^4 + n^3 + n^2 + 0.5n = O(n^4)$$

parameters to be estimated. For instance, for $n = 100$, which hardly represents a large portfolio or risk management problem, the vech-GARCH(1,1) model has 51,010,050 parameters to be estimated. If you need to have at least 20 observations available per parameter, with $n = 100$ assets this means $20\times 51{,}010{,}050/100 = 10{,}202{,}010$ observations per series, or a daily history of more than 40,484 years per series. This is clearly not feasible.28 More generally, vech-GARCH models that naively generalize the GARCH models of chapter 4 to the multivariate case tend to generate a serious curse-of-dimensionality problem, as estimating this many free parameters is obviously infeasible, both in terms of data availability and in numerical terms (try and ask your Matlab to estimate 51 million parameters and then you will see you may take a 2,000-year vacation as well).29
Moreover, this is not even the end of the bad news: these $O(n^4)$ parameters need to be restricted for them to yield forecasts of the covariance matrix that are eventually PSD, as required. Such restrictions are too complex and involved to be presented here (see Gourieroux, 1997, section 6.1).30
26 $vech(\cdot)$ denotes the operator that stacks the lower triangular portion of a $n\times n$ matrix as a $0.5n(n+1)\times 1$ vector.
27 In what follows, as you may recall from your math classes, the notation $O(n^4)$ indicates that the quantity under examination grows at the same speed as $n^4$.
28 This is the sense in which a textbook example with $n = 3$, i.e., 78 parameters to be estimated based on, say, 2,600 observations per series (approximately 10 years of data), is not that indicative of the feasibility of this model in practice.
29 The fact that over-parameterization represents the key obstacle in the generalization of GARCH to the multivariate case also explains why in what follows we entertain at most the (1,1) case. Of course, higher-order GARCH is technically feasible but almost always practically infeasible.
As you know, one often-invoked trick to deal with the curse of dimensionality in GARCH, and also to make sure that the implied unconditional moments turn out to be consistent with what the model implies, consists of so-called (co)variance targeting. As already mentioned in Section 5, the intuition is that the model-implied unconditional covariance matrix is constrained to equal a pre-calculated estimate from the simple sample covariance matrix by setting:

$$vech(\mathbf{C}^{VT}) = (\mathbf{I}_{0.5n(n+1)} - \mathbf{A} - \mathbf{B})\,vech\left(\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{R}_\tau\mathbf{R}_\tau'\right) \qquad (17)$$

Because, by analogy to the univariate case, the unconditional, long-run covariance matrix from a vech-GARCH model is

$$vech(\bar\Sigma) = (\mathbf{I}_{0.5n(n+1)} - \mathbf{A} - \mathbf{B})^{-1}vech(\mathbf{C}),$$

setting $vech(\mathbf{C})$ in the way reported above gives

$$vech(\bar\Sigma^{VT}) = vech\left(\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{R}_\tau\mathbf{R}_\tau'\right),$$

where VT stands for variance targeting, and the result is the desired one. This trick avoids cumbersome nonlinear estimation of $vech(\mathbf{C})$ and is also useful from a forecasting perspective, to avoid that small perturbations in any of the elements of the matrices $\mathbf{A}$ and $\mathbf{B}$ result in large changes in implied unconditional variances and covariances. However, even though setting $vech(\mathbf{C})$ as in (17) does reduce the number of estimable parameters by $0.5n(n+1)$, the residual number $2[0.5n(n+1)]^2$ remains $O(n^4)$, which means that there are still too many parameters to be estimated simultaneously in $\mathbf{A}$ and $\mathbf{B}$ when $n$ is large. As a result, further ideas have been explored in the literature, besides covariance targeting.
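In code, targeting is a one-line re-parameterization. The Matlab sketch below reuses the `vech` helper introduced earlier; the candidate matrices `A` and `B` and the initialization at the target are illustrative assumptions.

```matlab
% Covariance targeting as in (17), a minimal sketch; R is T-by-n and
% A, B are candidate 0.5n(n+1)-square parameter matrices.
[T, n] = size(R);
m = n * (n + 1) / 2;
Sbar = (R' * R) / T;                          % sample second-moment matrix
vC = (eye(m) - A - B) * vech(Sbar);           % vech(C) implied by targeting
vS = vech(Sbar);                              % initialize at the target
for t = 1:T
    vS = vC + A * vech(R(t,:)' * R(t,:)) + B * vS;   % vech(Sigma_{t+1})
end
```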

6.1. Diagonal and Scalar multivariate GARCH models

One idea that emerged early on (in the early 1990s) in this literature is that adequate restrictions on $\mathbf{A}$ and $\mathbf{B}$ would deliver a sensible reduction in the number of estimable parameters. One such possibility is offered by a diagonal multivariate GARCH(p,q), which we state in the general (p,q) form to emphasize that GARCH models may in principle be defined for cases more complex than the standard (1,1) framework, and which already incorporates covariance targeting:

$$vech(\Sigma_{t+1}) = \left(\mathbf{I}_{0.5n(n+1)} - \sum_{i=1}^{p}\mathbf{A}_i - \sum_{j=1}^{q}\mathbf{B}_j\right)vech\left(\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{R}_\tau\mathbf{R}_\tau'\right)$$
$$\qquad + \sum_{i=1}^{p}\mathbf{A}_i\,vech(\mathbf{R}_{t+1-i}\mathbf{R}_{t+1-i}') + \sum_{j=1}^{q}\mathbf{B}_j\,vech(\Sigma_{t+1-j}),$$
30 For instance, to avoid estimating $\mathbf{C}$, $\mathbf{A}$ and $\mathbf{B}$ jointly, Ledoit et al. (2003) estimate each variance and covariance equation separately. The resulting estimates do not necessarily guarantee a positive semi-definite $\Sigma_{t+1}$. Therefore, in a second step, the estimates are transformed in order to achieve the requirement, keeping the disruptive effects as small as possible. The transformed estimates are still consistent with respect to the parameters of the diagonal VEC GARCH model.
where all the $[0.5n(n+1)]\times[0.5n(n+1)]$ matrices $\{\mathbf{A}_i\}_{i=1}^{p}$ and $\{\mathbf{B}_j\}_{j=1}^{q}$ are diagonal matrices, in the sense that all of their off-diagonal elements equal zero. However, although always useful because compact, in the case of a diagonal M-GARCH one does not really need vectors and matrices to express the process. It is easy to see that each element of the covariance matrix follows a simple dynamics:

$$\sigma_{ij,t+1} = \left(1-\sum_{k=1}^{p}a_{ij,k}-\sum_{k=1}^{q}b_{ij,k}\right)\frac{1}{T}\sum_{\tau=1}^{T}R_{i,\tau}R_{j,\tau} + \sum_{k=1}^{p}a_{ij,k}R_{i,t+1-k}R_{j,t+1-k} + \sum_{k=1}^{q}b_{ij,k}\sigma_{ij,t+1-k}.$$
This expression shows that conditional variances depend only on own lags and own lagged squared returns, and conditional covariances depend only on own lags and own lagged cross-products of returns. Even the diagonal GARCH framework, however, results in $O(n^2)$ parameters to be jointly estimated, which is computationally infeasible for medium to large $n$; in fact, the number of parameters is $0.5n(n+1)p + 0.5n(n+1)q = 0.5(p+q)n(n+1)$. We also know of another issue that is likely to show up in this case: because the coefficients $a_{ij,k}$ and $b_{ij,k}$ are not restricted to be the same across different assets and pairs of assets, constraints will have to be imposed to keep the resulting $\Sigma_{t+1}$ that collects the forecasts $\sigma_{ij,t+1}$, $i,j = 1,2,\ldots,n$, PSD. In spite of the reduction in the number of parameters, such constraints may represent a considerable drag on estimation speed and ease.
An even more drastic simplification, which we have in fact already examined before, is represented instead by a scalar GARCH(p,q):

$$\sigma_{ij,t+1} = \left(1-\sum_{k=1}^{p}\alpha_k-\sum_{k=1}^{q}\beta_k\right)\frac{1}{T}\sum_{\tau=1}^{T}R_{i,\tau}R_{j,\tau} + \sum_{k=1}^{p}\alpha_k R_{i,t+1-k}R_{j,t+1-k} + \sum_{k=1}^{q}\beta_k\sigma_{ij,t+1-k},$$

which means that ARCH and GARCH coefficients reduce to real scalar parameters common across assets. In matrix format, the model becomes:

$$vech(\Sigma_{t+1}) = \left(1-\sum_{k=1}^{p}\alpha_k-\sum_{k=1}^{q}\beta_k\right)vech\left(\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{R}_\tau\mathbf{R}_\tau'\right)+\sum_{k=1}^{p}\alpha_k\,vech(\mathbf{R}_{t+1-k}\mathbf{R}_{t+1-k}')+\sum_{k=1}^{q}\beta_k\,vech(\Sigma_{t+1-k}).$$
As we know from Section 4, these strong restrictions do ensure that the resulting covariance matrix is PSD, because all coefficients are restricted to be the same across different pairs $i$ and $j$. Moreover, the parametric simplification is obvious, as the number of parameters now simply becomes $p + q$, which is in fact independent of $n$. However, one is left to wonder about the exact meaning of a model in which the speed of mean reversion is restricted to be common across different assets or portfolios.
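The scalar (1,1) case with targeting is compact enough to fit in a few lines of Matlab; the parameter values below are illustrative, not estimates.

```matlab
% Scalar GARCH(1,1) covariance recursion with targeting (a minimal sketch);
% R is a T-by-n matrix of de-meaned returns.
[T, n] = size(R);
alpha = 0.05; beta = 0.90;
Sbar = (R' * R) / T;                          % covariance target
S = Sbar;                                     % initialize Sigma at the target
for t = 1:T
    S = (1 - alpha - beta) * Sbar + alpha * (R(t,:)' * R(t,:)) + beta * S;
    % S stays PSD: it is a nonnegative-weight combination of PSD matrices
end
```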
6.2. Constant Conditional Correlation (CCC) GARCH(p,q)

In a way, you are already very familiar with CCC models from Section 5: if you consider a DCC model and you impose that the correlation matrix in the DCC representation, $\Sigma_{t+1} = \mathbf{D}_{t+1}\Gamma_{t+1}\mathbf{D}_{t+1}$, is constant, so that $\Sigma_{t+1} = \mathbf{D}_{t+1}\Gamma\mathbf{D}_{t+1}$, you obtain a CCC in which, indeed, the first letter of the acronym means constant. Of course, the assumption of constant correlations over time is unrealistic. However, it simply avoids all the business of defining and modelling with GARCH-type processes the $\mathbf{Q}_{t+1}$ auxiliary variable. More generally, CCC and DCC models represent famous but special examples from a more general family that entertains non-linear combinations of univariate GARCH models and allows for models where one can specify separately, on the one hand, the individual conditional variances, and on the other hand, the conditional correlation matrix or another measure of dependence between the individual series (like a copula of the conditional joint density).31 For models in this category, theoretical results on stationarity, ergodicity, and moments may not be so straightforward to obtain as for the models presented elsewhere in this Section. Nevertheless, they are less greedy in terms of the number of estimated parameters than the models analyzed above, and therefore they have been more successful in practice.

Analogously to the DCC case, a CCC model is based on a generalization to the vector/matrix case of the standard result that, when correlations are constant, time-varying covariances may only derive from time variation in volatilities, $\sigma_{ij,t+1} = \rho_{ij}\sigma_{i,t+1}\sigma_{j,t+1}$:

$$\Sigma_{t+1} = \mathbf{D}_{t+1}\Gamma\mathbf{D}_{t+1},$$

where $\mathbf{D}_{t+1}$ is a matrix with the standard deviations $\sigma_{i,t+1}$ on the $i$th position of its main diagonal and zeros everywhere else ($i = 1,2,\ldots,n$), and $\Gamma$ is a constant matrix of correlations $\rho_{ij}$, with ones on its main diagonal. For instance, in the $n = 2$ case:

$$\begin{bmatrix}\sigma^2_{1,t+1} & \sigma_{12,t+1} \\ \sigma_{12,t+1} & \sigma^2_{2,t+1}\end{bmatrix} = \begin{bmatrix}\sigma_{1,t+1} & 0 \\ 0 & \sigma_{2,t+1}\end{bmatrix}\begin{bmatrix}1 & \rho_{12} \\ \rho_{12} & 1\end{bmatrix}\begin{bmatrix}\sigma_{1,t+1} & 0 \\ 0 & \sigma_{2,t+1}\end{bmatrix}.$$
As in the DCC approach, the key step is the ability to disentangle the estimation and prediction of $\mathbf{D}_{t+1}$, to obtain $\hat{\mathbf{D}}_{t+1}$, from the estimation of $\Gamma$, which will give $\hat\Gamma$. In particular, also in the case of a CCC, we proceed in two steps:
1. The volatilities of each asset are estimated/predicted through a GARCH or one of the other methods considered in chapters 4 and 5. For instance, one can think of $\sigma^2_{i,t+1} = \omega_i + \alpha_i R^2_{i,t} + \beta_i\sigma^2_{i,t}$ for $i = 1,2,\ldots,n$.
31 The so-called copula-GARCH approach makes use of the theorem due to Sklar stating that any $n$-dimensional joint distribution function may be decomposed into its $n$ marginal distributions and a copula function that completely describes the dependence between the $n$ variables. These models are specified by GARCH equations for the conditional variances (possibly with each variance depending on the lags of the other variances and of the other shocks), marginal distributions for each series (e.g., t-distributions), and a conditional copula function. The copula function may be time-varying through its parameters, which can be functions of past data. In this respect, like the DCC model of Engle (2002), copula-GARCH models can be estimated using a two-step QML approach.
2. Estimate constant correlations using a simple sample estimator based on the standardized returns, $z_{i,t+1} \equiv R_{i,t+1}/\sigma_{i,t+1}$, derived from the first step using GARCH-type models. Here, we exploit the fact that the covariance of the $z_{i,t+1}$ variables equals the correlation of the raw returns, and use

$$\hat\rho_{ij} = \frac{1}{T}\sum_{t=1}^{T}z_{i,t+1}z_{j,t+1}$$

to estimate the constant correlations. Such constant correlations are then inserted inside the $\hat\Gamma$ matrix (a Matlab sketch of both steps follows this list).
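The following minimal sketch assumes the first step has already produced a $T\times n$ matrix `sig` of fitted GARCH standard deviations; all names are illustrative.

```matlab
% The two CCC steps in matrix form (a minimal sketch); R is T-by-n.
z = R ./ sig;                                    % standardized returns
Gamma = (z' * z) / size(z, 1);                   % sample correlation estimate
s = 1 ./ sqrt(diag(Gamma));
Gamma = diag(s) * Gamma * diag(s);               % force a unit diagonal
D = diag(sig(end, :));                           % last fitted volatilities
Sigma = D * Gamma * D;                           % CCC covariance forecast
```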
6.3. BEKK GARCH
Given the picture provided above, and the fact that DCC is a model popularized around the turn of the millennium, one may ask what was the state of multivariate GARCH modelling in practice before DCC became as popular as it is today. Apart from the uncomfortable case of CCC models that assume constant correlations over time, during the 1990s one of the most popular multivariate GARCH models had been Engle and Kroner's (1995) BEKK GARCH(p,q):32

$$\Sigma_{t+1} = \mathbf{C}\mathbf{C}' + \sum_{i=1}^{p}\mathbf{A}_i\left(\mathbf{R}_{t+1-i}\mathbf{R}_{t+1-i}'\right)\mathbf{A}_i' + \sum_{j=1}^{q}\mathbf{B}_j\Sigma_{t+1-j}\mathbf{B}_j',$$

where the matrices $\{\mathbf{A}_i\}_{i=1}^{p}$ and $\{\mathbf{B}_j\}_{j=1}^{q}$ are non-negative and symmetric. This special product-sandwich form used to write the BEKK ensures the PSD property without imposing further restrictions, which represents the key reason for the success of BEKK models. In fact, this full-matrix BEKK is easier to estimate than vech-GARCH models, even though it remains rather complex to handle. In practice, the popular form of BEKK that many empirical analysts have come to appreciate is a simpler (1,1) diagonal BEKK that restricts the matrices $\mathbf{A}$ and $\mathbf{B}$ to be diagonal. BEKK models possess three attractive properties:
1. A BEKK is a truncated, low-dimensional application of a theorem by which all non-negative, symmetric matrices (say, $\mathbf{M}$) can be decomposed (for instance) as

$$\mathbf{M} = \begin{bmatrix}\mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22}\end{bmatrix} = \sum_{k=1}^{2}\mathbf{m}_k\mathbf{m}_k'$$

for appropriately selected vectors $\mathbf{m}_k$. In a sense, mathematically it is no surprise that BEKK models often offer a good fit to the dynamics of variance.
2. As already mentioned, it easily ensures PSDness of the covariance matrix.
3. BEKK is invariant to linear combinations: e.g., if $\mathbf{R}_{t+1}$ follows a BEKK GARCH(p,q), then any portfolio formed from the securities or assets in $\mathbf{R}_{t+1}$ will also follow a BEKK.
32 In case you wonder, BEKK means Baba-Engle-Kraft-Kroner: the acronym simply compacts the names of the four econometricians who contributed to its development.
However, the number of parameters in BEKK remains rather large:

$$0.5n(n+1) + 0.5n(n+1)p + 0.5n(n+1)q = 0.5n(n+1)[1+p+q] = O(n^2).$$

Often, this has still made DCC models preferable in practice. However, the number of parameters in BEKK is substantially inferior to the number appearing in a full VEC specification. This happens because the parameters governing the dynamics of the covariance equations in BEKK models are the products of the corresponding parameters of the two corresponding variance equations in the same model (a sketch of the diagonal BEKK(1,1) recursion follows).
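The diagonal BEKK(1,1) recursion is short enough to write out directly; the parameter values and the initialization below are illustrative assumptions, not estimates.

```matlab
% Diagonal BEKK(1,1) covariance recursion (a minimal sketch);
% R is a T-by-n matrix of de-meaned returns.
[T, n] = size(R);
C = 0.1 * eye(n);                     % intercept factor: Sigma-intercept = C*C'
A = 0.25 * eye(n);                    % diagonal ARCH matrix
B = 0.95 * eye(n);                    % diagonal GARCH matrix
S = (R' * R) / T;                     % initialize at the sample covariance
for t = 1:T
    S = C*C' + A * (R(t,:)' * R(t,:)) * A' + B * S * B';
    % each summand is PSD, so S stays PSD without further restrictions
end
```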
The second and third properties of BEKK models can only be appreciated by contrasting the features of BEKK under linear aggregation with the properties of alternative multivariate GARCH models, for instance even a simple diagonal vech ARCH. Not all multivariate GARCH models are invariant with respect to linear transformations.33 For instance, for the case of two asset return series ($n = 2$), consider a simple diagonal multivariate ARCH(1) model obtained from a simplification of the diagonal GARCH(p,q) introduced early on:

$$vech(\Sigma_t) = (\mathbf{I}_3 - \mathbf{A})\,vech\left(\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{R}_\tau\mathbf{R}_\tau'\right) + \mathbf{A}\,vech(\mathbf{R}_{t-1}\mathbf{R}_{t-1}') \qquad (18)$$

where the helpful variance targeting restriction has already been imposed and $\mathbf{A}$ is a diagonal matrix. Because we have set $n = 2$, $\Sigma_t$ will be a $2\times 2$ matrix, $\mathbf{A}$ is a $3\times 3$ diagonal matrix, $\mathbf{R}_t$ is a $2\times 1$ vector of asset returns, $vech(\Sigma_t)$ is a $3\times 1$ vector of unique elements from $\Sigma_t$, $vech(\frac{1}{T}\sum_{\tau}\mathbf{R}_\tau\mathbf{R}_\tau')$ is a $3\times 1$ vector of unique elements from the sample cross-product matrix, and $vech(\mathbf{R}_{t-1}\mathbf{R}_{t-1}')$ is a $3\times 1$ vector of unique elements from the lagged cross-product matrix. The number of coefficients to be estimated is of course 3: $a_{11}$, $a_{22}$, and $a_{33}$ in the representation:
$$\begin{bmatrix}\sigma_{11,t}\\ \sigma_{12,t}\\ \sigma_{22,t}\end{bmatrix} = \left(\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}-\begin{bmatrix}a_{11}&0&0\\0&a_{22}&0\\0&0&a_{33}\end{bmatrix}\right)\begin{bmatrix}\frac{1}{T}\sum_{\tau}R^2_{1,\tau}\\ \frac{1}{T}\sum_{\tau}R_{1,\tau}R_{2,\tau}\\ \frac{1}{T}\sum_{\tau}R^2_{2,\tau}\end{bmatrix}+\begin{bmatrix}a_{11}&0&0\\0&a_{22}&0\\0&0&a_{33}\end{bmatrix}\begin{bmatrix}R^2_{1,t-1}\\ R_{1,t-1}R_{2,t-1}\\ R^2_{2,t-1}\end{bmatrix}$$

$$=\begin{bmatrix}(1-a_{11})\frac{1}{T}\sum_{\tau}R^2_{1,\tau}+a_{11}R^2_{1,t-1}\\ (1-a_{22})\frac{1}{T}\sum_{\tau}R_{1,\tau}R_{2,\tau}+a_{22}R_{1,t-1}R_{2,t-1}\\ (1-a_{33})\frac{1}{T}\sum_{\tau}R^2_{2,\tau}+a_{33}R^2_{2,t-1}\end{bmatrix}.$$
As for the conditions that guarantee that $\sigma_{11,t} \geq 0$ and $\sigma_{22,t} \geq 0$ at all times, i.e., that ensure the PSDness of the model, clearly

$$(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}+a_{11}R^2_{1,t-1}\geq 0 \;\text{ if and only if }\; a_{11}\in(0,1),$$
$$(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}+a_{33}R^2_{2,t-1}\geq 0 \;\text{ if and only if }\; a_{33}\in(0,1).$$
33 By invariance of a model, we mean that it stays in the same class if a linear transformation is applied to $\mathbf{R}_{t+1}$: $\tilde{\mathbf{R}}_{t+1} = \mathbf{F}\mathbf{R}_{t+1}$, where $\mathbf{F}$ is a square matrix of constants and $\tilde{\mathbf{R}}_{t+1}$ corresponds to new assets (portfolios combining the original assets). It seems sensible that a model should be invariant, otherwise the question arises of which basic assets should be modelled.
At this point the filtered (predicted) correlation coefficient has expression

$$\rho_{12,t} = \frac{\kappa_{22}+a_{22}R_{1,t-1}R_{2,t-1}}{\sqrt{\kappa_{11}+a_{11}R^2_{1,t-1}}\sqrt{\kappa_{33}+a_{33}R^2_{2,t-1}}}$$

and, as is obvious, $\rho_{12,t}$ should belong to $[-1,1]$. Here we have shortened the notation by setting $\kappa_{11}\equiv(1-a_{11})\frac{1}{T}\sum_{\tau}R^2_{1,\tau}$, $\kappa_{33}\equiv(1-a_{33})\frac{1}{T}\sum_{\tau}R^2_{2,\tau}$, and $\kappa_{22}\equiv(1-a_{22})\frac{1}{T}\sum_{\tau}R_{1,\tau}R_{2,\tau}$. Focusing on the upper bound of the interval, this means that

$$(\kappa_{22}+a_{22}R_{1,t-1}R_{2,t-1})^2 \leq (\kappa_{11}+a_{11}R^2_{1,t-1})(\kappa_{33}+a_{33}R^2_{2,t-1})$$
or

$$(\kappa_{22})^2+(a_{22})^2R^2_{1,t-1}R^2_{2,t-1}+2\kappa_{22}a_{22}R_{1,t-1}R_{2,t-1}\leq\kappa_{11}\kappa_{33}+\kappa_{33}a_{11}R^2_{1,t-1}+\kappa_{11}a_{33}R^2_{2,t-1}+a_{11}a_{33}R^2_{1,t-1}R^2_{2,t-1},$$

which is equivalent to

$$\left[a_{11}a_{33}-(a_{22})^2\right]R^2_{1,t-1}R^2_{2,t-1}+\left[\kappa_{11}\kappa_{33}-(\kappa_{22})^2\right]+\kappa_{33}a_{11}R^2_{1,t-1}+\kappa_{11}a_{33}R^2_{2,t-1}-2\kappa_{22}a_{22}R_{1,t-1}R_{2,t-1}\geq 0,$$

which cannot hold for a continuous distribution of the asset return series since, even constraining $a_{11}a_{33}-(a_{22})^2\geq 0$ and $\kappa_{11}\kappa_{33}-(\kappa_{22})^2\geq 0$,34

$$\kappa_{33}a_{11}R^2_{1,t-1}+\kappa_{11}a_{33}R^2_{2,t-1}-2\kappa_{22}a_{22}R_{1,t-1}R_{2,t-1}\geq 0$$
in general does not hold for $a_{22}\neq 0$. However, notice that if one sets $a_{22}=0$, then the previous inequality simplifies to

$$\left[(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}\right]\left[(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}\right]-\left[\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}\right]^2+\kappa_{33}a_{11}R^2_{1,t-1}+\kappa_{11}a_{33}R^2_{2,t-1}\geq 0,$$
which has a chance to hold if $a_{11}$ and $a_{33}$ are such that

$$\left[(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}\right]\left[(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}\right]\geq\left[\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}\right]^2,$$
which also means that

$$\bar\rho_{12}=\frac{\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}}{\sqrt{(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}}\sqrt{(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}}}\leq 1,$$

i.e., the unconditional correlation implied by the data and the diagonal bivariate ARCH(1) process is well-behaved. Therefore, if $a_{11}\in(0,1)$ and $a_{33}\in(0,1)$, then $a_{22}=0$ (and some other restriction on $a_{11}$ and $a_{33}$) must be imposed. This means that it is impossible to model the dynamics of volatilities
34 "At all times" here really means "for all possible realizations of the continuous bivariate vector $\mathbf{R}_{t-1}$", which has domain $[-1,+\infty)\times[-1,+\infty)$; this alludes to the fact that, even under limited liability, in finance asset returns may in principle take very large values.
and covariances simultaneously while satisfying the positivity requirement for the volatilities and keeping $\Sigma_t$ positive semi-definite at all times. Equivalently, if one wants to impose that the diagonal vech ARCH(1) model delivers a filtered covariance matrix that is positive semi-definite at all times, the diagonal model itself must be turned into a constant-covariance multivariate ARCH model, as you understand that $a_{22}=0$ implies $\sigma_{12,t}=\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}=\bar\sigma_{12}$, so that

$$\rho_{12,t}=\frac{\bar\sigma_{12}}{\sqrt{\kappa_{11}+a_{11}R^2_{1,t-1}}\sqrt{\kappa_{33}+a_{33}R^2_{2,t-1}}}$$

and dynamics in conditional correlations will exclusively come from dynamics in volatilities.35
Let's now examine the issues concerning the fact that, while BEKK is closed under linear aggregation, a simpler diagonal vech-GARCH model is not. Consider a portfolio of the two assets, with weights $w$ and $(1-w)$. We show that, in spite of the fact that $\mathbf{R}_t$ is characterized by a diagonal bivariate ARCH(1), the portfolio return $R^p_t = wR_{1,t}+(1-w)R_{2,t}$ has a variance process $Var_{t-1}[R^p_t]$ that fails to display the typical diagonal form, i.e., $(1-a)\frac{1}{T}\sum_{\tau}(R^p_\tau)^2+a(R^p_{t-1})^2$. Note first that

$$Var_{t-1}[R^p_t]=Var_{t-1}[wR_{1,t}+(1-w)R_{2,t}]=w^2\sigma_{11,t}+(1-w)^2\sigma_{22,t}+2w(1-w)\sigma_{12,t}$$
$$=w^2\left[(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}+a_{11}R^2_{1,t-1}\right]+(1-w)^2\left[(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}+a_{33}R^2_{2,t-1}\right]$$
$$\;+2w(1-w)\left[(1-a_{22})\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}+a_{22}R_{1,t-1}R_{2,t-1}\right],$$
which cannot be written in the diagonal form $(1-a)\frac{1}{T}\sum_{\tau}[wR_{1,\tau}+(1-w)R_{2,\tau}]^2+a[wR_{1,t-1}+(1-w)R_{2,t-1}]^2$, because for no definition of $a$ is it possible to show that

$$w^2(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}+(1-w)^2(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}+2w(1-w)(1-a_{22})\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}$$
$$=(1-a)\frac{1}{T}\sum_{\tau=1}^{T}\left[w^2R^2_{1,\tau}+(1-w)^2R^2_{2,\tau}+2w(1-w)R_{1,\tau}R_{2,\tau}\right]$$
35 In case you are curious, notice that the heuristic proof above is in itself sufficient to derive that $a_{22}=0$ from $a_{11}\in(0,1)$ and $a_{33}\in(0,1)$, and that you do not need to deal with the lower bound of the filtered correlation coefficient derived from $a_{11}$, $a_{22}$, and $a_{33}$. Just for completeness, let us also consider the case of imposing $\rho_{12,t}\geq -1$. This lower bound means that

$$-(\kappa_{22}+a_{22}R_{1,t-1}R_{2,t-1})\leq\sqrt{(\kappa_{11}+a_{11}R^2_{1,t-1})(\kappa_{33}+a_{33}R^2_{2,t-1})}$$

or

$$(\kappa_{22}+a_{22}R_{1,t-1}R_{2,t-1})^2\leq(\kappa_{11}+a_{11}R^2_{1,t-1})(\kappa_{33}+a_{33}R^2_{2,t-1}),$$
$$(\kappa_{22})^2+(a_{22})^2R^2_{1,t-1}R^2_{2,t-1}+2\kappa_{22}a_{22}R_{1,t-1}R_{2,t-1}\leq\kappa_{11}\kappa_{33}+\kappa_{33}a_{11}R^2_{1,t-1}+\kappa_{11}a_{33}R^2_{2,t-1}+a_{11}a_{33}R^2_{1,t-1}R^2_{2,t-1},$$

which is equivalent to

$$\left[a_{11}a_{33}-(a_{22})^2\right]R^2_{1,t-1}R^2_{2,t-1}+\left[\kappa_{11}\kappa_{33}-(\kappa_{22})^2\right]+\kappa_{33}a_{11}R^2_{1,t-1}+\kappa_{11}a_{33}R^2_{2,t-1}-2\kappa_{22}a_{22}R_{1,t-1}R_{2,t-1}\geq 0,$$

which is the same condition used above.
and especially that

$$w^2a_{11}R^2_{1,t-1}+(1-w)^2a_{33}R^2_{2,t-1}+2w(1-w)a_{22}R_{1,t-1}R_{2,t-1}=a\left[w^2R^2_{1,t-1}+(1-w)^2R^2_{2,t-1}+2w(1-w)R_{1,t-1}R_{2,t-1}\right].$$
This means that the diagonal multivariate ARCH model fails to be invariant to linear combinations: if you start with assets that follow a diagonal multivariate ARCH model, the resulting portfolio of assets will fail to follow a similar diagonal model, which is of course problematic, if not confusing. As you should be reading in the paper by Bauwens et al. (2006), the problem of (18) that causes it to fail the invariance property is very simple to visualize: while in

$$vech(\Sigma_t)=(\mathbf{I}_3-\mathbf{A})\,vech\left(\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{R}_\tau\mathbf{R}_\tau'\right)+\mathbf{A}\,vech(\mathbf{R}_{t-1}\mathbf{R}_{t-1}')$$

$\mathbf{A}$ is diagonal, $R^p_t$ can be written as $R^p_t=\mathbf{w}'\mathbf{R}_t$, and $Var_{t-1}[R^p_t]=\mathbf{w}'\Sigma_t\mathbf{w}$ implies the need to use a vector of coefficients $\mathbf{w}'\mathbf{A}$, which is no longer a diagonal matrix (of course, it is not even a square matrix).
It is also easy to see what you need to do in order for the invariance property to obtain: if you set $a_{11}=a_{22}=a_{33}$, then when $a=a_{11}$

$$w^2(1-a_{11})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{1,\tau}+(1-w)^2(1-a_{33})\frac{1}{T}\sum_{\tau=1}^{T}R^2_{2,\tau}+2w(1-w)(1-a_{22})\frac{1}{T}\sum_{\tau=1}^{T}R_{1,\tau}R_{2,\tau}$$
$$=(1-a)\frac{1}{T}\sum_{\tau=1}^{T}\left[w^2R^2_{1,\tau}+(1-w)^2R^2_{2,\tau}+2w(1-w)R_{1,\tau}R_{2,\tau}\right]$$

and

$$w^2a_{11}R^2_{1,t-1}+(1-w)^2a_{33}R^2_{2,t-1}+2w(1-w)a_{22}R_{1,t-1}R_{2,t-1}=a\left[w^2R^2_{1,t-1}+(1-w)^2R^2_{2,t-1}+2w(1-w)R_{1,t-1}R_{2,t-1}\right]$$

will trivially hold. But this means that the only way for a diagonal multivariate ARCH to possess the invariance property is for it to actually be a scalar multivariate ARCH, in which the same ARCH coefficient applies to all conditional equations.
It remains natural to ask why and when researchers and practitioners alike should bother with complex and over-parameterized models of the multivariate GARCH type. On the one hand, this is no longer a timely question, because we know that CCC and DCC models have been enjoying growing popularity also because they may be easily implemented in practice. On the other hand, there is another interesting reason: multivariate GARCH models speak to the heart of finance theory. To see one example of this feature, consider the case of an investor who maximizes $E_t[R^p_{t+1}]$ and minimizes $Var_t[R^p_{t+1}]$, with a trade-off coefficient $\lambda$, similarly to what we have seen in a few of our Matlab workouts:

$$U(\mathbf{w}_t)=E_t[R^p_{t+1}]-\frac{1}{2}\lambda\,Var_t[R^p_{t+1}]=\mathbf{w}_t'E_t[\mathbf{R}_{t+1}]-\frac{1}{2}\lambda\,\mathbf{w}_t'Var_t[\mathbf{R}_{t+1}]\mathbf{w}_t,$$

where $\mathbf{w}_t$ represents the vector of portfolio weights held by the investor and $U(\mathbf{w}_t)$ is an index of the satisfaction (happiness) of this investor. As you have seen in other courses, and as we have used in our lab sessions a few times, the optimal portfolio weights (i.e., the demand function of securities by the investor) are:

$$\mathbf{w}_t=\frac{1}{\lambda}\left\{Var_t[\mathbf{R}_{t+1}]\right\}^{-1}E_t[\mathbf{R}_{t+1}]=\frac{1}{\lambda}\Sigma^{-1}_{t+1}E_t[\mathbf{R}_{t+1}].$$

At this point, equating demand to supply (say, a given $\mathbf{w}_s$), we have

$$E_t[\mathbf{R}_{t+1}]=\lambda\,Var_t[\mathbf{R}_{t+1}]\mathbf{w}_s=\lambda\Sigma_{t+1}\mathbf{w}_s,$$

which represents the mean-variance equilibrium vector of expected returns. At this point, if $\mathbf{R}_{t+1}$ follows (say) a BEKK GARCH(1,1) model and pricing errors have a multivariate IID distribution (not necessarily normal), we obtain that:

$$\mathbf{R}_{t+1}=E_t[\mathbf{R}_{t+1}]+\Sigma^{1/2}_{t+1}\mathbf{z}_{t+1}=\lambda\Sigma_{t+1}\mathbf{w}_s+\Sigma^{1/2}_{t+1}\mathbf{z}_{t+1}$$
$$=\lambda\left[\mathbf{C}\mathbf{C}'+\mathbf{A}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}'+\mathbf{B}\Sigma_t\mathbf{B}'\right]\mathbf{w}_s+\left[\mathbf{C}\mathbf{C}'+\mathbf{A}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}'+\mathbf{B}\Sigma_t\mathbf{B}'\right]^{1/2}\mathbf{z}_{t+1}.$$

At this point, a test of this simple asset pricing model is whether such a regression model may provide a high R-square, thus explaining most of the variation of the assets included in $\mathbf{R}_{t+1}$.
6.4. Leverage Effects in multivariate GARCH

The idea (especially befitting stock returns) that negative shocks may have a larger impact on their volatility than positive shocks of the same absolute value, already discussed in chapter 4 and most often interpreted as a leverage effect, can be easily extended to multivariate models: both variances and covariances may react differently to a positive than to a negative shock. A useful and rather general model that takes explicitly the sign of the errors into account is the asymmetric dynamic covariance (ADC) model of Kroner and Ng (1998):

$$\sigma_{ij,t+1}=\rho_{ij,t+1}\sqrt{\theta_{ii,t+1}\theta_{jj,t+1}}+\phi_{ij}\theta_{ij,t+1}\quad (i\neq j), \qquad \sigma^2_{i,t+1}=\theta_{ii,t+1},$$
$$\Theta_{t+1}=\mathbf{Q}\mathbf{Q}'+\mathbf{A}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}'+\mathbf{D}(\mathbf{v}_t\mathbf{v}_t')\mathbf{D}'+\mathbf{B}\Theta_t\mathbf{B}', \qquad \mathbf{v}_t\equiv\max[\mathbf{0},-\mathbf{R}_t],$$

where $\Theta_{t+1}$ collects the elements $\theta_{ij,t+1}$, the max operator is applied element by element, and $\rho_{ij,t+1}$ comes from a DCC-type estimate of $\Gamma_{t+1}$.
6.5. Factor GARCH models

In Section 3 we have investigated how factor models may be used to estimate and forecast correlations, and seen that, when an exposure mapping approach is used, passive and active approaches to risk management become perfectly equivalent. The idea that factor models may greatly simplify the forecasting of conditional second moments may be considerably generalized to the case of multivariate variance and covariance forecasting. The difficulty when estimating a VEC or even a BEKK model is the high number of unknown parameters, even after imposing several restrictions. It is thus not surprising that these models are rarely used when the number of series is larger than 3 or 4. Factor and orthogonal models circumvent this difficulty by imposing a common dynamic structure on all the elements of $\Sigma_{t+1}$. However, we shall see that doing so within a multivariate framework is not much different from building and estimating special, constrained BEKK models. Suppose that the $n\times 1$ vector of returns $\mathbf{R}_{t+1}$ has a factor structure with 2 factors given by the $2\times 1$ vector $\mathbf{f}_{t+1}\equiv[IP_{t+1}\;\;Infl_{t+1}]'$ (these are industrial production and CPI inflation) and time-invariant factor loadings given by the $n\times 2$ matrix $\mathbf{B}$:

$$\mathbf{R}_{t+1}=\mathbf{B}\mathbf{f}_{t+1}+\boldsymbol\varepsilon_{t+1}.$$

Although we consider the special case of two factors only, this example can be generalized to the case of any $K\geq 2$ factors, although the algebra becomes much more involved and challenging.
Also assume that the idiosyncratic shocks in the vector $\boldsymbol\varepsilon_{t+1}$ have conditional covariance matrix $Var_t[\boldsymbol\varepsilon_{t+1}]=E_t[\boldsymbol\varepsilon_{t+1}\boldsymbol\varepsilon_{t+1}']=\boldsymbol\Omega$, which is constant over time and positive semi-definite, and that the common factors are characterized by $E_t[\mathbf{f}_{t+1}]=\mathbf{0}$, $E_t[\mathbf{f}_{t+1}\boldsymbol\varepsilon_{t+1}']=\mathbf{O}$ (a matrix of zeros of the appropriate dimensions), and $E_t[\mathbf{f}_{t+1}\mathbf{f}_{t+1}']=diag\{\sigma^2_{IP,t+1},\,\sigma^2_{Infl,t+1}\}$. Because $E_t[\mathbf{f}_{t+1}]=\mathbf{0}$, then $E_t[\mathbf{R}_{t+1}]=\mathbf{0}$, which means that the returns have also been de-meaned. The expression for the conditional covariance matrix of $\mathbf{R}_{t+1}$ can be written by explicitly disentangling the role played by the risk exposures, the variance of the risk factors, and the variance of idiosyncratic risk:

$$Var_t[\mathbf{R}_{t+1}]=\Sigma_{t+1}=\mathbf{B}E_t[\mathbf{f}_{t+1}\mathbf{f}_{t+1}']\mathbf{B}'+E_t[\boldsymbol\varepsilon_{t+1}\boldsymbol\varepsilon_{t+1}']+\mathbf{B}E_t[\mathbf{f}_{t+1}\boldsymbol\varepsilon_{t+1}']+E_t[\boldsymbol\varepsilon_{t+1}\mathbf{f}_{t+1}']\mathbf{B}'$$
$$=\mathbf{B}E_t[\mathbf{f}_{t+1}\mathbf{f}_{t+1}']\mathbf{B}'+\boldsymbol\Omega=\mathbf{b}_{IP}\mathbf{b}_{IP}'\sigma^2_{IP,t+1}+\mathbf{b}_{Infl}\mathbf{b}_{Infl}'\sigma^2_{Infl,t+1}+\boldsymbol\Omega, \qquad (19)$$

where $\mathbf{b}_{IP}$ is the $n\times 1$ vector that collects the factor loadings of each of the $n$ assets on the IP factor, and $\mathbf{b}_{Infl}$ is the $n\times 1$ vector that collects the factor loadings of each of the $n$ assets on the inflation factor. We may highlight the role of variances and covariances of the assets through a simple $n = 2$ example:
"
# "
2
11+1 12+1
(
1 )
=

12+1 11+1

1 2

1 2
2
(
2 )
"

"

2+1 +

11 12
12 22

"

2
)
(
1

1 2

2
1
2
(2 )

2 +1 +

2 2
2
2 2
2
) +1 + 11
2 +1 + 12
(
1 ) +1 +(1
1 2 +1 +1

2 2
2
2 2

2 2 +1 + 12 (
) +1 + 22
1 2 +1 +1
2 ) +1 +(2

At this point, it is revealing to define the 2 factor-mimicking portfolios (with returns $r^{IP}_{t+1}$ and $r^{Infl}_{t+1}$), with portfolio weights $\boldsymbol\phi_{IP}$ and $\boldsymbol\phi_{Infl}$ that are orthogonal to all but one set of factor loadings:

$$r^{IP}_{t+1}=\boldsymbol\phi_{IP}'\mathbf{R}_{t+1}, \qquad r^{Infl}_{t+1}=\boldsymbol\phi_{Infl}'\mathbf{R}_{t+1},$$

such that $\boldsymbol\phi_{IP}'\mathbf{B}=[1\;\;0]$ and $\boldsymbol\phi_{Infl}'\mathbf{B}=[0\;\;1]$. The vector of factor-representing portfolios is then $\mathbf{r}_{t+1}=\boldsymbol\Phi'\mathbf{R}_{t+1}$, where $\boldsymbol\Phi\equiv[\boldsymbol\phi_{IP}\;\;\boldsymbol\phi_{Infl}]$. It is then possible to re-write the expression for the conditional covariance matrix of $\mathbf{r}_{t+1}$, and in particular for $Var_t[r^{IP}_{t+1}]$ and $Var_t[r^{Infl}_{t+1}]$, in terms of the two factor-mimicking portfolios:

$$Var_t[\mathbf{r}_{t+1}]=\boldsymbol\Phi'E_t[\mathbf{R}_{t+1}\mathbf{R}_{t+1}']\boldsymbol\Phi=\boldsymbol\Phi'\Sigma_{t+1}\boldsymbol\Phi=\boldsymbol\Phi'\mathbf{b}_{IP}\mathbf{b}_{IP}'\boldsymbol\Phi\,\sigma^2_{IP,t+1}+\boldsymbol\Phi'\mathbf{b}_{Infl}\mathbf{b}_{Infl}'\boldsymbol\Phi\,\sigma^2_{Infl,t+1}+\boldsymbol\Phi'\boldsymbol\Omega\boldsymbol\Phi.$$

In particular, notice that

$$Var_t[r^{IP}_{t+1}]=\sigma^2_{IP,t+1}+\omega^{IP}, \qquad Var_t[r^{Infl}_{t+1}]=\sigma^2_{Infl,t+1}+\omega^{Infl}, \qquad (20)$$

where $\omega^{IP}$ and $\omega^{Infl}$ are the first and second elements on the main diagonal of $\boldsymbol\Phi'\boldsymbol\Omega\boldsymbol\Phi$. Each factor-mimicking portfolio displays the exact time variation of the factor it represents (which is why they are called factor-mimicking portfolios), plus some idiosyncratic risk, which is due to the possible need to avoid complete diversification. At this point, it is possible to bring together the results in (19) and (20)
complete diversification. At this point, it is possible to bring together the results in (19) and (20)
to derive an expression that links
b b0 [r+1 ] + b b0 [r+1 ]
to the variance of the factors and terms of the type b b0 1 + b b0 2 . Recall that
0

B [r+1 ]B0 = B0 b b0 B0 2+1 +B0 b b0 B0 2 +1 +B0 B


= b b0 2+1 + b b0 2 +1 + b b0 1 + b b0 2
Therefore
b b0 2+1 + b b0 2 +1 = B [r+1 ]B0 b b0 1 b b0 2

= b b0 +b b0 b b0 1 b b0 2 ,
where +1 [+1 ] and +1 [ +1 ]. Replacing this expression into +1 =

b b0 2+1 + b b0 2 +1 + found in (19), we have

+1 = b b0 +1 + b b0 +1 b b0 1 b b0 2 +
= + b b0 +1 + b b0 +1
where = b b0 1 b b0 2 .
However, these labored (and boring) mathematical derivations simply show that the conditional covariance matrix of returns can be decomposed as a weighted sum of products of beta exposures to factor-mimicking portfolio returns and conditional variance forecasts for each of the two portfolios. In practice, in order for the model to be implemented, one will need to parameterize $\sigma^{i}_{t+1}\equiv Var_t[r^{i}_{t+1}]$ ($i = IP,\,Infl$), for instance as simple GARCH(1,1)-type processes,

$$\sigma^{i}_{t+1}=Var_t[r^{i}_{t+1}]=\omega_i+\alpha_i(r^{i}_t)^2+\beta_i\sigma^{i}_t.$$

Clearly, such a specification may be replaced with different ARCH specifications from chapter 4, without any qualitative differences. As a result, the conditional covariance matrix of returns $\Sigma_{t+1}$ may be re-written in a BEKK form as:

$$\Sigma_{t+1}=\boldsymbol\Omega^{**}+\mathbf{A}_{IP}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}_{IP}'+\mathbf{A}_{Infl}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}_{Infl}'+\mathbf{B}_{IP}\Sigma_t\mathbf{B}_{IP}'+\mathbf{B}_{Infl}\Sigma_t\mathbf{B}_{Infl}'.$$
This is a very interesting result: all factor GARCH models eventually may be written as special BEKK models in which the matrices of coefficients ($\mathbf{A}_{IP}$, $\mathbf{A}_{Infl}$, $\mathbf{B}_{IP}$ and $\mathbf{B}_{Infl}$) bear functional relationships to products of the beta exposures ($\mathbf{b}_{IP}$ and $\mathbf{b}_{Infl}$) with row vectors of portfolio weights ($\boldsymbol\phi_{IP}'$ and $\boldsymbol\phi_{Infl}'$) defining the mimicking relationships. This can be seen from the fact that

$$\sigma^{i}_{t+1}=\omega_i+\alpha_i(r^{i}_t)^2+\beta_i\sigma^{i}_t=\omega_i+\alpha_i(\boldsymbol\phi_i'\mathbf{R}_t)^2+\beta_i Var_{t-1}[r^{i}_t]$$
$$=\omega_i+\alpha_i\boldsymbol\phi_i'(\mathbf{R}_t\mathbf{R}_t')\boldsymbol\phi_i+\beta_i\boldsymbol\phi_i'Var_{t-1}[\mathbf{R}_t]\boldsymbol\phi_i=\omega_i+\alpha_i\boldsymbol\phi_i'(\mathbf{R}_t\mathbf{R}_t')\boldsymbol\phi_i+\beta_i\boldsymbol\phi_i'\Sigma_t\boldsymbol\phi_i.$$

As a result, because we have seen that $\Sigma_{t+1}=\boldsymbol\Omega^{*}+\mathbf{b}_{IP}\mathbf{b}_{IP}'\sigma^{IP}_{t+1}+\mathbf{b}_{Infl}\mathbf{b}_{Infl}'\sigma^{Infl}_{t+1}$, we can write the conditional covariance matrix of returns as:

$$\Sigma_{t+1}=\boldsymbol\Omega^{*}+\mathbf{b}_{IP}\mathbf{b}_{IP}'\left[\omega_{IP}+\alpha_{IP}\boldsymbol\phi_{IP}'(\mathbf{R}_t\mathbf{R}_t')\boldsymbol\phi_{IP}+\beta_{IP}\boldsymbol\phi_{IP}'\Sigma_t\boldsymbol\phi_{IP}\right]$$
$$\qquad+\mathbf{b}_{Infl}\mathbf{b}_{Infl}'\left[\omega_{Infl}+\alpha_{Infl}\boldsymbol\phi_{Infl}'(\mathbf{R}_t\mathbf{R}_t')\boldsymbol\phi_{Infl}+\beta_{Infl}\boldsymbol\phi_{Infl}'\Sigma_t\boldsymbol\phi_{Infl}\right]$$
$$=\boldsymbol\Omega^{**}+\alpha_{IP}\,\mathbf{b}_{IP}\boldsymbol\phi_{IP}'(\mathbf{R}_t\mathbf{R}_t')\boldsymbol\phi_{IP}\mathbf{b}_{IP}'+\alpha_{Infl}\,\mathbf{b}_{Infl}\boldsymbol\phi_{Infl}'(\mathbf{R}_t\mathbf{R}_t')\boldsymbol\phi_{Infl}\mathbf{b}_{Infl}'+\beta_{IP}\,\mathbf{b}_{IP}\boldsymbol\phi_{IP}'\Sigma_t\boldsymbol\phi_{IP}\mathbf{b}_{IP}'+\beta_{Infl}\,\mathbf{b}_{Infl}\boldsymbol\phi_{Infl}'\Sigma_t\boldsymbol\phi_{Infl}\mathbf{b}_{Infl}'$$
$$=\boldsymbol\Omega^{**}+\mathbf{A}_{IP}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}_{IP}'+\mathbf{A}_{Infl}(\mathbf{R}_t\mathbf{R}_t')\mathbf{A}_{Infl}'+\mathbf{B}_{IP}\Sigma_t\mathbf{B}_{IP}'+\mathbf{B}_{Infl}\Sigma_t\mathbf{B}_{Infl}',$$

where $\boldsymbol\Omega^{**}\equiv\boldsymbol\Omega^{*}+\mathbf{b}_{IP}\mathbf{b}_{IP}'\omega_{IP}+\mathbf{b}_{Infl}\mathbf{b}_{Infl}'\omega_{Infl}$, $\mathbf{A}_{IP}\equiv\sqrt{\alpha_{IP}}\,\mathbf{b}_{IP}\boldsymbol\phi_{IP}'$, $\mathbf{A}_{Infl}\equiv\sqrt{\alpha_{Infl}}\,\mathbf{b}_{Infl}\boldsymbol\phi_{Infl}'$, $\mathbf{B}_{IP}\equiv\sqrt{\beta_{IP}}\,\mathbf{b}_{IP}\boldsymbol\phi_{IP}'$, and $\mathbf{B}_{Infl}\equiv\sqrt{\beta_{Infl}}\,\mathbf{b}_{Infl}\boldsymbol\phi_{Infl}'$. In conclusion, this shows that the 2-factor GARCH model is a special case of the BEKK parametrization, although subject to restrictions.
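Operationally, the model boils down to two univariate GARCH recursions plus a rank-two covariance assembly. The Matlab sketch below assumes the loadings `b_ip`, `b_infl`, the weights `phi_ip`, `phi_infl`, the matrix `Omega_star`, and the GARCH parameters have all been estimated elsewhere; every name is an illustrative placeholder.

```matlab
% Assembling the 2-factor GARCH covariance forecast (a minimal sketch).
r_ip   = R * phi_ip;                  % factor-mimicking portfolio returns
r_infl = R * phi_infl;
s_ip = var(r_ip); s_infl = var(r_infl);   % initialize the two variances
for t = 1:size(R, 1)
    s_ip   = w1 + a1 * r_ip(t)^2   + g1 * s_ip;     % univariate GARCH(1,1)
    s_infl = w2 + a2 * r_infl(t)^2 + g2 * s_infl;
    Sigma = Omega_star + (b_ip * b_ip') * s_ip + (b_infl * b_infl') * s_infl;
end
```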
6.6. Estimation and diagnostic checks of multivariate GARCH models

Multivariate GARCH estimation is performed using maximum likelihood to jointly estimate the parameters of the (conditional) mean and the variance equations in

$$\mathbf{R}_{t+1}=\boldsymbol\mu_{t+1}+\Sigma^{1/2}_{t+1}\mathbf{z}_{t+1}, \qquad \mathbf{z}_{t+1}\ \text{IID}\ \mathcal{N}(\mathbf{0},\mathbf{I}_n),$$

where all the parameters characterizing $\boldsymbol\mu_{t+1}$ and $\Sigma^{1/2}_{t+1}$ are collected in some vector $\boldsymbol\theta$. Note that although the GARCH parameters do not affect the conditional mean, the conditional mean parameters generally enter the conditional variance specification through the residuals, $\mathbf{R}_{t+1}-\boldsymbol\mu_{t+1}(\boldsymbol\theta)$. Assuming multivariate normality, the log-likelihood contributions (i.e., the log PDF values for each of the sample observations) for GARCH models are given by:36

$$l^{\mathcal N}_{t+1}(\mathbf{R}_{t+1};\boldsymbol\theta)\equiv-\frac{n}{2}\ln(2\pi)-\frac{1}{2}\ln\det\Sigma_{t+1}(\boldsymbol\theta)-\frac{1}{2}\left(\mathbf{R}_{t+1}-\boldsymbol\mu_{t+1}(\boldsymbol\theta)\right)'\Sigma^{-1}_{t+1}(\boldsymbol\theta)\left(\mathbf{R}_{t+1}-\boldsymbol\mu_{t+1}(\boldsymbol\theta)\right).$$
In the case of a Student t-distribution, the contributions are of the form:

$$l^{t}_{t+1}(\mathbf{R}_{t+1};\boldsymbol\theta)\equiv\ln\frac{\Gamma\left(\frac{\nu+n}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\left[\pi(\nu-2)\right]^{n/2}}-\frac{1}{2}\ln\det\Sigma_{t+1}(\boldsymbol\theta)-\frac{\nu+n}{2}\ln\left[1+\frac{\left(\mathbf{R}_{t+1}-\boldsymbol\mu_{t+1}(\boldsymbol\theta)\right)'\Sigma^{-1}_{t+1}(\boldsymbol\theta)\left(\mathbf{R}_{t+1}-\boldsymbol\mu_{t+1}(\boldsymbol\theta)\right)}{\nu-2}\right],$$
where $\nu > 2$ is the number of degrees of freedom. The asymptotic properties of ML (and QML) estimators in multivariate GARCH models are not yet firmly established and are difficult to derive from low-level assumptions. While consistency has been proven by Jeantheau (1998), asymptotic normality of the QMLE has not been established in general. However, applied researchers who use MGARCH models have generally proceeded as if asymptotic normality held in all cases.37

As usual, you may hesitate before introducing a specific parametric assumption on the distribution of the (standardized) residuals and may want to proceed instead under the weaker assumption that

$$\mathbf{R}_{t+1}=\boldsymbol\mu_{t+1}+\Sigma^{1/2}_{t+1}\mathbf{z}_{t+1}, \qquad \mathbf{z}_{t+1}\ \text{IID}\ \mathcal{D}(\mathbf{0},\mathbf{I}_n),$$

where $\mathcal D$ is some distribution that is not specified. In this case you will be able to obtain QML estimates using the same logic illustrated in chapter 4 in the case of univariate GARCH models. In sum, even though the conditional joint distribution of the shocks $\mathbf{z}_{t+1}$ is not normal (i.e., $\mathbf{z}_{t+1}$ IID $\mathcal D(\mathbf{0},\mathbf{I}_n)$ and $\mathcal D$ does not reduce to a $\mathcal N$), under some conditions an application of MLE based on $\mathbf{z}_{t+1}$ IID $\mathcal N(\mathbf{0},\mathbf{I}_n)$ will yield estimators of the mean and variance parameters that converge to the true parameters as the sample gets infinitely large, i.e., that are consistent. What are the conditions just mentioned? You will need that:
- The conditional variance function, $\Sigma_{t+1}$, seen as a function of the information at time $t$, $\mathcal F_t$, be correctly specified.
- The conditional mean function, $\boldsymbol\mu_{t+1}$, seen as a function of the information at time $t$, $\mathcal F_t$, be correctly specified.

Because estimating M-GARCH models is time-consuming, it is desirable to check ex ante whether the data present evidence of multivariate (G)ARCH effects. This is done both on the individual series, by testing whether squared returns are serially correlated for each individual series, and by testing whether squared returns appear to display any significant cross-correlations, $Corr[R^2_{i,t}, R^2_{j,t-\tau}]\neq 0$ for $i\neq j$ and $\tau\neq 0$. See chapter 4 for examples of how this may be done and how one tests for the significance of (cross-) serial correlations.
36 The conditional mean and covariance functions are denoted as $\boldsymbol\mu_{t+1}(\boldsymbol\theta)$ and $\Sigma_{t+1}(\boldsymbol\theta)$ to emphasize their dependence on the parameter vector $\boldsymbol\theta$.
37 Gourieroux (1997, section 6.3) proves it for a general formulation using high-level assumptions. Comte and Lieberman (2003) prove it for the BEKK formulation.
Ex post, it is also of crucial importance to check the adequacy of the M-GARCH specification. However, few tests are specific to multivariate models. Univariate tests applied independently to each series of standardized residuals remain very common, but are not completely appropriate. For instance, as seen in chapter 4, it is typical, when $\mathbf{z}_{t+1}$ IID $\mathcal N(\mathbf{0},\mathbf{I}_n)$ has been assumed, to apply standard univariate tests of normality to the standardized model residuals defined as $\hat z_{i,t+1}\equiv R_{i,t+1}/\hat\sigma_{i,t+1}$, where $\hat\sigma_{i,t+1}$ denotes the time series of filtered standard deviations derived from the estimated volatility model, $\hat\sigma^2_{i,t+1}=\mathbf{e}_i'\hat\Sigma_{t+1}\mathbf{e}_i$, $i = 1,2,\ldots,n$ (i.e., the $i$th element on the main diagonal of $\hat\Sigma_{t+1}$). Here, we are clearly exploiting the fact that $\mathbf{z}_{t+1}$ IID $\mathcal N(\mathbf{0},\mathbf{I}_n)$ implies that each of the elements $z_{i,t+1}$ must have a marginal normal distribution.38 As you know, one commonly used test is Jarque and Bera's, which measures departures from normality in terms of the skewness and kurtosis of standardized residuals. A second method exploits the fact that, even when normality has not been assumed (this is the case of QMLE), so that the assumed model for returns is $\mathbf{z}_{t+1}$ IID $\mathcal D(\mathbf{0},\mathbf{I}_n)$ with $\mathcal D(\mathbf{0},\mathbf{I}_n)$ different from $\mathcal N(\mathbf{0},\mathbf{I}_n)$, a correctly specified model anyway implies $\mathbf{z}_{t+1}$ IID.
As we know, independence implies that $\widehat{Corr}[f(z_{i,t}),\,f(z_{i,t-\tau})]\simeq 0$ for all $\tau\geq 1$, where

$$\widehat{Corr}[f(z_{i,t}),f(z_{i,t-\tau})]\equiv\frac{\sum_{t=\tau+1}^{T}\left(f(z_{i,t})-\overline{f(z_i)}\right)\left(f(z_{i,t-\tau})-\overline{f(z_i)}\right)}{\sum_{t=1}^{T}\left(f(z_{i,t})-\overline{f(z_i)}\right)^2}$$

and $f(\cdot)$ is any (measurable) function. Because we are testing the correct specification of a conditional volatility model, it is typical to set $f(z)=z^2$, i.e., we test whether the squared standardized residuals, $\hat z^2_{i,t+1}\equiv R^2_{i,t+1}/\hat\sigma^2_{i,t+1}$, display any systematic autocorrelation patterns.39
Although univariate tests can provide some guidance, contemporaneous correlation of disturbances entails that statistics from individual equations are not independent. Therefore, truly multivariate tests have been developed and are routinely applied in practice. Recalling the framework $\mathbf{z}_{t+1}$ IID $\mathcal D(\mathbf{0},\mathbf{I}_n)$, it is typical to also test ex-post cross-serial correlations of functions of standardized residuals, e.g., (i) $Corr[z_{i,t},z_{j,t-\tau}]=0$ for $i\neq j$, (ii) $Corr[z^2_{i,t},z^2_{j,t-\tau}]=0$ for $i\neq j$, (iii) $Corr[z_{i,t},z^2_{j,t-\tau}]=0$ for $i,j=1,2,\ldots,n$, and (iv) $Corr[z_{i,t}z_{j,t},z_{i,t-\tau}z_{j,t-\tau}]=0$ for $i\neq j$. These zero cross-serial correlations are tested as usual using sample correlograms (here, cross-correlograms involving pairs of series) and portmanteau Box-Pierce tests, as seen in chapter 4. Moreover, a generalization of the
38 As you will recall from your statistics courses, the opposite does not hold: $z_{i,t+1}\sim\mathcal N(0,1)$ for each $i$ does not imply $\mathbf{z}_{t+1}$ IID $\mathcal N(\mathbf{0},\mathbf{I}_n)$.
39 We face one additional problem when $\mathbf{z}_{t+1}$ IID is tested by sequentially applying tests to each of the resulting time series of standardized residuals: because $\hat{\mathbf{z}}_{t+1}$ derives from a multivariate model, after the first test has been implemented, the subsequent tests may be affected by the previous inferential methods employed.
standard Box-Pierce/Ljung-Box test,

$$LB(\tau)\equiv T(T+2)\sum_{j=1}^{\tau}\frac{\hat\rho^2_j}{T-j},$$

exists, such as Hosking's (1980) (here HM stands for Hosking's multivariate test):

$$HM(\tau)\equiv T^2\sum_{j=1}^{\tau}\frac{1}{T-j}\,tr\left[\hat\Gamma_Y^{-1}(0)\hat\Gamma_Y(j)\hat\Gamma_Y^{-1}(0)\hat\Gamma_Y'(j)\right]\sim\chi^2_{(\tau-\max\{p,q\})n^2},$$

where $\mathbf{Y}_t\equiv vech(\mathbf{z}_t\mathbf{z}_t')$ and $\hat\Gamma_Y(j)$ is the sample autocovariance matrix of order $j$ for the series $\mathbf{Y}_t$.
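Since all the criteria in this subsection are sums of per-observation contributions, it may help to see one contribution in code. The following Matlab fragment evaluates the Gaussian contribution defined above; the variable names are illustrative.

```matlab
% Gaussian log-likelihood contribution for one observation (a minimal sketch);
% r, mu are n-by-1 and Sig is the model-implied n-by-n covariance for t+1.
e  = r - mu;                                     % residual vector
n  = numel(e);
ll = -0.5*n*log(2*pi) - 0.5*log(det(Sig)) - 0.5*(e' * (Sig \ e));
% summing ll over t yields the sample log-likelihood handed to the optimizer
```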
6.7. One easily implemented multivariate model: PC GARCH

If one leaves the DCC model of Section 5 aside, it seems legitimate to ask whether multivariate GARCH models only give occasions for pain and sorrow. The simple answer is that, unless one makes smart attempts at getting multivariate estimates by only using univariate GARCH estimates (as DCC and CCC models accomplish), this tends to be the case, at least when $n$ exceeds low values such as 3 or 4. In fact, the literature features several such attempts, e.g., the orthogonal GARCH and the principal component (PC) GARCH.40 In a PC GARCH model, the observed data are assumed to be generated by an orthogonal transformation of $n$ (or a smaller number of) univariate GARCH processes. The matrix of the linear transformation is the orthogonal matrix (or a selection) of eigenvectors of the population unconditional covariance matrix of the standardized returns.41 In a PC GARCH, estimation is organized in 7 steps. The input is a $T\times n$ matrix ($\mathbf{R}$) of returns, with rows representing points in time and columns representing assets. The steps are as follows:
1. Estimate a univariate GARCH model for each of the $n$ assets or portfolios in $\mathbf{R}$, i.e., for the columns of the matrix; the GARCH models for each of the assets could be different, as in Section 5; the parameters are estimated independently by ML or QML.
2. Standardize the residuals with the estimated variance for each asset, obtaining the $z_{i,t}$ to be collected in a $T\times n$ matrix $\mathbf{Z}$.
40 Principal component analysis models the variance structure of a set of observed variables using linear combinations of the variables. While we generally require as many PCs as variables to reproduce the original variance structure, we usually hope to account for most of the original variability using a relatively small number of components (data reduction). The PCs of a set of variables are obtained by computing the eigenvalue decomposition of the sample variance matrix: $\hat\Sigma=\mathbf{L}\Lambda\mathbf{L}'$, where $\mathbf{L}$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix with eigenvalues on the diagonal. The first PC is the unit-length linear combination of the original variables with maximum variance; subsequent PCs maximize variance among unit-length linear combinations that are orthogonal to the previous PCs. PCs may be computed starting from either covariance matrices or correlation matrices; correlations may also be computed in nonparametric fashion (e.g., Spearman rank-order or Kendall's tau measures).
41 Of course, PC/orthogonal models can also be considered as factor models, where the factors are univariate GARCH-type processes. Therefore PC/orthogonal models are nested in the BEKK family. In particular, the PC/orthogonal-GARCH model is covariance-stationary if the univariate GARCH processes built from the principal components are themselves stationary.
3. Compute the principal components of the $T\times n$ matrix of standardized residuals $\mathbf{Z}$, obtaining the $T\times n$ PC matrix $\mathbf{P}$, $\mathbf{P} = \mathbf{Z}\mathbf{L}$, where $\mathbf{L}$ is the matrix of loadings of the vectors of standardized returns on each of the eigenvectors.
4. Estimate a univariate GARCH model for each of the $n$ principal components (that is, for each column of $\mathbf{P}$); a GARCH(1,1) is generally recommended.
5. Use the loading matrix $\mathbf{L}$ to rotate the PC variances back to variable space; at each point in time compute: $\mathbf{C}_t = \mathbf{L}\mathbf{D}_t\mathbf{L}'$, where $\mathbf{D}_t$ is the diagonal matrix of the estimated variances of the PCs at time $t$. At this point, the matrix $\mathbf{C}_t$ is an approximate correlation matrix for the original variables at time $t$; however, there is no guarantee that the elements on the diagonal of $\mathbf{C}_t$ are equal to 1.
6. Standardize $\mathbf{C}_t$ so that it is a correlation matrix with all of its diagonal elements equal to 1; call the result $Corr_t$; practically, this step is simply performed by using any software to compute the correlation matrix implied by $\mathbf{C}_t$.
7. At this point, we scale $Corr_t$ with the estimated variances of the original GARCH models, collected in a diagonal matrix $\mathbf{D}^{1/2}_t$, to get the covariance matrix: $\Sigma_t = \mathbf{D}^{1/2}_t\,Corr_t\,\mathbf{D}^{1/2}_t$ (a Matlab sketch of steps 3-7 follows this list).
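The following minimal sketch assumes `Z` ($T\times n$) collects the standardized residuals from steps 1-2, `sig` the fitted standard deviations from step 1, and `garchvar()` stands in for whatever univariate GARCH filter one prefers; it is not a built-in function, and all names are illustrative.

```matlab
% PC GARCH, steps 3-7 (a minimal sketch).
[L, ~] = eig(cov(Z));                 % eigenvectors of the sample covariance
P = Z * L;                            % principal components (step 3)
Vp = zeros(size(P));
for k = 1:size(P, 2)
    Vp(:, k) = garchvar(P(:, k));     % fitted PC variances (step 4)
end
for t = 1:size(Z, 1)
    C = L * diag(Vp(t, :)) * L';                              % step 5
    s = 1 ./ sqrt(diag(C));
    Corr = diag(s) * C * diag(s);                             % step 6
    Sigma = diag(sig(t, :)) * Corr * diag(sig(t, :));         % step 7
end
```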
PC GARCH, although there are no compelling reasons why it should work or accurately forecast variances and covariances, can handle any practically interesting value of $n$: computationally, a problem would need to have several thousand variables/assets before computing time becomes a serious issue. In the case of the PC GARCH, experiments have been performed to see whether skipping the GARCH estimates for the smallest PCs (those that explain a smaller percentage of total variance) may be a good thing, in the sense that not only is the fitting of the time variation of variances and covariances not seriously impaired, but there is actually evidence that forecasting accuracy may benefit. Therefore, in practice PCs with very small contributions to variance may be skipped. For instance, Alexander (2001, section 7.4.3) illustrates the use of the PC GARCH model (which she also calls orthogonal GARCH). She emphasizes that using a small number of principal components compared to the number of assets is the strength of the approach (in one example, she fixes $K$ at 2 for $n = 12$ assets). However, note that the conditional variance matrix then has reduced rank (if $K < n$), which may be a problem for applications and for diagnostic tests that depend on the inverse of $\Sigma_t$.
Appendix A: A Quick Review of Basic Estimation Methods

A1. Where the OLS World Ends...

Consider two time series $x_{1:T}=\{x_1,x_2,\ldots,x_T\}$ and $y_{1:T}=\{y_1,y_2,\ldots,y_T\}$. At this stage of your studies in statistics and econometrics you are familiar with linear regression models:

$$y_t=\beta_0+\beta_1x_t+\epsilon_t.$$

In order to estimate this model by ordinary least squares (OLS), several assumptions need to be in place. Typically, one resorts to the so-called weak OLS hypotheses, i.e. ($\forall t,s\in\{1,2,\ldots,T\}$):
1. $E(\epsilon_t)=0$;
2. $E(\epsilon_tx_s)=0$, which can also be written $E(\epsilon_t|x_s)=E(\epsilon_t)$ (deterministic regressors);
3. $Var(\epsilon_t|x_t)=Var(\epsilon_t)=\sigma^2$ (homoskedasticity);
4. $Cov(\epsilon_t,\epsilon_s)=0$ for $t\neq s$ (no autocorrelation in residuals).

Hypotheses 3 and 4 can be summarized by saying that $Var(\boldsymbol\epsilon_{1:T})=\mathbf{I}_T\sigma^2$. The strong OLS hypotheses set implies the four conditions above plus an assumption on the distribution of residuals: $\epsilon_t\sim\mathcal N(0,\sigma^2)$, $\forall t=1,\ldots,T$.
However, data from many real-life problems often fail to fulfill the OLS conditions, even in their weak form. In fact:
- Hypothesis 1 is trivially verified if you just add an intercept to your model.
- However, Hypothesis 2 seems considerably more problematic. For instance, suppose that you observe your independent variable with a random, white-noise measurement error: $\tilde x_t = x_t + u_t$. In this case your regressor is no longer deterministic, i.e., fixed in repeated samples. Even if you know that $E(\epsilon_tx_s)=0$, what about $E(\epsilon_t\tilde x_s)$? There is a wide literature dealing with stochastic regressors, which uses Two-Stage Least Squares (2SLS) and Instrumental Variables.42
- The potential issues with Hypotheses 3 and 4 are, on the contrary, particularly relevant in finance. As you now know so well, for many financial time series the conditional variance $Var(\epsilon_{t+1}|\mathcal F_t)$ (where $\mathcal F_t$ is an information set or an information structure determined by the financial application at hand) is not constant over time. In addition, when fitting OLS regression models to financial time series, the null of $Cov(\epsilon_t,\epsilon_s)=0$ is often rejected. We shall see below that if Hypotheses 3 and 4 are violated, the OLS estimator is no longer the best linear unbiased estimator (BLUE), even if it is still consistent. In fact, there is another unbiased estimator whose variance is smaller than that of OLS: it is the Generalized Least Squares (GLS) estimator described below.
42 However, we will not discuss these estimators in this Appendix because these problems are not specific to finance applications; but you should be aware of the fact that Hypothesis 2 may often be problematic in applications.
In addition, there is a further point that deserves to be discussed in depth. Consider a matrix of time series $\mathbf{Y}=[\mathbf{y}_1,\mathbf{y}_2,\ldots,\mathbf{y}_m]'$ (also called a panel) and a set of regressors $x_{1:T}$, $w_{1:T}$, $z_{1:T}$, ... How would you estimate the relationship between dependent variables and regressors in this case? A first way to do that (since you are familiar with OLS) is to run a separate OLS regression for each time series $\mathbf{y}_i$, $i=1,\ldots,m$. In fact, you know that the OLS estimator is BLUE among the estimators that are a linear function of $\mathbf{y}_i$. However, your problem involves more than just one dependent variable. Is it possible to build an unbiased estimator which is a function of both $\mathbf{y}_1$, $\mathbf{y}_2$, ... and is more efficient than OLS? The answer is yes, and we will talk more extensively later about models with multiple equations. In the following subsections we will quickly review some estimators used in the econometrics literature, which you might find helpful when dealing with problems where OLS is no longer a feasible (or at least, an efficient) solution.
A2. Generalized (and Feasible Generalized) Least Squares

As already noted, in the OLS framework we assume that $Var(\boldsymbol\epsilon_{1:T})=Var(\boldsymbol\epsilon)=\mathbf{I}_T\sigma^2$. Let's now abandon this hypothesis and say that $Var(\boldsymbol\epsilon)=\boldsymbol\Omega$, where $\boldsymbol\Omega$ is any valid (i.e., symmetric and positive definite) variance-covariance matrix. If the matrix $\boldsymbol\Omega$ is known (we will come back to this point later), it is possible to derive an estimator for the regression coefficients in a way similar to what we normally do under OLS. As you shall recall, if the covariance matrix is positive definite (notice: this does not hold in case $\boldsymbol\Omega$ is only positive semi-definite), you can always invert it and write $\boldsymbol\Omega^{-1}=\mathbf{D}\mathbf{D}'$. You can then write your standard least squares problem in matrix form as:

$$\mathbf{Y}=\mathbf{X}\boldsymbol\beta+\boldsymbol\epsilon,$$

where $\mathbf{Y}$ is the $T\times 1$ vector collecting observations of the dependent variable, $\mathbf{X}$ is a $T\times k$ matrix of the regressors, $\boldsymbol\beta$ is a $k\times 1$ vector of coefficients to be estimated, and $\boldsymbol\epsilon$ is a $T\times 1$ vector of residuals.43 If we pre-multiply the regression model by the matrix $\mathbf{D}'$ we get:

$$\mathbf{D}'\mathbf{Y}=\mathbf{D}'\mathbf{X}\boldsymbol\beta+\mathbf{D}'\boldsymbol\epsilon \qquad\text{or}\qquad \tilde{\mathbf{Y}}=\tilde{\mathbf{X}}\boldsymbol\beta+\tilde{\boldsymbol\epsilon},$$

where $\tilde{\mathbf{Y}}\equiv\mathbf{D}'\mathbf{Y}$, $\tilde{\mathbf{X}}\equiv\mathbf{D}'\mathbf{X}$, and $\tilde{\boldsymbol\epsilon}\equiv\mathbf{D}'\boldsymbol\epsilon$. Note that $Var(\tilde{\boldsymbol\epsilon})=E(\tilde{\boldsymbol\epsilon}\tilde{\boldsymbol\epsilon}')=\mathbf{D}'E(\boldsymbol\epsilon\boldsymbol\epsilon')\mathbf{D}=\mathbf{D}'(\mathbf{D}\mathbf{D}')^{-1}\mathbf{D}=\mathbf{D}'(\mathbf{D}')^{-1}\mathbf{D}^{-1}\mathbf{D}=\mathbf{I}_T$, because $\boldsymbol\Omega^{-1}=\mathbf{D}\mathbf{D}'$ implies that $E(\boldsymbol\epsilon\boldsymbol\epsilon')=\boldsymbol\Omega=(\mathbf{D}\mathbf{D}')^{-1}$.44 Coefficients can now be estimated by OLS, since any heteroskedasticity has been removed. The estimator will be:

$$\hat{\boldsymbol\beta}_{GLS}=(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{Y}}=(\mathbf{X}'\mathbf{D}\mathbf{D}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{D}\mathbf{D}'\mathbf{Y}=(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{Y}.$$
43 If needed, $\mathbf{X}$ may be expanded to also include a vector of ones, which will then represent the constant of the regression model, to be estimated as the corresponding element of the $k\times 1$ vector of coefficients $\boldsymbol\beta$.
This estimator is unbiased:

$$E(\hat{\boldsymbol\beta}_{GLS})=(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega^{-1}E(\mathbf{X}\boldsymbol\beta+\boldsymbol\epsilon)=(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X}\boldsymbol\beta+(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega^{-1}E(\boldsymbol\epsilon)=\boldsymbol\beta.$$

In the same way we can derive the variance-covariance matrix of the GLS estimator:

$$Var(\hat{\boldsymbol\beta}_{GLS})=(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'E(\tilde{\boldsymbol\epsilon}\tilde{\boldsymbol\epsilon}')\tilde{\mathbf{X}}(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}=(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\mathbf{I}_T\tilde{\mathbf{X}}(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}=(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}=(\mathbf{X}'\mathbf{D}\mathbf{D}'\mathbf{X})^{-1}=(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})^{-1}.$$
Note that if we were not taking heteroskedasticity into consideration and we were using OLS, our estimator would have been inefficient, but still unbiased. In fact,

$$\hat{\boldsymbol\beta}_{OLS}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

implies that $E(\hat{\boldsymbol\beta}_{OLS})=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{X}\boldsymbol\beta+\boldsymbol\epsilon)=\boldsymbol\beta$. However,

$$Var(\hat{\boldsymbol\beta}_{OLS})=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\boldsymbol\epsilon\boldsymbol\epsilon')\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1},$$

which can be proven to always exceed (in the positive semi-definite sense) $Var(\hat{\boldsymbol\beta}_{GLS})=(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})^{-1}$. If $\boldsymbol\Omega=\mathbf{I}_T\sigma^2$, the two estimators are identical and so are the two variances: $Var(\hat{\boldsymbol\beta}_{OLS})=Var(\hat{\boldsymbol\beta}_{GLS})=\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. When $\boldsymbol\Omega\neq\mathbf{I}_T\sigma^2$, we can compare them through the product

$$Var(\hat{\boldsymbol\beta}_{OLS})\left[Var(\hat{\boldsymbol\beta}_{GLS})\right]^{-1}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\boldsymbol\Omega^{-1}\mathbf{X})$$

and, writing $\boldsymbol\Omega=(\mathbf{D}\mathbf{D}')^{-1}$ and exploiting the fact that $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is an idempotent matrix, one can show that this product always exceeds the identity matrix in the positive semi-definite sense. This is why the OLS estimator is less efficient than the GLS estimator.
44 $Var(\tilde{\boldsymbol\epsilon})=E(\tilde{\boldsymbol\epsilon}\tilde{\boldsymbol\epsilon}')$ because $E(\tilde{\boldsymbol\epsilon})=E(\mathbf{D}'\boldsymbol\epsilon)=\mathbf{D}'E(\boldsymbol\epsilon)=\mathbf{0}$.
We now have to make an important remark. When estimating GLS in actual applications, the structure of $\boldsymbol\Omega$ will never be known. You will rather have at your disposal some estimator $\hat{\boldsymbol\Omega}$ which, assuming you want to focus only on consistent estimators, as you should, will converge to the true covariance matrix as the sample becomes large (for those of you who know more, we can say that $\text{plim}\,\hat{\boldsymbol\Omega}=\boldsymbol\Omega$). Our estimator will therefore be the Feasible Generalized Least Squares (FGLS) estimator:

$$\hat{\boldsymbol\beta}_{FGLS}=(\mathbf{X}'\hat{\boldsymbol\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\boldsymbol\Omega}^{-1}\mathbf{Y}.$$

The specification of a peculiar structure/model for the variance-covariance matrix therefore becomes a problem of key importance. In a finite sample $\hat{\boldsymbol\Omega}$ is different from $\boldsymbol\Omega$. This might cause a bias in the FGLS estimator, since we are not ex ante guaranteed (especially if the sample is very small) that $E[(\mathbf{X}'\hat{\boldsymbol\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\boldsymbol\Omega}^{-1}\boldsymbol\epsilon]=\mathbf{0}$.
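A compact Matlab sketch of the FGLS logic follows; the diagonal model for $\hat{\boldsymbol\Omega}$ is a crude illustrative choice on our part, not a recommendation.

```matlab
% A feasible GLS pass (a minimal sketch); X is T-by-k, Y is T-by-1.
b_ols = (X' * X) \ (X' * Y);              % first-pass OLS
e = Y - X * b_ols;                        % residuals
Om = diag(e.^2);                          % crude diagonal Omega estimate
b_fgls = (X' / Om * X) \ (X' / Om * Y);   % FGLS estimator
```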

A3. Completing the Picture: the (Generalized) Method of Moments

The estimators that we have reviewed so far (OLS and GLS) and the ones that we will consider in the following pages can be derived using Maximum Likelihood Estimation (MLE), which is based on a (so-called parametric) assumption on the density of the residuals. In the case of OLS, this assumption is part of the strong OLS hypotheses set and can be written as (see also A1): $\epsilon_t\sim\mathcal N(0,\sigma^2)$. Given this hypothesis, it is possible to estimate parameters by maximizing the natural logarithm of the likelihood function. In the OLS case:

$$\ln L(\mathbf{Y};\boldsymbol\beta,\sigma^2)=-\frac{T}{2}\ln 2\pi-\frac{T}{2}\ln\sigma^2-\frac{1}{2\sigma^2}(\mathbf{Y}-\mathbf{X}\boldsymbol\beta)'(\mathbf{Y}-\mathbf{X}\boldsymbol\beta).$$

However, OLS and GLS estimators do not require any parametric assumption on residuals; they can just be derived using orthogonality conditions. Parametric assumptions might be used in a second step, in order to determine the probabilistic properties of the estimators. This is why they are called semiparametric estimators. The underlying idea is that the sample statistics of a given sample converge to the true moments of the population as the size of the sample gets larger and larger. Sample statistics can then be used to impose conditions in order to estimate regression parameters. As the size of the sample becomes large, parameter estimates will converge to their true values. OLS and GLS can therefore be included in a wider group of estimation methods, which is called the Generalized Method of Moments (GMM).
To formalize the problem, suppose that we want to identify $p$ parameters $\theta_j$ ($j = 1,\ldots,p$). We have to define sample moments

$$\hat g_i(\theta_1,\theta_2,\ldots,\theta_p)=\frac{1}{T}\sum_{t=1}^{T}g_i(y_t;\theta_1,\theta_2,\ldots,\theta_p), \qquad i=1,\ldots,q,$$

knowing (or assuming) that $\text{plim}_{T\to\infty}\,\hat g_i=g_i$, where $g_i=E[g_i(y_t;\theta_1,\theta_2,\ldots,\theta_p)]$ is the true but unknown population moment. Then parameter estimates may be obtained by imposing the set of conditions:

$$\begin{cases}\hat m_1-\hat g_1(\theta_1,\theta_2,\ldots,\theta_p)=0\\ \hat m_2-\hat g_2(\theta_1,\theta_2,\ldots,\theta_p)=0\\ \qquad\vdots\\ \hat m_q-\hat g_q(\theta_1,\theta_2,\ldots,\theta_p)=0\end{cases} \qquad (21)$$

where $\hat{\mu}_i$ is the sample value of a statistic of interest, $i = 1, ..., N$. Basically, from a mathematical point of view, all that you are doing is to look for the set of parameter values $\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_K$ which jointly minimizes the distance between the conditions imposed on the parameters and the sample moments. Notice that the number of moment conditions $N$ may exceed the number of parameters to be estimated, $K$. When $N = K$, we speak of just-identified GMM or simply method of moments estimates: in this case you impose the minimal number of conditions for the system (21) to be solved (assuming a solution exists) to find $\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_K$. This is the case we have dealt with in chapter 5, when t-Student shocks were introduced. Often in practice, one elects to set $N > K$, and in this case one additional problem is how to weight the different moment conditions, because from a mathematical perspective it is clear that only by chance will values $\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_K$ exist such that all the conditions/equations in (21) find a solution. Although this is a rather advanced topic to be developed in the appropriate context, one solution, which in fact also delivers a number of optimal properties, consists of estimating $\hat{\boldsymbol\theta}$ by solving:
$$\min_{\boldsymbol\theta}\; [\mathbf{m} - \mathbf{f}(\boldsymbol\theta)]'\,\hat{\mathbf{S}}^{-1}\,[\mathbf{m} - \mathbf{f}(\boldsymbol\theta)] \qquad (22)$$
where $\mathbf{m} \equiv [\hat{\mu}_1\; \hat{\mu}_2\; ...\; \hat{\mu}_N]'$,
$$\mathbf{f}(\boldsymbol\theta) \equiv \left[\frac{1}{T}\sum_{t=1}^{T} f_1(x_t; \boldsymbol\theta)\;\;\; \frac{1}{T}\sum_{t=1}^{T} f_2(x_t; \boldsymbol\theta)\;\;\; ...\;\;\; \frac{1}{T}\sum_{t=1}^{T} f_N(x_t; \boldsymbol\theta)\right]',$$
$$\hat{\mathbf{S}} = \widehat{\text{Var}}[\mathbf{m} - \mathbf{f}(\tilde{\boldsymbol\theta})],$$
and $\tilde{\boldsymbol\theta}$ is a first-step estimator that simply minimizes the quadratic form $[\mathbf{m} - \mathbf{f}(\boldsymbol\theta)]'[\mathbf{m} - \mathbf{f}(\boldsymbol\theta)]$. Basically, (22) means that you pick $\hat{\boldsymbol\theta}$ to minimize a set of moment conditions weighted by the inverse of their covariance matrix, so that the moment conditions that are estimated more precisely in the data sample will receive a higher weight. Clearly, a just-identified GMM estimation problem corresponds to (22) when $\hat{\mathbf{S}} = \mathbf{I}$.
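As a hedged illustration of (22) with $\hat{\mathbf{S}} = \mathbf{I}$ (the simulated data, the choice of the three moment conditions, and the positivity device $\sigma^2 = e^{\theta_2}$ are all assumptions of the example), the following Matlab sketch estimates the mean and variance of a normal sample from $N = 3$ moment conditions and $K = 2$ parameters:
% Over-identified GMM sketch: N = 3 conditions, K = 2 parameters, first-step S = I.
T = 500; x = 1 + 2*randn(T,1);
g = @(th) [mean(x) - th(1); ...                          % E[x] = mu
           mean((x - th(1)).^2) - exp(th(2)); ...        % E[(x-mu)^2] = sigma2 = exp(th(2))
           mean(abs(x - th(1))) - sqrt(2*exp(th(2))/pi)];% E|x-mu| = sqrt(2*sigma2/pi) under normality
Q = @(th) g(th)'*g(th);                                  % quadratic form (22) with S = I
th_hat = fminsearch(Q, [0; 0]);
mu_hat = th_hat(1); sigma2_hat = exp(th_hat(2));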
How can we write an OLS or a GLS estimator as a GMM one? Here recall that one of the Weak OLS Hypotheses stated that $E(\varepsilon_t x_{kt}) = 0$. Thus, the GMM condition for estimating $\boldsymbol\beta$ is:
$$E[\boldsymbol\varepsilon'\mathbf{X}] = E[(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'\mathbf{X}] = \mathbf{0}'.$$
The same happens for GLS, with the only difference that we have to take into account the structure of the variance-covariance matrix:
$$E[(\tilde{\mathbf{Y}} - \tilde{\mathbf{X}}\boldsymbol\beta)'\tilde{\mathbf{X}}] = E[(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'\Omega^{-1}\mathbf{X}] = \mathbf{0}'.$$

A4. Dealing with More than One Equation: Seemingly Unrelated Regressions (SUR)
Let's now consider a situation where there are two or more dependent variables, say $M \geq 2$. The regression model is
$$\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon \qquad \Longleftrightarrow \qquad
\begin{bmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \vdots \\ \mathbf{Y}_M \end{bmatrix} =
\begin{bmatrix} \mathbf{X}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}_M \end{bmatrix}
\begin{bmatrix} \boldsymbol\beta_1 \\ \boldsymbol\beta_2 \\ \vdots \\ \boldsymbol\beta_M \end{bmatrix} +
\begin{bmatrix} \boldsymbol\varepsilon_1 \\ \boldsymbol\varepsilon_2 \\ \vdots \\ \boldsymbol\varepsilon_M \end{bmatrix},$$
where $\boldsymbol\beta_i$ is the vector of coefficients for the $i$th equation, $i = 1, ..., M$. The covariance matrix of the residuals is:

$$\text{Var}(\boldsymbol\varepsilon) = E\begin{bmatrix} \boldsymbol\varepsilon_1\boldsymbol\varepsilon_1' & \boldsymbol\varepsilon_1\boldsymbol\varepsilon_2' & \cdots & \boldsymbol\varepsilon_1\boldsymbol\varepsilon_M' \\ \boldsymbol\varepsilon_2\boldsymbol\varepsilon_1' & \boldsymbol\varepsilon_2\boldsymbol\varepsilon_2' & \cdots & \boldsymbol\varepsilon_2\boldsymbol\varepsilon_M' \\ \vdots & \vdots & \ddots & \vdots \\ \boldsymbol\varepsilon_M\boldsymbol\varepsilon_1' & \boldsymbol\varepsilon_M\boldsymbol\varepsilon_2' & \cdots & \boldsymbol\varepsilon_M\boldsymbol\varepsilon_M' \end{bmatrix} =
\begin{bmatrix} \mathbf{I}\sigma_{11} & \mathbf{I}\sigma_{12} & \cdots & \mathbf{I}\sigma_{1M} \\ \mathbf{I}\sigma_{21} & \mathbf{I}\sigma_{22} & \cdots & \mathbf{I}\sigma_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{I}\sigma_{M1} & \mathbf{I}\sigma_{M2} & \cdots & \mathbf{I}\sigma_{MM} \end{bmatrix} = \Omega,$$
where $\sigma_{ij} \equiv \text{Cov}(\varepsilon_{it}, \varepsilon_{jt})$ is the covariance between the residuals of regressions $i$ and $j$. As we said early on, a first approach to the problem would be to estimate a separate OLS model for each regression equation. However, as we have seen when introducing GLS methods, this might not be the most efficient solution. If we look at the model above, the equations for the different dependent variables are apparently unrelated as far as the conditional mean is concerned. However, relationships exist through the covariance matrix $\Omega$. Working through an algebra very similar to the one shown in Section 2, it is possible to derive the Seemingly Unrelated Regressions estimator:
$$\hat{\boldsymbol\beta}_{SUR} = (\mathbf{X}'\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\Omega^{-1}\mathbf{Y}.$$
This estimator is BLUE with variance
$$\text{Var}(\hat{\boldsymbol\beta}_{SUR}) = (\mathbf{X}'\Omega^{-1}\mathbf{X})^{-1}.$$
Also for the SUR estimator we have to bear in mind that usually we do not know the matrix $\Omega$. This may pose problems in small samples. Interestingly, the OLS equation-by-equation estimator and the SUR estimator are equivalent in only two very special situations:
• $\Omega$ is a diagonal matrix, or in other words $\sigma_{ij} = 0$ if $i \neq j$;
• the set of regressors is the same for all the equations: $\mathbf{X}_i = \mathbf{X}$, $i = 1, 2, ..., M$.

When you are dealing with a single regression model ($M = 1$), the key issue is obviously only the choice of the right set of regressors. However, as you can understand after our discussion in this section, in the case of a system of regression models ($M \geq 2$), there is another choice you have to make: you have to decide whether to model a time series on its own or along with other time series. If you choose the right set of time series and regressors, you will be able to exploit cross-sectional information and improve the efficiency of your estimates by obtaining a GLS-type estimator.
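A minimal Matlab sketch of the feasible version of this estimator, for $M = 2$ equations with simulated data (all names and numbers below are assumptions of the illustration): equation-by-equation OLS residuals deliver $\hat{\Sigma}$, and $\hat{\Omega}^{-1} = \hat{\Sigma}^{-1} \otimes \mathbf{I}_T$ then enters the GLS formula.
T = 150;
X1 = [ones(T,1) randn(T,1)]; X2 = [ones(T,1) randn(T,2)];
E  = randn(T,2)*chol([1 .6; .6 1]);                 % cross-equation correlated errors
Y1 = X1*[1; .5] + E(:,1); Y2 = X2*[0; 1; -.5] + E(:,2);
u1 = Y1 - X1*((X1'*X1)\(X1'*Y1));                   % OLS residuals, equation 1
u2 = Y2 - X2*((X2'*X2)\(X2'*Y2));                   % OLS residuals, equation 2
Sig = cov([u1 u2], 1);                              % 2x2 residual covariance estimate
X  = blkdiag(X1, X2); Y = [Y1; Y2];                 % stacked system
Oi = kron(inv(Sig), eye(T));                        % Omega^{-1} = Sigma^{-1} kron I_T
b_sur = (X'*Oi*X)\(X'*Oi*Y);                        % feasible SUR estimator
Because X2 contains a regressor that X1 does not, the two special cases above do not apply and b_sur generally differs from equation-by-equation OLS.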
This is the appropriate point at which we can introduce some ideas on the estimation of models with fixed and random effects.45 These models are useful when you are analyzing a population (e.g., the set of EU countries) and you want to check whether there are any structural differences among units in the population (e.g., countries) conditionally on a common set of explanatory variables (GDP, current account, inflation, etc.). Consider the regression of $M$ different vectors of dependent variables, $\mathbf{Y}_1$, $\mathbf{Y}_2$, ..., $\mathbf{Y}_M$, on a matrix of independent variables $\mathbf{X}$. We can write:
$$y_{it} = \beta_{0i} + \sum_{k=1}^{K}\beta_{ki}x_{kt} + \varepsilon_{it}, \qquad i = 1, ..., M.$$
Assume that in this model $\beta_{ki} = \beta_k$, $E(\varepsilon_{it}) = 0$, $E(\boldsymbol\varepsilon_i\boldsymbol\varepsilon_i') = \sigma^2\mathbf{I}$, and $E(\boldsymbol\varepsilon_i\boldsymbol\varepsilon_j') = \mathbf{0}$ for $i \neq j$. Summing up,


in this model cross-sectional covariances between errors are null, while the set of regressors and regression slopes are the same for each dependent variable. The only estimator measuring cross-sectional differences is the intercept $\beta_{0i}$ ($i = 1, ..., M$). In practice we can think of the intercept as a constant common to all models plus a fixed component that changes according to the dependent variable. In formulas: $\beta_{0i} = \bar{\beta}_0 + \alpha_i$. This model is known as a model with fixed effects. The name comes from the fact that the cross-sectional effects captured by the intercept (the $\alpha_i$s) are fixed for each dependent variable. This model (as usual) has its own BLUE estimator for the regression coefficients.
In order to use a model with fixed effects, you need to assume that your model encloses the whole cross-sectional dimension of the population. If you think about it, when you are introducing fixed effects, you completely neglect any conditional (cross-sectional) volatility of the intercept. This can be done only if you observe all the individuals (e.g., countries) that are part of the population and you can perfectly identify the intercept effect for each one of them. If your population is a sub-sample of a larger set, you need to take this conditional volatility into account and use random effects. A model with random effects is very similar to a model with fixed effects, even though there is one key difference. In a random effects model, the intercept for the $i$th dependent variable is
$$\beta_{0i} = \bar{\beta}_0 + u_i,$$
45. This is a rather advanced topic and its appearance here (while it has not emerged in the lectures and it will not) reflects my discomfort with the large number of MSc. Finance students that deal with panels in their theses and in their jobs: some quick introduction is most needed.


where $u_i$ is a random error such that $E(u_i) = 0$, $E(u_i^2) = \sigma_u^2$, $E(u_i\varepsilon_{jt}) = 0$, and $E(u_iu_j) = 0$ for $i \neq j$. So the model can be re-written as:
$$y_{it} = \bar{\beta}_0 + \sum_{k=1}^{K}\beta_k x_{kt} + u_i + \varepsilon_{it}, \qquad i = 1, ..., M.$$
Also random effects have their own BLUE estimator. Greene (2008) represents a great starting point for additional details and the formulas for the BLUE estimators mentioned above.
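As an illustration, fixed effects can be estimated through the so-called within transformation: demeaning each unit's data makes the $\alpha_i$s drop out, after which pooled OLS recovers the common slope. The following minimal sketch uses simulated data (all names and numbers are invented for the example, and this within estimator is one standard route, not the only one):
M = 5; T = 60; beta = 0.8;
x = randn(T, M); alpha = randn(1, M);        % unit-specific intercepts
y = repmat(alpha, T, 1) + beta*x + 0.5*randn(T, M);
xd = x - repmat(mean(x), T, 1);              % within (demeaning) transformation
yd = y - repmat(mean(y), T, 1);
beta_fe = (xd(:)'*xd(:)) \ (xd(:)'*yd(:));   % pooled OLS on demeaned data
a_hat   = mean(y) - beta_fe*mean(x);         % recovered unit intercepts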

A5. Simultaneous Equations and Vector Autoregressive Models


In our discussion so far we have considered very general econometric frameworks. When modeling multiple time series in economics, a key question is the study of contemporaneous and lagged causal relationships between variables. Simultaneous Equations Models are very popular in macroeconomics, where Structural Models are used to bring theoretical frameworks with a large number of equilibrium equations to the data. A consistent modeling of causality becomes of key importance. One of the most basic (and most popular) frameworks in this area is the Vector Autoregressive Model (VAR).
VARs are based on the assumption that, given a vector of time series, there is a feedback
relationship between the current value of each time series and lagged values of the same or other
series. In practice this model can be seen as an autoregressive model for multiple time series. In
formulas, a VAR model with $p$ lags, also called a VAR($p$), can be written as:
$$\mathbf{Z}_t = \mathbf{a}_0 + \mathbf{A}_1\mathbf{Z}_{t-1} + \mathbf{A}_2\mathbf{Z}_{t-2} + \cdots + \mathbf{A}_p\mathbf{Z}_{t-p} + \boldsymbol\varepsilon_t,$$
where $\mathbf{Z}_t$ and $\boldsymbol\varepsilon_t$ are both $n \times 1$ vectors. Note that this model can always be written as a VAR(1) by re-arranging the equations in the following state-space (companion) form:
$$\begin{bmatrix} \mathbf{Z}_t \\ \mathbf{Z}_{t-1} \\ \vdots \\ \mathbf{Z}_{t-p+1} \end{bmatrix} =
\begin{bmatrix} \mathbf{a}_0 \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{bmatrix} +
\begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_p \\ \mathbf{I} & \mathbf{O} & \cdots & \mathbf{O} \\ \vdots & \ddots & \ddots & \vdots \\ \mathbf{O} & \cdots & \mathbf{I} & \mathbf{O} \end{bmatrix}
\begin{bmatrix} \mathbf{Z}_{t-1} \\ \mathbf{Z}_{t-2} \\ \vdots \\ \mathbf{Z}_{t-p} \end{bmatrix} +
\begin{bmatrix} \boldsymbol\varepsilon_t \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{bmatrix}.$$

Because of this result, let's focus now on the simple framework: $\mathbf{Z}_t = \mathbf{a}_0 + \mathbf{A}_1\mathbf{Z}_{t-1} + \boldsymbol\varepsilon_t$. First, this model is estimated by OLS, because the set of regressors is the same in each equation, so that the SUR estimator coincides with the OLS one. Second, the estimation of this model involves a large number of parameters. Consider an example in which $\mathbf{Z}_t$ has four elements. Even if you neglect intercepts, you need to fill the full matrix $\mathbf{A}_1$ (and the variance-covariance matrix of the residuals):
$$\mathbf{A}_1 = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}.$$
The presence of a large number of parameters may pose tricky risks of over-fitting the data, lowering saturation ratios below what it is sensible to tolerate. By this, we mean that some of the

parameters in the matrix $\mathbf{A}_1$ might come out to be significantly different from zero, even if they do not capture any meaningful relationship in the data. This is why, in estimating VAR matrices, econometricians usually impose constraints and set some entries to equal zero ex ante. The stationarity condition for a VAR(1) model is somehow similar to the one that we impose on an AR(1) model. Unconditional moments are:
$$E(\mathbf{Z}) = (\mathbf{I} - \mathbf{A}_1)^{-1}\mathbf{a}_0$$
$$vec(\text{Var}(\mathbf{Z})) = (\mathbf{I} - \mathbf{A}_1 \otimes \mathbf{A}_1)^{-1}vec(\Sigma),$$
where $\otimes$ is the Kronecker product of two matrices.46 Conditional $k$-steps-ahead moments can be derived as:
$$E(\mathbf{Z}_{t+k}|\mathbf{Z}_t) = \mathbf{a}_0 + \mathbf{A}_1\mathbf{a}_0 + \mathbf{A}_1^2\mathbf{a}_0 + \cdots + \mathbf{A}_1^{k-1}\mathbf{a}_0 + \mathbf{A}_1^k\mathbf{Z}_t$$
$$\text{Var}(\mathbf{Z}_{t+k}|\mathbf{Z}_t) = \Sigma + \mathbf{A}_1\Sigma\mathbf{A}_1' + \mathbf{A}_1^2\Sigma(\mathbf{A}_1^2)' + \cdots + \mathbf{A}_1^{k-1}\Sigma(\mathbf{A}_1^{k-1})'.$$
Unconditional moments can then be obtained as the limit of the conditional moments as $k \to \infty$.
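To see these formulas at work, here is a minimal Matlab sketch (the values of $\mathbf{a}_0$, $\mathbf{A}_1$, and $\Sigma$ are invented for the example) that computes the unconditional moments of a bivariate VAR(1) after checking the stationarity condition discussed next:
a0 = [0.1; 0.2]; A1 = [0.5 0.1; 0.2 0.3]; Sigma = [1 .3; .3 1];
assert(max(abs(eig(A1))) < 1, 'VAR(1) is not stationary');
EZ   = (eye(2) - A1)\a0;                  % E(Z) = (I - A1)^{-1} a0
vecV = (eye(4) - kron(A1, A1))\Sigma(:);  % vec(Var(Z)) via the Kronecker formula
VZ   = reshape(vecV, 2, 2);               % unconditional covariance matrix of Z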

Remember that we can always write the eigendecomposition $\mathbf{A}_1 = \mathbf{C}\Lambda\mathbf{C}^{-1}$; then $\mathbf{A}_1^k = \mathbf{C}\Lambda^k\mathbf{C}^{-1}$. To be sure that $\mathbf{A}_1^k$ is well defined as $k \to \infty$, we need to ask that $\Lambda^k$ is well defined, because
$$\Lambda^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.$$
This is equivalent to requiring that all the eigenvalues of $\mathbf{A}_1$ fall inside the unit circle. If this was not enough and you want to know more, it is time to consult Hamilton (1994), which remains the most complete reference on the market for the specification and estimation of vector linear models.
46. The $vec(\cdot)$ operator simply stacks the elements of a matrix by concatenating its columns vertically, i.e., if the $m \times n$ matrix $\mathbf{A}$ is written as $\mathbf{A} \equiv [\mathbf{a}_1\; \mathbf{a}_2\; \cdots\; \mathbf{a}_n]$, where $\mathbf{a}_i$ is an $m \times 1$ vector ($i = 1, 2, ..., n$), then
$$vec(\mathbf{A}) = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix} = [\mathbf{a}_1'\; \mathbf{a}_2'\; ...\; \mathbf{a}_n']'.$$

The Kronecker product instead multiplies each element of the first matrix by the entire second matrix. An example will suffice:
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \otimes \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} =
\begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{bmatrix}.$$
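In Matlab both operators are immediately available (a short illustrative check):
% vec stacks columns; kron multiplies each element of A by the whole matrix B
A = [1 2; 3 4]; B = [0 1; 1 0];
vecA = A(:);      % = [1 3 2 4]'
K    = kron(A, B);% 4x4 block matrix [1*B 2*B; 3*B 4*B]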

Appendix B - A Matlab Workout


You are a European investor with the Euro as a reference currency. Using monthly data in
STOCKINT2013.XLS, construct monthly excess returns (in Euros) using the three price indices
DS Market-PRICE indices for Germany, the US and the UK.
1. For the sample period April 1977 - December 2009, plot the values of each of the three
individual indices (in logarithmic terms) and excess returns for the equally weighted portfolio
denominated in Euros. Make sure to include the dividends paid by each of the three indices in
each of the monthly return series. Notice that the monthly data made available on the course
web site also include data on the dividend yield on index ( = GER, US, UK), , defined
as:47

2. Over the same sample, use a GARCH(1,1)-DCC(1,1) model with constant mean to estimate (filter) the dynamics of the covariance matrix of excess returns at monthly frequency. The model is:
$$x_{t+1}^{US} = \mu^{US} + \varepsilon_{t+1}^{US}, \quad x_{t+1}^{UK} = \mu^{UK} + \varepsilon_{t+1}^{UK}, \quad x_{t+1}^{GER} = \mu^{GER} + \varepsilon_{t+1}^{GER},$$
$$\boldsymbol\varepsilon_{t+1} \equiv [\varepsilon_{t+1}^{US}\;\; \varepsilon_{t+1}^{UK}\;\; \varepsilon_{t+1}^{GER}]' \sim N(\mathbf{0}, \Sigma_{t+1}),$$
where $x_{t+1}^i$ denotes excess returns ($i$ = US, UK, GER). Make sure to extract and print parameter estimates, their robust standard errors, and their t-ratios. Also extract the dynamic, conditional covariance matrix implied by the model and use a three-panel plot to graph the one-month-ahead (predicted) expected excess returns, volatilities, and correlations. [Hint: You have to use Kevin Sheppard's MFE toolbox; you can find a link on the course web page; before blindly proceeding, please make sure to read and understand what the toolbox allows you to do]
3. Use the dynamic variance-covariance matrix filtered in question 2 and the unconditional historical means of excess returns to build an in-sample, recursive dynamic Markowitz portfolio based on the simple expression
$$\mathbf{w}_t = [\hat{\Sigma}_{t+1}]^{-1}\hat{\boldsymbol\mu},$$
47. As a result, the log (continuously compounded) return on stock index $i$ between $t-1$ and $t$ will be defined as:
$$r_t^i \equiv \ln\left(\frac{P_t^i}{P_{t-1}^i} + DY_t^i\right).$$
In the following, while $r_t^i$ denotes returns, $x_t^i$ will denote returns in excess of the Euro riskless rate.


where $\hat{\boldsymbol\mu} \equiv [\hat{\mu}^{US}\;\; \hat{\mu}^{UK}\;\; \hat{\mu}^{GER}]'$ and implicitly you have imposed a unit coefficient of risk aversion. Plot these recursive portfolio weights.
4. Now we turn instead to building and using models for the conditional mean, something we have avoided so far. As a starter, estimate by OLS a contagion regression model for the three stock indexes:
$$x_{t+1}^{US} = \beta_0^{US} + \beta_1^{US}\ln DP_t^{US} + \beta_2^{US}p_t^{US} + \beta_3^{US}p_t^{UK} + \beta_4^{US}p_t^{GER} + \varepsilon_{t+1}^{US}$$
$$x_{t+1}^{UK} = \beta_0^{UK} + \beta_1^{UK}\ln DP_t^{UK} + \beta_2^{UK}p_t^{US} + \beta_3^{UK}p_t^{UK} + \beta_4^{UK}p_t^{GER} + \varepsilon_{t+1}^{UK}$$
$$x_{t+1}^{GER} = \beta_0^{GER} + \beta_1^{GER}\ln DP_t^{GER} + \beta_2^{GER}p_t^{US} + \beta_3^{GER}p_t^{UK} + \beta_4^{GER}p_t^{GER} + \varepsilon_{t+1}^{GER}$$
$$\boldsymbol\varepsilon_{t+1} \sim N(\mathbf{0}, \text{diag}(\sigma_{US}^2, \sigma_{UK}^2, \sigma_{GER}^2)),$$
where $p_t^{US}$, $p_t^{UK}$, and $p_t^{GER}$ are log-index levels expressed in euros.48 Make sure to extract and print conditional mean parameter estimates, their standard errors, and their t-ratios. Use a three-panel plot to graph the one-month-ahead (predicted) expected excess returns, volatilities, and correlations.
5. Use the dynamic conditional means predicted from question 4 and the unconditional historical variances of excess returns to build an in-sample, recursive dynamic Markowitz portfolio based on
$$\mathbf{w}_t = [\text{diag}(\hat{\sigma}_{US}^2, \hat{\sigma}_{UK}^2, \hat{\sigma}_{GER}^2)]^{-1}\hat{\boldsymbol\mu}_{t+1},$$
where $\hat{\boldsymbol\mu}_{t+1} \equiv [E_t(x_{t+1}^{US})\;\; E_t(x_{t+1}^{UK})\;\; E_t(x_{t+1}^{GER})]'$. Plot these recursive portfolio weights. If needed, you may decide to cut excessively large and small weights (for instance, exceeding 400% in absolute value) and produce a parallel plot in which visibility may be enhanced. Are these weights economically sensible?
6. Bring now the models of questions 2-3 and 4-5 together, by estimating by iterated MLE (see below for details) the following restricted VAR(1) model that jointly captures the presence of time variation in conditional means, conditional variances, and conditional covariances:49
$$x_{t+1}^{US} = \beta_0^{US} + \beta_2^{US}x_t^{US} + \varepsilon_{t+1}^{US}$$
$$x_{t+1}^{UK} = \beta_0^{UK} + \beta_3^{UK}x_t^{UK} + \varepsilon_{t+1}^{UK} \qquad \boldsymbol\varepsilon_{t+1} \sim N(\mathbf{0}, \Sigma_{t+1}).$$
$$x_{t+1}^{GER} = \beta_0^{GER} + \beta_4^{GER}x_t^{GER} + \varepsilon_{t+1}^{GER}$$
Notice that by estimating this model you are generalizing the model in question 4 not only
because you are allowing variances to vary over time, but also (or especially) because you

48. Estimation by OLS implies that you will not be able to estimate the covariances of the shocks to different stock indices, which are therefore simply set to zero throughout the exercise.
49. This VAR is restricted because each equation simply features a dependence of future excess returns in market $i$ on past excess returns in the same market. Please make sure to understand what a full, unrestricted VAR would imply.


are no longer constraining the correlations to be always zero. Make sure to extract and
print parameter estimates, their standard errors, and their t-ratios. Use a three-panel plot
to graph the one-month ahead (predicted) expected excess returns, volatilities, and correlations. [Hint: the reference above to iterated MLE refers to the fact that Kevin Sheppards
GARCH/DCC utilities will not allow you to jointly estimate the regression coecients and
also the GARCH/DCC dynamic covariance matrix; therefore what you are advised to do is to
first estimate the residuals from the regression models, then fit on them your GARCH/DCC
model, and finally proceed to re-estimate the regression model by GLS when the covariance
matrix is the one estimated by GARCH/DCC].50
7. Using the dynamic conditional means, variances, and covariances predicted in question 6
compute in-sample, recursive dynamic Markowitz portfolios based on

$$\mathbf{w}_t = [\hat{\Sigma}_{t+1}]^{-1}\hat{\boldsymbol\mu}_{t+1},$$
where $\hat{\boldsymbol\mu}_{t+1}$ is now estimated from the second, GLS-type step of question 6. Plot these recursive portfolio weights. If needed, you may decide to cut excessively large and small weights (for instance, exceeding 500% in absolute value) and produce a parallel plot in which visibility may be enhanced. Are these weights economically sensible?
8. With reference to the out-of-sample period January 2010 - December 2012, proceed to compute
optimal weights for two models: the unconditional mean GARCH/DCC model of question 2;
the VAR/GARCH/DCC of question 6. Perform the calculation in the following way: for both
models use the same estimated conditional mean (the intercept in the first case, the restricted
VAR parameters in the latter) and the GARCH/DCC parameters estimated in questions 2
and 6, that you should have saved; compute the dynamic covariance matrix on the basis of
those parameters, performing the updating on the basis of the out-of-sample forecast errors over the out-of-sample period. The weights will come from the classical Markowitz formula.
After obtaining the weights, compute the realized Sharpe ratios over the out of sample period.
Compare these realized Sharpe ratios with those that you would have achieved by simply
investing all of your wealth in each of the three stock indices under consideration. [Hint: In
this case it is a very good idea to use Kevin Sheppard's dcc_mvgarch_cov function]
9. Load the daily data set already used on many occasions before. Construct the monthly non-overlapping cumulative returns and monthly non-overlapping realized variance (RV) series for the US and German stock indices. Repeat the same exercise to construct a monthly non-overlapping realized covariance (RCOV) series for the US and German stock indices. Plot the
50

For our purposes, it will be sufficient to iterate these two estimation steps only once, which is what we normally do with feasible two-stage GLS. However, one may want to iterate these two steps as long as it takes to reach convergence in parameter estimates. Note that, under the assumption of correct specification of the dynamic model, the two-stage MLE is also consistent. Please see the lecture notes that are available on the class web site.


resulting monthly realized variances, covariances, and the autocorrelograms of these series.
Are the natural logarithms of the resulting series normally distributed?
10. Use the first 100 observations of the monthly realized variance sample you have obtained
in point 9 to estimate two univariate AR(1) models for the US and German RVs. Repeat
the same estimation on the basis of the initial 100 observations of the monthly sample to
estimate AR(1) models of the US-German RCOV. Compute a one-step ahead forecast of the
US, German and US-German realized variance and covariance from observation 100 to the end
of the sample by using the estimates obtained in the previous points. At this point, construct
an equally weighted portfolio, compute its returns and the one-step ahead predicted variance
and covariances. Plot the corresponding Q-Q plots for German, US, and the equally weighted
portfolio standardized returns.

Solution
This solution is a commented version of the MATLAB code Ex_Modeling_correlations_2013.m posted on the course web site. Note that in this case, all the Matlab functions needed for the correct functioning of the code have been included. This means that no Set Path instructions need to be used for this workout. The loading of the monthly data is performed by the lines of code:
filename=uigetfile('*.txt');
data=dlmread(filename);
The above two lines import only the numbers, not the strings, from a .txt file.51 The usual lines
of code take care of the strings and are not repeated here. The same applies to the exchange rate
transformations that have now become customary in the first part of our Matlab workouts.
1. Figure A1 plots the values of each of the three individual indices (in logarithmic terms) and
excess returns for the equally weighted portfolio denominated in Euros. Here the message is
that you should always take a solid look at your data before venturing in any analysis. Note
that the lines of code
dy_ger_m = data(:,4)/(100);
dy_ger_m = dy_ger_m/(12); % Monthly dividend yield used in return calculation
lret_ger_m = log((p_ger./lag(p_ger))+dy_ger_m);
dy_us_m = data(:,6)/(100);
dy_us_m = dy_us_m/(12); % Monthly dividend yield used in return calculation
51. The reason for loading from a .txt file in place of the usual Excel is to favor usage on Mac computers, which sometimes have issues reading directly from Excel because of copyright issues with shareware spreadsheets.


lret_us_dol = log((p_us./lag(p_us))+dy_us_m);
dy_uk_m = data(:,8)/(100);
dy_uk_m = dy_uk_m/(12); % Monthly dividend yield used in return calculation
lret_uk_str = log((p_uk./lag(p_uk))+dy_uk_m);
make sure that total index returns include dividends.

Figure A1: Monthly portfolio indices and exchange rates

2. We now use the dcc function in Kevin Sheppard's MFE toolbox


R_eq_dm = [R_eq(:,1)-mean(R_eq(:,1)) R_eq(:,2)-mean(R_eq(:,2)) ...
R_eq(:,3)-mean(R_eq(:,3))];
[parameters, loglikelihood, SIGMA_1, stderrors, jointscores, diagnostics] = ...
dcc(R_eq_dm,[],1,0,1);
to estimate over the monthly sample 1977-2009 a GARCH(1,1)-DCC(1,1) model with constant mean and sample correlation (matrix) targeting:
$$x_{t+1}^{US} = \mu^{US} + \varepsilon_{t+1}^{US}, \quad x_{t+1}^{UK} = \mu^{UK} + \varepsilon_{t+1}^{UK}, \quad x_{t+1}^{GER} = \mu^{GER} + \varepsilon_{t+1}^{GER},$$
$$\boldsymbol\varepsilon_{t+1} \equiv [\varepsilon_{t+1}^{US}\;\; \varepsilon_{t+1}^{UK}\;\; \varepsilon_{t+1}^{GER}]' \sim N(\mathbf{0}, \Sigma_{t+1}),$$
where $x_{t+1}^i$ denotes excess returns ($i$ = US, UK, GER, i.e., in excess of 3-month euro riskless rates).
Parameter estimates, with QMLE standard errors, are printed on the Matlab screen as shown in Figure A2. Interestingly, also on these monthly return data we find rather precisely estimated and rather persistent GARCH dynamics. However, the implied persistence indices ($\hat{\alpha}_i + \hat{\beta}_i$ using our standard notation) tend to be (at 0.96-0.98) slightly (but only slightly) smaller than those typical of daily data.52 More interestingly, the estimated coefficients for the DCC(1,1) model are 0.028
52. The only oddity is that the estimates of the constant coefficients ($\omega_i$, $i$ = US, UK, GER) in the GARCH(1,1) models are not statistically significant.


and 0.969, respectively, and both are estimated at a relatively high level of precision. The reason why only two coefficients are estimated in the DCC(1,1) model is that a sample correlation targeting model (as examined in the lectures) has been specified and estimated:
$$\mathbf{Q}_{t+1} = (1 - \alpha - \beta)\left[\frac{1}{T}\sum_{\tau=1}^{T}\mathbf{z}_\tau\mathbf{z}_\tau'\right] + \alpha\,\mathbf{z}_t\mathbf{z}_t' + \beta\,\mathbf{Q}_t.$$
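For readers who want to see the mechanics, the following minimal sketch reproduces this correlation-targeting DCC(1,1) recursion on a generic $T \times 3$ matrix z of GARCH-standardized residuals (here z is simulated and the values of alpha and beta are simply borrowed from the estimates above; this is an illustration, not Sheppard's actual implementation):
alpha = 0.028; beta = 0.969;
z = randn(500, 3);                 % stand-in for GARCH-standardized residuals
Qbar = (z'*z)/size(z,1);           % correlation target (sample average of z_t z_t')
Q = Qbar; R = zeros(3, 3, size(z,1));
for t = 1:size(z,1)
    d = diag(1./sqrt(diag(Q)));
    R(:,:,t) = d*Q*d;              % conditional correlation matrix at time t
    Q = (1-alpha-beta)*Qbar + alpha*(z(t,:)'*z(t,:)) + beta*Q;
end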

Figure A2:GARCH/DCC estimates from monthly international excess return indices

The extraction of the dynamic, conditional covariance matrix implied by the DCC model is performed automatically by Sheppard's dcc function,
[parameters, loglikelihood, SIGMA_1, stderrors, jointscores, diagnostics] = ...
dcc(R_eq_dm,[],1,0,1);
where the sequence of (one-month-ahead) predictions of the covariance matrix $\{\hat{\Sigma}_{t+1}\}_{t=1}^{T}$ corresponds to SIGMA_1. When we plot conditional means, volatilities, and correlations (all at a monthly frequency) using standard instructions, we obtain Figure A3.

Figure A3-I: Conditional means and volatilities from a GARCH DCC model


Figure A3-II: Conditional correlations from a GARCH DCC model

The conditional means, as per the very instructions of the workout, are in fact constant over
time, which implies that unconditional and conditional means are constant in this case. On the
contrary, conditional volatilities are time-varying in ways that are typical of GARCH models, with
spikes that may receive a rather natural economic interpretation in the light of well-known events.
Conditional correlations as estimated by the model appear to be largely increasing over time, after
undergoing a discrete upward jump in the Fall of 1987. Although, as one may expect, the pairwise
conditional correlation between U.S. and U.K. (excess) equity returns is larger than the other two
pair-wise correlations, after 2005 all these correlations converge to similar levels, generally in excess of 0.7. Such high correlation levels imply that the diversification opportunities across these major, developed stock markets are limited at best.
3. The mean-variance portfolio weights obtained from the classical formula
$$\mathbf{w}_t = [\hat{\Sigma}_{t+1}]^{-1}\hat{\boldsymbol\mu},$$
where $\hat{\boldsymbol\mu} \equiv [\hat{\mu}^{US}\;\; \hat{\mu}^{UK}\;\; \hat{\mu}^{GER}]'$ and a unit coefficient of risk aversion has implicitly been imposed, are computed by the lines of code

Figure A4: Optimal mean-variance portfolio weights from a GARCH DCC model for conditional covariance

for t=1:n
At_1(:,t) = inv(SIGMA_1(:,:,t))*FIT_1;
Wt_1(t,1) = At_1(1,t)/sum(At_1(:,t));
Wt_1(t,2) = At_1(2,t)/sum(At_1(:,t));
Wt_1(t,3) = At_1(3,t)/sum(At_1(:,t));
end
where the three ratios defining Wt_1(t,i) ($i$ = US, UK, GER) simply enforce the fact that the weights must sum to one. Figure A4 plots these recursive portfolio weights.
4. At this point, we replace the constant mean function of questions 2-3 with a regression-based mean function, which makes the conditional mean time-varying:
$$x_{t+1}^{US} = \beta_0^{US} + \beta_1^{US}\ln DP_t^{US} + \beta_2^{US}p_t^{US} + \beta_3^{US}p_t^{UK} + \beta_4^{US}p_t^{GER} + \varepsilon_{t+1}^{US}$$
$$x_{t+1}^{UK} = \beta_0^{UK} + \beta_1^{UK}\ln DP_t^{UK} + \beta_2^{UK}p_t^{US} + \beta_3^{UK}p_t^{UK} + \beta_4^{UK}p_t^{GER} + \varepsilon_{t+1}^{UK}$$
$$x_{t+1}^{GER} = \beta_0^{GER} + \beta_1^{GER}\ln DP_t^{GER} + \beta_2^{GER}p_t^{US} + \beta_3^{GER}p_t^{UK} + \beta_4^{GER}p_t^{GER} + \varepsilon_{t+1}^{GER}$$
$$\boldsymbol\varepsilon_{t+1} \sim N(\mathbf{0}, \text{diag}(\sigma_{US}^2, \sigma_{UK}^2, \sigma_{GER}^2)),$$
where $p_t^{US}$, $p_t^{UK}$, and $p_t^{GER}$ are log-index levels expressed in euros. This is called a contagion
regression model because it implies that lagged stock index levels in country $i$ may forecast subsequent excess returns in other markets $j \neq i$. Moreover, note that in such a model the lagged dividend yield in each equity market forecasts subsequent excess stock returns. Because the covariance matrix is assumed to be diagonal and constant over time (homoskedastic), in this case (see your Financial Econometrics I notes and readings) simple, classical OLS will be unbiased and efficient. Interestingly, estimation is performed using
[beta,sigma,resid,cov_b] = mvregress([ones(n,1) lDP(1:end-1,i) ...
regress(1:end-1,:)],R_eq(2:end,i));
Naturally, a diagonal covariance matrix implies that all covariances and hence correlations are constant and identically zero over time. Of course, this is a restriction we have imposed, not an empirical finding suggested by the data, which, as we have already noticed, display dynamics that can be usefully captured by the GARCH DCC model of question 2. Interestingly, this model is symmetric to the one in question 2: there, conditional means were constant but the conditional covariance matrix was time-varying; here the opposite occurs and only the conditional means vary over time. The estimates and their robust standard errors (see MVREGRESS in the Matlab manual for details) are printed on


the Matlab screen in Figure A5.

Figure A5: OLS estimates of regression-based conditional mean functions

Interestingly, most coefficients fail to be statistically significant (none of them are in the case of UK excess returns) and, when they are, they are only marginally significant, with t-stats close to the threshold of 2.

Figure A6: Conditional means, volatilities, and correlations from a homoskedastic contagion model

There is, however, evidence that past high UK stock prices positively infect US stock returns, while the opposite occurs for past German stock prices. Figure A6 uses the customary three-panel plot to graph the one-month-ahead (predicted) expected excess returns, volatilities, and correlations. As is obvious, all volatilities and correlations are constant (the latter are identically zero, as noted above).
5. When we use conditional means, variances, and correlations to compute recursive dynamic Markowitz weights with $\hat{\boldsymbol\mu}_{t+1} \equiv [E_t(x_{t+1}^{US})\;\; E_t(x_{t+1}^{UK})\;\; E_t(x_{t+1}^{GER})]'$, we obtain Figure A7, where we have cut excessively large and small weights at 400% in absolute value.

Figure A7: Optimal mean-variance portfolio weights from a homoskedastic, contagion regression model

In fact, the optimal weights occasionally shoot up in absolute value to levels that are hardly sensible, as they exceed 100%. This derives from the fact that in Figure A6 the forecasts (although the conditional means are imprecisely estimated) are highly unstable, and this carries forward onto the optimal MV portfolio shares.
6. We now bring the models of questions 2-3 and 4-5 together, by estimating by iterated MLE (see
below for details), the following restricted VAR(1) model that jointly captures the presence
of time variation in conditional means, conditional variances, and conditional covariances:53

$$x_{t+1}^{US} = \beta_0^{US} + \beta_2^{US}x_t^{US} + \varepsilon_{t+1}^{US}$$
$$x_{t+1}^{UK} = \beta_0^{UK} + \beta_3^{UK}x_t^{UK} + \varepsilon_{t+1}^{UK} \qquad \boldsymbol\varepsilon_{t+1} \sim N(\mathbf{0}, \Sigma_{t+1}).$$
$$x_{t+1}^{GER} = \beta_0^{GER} + \beta_4^{GER}x_t^{GER} + \varepsilon_{t+1}^{GER}$$

Notice that by estimating this model you are generalizing the model in question 4 not only
because you are allowing variances to vary over time, but also (or especially) because you
are no longer constraining the correlations to be always zero and therefore constant over
time.54 The reference above to iterated MLE refers to the fact that Kevin Sheppard's

GARCH/DCC utilities will not allow you to jointly estimate the regression coefficients and
also the GARCH/DCC dynamic covariance matrix; therefore what you are advised to do is to
first estimate the residuals from the regression models, then fit on them your GARCH/DCC
model using the code,
for i=1:3
53. This VAR(1) is restricted because it boils down to three different AR(1) models stacked on top of each other. However, different national stock markets are still allowed to interact and contage each other through the dynamic covariance matrix.
54. Of course, the conditional mean functions are simpler than those in question 4, but you are welcome to propose the same, identical regressions to see what happens.


var1=strcat('beta_',country(i,:));
var2=strcat('SIGMA_',country(i,:));
var3=strcat('RESID_',country(i,:));
var4=strcat('cov_',country(i,:)); % added: var4 is used below but was never defined
[beta,sigma,resid,cov_b] = mvregress([ones(n,1) R_eq(1:end-1,i)], R_eq(2:end,i));
assignin('base',var1,beta);
assignin('base',var2,sigma);
assignin('base',var3,resid);
assignin('base',var4,cov_b);
end
Res = [RESID_US RESID_UK RESID_GER];
[parameters, loglikelihood, SIGMA_3, stderrors, jointscores, diagnostics] = ...
dcc(Res,[],1,0,1);
and finally proceed to re-estimate the regression model by feasible GLS, where the covariance matrix is the one estimated by GARCH/DCC:
SIGMA_3_r = [SIGMA_3_11 SIGMA_3_12 SIGMA_3_13;
SIGMA_3_21 SIGMA_3_22 SIGMA_3_23;
SIGMA_3_31 SIGMA_3_32 SIGMA_3_33];
X = [ones(n,1) R_eq(1:end-1,1) zeros(n,2) zeros(n,2);
zeros(n,2) ones(n,1) R_eq(1:end-1,2) zeros(n,2);
zeros(n,2) zeros(n,2) ones(n,1) R_eq(1:end-1,3)]; % third block: German returns on own lag
Y = [R_eq(2:end,1); R_eq(2:end,2); R_eq(2:end,3)];
BETA = inv(X'*inv(SIGMA_3_r)*X)*X'*inv(SIGMA_3_r)*Y;
OMEGA = inv(X'*inv(SIGMA_3_r)*X);

Figure A8: First-stage GARCH/DCC estimates of demeaned excess returns


For a refresher on what FGLS is, and how different it is from OLS, please consult Appendix A of this chapter. Figures A8-A9 display the estimated parameters as they are printed on the Matlab screen:

Figure A9: Second-stage feasible GLS (GARCH/DCC-based) estimates of conditional mean parameters

Also in this case none of the regression coefficients are strongly statistically significant, which should advise caution before basing your mean-variance asset allocation on such parameters. Figure A10 shows the now customary three-panel plot, in which finally conditional means, variances, and correlations are all time-varying.

Figure A10: Conditional means, volatilities, and correlations from a heteroskedastic restricted VAR(1) model

7. When we use conditional means, variances, and correlations to compute recursive dynamic Markowitz weights, where $\hat{\boldsymbol\mu}_{t+1}$ is derived from the fitted values of the restricted VAR(1) in question 6, as given by the lines of code
FIT3_vec = [(BETA(1,1)*ones(n,1)+BETA(2,1)*R_eq(1:end-1,1)) ...
(BETA(3,1)*ones(n,1)+BETA(4,1)*R_eq(1:end-1,2)) ...
(BETA(5,1)*ones(n,1)+BETA(6,1)*R_eq(1:end-1,3))];

we obtain Figure A11, where we have cut excessively large and small weights at 500% in absolute
value.

Figure A11: Optimal mean-variance portfolio weights from a heteroskedastic restricted VAR(1) model

8. With reference to the out-of-sample period January 2010 - December 2012 we now proceed to
compute optimal Markowitz weights for two models: the unconditional mean GARCH/DCC
model of question 2; the VAR/GARCH/DCC of question 6. For both models we have used
the same estimated conditional mean (the intercept in the first case, the restricted VAR
parameters in the latter) and the GARCH/DCC parameters estimated in questions 2 and
6, but the dynamic covariance matrix has been updated in real time (to produce one-month
ahead forecasts) on the basis of the out-of-sample forecast errors obtained in the earlier step.
For instance, the GARCH models are used during the out-of-sample period to obtain
$$\hat{\sigma}_{i,t+1}^2 = \hat{\omega}_i + \hat{\alpha}_i\,(x_{i,t} - \hat{\mu}_{i,t})^2 + \hat{\beta}_i\,\hat{\sigma}_{i,t}^2.$$
In the code, this occurs by first computing forecasts, then forecast errors, and finally by feeding such errors into the GARCH DCC model using Kevin Sheppard's dcc_mvgarch_cov function:55
% Computes forecasts
X = [ones(n,1) R_eq(1:end-1,1) zeros(n,2) zeros(n,2);
zeros(n,2) ones(n,1) R_eq(1:end-1,2) zeros(n,2);
zeros(n,2) zeros(n,2) ones(n,1) R_eq(1:end-1,3)];
Forc_h = X*BETA;
Forc = [Forc_h(1:n) Forc_h(n+1:2*n) Forc_h(2*n+1:3*n)];
% Computes forecast errors
Pr_err = R_eq(2:end,:) - Forc;
55. The code below refers to the second, restricted VAR(1) model, but the commands are logically equivalent to those applied to the constant mean model.


[GAMMA, D]=dcc_mvgarch_cov(par_dcc,Pr_err(:,1),Pr_err(:,2),Pr_err(:,3), ...
[0 1]*corr([Pr_err(:,1) Pr_err(:,2)])*[1;0], ...
[0 1]*corr([Pr_err(:,1) Pr_err(:,3)])*[1;0], ...
[0 1]*corr([Pr_err(:,2) Pr_err(:,3)])*[1;0]);
Figure A12 presents the weights obtained in this way.

Figure A12: Out-of-sample optimal mean-variance portfolio weights

After obtaining the weights, we compute the realized Sharpe ratios over the out-of-sample period,
which are presented in Figure A13.

Figure A13: Out-of-sample realized Sharpe ratios from alternative models

Clearly, although the gains are modest, there can be advantages from adopting a GARCH/DCC model as a basis for asset allocation, when the conditional mean model is sufficiently restricted and parsimonious to avoid excessive variation in forecasts of risk premia (expected excess returns).
9. At this point, we switch to a different data set and load the daily data set already used on many occasions before. Using daily data, i.e., treating them as high-frequency data (with m = 22 days per month), we compute monthly non-overlapping cumulative returns and monthly

non-overlapping realized variance (RV) and realized covariance series for the US and German stock indices. The following lines of code perform this task:
start=1+window;
RV(start,:)=sum(RET(2:2+(window-1),:).^2);
RCOV(start,:)=sum(RET(2:2+(window-1),1).*RET(2:2+(window-1),2));
RET_CUM(start,:)=sum(RET(2:2+(window-1),:));
for t=start+window:(window-1):T
RV(t-(window-1):t,:)=ones(size(t-(window-1):t,2),1)*sum(RET(t-(window-1):t,:).^2);
RCOV(t-(window-1):t,:)=ones(size(t-(window-1):t,2),1)*sum(RET(t-(window-1):t,1).*RET(t-(window-1):t,2));
RET_CUM(t-(window-1):t,:)=ones(size(t-(window-1):t,2),1)*sum(RET(t-(window-1):t,:));
end
In Figure A14, we plot the resulting monthly realized variances, covariances, and the autocorrelograms of these series.

Figure A14: Monthly realized variance and associated autocorrelogram functions

The ACFs emphasize that realized variances and covariances are rather persistent and therefore
predictable.
10. Finally, we have used the first 100 observations of the monthly realized variance sample that
we have obtained in point 9 to estimate two univariate AR(1) models for the US and German
RVs and for the US-German RCOV. We use these models to compute one-step ahead forecasts
of the US, German and US-German realized variances and covariance from observation 100
to the end of the sample. At this point, we construct an equally weighted portfolio, compute

its returns and the one-step ahead predicted variance and covariances. Figure A15 shows the
corresponding Q-Q plots for German, US, and the equally weighted portfolio standardized
returns.
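The AR(1) estimation/forecast step can be sketched in a few lines of Matlab (RV below is a simulated stand-in for one of the realized variance series; the names are illustrative, not taken from the course code):
RV  = abs(randn(200,1));                    % stand-in for a monthly RV series
est = RV(1:100);                            % estimation sample (first 100 observations)
phi = [ones(99,1) est(1:99)] \ est(2:100);  % AR(1): RV_t = c + b*RV_{t-1}, fitted by OLS
RVf = NaN(size(RV));
for t = 101:length(RV)
    RVf(t) = phi(1) + phi(2)*RV(t-1);       % one-step-ahead forecast
end
The same recursion, applied to the RCOV series, delivers the one-step-ahead covariance forecasts used for the portfolio standardization.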

Figure A15: QQ plots for standardized returns obtained from realized variances and covariance

This confirms something mentioned during the lectures: when returns are standardized using realized second moments, the resulting standardized returns $z_{i,t}$ ($i$ = US, Germany) are approximately normal, even though some problems remain in the left tail.

Appendix C - Another Matlab Workout


You are a European investor with the Euro as a reference currency. Using daily data in STOCKINT2013.XLS, construct daily returns (in Euros) using the three price indices DS Market-PRICE indices for Germany, the US and the UK.
1. For the sample period January 1, 2007 - December 31, 2010, compute and plot the return
series of each of the three indices expressed in Euros.
2. With reference to the same sample period, compute the unconditional covariance matrix of the $3 \times 1$ vector of index returns as well as the unconditional variance of an equally-weighted portfolio. Based on these sample estimates of variances, compute and plot the 1%
unconditional (Gaussian) VaR measures for each of the three national indices as well as for
the equally weighted portfolio. Moreover, compute an equally weighted average of the three
1% VaR measures and compare it with the 1% normal VaR of the equally weighted portfolio:
why are they not the same? Link this finding to the concepts of passive and active risk
management.


3. With reference to the post-financial crisis period January 3, 2011 - December 31, 2012 and to
your equally weighted portfolio returns, compute and plot the following 3 recursive, daily 1%
VaR measures: (i) a constant, unconditional Gaussian VaR; (ii) a Gaussian GARCH(1,1) VaR
directly estimated on your portfolio returns (i.e., a passive VaR measure); (iii) a trivariate
Gaussian Constant Conditional Correlation (CCC) GARCH (1,1) VaR in which correlations
are assumed to be constant and equal to the unconditional pair-wise correlations between
first-stage standardized residuals from appropriately defined GARCH(1,1) models. (iii) is an
active risk-management measure because it depends on your specific portfolio weights. [Hint:
Notice that this question implies that you will have to estimate three simple GARCH(1,1)
processes for the three indices and also for your own portfolio over the assigned 6-month
sample]
4. Repeat question 3 with reference to the same sample period, but this time comparing the
daily, recursive, 1% VaR measures for: (i) a Gaussian GARCH(1,1) VaR directly estimated
on your portfolio returns (i.e., a passive VaR measure); (ii) a trivariate Gaussian Constant
Conditional Correlation GARCH (1,1) VaR in which correlations are assumed to be constant
and equal to the unconditional pair-wise correlations between first-stage standardized residuals
from appropriately defined GARCH(1,1) models, (iii) a trivariate Gaussian Dynamic Conditional Correlation GARCH (1,1) VaR in which correlations are estimated from an Exponential
Smoothing, RiskMetrics-style model for the elements of the auxiliary matrix Q as discussed
in Lecture 4 of the second part of the course.
5. Estimate over the 2007-2010 sample a simple constant-mean GARCH(1,1)-DCC(1,1) model to filter the dynamics of the correlations of returns at daily frequency. In essence, the model is:
$$r_{t+1}^{GER} = \mu^{GER} + \varepsilon_{t+1}^{GER}, \quad r_{t+1}^{US} = \mu^{US} + \varepsilon_{t+1}^{US}, \quad r_{t+1}^{UK} = \mu^{UK} + \varepsilon_{t+1}^{UK},$$
$$\boldsymbol\varepsilon_{t+1} \equiv [\varepsilon_{t+1}^{GER}\;\; \varepsilon_{t+1}^{US}\;\; \varepsilon_{t+1}^{UK}]' \sim N(\mathbf{0}, \Sigma_{t+1}),$$
where $r_{t+1}^i$ denotes daily returns ($i$ = EU/GER, US, UK). Also extract the dynamic, conditional correlation matrix implied by the model and plot the (predicted) correlations during the period January 2011 - December 2012.
6. Compute and plot the 1% VaR from the DCC model of question 5 over the out-of-sample
period January 1, 2011 - December 31, 2012 and compare it with the 1% VaR computed from
the CCC and the RiskMetrics, exponentially smoothed DCC of questions 3 and 4.
7. Estimate over the 2007-2010 sample a constant-mean Principal Component (also called Orthogonal) GARCH(1,1) model. Plot the predicted, one-day ahead dynamic volatilities and
correlations resulting from the PC/Orthogonal GARCH model. In the case of volatilities, it
is easier if you compute and plot the volatility over time of your portfolio. Compare such a


dynamic, conditional volatility with the time series you should have derived from the DCC
GARCH of question 5.
8. Recursively compute and plot 1% VaR over the out-of-sample period January 1, 2011 - December 31, 2012 using both historical and weighted historical simulations with a rolling window of m = 252 days and, in the case of weighted historical simulations, a decay factor of 0.99. Apply your calculations to each index return series individually as well as to your portfolio returns. Check whether a simple, weighted combination of individual asset historical VaRs equals the historical VaR for your portfolio.
9. Estimate a BEKK-GARCH(1,1) model. In case you fail, try again using a longer sample, Jan. 1, 2003 - Dec. 31, 2010. How long does it take on your computer to estimate a truly multivariate (and yet, already simplified in some ways, as commented in the lectures) BEKK model? [Hint: You need to use Kevin Sheppard's full_bekk_mvgarch function; in case your first attempt at estimating the BEKK model fails, you will need to change the beginning and end dates of the sample and then press F9 to re-estimate the BEKK, since the code will have stopped at that point] WARNING: on not-so-good, not-so-new laptops this point of the code may take up to 15 minutes to run. You can always stop execution by pressing the combination CTRL+C.

Solution
This solution is a commented version of the MATLAB code Ex Multi GARCH 2013.m posted
on the course web site. Note that in this case, all the Matlab functions needed for the correct
functioning of the code have been included. The loading and pre-processing of the data is similar
to Appendix B and therefore it will not be repeated here. The same applies to the exchange rate
transformations that have now become customary in the first part of our Matlab workouts.
1. Figure C1 plots the daily data and contains no surprises.

Figure C1: Daily index returns (expressed in euros) for the sample 2007-2010


2. Next, we compute the unconditional covariance matrix of the $3 \times 1$ vector of index returns as well as the unconditional variance of an equally-weighted portfolio. Based on these sample estimates of variances, we compute and plot the 1% unconditional (Gaussian) VaR measures for each of the three national indices as well as for the equally weighted portfolio. We then compare an equally weighted average of the three 1% VaR measures with the 1% normal VaR of the equally weighted portfolio. This is accomplished by the following simple
lines of code:
Cov_matrix=cov(ret(first:last,:));
Corr_matrix=corr(ret(first:last,:));
p=0.01;
VaR_port_unc=-norminv(p,0,sqrt(w'*Cov_matrix*w));
VaR_ger=-norminv(p,0,sqrt(Cov_matrix(1,1)));
VaR_us=-norminv(p,0,sqrt(Cov_matrix(2,2)));
VaR_uk=-norminv(p,0,sqrt(Cov_matrix(3,3)));
VaR_sum=[VaR_ger, VaR_us, VaR_uk]*w;

Figure C2: Comparing 1% VaR measures based on unconditional correlation estimates

Figure C2 shows the results. Interestingly, the VaR of the equally weighted portfolio is not the same as the equally weighted average of the VaRs of each individual market; it is in fact considerably lower (3.2% per day vs. 3.7%). This is caused by the fact that if VaR is computed on the basis of $\sqrt{\mathbf{w}'\Sigma\mathbf{w}}$ then it becomes a highly complex (non-linear) function of the portfolio weights, which is not the case when one simply sums and weights the VaR measures obtained for each individual market. In fact, the figure clearly shows that diversification reduces risk, as you would expect, not only when the latter is measured by portfolio variance, but also when you measure risk as VaR: this is the difference between the first bar, concerning the VaR based on $\mathbf{w}'\Sigma\mathbf{w}$, and the remaining values, in which

either diversification is not applied (when you put 100% of your wealth in each of the markets) or
the calculation is incorrectly performed, as the VaR of a portfolio is not the same as the portfolio
of the VaRs.
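A quick numerical check of this point (the covariance matrix below is invented for the illustration): for a Gaussian VaR, $\sqrt{\mathbf{w}'\Sigma\mathbf{w}} \leq \sum_i w_i\sigma_i$, so the portfolio VaR can never exceed the weighted average of the individual VaRs.
Cov_matrix = [4 2 1; 2 3 1; 1 1 2]/1e4;  % illustrative daily covariance matrix
w = ones(3,1)/3; p = 0.01;
VaR_port = -norminv(p,0,sqrt(w'*Cov_matrix*w));
VaR_ind  = -norminv(p,0,sqrt(diag(Cov_matrix)));
disp([VaR_port, VaR_ind'*w])             % the first number is the smaller one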
3. With reference to the post-financial crisis period January 3, 2011 - December 31, 2012 and
the equally weighted portfolio returns, next we compute and plot 3 recursive, daily (i.e., in
correspondence on each single day in this out-of-sample period) 1% VaR measures: (i) a
constant, unconditional Gaussian VaR computed as
VaR_port_unc=-norminv(p,0,sqrt(w'*Cov_matrix*w));
(ii) a Gaussian GARCH(1,1) VaR directly estimated on portfolio returns (i.e., a passive VaR
measure)
VaR_garch_port=-vol_Port*norminv(p,0,1);
(iii) a trivariate Gaussian Constant Conditional Correlation (CCC) GARCH (1,1) VaR in which
correlations are assumed to be constant and equal to the unconditional pair-wise correlations between first-stage standardized residuals from appropriately defined GARCH(1,1) models:56
% Standardized returns to be used in CCC calculations
ret_std_GER=ret_ger(first:last,1)./sigma_GER;
ret_std_US=ret_us(first:last,1)./sigma_US;
ret_std_UK=ret_uk(first:last,1)./sigma_UK;
ret_std_Port=port_ret(first:last,1)./sigma_Port;
% Estimates correlations from a Constant Conditional Correlation (CCC) multivariate GARCH
Gamma=corr([ret_std_GER ret_std_US ret_std_UK]);
VaR_CCC=NaN(final-last,1);
for i=1:final-last
D=diag([vol_GER(i) vol_US(i) vol_UK(i)]);
VaR_CCC(i,1)=-norminv(p,0,sqrt(w'*D*Gamma*D*w));
end
Figure C3 shows the resulting VaR estimates from the three models, plus realized daily returns (the blue time series). Obviously the unconditional approach gives a constant 1% VaR, which is clearly and repeatedly violated only in the Summer of 2011. The remaining two models give similar VaR estimates that are visibly time-varying and (which debunks a myth often entertained) most of
56. This question implies that we estimate three GARCH(1,1) processes for the three indices. We omit those Matlab commands as they are identical to those already commented on in chapter 4.


the time less restrictive than the unconditional VaR. However, between the Summer and the Fall of 2011, in correspondence with the first bout of the European sovereign debt crisis, the univariate GARCH-based and the CCC VaRs drastically decline (i.e., the VaR becomes larger in absolute value) and, as a result, a few of the violations recorded with respect to the unconditional, normal-based VaR are avoided.

Figure C3: Recursive daily 1% VaR estimates from dierent alternative models

4. As instructed by the text of the workout, we have repeated question 3 with reference to the
same sample period, but this time comparing the daily, recursive, 1% VaR measures for: (i) a
Gaussian GARCH(1,1) VaR directly estimated on portfolio returns (i.e., a passive VaR measure); (ii) a trivariate Gaussian Constant Conditional Correlation GARCH (1,1) VaR in which
correlations are assumed to be constant and equal to the unconditional pair-wise correlations
between first-stage standardized residuals from appropriately defined GARCH(1,1) models
(and these are the same models already employed in Figure C3), and (iii) a trivariate Gaussian Dynamic Conditional Correlation GARCH(1,1) VaR in which correlations are estimated from an Exponential Smoothing, RiskMetrics-style model for the elements of the auxiliary matrix $\mathbf{Q}_t$:
options = optimset('fmincon');
options.Display = 'iter';
parm=0.5;
LB = 0;
UB = 1;
A = 1;
b = 0.998;
lambda = fmincon(@dcc_ES_3assets,parm,A,b,[],[],LB,UB,[],options,ret_std_GER,ret_std_US,ret_std_UK);


These lines of code perform manual fmincon-type estimation of a RiskMetrics-based DCC model that uses as an objective function the routine @dcc_ES_3assets that comes with Kevin Sheppard's GARCH package. Here LB is the lower bound of the region over which the search for the parameter lambda is performed, and UB is the upper bound.57 As printed on the Matlab screen, the estimated RiskMetrics
57. A = 1 means that the constraint is linear and proportional.


parameter is 0.998. Figure C4 shows the recursive daily 1% VaR estimated under the three models. The three models now all give roughly similar estimates, in spite of the important qualitative difference between the CCC, which assumes a constant correlation, and the DCC, which estimates a time-varying conditional correlation.

Figure C4: Recursive daily 1% VaR estimates from dierent alternative models including RiskMetrics-DCC

Figure C5: Estimation output from GARCH(1,1)-DCC(1,1) model


5. We estimate over the 2007-2010 sample a simple constant-mean GARCH(1,1)-DCC(1,1) model to filter the dynamics of the correlations of returns at daily frequency. The lines of code accomplishing these instructions use functions that were already employed in Appendix B. The estimates of the parameters are printed on the screen and are shown in Figure C5. The estimates obtained are rather typical, although the sum of the conditional covariance ($q_{ij,t}$) DCC coefficients (0.735) is rather low, and surely lower than what we have already reported in Appendix B. Such a low persistence of covariance also explains why in Figure C4 the differences between CCC and DCC estimates are modest at best. The dynamic, predicted correlations implied by the model are plotted for the period January 2011 - December 2012 in Figure C6.

Figure C6: Predicted dynamic conditional correlations from DCC model

The plot confirms, in spite of some peaks and troughs, the approximate constancy of pairwise correlations from the estimated DCC model, even over the out-of-sample 2011-2012 period. These predicted correlations are obtained from the lines of code:
% A function designed to compute conditional variances and standardized returns
[sigma2 z]=garchfor(garch_p,ret1);
[rho12_for rho23_for rho13_for Gamma_dyn_garch]= ...
dcc_mvgarch_for(par_dcc,sret_GER,sret_US,sret_UK,Corr_matrix(1,2), ...
Corr_matrix(1,3),Corr_matrix(2,3));
6. We compute and plot the 1% VaR from the DCC model of question 5 over the out-of-sample period January 1, 2011 - December 31, 2012 and compare it with the 1% VaR computed from the CCC and the RiskMetrics, exponentially smoothed DCC of questions 3 and 4. Figure C7 shows the 1% VaR from the CCC and from the two alternative implementations of the DCC. Apart from the very early part of the out-of-sample period, when the RiskMetrics-based DCC risk measures are lower (in absolute value) than those under CCC and GARCH-DCC, the models give very similar results. As seen in Figure C3, the real differences emerge with respect to the unconditional VaR and to passive measures of VaR that are based on the univariate time series of portfolio returns.

Figure C7: Recursive daily 1% VaR estimates from alternative CCC and DCC models

7. We estimate over the 2007-2010 sample a constant-mean Principal Component (also called Orthogonal) GARCH(1,1) model:
% ORTHOGONAL GARCH (or PC-GARCH) - estimation
z_pc=[z_GER(:,1) z_US(:,1) z_UK(:,1)];
[H_orth, parameters_orth, Ht_orth, stdresid_orth, stderrors_orth, A_orth, B_orth, ...
weights_orth, principalcomponets, cumR2_orth] = ...
o_mvgarch(z_pc,3,1,1);
This uses the toolbox function o_mvgarch, for which help information appears in the code. Figure C8 plots the predicted, one-day-ahead dynamic volatilities and correlations resulting from the PC/Orthogonal GARCH model.

Figure C8: Predicted dynamic conditional correlations from PC/Orthogonal GARCH model


Figure C9 compares instead the dynamic, conditional volatility with the time series we have derived from the DCC GARCH in question 5. Of course, the difference is given by the fact that the predicted volatilities in the case of a PC GARCH are obtained from the principal components of returns. The figure shows that the two series, with the exception of the first month of 2011, are quite similar, which indicates that an application of GARCH to principal components tends to give results that are close to those obtained from each of the return series individually.

Figure C9: Predicted dynamic volatilities from PC/Orthogonal vs. DCC GARCH models

8. We have recursively computed and plotted 1% VaR over the out-of-sample period January 1, 2011 - December 31, 2012 using both historical and weighted historical simulations with a rolling window of m = 252 days and, in the case of weighted historical simulations, a decay factor of 0.99. We apply the calculations to each index return series individually as well as to the portfolio returns:
m=253; % Length of the rolling window (approximately one year)
eta=0.99; % Decay parameter in weighted historical simulation
VaR_HS=NaN(final-last,4);
VaR_WHS=NaN(final-last,4);
for jj=1:4
for k=1:final-last
% Historical Simulation
VaR_HS(k,jj)=-quantile(RET_all(last+k-m:last+k,jj),p);
% Weighted Historical Simulation
[ret_sort I]=sort(RET_all(last+k-m:last+k,jj));
weight=(eta.^(m-I)*(1-eta)/(1-eta^m));
csum=cumsum(weight);
ind=rows(csum(csum<p));
if ind==0
VaR_WHS(k,jj)=-ret_sort(ind+1);
else
lambda=(p-csum(ind,1))./(csum(ind+1,1)-csum(ind));
VaR_WHS(k,jj)=-(lambda*ret_sort(ind+1,1)+(1-lambda)*ret_sort(ind,1));
end
end
end
Figure C10 shows the results for the individual stock markets, while Figure C11 refers to the equally-weighted portfolio. Clearly, for all national stock markets the late Summer and Fall of 2011 turned out to be difficult for these rather simple methods of VaR calculation. In both pictures we also notice that the VaRs estimated by simulation tend to be rather generous, i.e., to imply high levels for the risk measure, generally higher than what is required by 99% of all realized returns over long periods of time.

Figure C10: Recursive daily 1% VaR estimates from historical and weighted historical simulations

Figure C11: Recursive daily 1% VaR estimates from historical and weighted historical simulations

9. We estimate a BEKK-GARCH(1,1) model using the code:
first=datefind(datenum('01/03/2006'),date);
last=datefind(datenum('12/31/2007'),date);
[par_bekk, llk_bekk, Ht_bekk, likelihoods_bekk, stdresid_bekk, stderrors_bekk, A, B, ...
scores_bekk] = ...
full_bekk_mvgarch([ret_ger(first:last,1) ret_us(first:last,1) ret_uk(first:last,1)],1,1);
% Display estimation results
disp('FULL BEKK-GARCH(1,1) PARAMETERS');
disp(' Estimate Std. Error Robust t-stat');
disp([par_bekk diag(stderrors_bekk) (par_bekk./diag(stderrors_bekk))]);
As advised, we use Kevin Sheppard's full_bekk_mvgarch function, obtaining

Figure C12: Parameter estimates from BEKK GARCH(1,1) model


As a matter of fact, even on a relatively new and decent laptop, because of its many parameters,
estimation of a BEKK model may take up to 5 minutes.
References
[1] Alexander, C., 2001. Market Models. John Wiley: New York.
[2] Andersen T., T. Bollerslev, P. Christoffersen, and F. Diebold, 1998. Practical Volatility and Correlation Modeling for Financial Market Risk Management, in Risks of Financial Institutions, University of Chicago Press, pp. 513-548.
[3] Bauwens, L., S., Laurent, and J., Rombouts, 2006. Multivariate GARCH Models: A Survey,
Journal of Applied Econometrics, 21, 79-109.
[4] Comte, F., and O., Lieberman, 2003. Asymptotic theory for multivariate GARCH processes,
Journal of Multivariate Analysis, 84, 61-84.
[5] Engle, R., 2002. Dynamic Conditional Correlation: A Simple Class of Multivariate GARCH
Models, Journal of Business and Economic Statistics, 20, 339-350.
[6] Engle R., and F., Kroner, 1995. Multivariate Simultaneous Generalized ARCH, Econometric
Theory, 11, 122-150.
[7] Fama, E., and K., French, 1992. The Cross-Section of Expected Stock Returns, Journal of Finance, 47, 427-465.
[8] Gourieroux, C., 1997. ARCH Models and Financial Applications. Springer-Verlag: New York.
[9] Greene, W., 2008. Econometric Analysis, 6th Edition, Prentice Hall.
[10] Jeantheau T., 1998. Strong Consistency of Estimators for Multivariate ARCH Models, Econometric Theory, 14, 70-86.
[11] Hamilton, J., 1994. Time Series Analysis, Princeton University Press.
[12] Kroner F., K., and V., Ng, 1998. Modelling Asymmetric Comovements of Asset Returns,
Review of Financial Studies, 11, 817-844.
[13] Ledoit O., P., Santa-Clara, and M., Wolf, 2003. Flexible Multivariate GARCH Modeling
with an Application to International Stock Markets, Review of Economics and Statistics, 85,
735-747.
[14] Litterman R., and K. Winkelmann. (1998). Estimating Covariance Matrices, Goldman Sachs,
Risk Management Series.
[15] Tse Y., K., and A., KC, Tsui, 2002. A Multivariate GARCH Model with Time-Varying
Correlations, Journal of Business and Economic Statistics, 20, 351-362.


Errata Corrige
• (21/05/2013, p. 25) The formula for (matrix) covariance targeting in the DCC is:
$$\mathbf{Q}_{t+1} = (1 - \alpha - \beta)E[\mathbf{z}_t\mathbf{z}_t'] + \alpha\,\mathbf{z}_t\mathbf{z}_t' + \beta\,\mathbf{Q}_t.$$
• (29/05/2013, p. 12) A term $\frac{1}{2}\sum_{t=1}^{T}\ln(\sigma_{t+1}^2)$ had been omitted twice in the formulas appearing on that page. No qualitative effects for the conclusions on that page.

