
Financial Time Series Volatility Analysis Using Gaussian Process State-Space Models

by

Jianan Han

Bachelor of Engineering, Hebei Normal University, China, 2010

A thesis
presented to Ryerson University
in partial fulfillment of the
requirements for the degree of
Master of Applied Science
in the Program of
Electrical and Computer Engineering

Toronto, Ontario, Canada, 2015

© Jianan Han 2015
AUTHOR’S DECLARATION FOR ELECTRONIC SUBMISSION OF A
THESIS

I hereby declare that I am the sole author of this thesis. This is a true copy of the
thesis, including any required final revisions, as accepted by my examiners.

I authorize Ryerson University to lend this thesis to other institutions or individuals for
the purpose of scholarly research.

I further authorize Ryerson University to reproduce this thesis by photocopying or by


other means, in total or in part, at the request of other institutions or individuals for
the purpose of scholarly research.

I understand that my dissertation may be made electronically available to the public.

Financial Time Series Volatility Analysis Using Gaussian Process State-Space Models

Master of Applied Science 2015

Jianan Han

Electrical and Computer Engineering

Ryerson University

Abstract

In this thesis, we propose a novel nonparametric modeling framework for financial time
series data analysis, and we apply the framework to the problem of time-varying volatility
modeling. Existing parametric models have a rigid transition function form, and they often
suffer from over-fitting when model parameters are estimated using maximum likelihood
methods. These drawbacks affect the models' forecast performance. To solve this problem,
we take a Bayesian nonparametric modeling approach. By placing a Gaussian process prior
on the hidden state transition process, we extend the standard state-space model to a
Gaussian process state-space model, and we introduce our Gaussian process regression
stochastic volatility (GPRSV) model. Instead of using maximum likelihood methods, we use
Monte Carlo inference algorithms: both online particle filtering and offline particle
Markov chain Monte Carlo methods are studied to learn the proposed model. We demonstrate
our model and inference methods with both simulated and empirical financial data.

Acknowledgements
I sincerely thank all the people who have helped and supported me during my graduate
study at Ryerson University. Without their help I would never have been able to complete
this thesis.
First and foremost, I would like to express my deepest gratitude to my supervisor,
Dr. Xiao-Ping Zhang, whose expertise and enthusiasm for research have set an excellent
example for me. I appreciate all his inspiration, understanding, and patience. Thank
you for providing me with such a great research atmosphere.
I am also grateful to Dr. Alagan Anpalagan, Dr. Bobby Ma and Dr. Karthi
Umapathy. Thank you for your time, effort, and contributions to this work.
It has been a pleasure working with all my colleagues in the Communications and Signal
Processing Applications Laboratory (CASPAL), who are always willing to discuss and
share their knowledge with me. A very special thanks goes to Dr. Zhicheng Wei, without
whose motivation and encouragement I would not have considered graduate study in
Electrical Engineering.
Finally, my heartfelt thanks to my parents for their continuous support, understanding
and encouragement.

Contents

Declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

1 Introduction 1
1.1 Motivation and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Volatility Modeling Literature Review . . . . . . . . . . . . . . . . . . . . 3
1.2.1 GARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Stochastic Volatility Models . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Organization of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Background 11
2.1 Financial Time Series Data . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Volatility Modeling Preliminaries . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Bayesian Nonparametric Framework . . . . . . . . . . . . . . . . . . . . . 15
2.4 Gaussian Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.1 Gaussian Process Regression . . . . . . . . . . . . . . . . . . . . . 17
2.4.2 GP for Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 State-Space Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Gaussian Process Regression Stochastic Volatility Model 23
3.1 GPRSV Models Framework . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 GP-SSM for Financial Time Series Modeling . . . . . . . . . . . . . . . . 25
3.3 Model Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Model Building Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 GPRSV with Exogenous Factors . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 GPRSV Models Inference 37


4.1 Bayesian Inference for State-Space Models . . . . . . . . . . . . . . . . . 37
4.2 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2.1 Sequential Monte Carlo Methods . . . . . . . . . . . . . . . . . . 40
4.2.2 Particle MCMC methods . . . . . . . . . . . . . . . . . . . . . . . 42
4.3 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5 Volatility Analysis with GPRSV Models 45


5.1 Simulated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Empirical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.1 Volatility Forecast Evaluation . . . . . . . . . . . . . . . . . . . . 50
5.2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

6 Conclusion and Future Work 57

References 63

List of Tables

2.1 Descriptive Statistic of S&P 500 Daily Return Data . . . . . . . . . . . . 12

3.1 Descriptive Statistic of GE Daily Return Data . . . . . . . . . . . . . . . 34

5.1 Descriptive Statistic of IBM Daily Return Data . . . . . . . . . . . . . . 52


5.2 Estimated GPRSV Model Hyper-parameters Results for IBM Daily Re-
turn Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Results of IBM Volatility Forecast Using Loss Functions . . . . . . . . . 55

List of Figures

2.1 Standard & Poor 500 Index Return and Close Price Data . . . . . . . . 13
2.2 Graphical Model Representation of Standard Gaussian Process Regression 18
2.3 Graphical Model Representation of State-Space Model . . . . . . . . . . 21

3.1 Graphical Model for Gaussian Process Regression Stochastic Volatility Model 24
3.2 Graphical Model Representation of a Gaussian Process State-Space Model 26
3.3 Flowchart of Volatility Model Building Process . . . . . . . . . . . . . . . 29
3.4 GE Daily Return Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Sample ACF and PACF Functions Plot for GE Daily Returns . . . . . . 33
3.6 Graphical Model for Gaussian Process Regression Stochastic Volatility
Model with Exogenous Factor . . . . . . . . . . . . . . . . . . . . . . . . 35

5.1 Return and Variance Values of Simulated Data . . . . . . . . . . . . . . 46


5.2 Estimated Hidden States Densities of Simulated Data . . . . . . . . . . . 47
5.3 Results of Simulated Data Hyper-parameters learned from RAPCF . . . 48
5.4 Predictive Log-likelihood value of Simulated Data . . . . . . . . . . . . . 49
5.5 Results of IBM Daily Data Volatility Analysis . . . . . . . . . . . . . . . 53

Chapter 1

Introduction

1.1 Motivation and Objective


Financial time series data analysis is one of the most studied areas in financial economics
research, and it is a highly empirical discipline. Both academic researchers and finance
market practitioners are interested in questions such as: What is the mechanism of the
financial market? What are the determinants of asset price changes? To answer
these questions we need a proper way to describe the market and the vast data it
generates. The financial market is a huge, complex system driven by many factors
such as political, corporate, and individual decisions. Financial data contain both
meaningful information and random noise, so from an information-processing point of view
it is natural to take a statistical modeling approach to the problem of financial time
series analysis. Over the last few decades we have seen many similar applications in
engineering, and in recent years there has been a dramatic growth of statistical models
and related techniques used in finance as well. In this research, we apply engineering
modeling techniques to financial time series data: we take the same methodology and
utilize the same set of mathematical tools as when we process other signals such as
audio, images, and video in engineering applications.
In this research, we explore the problem of financial time series analysis
using Bayesian nonparametric (BNP) models. One key feature of financial time series
data is that there exist certain levels of uncertainty in the data [52]. For example,
asset volatility is not directly observed, and the data are generally corrupted by noise.
As a result, we can use probability theory and related methods to express aspects of these
uncertainties in our models. Another challenge in financial time series modeling is
the huge amount and rapid growth of the data. Ideally, a model should be adaptive enough
to handle this: if the forecast output of the model is a probability distribution, then
as more data arrive we should increase the probability of the events that actually happened
[21]. Based on the above requirements, we prefer a model that is flexible and robust
enough to fit financial time series data. There is a very large body of current
research on ways of doing approximate Bayesian machine learning [17], and the Bayesian
nonparametric framework can provide an appropriate platform on which to approach
massive data analysis [10].
Volatility plays a uniquely important role in the financial market, and modeling
volatility has become an increasingly important task in financial time series research. The
main objective of this research is to apply the Bayesian nonparametric modeling framework
to the analysis of conditional volatility. The volatility of an asset return series is an
important factor in measuring risk. Because volatility describes the magnitude and speed of
a time series' fluctuations, it can be interpreted as the variability of a financial time series.
Although volatility is not the same as risk, its importance in conveying the uncertainty
involved in investment decisions makes it one of the most important variables. There
are three main purposes of modeling asset volatility:

Risk Management: the potential future losses of an asset must be measured, because they
account for a large part of risk management. When we calculate these losses, future
volatilities are needed as an input.

Portfolio Optimization: in the standard approach of Markowitz [38], risk is minimized
for a given level of expected return, and an estimate of the variance-covariance matrix
is required to proxy the risk. As in the risk management application, the volatility
of each asset in the portfolio is crucial for optimizing the portfolio.

Option Pricing: option traders try to develop their own volatility trading strategies
and, based on them, compare their estimate of an option's value with the market
price; they can thereby take bets on future volatility. This is perhaps the most
challenging application. Since the Chicago Board Options Exchange (CBOE) introduced
the ground-breaking volatility index VIX in 1993 [44], many investors
worldwide have considered the VIX the world's premier barometer of investor sentiment
and market volatility. Because of its importance, the volatility index of a market
has itself become a financial instrument: the VIX volatility index has been traded in
futures since March 26, 2004.

1.2 Volatility Modeling Literature Review

In this section, we review the main parametric volatility models, their advantages and
disadvantages, and techniques for estimating model parameters. Some alternative approaches
for analyzing volatility are presented as well. Volatility is often expressed as the
conditional standard deviation of the asset return. In Equation (1.1), r_t denotes the
return of an asset at time t, and I_{t−1} denotes all the information available up to time t − 1.
The expected value μ_t and variance σ_t^2 of the return series are:

μ_t = E(r_t | I_{t−1})    (1.1a)

σ_t^2 = Var(r_t | I_{t−1}) = E[(r_t − μ_t)^2 | I_{t−1}]    (1.1b)

There are two general categories of volatility models: generalized autoregressive
conditional heteroscedasticity (GARCH) models and stochastic volatility (SV) models.
Models in the first category describe the evolution of σ_t^2 as an exact function of the
variables available up to time t − 1, while those in the second category assume that a
stochastic process governs σ_t^2. Both types of models share the same structure,
which can be expressed as:

r_t = μ_t + a_t    (1.2a)
σ_t^2 = Var(r_t | I_{t−1}) = Var(a_t | I_{t−1})    (1.2b)

where a_t is called the innovation of the asset return at time t.

1.2.1 GARCH Models

The autoregressive conditional heteroscedasticity (ARCH) model [16] was first introduced by
Engle in 1982. An ARCH(p) model can be specified as follows:

a_t = σ_t ε_t    (1.3a)
σ_t^2 = α_0 + α_1 a_{t−1}^2 + ... + α_p a_{t−p}^2 = α_0 + Σ_{i=1}^{p} α_i a_{t−i}^2    (1.3b)

where a_t is the innovation of the asset return at time t, and ε_t is assumed to be a sequence
of independent and identically distributed (i.i.d.) random variables with zero mean and
unit variance. α_0 and α_1, ..., α_p are model parameters, with α_0 > 0 and α_i ≥ 0 for i > 0.
ε_t is often assumed to follow the standard Gaussian distribution, a generalized
error distribution (GED), or a standardized Student-t distribution.
The ARCH model is the first systematic framework for volatility modeling, and it gives a
good way to describe features of asset return series such as volatility clustering. The
ARCH model is not only suitable for asset return data but also works well with other
financial time series. Since the introduction of the ARCH model, many variants and
extensions have been proposed. Bollerslev extended the model into the generalized
autoregressive conditional heteroscedasticity (GARCH) model [6]. Similar
to (1.3), the GARCH(p,q) model can be summarized as follows:

a_t = σ_t ε_t    (1.4a)
σ_t^2 = α_0 + Σ_{i=1}^{p} α_i a_{t−i}^2 + Σ_{j=1}^{q} β_j σ_{t−j}^2    (1.4b)

where a_t, σ_t and ε_t have the same meaning as in (1.3). The GARCH(1,1) model with ε_t
following a standard Gaussian distribution is easy to estimate and widely used in many
real-world financial applications. We can simplify (1.4) to obtain the GARCH(1,1) with
Gaussian innovations:

a_t ~ N(0, σ_t^2)    (1.5a)
σ_t^2 = α_0 + α_1 a_{t−1}^2 + β σ_{t−1}^2    (1.5b)
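As an illustration, the GARCH(1,1) recursion in (1.5) can be simulated in a few lines. The sketch below assumes Gaussian innovations and illustrative parameter values (α_0 = 0.05, α_1 = 0.10, β = 0.85); the function name and defaults are ours, not part of any volatility library.

```python
import numpy as np

def simulate_garch11(n, alpha0=0.05, alpha1=0.10, beta=0.85, seed=0):
    """Simulate n returns from the GARCH(1,1) of Eq. (1.5) with Gaussian innovations."""
    rng = np.random.default_rng(seed)
    a = np.zeros(n)        # innovations a_t
    sigma2 = np.zeros(n)   # conditional variances sigma_t^2
    # start from the unconditional variance alpha0 / (1 - alpha1 - beta)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta)
    a[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * a[t - 1] ** 2 + beta * sigma2[t - 1]
        a[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return a, sigma2
```

Because α_1 + β < 1 here, the process is covariance-stationary and σ_t^2 mean-reverts to α_0/(1 − α_1 − β).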

Although ARCH and GARCH are good candidate models for representing the properties of
financial asset return series, such as volatility clustering, they are not perfect. One
weakness is that neither model can handle the leverage effect: empirical financial data
show that volatility tends to react differently to positive and negative return shocks.
This asymmetry in the volatility equation is not captured by the ARCH and GARCH models.
Several GARCH extensions have been developed to fix this problem. In 1991,
Nelson [43] proposed the Exponential GARCH (EGARCH) model:
log(σ_t^2) = α_0 + Σ_{j=1}^{q} α_j g(a_{t−j}) + Σ_{i=1}^{p} β_i log(σ_{t−i}^2)    (1.6a)

g(a_t) = θ a_t + λ|a_t|    (1.6b)

In Nelson's model, the logarithm of σ_t^2 is modeled instead of σ_t.


Another popular GARCH extension, the threshold GARCH model [22], was introduced by
Glosten et al. in 1993; see also Zakoian [56]. The model is also called GJR-GARCH:

σ_t^2 = α_0 + β σ_{t−1}^2 + γ a_{t−1}^2 H_{t−1}    (1.7a)

H_{t−1} = 0 if a_{t−1} ≥ 0;  H_{t−1} = 1 if a_{t−1} < 0    (1.7b)

where H_{t−1} is the threshold function, and α_0, β and γ are model parameters.


The last GARCH-type model we discuss in this section is the Markov regime-switching
GARCH model. The idea of regime-switching models for economic data analysis was
introduced at least three decades ago; see [25], [24], and [26] for details of
regime-switching models. The literature on Markov regime-switching GARCH models
(MRS-GARCH) begins with Cai [8]. Marcucci [37] compared a set of
standard GARCH models with a group of Markov regime-switching GARCH models.
The main feature of MRS-GARCH is that it allows the parameters to switch across
different regimes according to a Markov chain process. If we denote the regime variable
by s_t, the transition probability is:

Pr(s_t = j | s_{t−1} = i) = p_ij    (1.8)

Usually we assume there are two regimes, and the matrix form of Equation (1.8) is:

P = [ p11  p21 ]  =  [   p     (1 − q) ]
    [ p12  p22 ]     [ (1 − p)    q    ]    (1.9)

If the regime variable s_t takes the value i, then the conditional mean and the conditional
variance can be expressed in GARCH(1,1)-like form:

r_t = μ_t^(i) + σ_t^(i) ε_t    (1.10a)
a_t = r_t − μ_t^(i)    (1.10b)
h_t^(i) = α_0^(i) + α_1^(i) a_{t−1}^2 + β_1^(i) h_{t−1}    (1.10c)

where h_t denotes the conditional variance, so we have h_t = σ_t^2.
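To make Equations (1.8)-(1.10) concrete, the following sketch simulates a two-regime MRS-GARCH(1,1) path. All names and parameter values are illustrative; for convenience the transition matrix is stored row-wise (row i holds Pr(s_t = j | s_{t−1} = i)), and the path-dependence subtleties treated in the MRS-GARCH literature are deliberately ignored.

```python
import numpy as np

def simulate_mrs_garch(n, P, alpha0, alpha1, beta1, mu, seed=0):
    """Illustrative two-regime MRS-GARCH(1,1) path (Eqs. 1.8-1.10).
    P[i][j] = Pr(s_t = j | s_{t-1} = i); the remaining arguments are
    per-regime parameter sequences indexed by the regime i."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    r = np.zeros(n)              # returns r_t
    h = np.zeros(n)              # conditional variances h_t
    s = np.zeros(n, dtype=int)   # regime path s_t
    h[0] = alpha0[0] / (1.0 - alpha1[0] - beta1[0])
    a_prev = 0.0
    for t in range(1, n):
        s[t] = rng.choice(2, p=P[s[t - 1]])                               # Eq. (1.8)
        i = s[t]
        h[t] = alpha0[i] + alpha1[i] * a_prev ** 2 + beta1[i] * h[t - 1]  # Eq. (1.10c)
        r[t] = mu[i] + np.sqrt(h[t]) * rng.standard_normal()              # Eq. (1.10a)
        a_prev = r[t] - mu[i]                                             # Eq. (1.10b)
    return r, h, s
```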


The parameters of GARCH-class models can be learned using maximum likelihood
methods. There are many papers on the topic, and many software environments provide
implementations of the algorithms; see, for example, Kevin Sheppard's Matlab code in
the Oxford MFE Toolbox 1. In [7], the authors explore the volatility forecasting
performance of GARCH family models. GARCH-type models have two advantages:
analytical tractability and flexibility in describing empirically observed features of
asset returns.

1.2.2 Stochastic Volatility Models

Stochastic volatility (SV) models differ from GARCH-type models in how the conditional
volatility evolves over time. For SV models, the volatility equation is expressed as a
stochastic process, which means the value of the volatility at time t is latent and
unobservable, while for GARCH and its extensions this value is completely determined
by the information up to time t − 1, which we defined above as I_{t−1}. For example, Hull and
White [27] replaced the constant volatility of the Black-Scholes option-pricing formula [4]
with a stochastic process.
The first discrete-time stochastic volatility model was introduced by Taylor,
see [48] [50] [49]. The logarithm of the variance is modeled by a latent AR(1) process.

1
The toolbox can be downloaded from https://www.kevinsheppard.com/MFE_Toolbox. Earlier
versions were called the UCSD GARCH toolbox.

Taylor's stochastic model can be presented as:

r_t = μ_t + a_t = μ_t + σ_t ε_t    (1.11a)
log(σ_t^2) = α_0 + α_1 log(σ_{t−1}^2) + σ_η η_t    (1.11b)

where α_1 is a parameter controlling the persistence of the logarithm of the variance; its
value lies in (−1, 1). There are two independent and identically distributed random
variables, ε_t and η_t. The original SV model assumes these two noise terms to be
i.i.d. normally distributed. More recently, some researchers have introduced the idea of
making ε_t and η_t negatively correlated: corr(ε_t, η_t) < 0. In this way, the SV model can
react in an asymmetric fashion to return shocks. This is similar to the way EGARCH extends
the GARCH model to reflect empirical observations of financial return series.
The inference of SV model parameters is not as straightforward as for the corresponding
simple GARCH-type model. In [47], Shephard reviewed SV models and inference methods
such as the method of moments (MM) and quasi-maximum likelihood (QML). Simulation-based
methods for learning SV models have become more and more popular because of their accuracy
and their flexibility in handling complicated models.
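Taylor's model (1.11) is easy to simulate, which is useful for testing inference algorithms against a known ground truth. The sketch below assumes μ_t = 0 and independent Gaussian innovations ε_t and η_t; the function name and parameter values are illustrative.

```python
import numpy as np

def simulate_sv(n, alpha0=-0.2, alpha1=0.95, sigma_eta=0.2, seed=0):
    """Simulate Taylor's SV model (Eq. 1.11) with mu_t = 0: the log-variance
    follows a latent AR(1), and returns are Gaussian given the variance."""
    rng = np.random.default_rng(seed)
    log_s2 = np.zeros(n)
    log_s2[0] = alpha0 / (1.0 - alpha1)   # stationary mean of the AR(1)
    r = np.zeros(n)
    r[0] = np.exp(0.5 * log_s2[0]) * rng.standard_normal()
    for t in range(1, n):
        log_s2[t] = alpha0 + alpha1 * log_s2[t - 1] + sigma_eta * rng.standard_normal()
        r[t] = np.exp(0.5 * log_s2[t]) * rng.standard_normal()
    return r, np.exp(log_s2)
```

Unlike GARCH, σ_t^2 here is not a deterministic function of past returns: it carries its own noise η_t, which is precisely what makes the volatility latent.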

1.2.3 Alternative Approaches

Besides GARCH and SV models, there are some alternative approaches to the conditional
volatility modeling problem. Here we discuss some of these methods: using high-frequency
data, and using open, high, low, and close prices. With high-frequency data, for example,
when modeling daily conditional volatility we can use intra-day (say, 5-minute or
10-minute) data to calculate the daily volatility. This approach is sometimes also called
realized volatility. We will elaborate on this later, when realized volatility is used as
a proxy for the real volatility.
Another approach is called implied volatility, and it is related to option trading
problems. If we assume that prices are governed by some econometric model, for example
the Black-Scholes equation, we can use the price to calculate the "implied" volatility.
Experience shows that implied volatility is often larger than the value produced by a
GARCH-type model. The VIX of the CBOE is an example of implied volatility. The
calculation of the VIX is based on the following equation (see [14] for more details):

σ^2 = (2/T) Σ_i (ΔK_i / K_i^2) e^{RT} Q(K_i) − (1/T) [F/K_0 − 1]^2    (1.12)

where σ is VIX/100, T is the time to expiration, F is the forward index level derived from
index option prices, and K_0 is the first strike below the forward index level F. K_i is the
strike price of the i-th out-of-the-money option, R is the risk-free interest rate to
expiration, and Q(K_i) is the midpoint of the bid-ask spread for the option with strike K_i.
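Equation (1.12) can be evaluated directly once the out-of-the-money strikes and quote midpoints are available. The sketch below is only a simplified reading of the formula: the strike increments ΔK_i are approximated as half the distance between neighbouring strikes (one-sided at the endpoints), and the precise strike-selection and interpolation rules of the CBOE methodology [14] are not reproduced.

```python
import math

def vix_style_variance(T, R, F, K0, strikes, quotes):
    """Simplified evaluation of Eq. (1.12).
    strikes: sorted out-of-the-money strikes K_i
    quotes:  bid-ask midpoints Q(K_i), one per strike."""
    n = len(strikes)
    total = 0.0
    for i in range(n):
        if i == 0:                       # one-sided increment at the lower end
            dK = strikes[1] - strikes[0]
        elif i == n - 1:                 # one-sided increment at the upper end
            dK = strikes[-1] - strikes[-2]
        else:                            # central increment for interior strikes
            dK = (strikes[i + 1] - strikes[i - 1]) / 2.0
        total += (dK / strikes[i] ** 2) * math.exp(R * T) * quotes[i]
    return (2.0 / T) * total - (1.0 / T) * (F / K0 - 1.0) ** 2
```

The VIX itself is then 100·σ, with σ^2 computed for the near- and next-term expirations and interpolated to a 30-day horizon.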

1.3 Main Contributions

In this thesis, we propose a novel nonparametric model which we call the Gaussian
process regression stochastic volatility (GPRSV) model. We use the GPRSV model to solve
the problem of modeling and forecasting the time-varying variance of financial time series
data. For standard econometric volatility models (including both the GARCH and SV classes),
forecast performance is limited by the rigid linear form of the transition function.
Moreover, the model parameters are usually learned by maximum likelihood methods, which
can lead to over-fitting. We apply recent developments in Bayesian nonparametric modeling
to unblock this bottleneck in financial time series volatility modeling. Gaussian process
regression stochastic volatility models describe the dynamic behavior of financial time
series more naturally.
The second contribution of this research is the development of algorithms to learn
the proposed models. We apply recently developed learning algorithms to the GPRSV models,
using tailored sequential Monte Carlo and particle Markov chain Monte Carlo methods to
jointly learn the hidden state trajectory and the Gaussian process hyper-parameters.
Most previous work on state-space model inference has taken the approach of separating
hidden-state filtering from parameter estimation. A GPRSV model is usually more difficult
to estimate than a GARCH or SV model. By taking a fully Bayesian nonparametric approach
we learn the distribution of the hidden states, so our inference method is free of the
over-fitting problems that arise when maximum likelihood methods are used for traditional
parametric models.

1.4 Organization of Thesis
In Chapter 2, we describe the background of this thesis: the characteristics of financial
time series data, the preliminaries of volatility modeling, and the basics of Bayesian
nonparametric models. We also present fundamental knowledge of Gaussian processes and
state-space models.
In Chapter 3, we propose our Gaussian process regression stochastic volatility models.
We discuss the model structure, the process of building a GPRSV model, and the
introduction of exogenous factors to improve forecasting performance.
In Chapter 4, we discuss how to learn Gaussian process regression stochastic volatility
models. We introduce a novel estimation approach that learns the hidden volatility
and the model's hyper-parameters together. Monte Carlo methods are provided to learn the
nonparametric models.
In Chapter 5, we conduct experiments to demonstrate the advantages of the proposed
modeling approach. Both simulated and empirical financial data are tested using our GPRSV
model and a tailored sequential Monte Carlo algorithm.
In Chapter 6, we conclude our work and discuss future directions for this research.

Chapter 2

Background

In this chapter, we present the background of this research in two parts. The first part
covers the characteristics of the data sets we study and the preliminaries of volatility
modeling. The second part covers the methodology we are going to use: the Bayesian
nonparametric framework, Gaussian processes, and state-space models.

2.1 Financial Time Series Data


Time series data are collected through time: a time series is a sequence of measurements
z_t ∈ R indexed by time t. Time series can be in discrete or continuous form; a discrete
time series can be viewed as a special case of a continuous one, since measuring a
continuous time series exactly once per unit time yields a discrete series. This is often
called uniform sampling [53]. For simplicity we often, but not always, focus on discrete
time series. Another classification of time series is univariate versus multivariate.
Throughout this thesis, we mainly discuss the problem of discrete univariate time series
analysis. The objective of time series analysis is to use theory and methods to extract
meaningful statistics and other characteristics from the data. Time series analysis is
widely used in many real-world applications in science, economics, and engineering.
Financial time series analysis is a highly empirical discipline, concerned with the theory
and practice of how asset valuations change over time. In financial time series research,
people usually analyze asset returns instead of prices [9]. See Figure 2.1 for the
differences between asset return and price data. The data we use here are Standard & Poor
500 (S&P 500) index data from January 3, 1990 to July 21, 2009. Clearly we can observe
that consecutive prices are highly correlated and that the variance increases with time.
The return series (in percent) shown in Figure 2.1 and Table 2.1 is defined as:

r_t = 100[log(p_t) − log(p_{t−1})]    (2.1)

where r_t is the return at time t and p_t is the asset price at time t. In Table 2.1, we
give the descriptive statistics of the data.

Table 2.1: Descriptive Statistics of S&P 500 Daily Return Data

Mean     Std. Deviation   Skewness   Kurtosis   Min       Max
0.0198   1.1761           −0.1936    12.2642    −9.4695   10.9572
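Equation (2.1) and statistics in the style of Table 2.1 are straightforward to compute from a price series; a minimal sketch (function names are ours):

```python
import numpy as np

def log_returns_pct(prices):
    """Percentage log returns r_t = 100 * [log(p_t) - log(p_{t-1})] (Eq. 2.1)."""
    p = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(p))

def describe(returns):
    """Descriptive statistics in the style of Table 2.1."""
    r = np.asarray(returns, dtype=float)
    m = r.mean()
    s = r.std(ddof=1)
    return {
        "mean": m,
        "std": s,
        "skewness": np.mean((r - m) ** 3) / s ** 3,
        "kurtosis": np.mean((r - m) ** 4) / s ** 4,  # raw, not excess
        "min": r.min(),
        "max": r.max(),
    }
```

The kurtosis here is the raw fourth-moment ratio, under which a Gaussian gives 3; a value such as the 12.26 in Table 2.1 then signals heavy tails.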

2.2 Volatility Modeling Preliminaries


The idea behind volatility modeling is to express the relationship between the return and
the volatility, and how these two processes evolve over time. Before we go on to discuss
Bayesian nonparametric volatility models, there are several preliminaries and background
topics we need to present in this chapter. Volatility modeling and forecasting is an
important task in financial markets; the field was born about 30 years ago, when Engle
introduced the autoregressive conditional heteroscedasticity (ARCH) model in 1982. Both
academics and practitioners are interested in this problem.
A distinctive feature of asset returns is that volatility cannot be directly observed from
the return data. For example, the daily data used to plot Figure 2.1 consist of 4895
observations of S&P 500 daily returns: there is only one observation per trading day
unless we use intra-day data. Even if we use intra-day data, we can only estimate part of
the volatility; the remaining part is the overnight volatility, about which intra-day data
provide very little information. This unobservability of volatility makes evaluating the
forecasts of candidate models difficult.

Figure 2.1: Standard & Poor 500 index return and close price data from January 1,
1990 to July 21, 2010
Besides the hidden feature, there are some other characteristics which are commonly
observed in asset return series.
• Heteroscedastic: the volatility of asset return is not constant through time. This is
also called Heteroscedastictiy. The most common phenomenon of this heteroskedas-
tic effect is the example of launching rocket. For asset return, the value of this
conditional volatility is time varying.

• Volatility Clustering: it is widely accepted that the asset return tend to exist clus-
ters for volatility, which also means there is some period the market is with high
volatility and there is some period with lower volatility. In 1963, Mandelbrot [36]
pointed out that ”large changes tend to be followed by large changes, of either sign,
and small changes tend to be followed by small changes.”

• Asymmetric Effect: rich empirical observations of financial asset returns show that
volatility tends to react differently to positive and negative returns. This is one
important characteristic that the early models such as ARCH, GARCH and basic SV all
fail to capture, and there are many ways of modifying those models to deal with
this asymmetric effect. For example, the EGARCH model was introduced to fix
this problem within the GARCH framework. For SV models, one possible
solution is to drop the independence of εt and ηt in Equation (1.11) and make
the correlation of the two innovations negative: corr(εt , ηt ) < 0.

• Heavier Tails: volatility models should account for the fact that asset returns are not
normally distributed. Rich evidence shows that financial asset returns exhibit
heavy tails and high peakedness. Even in GARCH models, where we assume that returns
are conditionally normally distributed, the unconditional (marginal) distribution
can be represented as a mixture of normal distributions, and the tails of such a
mixture turn out to be heavier than those of a single normal distribution.

• Stationarity: volatility usually changes within some fixed range, and it evolves over
time in a continuous manner; sudden jumps are rare for most asset returns. Before
modeling a return series, we can use statistical tests to check the stationarity
of the series.

2.3 Bayesian Nonparametric Framework
The Bayesian approach to data analysis and machine learning is based on using prob-
ability to represent all forms of uncertainty [41]. The process flow can be summarized
as:

• Define model : we express the qualitative aspects of the system by defining random
variables, their distributional forms, and independence assumptions. We also
specify prior probability distributions for the unknown parameters.

• Collect data: given the collected data, we can compute the posterior probability
distribution of the unknown parameters.

• Make decisions: with the posterior we can draw scientific conclusions, predict future
output by averaging over the posterior distribution, and make decisions that minimize
expected loss.

The Bayes’ Rule for modeling:

P (θ|D) = P (θ)P (D|θ)/P (D)

where P (D|θ) is the likelihood of the unknown parameters θ, P (θ) is the prior probability
of θ, and P (θ|D) is the posterior of θ given data D. We can also write this relationship as:

P (parameters|data) ∝ P (parameters)P (data|parameters)

or
Posterior ∝ Prior × Likelihood

We can see from the above that choosing a suitable prior is very important in Bayesian
modeling. In parametric models, a finite set of parameters is assumed. Given the parameters,
future predictions are independent of the observed data, so the unknown parameters
capture everything about the data. The capability of the model is bounded even if the
amount of data is unbounded [21]. Nonparametric models, in contrast, assume an
infinite-dimensional parameter θ. Nonparametric models make fewer assumptions
about the dynamics, and thereby the data drive the complexity of the model
[17]. We can think of θ as a function instead of a vector. “Nonparametric” does not
mean there are no parameters, but rather infinitely many of them; the infinite-dimensional
θ often takes the form of a function. From an information-channel viewpoint, all models
are like information channels, with the past data D as input and the future prediction y as
output. For parametric models, given the model parameters, future predictions y are
independent of the observed data: the model’s complexity, or the channel’s capacity, is
bounded. That is to say, the parameters constitute a bottleneck in the channel. Nonparametric
models are free of this bottleneck: with more data D, θ can capture more information.
To make predictions, nonparametric models need to process a growing
amount of training data D. This information-channel view of nonparametric modeling
was first pointed out by Ghahramani in [21].
As presented in a recent report [10], “big data” arises in many areas: terabytes, and in
some cases petabytes, of data are generated. This rapid growth heralds an era
of “data-centric science”. The Bayesian nonparametric modeling framework is an adaptive,
robust and flexible way of analyzing data, and it is a promising technique for the
problem of “big data” analysis.
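The posterior ∝ prior × likelihood update can be made concrete with a small conjugate example. The sketch below, with an illustrative Beta prior on a Bernoulli success probability, is our own choice and not from the thesis:

```python
def beta_bernoulli_update(alpha, beta, data):
    """Conjugate Bayesian update: with a Beta(alpha, beta) prior on a success
    probability and Bernoulli 0/1 observations, the posterior is again a Beta
    whose parameters count the observed successes and failures."""
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes

# Flat Beta(1, 1) prior updated with four observations (three successes).
a, b = beta_bernoulli_update(1, 1, [1, 0, 1, 1])
posterior_mean = a / (a + b)   # posterior mean of the success probability
```

With more observations the posterior concentrates, which is exactly the sequential updating described above.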

2.4 Gaussian Process


The Gaussian process (GP), together with the Dirichlet process (DP), constitutes a fundamental
tool in Bayesian nonparametrics. A Gaussian process places a distribution over functions,
while a Dirichlet process is a distribution over distributions. Here we give a compressed
introduction to Gaussian processes; more detailed discussions can be found in the
textbook of Rasmussen and Williams [45] and the paper of Neal [42]. The Gaussian process was
first introduced in the statistics community as kriging. Most probability distributions are
over finite-dimensional objects (scalars, vectors), while functions are infinite dimensional,
so a Gaussian process can be viewed as an extension of the multivariate Gaussian distribution
to infinite dimensions. Just as a Gaussian distribution is specified by a mean vector and
a covariance matrix, a Gaussian process is determined by a mean function and a covariance
function. This leads to the definition of a Gaussian process:

f ∼ GP(m(x), k(x, x0 )) (2.2)

Here this notation says that the function f is drawn from a Gaussian process; m(x) and
k(x, x0 ) are the mean function and covariance function. The covariance function k(x, x0 )
is also called the kernel function.
From the definition we can see that any finite subset of sampled function values y
from the process follows a multivariate Gaussian distribution. The mean function is often
set to zero, m(x) = 0, in many Gaussian process applications, but this is not the case
for our Gaussian process regression stochastic volatility (GPRSV) model in Chapter 3.
In the GPRSV model, the mean functions are not simply assumed to be zero but are
adjusted to the specific application requirements; we discuss the reason in detail in
that chapter. The covariance function k(x, x0 ) measures the “similarity” between inputs
x and x0 . The parameters in the mean and covariance functions are called hyper-parameters.
These hyper-parameters control the sampled function’s properties: smoothness, input and
output scales and so on (see [53]). One of the most widely used covariance functions is the
squared-exponential (SE) function:

k(x, x0 ) = γ exp(−|x − x0 |2 /(2l2 )) (2.3)
where γ and l are the hyper-parameters for the covariance function.
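As a numerical sketch (under our own function and variable names, assuming NumPy), the SE covariance of Equation (2.3) can be evaluated on a grid and a few functions f ∼ GP(0, k) drawn from the resulting multivariate Gaussian:

```python
import numpy as np

def se_kernel(x1, x2, gamma=1.0, length=1.0):
    """Squared-exponential covariance k(x, x') = gamma * exp(-|x - x'|^2 / (2 l^2))."""
    diff = x1[:, None] - x2[None, :]
    return gamma * np.exp(-0.5 * (diff / length) ** 2)

x = np.linspace(-5.0, 5.0, 100)
K = se_kernel(x, x) + 1e-8 * np.eye(len(x))   # small jitter for numerical stability
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(len(x)), K, size=3)  # three draws from GP(0, k)
```

A larger `length` hyper-parameter gives smoother draws, while `gamma` sets the output scale, matching the role of the hyper-parameters described above.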

2.4.1 Gaussian Process Regression

Gaussian processes can be used for non-linear regression problems in many fields, such
as machine learning, statistics and engineering. In this thesis we apply this
useful tool to financial time series data analysis. The objective of a non-linear regression
problem is to find how to express y using the covariates x. We describe the relationship
with the following equation:

yi = f (xi ) + εi (2.4)

where f is called the regression function, and εi are the random residuals, which are assumed
to be i.i.d. Gaussian distributed with mean zero and constant variance σ 2 . A Gaussian
process provides a powerful way to perform Bayesian inference about such functions.
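The GP regression posterior for the model in Equation (2.4) has a standard closed form. The sketch below is a minimal implementation under our own naming, with the SE kernel of Equation (2.3) and a fixed noise level rather than learned hyper-parameters:

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, gamma=1.0, length=1.0, noise=0.1):
    """Posterior mean and covariance of f at x_test for y_i = f(x_i) + eps_i,
    with f ~ GP(0, k_SE) and eps_i ~ N(0, noise^2)."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return gamma * np.exp(-0.5 * (d / length) ** 2)
    K = k(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    K_s = k(x_test, x_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = k(x_test, x_test) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Illustrative data: a noise-free sine curve observed at 20 points.
x_tr = np.linspace(0.0, 2.0 * np.pi, 20)
y_tr = np.sin(x_tr)
mean, cov = gp_posterior(x_tr, y_tr, np.array([np.pi / 2.0]))
```

Near the training inputs the posterior mean tracks the data and the posterior variance shrinks, which is the sense in which the GP performs Bayesian inference about the unknown function.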


Figure 2.2: Graphical model representation of standard Gaussian process regression,
where xt is the input at time t and yt is the output of the regression process at time t.
The Gaussian process regression function value is ft . The thick horizontal line represents
fully connected nodes.

2.4.2 GP for Time Series

In [53], Turner pointed out that there are two approaches to Gaussian process time
series modeling: Gaussian process time series (GPTS) and the autoregressive Gaussian
process (ARGP). GPTS can be described as:

yt = f (t) + εt (2.5a)
f ∼ GP(0, k) (2.5b)
εt ∼ N (0, θ2 ) (2.5c)

where the time series input is the time index t, and the time series output is yt ; εt is
Gaussian noise with variance θ2 . GPTS generalizes many of the classic time series
models, such as the autoregressive (AR) and autoregressive moving average (ARMA) models.
The other Gaussian process approach to time series modeling is ARGP. Compared to
GPTS, ARGP is more general and powerful but more computationally demanding. In Equation (2.6)
we present an ARGP of order p, where yt−p:t−1 denotes the p previous values of the output yt .

yt = f (yt−p:t−1 ) + εt (2.6a)
f ∼ GP(0, k) (2.6b)
εt ∼ N (0, θ2 ) (2.6c)

In this thesis, we choose ARGP rather than GPTS to model financial time series, because
ARGP is more general and can capture more complex dynamics than GPTS.
Both GPTS and ARGP can handle external inputs in the regression; an ARGP with
external inputs zt can be generalized as:

yt = f (yt−p:t−1 , zt−p:t−1 ) + εt (2.7a)
f ∼ GP(0, k) (2.7b)
εt ∼ N (0, θ2 ) (2.7c)
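In practice, fitting an ARGP(p) reduces to ordinary GP regression once the data are arranged into lagged input vectors yt−p:t−1 with targets yt . A small helper for that arrangement might look like this (a sketch; the function name is ours):

```python
import numpy as np

def lagged_design(y, p):
    """Arrange a series y into ARGP(p) training pairs:
    row t of X holds y_{t-p:t-1}, and targets[t] is y_t."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
    targets = y[p:]
    return X, targets

X, t = lagged_design([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], p=2)
# X[0] = [0., 1.] is the input vector for target t[0] = 2.
```

The resulting (X, targets) pairs can be fed to any GP regression routine; appending columns of external inputs zt−p:t−1 gives the generalization in Equation (2.7).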

2.5 State-Space Models
Discrete time series data can be modeled mainly with two approaches: autoregressive and
state-space. The main difference between the two methodologies is that state-space models
assume the system is governed by a chain of hidden states, and we perform inference on
the system through a sequence of observations that are determined only by the hidden states.
The advantages of state-space models over autoregressive models are that they do not require
the specification of an arbitrary order parameter, and they offer a richer family of naturally
interpretable representations. In most cases, a state-space model is a general framework
for describing dynamical phenomena; only when inference is difficult are autoregressive
models more advantageous, but that special case is not discussed in this thesis.
State-space models (SSM), or hidden Markov models (HMM), are one of the most
widely used classes of methods for effectively modeling time series and describing dynamical
systems. In different areas SSMs are named differently: structural models
(econometrics), linear dynamical systems (LDS) or Bayesian forecasting models (statistics),
and linear system models (engineering). In finance, state-space models can generalize
other popular time series models such as ARMA, ARCH, GARCH and SV.
A state-space model consists of two parts: a hidden state xt and an observation variable
yt . The essential idea is that behind the observed time series yt there is an underlying
process xt which itself evolves through time in a way that reflects the structure of the
system. The general form of an SSM can be summarized as:

xt = f (xt−1 ) + ε, xt ∈ RM (2.8a)
yt = g(xt ) + ν, yt ∈ RD (2.8b)

where ε and ν are both i.i.d. noise with zero mean and unit variance. The unknown
function f describes the system dynamics, and the function g links the observation to the
hidden system state. Both f and g can be linear or non-linear. The hidden
state xt follows a Markov chain, which is why the terminology hidden Markov model is
used. In Figure 2.3, we give the graphical model representation of an SSM.
We use p(xt |xt−1 ; θ) and p(yt |xt ; θ) to define the conditional distributions of both


Figure 2.3: Graphical model representation of a state-space model, where xt is the hidden
state at time t, and yt is the observation at time t. The system state follows a Markov
chain process, and the unknown parameter vector θ is omitted in the figure.

state and observation variables, where θ is a vector of unknown parameters. We also define
x1:t = {x1 , x2 , ..., xt } and y1:t = {y1 , y2 , ..., yt }. Since we cannot observe the state
variables x0:t , we are interested in estimating x0:t using the observations y1:t . The conditional
probability distribution p(x0:t |y1:t ; θ) is calculated using Bayes’ Rule:

p(x0:t |y1:t ; θ) = p(x0:t , y1:t ; θ)/p(y1:t ; θ) (2.9)
where the normalizing constant p(y1:t ; θ) is called the likelihood of the state-space model.
Besides the joint smoothing distribution, there are three marginal distributions of
interest: the one-step-ahead predictive distribution p(xt |y1:t−1 ; θ), the filtering
distribution p(xt |y1:t ; θ), and the smoothing distribution p(xt |y1:T ; θ).
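The general form in Equation (2.8) is straightforward to simulate, which is useful for sanity-checking inference code later. A minimal sketch for the scalar case, with hypothetical choices of f and g:

```python
import numpy as np

def simulate_ssm(T, f, g, x0=0.0, seed=0):
    """Simulate x_t = f(x_{t-1}) + eps_t, y_t = g(x_t) + nu_t with
    i.i.d. standard-normal noise, as in Equation (2.8) for M = D = 1."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    y = np.empty(T)
    state = x0
    for t in range(T):
        state = f(state) + rng.standard_normal()
        x[t] = state
        y[t] = g(state) + rng.standard_normal()
    return x, y

# A linear-Gaussian special case: f(x) = 0.9 x, g(x) = x.
x, y = simulate_ssm(200, f=lambda s: 0.9 * s, g=lambda s: s)
```

In the linear-Gaussian special case above, the filtering distribution is exactly what the Kalman filter computes; non-linear choices of f and g require the approximate methods discussed in Chapter 4.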

2.6 Chapter Summary


In this chapter, we presented the basic background knowledge for this thesis: the financial
time series data we study, the methodology we use, and, most importantly, the preliminaries
for volatility modeling. We described the characteristics that volatility models try to
capture and the preparatory work needed before applying our modeling procedure to asset
returns. We also presented the structure of Gaussian processes for time series analysis
and of state-space models, which are the foundations of our Bayesian nonparametric
approach to volatility analysis.

Chapter 3

Gaussian Process Regression


Stochastic Volatility Model

In this chapter we introduce the Gaussian process regression stochastic volatility (GPRSV)
model for financial time series volatility modeling and forecasting. Like GARCH and basic
stochastic volatility (SV) models, we model the financial asset return and volatility in a
state-space way. The logarithm of the variance is modeled as the system’s unobserved
latent variable, and we use a Gaussian process (GP) to sample the unknown hidden-state
transition function. A GPRSV model can be viewed as an instance of a Gaussian process
state-space model (GP-SSM). The Gaussian process, discussed in Chapter 2, is a flexible
and powerful tool for modeling time series data. After introducing the GPRSV model, we
discuss the procedure for building a GPRSV model and the issue of introducing exogenous
factors to improve GPRSV forecasting performance.

3.1 GPRSV Models Framework


In the GPRSV model, the conditional volatility is modeled in a Bayesian nonparametric way.
We assume that the hidden state process is governed by a stationary stochastic
process. The main difference between GPRSV and traditional stochastic volatility models
is the driving force of the stochastic process. In traditional stochastic volatility models,
the process is assumed to follow a rigid linear autoregressive form. In the GPRSV model the


Figure 3.1: Graphical model representation of a Gaussian process regression stochastic
volatility model, where at is the observation variable at time t, and vt is the hidden
variable (logarithm of the variance) at time t. ft is the Gaussian process sampled function
value at time t, and the thick horizontal line represents fully connected nodes. Hyper-
parameters of the Gaussian process are omitted in the figure.

process is not limited to a rigid form; instead, a Gaussian process prior is placed over the
transition function. The basic framework of a GPRSV model can be presented with the
following equations:

at = rt − µ = σt εt (3.1a)
vt = log(σt2 ) = f (vt−1 ) + τ ηt (3.1b)
f ∼ GP(m(x), k(x, x0 )) (3.1c)

where rt is the asset return at time t, µ is the mean of the return series, and at is
the innovation of the return series. vt is the logarithm of the variance at time t; εt and
ηt are i.i.d. Gaussian (or Student’s t) distributed noises, and τ is an unknown scaling
parameter to be estimated. The function f is the hidden-state transition function, which
we assume follows a Gaussian process defined by the mean function m(x) and covariance
function k(x, x0 ). We model the logarithm of the variance instead of the standard deviation
directly, as in Taylor’s SV model and Nelson’s EGARCH model; the advantages of the
logarithm form can be found in [48] and [43].
In a Gaussian process, the mean function m(x) can encode prior knowledge of the system
dynamics. For volatility modeling problems, we can encode the asymmetric effect in the
mean function. The covariance function k(x, x0 ) is defined by the covariance between function
values, Cov(f (vt ), f (vt0 )), so it describes the correlation structure of the time-varying
volatility values. In Figure 3.1 we give the graphical model representation of a GPRSV model.
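To illustrate the generative process in Equation (3.1), the sketch below simulates returns given one fixed transition function f, standing in for a single draw from the GP prior; the particular f and parameter values are hypothetical:

```python
import numpy as np

def simulate_gprsv(T, f, tau=0.2, mu=0.0, v0=0.0, seed=1):
    """Simulate Equation (3.1): the log-variance follows
    v_t = f(v_{t-1}) + tau * eta_t, and the return is
    r_t = mu + sigma_t * eps_t with sigma_t = exp(v_t / 2)."""
    rng = np.random.default_rng(seed)
    v = np.empty(T)
    r = np.empty(T)
    state = v0
    for t in range(T):
        state = f(state) + tau * rng.standard_normal()
        v[t] = state
        r[t] = mu + np.exp(0.5 * state) * rng.standard_normal()
    return r, v

# Hypothetical smooth, mean-reverting transition function.
r, v = simulate_gprsv(500, f=lambda s: -0.1 + 0.95 * s)
```

With a persistent f such as this one, the simulated returns exhibit the volatility clustering described in Chapter 2; in the full GPRSV model, f itself is uncertain and carries a GP prior.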

3.2 GP-SSM for Financial Time Series Modeling


Our GPRSV model can be viewed as an instance of the Gaussian process state-space models
(GP-SSM), which have proved to be powerful tools for describing nonlinear dynamical
systems. Gaussian processes are also widely used as a dimensionality-reduction technique in
the machine learning community. In [34], Lawrence introduced the Gaussian Process Latent Variable


Figure 3.2: Graphical model representation of a Gaussian process state-space model,
where xt is the hidden state value at time t, and yt is the observation value
at time t. ft is the Gaussian process regression function (the hidden-state transition
function) value at time t. The thick horizontal line represents fully connected nodes.

Model (GPLVM) for principal component analysis (PCA). In Lawrence’s model, a Gaussian
process prior is used to map from a latent space to the high-dimensional observed data
space. In [33], Ko and Fox proposed a Gaussian process based Bayesian filter,
a nonparametric way of recursively estimating the state of a dynamical system.
Wang et al. proposed the Gaussian Process Dynamical Model (GPDM) in [54]; GPDM
enriches the GPLVM to capture temporal structure by incorporating a Gaussian process
prior over the dynamics in the latent space. Frigola et al. pointed out in [18] that Gaussian
processes can represent functions of arbitrary complexity and provide a straightforward
way to specify assumptions about the unknown function. Gaussian process regression
and classification have emerged as a major research field for time series modeling in the
machine learning community; however, the advantages of this Bayesian nonparametric
framework have not yet received enough attention from financial researchers and market
practitioners. We would like to apply this flexible and powerful modeling tool to the
problem of financial time series analysis. We can combine Gaussian processes and state-space
models by using the state-space model’s structure and applying a Gaussian process to
describe the hidden-state transition function. The essence of GP-SSMs is to replace the
rigid form of the state transition function in traditional state-space models with a Gaussian
process prior. In Figure 3.2, we show the graphical model representation of a Gaussian
process state-space model. Financial data exhibit complex dynamics because the market is
changing all the time, and many small changes in the relevant factors can result in significant
fluctuations. As more and more data become available, the rigid form of the state transition
function in traditional state-space models becomes the bottleneck for improving forecast
performance.
Learning a GP-SSM is more difficult than learning a standard SSM. In our work, we take a
Bayesian approach to the GP-SSM inference problem. Bayesian filtering is a class
of techniques for estimating the hidden states of dynamical systems; the goal of Bayesian
filtering is to recursively compute the posterior distribution of the current hidden state
given the whole history of observations. One of the most fundamental and widely used
Bayesian filters is the Kalman filter, but a limitation of the Kalman filter is that it can
deal only with linear models with Gaussian noise. Two popular extensions for non-linear
systems are the extended Kalman filter (EKF) and the unscented Kalman filter (UKF)
(see [39], [46], [29] and [30]). Markov chain Monte Carlo (MCMC) methods can be used
to learn the parameters of a state-space model, and the particle filter (PF) is another
Monte Carlo method for filtering a state-space model. We discuss the details of
learning GP-SSMs in Chapter 4.

3.3 Model Structure


All volatility models try to describe how the hidden volatility evolves over time
and to capture the characteristics of asset return series discussed in Chapter
2. To achieve these goals, we need to put the conditional volatility modeling problem
in a reasonable structure. Compared with the autoregressive approach, state-space models
provide a more general and flexible framework for describing dynamical systems. Both
GARCH and SV models can be viewed as instances of state-space models: the hidden
volatility is naturally modeled as the system’s state variable, and the return is observable
to us. In the literature review in Chapter 1, we reviewed these two categories of
conditional volatility models. Both types of models follow the same structure for
presenting how the return and variance evolve with time. In this structure there are
two equations: a mean equation and a variance equation. For all these models the mean
equations are similar; usually we can assume rt follows a simple time series model such as
a stationary AR or ARMA model (adding explanatory variables if necessary). In most
cases, the conditional mean can be described simply as in Equation (1.2a).
What distinguishes one volatility model from another is the variance equation. Equation
(1.2b) gives the statistical meaning of the conditional volatility, but it says nothing
about the manner in which σt2 evolves over time. The GARCH family models use an exact
function to describe the evolution of σt2 , while stochastic volatility models use a stochastic
equation. Although the two categories differ on this point, both use a linear regression
form in the variance equation, and both GARCH and standard SV models are parametric
models. In [55], the Gaussian process was introduced to volatility modeling: the
authors proposed the Gaussian process volatility (GP-Vol) model, which can be viewed as
an extension of the GARCH model. Our GPRSV model applies Gaussian process regression
to replace the linear regression function in the standard stochastic volatility model.
In the GP-Vol model and our GPRSV models, the volatility, as the hidden state variable, is
modeled in a nonparametric way: the state transition function is assumed to be generated
from a Gaussian process rather than specified to follow a certain linear or non-linear
form as in standard econometric models. Functions sampled from a Gaussian process
can take many forms, depending on the mean and covariance functions associated with the
Gaussian process.

3.4 Model Building Process


In [52], Tsay gave a four-step process for building a conditional volatility model and
applied it to analyze empirical stock market data using GARCH models. Similarly, we
can build our GPRSV model with the following steps:

• Step 1: specify the mean equation. First we test for serial dependence in
the return series. If the series is linearly dependent, we should use an econometric
model (e.g. an ARMA model) to remove the linear dependence in the return series.

Figure 3.3: Flowchart of Volatility Model Building Process


Depending on the data we want to model, we can use different methods to remove
the linear dependence. After doing so, we can specify the distribution of the return
variable. In Equation (3.1a), we simply subtract the mean from the return series to
remove the linear dependence part. If the mean of the return series is very small, we
can use the return series directly; otherwise we model the innovations (residuals) at ,
and we specify εt as Gaussian or Student’s t distributed.

• Step 2: test for ARCH effects. The residuals at of the asset return expressed
in (3.1a) are often used to test the series for conditional heteroscedasticity, also
known as ARCH effects [52]. There are two kinds of tests for ARCH effects: the
first applies the Ljung–Box statistic Q(m) to a2t [40], and the second is the Lagrange
multiplier (LM) test [16]. The null hypothesis of the Ljung–Box test is that the first
m lags of the autocorrelation function (ACF) of the tested series are zero. For the
Lagrange multiplier test, we assume the linear regression form:

a2t = α0 + α1 a2t−1 + ... + αm a2t−m + ct (3.2)

where t = m + 1, ..., T , ct is the noise term and T is the sample size. Additionally
we define

SSR0 = Σt=m+1…T (a2t − ω̄)2 (3.3a)
SSR1 = Σt=m+1…T ĉ2t (3.3b)
F = ((SSR0 − SSR1 )/m) / (SSR1 /(T − 2m − 1)) (3.3c)

where ω̄ = (1/T ) Σt=1…T a2t is the sample mean of a2t . Under the null hypothesis, F is
asymptotically distributed as a chi-squared distribution χ2m with m degrees of freedom.
The null hypothesis H0 is α1 = ... = αm = 0. The decision rule is to reject H0
if F > χ2m (α) (here χ2m (α) is the upper 100(1 − α)th percentile of χ2m ), or if the
p-value of F is less than the type-I error α (see [52] for details).

• Step 3: specify the volatility equation. The key to volatility modeling is specifying
how the hidden volatility, or the logarithm of the variance, evolves over time. In
GPRSV models, this part is modeled with a flexible Bayesian nonparametric
tool, Gaussian process regression, whereas in GARCH and SV models it is modeled
with a linear regression; once the parameters are estimated, those parametric models
are fully determined. When the hidden variable is modeled using Gaussian process
regression, we need to specify both the mean and covariance functions. Besides these
function forms, the initial values of the associated hyper-parameters (the parameters
in the mean and covariance functions) need to be specified as well. How to choose the
function forms and initial hyper-parameters is discussed in detail in Chapter 5, where
we analyze empirical financial asset data.

• Step 4: estimate model parameters and check model fitness. After specifying
the mean and volatility equations and their associated parameters in Steps 2 and 3,
we can use training data to estimate the unknown parameters. Once we have the
estimated parameters we can evaluate the learned model on testing data, and it is
necessary to check the fitness of the model obtained so far. Sometimes we need to go
back to Step 3 to modify the Gaussian process mean and covariance function forms
or hyper-parameters.
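The ARCH-effect F statistic of Step 2 (Equations (3.2)–(3.3)) can be computed directly by least squares. The sketch below uses our own helper name and follows the convention above of using the full-sample mean ω̄ of a2t :

```python
import numpy as np

def arch_lm_f(a, m):
    """F statistic of Equations (3.2)-(3.3): regress a_t^2 on an intercept and
    its first m lags, then compare restricted vs. unrestricted sums of squares."""
    a2 = np.asarray(a, dtype=float) ** 2
    T = len(a2)
    # Design matrix rows t = m+1..T with columns [1, a2_{t-1}, ..., a2_{t-m}].
    X = np.column_stack([np.ones(T - m)] +
                        [a2[m - j:T - j] for j in range(1, m + 1)])
    y = a2[m:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr1 = resid @ resid                    # SSR1, Equation (3.3b)
    ssr0 = np.sum((y - a2.mean()) ** 2)     # SSR0 with full-sample mean, Eq. (3.3a)
    return ((ssr0 - ssr1) / m) / (ssr1 / (T - 2 * m - 1))

rng = np.random.default_rng(0)
f_iid = arch_lm_f(rng.standard_normal(1000), m=5)   # i.i.d. returns: no ARCH effect
```

The statistic is then compared with the χ2m (α) threshold as described in Step 2; for i.i.d. returns it should be small, while for series with volatility clustering it grows large.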

To demonstrate the four-step process, we show the flowchart of the model building
process in Figure 3.3, and we use stock market data to further explain the
process. We analyze the daily return data of the GE corporation, collected
from January 1, 1990 to September 29, 1994, with 1200 observations; see Figure 3.4 for
the return series. In Table 3.1 we give the descriptive statistics of the series. Because
the mean value is very small, we can model the return series directly for this data set.
In Figure 3.5, the sample autocorrelation function (ACF) and partial autocorrelation
function (PACF) [28], [20] of the return and squared return series are plotted. From these
figures, we can clearly see that there is no significant serial correlation, but the series are
dependent, for the GE daily return data during this period.
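The sample ACF values plotted in Figure 3.5 can be computed directly. A minimal sketch (our own helper, using the standard autocovariance-based estimator):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag of a series,
    normalized by the lag-0 autocovariance."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    c0 = np.dot(y, y)
    return np.array([1.0] + [np.dot(y[:-k], y[k:]) / c0
                             for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
acf = sample_acf(rng.standard_normal(2000), max_lag=20)
```

For a white-noise series of length n, the sample autocorrelations should mostly lie within roughly ±2/√n, which is the visual band used when judging plots like Figure 3.5.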


Figure 3.4: General Electric (NYSE: GE) daily return data from January 1, 1990
to September 29, 1994


Figure 3.5: Sample ACF and PACF for GE daily returns from January 1, 1990
to September 29, 1994. First row: ACF and PACF of the returns; second row:
ACF and PACF of the squared returns.

Table 3.1: Descriptive Statistics of GE Daily Return Data

Mean     Standard Deviation   Skewness   Kurtosis   Min       Max
0.0433   1.2394               0.0092     5.2534     −6.3326   5.9952

3.5 GPRSV with Exogenous Factors


The GPRSV model with exogenous factors can be summarized as:

at = rt − µ = σt εt (3.4a)
vt = log(σt2 ) = f (vt−1 , ut−1 ) + τ ηt (3.4b)
f ∼ GP(m(x), k(x, x0 )) (3.4c)

where τ is the scale of the variance process noise, and at , rt , σt , εt , ηt and f have the same
meanings as in Equation (3.1); ut is the known exogenous factor at time t. When modeling
different financial time series, we can take different information into account. In Figure 3.6,
we show the graphical model representation of a Gaussian process regression stochastic
volatility model with an exogenous factor. There are many macro-finance variables, besides
the asset return series itself, that can be applied to volatility modeling, but fitting
these variables can be complicated. The ultimate purpose of adding exogenous factors
is to improve the forecasting performance of the model. If we treat these extra factors as
simple linear regression variables, we risk over-fitting and introducing so many parameters
that learning the model becomes too difficult. By putting the exogenous factors into a
Gaussian process, we can avoid these problems. In [5], the authors investigated using mood
measurements derived from large-scale Twitter feeds to predict the value of the Dow Jones
Industrial Average (DJIA); they obtained an accuracy of 86.7% in predicting the daily
up and down changes in the closing values of the DJIA and a reduction of the Mean
Average Percentage Error (MAPE) by more than 6%. Although this is not a volatility
forecasting case, it shows that there are rich exogenous factors we can explore to improve
our GPRSV models.


Figure 3.6: Graphical model representation of a Gaussian process regression stochastic
volatility model with exogenous factor ut . In this figure the nodes of the observation
variable (the return series) are omitted. vt is the hidden variable (volatility) at time t,
ft is the Gaussian process sampled function value at time t, ut is the exogenous factor,
and the thick horizontal line represents fully connected nodes. Hyper-parameters of the
Gaussian process are omitted as well. GPRSV models can include more than one exogenous
factor; we show the one-factor case for clarity.

3.6 Chapter Summary
In this chapter, we introduced the GPRSV models and discussed their advantages compared
with traditional parametric volatility models. We described the process of building a
GPRSV model and discussed the model’s structure. One possible way of improving the
basic GPRSV model is to introduce exogenous factors into the model.

Chapter 4

GPRSV Models Inference

In this chapter, we discuss the problem of learning the proposed GPRSV models. Learning
a GPRSV model is much more challenging than learning its parametric competitors. As we
have discussed, a GPRSV model can be viewed as an instance of a Gaussian process state-
space model, and learning a GP-SSM is a more complex task than learning a standard
HMM: Gaussian process dynamics are embedded in the hidden-state transition
equation. In GP-SSMs, we need to estimate two types of unknowns
from training data: the hidden state trajectory and the Gaussian process dynamics. In
GPRSV models, the hidden state is the logarithm of the variance, and the Gaussian process
dynamics are the hyper-parameters of the mean, covariance and likelihood functions. We
specify the forms of these functions before the learning step, so the only unknown part is the
hyper-parameters. Jointly learning the hidden state trajectory, the unknown Gaussian
process regression function values and the hyper-parameters is computationally challenging.
Our approach is to marginalize out the Gaussian process regression function values, and
then jointly learn the hidden volatility states and hyper-parameters using Monte Carlo
methods.

4.1 Bayesian Inference for State-Space Models

In time series analysis, the task of learning a state-space model includes two parts: the hidden system states and the parameters of the model. Bayesian inference uses the posterior distribution to answer the questions of interest. We start with prior knowledge of the hidden states and the model's parameters. Although we cannot observe the hidden states and parameters directly, the observation variable (the return, for volatility modeling problems) is related to them through the likelihood function. As more observations arrive, we update the posterior distribution by applying Bayes' theorem.
First we consider estimating the unknown parameters. We denote the proposed model's parameters as a vector θ and the observations as y_{1:T}. Parameter estimation can be seen as a special case of inference in which the parameters are the target of the posterior distribution. We use Bayes' rule to describe the estimation problem as:

    p(θ | y_{1:T}) = p(y_{1:T} | θ) p(θ) / p(y_{1:T})        (4.1)

where p(θ) is the prior distribution quantifying the modeler's belief about the parameters' values before any observation data arrive, p(θ|y_{1:T}) is the posterior distribution, p(y_{1:T}|θ) is the likelihood, and p(y_{1:T}) is the marginal likelihood.
An optimized θ value can be obtained with the Maximum-a-Posteriori (MAP) point estimate,

    θ_MAP = arg max_θ p(y_{1:T} | θ) p(θ)

More conveniently, we can maximize the logarithm of the posterior,

    θ_MAP = arg max_θ log(p(y_{1:T} | θ) p(θ))

When the prior p(θ) is constant, the MAP solution coincides with the maximum likelihood (ML) solution:

    θ_MAP = arg max_θ p(y_{1:T} | θ) p(θ) = θ_ML = arg max_θ log(p(y_{1:T} | θ))

The ML method is widely used for parameter estimation problems in time series modeling, but one possible drawback of this approach is over-fitting.
Secondly, we discuss learning the hidden state trajectory. If we assume that the parameters are known, or have been estimated using ML methods, the distribution of the hidden states x_{1:t} can be estimated iteratively. We can decompose Equation 2.9 recursively as:

    p(x_{1:t} | y_{1:t}; θ) = p(y_t | x_{1:t}, y_{1:t−1}; θ) p(x_{1:t} | y_{1:t−1}; θ) / p(y_t | y_{1:t−1}; θ)
                            = [p(y_t | x_{1:t}, y_{1:t−1}; θ) p(x_t | x_{1:t−1}, y_{1:t−1}; θ) / p(y_t | y_{1:t−1}; θ)] p(x_{1:t−1} | y_{1:t−1}; θ)
                            = [p(y_t | x_t; θ) p(x_t | x_{t−1}; θ) / p(y_t | y_{1:t−1}; θ)] p(x_{1:t−1} | y_{1:t−1}; θ)        (4.2)

From Equation 4.2 we can see the recursive relationship between p(x_{1:t}|y_{1:t}; θ) and p(x_{1:t−1}|y_{1:t−1}; θ). This is the foundation for designing recursive algorithms to solve the problem of learning the hidden states.
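The recursion in Equation 4.2 is what a particle filter implements in practice: propagate samples through p(x_t | x_{t−1}; θ), reweight by p(y_t | x_t; θ), and resample. A minimal bootstrap-filter sketch follows; the callables `transition`, `likelihood`, and `init` are illustrative placeholders supplied by the modeler, not part of the thesis.

```python
import math
import random

def bootstrap_filter(observations, n_particles, transition, likelihood, init):
    """Minimal bootstrap particle filter following the recursion of
    Equation 4.2: propagate with p(x_t | x_{t-1}), weight with
    p(y_t | x_t), then resample to an unweighted particle set."""
    particles = [init() for _ in range(n_particles)]
    for y in observations:
        # propagate each particle through the state transition density
        particles = [transition(x) for x in particles]
        # weight by the observation likelihood p(y_t | x_t)
        weights = [likelihood(y, x) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # resample to approximate p(x_t | y_{1:t}) with equal weights
        particles = random.choices(particles, weights=weights, k=n_particles)
    return particles
```

On a toy linear-Gaussian model this recovers a filtering distribution close to the Kalman filter's; the GPRSV algorithms later in this chapter elaborate on this basic propagate-weight-resample loop.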

4.2 Monte Carlo Methods

We have discussed the general idea of Bayesian inference for state-space models, but the two parts, hidden states and unknown parameters, were not learned together. As we discussed in Chapter 3, GP-SSMs provide a flexible framework for time series analysis. However, this great descriptive power comes at a computational cost: there is no analytic solution, unlike for linear Gaussian state-space models, which can be learned with the Kalman filter [31]. Our solution to this problem is to apply Monte Carlo methods to approximate the unknown densities. The core idea of Monte Carlo methods is to draw a set of i.i.d. samples (particles) from a target density and use the samples to approximate the target with a point-mass function [1].

    p_N(x) = (1/N) Σ_{i=1}^{N} δ_{x^{(i)}}(x)        (4.3)

where x^{(i)} is the i-th sample, N is the number of samples, and δ_{x^{(i)}}(x) denotes the Dirac delta mass located at x^{(i)}. Furthermore, we can approximate the integral I(f) of a function of interest f by the tractable sum I_N(f),

    I_N(f) = (1/N) Σ_{i=1}^{N} f(x^{(i)})  →  I(f) = ∫ f(x) p(x) dx   (a.s., as N → ∞)        (4.4)
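As a quick numerical check of Equation 4.4, we can approximate E[X²] under a standard normal density, whose true value is Var(X) = 1, with a plain Monte Carlo sum:

```python
import random

# Monte Carlo approximation of I(f) = ∫ f(x) p(x) dx from Equation 4.4,
# with p a standard normal density and f(x) = x^2, so the true value is 1.
random.seed(1)
N = 100_000
samples = (random.gauss(0.0, 1.0) for _ in range(N))
I_N = sum(x * x for x in samples) / N   # converges to 1 as N grows
```

The estimator's standard error scales as O(N^{-1/2}), which is why the number of particles matters in the algorithms below.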

In a standard Gaussian process regression setting, the inputs and outputs are fully observed, so the regression function values f can be learned using exact Bayesian inference; for the details we refer readers to [45]. For GP-SSM inference, both the hidden states and the Gaussian process dynamics are unknown, and directly learning the hyper-parameters, hidden states, and Gaussian process function values is a challenging task. Most previous work on GP-SSM inference focused on filtering and smoothing the hidden variables without jointly learning the Gaussian process hyper-parameters. In [35], a novel particle Markov chain Monte Carlo (particle MCMC) algorithm, particle Gibbs with ancestor sampling (PGAS), was proposed. In [19], Frigola et al. applied that algorithm to jointly learn a GP-SSM's hidden states and Gaussian process dynamics. In [55], a regularized auxiliary particle filter, which the authors named the Regularized Auxiliary Particle Chain Filter (RAPCF), was introduced; the RAPCF algorithm belongs to the family of sequential Monte Carlo (SMC) methods.
To learn GPRSV models using the PGAS and RAPCF algorithms, we first marginalize out the Gaussian process regression function values f. We then target the hidden states and hyper-parameters jointly. After marginalizing out f, the models become non-Markovian state-space models, which traditional filtering and smoothing methods are not capable of learning. The Monte Carlo based algorithms presented here provide a powerful tool for this problem: both the hidden states and the parameters are represented by particles with associated normalized weights.

4.2.1 Sequential Monte Carlo Methods

The sequential Monte Carlo (SMC) concept was first introduced by Gordon et al. [23] in 1993, and Del Moral [13] gave the first consistency proof in 1996. SMC is also called particle filtering in some applications. Since its introduction, the SMC method has been widely used in many areas to infer complex nonlinear models; references for SMC methods in engineering, finance, and economics are [15], [11] and [12]. In economics, many dynamic stochastic general equilibrium (DSGE) models have been applied to real-world time series that often exhibit strongly non-Gaussian and time-varying behavior; in this scenario, SMC methods are used to learn nonlinear, non-Gaussian state-space models. For volatility modeling research, Kim et al. first learned a stochastic volatility model using a particle filter in [32]. In [55], Wu et

Algorithm 1 RAPCF for GPRSV Model
1: Input: return data r_{1:T}, number of particles N, shrinkage parameter 0 < λ < 1, prior p(θ).
2: Remove linear dependence from r_{1:T} to get the residuals a_{1:T}.
3: Sample N parameter particles from the prior and set the initial importance weights W_0^i = 1/N.
4: for t = 1 to T do
5:    Shrink the parameter particles towards their empirical mean:

          θ̄_{t−1} = Σ_{i=1}^{N} W_{t−1}^i θ_{t−1}^i        (4.5a)
          θ̃_t^i = λ θ_{t−1}^i + (1 − λ) θ̄_{t−1}           (4.5b)

6:    Compute the expected states:

          µ_t^i = E(v_t | θ̃_t^i, v_{1:t−1}^i)              (4.6)

7:    Compute the importance weights:

          g_t^i ∝ W_{t−1}^i p(a_t | µ_t^i, θ̃_t^i)          (4.7)

8:    Resample N auxiliary indices {j} according to {g_t^i}.
9:    Propagate the chains of v_t forward, {v_{1:t−1}^j}_{j∈J}.
10:   Add jitter: θ_t^i ∼ N(θ̃_t^j, (1 − λ²) V_{t−1}), where V_{t−1} is the empirical covariance of θ_{t−1}.
11:   Propose new states v_t^j ∼ p(v_t | θ_t^j, v_{1:t−1}^j, a_{1:t−1}).
12:   Adjust the weights with the newly proposed states:

          W_t^j ∝ p(a_t | v_t^j, θ_t^j) / p(a_t | µ_t^j, θ̃_t^j)        (4.8)

13: end for
14: Output: particles of v_{1:T}^j, parameter particles θ_t^j, and particle weights W_t^j.

al. proposed the RAPCF algorithm to learn a Gaussian process based GARCH model; here we modify the RAPCF algorithm to learn our GPRSV model.
In Algorithm 1, we present our version of RAPCF for jointly learning the hidden states and the Gaussian process hyper-parameters of the GPRSV models.
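The kernel-smoothing move in steps 5 and 10 of Algorithm 1 (a Liu-West style shrink-then-jitter step) can be sketched for a single scalar hyper-parameter as follows. Shrinking toward the weighted mean and adding jitter with variance (1 − λ²)V preserves the first two moments of the parameter particle cloud while keeping it from degenerating; the function name and scalar interface are ours, for illustration.

```python
import random

def shrink_and_jitter(theta, weights, lam):
    """Shrink parameter particles toward their weighted mean and add
    Gaussian jitter (steps 5 and 10 of Algorithm 1), sketched for one
    scalar hyper-parameter with normalized weights."""
    mean = sum(w * th for w, th in zip(weights, theta))
    var = sum(w * (th - mean) ** 2 for w, th in zip(weights, theta))
    sd = ((1.0 - lam ** 2) * var) ** 0.5
    # shrunk location lam*theta + (1-lam)*mean, then jitter of variance (1-lam^2)*var
    return [lam * th + (1.0 - lam) * mean + random.gauss(0.0, sd) for th in theta]
```

Since Var(λθ + (1 − λ)θ̄) = λ²V and the jitter contributes (1 − λ²)V, the overall variance of the cloud is unchanged, which is the point of the construction.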

4.2.2 Particle MCMC Methods

Algorithm 2 PGAS for GPRSV models
1: Input: the return data r_{1:T}, the number of iterations L.
2: Remove linear dependence from r_{1:T} to get the residuals a_{1:T}.
3: Set θ[0] and v_{1:T}[0] arbitrarily.
4: for l = 1 to L do
5:    Draw θ[l] conditionally on v_{1:T}[l − 1] and a_{1:T}.
6:    Run CPF-AS, targeting p(v_{1:T} | θ[l], a_{1:T}), conditionally on v_{1:T}[l − 1].
7:    Sample k with P(k = i) = w_T^i and set v_{1:T}[l] = v_{1:T}^k.
8: end for
9: Output: the hidden volatility v_{1:T} and the hyper-parameters θ.

Besides SMC methods, we can also learn the GPRSV models using Markov chain Monte Carlo methods. MCMC has played a significant role in statistics, economics, computer science, and physics over the last three decades; one MCMC method, the Metropolis algorithm, was named among the ten algorithms with the greatest influence on the development and practice of science and engineering in the 20th century [3]. In this section we focus on particle Markov chain Monte Carlo methods for learning the GPRSV models. The particle MCMC method was first introduced in [2]; its idea is to use an SMC sampler to construct a Markov kernel that leaves the joint smoothing distribution invariant. In [35], Lindsten et al. proposed the PGAS algorithm, and Frigola et al. applied PGAS to the problem of Gaussian process state-space model inference [19]. Based on their results, the PGAS algorithm is suitable for learning non-Markovian state-space models. In Algorithm 2, we show the PGAS algorithm for learning a GPRSV model. The main building block of the PGAS algorithm is the conditional particle filter with ancestor sampling (CPF-AS), a particle filter like procedure; the CPF-AS part is presented in Algorithm 3. Both types of methods can learn the proposed GPRSV models. Particle MCMC

Algorithm 3 CPF-AS conditional on v_{1:T}^0
1: Initialize (t = 1):
2: Sample v_1^i ∼ p_θ(v_1), for i = 1, 2, ..., N − 1.
3: Set v_1^N = v_1^0.
4: Set w_1^i = W_1^θ(v_1^i), for i = 1, 2, ..., N.
5: for t ≥ 2 do
6:    Sample e_t^i ∼ Discrete({w_{t−1}^j}_{j=1}^N), for i = 1, 2, ..., N − 1.
7:    Sample v_t^i ∼ p_θ(v_t | v_{1:t−1}^{e_t^i}), for i = 1, 2, ..., N − 1.
8:    Set v_t^N = v_t^0.
9:    Sample e_t^N with probability proportional to w_{t−1}^i f_θ(v_t^0 | v_{t−1}^i).
10:   Set v_{1:t}^i = {v_{1:t−1}^{e_t^i}, v_t^i} and w_t^i = W_t^θ(v_{1:t}^i), for i = 1, 2, ..., N.
11: end for

methods are offline algorithms and are more accurate than the SMC methods, but their disadvantage is that they are slower. In our experiments, we found that the SMC method provides results of the desired accuracy, so in Chapter 5 the empirical financial data are learned with SMC methods.
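The step that distinguishes CPF-AS from a plain conditional particle filter is the ancestor-sampling draw for the conditioned particle (step 9 of Algorithm 3): its ancestor index is drawn with probability proportional to w_{t−1}^i f_θ(v_t^0 | v_{t−1}^i). A sketch follows, where `trans_density` is a model-specific transition density supplied by the user (an assumption for illustration, not fixed by the thesis):

```python
import math
import random

def sample_ancestor(weights, v_prev, v_ref, trans_density):
    """Ancestor sampling (step 9 of Algorithm 3): draw index i with
    probability proportional to w_{t-1}^i * f_theta(v_ref | v_prev^i)."""
    probs = [w * trans_density(v_ref, v) for w, v in zip(weights, v_prev)]
    total = sum(probs)
    return random.choices(range(len(probs)),
                          weights=[p / total for p in probs], k=1)[0]
```

Intuitively, the reference trajectory is reattached to whichever particle history best explains its current value, which is what breaks the path degeneracy of the plain conditional particle filter.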

4.3 Chapter Summary

In this chapter, we discussed inference methods for our proposed GPRSV models. Our approach applies SMC and particle MCMC algorithms to jointly learn the posterior distribution of the volatility and the hyper-parameters. The advantage of our method is that the hidden states and the model's parameters are estimated simultaneously.

Chapter 5

Volatility Analysis with GPRSV Models

In this chapter, we use both simulated and empirical financial data to demonstrate our GPRSV models and inference methods. First, to show that the RAPCF and PGAS algorithms discussed in Chapter 4 can learn the proposed GPRSV models, we generated sets of simulated data; the results show that the algorithms can effectively learn the nonparametric models. We then demonstrate the GPRSV models on real financial data, using the empirical data sets to evaluate the forecasting performance of our models.

5.1 Simulated Data

We generated ten synthetic data sets of length T = 200 according to the equations in Chapter 3. Based on Equation (3.1), we sample the hidden state transition function f from a Gaussian process prior. We specified the mean function m(x_t) and the covariance function k(y, z) as follows:

    m(x_t) = a x_{t−1}                            (5.1a)
    k(y, z) = γ exp(−0.5 |y − z|² / l²)           (5.1b)

[Figure 5.1 here: two panels, "Return of Simulated Data" (top) and "Variance of Simulated Data" (bottom), plotted over 200 time steps.]

Figure 5.1: Return and variance values of one set of generated simulation data. The total number of observations is 200. The data are generated following the basic GPRSV model, with the Gaussian process mean and covariance functions specified in Equation 5.1.

[Figure 5.2 here: evolution of the state density over the 200 time steps.]

Figure 5.2: Estimated hidden state densities for the simulated data. There are 200 iteration steps for the simulated data, and we plot every fifth density in this figure. The distribution densities are constructed from the particles and weights as described in Algorithm 1.

where a is the mean function hyper-parameter, and γ and l are the covariance hyper-parameters.
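For concreteness, the two functions in Equation 5.1 can be written out directly; the hyper-parameter values below are illustrative placeholders, not the values used to generate the simulations.

```python
import math

# Mean and covariance functions of Equation 5.1 with placeholder
# hyper-parameter values (a, gamma, l chosen for illustration only).
a, gamma, ell = 0.8, 1.0, 2.0

def mean_fn(x_prev):
    # m(x_t) = a * x_{t-1}
    return a * x_prev

def se_cov(y, z):
    # k(y, z) = gamma * exp(-0.5 |y - z|^2 / l^2), the squared exponential
    return gamma * math.exp(-0.5 * (y - z) ** 2 / ell ** 2)

# Gram matrix over a few state values; symmetric, with gamma on the diagonal
xs = [-1.0, 0.0, 1.0]
K = [[se_cov(y, z) for z in xs] for y in xs]
```

Sampling f on a grid of states then amounts to drawing from a multivariate normal with mean [m(x)] and covariance K, which is how the transition function is realized in the simulations.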
In Figure 5.1, we show one of the simulated data sets; the return and variance are plotted. We applied the RAPCF algorithm to jointly learn the hidden states and the hyper-parameters. As explained in Chapter 4, although the Gaussian process regression parameters are fixed, we still learn their values using particles: the prior knowledge is given, and at each iteration we update the distribution of the unknown hidden state and parameters with the incoming observation. The first 50 iterations are used as a burn-in period. The number of particles used for these simulated data sets is 1000, and the shrinkage parameter is set to λ = 0.99.
One advantage of our learning approach is that the hidden states and the Gaussian process dynamics are jointly learned using particles. At each iteration step we
[Figure 5.3 here: four panels (mean function, two covariance function, and likelihood function hyper-parameters) showing the true value, GPRSV mean, and GPRSV 5%/95% bands over time.]

Figure 5.3: Results for the Gaussian process hyper-parameters. The hyper-parameters are learned with the RAPCF algorithm using particles.

[Figure 5.4 here: predictive log-likelihood values over time, GPRSV estimate against the true value.]

Figure 5.4: Predictive log-likelihood values learned with the RAPCF algorithm, compared with the true values calculated from Equation 3.1. We discard the first 50 burn-in iterations. The result shows that the algorithm can successfully learn the hidden volatility.

can approximate the hidden state distribution. In Figure 5.2, we plot the hidden state density at every fifth iteration step. In Figure 5.3, we plot the expected values and 90% posterior intervals for all the hyper-parameters learned from the particles; although the hyper-parameters are not random variables, we can still learn their values using particles.
In Figure 5.4, we show the results for the predictive log-likelihood. At each iteration step, we calculate the log-likelihood from the learned hidden state value and the observation. Compared with the values obtained from the true hidden state and observation, our particle filter based results are close, and the accuracy improves as more particles are used. Based on our experiments, 800 to 1000 particles are enough to learn these GPRSV models; with different Gaussian process function forms and numbers of hyper-parameters, more particles may be needed.

5.2 Empirical Data

In this part we apply our GPRSV model to real financial data and compare it with a class of GARCH models, the traditional parametric volatility models. We use the realized volatility calculated from intra-day data as the proxy for the true daily volatility. The comparison proceeds as follows: first we use the in-sample data to train both types of models, then we estimate the volatility values for the out-of-sample period, and finally we rank the models by their average loss function values.

5.2.1 Volatility Forecast Evaluation

Evaluating a model's forecasting performance is the key step in the empirical data experiment. In finance, it is rare to find a method that is consistently superior for forecasting the prices of financial assets, and empirical studies are often inconclusive. The difficulty with volatility forecasting is that we cannot observe the variance directly, so the evaluation can be complicated. There are many metrics for evaluating forecast models. One of the most popular approaches is to use a particular statistical loss function; the model that achieves the minimum loss function value is the best forecasting model [7]. There are extensive choices of loss functions, and we adopt a class of statistical loss functions instead of a particular one. Here we denote the unbiased ex post proxy of the conditional variance by σ²_{t+m} and a model's forecast value by σ̂²_{t+m}. We use the following loss functions:
    MSE:    L(σ̂²_{t+m}, σ²_{t+m}) = n⁻¹ Σ_{t=1}^{n} (σ̂²_{t+m} − σ²_{t+m})²            (5.2)

    MAD1:   L(σ̂_{t+m}, σ_{t+m}) = n⁻¹ Σ_{t=1}^{n} |σ̂_{t+m} − σ_{t+m}|                 (5.3)

    MAD2:   L(σ̂_{t+m}, σ_{t+m}) = n⁻¹ Σ_{t=1}^{n} |σ̂²_{t+m} − σ²_{t+m}|               (5.4)

    MLAE1:  L(σ̂_{t+m}, σ_{t+m}) = n⁻¹ Σ_{t=1}^{n} log(|σ̂²_{t+m} − σ²_{t+m}|)          (5.5)

    MLAE2:  L(σ̂_{t+m}, σ_{t+m}) = n⁻¹ Σ_{t=1}^{n} log(|σ̂_{t+m} − σ_{t+m}|)            (5.6)

    QLIKE:  L(σ̂²_{t+m}, σ²_{t+m}) = n⁻¹ Σ_{t=1}^{n} (σ̂²_{t+m}/σ²_{t+m} + log σ²_{t+m})  (5.7)

    HMSE:   L(σ̂²_{t+m}, σ²_{t+m}) = n⁻¹ Σ_{t=1}^{n} (σ̂²_{t+m}/σ²_{t+m} − 1)²          (5.8)
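The seven loss functions above translate directly to code. The sketch below takes paired lists of forecast variances σ̂² and proxy variances σ²; the helper name and dictionary interface are our own.

```python
import math

def volatility_losses(forecast_var, proxy_var):
    """Average loss functions of Equations 5.2-5.8 for paired lists of
    forecast variances (sigma_hat^2) and proxy variances (sigma^2)."""
    n = len(forecast_var)
    pairs = list(zip(forecast_var, proxy_var))
    return {
        "MSE":   sum((f - p) ** 2 for f, p in pairs) / n,
        "MAD1":  sum(abs(math.sqrt(f) - math.sqrt(p)) for f, p in pairs) / n,
        "MAD2":  sum(abs(f - p) for f, p in pairs) / n,
        "MLAE1": sum(math.log(abs(f - p)) for f, p in pairs) / n,
        "MLAE2": sum(math.log(abs(math.sqrt(f) - math.sqrt(p))) for f, p in pairs) / n,
        "QLIKE": sum(f / p + math.log(p) for f, p in pairs) / n,
        "HMSE":  sum((f / p - 1.0) ** 2 for f, p in pairs) / n,
    }
```

Note that the logarithmic losses are undefined when a forecast exactly equals its proxy; in practice this does not occur with real-valued volatility series.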

These loss functions include the typical mean squared error and mean absolute deviation criteria, as well as logarithmic loss functions that are more common in the econometric literature.
Another problem with volatility forecast evaluation is that we do not have the true volatility value to insert into the loss functions; we must use some proxy to stand in for the real value. Some proxies, such as the squared return, can be quite inaccurate. In our experiment, we use high frequency data to calculate the "realized volatility" [51]: since we want to model the volatility of daily return series, we use intra-day data as the high frequency data to estimate the daily volatility. Compared with the squared return, realized volatility is considered a more precise proxy for volatility forecast evaluation.
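As a sketch of how the realized-volatility proxy is computed, the realized variance for one day is the sum of squared intra-day log returns [51]; the sampling grid (for example the 65-minute grid used later) is the modeler's choice, and the function below is an illustrative helper, not a transcription of any particular implementation.

```python
import math

def realized_variance(intraday_prices):
    """Realized variance for one day: the sum of squared intra-day
    log returns computed from a list of sampled prices."""
    log_p = [math.log(p) for p in intraday_prices]
    returns = [b - a for a, b in zip(log_p, log_p[1:])]
    return sum(r * r for r in returns)
```

The square root of this quantity is the realized volatility used as the daily proxy in the evaluation below.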

5.2.2 Data

The data set we analyzed is the IBM stock daily closing price data.¹ We used the daily closing price as our input. The data period is from January 1, 1988 to September 14, 2003. There are 1000 observations in total; the first 252 (from January 1, 1988 to September 27, 2001) are used as the in-sample part for training, and the remaining observations (from September 28, 2001 to September 14, 2003) are used as the out-of-sample part for evaluating forecasting performance.
Following the process we proposed in Chapter 3, we build our basic GPRSV model on the IBM return data. We find that the in-sample mean value is quite small relative to the standard deviation. The detailed statistics are presented in Table 5.1.

¹The data set can be obtained from the YAHOO! finance website http://finance.yahoo.com/

Table 5.1: Descriptive Statistics of IBM Daily Return Data

Mean     Std. Deviation   Skewness   Kurtosis   Min       Max
0.1319   1.7840           0.9583     11.0807    -9.6498   12.0474

5.2.3 Results
We compare our GPRSV model with two standard parametric volatility models, GARCH and GJR-GARCH. For the parametric models, we use Kevin Sheppard's Oxford MFE Toolbox to estimate parameters and make predictions. For the GPRSV model, the Gaussian process dynamics are specified as follows: the mean function is m(x_t) = a x_{t−1} and the covariance function is the squared exponential covariance function k(y, z) = γ exp(−0.5 |y − z|²/l²). The hyper-parameters are a, γ, l, and the likelihood function parameter log(sn). The learned parameters are presented in Table 5.2.

Table 5.2: Estimated GPRSV Model Hyper-parameter Results for IBM Daily Return Data

a        γ        l        log(sn)
1.8777   3.3064   1.3044   -1.7664
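Using the hyper-parameter values of Table 5.2, a one-step GP predictive mean and variance can be sketched as below. The two training points stand in for the latent volatility states, which are not reproduced here, so the numbers are purely illustrative; the targets are placed on the prior mean a·x, making the predictive mean collapse to a·x* by construction.

```python
import math

# Hyper-parameters from Table 5.2; training points X, Y are hypothetical.
a, gamma, ell, log_sn = 1.8777, 3.3064, 1.3044, -1.7664
sn2 = math.exp(log_sn) ** 2          # noise variance from log(sn)

def k(y, z):
    return gamma * math.exp(-0.5 * (y - z) ** 2 / ell ** 2)

X = [0.5, 1.0]                       # toy latent-state inputs (assumption)
Y = [a * x for x in X]               # toy targets on the prior mean

def predict(x_star):
    """GP predictive mean/variance with mean function a*x and SE kernel,
    for the two-point toy training set (2x2 solve done by hand)."""
    K00 = k(X[0], X[0]) + sn2
    K11 = k(X[1], X[1]) + sn2
    K01 = k(X[0], X[1])
    det = K00 * K11 - K01 * K01
    inv = [[K11 / det, -K01 / det], [-K01 / det, K00 / det]]
    ks = [k(x_star, X[0]), k(x_star, X[1])]
    resid = [Y[0] - a * X[0], Y[1] - a * X[1]]   # zero here by construction
    mean = a * x_star + sum(ks[i] * sum(inv[i][j] * resid[j] for j in range(2))
                            for i in range(2))
    var = k(x_star, x_star) - sum(ks[i] * inv[i][j] * ks[j]
                                  for i in range(2) for j in range(2))
    return mean, var
```

In the actual model this prediction is carried out inside the particle filter with the learned latent states in place of the toy training set.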

In Figure 5.5, we plot the volatility values learned by the GARCH, GJR-GARCH, standard SV, and GPRSV models. In Table 5.3, we give the models' loss function values with realized volatility as the proxy. The GPRSV model achieved the lowest average value for every loss function except MSE, for which GJR-GARCH obtained the lowest value. Based on the loss function values, our GPRSV model's performance is the best.

[Figure 5.5 here: learned volatility values for the out-of-sample period, for the GARCH, GJR-GARCH, and GPRSV models.]

Figure 5.5: The volatility values learned by the three models. The GARCH and GJR-GARCH results are estimated using Kevin Sheppard's Oxford MFE Toolbox; the GPRSV model results are learned with the RAPCF algorithm.

5.3 Chapter Summary
In this chapter we conducted experiments with both simulated and empirical data. Based on our results, the modified RAPCF algorithm can successfully learn a GPRSV model. We used loss functions to compare the models' forecasting performance, adopting realized volatility rather than the squared return as the proxy for true volatility. Our GPRSV model provides better forecasting performance than the standard parametric volatility models.

Table 5.3: Results of IBM Volatility Forecast Using Loss Functions

Model        MSE      MAD1     MAD2     MLAE1    MLAE2     HMSE     QLIKE
GARCH        3.656    3.7870   1.0751   0.7876   -0.2536   9.3797   2.13466
GJR-GARCH    0.7988   3.7920   1.0682   0.7717   -0.2577   6.9648   2.0917
SV           4.5024   3.800    1.077    0.7944   -0.2466   9.1739   2.1312
GPRSV        3.2330   3.3607   0.9682   0.6277   -0.3918   1.8617   1.8549

Note: The lowest value in each column marks the best model for that loss function. Except for MSE, the GPRSV model obtained the lowest value for every loss function; GJR-GARCH obtained the lowest MSE. The volatility proxy is the 65-minute sampled realized volatility.
Chapter 6

Conclusion and Future Work

In this thesis, we proposed a Gaussian process regression based volatility model for the problem of analyzing and predicting the time-varying volatility of financial time series data. After introducing the GPRSV model, we gave a solution for jointly learning the hidden volatility states and the Gaussian process dynamics. We also discussed a possible way of adding exogenous factors to improve its forecasting performance.
Based on our experimental results, we can successfully learn the hidden states and hyper-parameters of the Gaussian process regression stochastic volatility model, and the model can achieve better performance than the standard parametric econometric models.
For future research, there are several possible directions. First, we introduced the GPRSV framework to analyze time-varying volatility and discussed adding exogenous factors to improve forecasting performance. Many factors can be used; in different applications, depending on the data analyzed, one can study which information is most relevant. In our experiment, we used only one covariance function: the squared exponential (SE) function gave us the best performance on our data set, but covariance functions other than the most commonly used SE function can be applied, leaving modelers many choices to explore. Second, we believe our way of learning the GPRSV model can be applied to other Gaussian process state-space models as well. Third, we can extend the Gaussian process regression stochastic volatility model to other financial data analysis: any application that uses state-space models to analyze data can also use Gaussian process state-space models. Our approach of marginalizing out the Gaussian process regression function values and then jointly learning the hidden states and the hyper-parameters, especially the learning procedure we used, can be applied to those applications.

References

[1] Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael Jordan. An
introduction to MCMC for machine learning. Machine learning, 50(1-2):5–43, 2003.

[2] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle markov chain
monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 72(3):269–342, 2010.

[3] Isabel Beichl and Francis Sullivan. The metropolis algorithm. Computing in Science
and Engineering, 2(1):65–69, 2000.

[4] Fischer Black and Myron Scholes. The pricing of options and corporate liabilities.
Journal of Political Economy, 81(3):637–654, 1973.

[5] Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock
market. Journal of Computational Science, 2(1):1–8, 2011.

[6] Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327, April 1986.

[7] Christian Brownlees, Robert Engle, and Bryan Kelly. A practical guide to volatility
forecasting through calm and storm. Journal of Risk, 14(2):1–20, 2011.

[8] Jun Cai. A markov model of switching-regime ARCH. Journal of Business and
Economic Statistics, 12(3):309–316, 1994.

[9] John Campbell, Andrew Wen-Chuan Lo, and Archie Craig MacKinlay. The econometrics of financial markets, volume 2. Princeton University Press, Princeton, NJ, 1997.

[10] National Research Council. Frontiers in Massive Data Analysis. The National
Academies Press, Washington, DC, 2013.

[11] Drew Creal. A survey of sequential monte carlo methods for economics and finance.
Econometric Reviews, 31(3):245–296, 2012.

[12] Dan Crisan and Arnaud Doucet. A survey of convergence results on particle filtering
methods for practitioners. Signal Processing, IEEE Transactions on, 50(3):736–746,
2002.

[13] Pierre Del Moral. Nonlinear filtering: Interacting particle solution. Markov Processes
and Related Fields, 2(4):555–580, 1996.

[14] Kresimir Demeterfi, Emanuel Derman, Michael Kamal, and Joseph Zou. More than
you ever wanted to know about volatility swaps. Goldman Sachs Quantitative Strate-
gies Research Notes, 1999.

[15] Arnaud Doucet and Adam M Johansen. A tutorial on particle filtering and smooth-
ing: Fifteen years later. Handbook of Nonlinear Filtering, 12:656–704, 2009.

[16] Robert Engle. Autoregressive conditional heteroscedasticity with estimates of the


variance of united kingdom inflation. Econometrica, 50(4):987–1007, 1982.

[17] Emily Fox, Erik Sudderth, Michael Jordan, and Alan Willsky. Bayesian nonpara-
metric methods for learning markov switching processes. 2010.

[18] Roger Frigola, Yutian Chen, and Carl Rasmussen. Variational Gaussian process
State-Space Models. In Advances in Neural Information Processing Systems, pages
3680–3688, 2014.

[19] Roger Frigola, Fredrik Lindsten, Thomas B. Schön, and Carl E. Rasmussen. Bayesian
inference and learning in gaussian process state-space models with particle MCMC.
In Advances in Neural Information Processing Systems 26, pages 3156–3164, 2013.

[20] George Box. Time Series Analysis: Forecasting and Control. Pearson Education India, 1994.

[21] Zoubin Ghahramani. Bayesian non-parametrics and the probabilistic approach to
modelling. Philosophical Transactions of the Royal Society A: Mathematical, Phys-
ical and Engineering Sciences, 371(1984):20110553–20110553, 2012.

[22] Lawrence Glosten, Ravi Jagannathan, and David Runkle. On the relation between
the expected value and the volatility of the nominal excess return on stocks. The
Journal of Finance, 48(5):1779–1801, 1993.

[23] Neil Gordon, David Salmond, and Adrian Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing), 140(2):107–113, 1993.

[24] James Hamilton. A new approach to the economic analysis of nonstationary time
series and the business cycle. Econometrica, 57(2):357–384, 1989.

[25] James Hamilton. Time Series Analysis. Princeton University Press, 1994.

[26] James Hamilton and Raul Susmel. Autoregressive conditional heteroskedasticity


and changes in regime. Journal of Econometrics, 64(1-2):307–333, 1994.

[27] John Hull and Alan White. The pricing of options on assets with stochastic volatil-
ities. The Journal of Finance, 42(2):281–300, 1987.

[28] Gareth Janacek. Time series analysis forecasting and control. Journal of Time
Series Analysis, 31(4):303–303, 2010.

[29] Simon Julier and Jeffrey Uhlmann. Unscented filtering and nonlinear estimation.
Proceedings of the IEEE, 92(3):401–422, Mar 2004.

[30] Simon Julier, Jeffrey Uhlmann, and Hugh Durrant-Whyte. A new approach for filtering nonlinear systems. In Proceedings of the American Control Conference, volume 3, pages 1628–1632, 1995.

[31] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems.
Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960.

[32] Sangjoon Kim, Neil Shephard, and Siddhartha Chib. Stochastic volatility: likelihood
inference and comparison with ARCH models. The Review of Economic Studies,
65(3):361–393, 1998.

[33] Jonathan Ko and Dieter Fox. Gp-bayesfilters: Bayesian filtering using gaussian
process prediction and observation models. Autonomous Robots, 27(1):75–90, 2009.

[34] Neil Lawrence. Probabilistic non-linear principal component analysis with gaussian
process latent variable models. The Journal of Machine Learning Research, 6:1783–
1816, 2005.

[35] Fredrik Lindsten, Michael Jordan, and Thomas Schön. Particle gibbs with ancestor
sampling. Journal of Machine Learning Research, 15:2145–2184, 2014.

[36] Benoit Mandelbrot. The Variation of Certain Speculative Prices. The Journal of
Business, 36:394, 1963.

[37] Juri Marcucci. Forecasting Stock Market Volatility with Regime-Switching GARCH
Models. Studies in Nonlinear Dynamics and Econometrics, 9(4):1–55, December
2005.

[38] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.

[39] Bruce McElhoe. An assessment of the navigation and course corrections for a manned
flyby of mars or venus. Aerospace and Electronic Systems, IEEE Transactions on,
AES-2(4):613–623, July 1966.

[40] Allan McLeod and William Li. Diagnostic checking arma time series models using
squared-residual autocorrelations. Journal of Time Series Analysis, 4(4):269–273,
1983.

[41] Radford Neal. Bayesian Learning for Neural Networks. Springer-Verlag New York,
Inc., Secaucus, NJ, USA, 1996.

[42] Radford Neal. Regression and classification using gaussian process priors. Bayesian
Statistics, 6:475–501, 1998.

[43] Daniel Nelson. Conditional Heteroskedasticity in Asset Returns: A New Approach.


Econometrica, 59(2):347–70, March 1991.

[44] Chicago Board of Option Exchange (CBOE). Vix index and volatility. http://www.
cboe.com/micro/vix-and-volatility.aspx, January 2015.

[45] Carl Rasmussen and Christopher Williams. Gaussian Processes for Machine Learn-
ing. the MIT Press, 2006.

[46] Stanley Schmidt. The kalman filter-its recognition and development for aerospace
applications. Journal of Guidance, Control, and Dynamics, 4(1):4–7, 1981.

[47] Neil Shephard and Torben Andersen. Stochastic volatility: Origins and overview. Economics Series Working Papers 389, University of Oxford, Department of Economics, March 2008.

[48] Stephen Taylor. Modelling financial time series, 1986.

[49] Stephen Taylor. Modeling stochastic volatility: A review and comparative study.
Mathematical Finance, 4(2):183–204, 1994.

[50] Stephen Taylor. Financial returns modelled by the product of two stochastic pro-
cesses, a study of daily sugar prices. Oxford University Press, 2005.

[51] Torben Andersen, Tim Bollerslev, Francis Diebold, and Paul Labys. Modeling and forecasting realized volatility. Econometrica, 71(2):579–625, 2003.

[52] Ruey Tsay. Analysis of financial time series. Wiley, 2010.

[53] Ryan Turner. Gaussian Processes for State Space Models and Change Point Detec-
tion. PhD thesis, University of Cambridge, Cambridge, UK, July 2011.

[54] Jack Wang, Aaron Hertzmann, and David Blei. Gaussian process dynamical models.
In Advances in neural information processing systems, pages 1441–1448, 2005.

[55] Yue Wu, José Miguel Hernández-Lobato, and Zoubin Ghahramani. Gaussian process
volatility model. In Advances in Neural Information Processing Systems, pages 1044–
1052, 2014.

[56] Jean-Michel Zakoian. Threshold heteroskedastic models. Journal of Economic Dynamics and Control, 18(5):931–955, September 1994.

