
Modeling data and parameter estimation

"Predictions are difficult. Especially about the future." (Mark Twain)

Florian Herzog

2010
Modeling real-world data

When we model observed (realized) data, we usually encounter one of the following situations:
• Time-series data of a single time series (Dt), e.g. economic data.
• Online observations, which are also a single time series (Dt).
• Ensemble data from measurements (experiments) or seasonal data (Dtj).
Modeling ensemble data is often easier than modeling a single time series, because for
a given point in time t we have multiple realizations of the same stochastic information.

By Dt, t = 1, ..., T, we denote the data observation at time t in the case that we have
only a single time-series observation. By Dtj, t = 1, ..., T, j = 1, ..., l, we denote
the time-series observation at time t and measurement j.

Data Dt is recorded in discrete intervals (sampled data) and is assumed to be generated
by a stochastic differential equation (SDE).



Modeling real-world data

A stochastic differential equation can be viewed as a stochastic process from two points
of view:
• A stochastic process with a probability distribution p(Dt|θ) across time, where θ
denotes the parameters.
• A stochastic process where the changes in the resulting time series form the stochastic
process, i.e. p(Dt − Dt−1|θ) or p((Dt − Dt−1)/Dt−1 |θ).
The first interpretation is helpful for describing ensemble data and the second for
analyzing a single time series.

In order to deal with discrete data, all SDEs need to be discretized.



Empirical statistical tools

In order to develop models, we first need to describe the data. Statistical tools to
analyze ensemble data are:
• The empirical ensemble mean µt = Et[Dtj] and the ensemble variance σt² = Vart[Dtj]
across time.
• Empirical ensemble distributions, i.e. histograms across time p(Dt).
• Tests for distributions, e.g. tests for normality or for transformations of normally
distributed random variables.
• Calculation of higher central moments across time, such as the skewness γt and the
kurtosis κt.
The ensemble mean is defined as Et[Dtj] = (1/l) Σ_{j=1}^{l} Dtj.

The central moment of order n > 1 is defined as Mtn = Et[(Dtj − µt)^n] = (1/l) Σ_{j=1}^{l} (Dtj − µt)^n,
where n = 3 is called the skewness γt and n = 4 the kurtosis κt.
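
As an illustration, the following minimal Python sketch (not part of the original slides; it assumes the ensemble is stored as a NumPy array D of shape (T, l), one row per time point and one column per realization) computes these empirical ensemble moments:

    import numpy as np

    def ensemble_moments(D):
        # D has shape (T, l): rows are time points t, columns are realizations j
        mu_t = D.mean(axis=1)                    # ensemble mean per time point
        centered = D - mu_t[:, None]
        var_t = (centered ** 2).mean(axis=1)     # second central moment (ensemble variance)
        skew_t = (centered ** 3).mean(axis=1)    # third central moment (skewness gamma_t as defined above)
        kurt_t = (centered ** 4).mean(axis=1)    # fourth central moment (kurtosis kappa_t as defined above)
        return mu_t, var_t, skew_t, kurt_t

    # hypothetical usage: 100 time points, 50 realizations
    D = np.random.default_rng(0).normal(size=(100, 50))
    mu_t, var_t, skew_t, kurt_t = ensemble_moments(D)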



Autocovariance, Autocorrelation

When the data from different points in time are not independent, they possess
autocovariance or autocorrelation. The autocovariance is defined as

Cov(Dt, Dt−j) = E[(Dt − µt)(Dt−j − µt−j)] = (1/T) Σ_{t=j+1}^{T} (Dt − µt)(Dt−j − µt−j) (1)

and the autocorrelation as

acf(Dt, Dt−j) = Cov(Dt, Dt−j) / sqrt( Var[Dt] Var[Dt−j] ) (2)

When we assume a discrete white noise process εt which is identically and independently
distributed (i.i.d.) with zero expectation and unit variance, the autocovariance is
E[εt εt−j] = 0 for all j ≠ 0. Therefore, the autocorrelation is also zero.
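
A minimal Python sketch of the corresponding sample quantities (an illustration, not part of the slides; the normalization of the autocovariance sum by T is an assumption consistent with (1)):

    import numpy as np

    def sample_autocovariance(D, max_lag):
        # sample autocovariance and autocorrelation of a single series D for lags 0..max_lag
        D = np.asarray(D, dtype=float)
        T = len(D)
        mu = D.mean()
        cov = np.array([np.sum((D[j:] - mu) * (D[:T - j] - mu)) / T
                        for j in range(max_lag + 1)])
        acf = cov / cov[0]                       # divide by the lag-0 autocovariance (the variance)
        return cov, acf

    # for white noise, the autocorrelation at all lags j > 0 should be close to zero
    eps = np.random.default_rng(1).standard_normal(1000)
    cov, acf = sample_autocovariance(eps, max_lag=20)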



Autocorrelation

[Figure 1: Autocorrelogram. Sample autocorrelation function (ACF): serial correlation plotted against lag (0 to 20), with the significance level indicated.]



Discrete-time white noise

When we assume a discrete white noise process εt which is identically and independently
distributed (i.i.d.) with zero expectation and variance σ², the following statistical
properties apply:

E[εt] = 0 (3)
E[εt²] = σ² (4)
E[εt ετ] = 0 for t ≠ τ (5)

The last property follows from the assumption of independence. A discretized
Brownian motion is a discrete-time white noise process with εt ∼ N(0, ∆t), where
∆t is the time discretization.
In discrete-time models, a white noise process can be normally distributed (Gaussian
white noise), but it can also follow any other distribution as long as the i.i.d.
assumption is valid.
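
A quick numerical illustration (a sketch, not part of the slides): simulating the increments of a discretized Brownian motion and checking properties (3)-(5) empirically.

    import numpy as np

    dt = 0.01                                          # time discretization
    rng = np.random.default_rng(2)
    eps = rng.normal(0.0, np.sqrt(dt), size=100_000)   # eps_t ~ N(0, dt), i.i.d.

    print(eps.mean())                                  # close to 0   (property (3))
    print(eps.var())                                   # close to dt  (property (4))
    print(np.mean(eps[1:] * eps[:-1]))                 # close to 0   (property (5), lag 1)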



Statistical Theory

Stationarity: A process is covariance-stationary if and only if (iff):

E[Dt] = µ ∀t (6)
E[(Dt − µ)(Dt−j − µ)] = νj ∀t ∀j . (7)

This form of stationarity is often also called weak stationarity. Consider for example the
process Dt = µ + εt, where εt is a white noise process with zero mean and variance σ².
In this case Dt is stationary since:

E[Dt] = µ
E[(Dt − µ)(Dt−j − µ)] = σ²  for j = 0,   and  0  for j ≠ 0



Statistical Theory

Stationarity: A process is strictly stationary iff for any values of j1, j2, j3, ... the joint
distribution of Dt, Dt+j1, Dt+j2, Dt+j3, ... depends only on the intervals j1, j2, j3, ...
separating the observations and not on the time t.
Ergodicity: A process is called ergodic for the mean iff

lim_{t→∞} E[Dt] = µ. (8)

A process is said to be ergodic for the second moment iff

lim_{T→∞} 1/(T − j) Σ_{t=j+1}^{T} (Dt − µ)(Dt−j − µ) →p νj (9)
where →p denotes convergence in probability. For an ergodic process, we can estimate
all necessary quantities without needing ensemble data. If a process is not ergodic and we
have only one observation, we cannot identify its parameters.



Examples: GBM and Mean-Reversion

We concentrate on two main examples for our discussion of modelling real-world data:

dx = µx dt + σx dW (I)

dx = (a − bx) dt + σ dW (II)

which are Euler discretized with step ∆t, giving:

xk+1 = µ xk + xk σ εk (I)

xk+1 = a + b xk + σ εk (II)

where, reusing the symbols for the discrete-time coefficients, µ = 1 + µ∆t, σ = σ√∆t,
a = a∆t, b = 1 − b∆t, and εk is Gaussian white noise with zero mean and unit variance.
Both processes have a similar discrete-time representation. Both are conditionally normally
distributed, i.e. xk+1 is normally distributed when xk is given. Both have the form of
autoregressive processes.
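
The following minimal Python sketch (illustrative only, not part of the original slides; the parameter values and the function name simulate are hypothetical) simulates both discretized processes (I) and (II):

    import numpy as np

    def simulate(x0, n_steps, dt, mu=0.05, sigma=0.2, a=1.0, b=0.5, seed=3):
        # Euler-discretized GBM (I) and mean-reversion (II), driven by the same noise
        rng = np.random.default_rng(seed)
        eps = rng.standard_normal(n_steps)               # Gaussian white noise, unit variance
        x_gbm = np.empty(n_steps + 1); x_gbm[0] = x0
        x_mr = np.empty(n_steps + 1); x_mr[0] = x0
        for k in range(n_steps):
            # (I):  x_{k+1} = (1 + mu dt) x_k + x_k sigma sqrt(dt) eps_k
            x_gbm[k + 1] = (1 + mu * dt) * x_gbm[k] + x_gbm[k] * sigma * np.sqrt(dt) * eps[k]
            # (II): x_{k+1} = a dt + (1 - b dt) x_k + sigma sqrt(dt) eps_k
            x_mr[k + 1] = a * dt + (1 - b * dt) * x_mr[k] + sigma * np.sqrt(dt) * eps[k]
        return x_gbm, x_mr

    x_gbm, x_mr = simulate(x0=1.0, n_steps=1000, dt=1 / 250)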



Autoregressive first order Process

We analyze a so-called first-order autoregressive process (AR(1)):

Yk+1 = c + ΦYk + εk

Long-term mean: We take the expectation on both sides of the equation:

E[Yk+1] = E[c + ΦYk + εk]
E[Yk+1] = c + Φ E[Yk]
µ = c + Φµ
µ = c / (1 − Φ)
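
As a quick numerical check (an illustration, not in the slides): with c = 1 and Φ = 0.8, the long-term mean is µ = 1 / (1 − 0.8) = 5.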



Autoregressive first order Process

µ = c / (1 − Φ)

From this formula, we can see that Φ must be smaller than 1, otherwise the mean
would be negative for c > 0. We have also assumed that E[Yk] = E[Yk+1] = µ exists.
Second moment:

Yk+1 = µ(1 − Φ) + ΦYk + εk


Yk+1 − µ = Φ(Yk − µ) + εk



Autoregressive first order Process

E[(Yk+1 − µ)²] = E[(Φ(Yk − µ) + εk)²]
             = Φ² E[(Yk − µ)²] + 2Φ E[(Yk − µ)εk] + E[εk²]
E[(Yk+1 − µ)²] = Φ² E[(Yk − µ)²] + σ²

where E[(Yk − µ)εk] = 0. Assuming covariance stationarity, i.e. E[(Yk+1 − µ)²] = ν
and E[(Yk − µ)²] = ν, we get:

ν = Φ²ν + σ²
ν = σ² / (1 − Φ²)



Autoregressive first order Process

Autocorrelation:

E[(Yk+1 − µ)(Yk − µ)] = E[(Φ(Yk − µ) + εk)(Yk − µ)]
E[(Yk+1 − µ)(Yk − µ)] = Φ E[(Yk − µ)²] + E[εk (Yk − µ)].

Again E[εk (Yk − µ)] = 0 and E[(Yk − µ)²] = ν, so we get:

E[(Yk+1 − µ)(Yk − µ)] = Φν = Φ σ² / (1 − Φ²).

The autocorrelation is defined as Cov[Yk+1, Yk] / sqrt( Var[Yk+1] Var[Yk] ), and Var[Yk] =
Var[Yk+1] = ν. Therefore we get

acf(Yk+1, Yk) = Φ (10)



Autoregressive first order Process

An AR(1) process Yk+1 = ΦYk + c + εk has the following properties:

Mean:      µ = c / (1 − Φ)
Variance:  ν = σ² / (1 − Φ²)
Autocorr.: acf(Yk+j, Yk) = Φ^j

A covariance-stationary AR(1) process is only well defined when |Φ| < 1, otherwise
the variance is infinite and the mean is not meaningful. We can check whether a process is
an AR(1) process by calculating the mean, the variance over time, and the autocorrelation.
From these three calculations the parameters can be estimated. It is important to notice
that the lag-one autocorrelation is exactly the coefficient of the autoregression.
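
These formulas can be checked numerically. A minimal Python sketch (hypothetical parameter values, not part of the slides) simulates an AR(1) process and compares the sample mean, variance and lag-one autocorrelation with c / (1 − Φ), σ² / (1 − Φ²) and Φ:

    import numpy as np

    c, phi, sigma = 1.0, 0.8, 0.5              # hypothetical AR(1) parameters with |phi| < 1
    n = 200_000
    rng = np.random.default_rng(4)

    y = np.empty(n)
    y[0] = c / (1 - phi)                       # start at the long-term mean
    eps = rng.normal(0.0, sigma, size=n - 1)
    for k in range(n - 1):
        y[k + 1] = c + phi * y[k] + eps[k]

    print(y.mean(), c / (1 - phi))                   # sample mean vs. c / (1 - phi)
    print(y.var(), sigma**2 / (1 - phi**2))          # sample variance vs. sigma^2 / (1 - phi^2)
    print(np.corrcoef(y[1:], y[:-1])[0, 1], phi)     # lag-1 autocorrelation vs. phi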



Example: GBM and mean reversion

The two examples have the discrete-time structure of a first-order autoregressive process
with:

(I) Φ = 1 + µ∆t
(II) Φ = 1 − b∆t

When we assume that µ > 0, the first model is unstable and we cannot use the
techniques from autoregressions to estimate the parameters. We have to proceed differently
in this case, by using p = ln(x), since then we get an SDE with constant coefficients.

When we assume that b > 0, the process is a true mean-reverting process that can be
estimated by AR(1) methods. In this case, the discrete-time model is also stable.
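
For completeness, the log transformation works because of Itô's lemma (a standard result restated here as a sketch, not part of the original slide): with p = ln(x) and dx = µx dt + σx dW,

dp = (µ − σ²/2) dt + σ dW,

so the Euler-discretized log-price is pk+1 = pk + (µ − σ²/2)∆t + σ√∆t εk. The log-differences pk+1 − pk are then i.i.d. N((µ − σ²/2)∆t, σ²∆t), and µ and σ can be estimated from their sample mean and sample variance.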



Rules for empirical Data

In order to model empirical data where we have only one time series, the following
checks and tests are necessary (a rough sketch of these checks in code follows the list):
• Inspect the time series for instability (constant upward slopes).
• Check the differences and log-differences across time, e.g. by plotting their variance,
before assuming ergodicity.
• For time series with strongly time-varying variance, check the log-differences across
time in particular.
• For stable time series, calculate the autocorrelation for multiple lags.
• Plot the histogram of the time series and of the differences to check for normality.
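
A rough Python sketch of these diagnostic checks (illustrative only; the function name inspect_series and the simulated example series are hypothetical):

    import numpy as np

    def inspect_series(D, max_lag=20):
        # rough diagnostics for a single time series D (1-d array); a sketch, not a formal test suite
        D = np.asarray(D, dtype=float)
        diff = np.diff(D)                                            # differences across time
        logdiff = np.diff(np.log(D)) if np.all(D > 0) else None      # log-differences, if defined
        return {
            "mean_level": D.mean(),
            "var_level": D.var(),
            "var_diff": diff.var(),
            "var_logdiff": None if logdiff is None else logdiff.var(),
            # autocorrelation of the differences for lags 1..max_lag
            "acf_diff": [float(np.corrcoef(diff[j:], diff[:-j])[0, 1])
                         for j in range(1, max_lag + 1)],
            # histogram of the differences (levels can be binned the same way)
            "hist_diff": np.histogram(diff, bins=30),
        }

    # hypothetical usage with a simulated random-walk series
    rng = np.random.default_rng(5)
    x = 100.0 + np.cumsum(rng.standard_normal(1000))
    report = inspect_series(x)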



Parameter Identification

We start by examining the appropriate log-likelihood function within our framework of
discretized SDEs. Let ψ ∈ R^d be the vector of unknown parameters, which belongs to
the admissible parameter space Ψ, e.g. a, b, etc. The various system matrices of the
discrete models, as well as the variances of the stochastic processes given in the previous
section, depend on ψ. The likelihood function of the state-space model is given by the
joint density of the observational data x = (xN, xN−1, . . . , x1):

l(x, ψ) = p(xN, xN−1, . . . , x1; ψ) (11)

which reflects how likely it would have been to observe the data if ψ were the
true values of the parameters.



Parameter Identification

Using the definition of conditional probability and employing Bayes' theorem recursively,
we now write the joint density as a product of conditional densities

l(x, ψ) = p(xN|xN−1, . . . , x1; ψ) · . . . · p(xk|xk−1, . . . , x1; ψ) · . . . · p(x1; ψ) (12)

where we approximate the initial density function p(x1; ψ) by p(x1|x0; ψ). Further-
more, since we deal with a Markovian system, future values xl with l > k depend
on (xk, xk−1, . . . , x1) only through the current value xk, so the expression in (12)
can be rewritten to depend only on the last observed value and reduces to

l(x, ψ) = p(xN|xN−1; ψ) · . . . · p(xk|xk−1; ψ) · . . . · p(x1; ψ) (13)



Parameter Identification

An Euler-discretized SDE is given as:

xk+1 = xk + f(xk, k)∆t + g(xk, k)√∆t εk (14)

where εk is an i.i.d. random variable with normal distribution, zero expectation and unit
variance. Since f and g are deterministic functions, xk+1 is conditionally (on xk) normally
distributed with

E[xk+1] = xk + f(xk, k)∆t (15)

Var[xk+1] = g(xk, k) g(xk, k)^T ∆t (16)

p(xk+1|xk) = (2π)^(−n/2) |Var[xk+1]|^(−1/2) exp( −(1/2) (xk+1 − E[xk+1])^T Var[xk+1]^(−1) (xk+1 − E[xk+1]) ) (17)



Parameter Identification

The parameters of an SDE are estimated by maximizing the log-likelihood. Maximizing
the log-likelihood is equivalent to maximizing the likelihood, since the log function is
a strictly increasing function. The log-likelihood is given as:

ln(l(x, ψ)) = ln(p(xN|xN−1; ψ) · . . . · p(xk|xk−1; ψ) · . . . · p(x1; ψ)) (18)

ln(l(x, ψ)) = Σ_{j=1}^{N} ln p(xj|xj−1; ψ) (19)

ln(l(x, ψ)) = Σ_{j=1}^{N} { ln( (2π)^(−n/2) |Var[xj]|^(−1/2) )
              − (1/2) (xj − E[xj])^T Var[xj]^(−1) (xj − E[xj]) } (20)

where E[xj] and Var[xj] denote the conditional mean and variance (15)-(16) given xj−1.
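
As an illustration, a minimal Python sketch (not from the slides; it uses the scalar mean-reverting example (II), scipy.optimize, and hypothetical parameter values) builds the negative log-likelihood corresponding to (20) from the Euler-discretized transition density and maximizes it numerically:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(psi, x, dt):
        # negative log-likelihood (20) for the scalar SDE dx = (a - b x) dt + sigma dW,
        # Euler discretized: x_{k+1} | x_k ~ N(x_k + (a - b x_k) dt, sigma^2 dt)
        a, b, sigma = psi
        if sigma <= 0:
            return np.inf
        mean = x[:-1] + (a - b * x[:-1]) * dt        # conditional mean E[x_{k+1} | x_k]
        var = sigma ** 2 * dt                        # conditional variance Var[x_{k+1} | x_k]
        ll = -0.5 * np.log(2 * np.pi * var) - (x[1:] - mean) ** 2 / (2 * var)
        return -ll.sum()

    # hypothetical usage on simulated data
    rng = np.random.default_rng(6)
    dt, n = 1 / 250, 2000
    a_true, b_true, s_true = 1.0, 0.5, 0.2
    x = np.empty(n)
    x[0] = a_true / b_true
    for k in range(n - 1):
        x[k + 1] = x[k] + (a_true - b_true * x[k]) * dt + s_true * np.sqrt(dt) * rng.standard_normal()

    res = minimize(neg_log_likelihood, x0=[0.5, 0.3, 0.1], args=(x, dt), method="Nelder-Mead")
    print(res.x)     # estimated (a, b, sigma)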



Example: Parameter Identification

We want to identify the parameters of an AR(1) process without any constant:

xk+1 = Φxk + σεk (21)

which has the log-likelihood function:

ln(l(x, ψ)) = Σ_{k=1}^{N−1} { ln( (2π)^(−1/2) σ^(−1) ) − (xk+1 − Φxk)² / (2σ²) }. (22)

The function is maximized by setting the derivatives with respect to Φ and σ to zero. The
log-likelihood function has the same form as a so-called ordinary least-squares (OLS)
estimation.



Example: Parameter Identification

The solution of the maximum likelihood estimation is:

Φ̂ = Σ_{k=1}^{N−1} xk+1 xk / Σ_{k=1}^{N−1} xk² (23)

σ̂² = (1/N) Σ_{k=1}^{N−1} (xk+1 − Φ̂ xk)² (24)

The solution for Φ is the same as for an OLS regression. Furthermore, the solution is
exactly the definition of the autocorrelation at lag one (for this zero-mean process). The
estimate of σ is the empirical standard deviation of the estimated white noise, i.e.

ε̂k = xk+1 − Φ̂ xk.
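
A minimal Python sketch of the closed-form estimates (23)-(24) on simulated data (illustrative only; the function name and parameter values are hypothetical), which also makes the link to OLS and to the lag-one autocorrelation visible:

    import numpy as np

    def fit_ar1_no_constant(x):
        # closed-form ML estimates (23)-(24) for x_{k+1} = Phi x_k + sigma eps_k
        x = np.asarray(x, dtype=float)
        phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)    # (23), also the OLS slope
        resid = x[1:] - phi_hat * x[:-1]                          # estimated white noise eps_k
        sigma2_hat = np.mean(resid ** 2)                          # (24), variance of the residuals
        return phi_hat, sigma2_hat

    # hypothetical usage on simulated data with Phi = 0.9, sigma = 0.5
    rng = np.random.default_rng(7)
    n, phi, sigma = 5000, 0.9, 0.5
    x = np.empty(n)
    x[0] = 0.0
    for k in range(n - 1):
        x[k + 1] = phi * x[k] + sigma * rng.standard_normal()

    print(fit_ar1_no_constant(x))    # estimates should be close to (0.9, 0.25)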

