Modelling Data
Florian Herzog
2010
Modeling real-world data
When we model observed (realized) data, we usually encounter the following situations:
• Time-series data of a single time series (D_t), e.g. economic data.
• Online observations, which also form a single time series (D_t).
• Ensemble data from measurements (experiments) or seasonal data (D_t^j).
Modeling ensemble data is often easier than modeling a single time series, because at a given point in time t we have the same stochastic information with multiple realizations.
By D_t, t = 1, ..., T, we denote the data observation at time t in the case that we have only a single time series observation. By D_t^j, t = 1, ..., T, j = 1, ..., l, we denote the time series observation at time t and measurement j.
Stochastic differential equations can be viewed as stochastic processes from two points of view:
• A stochastic process with a probability distribution p(D_t|θ) across time, where θ denotes the parameters.
• A stochastic process where the changes in the resulting time series are the stochastic process, i.e. p(D_t − D_{t−1}|θ) or p((D_t − D_{t−1})/D_{t−1}|θ).
The first interpretation is helpful for describing ensemble data and the second for analyzing single time series.
In order to develop models, we first need to describe the data. Statistical tools to
analyze ensemble data are:
• Empirical ensemble mean µ_t = E_t[D_t^j] and ensemble variance σ_t = Var_t[D_t^j] across time
• Empirical ensemble distributions, i.e. histograms across time p(D_t)
• Tests for distributions, e.g. tests for normality or transformations of normally distributed random variables
• Calculation of higher central moments across time, such as skewness γ_t and kurtosis κ_t
The ensemble mean is defined as E_t[D_t^j] = (1/l) Σ_{j=1}^{l} D_t^j.
The central moment of order n > 1 is defined as M_t^n = E_t[(D_t^j − µ_t)^n] = (1/l) Σ_{j=1}^{l} (D_t^j − µ_t)^n, where n = 3 is called skewness γ_t and n = 4 kurtosis κ_t.
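As an illustration of these definitions, here is a minimal Python/numpy sketch (the array name D and the simulated Gaussian ensemble are assumptions made only for the example) computing the ensemble statistics for data D_t^j stored as a T × l array:

import numpy as np

# Ensemble data D with shape (T, l): T points in time, l measurements D_t^j per time point.
# The Gaussian toy data below is only a stand-in for real ensemble observations.
rng = np.random.default_rng(0)
T, l = 50, 200
D = rng.normal(loc=1.0, scale=0.5, size=(T, l))

mu_t = D.mean(axis=1)                               # ensemble mean mu_t = E_t[D_t^j]
var_t = D.var(axis=1)                               # ensemble variance Var_t[D_t^j]
gamma_t = ((D - mu_t[:, None]) ** 3).mean(axis=1)   # third central moment (skewness gamma_t as defined above)
kappa_t = ((D - mu_t[:, None]) ** 4).mean(axis=1)   # fourth central moment (kurtosis kappa_t as defined above)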
When the data from different points in time are not independent, they possess
autocovariance or autocorrelation. Autocovariance is defined as:
Cov(D_t, D_{t−j}) = E[(D_t − µ_t)(D_{t−j} − µ_{t−j})] = (1/(T−j)) Σ_{t=j+1}^{T} (D_t − µ_t)(D_{t−j} − µ_{t−j})   (1)
and autocorrelation as
acf(D_t, D_{t−j}) = Cov(D_t, D_{t−j}) / √(Var[D_t] Var[D_{t−j}])   (2)
When we assume a discrete white noise process ε_t which is identically and independently distributed (i.i.d.) with zero expectation and unit variance, the autocovariance is calculated as E[ε_t ε_{t−j}] = 0 for all j ≠ 0. Therefore, the autocorrelation is also zero.
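A minimal numpy sketch of the empirical estimators (1) and (2) for a single time series; the function name sample_autocorr and the simplification of using one common mean and variance are assumptions consistent with the stationarity discussion below:

import numpy as np

def sample_autocorr(D, lag):
    # Empirical autocovariance at the given lag, following eq. (1),
    # then normalized by the sample variance as in eq. (2).
    D = np.asarray(D, dtype=float)
    mu = D.mean()
    cov = np.mean((D[lag:] - mu) * (D[:len(D) - lag] - mu))
    return cov / D.var()

# For i.i.d. white noise the autocorrelation should be close to zero for every lag > 0:
rng = np.random.default_rng(1)
eps = rng.standard_normal(1000)
print([round(sample_autocorr(eps, j), 3) for j in range(1, 6)])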
Figure 1: Autocorrelogram (sample autocorrelation against lag, together with the significance level).
When we assume a discrete white noise process ε_t which is identically and independently distributed (i.i.d.) with zero expectation and variance σ², the following statistical properties apply:
E[ε_t] = 0   (3)
E[ε_t²] = σ²   (4)
E[ε_t ε_τ] = 0 for t ≠ τ   (5)
A process D_t is covariance stationary if its mean and autocovariances do not depend on t:
E[D_t] = µ ∀t   (6)
E[(D_t − µ)(D_{t−j} − µ)] = ν_j ∀t, ∀j   (7)
This form of stationarity is also often called weak stationarity. Consider, for example, the process D_t = µ + ε_t, where ε_t is a white noise process with zero mean and variance σ². In this case D_t is stationary since:
E[D_t] = µ
E[(D_t − µ)(D_{t−j} − µ)] = σ² for j = 0, and 0 for j ≠ 0.
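A quick simulation sketch of this example (the values µ = 2 and σ = 0.5 are arbitrary choices for the illustration), showing that D_t = µ + ε_t has a constant mean and vanishing autocovariance for j ≠ 0:

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, T = 2.0, 0.5, 5000
D = mu + sigma * rng.standard_normal(T)          # D_t = mu + eps_t with white noise eps_t

print(D.mean())                                  # approximately mu (stationary mean)
print(D.var())                                   # approximately sigma^2 (lag-0 autocovariance)
print(np.mean((D[1:] - mu) * (D[:-1] - mu)))     # lag-1 autocovariance, approximately 0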
Stationarity: A process is strictly stationary iff for any values of j_1, j_2, j_3, ... the joint distribution of D_t, D_{t+j_1}, D_{t+j_2}, D_{t+j_3}, ... depends only on the intervals separating the observations and not on the time t itself.
Ergodicity: A process is called ergodic for the mean iff:
lim_{T→∞} (1/(T−j)) Σ_{t=j+1}^{T} (D_t − µ)(D_{t−j} − µ) →(p) ν_j   (9)
where →(p) denotes convergence in probability.
We concentrate on two main examples for our discussion of modelling real-world data. The first is the AR(1) process
Y_{k+1} = c + Φ Y_k + ε_k.
Taking expectations on both sides gives µ = c + Φµ, and therefore
µ = c / (1 − Φ).
From this formula, we can see that Φ must be smaller than 1, otherwise the mean would be negative when c > 0. Here we have assumed that E[Y_k] = E[Y_{k+1}] = µ exists. Second moment:
E[(Y_{k+1} − µ)²] = E[(Φ(Y_k − µ) + ε_k)²]
 = Φ² E[(Y_k − µ)²] + 2Φ E[(Y_k − µ)ε_k] + E[ε_k²]
E[(Y_{k+1} − µ)²] = Φ² E[(Y_k − µ)²] + σ²
Assuming stationarity, E[(Y_{k+1} − µ)²] = E[(Y_k − µ)²] = ν, so that ν = σ² / (1 − Φ²).
Autocovariance (lag one):
E[(Y_{k+1} − µ)(Y_k − µ)] = Φν = Φ σ² / (1 − Φ²).
The autocorrelation is defined as Cov[Y_{k+1}, Y_k] / √(Var[Y_{k+1}] Var[Y_k]), and Var[Y_k] = Var[Y_{k+1}] = ν. Therefore we get acf(Y_{k+1}, Y_k) = Φν / ν = Φ.
A covariance stationary AR(1) process is only well defined when |Φ| < 1, otherwise the variance is infinite and the mean makes no sense. We can check whether a process is an AR(1) process by calculating the mean and variance over time and the autocorrelation. From these three calculations the parameters can be estimated. It is important to notice that the autocorrelation is exactly the coefficient of the autoregression.
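The following numpy sketch simulates an AR(1) process and recovers its parameters from the sample mean, variance and lag-one autocorrelation via the relations derived above (the chosen values of c, Φ and σ are assumptions for the illustration):

import numpy as np

# Simulate Y_{k+1} = c + Phi*Y_k + eps_k with eps_k ~ N(0, sigma^2)
rng = np.random.default_rng(3)
c, Phi, sigma, N = 0.5, 0.8, 0.2, 20000
Y = np.empty(N)
Y[0] = c / (1 - Phi)                       # start at the stationary mean
for k in range(N - 1):
    Y[k + 1] = c + Phi * Y[k] + sigma * rng.standard_normal()

mu_hat = Y.mean()                          # estimates mu = c / (1 - Phi)
nu_hat = Y.var()                           # estimates nu = sigma^2 / (1 - Phi^2)
phi_hat = np.mean((Y[1:] - mu_hat) * (Y[:-1] - mu_hat)) / nu_hat   # lag-1 autocorrelation = Phi

c_hat = mu_hat * (1 - phi_hat)
sigma2_hat = nu_hat * (1 - phi_hat ** 2)
print(phi_hat, c_hat, sigma2_hat)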
The discretized versions of our two example SDEs lead to AR(1)-type recursions with the coefficients:
(I) Φ = 1 + µ∆t
(II) Φ = 1 − b∆t
When we assume that µ > 0, the first model is unstable and we cannot use the techniques from autoregressions to estimate the parameters. We have to deal with this case differently, by using p = ln(x), since then we obtain an SDE with constant coefficients. When we assume that b > 0, the second process is a true mean-reverting process that can be estimated by AR(1) methods. In this case, the discrete-time model is also stable.
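A sketch of both cases under assumed parameter values (the geometric form of the first model, the time step and all numbers are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(4)
dt, N = 1.0 / 250, 2500

# (I) Unstable case, Phi = 1 + mu*dt, written here as a geometric recursion:
mu, sigma = 0.08, 0.2
x = np.empty(N)
x[0] = 100.0
for k in range(N - 1):
    x[k + 1] = x[k] * (1 + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal())

# Work with p = ln(x): the log-differences have approximately constant mean and variance,
# so standard estimation techniques can be applied to them.
dp = np.diff(np.log(x))
print(dp.mean() / dt, dp.var() / dt)   # rough quantities related to mu and sigma^2

# (II) Mean-reverting case, Phi = 1 - b*dt with b > 0: since |Phi| < 1 for small dt,
# the process can be estimated directly with the AR(1) methods sketched above.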
In order to model empirical data where we have only one time series, the following checks and tests are necessary (a rough scripting sketch follows the list):
• Inspect the time series for instability (e.g. constant upward slopes).
• Check the differences and log-differences across time, to assess whether ergodicity can be assumed, by plotting the variance.
• For time series with strongly time-varying variance, especially check the log-differences across time.
• For stable time series, calculate the autocorrelation for multiple lags.
• Plot the histogram of the time series and of the differences to check for normality.
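One possible way to script these checks (function name, plot layout and lag range are illustrative choices, not a prescribed procedure):

import numpy as np
import matplotlib.pyplot as plt

def diagnostics(D, lags=range(1, 21)):
    # Rough single-time-series diagnostics along the checklist above.
    D = np.asarray(D, dtype=float)
    diff = np.diff(D)
    logdiff = np.diff(np.log(D)) if np.all(D > 0) else None   # log-differences, if defined

    fig, ax = plt.subplots(2, 2, figsize=(9, 6))
    ax[0, 0].plot(D)
    ax[0, 0].set_title("time series (check for instability / trends)")
    ax[0, 1].plot(diff)
    ax[0, 1].set_title("differences (check variance over time)")
    m = diff.mean()
    acf = [np.mean((diff[j:] - m) * (diff[:-j] - m)) / diff.var() for j in lags]
    ax[1, 0].bar(list(lags), acf)
    ax[1, 0].set_title("autocorrelogram of differences")
    ax[1, 1].hist(diff, bins=50)
    ax[1, 1].set_title("histogram of differences (check normality)")
    plt.tight_layout()
    plt.show()
    return logdiff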
We start to examine the appropriate log-likelihood function within our framework of the discretized SDEs. Let ψ ∈ R^d be the vector of unknown parameters, which belongs to the admissible parameter space Ψ, e.g. a, b, etc. The various system matrices of the discrete models, as well as the variances of the stochastic processes, given in the previous section, depend on ψ. The likelihood function of the state space model is given by the joint density of the observational data x = (x_N, x_{N−1}, ..., x_1), which reflects how likely it would have been to observe the data if ψ were the true values of the parameters.
Using the definition of conditional probability and employing Bayes' theorem recursively, we now write the joint density as a product of conditional densities
l(x, ψ) = p(x_N | x_{N−1}, ..., x_1; ψ) · ... · p(x_k | x_{k−1}, ..., x_1; ψ) · ... · p(x_1; ψ)   (12)
where we approximate the initial density function p(x_1; ψ) by p(x_1|x_0; ψ). Furthermore, since we deal with a Markovian system, future values x_l with l > k depend on (x_k, x_{k−1}, ..., x_1) only through the current value x_k, so the expression in (12) can be rewritten to depend only on the last observed values and reduces to
l(x, ψ) = Π_k p(x_{k+1} | x_k; ψ).
The discretized SDE has the general form x_{k+1} = f(x_k, ψ) + g(x_k, ψ) ε_k, where ε_k is an i.i.d. random variable with normal distribution, zero expectation and unit variance. Since f and g are deterministic functions, x_{k+1} is conditionally (on x_k) normally distributed with mean E[x_{k+1}] = f(x_k, ψ) and variance Var[x_{k+1}] = g(x_k, ψ) g(x_k, ψ)^T, so that the log-likelihood becomes
ln(l(x, ψ)) = Σ_{k=1}^{N−1} { ln((2π)^{−n/2} |Var[x_{k+1}]|^{−1/2}) − (1/2) (x_{k+1} − E[x_{k+1}])^T Var[x_{k+1}]^{−1} (x_{k+1} − E[x_{k+1}]) }   (20)
For the scalar AR(1) model x_{k+1} = Φ x_k + ε_k, where ε_k has variance σ², this specializes to
ln(l(x, ψ)) = Σ_{k=1}^{N−1} { ln((2π)^{−1/2} σ^{−1}) − (x_{k+1} − Φ x_k)² / (2σ²) }.   (22)
The function is maximized by taking the derivatives with respect to Φ and σ. The log-likelihood function has the same form as a so-called ordinary least-squares (OLS) estimation. The solution for Φ is the same as for an OLS regression. Furthermore, the solution is exactly the definition of the autocorrelation with lag one. The estimate for σ is the empirical standard deviation of the estimated white noise, i.e. of ε̂_k = x_{k+1} − Φ̂ x_k.
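A minimal sketch of the resulting estimator, maximizing (22) in closed form (the function name fit_ar1_ml is an arbitrary choice):

import numpy as np

def fit_ar1_ml(x):
    # Maximum-likelihood / OLS estimates for the scalar model x_{k+1} = Phi*x_k + eps_k,
    # obtained by setting the derivatives of the log-likelihood (22) to zero.
    x = np.asarray(x, dtype=float)
    x_next, x_prev = x[1:], x[:-1]
    phi_hat = np.sum(x_next * x_prev) / np.sum(x_prev ** 2)   # OLS solution for Phi
    resid = x_next - phi_hat * x_prev                          # estimated white noise eps_hat_k
    sigma_hat = resid.std()                                    # empirical standard deviation
    return phi_hat, sigma_hat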