Introduction to Dynamic Linear Models for Time Series Analysis
Marko Laine
arXiv:1903.11309v2 [stat.ME] 21 May 2019
Abstract Dynamic linear models (DLM) offer a very generic framework to analyse time series data. Many classical time series models can be formulated as DLMs, including ARMA models and standard multiple linear regression models. The models can be seen as general regression models where the coefficients can vary in time. In addition, they allow for a state space representation and a formulation as hierarchical statistical models, which in turn is the key to efficient estimation by Kalman formulas and by Markov chain Monte Carlo (MCMC) methods. A dynamic linear model can handle non-stationary processes, missing values and non-uniform sampling, as well as observations with varying accuracies. This chapter gives an introduction to DLM and shows how to build various useful models for analysing trends and other sources of variability in geodetic time series.
Statistical analysis of time series data is usually faced with the fact that we have only one realization of a process whose properties might not be fully understood. We need to assume that some distributional properties of the process that generates the observations do not change with time. In linear trend analysis, for example, we assume that there is an underlying change in the background mean that stays approximately constant over time. Dynamic regression relaxes this assumption by explicitly allowing temporal variability in the regression coefficients and by letting some of the system properties change in time. Furthermore, the use of unobservable state variables allows direct modelling of the processes that are driving the observed variability, such as seasonal variation or external forcing, and we can explicitly allow for some modelling error.
Marko Laine
Finnish Meteorological Institute, Helsinki, Finland
The state space description offers a unified formulation for the analysis of dynamic
regression models. The same formulation is used extensively in signal processing
and geophysical data assimilation studies, for example. A general dynamic linear
model with an observation equation and a model equation is
$$y_t = H_t x_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, R_t), \tag{3.1}$$
$$x_t = M_t x_{t-1} + E_t, \quad E_t \sim N(0, Q_t). \tag{3.2}$$
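Here $y_t$ are the observations and $x_t$ is the vector of hidden states of the system at time $t$. The states describe components of the underlying process, like trend, seasonality, etc., and $M_t$ ($m \times m$) is the model operator that propagates the states in time. We observe a linear combination of the states with noise $\varepsilon_t$, and the matrix $H_t$ ($k \times m$) is the observation operator that transforms the model states into observations. Both the observations and the system states can have additive Gaussian errors, with covariance matrices $R_t$ ($k \times k$) and $Q_t$ ($m \times m$), respectively. In univariate time series analysis we will have $k = 1$. With multivariate data, the system matrices $M_t$, $H_t$, $R_t$ and $Q_t$ can be used to define correlations between the observed components.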
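As an illustration of Eqs. (3.1) and (3.2), the following minimal Python/NumPy sketch draws one realization of states and observations from a DLM with time-invariant matrices. The function name `simulate_dlm` and the choice of a known starting state `x0` are illustrative assumptions, not part of the chapter's own code.

```python
import numpy as np

def simulate_dlm(M, H, Q, R, x0, n, rng):
    """Draw x_1..x_n and y_1..y_n from y_t = H x_t + eps_t, x_t = M x_{t-1} + E_t.

    M (m x m), H (k x m), Q (m x m) and R (k x k) are kept fixed in time,
    and the starting state x0 is treated as known (an assumption of this sketch).
    """
    m, k = M.shape[0], H.shape[0]
    xs, ys = np.zeros((n, m)), np.zeros((n, k))
    x = np.asarray(x0, dtype=float)
    for t in range(n):
        x = M @ x + rng.multivariate_normal(np.zeros(m), Q)      # model equation (3.2)
        xs[t] = x
        ys[t] = H @ x + rng.multivariate_normal(np.zeros(k), R)  # observation equation (3.1)
    return xs, ys

# Noise-free local level and trend, state [level, trend], starting trend equal to 1.
M = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
xs, ys = simulate_dlm(M, H, np.zeros((2, 2)), np.zeros((1, 1)),
                      x0=[0.0, 1.0], n=5, rng=np.random.default_rng(0))
print(ys[:, 0])   # [1. 2. 3. 4. 5.]
```

Replacing the zero covariances by, e.g., `np.diag([0.0, 0.01**2])` and `np.array([[0.3**2]])` gives noisy synthetic data of the kind used in the local trend example.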
This formulation is quite general and flexible as it allows handling of many time
series analysis problems in a single framework. Moreover, a unified computational
tool can be used, i.e. a single DLM computer code can be used for various purposes.
Below we give examples of different analyses. As we are dealing with linear models,
we assume that the operators Mt and Ht are linear. However, they can change with
the time index t and we will drop the time index in the cases where the matrices
are assumed static in time. The state space framework can be extended to non-linear models and non-Gaussian errors, and to spatio-temporal analyses as well, see, e.g., Cressie and Wikle (2011); Särkkä (2013). However, as can be seen in the following example, even the linear Gaussian formulation already provides a large class of models for time series trend analyses.
A simple local level and local trend model can be used as a basis for many trend-related studies. Consider a mean level process $\mu_t$ which changes smoothly in time and which we observe with additive Gaussian noise. We assume that the change in the mean, $\mu_{t+1} - \mu_t$, is controlled by a trend process $\alpha_t$, and that the temporal change in these processes is Gaussian with given variances $\sigma^2_{level}$ and $\sigma^2_{trend}$. This can be written as
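$$y_t = \mu_t + \varepsilon_{obs}, \quad \varepsilon_{obs} \sim N(0, \sigma^2_{obs}), \quad \text{observations}, \tag{3.3}$$
$$\mu_t = \mu_{t-1} + \alpha_{t-1} + \varepsilon_{level}, \quad \varepsilon_{level} \sim N(0, \sigma^2_{level}), \quad \text{local level}, \tag{3.4}$$
$$\alpha_t = \alpha_{t-1} + \varepsilon_{trend}, \quad \varepsilon_{trend} \sim N(0, \sigma^2_{trend}), \quad \text{local trend}. \tag{3.5}$$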
We have dropped the time index t from those elements that do not depend on time.
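It is interesting to note that if we set $\sigma^2_{level} = 0$, we have a second difference process for $\mu_t$,
$$\mu_{t+1} - 2\mu_t + \mu_{t-1} = \varepsilon_{trend}, \quad \varepsilon_{trend} \sim N(0, \sigma^2_{trend}),$$
and it can be shown (Durbin and Koopman, 2012) that this is equivalent to cubic spline smoothing with smoothing parameter $\lambda = \sigma^2_{trend}/\sigma^2_{obs} > 0$.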
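A small NumPy check of the statement above, assuming the state vector is ordered as $x_t = [\mu_t, \alpha_t]$: it writes the local level and trend model in the DLM form of Eqs. (3.1) and (3.2) and verifies numerically that, with $\sigma_{level} = 0$, the second difference of $\mu_t$ reproduces the trend innovations. The matrix layout is the usual one for this model, and the simulated values are illustrative only.

```python
import numpy as np

# Local level and trend model (3.3)-(3.5) in DLM form, state x_t = [mu_t, alpha_t].
M = np.array([[1.0, 1.0],    # mu_t    = mu_{t-1} + alpha_{t-1} + eps_level
              [0.0, 1.0]])   # alpha_t = alpha_{t-1} + eps_trend
H = np.array([[1.0, 0.0]])   # y_t = mu_t + eps_obs
sig_level, sig_trend = 0.0, 0.01
Q = np.diag([sig_level**2, sig_trend**2])

rng = np.random.default_rng(1)
n = 200
# Q is diagonal, so the model errors can be drawn componentwise.
w = rng.standard_normal((n, 2)) * np.sqrt(np.diag(Q))
x = np.zeros(2)
states = np.zeros((n, 2))
for t in range(n):
    x = M @ x + w[t]
    states[t] = x
mu = (states @ H.T)[:, 0]    # the level, as seen through the observation operator

# With sig_level = 0: mu_{t+1} - 2 mu_t + mu_{t-1} equals eps_trend at the middle index t.
d2 = mu[2:] - 2.0 * mu[1:-1] + mu[:-2]
print(np.allclose(d2, w[1:-1, 1]))   # True
```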
Figure 3.1 shows simulated observations with a true piecewise trend and the fitted mean process $\mu_t$, $t = 1, \ldots, n$, together with its 95% uncertainty limits. In this example, the observation uncertainty standard deviation ($\sigma_{obs} = 0.3$) as well as the level and trend variability standard deviations ($\sigma_{level} = 0.0$, $\sigma_{trend} = 0.01$) are assumed to be known. In the later examples these values are estimated from the data.
[Figure 3.1 plot: observations, dlm fit, 95% uncertainty and true trend against time (0 to 100).]
Fig. 3.1 DLM smoother fit to synthetic data set using a local trend model. In this example σobs = 0.3, σlevel = 0.0, and σtrend = 0.01, with time interval equal to one unit.
The DLM formulation can be seen as a special case of a general hierarchical statistical model with three levels: data, process and parameters (see, e.g., Cressie and Wikle (2011)), with corresponding conditional statistical distributions. First, the observation uncertainty $p(y_t | x_t, \theta)$ is described by the observation equation and forms the statistical likelihood function; second, the process uncertainty of the unknown states $x_t$ and their evolution is given by the process equations as $p(x_t | \theta)$ or $p(x_t | x_{t-1}, \theta)$; and third, there is the unconditional prior uncertainty for the model parameters, $p(\theta)$. This formulation allows both an efficient description of the system and computational tools to estimate the components. It also combines different statistical approaches, as we can have full prior probabilities for the unknowns (the Bayesian approach), estimate them by maximum likelihood and plug them back (the frequentist approach), or even fix the model parameters by expert knowledge (a non-statistical approach). By Bayes' formula, we can write the joint posterior distribution of the states and parameters as a product of the conditional distributions,
$$p(x_{1:n}, \theta | y_{1:n}) \propto p(y_{1:n} | x_{1:n}, \theta)\, p(x_{1:n} | \theta)\, p(\theta),$$
which is the basis for full Bayesian estimation procedures. Next we describe the steps needed for Bayesian DLM estimation of the model states, parameters and their uncertainties.
To recall the notation, $y_t$ are the observations and $x_t$ are the hidden system states for time indexes $t = 1, \ldots, n$. In addition, we have a static vector $\theta$ that contains auxiliary parameters needed in defining the system matrices $M_t$ and $H_t$ and the model and observation error covariance matrices $Q_t$ and $R_t$. For dynamic linear models we have efficient and well-founded computational tools for all relevant statistical distributions of interest. For state estimation with a known parameter vector $\theta$, the assumptions of linearity and Gaussian errors allow us to estimate the model states by the classical recursive Kalman formulas. The variance and other structural parameters appear in a non-linear way, and they can be estimated either by numerical optimization or by Markov chain Monte Carlo (MCMC) methods. MCMC allows for a full Bayesian statistical analysis of the joint uncertainty in the dynamic model states and the static structural parameters (Gamerman, 2006). Table 3.1 relates the different statistical distributions to the algorithms, which are outlined later. The notation $y_{1:t}$, $x_{1:t}$, etc. means the collection of observations or states from time 1 to time $t$.
Table 3.1 Conditional DLM distributions and the corresponding algorithms. The variables used are: x_t for the time-varying state of the system (e.g. trend), y_t for the observations at each time t, and θ for structural parameters used in the model and covariance matrices. Notation x_{1:n} means all time instances for 1, ..., n.

distribution                      meaning                               algorithm
p(x_t | x_{t-1}, y_{1:t-1}, θ)    one-step prediction                   Kalman filter
p(x_t | y_{1:t}, θ)               filter solution                       Kalman filter
p(x_t | y_{1:n}, θ)               smoother solution                     Kalman smoother
p(x_{1:n} | y_{1:n}, θ)           full state given parameters           simulation smoother
p(y_{1:t} | θ)                    marginal likelihood for parameters    Kalman filter likelihood
p(x_{1:n}, θ | y_{1:n})           full state and parameters             MCMC
p(θ | y_{1:n})                    marginal for parameters               MCMC
p(x_{1:n} | y_{1:n})              marginal for full state               MCMC
Below we give the relevant parts of the recursive formulas for the Kalman filter and smoother to estimate the conditional distributions of the DLM states given the observations and static parameters. For more details, see Rodgers (2000); Laine et al (2014). A notable feature of the linear Gaussian case is that the formulas below are exact and easy to implement on a computer, as long as the model state dimension or the number of observations at one time is not too large.
To start the calculations, we assume that the initial distribution of $x_0$ at $t = 0$ is available. The first step in estimating the states is to use the Kalman filter forward recursion to calculate the distribution of the state vector $x_t$ given the observations up to time $t$, $p(x_t | y_{1:t}, \theta) = N(x_t, C_t)$, which is Gaussian by the linearity assumptions (with a slight abuse of notation, $x_t$ here also denotes the filtered mean). At each time $t$ this step consists of first calculating, as a prior, the mean and covariance matrix of the one-step-ahead predicted states, $p(x_t | x_{t-1}, y_{1:t-1}, \theta) = N(\hat{x}_t, \hat{C}_t)$, and the covariance matrix of the predicted observations, $\hat{C}_{y,t}$, as
$$\hat{x}_t = M_t x_{t-1} \quad \text{prior mean for } x_t, \tag{3.11}$$
$$\hat{C}_t = M_t C_{t-1} M_t^T + Q_t \quad \text{prior covariance for } x_t, \tag{3.12}$$
$$\hat{C}_{y,t} = H_t \hat{C}_t H_t^T + R_t \quad \text{covariance for predicting } y_t. \tag{3.13}$$
Then the posterior state mean and its covariance are calculated using the Kalman gain matrix $G_t$ as
$$G_t = \hat{C}_t H_t^T \hat{C}_{y,t}^{-1} \quad \text{Kalman gain}, \tag{3.14}$$
$$r_t = y_t - H_t \hat{x}_t \quad \text{prediction residual}, \tag{3.15}$$
$$x_t = \hat{x}_t + G_t r_t \quad \text{posterior mean for } x_t, \tag{3.16}$$
$$C_t = \hat{C}_t - G_t H_t \hat{C}_t \quad \text{posterior covariance for } x_t. \tag{3.17}$$
These equations are iterated for $t = 1, \ldots, n$, and the values of $x_t$ and $C_t$ are stored for further calculations. As initial values, we can use $x_0 = 0$ and $C_0 = \kappa I$, i.e. a vector of zeros and a diagonal matrix with some large value $\kappa$ on the diagonal. Note that the only matrix inversion required in the above formulas is the one related to the observation prediction covariance matrix $\hat{C}_{y,t}$, which is of size $1 \times 1$ when we analyse univariate time series.
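A minimal NumPy sketch of the forward recursion (3.11) to (3.17) for time-invariant system matrices, with the diffuse start $x_0 = 0$, $C_0 = \kappa I$ described above. Treating a NaN observation as missing (doing only the prediction step) is an added assumption that illustrates how a DLM can handle missing values; the function and argument names are hypothetical.

```python
import numpy as np

def kalman_filter(y, M, H, Q, R, x0, C0):
    """Kalman filter forward recursion, Eqs. (3.11)-(3.17), time-invariant matrices.

    y has shape (n,) or (n, k). A row containing NaN is treated as missing,
    and only the prediction step is done for it.
    Returns filtered means and covariances and the one-step predictions.
    """
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = y.shape[0], len(x0)
    x, C = np.asarray(x0, dtype=float), np.asarray(C0, dtype=float)
    xf, Cf = np.zeros((n, m)), np.zeros((n, m, m))
    xp, Cp = np.zeros((n, m)), np.zeros((n, m, m))
    for t in range(n):
        xp[t] = M @ x                             # (3.11) prior mean
        Cp[t] = M @ C @ M.T + Q                   # (3.12) prior covariance
        if np.any(np.isnan(y[t])):
            x, C = xp[t], Cp[t]                   # missing value: posterior = prior
        else:
            Cy = H @ Cp[t] @ H.T + R              # (3.13) covariance for predicting y_t
            G = Cp[t] @ H.T @ np.linalg.inv(Cy)   # (3.14) Kalman gain
            r = y[t] - H @ xp[t]                  # (3.15) prediction residual
            x = xp[t] + G @ r                     # (3.16) posterior mean
            C = Cp[t] - G @ H @ Cp[t]             # (3.17) posterior covariance
        xf[t], Cf[t] = x, C
    return xf, Cf, xp, Cp

# Usage: a constant level (Q = 0) observed with unit noise, x0 = 0 and C0 = kappa I.
rng = np.random.default_rng(0)
y = 2.0 + rng.normal(size=50)
I = np.eye(1)
xf, Cf, xp, Cp = kalman_filter(y, I, I, Q=0.0 * I, R=I, x0=np.zeros(1), C0=1e6 * I)
# With Q = 0 the last filtered mean is essentially the sample mean of y, variance about 1/50.
```

Storing the one-step predictions `xp` and `Cp` alongside the filtered values is a design choice: they are exactly what the backward smoother and the likelihood need later.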
The Kalman filter provides the distributions of the states at each time $t$ given the observations up to the current time. As we want to do retrospective time series analysis that accounts for all of the observations, we need the distributions of the states at each time given all the observations $y_{1:n}$. By the linearity of the model, these distributions are again Gaussian, $p(x_t | y_{1:n}, \theta) = N(\tilde{x}_t, \tilde{C}_t)$. Using the matrices generated by the Kalman forward recursion, the Kalman smoother backward recursion gives us the smoothed states for $t = n, n-1, \ldots, 1$. There are several equivalent versions of the backward recursion algorithm. Below we show the Rauch-Tung-Striebel recursion (Särkkä, 2013) for illustration; for alternatives, see Durbin and Koopman (2012). Starting from $\tilde{x}_n = x_n$ and $\tilde{C}_n = C_n$, for $t = n-1, \ldots, 1$:
$$\tilde{x}_t = x_t + C_t M_{t+1}^T \hat{C}_{t+1}^{-1} (\tilde{x}_{t+1} - \hat{x}_{t+1}),$$
$$\tilde{C}_t = C_t + C_t M_{t+1}^T \hat{C}_{t+1}^{-1} (\tilde{C}_{t+1} - \hat{C}_{t+1}) \hat{C}_{t+1}^{-1} M_{t+1} C_t,$$
where $\hat{x}_{t+1}$ and $\hat{C}_{t+1}$ are the one-step predictions (3.11) and (3.12) from the forward recursion.
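The Rauch-Tung-Striebel backward recursion can be sketched in NumPy as follows. To keep the block self-contained it repeats a compact forward pass for a univariate series. The names `kalman_filter`, `rts_smoother` and the matrix `A` (standing for the smoother gain $C_t M_{t+1}^T \hat{C}_{t+1}^{-1}$) are illustrative, and the example data are synthetic.

```python
import numpy as np

def kalman_filter(y, M, H, Q, R, x0, C0):
    """Compact forward pass (3.11)-(3.17) for univariate y, storing what the smoother needs."""
    n, m = len(y), len(x0)
    xf, Cf = np.zeros((n, m)), np.zeros((n, m, m))
    xp, Cp = np.zeros((n, m)), np.zeros((n, m, m))
    x, C = x0, C0
    for t in range(n):
        xp[t], Cp[t] = M @ x, M @ C @ M.T + Q                   # prediction
        G = Cp[t] @ H.T @ np.linalg.inv(H @ Cp[t] @ H.T + R)    # Kalman gain
        x = xp[t] + G @ (np.atleast_1d(y[t]) - H @ xp[t])       # posterior mean
        C = Cp[t] - G @ H @ Cp[t]                               # posterior covariance
        xf[t], Cf[t] = x, C
    return xf, Cf, xp, Cp

def rts_smoother(xf, Cf, xp, Cp, M):
    """Rauch-Tung-Striebel backward recursion for t = n-1, ..., 1 (0-based: n-2, ..., 0)."""
    xs, Cs = xf.copy(), Cf.copy()              # at t = n the smoother equals the filter
    for t in range(len(xf) - 2, -1, -1):
        A = Cf[t] @ M.T @ np.linalg.inv(Cp[t + 1])              # smoother gain
        xs[t] = xf[t] + A @ (xs[t + 1] - xp[t + 1])
        Cs[t] = Cf[t] + A @ (Cs[t + 1] - Cp[t + 1]) @ A.T
    return xs, Cs

# Usage: local trend model with sigma_obs = 0.3, sigma_level = 0, sigma_trend = 0.01.
rng = np.random.default_rng(3)
n = 100
y = 0.01 * np.arange(n) + rng.normal(0.0, 0.3, n)   # hypothetical data: a line plus noise
M = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = np.diag([0.0, 0.01**2])
R = np.array([[0.3**2]])
xf, Cf, xp, Cp = kalman_filter(y, M, H, Q, R, np.zeros(2), 1e6 * np.eye(2))
xs, Cs = rts_smoother(xf, Cf, xp, Cp, M)
level = xs[:, 0]
band = 1.96 * np.sqrt(Cs[:, 0, 0])     # 95% limits for the level, in the spirit of Fig. 3.1
```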
The Kalman smoother algorithm provides the distributions $p(x_t | y_{1:n}, \theta)$ for each $t$, which are all Gaussian. However, for studying trends and other dynamic features in the system, we are interested in the joint distribution spanning the whole time range, $p(x_{1:n} | y_{1:n}, \theta)$. Note that we are still conditioning on the unknown parameter vector $\theta$ and will account for it later. This high-dimensional joint distribution is not easily accessible directly. As in many cases, instead of analytic expressions, it is more important to be able to draw realizations from the distribution and use the sampling distribution for statistical analysis. This has several benefits. An important one is that by comparing simulated realizations to the observations, we see how realistic the model predictions are, which can reveal whether the modelling assumptions are invalid. Also, we can study the distributions of model outputs directly from the samples and do not need to resort to approximate statistics.
A simple simulation algorithm by Durbin and Koopman (2012) is the following. The state space system equations provide a direct way to recursively sample realizations of both the states $x_{1:n}$ and the observations $y_{1:n}$, but the generated states will be independent of the original observations. However, it can be shown (Durbin and Koopman, 2012, Section 4.9) that the distribution of the residual process of the generated states against their smoothed estimate does not depend on $y_{1:n}$. This means that if we add simulated residuals over the original smoothed state $\tilde{x}_{1:n}$, we get a new realization from $p(x_{1:n} | y_{1:n}, \theta)$:

1. Generate a sample using the state space system equations, Eqs. (3.1) and (3.2), to get $\check{x}_{1:n}$ and $\check{y}_{1:n}$.
2. Smooth $\check{y}_{1:n}$ to get $\breve{x}_{1:n}$ according to the formulas in Section 3.5.
3. Add the residuals from steps 1 and 2 to the original smoothed state:
$$x^*_{1:n} = \check{x}_{1:n} - \breve{x}_{1:n} + \tilde{x}_{1:n}. \tag{3.22}$$

This simulation smoother can be used in trend studies and as a part of a more general MCMC simulation algorithm that samples from the joint posterior distribution $p(x_{1:n}, \theta | y_{1:n})$ and, by a marginalization argument, also from $p(x_{1:n} | y_{1:n})$, where the uncertainty in $\theta$ has been integrated out (Laine et al, 2014).
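The three steps of the simulation smoother can be sketched with NumPy as below, for a univariate series and time-invariant matrices. Drawing the simulated initial state from $N(x_0, C_0)$, i.e. the same proper prior used by the filter, is an assumption of this sketch; the helper `smooth` (filter followed by the RTS recursion, means only) and the other names are hypothetical.

```python
import numpy as np

def smooth(y, M, H, Q, R, x0, C0):
    """Kalman filter (3.11)-(3.17) followed by the RTS backward recursion; returns smoothed means."""
    n, m = len(y), len(x0)
    xf, Cf = np.zeros((n, m)), np.zeros((n, m, m))
    xp, Cp = np.zeros((n, m)), np.zeros((n, m, m))
    x, C = x0, C0
    for t in range(n):
        xp[t], Cp[t] = M @ x, M @ C @ M.T + Q
        G = Cp[t] @ H.T @ np.linalg.inv(H @ Cp[t] @ H.T + R)
        x = xp[t] + G @ (np.atleast_1d(y[t]) - H @ xp[t])
        C = Cp[t] - G @ H @ Cp[t]
        xf[t], Cf[t] = x, C
    xs = xf.copy()
    for t in range(n - 2, -1, -1):
        A = Cf[t] @ M.T @ np.linalg.inv(Cp[t + 1])
        xs[t] = xf[t] + A @ (xs[t + 1] - xp[t + 1])
    return xs

def simulation_smoother(y, M, H, Q, R, x0, C0, rng):
    """One draw x*_{1:n} from p(x_{1:n} | y_{1:n}, theta) via steps 1-3 and Eq. (3.22)."""
    n, m = len(y), len(x0)
    x_tilde = smooth(y, M, H, Q, R, x0, C0)          # smoothed state of the real data
    # Step 1: simulate states and observations from (3.1)-(3.2), with x_0 ~ N(x0, C0).
    x = rng.multivariate_normal(x0, C0)
    x_check, y_check = np.zeros((n, m)), np.zeros(n)
    for t in range(n):
        x = M @ x + rng.multivariate_normal(np.zeros(m), Q)
        x_check[t] = x
        y_check[t] = (H @ x)[0] + rng.normal(0.0, np.sqrt(R[0, 0]))
    # Step 2: smooth the simulated observations.
    x_breve = smooth(y_check, M, H, Q, R, x0, C0)
    # Step 3: add the simulated residuals to the smoothed state, Eq. (3.22).
    return x_check - x_breve + x_tilde

# Usage: 300 draws for a static level (Q = 0) observed with unit noise.
rng = np.random.default_rng(4)
I = np.eye(1)
y = 1.0 + rng.normal(size=50)
draws = np.array([simulation_smoother(y, I, I, 0.0 * I, I, np.zeros(1), 100.0 * I, rng)
                  for _ in range(300)])               # shape (300, 50, 1)
```

Averaging the draws recovers the smoother mean, while their spread reflects the joint posterior uncertainty; the same routine can be called inside an MCMC loop once $\theta$ is sampled.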
In the first examples, the variance parameters defining the model error covariance matrix $Q_t$ were assumed to be known. In practice we need some estimation methodology for them. Basically, there are three alternatives. The first one uses subject-matter knowledge and trial and error to fix the parameters without any algorithmic tuning. The second one uses the marginal likelihood function with a numerical optimization routine to find the maximum likelihood estimate of the parameter $\theta$, plugs the estimate back into the equations and re-fits the DLM. The third one uses an MCMC algorithm to sample from the posterior distribution of the parameters, both to estimate the parameters and to integrate out their uncertainty.
To estimate the free parameters $\theta$ in the model formulation by optimization or by MCMC, we need the marginal likelihood function $p(y_{1:n} | \theta)$. By the assumed Markov properties of the system, this can be obtained sequentially as a byproduct of the Kalman filter recursion (Särkkä, 2013),
$$-2 \log p(y_{1:n} | \theta) = \text{constant} + \sum_{t=1}^{n} \left[ (y_t - H_t \hat{x}_t)^T \hat{C}_{y,t}^{-1} (y_t - H_t \hat{x}_t) + \log |\hat{C}_{y,t}| \right]. \tag{3.23}$$
On the right-hand side, the parameter $\theta$ appears in the model predictions $\hat{x}_t$, as they depend on the matrix $M_t$ as well as on the model error covariance $Q_t$. For the same reason, the prediction covariance matrix $\hat{C}_{y,t}$ depends on $\theta$, and we need its determinant $|\hat{C}_{y,t}|$. A fortunate property is that this likelihood can be calculated along the DLM filter recursion without much extra effort.
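The following sketch accumulates Eq. (3.23) along a Kalman filter pass and uses it as the cost function for the maximum likelihood alternative described above. The constant in (3.23) is written out as $k \log 2\pi$ per time step. The local level test model, the log-variance parametrization (which keeps the variances positive during optimization) and the use of SciPy's Nelder-Mead minimizer are illustrative choices, not the chapter's own setup.

```python
import numpy as np
from scipy.optimize import minimize

def neg2_loglik(y, M, H, Q, R, x0, C0):
    """-2 log p(y_{1:n} | theta), accumulated along the Kalman filter as in Eq. (3.23)."""
    x, C, total = np.asarray(x0, float), np.asarray(C0, float), 0.0
    for yt in np.asarray(y, dtype=float).reshape(len(y), -1):
        xp, Cp = M @ x, M @ C @ M.T + Q                # (3.11)-(3.12)
        Cy = H @ Cp @ H.T + R                          # (3.13)
        r = yt - H @ xp                                # (3.15)
        _, logdet = np.linalg.slogdet(Cy)              # log |C_y,t|
        total += r @ np.linalg.solve(Cy, r) + logdet + len(r) * np.log(2 * np.pi)
        G = Cp @ H.T @ np.linalg.inv(Cy)               # (3.14)
        x, C = xp + G @ r, Cp - G @ H @ Cp             # (3.16)-(3.17)
    return total

def local_level_cost(log_var, y):
    """Cost for a local level model with theta = log([sigma_obs^2, sigma_level^2])."""
    s2_obs, s2_level = np.exp(log_var)
    I = np.eye(1)
    return neg2_loglik(y, I, I, s2_level * I, s2_obs * I, np.zeros(1), 1e6 * I)

# Synthetic data with sigma_level = 0.1 and sigma_obs = 0.5.
rng = np.random.default_rng(2)
n = 300
y = np.cumsum(rng.normal(0.0, 0.1, n)) + rng.normal(0.0, 0.5, n)
start = np.log([1.0, 1.0])
res = minimize(local_level_cost, start, args=(y,), method="Nelder-Mead")
sigma_obs_hat, sigma_level_hat = np.sqrt(np.exp(res.x))   # plug these back and re-fit
```

For MCMC, the same `neg2_loglik` would supply the likelihood part of the log posterior, $-\tfrac{1}{2}$ times its value plus $\log p(\theta)$.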
The scaled one-step prediction residuals
$$r_t^* = \hat{C}_{y,t}^{-1/2} (y_t - H_t \hat{x}_t) \tag{3.24}$$
can be used to check the goodness of fit of the model. For the DLM to be consistent with the observations, these residuals should be approximately independent, N(0, I) Gaussian and without serial autocorrelation. Later, in the GNSS time series example, we do model diagnostics by residual quantile-quantile and autocorrelation function plots.
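A minimal sketch of this diagnostic for univariate data, where $\hat{C}_{y,t}$ is a scalar and (3.24) is a division by its square root. The helper names `scaled_residuals` and `acf` are hypothetical. The first residual is dropped in the usage because it is dominated by the diffuse initial state. A correctly specified model gives residuals with unit variance and negligible autocorrelation, while forcing $\sigma_{level} = 0$ on wandering data leaves strong autocorrelation.

```python
import numpy as np

def scaled_residuals(y, M, H, Q, R, x0, C0):
    """r*_t = C_{y,t}^{-1/2} (y_t - H x^_t), Eq. (3.24), for univariate y (C_{y,t} is 1x1)."""
    x, C = np.asarray(x0, float), np.asarray(C0, float)
    out = np.zeros(len(y))
    for t, yt in enumerate(y):
        xp, Cp = M @ x, M @ C @ M.T + Q                # (3.11)-(3.12)
        cy = (H @ Cp @ H.T + R)[0, 0]                  # (3.13), a scalar here
        r = yt - (H @ xp)[0]                           # (3.15)
        out[t] = r / np.sqrt(cy)                       # (3.24)
        g = (Cp @ H.T)[:, 0] / cy                      # (3.14)
        x, C = xp + g * r, Cp - np.outer(g, H @ Cp)    # (3.16)-(3.17)
    return out

def acf(r, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    r = r - r.mean()
    return np.array([r[: len(r) - k] @ r[k:] for k in range(max_lag + 1)]) / (r @ r)

# Data from a local level model with sigma_level = 0.1 and sigma_obs = 0.5.
rng = np.random.default_rng(5)
n = 2000
y = np.cumsum(rng.normal(0.0, 0.1, n)) + rng.normal(0.0, 0.5, n)
I = np.eye(1)
good = scaled_residuals(y, I, I, 0.01 * I, 0.25 * I, np.zeros(1), 1e6 * I)[1:]
bad = scaled_residuals(y, I, I, 0.0 * I, 0.25 * I, np.zeros(1), 1e6 * I)[1:]  # no level change
print(good.var(), acf(good, 5)[1])   # variance near 1, lag-1 autocorrelation near 0
print(acf(bad, 5)[1])                # strongly positive: the model misses the level changes
```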
In the following, we give several useful DLM formulations for model components that are typically used in geodetic or in more general environmental analyses. They have been used in existing applications for stratospheric ozone (Laine et al, 2014), ionosonde analysis (Roininen et al, 2015) and station temperature records (Mikkonen et al, 2015). In Section 3.10, we show an analysis of synthetic GNSS station positioning time series.
In the first example in Section 3.2.1 the variance $\sigma^2_{trend}$ was assumed to be known and fixed. Altering the variance affects the smoothness of the fit. Figure 3.2 shows the effect of different variance parameters for the same data. Note that setting both $\sigma^2_{level}$ and $\sigma^2_{trend}$ to zero results in classical linear regression without dynamical evolution of the regression components. In this case, the 95% probability limits for the level obtained from the smoother covariance matrix $\tilde{C}_t$ coincide with the classical confidence intervals for the mean. In classical non-dynamic linear regression the modelling error is included in the residual term, whereas in DLM we can include it in the model definition by allowing temporal change in the model parameters.
If we estimate the parameters by the likelihood approach and MCMC outlined in Section 3.7, we get the values in the last panel of Figure 3.2, corresponding to the posterior mean. Figure 3.3 shows MCMC chain histograms together with estimated marginal posterior densities. It also shows the point values obtained by likelihood optimization. Note that optimization gives an estimate of $\sigma_{level}$ that is very close to zero compared with the MCMC solution, which explores all values of the parameter that are consistent with the data.
[Figure 3.2 panels (level, trend): (0, 0); (0.1, 0); (0, 0.1); (0.0121, 0.0064).]
Fig. 3.2 DLM smoother fit for a synthetic data set with different smoothing levels. The dots are the observations and the solid blue line is the mean DLM fit. The grey area corresponds to the 95% probability limit from the Kalman smoother. The last panel uses the parameters obtained by MCMC.

Seasonal variability can be modelled by adding extra state components for the effect of each season. A common description of seasonality uses trigonometric functions and is achieved by using two model states for each harmonic component. Monthly data with annual and semiannual cycles would use four state components and the following model and observation matrices
$$M_{seas} = \begin{bmatrix} \cos(\pi/6) & \sin(\pi/6) & 0 & 0 \\ -\sin(\pi/6) & \cos(\pi/6) & 0 & 0 \\ 0 & 0 & \cos(\pi/3) & \sin(\pi/3) \\ 0 & 0 & -\sin(\pi/3) & \cos(\pi/3) \end{bmatrix} \tag{3.25}$$
and
$$H_{seas} = \begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix}. \tag{3.26}$$
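The matrices (3.25) and (3.26) generalize to any period and number of harmonics; the NumPy sketch below builds one $2 \times 2$ rotation block per harmonic (the helper name `seasonal_component` is hypothetical). With zero seasonal model error the observed seasonal signal repeats exactly every `period` steps, which the usage checks.

```python
import numpy as np

def seasonal_component(period, n_harmonics):
    """Trigonometric seasonal block: one 2x2 rotation per harmonic, as in (3.25)-(3.26)."""
    m = 2 * n_harmonics
    M = np.zeros((m, m))
    for j in range(n_harmonics):
        w = 2.0 * np.pi * (j + 1) / period             # angular step of harmonic j+1
        M[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[np.cos(w), np.sin(w)],
                                               [-np.sin(w), np.cos(w)]]
    H = np.tile([1.0, 0.0], n_harmonics)[None, :]      # observe the first state of each pair
    return M, H

M_seas, H_seas = seasonal_component(period=12, n_harmonics=2)   # angles pi/6 and pi/3

# With Q_seas = 0 the seasonal signal H x_t repeats every 12 steps.
x = np.array([1.0, 0.0, 0.5, 0.0])                    # hypothetical initial amplitudes
signal = []
for t in range(36):
    x = M_seas @ x
    signal.append((H_seas @ x)[0])
signal = np.array(signal)
```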
In addition, a corresponding part of the model error covariance matrix $Q_{seas}$ has to be set up to define the allowed variability in the seasonal amplitudes. A simple approach is to use a diagonal matrix with equal values for each component.
[Figure 3.3 panels (level, trend): chain histogram, density estimate, optimized value.]

[Figure 3.4 plot: y against time (0 to 80) with observations, fitted trend, trend uncertainty, fitted model and true trend.]
Fig. 3.4 DLM smoother fit to synthetic data in Section 3.9.2 with seasonal variation, piecewise linear trend, and missing observations.
$$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \cdots + \rho_p y_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_{AR}) \tag{3.27}$$
A pure AR(3) process would then be obtained by setting the observation error variance $\sigma^2_{obs}$ in Eq. (3.1) to zero and the model error component equal to the innovation variance $\sigma^2_{AR}$. If, in addition, we have $\sigma^2_{obs} > 0$, the result is an ARMA model. In fact, all ARMA and ARIMA models can be represented as DLMs (Petris et al, 2009, Section 3.2.5), and many ARIMA estimation software implementations use the Kalman filter likelihood, Eq. (3.23), to formulate the cost function for estimation.
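The DLM matrices for the AR component are not reproduced in this section, so the sketch below uses the common companion form as an assumption: state $[y_t, y_{t-1}, \ldots, y_{t-p+1}]$, the coefficients $\rho_1, \ldots, \rho_p$ in the first row of $M_{AR}$ with a shifted identity below, $H_{AR} = [1, 0, \ldots, 0]$, and $\sigma^2_{AR}$ only in the top-left corner of $Q_{AR}$. It checks that the state recursion with $\sigma_{obs} = 0$ reproduces the direct AR(3) recursion (3.27) for the same innovations; the coefficient values are hypothetical.

```python
import numpy as np

def ar_component(rho, sigma_ar):
    """Companion-form DLM block for an AR(p) process with coefficients rho."""
    p = len(rho)
    M = np.zeros((p, p))
    M[0, :] = rho                     # y_t = rho_1 y_{t-1} + ... + rho_p y_{t-p} + eps_t
    M[1:, :-1] = np.eye(p - 1)        # shift the lagged values down by one position
    H = np.zeros((1, p))
    H[0, 0] = 1.0                     # observe the current value y_t
    Q = np.zeros((p, p))
    Q[0, 0] = sigma_ar**2             # innovation enters the first state only
    return M, H, Q

rho, sigma_ar, n = np.array([0.5, -0.3, 0.1]), 1.0, 200
M_ar, H_ar, Q_ar = ar_component(rho, sigma_ar)
eps = np.random.default_rng(6).normal(0.0, sigma_ar, n)

# Direct recursion (3.27) with zero initial values.
y_direct = np.zeros(n)
for t in range(n):
    past = [y_direct[t - i] if t - i >= 0 else 0.0 for i in range(1, 4)]
    y_direct[t] = rho @ past + eps[t]

# The same process through the state equation, E_t = [eps_t, 0, 0], sigma_obs = 0.
x, y_dlm = np.zeros(3), np.zeros(n)
for t in range(n):
    x = M_ar @ x + np.array([eps[t], 0.0, 0.0])
    y_dlm[t] = (H_ar @ x)[0]
```

Adding a nonzero $\sigma^2_{obs}$ in $R$ on top of this block gives the AR-plus-white-noise (ARMA) case mentioned above.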
$$y_t = \mu_t + \gamma_t + Z_t \beta_t + \varepsilon_{obs}, \tag{3.29}$$
where $\mu_t$ and $\gamma_t$ are the mean level and the seasonal components, $Z_t$ is a row matrix of the values of the regression variables at time $t$, and $\beta_t$ is a vector of time-varying regression coefficients. The effect of the covariates can be formulated by having the coefficients as extra states, $x_{proxy,t} = \beta_t$, using an identity model operator, and by adding the covariate values to the observation operator $H_t$ as
$$H_{proxy(t)} = Z_t = \begin{bmatrix} Z_{t,1} & \cdots & Z_{t,p} \end{bmatrix}, \tag{3.30}$$
$$M_{proxy} = I_p = \mathrm{diag}(1, \ldots, 1), \tag{3.31}$$
$$Q_{proxy} = \mathrm{diag}(\sigma^2_{proxy,1}, \ldots, \sigma^2_{proxy,p}). \tag{3.32}$$
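The DLM for Eq. (3.29) is then built up as a block-diagonal combination of the components:
$$x_t = \begin{bmatrix} x_{trend,t} & x_{seas,t} & x_{proxy,t} \end{bmatrix}^T, \tag{3.33}$$
$$M_t = \begin{bmatrix} M_{trend} & 0 & 0 \\ 0 & M_{seas} & 0 \\ 0 & 0 & M_{proxy} \end{bmatrix}, \tag{3.34}$$
$$H_t = \begin{bmatrix} H_{trend} & H_{seas} & H_{proxy(t)} \end{bmatrix}, \tag{3.35}$$
$$Q_t = \begin{bmatrix} Q_{trend} & 0 & 0 \\ 0 & Q_{seas} & 0 \\ 0 & 0 & Q_{proxy} \end{bmatrix}. \tag{3.36}$$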
The covariate variances $\sigma^2_{proxy}$ control the allowed temporal variability in the coefficients $\beta_t$, and their values can be estimated or set to some prior value. Setting the variances to zero turns this model into classical multiple linear regression.
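A NumPy sketch of the block-diagonal assembly (3.33) to (3.36) for a local trend, the monthly seasonal block (3.25) and (3.26), and $p = 2$ covariates. The small `block_diag` and `rot` helpers and all variance values are illustrative assumptions (SciPy's `scipy.linalg.block_diag` would serve the same purpose). Because $Z_t$ changes with time, $H_t$ is rebuilt at each step from the covariate row.

```python
import numpy as np

def block_diag(*blocks):
    """Place matrices along the diagonal of a zero matrix, as in Eqs. (3.34) and (3.36)."""
    blocks = [np.atleast_2d(b) for b in blocks]
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i, j = i + b.shape[0], j + b.shape[1]
    return out

def rot(w):
    """2x2 rotation block used by the trigonometric seasonal component."""
    return np.array([[np.cos(w), np.sin(w)], [-np.sin(w), np.cos(w)]])

# Components; the variances below are hypothetical.
M_trend, H_trend = np.array([[1.0, 1.0], [0.0, 1.0]]), np.array([[1.0, 0.0]])
M_seas = block_diag(rot(np.pi / 6), rot(np.pi / 3))              # (3.25)
H_seas = np.array([[1.0, 0.0, 1.0, 0.0]])                        # (3.26)
p = 2
M_proxy = np.eye(p)                                               # (3.31)
Q_trend = np.diag([0.0, 0.01**2])
Q_seas = 0.05**2 * np.eye(4)
Q_proxy = np.diag([0.0, 0.001**2])                                # (3.32)

M = block_diag(M_trend, M_seas, M_proxy)                          # (3.34)
Q = block_diag(Q_trend, Q_seas, Q_proxy)                          # (3.36)

def H_at(z_t):
    """Time-varying observation operator (3.35); z_t holds the p covariate values at time t."""
    return np.hstack([H_trend, H_seas, np.atleast_2d(z_t)])       # (3.30) as the last block

# State [mu, alpha, four seasonal states, beta_1, beta_2] (hypothetical values).
x = np.array([1.0, 0.0, 0.2, 0.0, 0.3, 0.0, 2.0, 3.0])
print(H_at([0.5, -1.0]) @ x)   # mu + gamma + Z_t beta_t = 1 + 0.5 - 2 = -0.5
```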
Next we estimate trends in synthetic GNSS time series provided by Machiel Bos and Jean-Philippe Montillet. In this application, the trend estimated in the GNSS time series represents the tectonic rate on the East and North components and the vertical land motion on the Up coordinate. The characteristics of the GNSS time series are discussed in detail in Chapters 1 and 2. We select data for one of the stations (labeled station no. 3 in the figures) with the three components (East, North, Up) shown in Figure 3.5, top left panel. The time series are simulated using a linear trend, yearly seasonal variation and a combination of coloured and i.i.d. Gaussian noise. We assume that we do not know the noise structure a priori. We are interested in the (non-local) linear trend, and we need a model component for the local fluctuations seen in the data. The chosen data set does not contain any sudden jumps in the measured position. Modelling offset changes would require a different strategy, with some iterative estimate of the jump locations, which we do not consider here. We use a DLM approach, where we assume that the non-stationary part can be modelled by local polynomials and the stochastic stationary part can be described as an AR or ARMA process, in addition to the i.i.d. Gaussian observation uncertainty. See Dmitrieva et al (2015) for a somewhat similar approach, which uses a state space representation and the Kalman filter likelihood to model flicker and random walk type noise in several stations at the same time.
So, in contrast to the spline smoothing example in Section 3.2.1, which had $\sigma^2_{level} = 0$ and $\sigma^2_{trend} > 0$, we extract a non-local linear trend, $\sigma^2_{trend} = 0$, and model the local non-stationary fluctuations as a local level model with $\sigma^2_{level} > 0$. In addition, we use a yearly seasonal component for the daily observations and an autoregressive AR(1) noise component to account for possible residual correlation. The observation error is assumed Gaussian with known standard deviation, $\sigma_{obs} = 1$ mm for the "East" and "North" components and $\sigma_{obs} = 4$ mm for the "Up" component. The AR(1) innovation variance $\sigma^2_{AR}$ as well as the AR coefficient $\rho_{AR}$ are estimated from the data. We use the Kalman filter likelihood to estimate the two variance parameters and the AR(1) coefficient by MCMC. We analyse the three components (East, North, Up) separately.
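The model just described can be assembled as follows, as a sketch under stated assumptions: a level and slope pair with $\sigma^2_{trend} = 0$ (so the slope is a non-local constant trend), one yearly harmonic for daily data with a period of 365.25 days, an AR(1) state and a known observation standard deviation. The single harmonic, the zero seasonal model error (fixed amplitude) and the numeric values are illustrative assumptions, not the settings behind Table 3.2. With daily time steps the slope state is in mm/day, so the rate in mm/yr is 365.25 times it.

```python
import numpy as np

def gnss_dlm(sig_level, sig_ar, rho_ar, sig_obs, period=365.25):
    """System matrices for level + fixed slope + yearly harmonic + AR(1), daily steps.

    State: [level, slope, seas_1, seas_2, ar].
    """
    w = 2.0 * np.pi / period
    M = np.zeros((5, 5))
    M[:2, :2] = [[1.0, 1.0], [0.0, 1.0]]                              # local level with slope
    M[2:4, 2:4] = [[np.cos(w), np.sin(w)], [-np.sin(w), np.cos(w)]]   # yearly cycle
    M[4, 4] = rho_ar                                                  # AR(1) noise state
    H = np.array([[1.0, 0.0, 1.0, 0.0, 1.0]])       # y = level + seasonal + AR + obs noise
    Q = np.diag([sig_level**2, 0.0, 0.0, 0.0, sig_ar**2])  # sigma_trend = 0: non-local trend
    R = np.array([[sig_obs**2]])
    return M, H, Q, R

# Illustrative values of the same order as the "East" row of Table 3.2.
M, H, Q, R = gnss_dlm(sig_level=0.2, sig_ar=0.85, rho_ar=0.62, sig_obs=1.0)

# Noise-free propagation: the slope state is a rate in mm/day.
slope = 12.59 / 365.25                    # mm/day for a 12.59 mm/yr rate
x = np.array([0.0, slope, 0.0, 0.0, 0.0])
for _ in range(365):
    x = M @ x                             # model errors left out here
print(round(x[0], 2), round(x[1] * 365.25, 2))   # 12.58 12.59
```

These matrices can be passed to a Kalman filter likelihood such as Eq. (3.23), with $\theta = [\sigma_{level}, \sigma_{AR}, \rho_{AR}]$ as the free parameters.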
The true trend coefficients used in the simulation for the three data sets were 12.59, 17.64, and 2.778 mm/yr. The estimates obtained for them were 12.62 ± 0.61, 17.76 ± 0.69 and 2.22 ± 1.00 mm/yr, with one-sigma posterior standard deviations after ±. Table 3.2 shows the parameter estimates obtained by a combination of the Kalman simulation smoother for the linear slope and seasonal amplitude, and MCMC for $\theta = [\sigma_{level}, \sigma_{AR}, \rho_{AR}]^T$. Figures 3.5 and 3.6 visualise the results graphically. There is a hint of negative autocorrelation in the ACF plot for the East component in Figure 3.5, but otherwise the residuals, obtained from the scaled prediction residuals, Eq. (3.24), look close to Gaussian. Overall, the selected DLM seems to provide a statistically consistent fit and reproduces the true trends within the estimated uncertainty.
Table 3.2 Parameter estimates from DLM/MCMC estimation for the synthetic GNSS time series example. The uncertainty value is one-sigma posterior standard deviation. The true values for trends were 12.59, 17.64, and 2.778 mm/yr. The true seasonal amplitude was 1 mm.

data    trend [mm/yr]   seasonal [mm]   σlevel          σAR             ρAR
East    12.62 ± 0.61    0.93 ± 0.15     0.20 ± 0.024    0.85 ± 0.024    0.62 ± 0.03
North   17.76 ± 0.69    1.19 ± 0.16     0.22 ± 0.02     0.86 ± 0.024    0.64 ± 0.29
Up      2.22 ± 1.00     0.74 ± 0.29     0.34 ± 0.07     2.00 ± 0.075    0.87 ± 0.016
The examples and code to fit DLM models described here are available from a GitHub repository at https://github.com/mjlaine/dlm. The code is writ-
[Figure 3.5 panels: "Station 3" (East, North and Up in mm, 2000 to 2014); "Station 3 - zoomed" (observation, DLM model); "Station 3, component East, level and trend removed" (observation, model, 95% unc); "Estimated autocorrelation function of the DLM residuals" (acf against lag); quantile-quantile plot (empirical against theoretical quantiles).]
Fig. 3.5 GNSS example data set and the DLM fit. Top left: three data components. Top right: zoomed component with DLM fit. Bottom left: The "East" component with the modelled level and trend removed, showing the seasonal variation and the model residual over it. Bottom right: residual diagnostics of the DLM fit for "East".
[Figure 3.6 panels: pairwise scatter plots of the MCMC samples of σlevel, σAR and ρAR; marginal posterior densities.]
Fig. 3.6 GNSS example data set and the DLM fit. On the left are the pairwise scatter plots of the MCMC samples for the model parameters for the "East" observations. The right panel shows the estimated marginal posterior densities. The dashed line is the corresponding prior density used.
3.12 Conclusions
References