NeuroImage
journal homepage: www.elsevier.com/locate/ynimg
Article history: Received 23 December 2010; Revised 23 February 2011; Accepted 2 March 2011; Available online 9 March 2011

Keywords: Neuronal; fMRI; Blind deconvolution; Cubature Kalman filter; Smoother; Stochastic; Hemodynamic modeling; Dynamic expectation maximization; Nonlinear

Abstract

This paper presents a new approach to inverting (fitting) models of coupled dynamical systems based on state-of-the-art (cubature) Kalman filtering. Crucially, this inversion furnishes posterior estimates of both the hidden states and parameters of a system, including any unknown exogenous input. Because the underlying generative model is formulated in continuous time (with a discrete observation process) it can be applied to a wide variety of models specified with either ordinary or stochastic differential equations. These are an important class of models that are particularly appropriate for biological time-series, where the underlying system is specified in terms of kinetics or dynamics (i.e., dynamic causal models). We provide comparative evaluations with generalized Bayesian filtering (dynamic expectation maximization) and demonstrate marked improvements in accuracy and computational efficiency. We compare the schemes using a series of difficult (nonlinear) toy examples and conclude with a special focus on hemodynamic models of evoked brain responses in fMRI. Our scheme promises to provide a significant advance in characterizing the functional architectures of distributed neuronal systems, even in the absence of known exogenous (experimental) input; e.g., resting state fMRI studies and spontaneous fluctuations in electrophysiological studies. Importantly, unlike current Bayesian filters (e.g. DEM), our scheme provides estimates of time-varying parameters, which we will exploit in future work on the adaptation and enabling of connections in the brain.

© 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2011.03.005
2110 M. Havlicek et al. / NeuroImage 56 (2011) 2109–2128
have been described at the macroscopic level by systems of differential equations. The hemodynamic model (Friston et al., 2000) links neuronal activity to flow and subsumes the Balloon–Windkessel model (Buxton et al., 1998; Mandeville et al., 1999), linking flow to observed fMRI signals. The hemodynamic model includes a model of neurovascular coupling (i.e., how changes in neuronal activity cause a flow-inducing signal) and hemodynamic processes (i.e. changes in cerebral blood flow (CBF), cerebral blood volume (CBV), and total deoxyhemoglobin (dHb)). In this paper, we will focus on a hemodynamic model of a single region in fMRI, where experimental studies suggest that the neuronal activity that drives hemodynamic responses corresponds more to afferent synaptic activity than to efferent spiking activity (Lauritzen, 2001; Logothetis, 2002). In future work, we will use exactly the same scheme to model distributed neuronal activity as observed in multiple regions.

The hemodynamic model is nonlinear in nature (Berns et al., 1999; Mechelli et al., 2001). Therefore, to infer the hidden states and parameters of the underlying system, we require methods that can handle these nonlinearities. In Friston et al. (2000), the parameters of a hemodynamic model were estimated using a Volterra kernel expansion to characterize the hemodynamic response. Later, Friston (2002) introduced a Bayesian estimation framework to invert (i.e., fit) the hemodynamic model explicitly. This approach accommodated prior constraints on parameters and avoided the need for Volterra kernels. Subsequently, the approach was generalized to cover networks of coupled regions and to include parameters controlling the neuronal coupling (effective connectivity) among brain regions (Friston et al., 2003). The Bayesian inversion of these models is known as dynamic causal modeling (DCM) and is now widely used to analyze effective connectivity in fMRI and electrophysiological studies. However, current approaches to hemodynamic and causal models only account for noise at the level of the measurement, where this noise includes thermally generated random noise and physiological fluctuations. This is important because physiological noise represents stochastic fluctuations due to metabolic and vascular responses, which affect the hidden states of the system; furthermore, neuronal activity can show pronounced endogenous fluctuations (Biswal et al., 1995; Krüger and Glover, 2001). Motivated by this observation, Riera et al. (2004) proposed a technique based on a fully stochastic model (i.e. including physiological noise) that used the local linearization filter (LLF) (Jimenez and Ozaki, 2003), which can be considered a form of extended Kalman filtering (EKF) (Haykin, 2001) for continuous dynamic systems. Besides estimating hemodynamic states and parameters, this approach allows one to estimate the system's input, i.e. neuronal activity, by parameterizing it via radial basis functions (RBFs). In Riera et al. (2004), the number of RBFs was considered fixed a priori, which means that the solution has to lie inside a regularly distributed but sparse space (otherwise, the problem is underdetermined). Recently, the LLF technique was applied by Sotero et al. (2009) to identify the states and parameters of a metabolic/hemodynamic model.

The hemodynamic response and hidden states of hemodynamic models possess strong nonlinear characteristics, which are particularly pronounced with respect to stimulus duration (Birn et al., 2001; Miller et al., 2001). This makes one wonder whether a linearization approach such as the LLF can handle such strong nonlinearities. Johnston et al. (2008) proposed particle filtering, a sequential Monte Carlo method that accommodates true nonlinearities in the model. The approach of Johnston et al. was shown to be both accurate and robust when used to estimate hidden physiologic and hemodynamic states, and was superior to the LLF, though a suboptimal numerical procedure was used in evaluating the LLF. Similarly, two-pass particle filtering, including a smoothing (backwards pass) procedure, was introduced by Murray and Storkey (2008). Another attempt to infer model parameters and hidden states used the unscented Kalman filter (UKF), which is more suitable for highly nonlinear problems (Hu et al., 2009). Finally, Jacobsen et al. (2008) addressed inference on model parameters, using a Metropolis–Hastings algorithm for sampling their posterior distribution.

None of the methods mentioned above, except that of Riera et al. (2004) with its restricted parameterization of the input, can perform a complete deconvolution of fMRI signals and estimate both hidden states and input, i.e. the neuronal activation, without knowing the input (stimulation function). Here, an important exception is the methodology introduced by Friston et al. (2008) called dynamic expectation maximization (DEM) and its generalizations: variational filtering (Friston, 2008a) and generalized filtering (Friston et al., 2010). DEM represents a variational Bayesian technique (Hinton and van Camp, 1993; MacKay, 1995) that is applied to models formulated in terms of generalized coordinates of motion. This scheme allows one to estimate not only the states and parameters but also the input and hyperparameters of the system generating those states. Friston et al. (2008) demonstrated the robustness of DEM compared to standard Bayesian filtering methods, particularly the extended Kalman filter and particle filter, on a selection of difficult nonlinear/linear dynamic systems. They concluded that standard methods are unable to perform joint estimation of the system input and states, while inferring the model parameters.

In this paper, we propose an estimation scheme that is based on nonlinear Kalman filtering, using the recently introduced cubature Kalman filter (CKF) (Arasaratnam and Haykin, 2009), which is recognized as the closest known approximation to Bayesian filtering. Our procedure applies a forward pass using the CKF that is finessed by a backward pass of the cubature Rauch–Tung–Striebel smoother. Moreover, we utilize the efficient square-root formulation of these algorithms. Crucially, we augment the hidden states with both parameters and inputs, enabling us to identify hidden states and model parameters and to estimate the system input. We will show that we can obtain accurate estimates of hidden hemodynamic and neuronal states, well beyond the temporal resolution of fMRI.

The paper is structured as follows: First, we review the general concept of nonlinear continuous–discrete state-space models for simultaneous estimation of a system's hidden states, input and parameters. We then introduce the forward–backward cubature Kalman estimation procedure in its stable square-root form, as a suitable method for solving this complex inversion problem. Second, we provide a comprehensive evaluation of our proposed scheme and compare it with DEM. For this purpose, we use the same nonlinear/linear dynamic systems that were used to compare DEM with the EKF and particle filter algorithms (Friston et al., 2008). Third, we devote special attention to the deconvolution problem, given observed hemodynamic responses; i.e. to the estimation of neuronal activity and parameter identification of a hemodynamic model. Again, we provide comparative evaluations with DEM and discuss the advantages and limitations of each approach, when applied to fMRI data.

Nonlinear continuous–discrete state-space models

Nonlinear filtering problems are typically described by state-space models comprising a process and a measurement equation. In many practical problems, the process equation is derived from the underlying physics of a continuous dynamic system, and is expressed in the form of a set of differential equations. Since the measurements y are acquired by digital devices, i.e. they are available at discrete time points (t = 1, 2, …, T), we have a model with a continuous process equation and a discrete measurement equation. The stochastic representation of this state-space model, with additive noise, can be formulated as:

dx_t = h(x_t, θ_t, u_t, t) dt + l(x_t, t) dβ_t,
y_t = g(x_t, θ_t, u_t, t) + r_t,   (1)

where θ_t represents the unknown parameters of the equation of motion h and the measurement function g, respectively; u_t is the exogenous
input (the cause) that drives hidden states or the response; r_t is a vector of random Gaussian measurement noise, r_t ∼ N(0, R_t); l(x_t, t) can be a function of the state and time; and β_t denotes a Wiener process or state noise that is assumed to be independent of the states and the measurement noise.

The continuous-time formulation of the stochastic differential equations (SDE) in Eq. (1) can also be expressed using Riemann and Ito integrals (Kloeden and Platen, 1999):

x_{t+Δt} = x_t + ∫_t^{t+Δt} h(x_t, θ_t, u_t, t) dt + ∫_t^{t+Δt} l(x_t, t) dβ_t,   (2)

where the second integral is stochastic. This equation can be further converted into a discrete-time analog using numerical integration, such as the Euler–Maruyama method or the local linearization (LL) scheme (Biscay et al., 1996; Ozaki, 1992). This leads to the standard form of a first-order autoregressive process (AR(1)) for nonlinear state-space models:

x_t = f(x_{t−1}, θ_t, u_t) + q_t,
y_t = g(x_t, θ_t, u_t) + r_t,   (3)

where q_t is a zero-mean Gaussian state noise vector, q_t ∼ N(0, Q_t). Our preference is to use the LL scheme, which has been demonstrated to improve the order of convergence and stability properties of conventional numerical integrators (Jimenez et al., 1999). In this case, the function f is evaluated through:

f(x_{t−1}, θ_t, u_t) ≈ x_{t−1} + f_{x_t}^{−1} [exp(f_{x_t} Δt) − I] h(x_{t−1}, θ_t, u_t),   (4)

where f_{x_t} is the Jacobian of h and Δt is the time interval between samples (up to the sampling interval). The LL method allows integration of an SDE near discretely and regularly distributed time instants, assuming local piecewise linearity. This permits the conversion of an SDE system into a state-space equation with Gaussian noise. A stable reconstruction of the trajectories of the state-space variables is obtained by a one-step prediction. Note that the expression in Eq. (4) is not always the most practical; it assumes the Jacobian has full rank. See (Jimenez, 2002) for alternative forms.

Probabilistic inference

The problem of estimating the hidden states (causing data), the parameters (causing the dynamics of hidden states) and any non-controlled exogenous input to the system, when only observations are given, requires probabilistic inference. In a Markovian setting, the optimal solution to this problem is given by the recursive Bayesian estimation algorithm, which recursively updates the posterior density of the system state as new observations arrive. This posterior density constitutes the complete solution to the probabilistic inference problem, and allows us to calculate an "optimal" estimate of the state. In particular, the hidden state x_t, with initial probability p(x_0), evolves over time as an indirect or partially observed first-order Markov process, according to the conditional probability density p(x_t|x_{t−1}). The observations y_t are conditionally independent, given the state, and are generated according to the conditional probability density p(y_t|x_t). In this sense, the discrete-time variant of the state-space model presented in Eq. (3) can also be written in terms of transition densities and a Gaussian likelihood:

p(x_t|x_{t−1}) = N(x_t | f(x_{t−1}, u_t, θ_t), Q),
p(y_t|x_t) = N(y_t | g(x_t, θ_t, u_t), R).   (5)

The state transition density p(x_t|x_{t−1}) is fully specified by f and the state noise distribution p(q_t), whereas g and the measurement noise distribution p(r_t) fully specify the observation likelihood p(y_t|x_t). The dynamic state-space model, together with the known statistics of the noise (and the prior distribution of the system states), defines a probabilistic generative model of how the system evolves over time and of how we (partially or inaccurately) observe this hidden state (Van der Merwe, 2004).

Unfortunately, the optimal Bayesian recursion is tractable only for linear, Gaussian systems, in which case the closed-form recursive solution is given by the classical Kalman filter (Kalman, 1960), which yields the optimal solution in the minimum-mean-square-error (MMSE) sense, the maximum likelihood (ML) sense, and the maximum a posteriori (MAP) sense. For more general real-world (nonlinear, non-Gaussian) systems, the optimal Bayesian recursion is intractable and an approximate solution must be used.

Numerous approximate solutions to the recursive Bayesian estimation problem have been proposed over the last couple of decades, in a variety of fields. These methods can be loosely grouped into the following four main categories:

• Gaussian approximate methods: These methods model the pertinent densities by Gaussian distributions, under the assumption that a consistent minimum-variance estimator (of the posterior state density) can be realized through the recursive propagation and updating of the first and second order moments of the true densities. Nonlinear filters that fall under this category are (in chronological order): a) the extended Kalman filter (EKF), which linearizes both the nonlinear process and measurement dynamics with a first-order Taylor expansion about the current state estimate; b) the local linearization filter (LLF), which is similar to the EKF, but the approximate discrete-time model is obtained from a piecewise linear discretization of the nonlinear state equation; c) the unscented Kalman filter (UKF) (Julier et al., 2002), which chooses deterministic sample (sigma) points that capture the mean and covariance of a Gaussian density; when propagated through the nonlinear function, these points capture the true mean and covariance up to second order of the nonlinear function; d) the divided difference filter (DDF) (Norgaard et al., 2000), which uses Stirling's interpolation formula; as with the UKF, the DDF uses a deterministic sampling approach to propagate Gaussian statistics through the nonlinear function; e) the Gaussian sum filter (GSF), which approximates both the predicted and posterior densities as sums of Gaussian densities, where the mean and covariance of each Gaussian density is calculated using separate and parallel instances of the EKF or UKF; f) the quadrature Kalman filter (QKF) (Ito and Xiong, 2002), which uses the Gauss–Hermite numerical integration rule to calculate the recursive Bayesian estimation integrals, under a Gaussian assumption; g) the cubature Kalman filter (CKF), which is similar to the UKF, but uses the spherical-radial integration rule.
• Direct numerical integration methods: These methods, also known as grid-based filters (GBF) or point-mass methods, approximate the optimal Bayesian recursion integrals with large but finite sums over a uniform N-dimensional grid that covers the complete state-space in the area of interest. For even moderately high-dimensional state-spaces, the computational complexity can become untenably large, which precludes any practical use of these filters (Bucy and Senne, 1971).
• Sequential Monte-Carlo (SMC) methods: These methods (called particle filters) use a set of randomly chosen samples with associated weights to approximate the density (Doucet et al., 2001). Since the basic sampling dynamics (importance sampling) degenerate over time, the SMC method includes a re-sampling step. As the number of samples (particles) becomes larger, the Monte Carlo characterization of the posterior density becomes more accurate. However, the large number of samples often makes the use of SMC methods computationally prohibitive.
• Variational Bayesian methods: These methods approximate the true posterior distribution with a tractable approximate form. A lower bound on the marginal likelihood (evidence) of the
posterior is then maximized with respect to the free parameters of this approximation (Jaakkola, 2000).

The selection of a suitable sub-optimal approximate solution to the recursive Bayesian estimation problem represents a trade-off between global optimality on the one hand and computational tractability (and robustness) on the other. In our case, the best criterion for sub-optimality is formulated as: "Do as best as you can, and not more". Under this criterion, the natural choice is to apply the cubature Kalman filter (Arasaratnam and Haykin, 2009). The CKF is the closest known direct approximation to the Bayesian filter, and it outperforms all other nonlinear filters in any Gaussian setting, including particle filters (Arasaratnam and Haykin, 2009; Fernandez-Prades and Vila-Valls, 2010; Li et al., 2009). The CKF is numerically accurate, can capture true nonlinearity even in highly nonlinear systems, and is easily extendable to high-dimensional problems (the number of sample points grows linearly with the dimension of the state vector).

Cubature Kalman filter

The cubature Kalman filter is a recursive, nonlinear and derivative-free filtering algorithm, developed within the Kalman filtering framework. It computes the first two moments (i.e. mean and covariance) of all conditional densities using a highly efficient numerical integration method (cubature rules). Specifically, it utilizes the third-degree spherical-radial rule to numerically approximate integrals of the form (nonlinear function × Gaussian density), using a set of m equally weighted symmetric cubature points {ξ_i, ω_i}, i = 1, …, m:

∫_{R^N} f(x) N(x; 0, I_N) dx ≈ Σ_{i=1}^{m} ω_i f(ξ_i),   (6)

ξ = sqrt(m/2) [I_N, −I_N],  ω_i = 1/m,  i = 1, 2, …, m = 2N,   (7)

where ξ_i is the i-th column of the cubature-point matrix ξ with weight ω_i, and N is the dimension of the state vector.

In order to evaluate the dynamic state-space model described by Eq. (3), the CKF includes two steps: a) a time update, after which the predicted density p(x_t|y_{1:t−1}) = N(x̂_{t|t−1}, P_{t|t−1}) is computed; and b) a measurement update, after which the posterior density p(x_t|y_{1:t}) = N(x̂_{t|t}, P_{t|t}) is computed. For a detailed derivation of the CKF algorithm, the reader is referred to (Arasaratnam and Haykin, 2009). We should note that even though the CKF represents a derivative-free nonlinear filter, our formulation of the continuous–discrete dynamic system requires the first-order partial derivatives implicit in the Jacobian, which is necessary for the implementation of the LL scheme. Although one could use simple Euler methods to approximate the numerical solution of the system (Sitz et al., 2002), local linearization generally provides more accurate solutions (Valdes Sosa et al., 2009). Note that since the Jacobian is only needed to discretize the continuous state variables in the LL approach (albeit for each cubature point), the main CKF algorithm remains discrete and derivative-free.

Parameters and input estimation

Parameter estimation, sometimes referred to as system identification, can be regarded as a special case of general state estimation in which the parameters are absorbed into the state vector. Parameter estimation involves determining the nonlinear mapping:

y_t = D(x_t, θ_t),   (8)

where the nonlinear map D(.) is, in our case, the dynamic model f(.) parameterized by the vector θ_t. The parameters θ_t correspond to a stationary process with an identity state-transition matrix, driven by an "artificial" process noise w_t ∼ N(0, W_t) (the choice of the variance W_t determines convergence and tracking performance and is generally small). The input or cause of motion on the hidden states, u_t, can also be treated in this way, with input noise v_t ∼ N(0, V_t). This is possible because of the so-called natural condition of control (Arasaratnam and Haykin, 2009), which says that the input u_t can be generated using the state prediction x̂_{t|t−1}.

A special case of system identification arises when the input to the nonlinear mapping function D(.), i.e. our hidden states x_t, cannot be observed. This then requires both state estimation and parameter estimation. For this dual estimation problem, we consider a discrete-time nonlinear dynamic system, where the system state x_t, the parameters θ_t and the input u_t must be estimated simultaneously from the observed noisy signal y_t. A general theoretical and algorithmic framework for dual Kalman filter based estimation was presented by Nelson (2000) and Van der Merwe (2004). This framework encompasses two main approaches, namely joint estimation and dual estimation. In the dual filtering approach, two Kalman filters are run simultaneously (in an iterative fashion) for state and parameter estimation. At every time step, the current estimate of the parameters θ_t is used in the state filter as a given (known) input and, likewise, the current estimate of the state x̂_t is used in the parameter filter. This results in a step-wise optimization within the joint state-parameter space. On the other hand, in the joint filtering approach, the unknown system state and parameters are concatenated into a single higher-dimensional joint state vector, x̃_t = [x_t; u_t; θ_t]^T. It was shown in (Van der Merwe, 2004) that parameter estimation based on nonlinear Kalman filtering represents an efficient online second-order optimization method that can also be interpreted as a recursive Gauss–Newton optimization method. They also showed that nonlinear filters like the UKF and CKF are robust in obtaining globally optimal estimates, whereas the EKF is very likely to get stuck in a non-optimal local minimum.

There is a prevalent opinion that the performance of the joint estimation scheme is superior to that of the dual estimation scheme (Ji and Brown, 2009; Nelson, 2000; Van der Merwe, 2004). Therefore, the joint CKF is used below to estimate states, input, and parameters. Note that since the parameters are estimated online with the states, the convergence of the parameter estimates depends also on the length of the time series.

The state-space model for the joint estimation scheme is then formulated as:

x̃_t = [x_t; u_t; θ_t] = [f(x_{t−1}, θ_{t−1}, u_{t−1}); u_{t−1}; θ_{t−1}] + [q_{t−1}; v_{t−1}; w_{t−1}],
y_t = g(x̃_t) + r_{t−1}.   (9)

Since the joint filter concatenates the state and parameter variables into a single state vector, it effectively models the cross-covariances between the state, input and parameter estimates:

P_t = [ P_{x_t}      P_{x_t u_t}   P_{x_t θ_t} ;
        P_{u_t x_t}  P_{u_t}       P_{u_t θ_t} ;
        P_{θ_t x_t}  P_{θ_t u_t}   P_{θ_t}    ].   (10)

This full covariance structure allows the joint estimation framework not only to deal with uncertainty about the parameter and state estimates (through the cubature-point approach), but also to model the interaction (conditional dependences) between the states and parameters, which generally provides better estimates.

Finally, the accuracy of the CKF can be further improved by augmenting the state vector with all the noise components (Li et al., 2009; Wu et al., 2005), so that the effects of process noise, measurement noise and parameter noise are explicitly available to the scheme. By augmenting the state vector with the noise variables (Eqs. (11) and (12)), we account for uncertainty in the noise variables in the same manner as we do for the states during the propagation of cubature points. This allows for the effect of the noise
on the system dynamics and observations to be treated with the same level of accuracy as the state variables (Van der Merwe, 2004). It also means that we can model noise that is not purely additive. Because this augmentation increases the number of cubature points (by the number of noise components), it may also capture higher-order moment information (like skew and kurtosis). However, if the problem does not require more than the first two moments, the augmented CKF furnishes the same results as the non-augmented CKF.

Square-root cubature Kalman filter algorithm

In practice, Kalman filters are known to be susceptible to numerical errors due to limited word-length arithmetic. Numerical errors can lead to the propagation of an asymmetric, non-positive-definite covariance, causing the filter to diverge (Kaminski et al., 1971). As a robust solution to this, a square-root Kalman filter is recommended. This avoids the matrix square-rooting operations P = SS^T that are necessary in the regular CKF algorithm, by propagating the square-root covariance matrix S directly. This has important benefits: preservation of the symmetry and positive (semi)definiteness of the covariance matrix, improved numerical accuracy, double-order precision, and reduced computational load. Therefore, we will consider the square-root version of the CKF (SCKF), where the square-root factors of the predictive posterior covariance matrix are propagated (Arasaratnam and Haykin, 2009).

Below, we summarize the steps of the SCKF algorithm. First, we describe the forward pass of a joint SCKF for the simultaneous estimation of states, parameters, and input, where we consider the state-space model in Eq. (9). Second, we describe the backward pass of the Rauch–Tung–Striebel (RTS) smoother. This can be derived easily for the SCKF due to its similarity with the RTS smoother for the square-root UKF (Simandl and Dunik, 2006). Finally, we will use the abbreviation SCKS to refer to the combination of the SCKF and our RTS square-root cubature Kalman smoother. In other words, SCKF refers to the forward pass, which is supplemented with a backward pass in SCKS.

Forward filtering pass

Filter initialization
During the initialization step of the filter, we build the augmented form of the state variable:

x̂_0^a = E[x_0^a] = [x̃_0; 0; 0; 0; 0]^T = [x_0; u_0; θ_0; 0; 0; 0; 0]^T.   (11)

The effective dimension of this augmented state is N = n_x + n_u + n_θ + n_q + n_v + n_w + n_r, where n_x is the original state dimension, n_u is the dimension of the input, n_θ is the dimension of the parameter vector, {n_q, n_v, n_w} are the dimensions of the noise components (equal to n_x, n_u, n_θ, respectively), and n_r is the observation noise dimension (equal to the number of observed variables). In a similar manner, the augmented state square-root covariance matrix is assembled from the individual (square-root) covariance matrices of x, u, θ, q, v, w, and r:

S_0^a = chol( E[ (x_0^a − x̂_0^a)(x_0^a − x̂_0^a)^T ] ) = diag(S_0, S_q, S_v, S_{w_0}, S_r),   (12)

S_0 = diag( sqrt(P_x), sqrt(P_u), sqrt(P_θ) );  S_q = sqrt(Q),  S_v = sqrt(V),  S_w = sqrt(W),  S_r = sqrt(R),   (13)

where P_x, P_u, P_θ are the process covariance matrices for the states, input and parameters; Q, V, W are their corresponding process noise covariances; and R is the observation noise covariance. The square-root representations of these matrices are calculated (Eq. (13)), where the "chol" operator represents a Cholesky factorization for efficient matrix square-rooting and "diag" forms a block-diagonal matrix.

Time update step
We evaluate the cubature points (i = 1, 2, …, m = 2N):

X_{i,t−1|t−1}^a = S_{t−1|t−1}^a ξ_i + x̂_{t−1|t−1}^a,   (14)

where the set of sigma points ξ is pre-calculated at the beginning of the algorithm (Eq. (7)). Next, we propagate the cubature points through the nonlinear dynamic system of process equations and add the noise components:

X_{i,t|t−1}^{x,u,θ} = F( X_{i,t−1|t−1}^{a(x)}, X_{i,t−1|t−1}^{a(u)}, X_{i,t−1|t−1}^{a(θ)} ) + X_{i,t−1|t−1}^{a(q,v,w)},   (15)

where F comprises [f(x_{t−1}, θ_{t−1}, u_{t−1}); u_{t−1}; θ_{t−1}]^T as expressed in the process Eq. (9). The superscripts distinguish among the components of the cubature points, which correspond to the states x, input u, parameters θ and their corresponding noise variables (q, v, w), all of which are included in the augmented matrix X^a. Note that the size of the new matrix X_{i,t|t−1}^{x,u,θ} is only (n_x + n_u + n_θ) × m.

We then compute the predicted mean x̂_{t|t−1} and estimate the square-root factor of the predicted error covariance S_{t|t−1} by using the weighted and centered (by subtracting the prior mean x̂_{t|t−1}) matrix X_{t|t−1}:

x̂_{t|t−1} = (1/m) Σ_{i=1}^{m} X_{i,t|t−1}^{x,u,θ},   (16)

S_{t|t−1} = qr(X_{t|t−1}),   (17)

X_{t|t−1} = (1/sqrt(m)) [ X_{1,t|t−1}^{x,u,θ} − x̂_{t|t−1}, X_{2,t|t−1}^{x,u,θ} − x̂_{t|t−1}, …, X_{m,t|t−1}^{x,u,θ} − x̂_{t|t−1} ].   (18)

The expression S = qr(X) denotes triangularization, in the sense of the QR decomposition,¹ where the resulting S is a lower triangular matrix.

Measurement update step
During the measurement update step, we propagate the cubature points through the measurement equation and estimate the predicted measurement:

Y_{i,t|t−1} = g( X_{i,t|t−1}^x, X_{i,t|t−1}^u, X_{i,t|t−1}^θ ) + X_{i,t−1|t−1}^{a(r)},   (19)

ŷ_{t|t−1} = (1/m) Σ_{i=1}^{m} Y_{i,t|t−1}.   (20)

Subsequently, the square-root of the innovation covariance matrix S_{yy,t|t−1} is estimated by using the weighted and centered matrix Y_{t|t−1}:

S_{yy,t|t−1} = qr(Y_{t|t−1}),   (21)

Y_{t|t−1} = (1/sqrt(m)) [ Y_{1,t|t−1} − ŷ_{t|t−1}, Y_{2,t|t−1} − ŷ_{t|t−1}, …, Y_{m,t|t−1} − ŷ_{t|t−1} ].   (22)

¹ The QR decomposition is a factorization of the matrix X^T into an orthogonal matrix Q and an upper triangular matrix R such that X^T = QR, and XX^T = R^T Q^T Q R = R^T R = SS^T, where the resulting square-root (lower triangular) matrix is S = R^T.
This is followed by estimation of the cross-covariance matrix P_xy,t|t−1 and the Kalman gain K_t:

P_xy,t|t−1 = X_t|t−1 Y_t|t−1^T,   (23)

K_t = (P_xy,t|t−1 / S_yy,t|t−1^T) / S_yy,t|t−1.   (24)

The symbol / represents the matrix right-divide operator; i.e. the operation A/B applies the back-substitution algorithm when the triangular divisor is upper triangular and the forward-substitution algorithm when it is lower triangular.

Finally, we estimate the updated state x̂_t|t and the square-root factor of the corresponding error covariance:

x̂_t|t = x̂_t|t−1 + K_t (y_t − ŷ_t|t−1),   (25)

S_t|t = qr( [X_t|t−1 − K_t Y_t|t−1] ).   (26)

The difference y_t − ŷ_t|t−1 in Eq. (25) is called the innovation or residual. It reflects the discrepancy between the actual and the predicted measurement (the prediction error). This innovation is weighted by the Kalman gain, which minimizes the posterior error covariance.

To improve convergence rates and tracking performance during parameter estimation, a Robbins–Monro stochastic approximation scheme for estimating the innovations (Ljung and Söderström, 1983; Robbins and Monro, 1951) is employed. In our case, this involves approximating the square-root matrix of the parameter noise covariance S_wt by:

S_wt = sqrt( (1 − λ_w) S_wt−1² + λ_w K̃_t (y_t − ŷ_t|t−1)(y_t − ŷ_t|t−1)^T K̃_t^T ).   (27)

Backward smoothing pass

The backward pass is initialized with the estimates from the SCKF forward pass, x̂_t|T and S_t|T, and the square-root covariance matrices of the noise components:

x̂^a_t|t = [x̂_t|T^T, 0, 0, 0, 0]^T,   (28)

S^a_t|t = diag(S_t|T, S_q,T, S_v, S_w,T, S_r).   (29)

We then evaluate and propagate the cubature points through the nonlinear dynamic system (the SDEs are integrated in a forward fashion):

X^a_i,t|t = S^a_t|t ξ_i + x̂^a_t|t,   (30)

X^(x,u,θ)_i,t+1|t = F( X^a(x)_i,t|t, X^a(u)_i,t|t, X^a(θ)_i,t|t ) + X^a(q,v,w)_i,t|t.   (31)

We compute the predicted mean and the corresponding square-root error covariance matrix:

x̂_t+1|t = (1/m) Σ_{i=1..m} X^(x,u,θ)_i,t+1|t,   (32)

S_t+1|t = qr( X_t+1|t ),   (33)

X_t+1|t = (1/√m) [ X^(x,u,θ)_1,t+1|t − x̂_t+1|t,  X^(x,u,θ)_2,t+1|t − x̂_t+1|t,  …,  X^(x,u,θ)_m,t+1|t − x̂_t+1|t ].   (34)

Next, we compute the predicted cross-covariance matrix, where the weighted and centered matrix X′_t|t is obtained by using the partition (x, u, θ) of the augmented cubature-point matrix X^a_i,t|t and the estimated mean x̂^a_t|t before it is propagated through the nonlinear dynamic system (i.e. the estimate from the forward pass):

P′_xx,t+1|t = X′_t|t X_t+1|t^T.   (35)

Using the smoother gain A_t, the square-root factor of the smoothed error covariance is obtained as:

S^s_t|T = qr( [X′_t|t − A_t X_t+1|t,  A_t S^s_t+1|T] ).   (39)
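The two triangular right-divisions in Eq. (24) can be realized with triangular solves rather than an explicit matrix inverse. A hedged numpy/scipy sketch (the function name `kalman_gain` is ours, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_triangular

def kalman_gain(P_xy, S_yy):
    """Gain of Eq. (24): K = (P_xy / S_yy^T) / S_yy.

    S_yy is the lower-triangular square-root of the innovation
    covariance, so this computes K = P_xy (S_yy S_yy^T)^{-1}
    without forming an explicit inverse.
    """
    # T = P_xy / S_yy^T  <=>  T @ S_yy.T = P_xy
    T = solve_triangular(S_yy, P_xy.T, lower=True).T
    # K = T / S_yy       <=>  K @ S_yy = T
    return solve_triangular(S_yy, T.T, lower=True, trans="T").T

# Quick check against the direct formula K = P_xy @ inv(P_yy)
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
P_yy = A @ A.T + 3.0 * np.eye(3)       # SPD innovation covariance
S_yy = np.linalg.cholesky(P_yy)        # lower-triangular square root
P_xy = rng.standard_normal((5, 3))
K = kalman_gain(P_xy, S_yy)
assert np.allclose(K @ P_yy, P_xy)
```

Solving against the triangular factors is both cheaper and numerically safer than inverting the full innovation covariance, which is the point of carrying square-root factors through the recursion.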
are used to estimate the prediction errors e_t = y_t − ŷ_t, which allows us to calculate the log-likelihood of the model given the data as:

log p(y_1:T | θ) = −(T/2) log(2π) − (1/2) Σ_{t=1..T} [ log |S_yy,t|t−1 S_yy,t|t−1^T| + e_t^T (S_yy,t|t−1 S_yy,t|t−1^T)^{−1} e_t ].   (40)

2) Evaluate the backward pass of the SCKS to obtain smoothed estimates of the states x̂^s_t|T, the input û^s_t|T, and the parameters θ̂^s_t|T. Again, this operation involves discretization of the process equations by the LL-scheme for all cubature points.

3) Iterate until the stopping condition is met. We evaluate the log-likelihood (Eq. (40)) at each iteration and terminate the optimization when the increase of the (negative) log-likelihood is less than a tolerance value of, e.g., 10⁻³.

Before we turn to the simulations, we provide a brief description of DEM, which is used for comparative evaluations.

Dynamic expectation maximization

DEM is based on variational Bayes, which is a generic approach to model inversion (Friston et al., 2008). Briefly, it approximates the conditional density p(ϑ|y, m) on some model parameters, ϑ = {x, u, θ, η}, given a model m and data y, and it also provides a lower bound on the evidence p(y|m) of the model itself. In addition, DEM assumes a continuous dynamic system formulated in generalized coordinates of motion, where some parameters change with time, i.e. the hidden states x and input u, while the rest of the parameters are time-invariant. The state-space model has the form:

ỹ = g̃(x, u, θ) + r̃,
Dx̃ = f̃(x, u, θ) + q̃,   (41)

where

g̃ = [ g = g(x, u, θ);  g′ = g_x x′ + g_u u′;  g″ = g_x x″ + g_u u″;  ⋮ ],
f̃ = [ f = f(x, u, θ);  f′ = f_x x′ + f_u u′;  f″ = f_x x″ + f_u u″;  ⋮ ].   (42)

Here, g̃ and f̃ are the predicted response and motion of the hidden states, respectively. D is a derivative operator whose first leading diagonal contains identity matrices and which links successive temporal derivatives (x′, x″, …; u′, u″, …). These temporal derivatives are directly related to the embedding orders² that one can specify separately for the input (d) and for the states (n) a priori. We will use embedding orders d = 3 and n = 6.

² The term "embedding order" is used in analogy with lags in autoregressive modeling.

DEM is formulated for the inversion of hierarchical dynamic causal models with (empirical) Gaussian prior densities on the unknown parameters of the generative model m. These parameters are {θ, η}, where θ represents the set of model parameters and η = {α, β, σ} are hyperparameters, which specify the amplitude of random fluctuations in the generative process. These hyperparameters correspond to (log) precisions (inverse variances) on the state noise (α), the input noise (β), and the measurement noise (σ), respectively. In contrast to standard Bayesian filters, DEM also allows for temporal correlations among the innovations, parameterized by an additional hyperparameter γ called the temporal precision.

DEM comprises three steps that optimize the states, parameters and hyperparameters respectively. The first is the D-step, which evaluates Eq. (41), for the posterior mean, using the LL-scheme for the integration of SDEs. Crucially, DEM (and its generalizations) does not use a recursive Bayesian scheme but tries to optimize the posterior moments of hidden states (and inputs) through a generalized ("instantaneous") gradient ascent on a (free-energy bound on the) marginal likelihood. This generalized ascent rests on using the generalized motion (time derivatives to high order) of variables as part of the model generating or predicting discrete data. This means that DEM is formally simpler (although numerically more demanding) than recursive schemes and only requires a single pass through the time-series to estimate the states.

DEM comprises additional E (expectation) and M (maximization) steps that optimize the conditional density on the parameters and hyperparameters (precisions) after the D (deconvolution) step. Iteration of these steps proceeds until convergence. For an exhaustive description of DEM, see (Friston et al., 2008). A key difference between DEM (variational and generalized filtering) and SCKS is that in DEM the states and parameters are optimized with respect to a (free-energy bound on the) log-evidence or marginal likelihood, having integrated out the dependency on the parameters. In contrast, SCKS optimizes the parameters with respect to the log-likelihood in Eq. (40), to provide maximum likelihood estimates of the parameters, as opposed to maximum a posteriori (MAP) estimators. This reflects the fact that DEM uses shrinkage priors on the parameters and hyperparameters, whereas SCKS does not. SCKS places priors on the parameter noise that encode our prior belief that the parameters do not change (substantially) over time. This is effectively a constraint on the volatility of the parameters (not their values per se), which allows the parameters to 'drift' slowly to their maximum likelihood values. This difference becomes important when evaluating one scheme in relation to the other, because we would expect some shrinkage in the DEM estimates toward the prior mean, which we would not expect in the SCKS estimates (see the next section).

DEM rests on a mean-field assumption used in variational Bayes; in other words, it assumes that the states, parameters and hyperparameters are conditionally independent. This assumption can be relaxed by absorbing the parameters and hyperparameters into the states, as in SCKS. The resulting scheme is called generalized filtering (Friston et al., 2010). Although generalized filtering is formally more similar to SCKS than DEM (and is generally more accurate), we have chosen to use DEM in our comparative evaluations because DEM has been validated against the EKF and particle filtering (whereas generalized filtering has not). Furthermore, generalized filtering uses prior constraints on both the parameters and how fast they can change. In contrast, SCKS and DEM each use only one set of constraints, on the change and on the value of the parameters, respectively. However, we hope to perform this comparative evaluation in a subsequent paper, where we will consider Bayesian formulations of cubature smoothing in greater detail and relate its constraints on changes in the parameters to the priors used in generalized filtering.

Finally, for simplicity, we assume that the schemes have access to all the noise (precision) hyperparameters, meaning that they are not estimated. In fact, for SCKS we assume only the precision of the measurement noise to be known and update the assumed values of the hyperparameters for fluctuations in the hidden states and input during the inversion (see Eq. (27)). We can do this because we have an explicit representation of the errors on the hidden states and input.

Inversion of dynamic models by SCKF and SCKS

In this section, we establish the validity and accuracy of the SCKF and SCKS scheme in relation to DEM. For this purpose, we analyze several nonlinear and linear continuous stochastic systems that were previously used for validating DEM, where its better performance was demonstrated in relation to the EKF and particle filtering. In particular, we consider the well-known Lorenz attractor, a model of a
double-well potential, a linear convolution model and, finally, we devote special attention to the inversion of a hemodynamic model. Even though some of these models might seem irrelevant for hemodynamic and neuronal modeling, they are popular for testing the effectiveness of inversion schemes and also (maybe surprisingly) exhibit behaviors that can be seen in models used in neuroimaging.

To assess the performance of the various schemes, we performed Monte Carlo simulations, separately for each of these models, where the performance metric for the statistical efficiency of the estimators was the squared error loss function (SEL). For example, we define the SEL for the states as:

SEL(x) = Σ_{t=1..T} (x_t − x̂_t)².   (43)

Similarly, we evaluate the SEL for the input and parameters (when appropriate). The SEL is sensitive to outliers; i.e. when summing over a set of (x_t − x̂_t)², the final sum tends to be dominated by a few large values. We consider this a convenient property when comparing the accuracy of our cubature schemes and DEM. Furthermore, this measure of accuracy accommodates the different constraints on the parameters in DEM (shrinkage priors on the parameters) and SCKS (shrinkage priors on changes in the parameters). We report the SEL values in natural logarithmic space, i.e. log(SEL).

Note that all data for the above models were simulated with the generation function in the DEM toolbox (spm_DEM_generate.m), which is available as part of SPM8 (http://www.fil.ion.ucl.ac.uk/spm/).

Lorenz attractor

The model of the Lorenz attractor exhibits deterministic chaos, where the path of the hidden states diverges exponentially on a butterfly-shaped strange attractor in a three-dimensional state-space. There are no inputs in this system; the dynamics are autonomous, being generated by nonlinear interactions among the states and their motion. The path begins by spiraling onto one wing and then jumps to the other and back in a chaotic way. We consider the output to be the simple sum of all three states at any time point, with innovations of unit precision σ = 1 and γ = 8. We further specified a small amount of state noise (α = e¹⁶). We generated 120 time samples using this model, with initial state conditions x0 = [0.9, 0.8, 30]^T, parameters θ = [18, −4, 46.92]^T and an LL-integration step Δt = 1.

This sort of chaotic system shows sensitivity to initial conditions, which, in the case of unknown initial conditions, is a challenge for any inversion scheme. Therefore, we first compared SCKF, SCKS and DEM when the initial conditions x0 differ from the true starting values, with known model parameters. This simulation was repeated five times with random initializations and different innovations. Since we do not estimate any parameters, only a single iteration of the optimization process is required. We summarized the resulting estimates in terms of the first two hidden states and plotted their trajectories against each other in their corresponding state-space (Fig. 1A). It can be seen that all three inversion schemes converge quickly to the true trajectories. DEM provides the least accurate estimate (but still exhibits high performance when compared to the EKF and particle filters (Friston, 2008a; Friston et al., 2008)). The SCKF was able to track the true trajectories more closely. This accuracy is improved further by SCKS, where the initial residuals are significantly smaller, hence providing the fastest convergence.

Next, we turned to testing the inversion schemes when both the initial conditions and the model parameters are unknown. We used initial state conditions x0 = [2, 8, 22]^T and parameters θ0 = [10, −8, 43]^T, where their true values were the same as above. We further assumed an initial prior precision on the parameter noise p(θ) = N(0, 0.1), and allowed the algorithm to iterate until convergence. The SCKF and SCKS converged in 6 iteration steps, providing very accurate estimates of both states and parameters (Fig. 1B). This was not the case for DEM, which did not converge, exceeding the maximum allowed number of iterations, 50.

The reason for DEM's failure is that the updates to the parameters are not properly regularized in relation to their highly nonlinear impact on the trajectories of the hidden states. In other words, DEM makes poor updates, which are insensitive to the highly nonlinear form of this model. Critically, SCKF and SCKS outperformed DEM because they use an online parameter update scheme and were able to accommodate nonlinearities much more gracefully, through their cubature-point sampling. Heuristically, cubature filtering (smoothing) can be thought of as accommodating nonlinearities by relaxing the strong assumptions about the form of the likelihood functions used in optimizing estimates. DEM assumes this form is Gaussian and therefore estimates its local curvature with second derivatives. A Gaussian form will be exact for linear models but not for nonlinear models. Conversely, cubature filtering samples this function over greater distances in state or parameter space and relies less on linear approximations.

MC simulations
To verify this result, we conducted a series of 100 Monte Carlo simulations under three different estimation scenarios. In the 1st scenario, we considered unknown initial conditions of the hidden states but known model parameters. The initial conditions were sampled randomly from the uniform distribution x0 ~ U(0, 20), and the true values were the same as in all previous cases. In the 2nd scenario, the initial states were known but the model parameters were unknown, being sampled from a normal distribution around the true values, θ0 ~ N(θtrue, 10). Finally, the 3rd scenario was a combination of the first two, with both initial conditions and parameters unknown. In this case, the states were always initialized with x0 = [2, 8, 22]^T and the parameters sampled from the normal distribution. Results, in terms of average log(SEL), comparing the performance of SCKS and DEM are shown in Fig. 4.

Double-well

The double-well model represents a dissipative system with bimodal variability. What makes this system particularly difficult to invert for many schemes is the quadratic form of the observation function, which renders inference on the hidden states and their causes ambiguous. The hidden state is deployed symmetrically about zero in a double-well potential, which makes the inversion problem even more difficult. Transitions from one well to the other can then be caused either by the input or by high-amplitude fluctuations. We drove this system with a slow sinusoidal input u(t) = 8·sin((1/16)πt) and generated a 120-time-point response with noise precision σ = e², a small amount of state noise α = e¹⁶, and a reasonable level of input noise β = 1/8. The temporal precision was γ = 2 and the LL-integration step was again Δt = 1, with initial condition x0 = 1 and a mildly informative (initial) prior on the input precision, p(u) = N(0, 1). We tried to invert this model using only the observed responses by applying SCKF, SCKS and DEM. Fig. 2 shows that DEM failed to estimate the true trajectory of the hidden state, in the sense that the state estimate is always positive. This had an adverse effect on the estimated input and is largely because of the ambiguity induced by the observation function. Critically, the accuracy of the input estimate will always be lower than that of the state, because the input is expressed in measurement space vicariously through the hidden states. Nevertheless, SCKF and SCKS were able to identify this model correctly, furnishing accurate estimates for both the state and the input, even though this model represents a non-Gaussian (bimodal) problem (Fig. 2).
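The sign ambiguity at the heart of this example is easy to reproduce. Below is a minimal forward-Euler sketch of the double-well system using the state and observation equations given in Table 1 (the paper uses the LL-scheme and adds state and input noise, both omitted here):

```python
import numpy as np

# Double-well equations (Table 1), noise-free Euler sketch
def f(x, u):
    return 2.0 * x / (1.0 + x ** 2) - x / 16.0 + u / 4.0

def g(x):
    return x ** 2 / 16.0        # quadratic observation: the sign of x is lost

dt, T = 1.0, 120
t = np.arange(T)
u = 8.0 * np.sin(np.pi * t / 16.0)       # slow sinusoidal input
x = np.zeros(T)
x[0] = 1.0                                # initial condition x0 = 1
for k in range(T - 1):
    x[k + 1] = x[k] + dt * f(x[k], u[k])

# The ambiguity that defeats a unimodal posterior: +x and -x
# produce exactly the same response
assert np.allclose(g(x), g(-x))
```

Because g(x) = g(−x), any scheme committed to a unimodal Gaussian posterior must pick one well, which is precisely the failure mode described above for DEM.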
Fig. 1. (A) The Lorenz attractor simulations were repeated five times, using different starting conditions (dots) and different random innovations. The hidden states of this model
were estimated using DEM, SCKF and SCKS. Here, we summarize the resulting trajectories in terms of the first two hidden states, plotted against each other in their corresponding
state-space. The true trajectories are shown on the upper left. (B) The inversion of Lorenz system by SCKF, SCKS and DEM. The true trajectories are shown as dashed lines, DEM
estimates with dotted lines, and SCKF and SCKS estimates with solid lines including the 90% posterior confidence intervals (shaded areas). (C) Given the close similarity between the
responses predicted by DEM and SCKS, we show only the result for SCKS. (D) The parameter estimates are summarized in the lower left in terms of their expectations and 90% confidence intervals (red lines). Here we can see that DEM is unable to estimate the model parameters.
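The Lorenz trajectories summarized in Fig. 1 can be regenerated from the state equations in Table 1 (with their 1/32 scaling). The sketch below is noise-free and uses an off-the-shelf adaptive integrator in place of the LL-scheme:

```python
import numpy as np
from scipy.integrate import solve_ivp

theta = np.array([18.0, -4.0, 46.92])     # true parameters from the text

def f(t, x, th):
    # Lorenz state equations as given in Table 1 (1/32 scaling)
    return [(th[0] * x[1] - th[0] * x[0]) / 32.0,
            (th[2] * x[0] - 2.0 * x[0] * x[2] - x[1]) / 32.0,
            (2.0 * x[0] * x[1] + th[1] * x[2]) / 32.0]

t_eval = np.arange(120.0)                  # 120 samples at Delta t = 1
sol = solve_ivp(f, (0.0, 119.0), [0.9, 0.8, 30.0], args=(theta,),
                t_eval=t_eval, rtol=1e-6, atol=1e-8)
y = sol.y.sum(axis=0)                      # noiseless output g(x) = x1 + x2 + x3
```

Plotting sol.y[0] against sol.y[1] reproduces the butterfly-shaped projection onto the first two hidden states used in Fig. 1A.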
MC simulations
To evaluate the stability of the SCKS estimates in this context, we repeated the simulations 100 times, using different innovations. It can be seen from the results in Fig. 4 that the SCKS estimates of the state and input are about twice as close to the true trajectories as the DEM estimates. Nevertheless, SCKS was only able to track the true trajectories of the state and input completely (as shown in Fig. 3) in about 70% of all simulations. In the remaining 30%, SCKS provided results where some half-periods of the hidden-state trajectory had the wrong sign, i.e. were flipped around zero. At the present time, we have no real insight into why DEM consistently fails to cross from positive to negative conditional estimates, while the SCKS scheme appears to be able to do this. One might presume this is a reflection of cubature filtering's ability to handle the nonlinearities manifest at zero crossings. The reason this is a difficult problem is that the true posterior density over the hidden state is bimodal (with peaks at positive and negative values of the hidden state). However, the inversion schemes assume the posterior is a unimodal Gaussian density, which is clearly inappropriate. DEM was not able to recover the true trajectory of the input for any simulation, which suggests that the cubature-point sampling in SCKS was able to partly compensate for the divergence between the true (bimodal) and assumed unimodal posterior.

Convolution model

The linear convolution model represents another example that was used in (Friston, 2008a; Friston et al., 2008) to compare DEM, EKF, particle filtering and variational filtering. In this model (see Table 1), the input perturbs the hidden states, which decay exponentially to produce an output that is a linear mixture of the hidden states. Specifically, we used an input specified by a Gaussian bump function of the form u(t) = exp(−(1/4)(t − 12)²), two hidden states and four output responses. This is a single-input multiple-output system with the following parameters:

θ1 = [0.125, 0.1633;  0.125, 0.0676;  0.125, −0.0676;  0.125, −0.1633],
θ2 = [−0.25, 1.00;  −0.50, −0.25],
θ3 = [1; 0].
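Under these parameters the generative model is linear and easy to simulate directly. A sub-stepped Euler sketch (the paper integrates with the LL-scheme and adds innovations and state noise, omitted here):

```python
import numpy as np

# Linear convolution model: dx/dt = th2 @ x + th3 * u(t),  y = th1 @ x
th1 = np.array([[0.125,  0.1633],
                [0.125,  0.0676],
                [0.125, -0.0676],
                [0.125, -0.1633]])
th2 = np.array([[-0.25,  1.00],
                [-0.50, -0.25]])
th3 = np.array([1.0, 0.0])

def u(t):
    return np.exp(-0.25 * (t - 12.0) ** 2)   # Gaussian bump input

T, n_sub = 32, 16                            # 32 samples, Euler substeps
x = np.zeros(2)
X, Y = [], []
for k in range(T):
    X.append(x.copy())
    Y.append(th1 @ x)                        # 4-channel response
    for j in range(n_sub):                   # sub-stepped Euler integration
        tk = k + j / n_sub
        x = x + (1.0 / n_sub) * (th2 @ x + th3 * u(tk))
X, Y = np.array(X), np.array(Y)              # states (32, 2), responses (32, 4)
```

The input drives only the first state (θ3 = [1; 0]), and the complex eigenvalues of θ2 give the damped oscillatory decay that the smoothers must deconvolve.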
Fig. 2. Inversion of the double-well model, comparing estimates of the hidden state and input from SCKF, SCKS and DEM. This figure uses the same format as Figs. 1B and C. Again, the
true trajectories are depicted with dashed lines and the shaded area represents 90% posterior confidence intervals. Given the close similarity between the responses predicted by
DEM and SCKS, we show only the result for SCKS.
We generated data over 32 time points, using innovations sampled from Gaussian densities with precision σ = e⁸, a small amount of state noise α = e¹² and minimal input noise β = e¹⁶. The LL-integration step was Δt = 1 and the temporal precision γ = 4. During model inversion, the input and four model parameters are unknown and are subject to mildly informative priors, p(u) = N(0, 0.1) and p(θ) = N(0, 10⁻⁴), respectively. Before initializing the inversion process, we set the parameters θ1(1, 1), θ1(2, 1), θ2(1, 2), and θ2(2, 2) to zero. Fig. 3 shows that applying only a forward pass with the SCKF does not recover the first hidden state, and especially the input, correctly. The situation is improved with the smoothed estimates from SCKS, where both hidden states match the true trajectories. Nevertheless, the input estimate is still slightly delayed in relation to the true input. We have observed this delay repeatedly when inverting this particular convolution model with SCKS. The input estimate provided by DEM is, in this case, correct, although there are more perturbations around the baseline compared to the input estimated by SCKS. The reason that DEM was able to track the input more accurately is that it has access to generalized motion. Effectively, this means it sees the future data in a way that recursive update schemes (like the SCKF) do not. This becomes important when dealing with systems based on high-order differential equations, where changes in a hidden state or input are expressed in terms of high-order temporal derivatives in data space (we will return to this issue later). Having said this, the SCKS identified the unknown parameters more accurately than DEM, resulting in better estimates of the hidden states.

MC simulations
For the Monte Carlo simulations we looked at two different scenarios. First, we inverted the model treating only the input as unknown, and repeated the simulations 100 times with different innovations. In the second scenario, which was also repeated 100 times with different innovations, both the input and the four model parameters were treated as unknown. The values of these parameters were sampled from the normal distribution θ0 ~ N(0, 1). Fig. 4 shows that DEM provides slightly more accurate estimates of the input than SCKS. This is mainly because of the delay issue above. However, SCKS again furnishes more accurate estimates, with a higher precision on the inverted states and markedly higher accuracy on the identified model parameters.

Hemodynamic model

The hemodynamic model represents a nonlinear "convolution" model that was described extensively in (Buxton et al., 1998; Friston et al., 2000). The basic kinetics can be summarized as follows: Neural
Fig. 3. Results of inverting the linear convolution model using SCKF, SCKS and DEM; summarizing estimates of hidden states, input, four model parameters and the response. This
figure uses the same format as Figs. 1B–D.
activity u causes an increase in a vasodilatory signal h1 that is subject to auto-regulatory feedback. Blood flow h2 responds in proportion to this signal and causes changes in blood volume h3 and deoxyhemoglobin content h4. These dynamics are modeled by a set of differential equations and the observed response is expressed as a nonlinear function of blood volume and deoxyhemoglobin content (see Table 1). In this model, the outflow is related to the blood volume, F(h3) = h3^(1/α), through Grubb's exponent α. The relative oxygen extraction E(h2) = (1/φ)(1 − (1 − φ)^(1/h2)) is a function of flow, where φ is the resting oxygen extraction fraction. The description of the model parameters, including the prior noise precisions, is provided in Table 3.

Table 1
State and observation equations for the dynamic systems.

  Model               f(x, u, θ)                                    g(x, θ)
  Lorenz attractor    (1/32)·[θ1x2 − θ1x1;                          x1 + x2 + x3
                              θ3x1 − 2x1x3 − x2;
                              2x1x2 + θ2x3]
  Double-well         2x/(1 + x²) − x/16 + u/4                      x²/16
  Convolution model   θ2x + θ3u                                     θ1x
  Hemodynamic model   [(u − κ(h1 − 1) − χ(h2 − 1))/h1;              V0[k1(1 − x4) + k2(1 − x4/x3) + k3(1 − x3)]
                       (h1 − 1)/h2;
                       (h2 − F(h3))/(τh3);
                       (h2E(h2) − F(h3)h4/h3)/(τh4)]

In order to ensure positive values of the hemodynamic states and improve the numerical stability of the parameter estimation, the hidden states are transformed: xi = log(hi) ⇔ hi = exp(xi). However, before evaluating the observation equation, the log-hemodynamic states are exponentiated. The reader is referred to (Friston et al., 2008; Stephan et al., 2008) for a more detailed explanation.

Although there are many practical ways to use the hemodynamic model with fMRI data, we will focus here on its simplest instance: a single-input, single-output variant. We will try to estimate the hidden states and input through model inversion, and simultaneously identify the model parameters from the observed response. For this purpose, we generated data over 60 time points using the hemodynamic model, with an input in the form of Gaussian bump functions with different amplitudes centered at positions (10, 15, 39, and 48), and model parameters as reported in Table 2. The sampling interval or repeat time (TR) was equal to TR = 1 s. We added innovations to the output with a precision σ = e⁶. This corresponds to a noise variance of about 0.0025, i.e. in the range of observation noise previously estimated in real fMRI data (Johnston et al., 2008; Riera et al., 2004), with a temporal precision γ = 1. The precision of the state noise was α = e⁸ and the precision of the input noise was β = e⁸. At the beginning of the model inversion, the
Fig. 4. The Monte Carlo evaluation of estimation accuracy using an average log(SEL) measure for all models under different scenarios. The SEL measure is sensitive to outliers, which enables convenient comparison between different algorithms tested on the same system. However, it cannot be used to compare performance among different systems. A smaller log(SEL) value reflects a more accurate estimate. For quantitative intuition, a value of log(SEL) = −2 is equivalent to a mean square error (MSE) of about 2·10⁻³, and a log(SEL) = 7 is an MSE of about 7·10¹.
true initial states were x0 = [0, 0, 0, 0]T. Three of the six model the decaying part of the Gaussian input function, compared to the
parameters, specifically θ = {κ, χ, τ}, were initialized randomly, sam- true trajectory. This occurred even though the hidden states were
pling from thenormal distribution
centered on the mean of the true tracked correctly. The situation is very different for Δt = 0.2: Here
values θi = N θtrue
i ; 1 = 12 . The remaining parameters were based on the results obtained by SCKS are very precise for both the states and
their true values. The reasons for omitting other parameters from input. This means that a finer integration step had beneficial effects
random initializations will be discussed later in the context of on both SCKF and SCKS estimators. In contrast, the DEM results did
parameter identifiability. The prior precision of parameter noise are not improve. Here, including more integration steps between
given in Table 3, where we allowed a small noise variance (10− 8) in observation samples decreased the estimation accuracy for the
the parameters that we considered to be known {α, φ, }; i.e. these input and the states. This means that DEM, which models high order
parameters can only experience very small changes during estima- motion, does not require the small integration steps necessary for
tion. The parameter priors for DEM were as reported in (Friston et al., SCKF and SCKS. Another interesting point can be made regarding
2010) with the exception of {α, φ}, which we fixed to their true values. parameter estimation. As we mentioned above, SCKS estimated the
For model inversion we considered two scenarios that differed in hidden states in both scenarios accurately, which might lead to the
the size of the integration step. First, we applied an LL-integration step conclusion that the model parameters were also indentified
of Δt = 0.5; in the second scenario, we decreased ffi step to Δt = 0.2.
pffiffiffiffiffithe correctly. However, although some parameters were indeed iden-
Note that all noise precisions are scaled by Δt before estimation tified optimally (otherwise we would not obtain correct states) they
begins. The same integration steps were also used for DEM, where we were not equal to the true values. This is due to the fact that the
additionally increased the embedding orders (n = d = 8) to avoid effects of some parameters (on the output) are redundant, which
numerical instabilities. The results are depicted in Figs. 5 and 6. It is means different sets of parameter values can provide veridical
noticeable that in both scenarios neither the hidden states nor input estimates of the states. For example, the effects of increasing the
can be estimated correctly by SCKF. For Δt = 0.5, SCKS estimates the input less accurately than DEM, with inaccuracies in amplitude and in […] first parameter can be compensated by decreasing the second, to produce exactly the same output. This feature of the hemodynamic model has been discussed before in (Deneux and Faugeras, 2006) and is closely related to identifiability issues and conditional dependence among the parameter estimates.

Table 2
Parameters of the generative model for the simulated dynamic systems.

Table 3
Hemodynamic model parameters.

Biophysical parameters of the state equations:

    Description                           Value      Prior on noise variance
κ   Rate of signal decay                  0.65 s⁻¹   p(θκ) = N(0, 10⁻⁴)
χ   Rate of flow-dependent elimination    0.38 s⁻¹   p(θχ) = N(0, 10⁻⁴)
τ   Hemodynamic transit time              0.98 s     p(θτ) = N(0, 10⁻⁴)
α   Grubb's exponent                      0.34       p(θα) = N(0, 10⁻⁸)
φ   Resting oxygen extraction fraction    0.32       p(θφ) = N(0, 10⁻⁸)
ε   Neuronal efficiency                   0.54       p(θε) = N(0, 10⁻⁸)

Fixed biophysical parameters of the observation equation:

    Description                Value
V0  Blood volume fraction      0.04
k1  Intravascular coefficient  7φ
k2  Concentration coefficient  2
k3  Extravascular coefficient  2φ − 0.2

MC simulations

We examined three different scenarios for the hemodynamic model inversion. The simulations were inverted using an integration step Δt = 0.2 for SCKF and SCKS and Δt = 0.5 for DEM. First, we focused on performance when the input is unknown, we have access to the true (fixed) parameters, and the initial states are unknown. These were sampled randomly from the uniform distribution x0 ∼ U(0, 0.5). In the second scenario, the input is again unknown, and instead of unknown initial conditions we treated three model parameters θ = {κ, χ, τ} as unknown. Finally, in the last scenario, all three variables (i.e. the initial conditions, input, and three parameters) are unknown. All three simulations were repeated 100 times with different initializations of x0, θ0, innovations, and state and input noise. From the MC simulation results, the following interesting behaviors were observed. Since the DEM estimates are calculated only in a forward manner, if the initial states are incorrect, it takes a finite amount of time before they converge to their true trajectories. This error persists over subsequent iterations of the scheme (E-steps) because they are initialized with the same incorrect state. This problem is finessed by SCKS: although the error will be present in the SCKF estimates of the first iteration, it is efficiently corrected during the smoothing by SCKS, which brings the initial conditions closer to their true values. This enables an effective minimization of the initial error over iterations.
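As a concrete reference for Table 3, the hemodynamic state and observation equations (Buxton et al., 1998; Friston et al., 2000) can be sketched as below. This is an illustrative Python re-implementation with plain Euler integration and a boxcar input; it is not the authors' Matlab code, nor the LL integration scheme used in the paper:

```python
import numpy as np

# Balloon/hemodynamic model (Buxton et al., 1998; Friston et al., 2000)
# with the prior parameter values of Table 3. Euler integration and the
# boxcar input below are illustrative choices only.
kappa, chi, tau, alpha, phi, eps = 0.65, 0.38, 0.98, 0.34, 0.32, 0.54
V0 = 0.04
k1, k2, k3 = 7 * phi, 2.0, 2 * phi - 0.2

def simulate_bold(u, dt=0.1):
    """Integrate the states (signal s, flow f, volume v, deoxyhemoglobin q)
    and return the BOLD observation for each time step of the input u."""
    s, f, v, q = 0.0, 1.0, 1.0, 1.0
    y = []
    for ut in u:
        ds = eps * ut - kappa * s - chi * (f - 1)          # vasodilatory signal
        df = s                                             # blood inflow
        dv = (f - v ** (1 / alpha)) / tau                  # blood volume
        dq = (f * (1 - (1 - phi) ** (1 / f)) / phi         # oxygen extraction
              - v ** (1 / alpha) * q / v) / tau            # deoxyhemoglobin
        s, f, v, q = s + dt * ds, f + dt * df, v + dt * dv, q + dt * dq
        y.append(V0 * (k1 * (1 - q) + k2 * (1 - q / v) + k3 * (1 - v)))
    return np.array(y)

u = np.zeros(300)          # 30 s at dt = 0.1 s
u[20:30] = 1.0             # a 1 s neuronal event
bold = simulate_bold(u)
```

Re-running `simulate_bold` with a larger `kappa` dampens the peak of the response, in line with the parameter effects discussed in the next section.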
Fig. 5. Results of the hemodynamic model inversion by SCKF, SCKS and DEM, with an integration step of Δt = 0.5 and with the first three model parameters identified. This figure uses the same format as Figs. 1B–D.
2122 M. Havlicek et al. / NeuroImage 56 (2011) 2109–2128
Fig. 6. Results of the hemodynamic model inversion by SCKF, SCKS and DEM, with an integration step of Δt = 0.2 and with the first three model parameters identified. This figure uses the same format as Figs. 1B–D.
This feature is very apparent from the MC results in terms of log(SEL) for all three scenarios. When the true initial state conditions are known (2nd scenario), the accuracy of the input estimate is the same for SCKS and DEM; SCKS only attained slightly better estimates of the states, hence also better parameter estimates. However, in the case of unknown initial conditions, SCKS is superior (see Fig. 4).

Effect of model parameters on hemodynamic response and their estimation

Although the biophysical properties of the hemodynamic states and their parameters were described extensively in (Buxton et al., 1998; Friston et al., 2000), we will revisit the contribution of the parameters to the final shape of the hemodynamic response function (see Fig. 7A). In particular, our interest is in the parameters θ = {κ, χ, τ, α, φ, ε}, which play a role in the hemodynamic state equations. We evaluated changes in hemodynamic responses over a wide range of parameter values (21 regularly spaced values for each parameter). In Fig. 7A, the red lines represent biologically plausible mean parameter values that were estimated empirically in (Friston et al., 2000), and which are considered to be the true values here (Table 3). The arrows show the change in response when these parameters are increased. The first parameter is κ = 1/τs, where τs is the time constant of signal decay. Increasing this parameter dampens the hemodynamic response to any input and suppresses its undershoot. The second parameter χ = 1/τf is defined by the time constant of the auto-regulatory mechanism τf. The effect of increasing the parameter χ (decreasing the feedback time constant τf) is to increase the frequency of the response and lower its amplitude, with a small change of the undershoot (see also the effect on the first hemodynamic state h1). The parameter τ is the mean transit time at rest, which determines the dynamics of the signal. Increasing this parameter slows down the hemodynamic response with respect to flow changes. It also slightly reduces the response amplitude and more markedly suppresses the undershoot. The next parameter is the stiffness or Grubb's exponent α, which is closely related to the flow–volume relationship. Increasing this parameter increases the degree of nonlinearity of the hemodynamic response, resulting in a decrease of the amplitude and weaker suppression of the undershoot. Another parameter of the hemodynamic model is the resting oxygen extraction fraction φ. Increasing this parameter can have quite profound effects on the shape of the hemodynamic response that bias it towards an early dip. This parameter has an interesting effect on the shape of the response: during the increase of φ, we first see an increase of the response peak amplitude together with a deepening of the undershoot, whereas after the value passes φ = 0.51, the undershoot is suppressed. The response amplitude continues to grow until φ = 0.64 and falls rapidly after that. Additionally, the early dip starts to appear at φ = 0.68 and higher values. The last parameter is the neuronal efficacy ε, which simply modulates the hemodynamic response. Increasing this parameter scales the amplitude of the response.

In terms of system identification, it has been shown in (Deneux and Faugeras, 2006) that very little accuracy is lost when the values of Grubb's exponent and the resting oxygen extraction fraction {α, φ} are fixed to some physiologically plausible values. This is in accordance with (Riera et al., 2004), where these parameters were also fixed. Grubb's exponent is supposed to be stable during steady-state stimulation (Mandeville et al., 1999); α = 0.38 ± 0.1, with almost negligible effects on the response within this range. The resting oxygen extraction fraction parameter is responsible for the early dip that is rarely observed in fMRI data. Its other effects can be approximated by combining the parameters {κ, τ}. In our case, where the input is unknown, the neuronal efficiency parameter is fixed as well. This is necessary, because a change in this parameter is degenerate with respect to the amplitude of the neuronal input.

To pursue this issue of identifiability we examined the three remaining parameters θ = {κ, χ, τ} in terms of the (negative) log-likelihood for pairs of these three parameters, as estimated by the SCKS scheme (Fig. 7B). The curvature (Hessian) of this log-likelihood function is, in fact, the conditional precision (inverse covariance) used in variational schemes like DEM and is formally related to the Fisher Information matrix for the parameters in question. A slow-curvature (shallow) basin means that we are conditionally uncertain about the precise value and that large changes in parameters will have relatively small effects on the observed response or output variables. The global optimum (true values) is marked by the green crosslet. To compute these log-likelihoods we ran SCKS for all combinations of parameters within their selected ranges, assuming the same noise precisions as in the hemodynamic simulations above (Table 3). Note that we did not perform any parameter estimation, but only evaluated the log-likelihood for different parameter values, having optimized the states. Looking at the ensuing (color-coded) optimization manifolds, particularly at the white area bounded by the innermost contour, we can see how much these parameters can vary around the global optimum and still provide reasonably accurate predictions (of output, hidden states and input). This range is especially wide for the mean transit time τ. One can see from the plot at the top of Fig. 7A that varying τ over a wide range (τ = 0.3–2.0) has little effect on the response. The region around the global maximum also discloses conditional dependencies and redundancy among the parameters. These dependencies make parameter estimation a generally more difficult task.

Nevertheless, we were curious if, at least under certain circumstances, the true parameter values could be estimated. Therefore, we allowed for faster dynamics on the parameters {κ, χ, τ} by using higher noise variances (4·10⁻⁴, 2·10⁻⁴, and 10⁻², respectively) and evaluated all three possible parameter combinations using SCKS. In other words, we optimized two parameters with the third fixed, over all combinations. These noise parameters were chosen after intensive testing, to establish the values that gave the best estimates. We repeated these inversions four times, with different initial parameter estimates selected within the manifolds shown in Fig. 7A. In Fig. 7B, we can see how the parameters moved on the optimization surface, where the black dashed line depicts the trajectory of the parameter estimates over successive iterations, starting from the initial conditions (black dot) and terminating around the global optimum (maximum). The thick red line represents the dynamic behavior of the parameters over time during the last iteration. The last-iteration estimate for all states, input and parameters is depicted in Fig. 7C. Here the dynamics of the transit time (τ) is especially interesting; it drops with the arrival of the neuronal activation and is subsequently restored during the resting period. This behavior is remarkably similar to that observed by Mandeville et al. (1999) in rat brains, where the mean transit time falls during activation. Clearly, we are not suggesting that the transit time actually decreased during activation in our simulations (it was constant during the generation of the data). However, these results speak to the interesting application of SCKS to identify time-dependent changes in parameters. This could be important when applied to dynamic causal models of adaptation or learning studies that entail changes in effective connectivity between neuronal populations. The key message here is that if one can (experimentally) separate the time scale of true changes in parameters from the (fast) fluctuations inherent in recursive Bayesian filtering (or generalized filtering), it might be possible to estimate (slow) changes in parameters that are of great experimental interest.

In general, enforcing slow dynamics on the parameters (with a small noise variance) will ensure more accurate results for both states and input, provided the true parameters also change slowly. Moreover, we prefer to consider all parameters of the hemodynamic state equations as unknown and limit their variations with high prior precisions. This allows us to treat all the unknown parameters uniformly, where certain (assumed) parameters can be fixed to their prior mean using an infinitely high prior precision.

Fig. 7. (A) The top row depicts the effect of changing the hemodynamic model parameters on the response and on the first hidden state. For each parameter, the range of values considered is reported, comprising 21 values. (B) The middle row shows the optimization surfaces (manifolds) of the negative log-likelihood obtained via SCKS for combinations of the first three hemodynamic model parameters {κ, χ, τ}. The trajectories of convergence (dashed lines) for four different parameter initializations (dots) are superimposed. The true values (at the global optima) are depicted by the green crosshair, and the dynamics of the parameters over the final iteration correspond to the thick red line. (C) The bottom row shows the estimates of hidden states and input for the corresponding pairs of parameters obtained during the last iteration, where we also show the trajectories of the parameter estimates over time.

Beyond the limits of fMRI signal

One of the challenges in fMRI research is to increase the speed of brain volume sampling, i.e. to obtain data with a higher temporal resolution. Higher temporal resolution allows one to characterize changes in the brain more accurately, which is important in many aspects of fMRI. In this section, we will show that estimating unobserved (hidden) hemodynamic states and, more importantly, the underlying neuronal drives solely from observed data by blind deconvolution can significantly improve the temporal resolution and provide estimates of the underlying neuronal dynamics at a finer temporal scale. This may have useful applications in the formation of things like psychophysiological interactions (Gitelman et al., 2003).

In the hemodynamic model inversions above we did not use very realistic neuronal input, which was a Gaussian bump function, and the data were generated with a temporal resolution of 1 s. This was sufficient for our comparative evaluations; however, in real data the changes in underlying neuronal activation are much faster (possibly in the order of milliseconds) and may comprise a rapid succession of events. The hemodynamic changes induced by this neuronal activation manifest as a rather slow response, which peaks at about 4–6 s.

To make our simulations more realistic, we considered the following generation process, which is very similar to the simulation and real data used previously in Riera et al. (2004). First, we generated our data with a time step of 50 ms using the sequence of neuronal events depicted at the top of Fig. 8. These Gaussian-shaped neuronal events (inputs) had a FWTM (full-width at tenth of maximum) of less than 200 ms. Otherwise, the precisions on innovations, state noise, and input noise were identical to the hemodynamic simulations above. Next, we down-sampled the synthetic response with a realistic TR = 1.2 s, obtaining data of 34 time points from the original 800. For estimation, we used the same priors on the input, p(u) = N(0, 0.1), and parameters as summarized in Table 3.

Our main motivation was the question: how much of the true underlying neuronal signal can we recover from this simulated sparse observation, when applying either SCKS or DEM? To answer this, two different scenarios were considered. The first used an integration step Δt = TR/2 = 0.6 s, which had provided quite favorable results above. The top row of Fig. 8 shows the estimated input and states provided by SCKS and DEM. It can be seen that the states are traced very nicely by both approaches. For the input estimates, SCKS captures the true detailed neuronal structure deficiently, although the main envelope is correct. For DEM, the input estimate is much closer to the true structure of the neuronal signal, distinguishing all seven events. However, one cannot overlook the sharp undershoots that appear after the inputs. The reason for these artifacts rests on the use of generalized coordinates of motion, where the optimization of high order temporal derivatives does not always produce the optimal low order derivatives (as shown in Fig. 8).

In the second scenario, where we decreased the integration step to Δt = TR/10 = 0.12 s, we see that the SCKS estimate of the input has improved markedly. For DEM the input estimate is actually slightly worse than in the previous case. Recalling the results from the previous simulations (Figs. 5 and 6), it appears that the optimal integration step for DEM is Δt = TR/2, and decreasing this parameter does not improve estimation (as it does for SCKS). Conversely, an excessive decrease of Δt can downgrade accuracy (without an appropriate adjustment of the temporal precision).
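The sampling regime just described (a 50 ms generation grid with Gaussian events of FWTM below 200 ms, observed every TR = 1.2 s) can be sketched as follows. The event onsets and amplitudes here are hypothetical; only the grid sizes follow the text (800 fine-grained points yielding 34 observations):

```python
import numpy as np

# Fine-grained neuronal signal (50 ms grid) down-sampled at TR = 1.2 s.
# Onsets are illustrative, not the paper's exact event sequence.
dt = 0.05                                   # 50 ms generation step
t = np.arange(800) * dt                     # 40 s of simulated activity
onsets = [5.0, 9.0, 12.0, 18.0, 22.0, 28.0, 33.0]   # seven events (s), hypothetical
sigma = 0.2 / (2.0 * np.sqrt(2.0 * np.log(10.0)))   # FWTM = 2*sqrt(2 ln 10)*sigma = 200 ms
u = sum(np.exp(-0.5 * ((t - c) / sigma) ** 2) for c in onsets)

TR = 1.2
step = int(round(TR / dt))                  # keep every 24th sample
u_obs = u[::step]                           # the sparse "observed" grid
```

The blind deconvolution task is then to recover the fine-grained u from hemodynamic observations available only on the sparse `u_obs` grid.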
Fig. 8. Inversion of the hemodynamic model for more realistic neuronal inputs (top left) and fMRI observations sampled with a TR = 1.2 s (bottom left — dotted line). The estimates of the input and hidden states obtained by SCKS and DEM are shown for an integration step of Δt = TR/2 (top row) and Δt = TR/10 (middle row). The parameter estimates are shown at the bottom right. The best estimate of the input that could be provided by the local linearization filter is depicted in the middle left panel by the solid green line.
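For reference, the third-degree cubature rule that underlies the SCKF (Arasaratnam and Haykin, 2009) evaluates 2n equally weighted sigma points at ±√n along the columns of a covariance square root. Below is a minimal, non-square-root sketch of the time update; the authors' implementation propagates Cholesky factors throughout, and `ckf_predict` is our illustrative name, not theirs:

```python
import numpy as np

def cubature_points(m, P):
    """2n equally weighted points m +/- sqrt(n)*S[:, i], where S is a
    square root of P (third-degree spherical-radial cubature rule)."""
    n = len(m)
    S = np.linalg.cholesky(P)
    return m[:, None] + np.sqrt(n) * np.hstack([S, -S])   # shape (n, 2n)

def ckf_predict(m, P, f, Q):
    """Cubature time update for x_{k+1} = f(x_k) + w, with w ~ N(0, Q)."""
    X = cubature_points(m, P)
    Y = np.column_stack([f(X[:, i]) for i in range(X.shape[1])])
    m_pred = Y.mean(axis=1)                 # predicted mean
    D = Y - m_pred[:, None]
    P_pred = D @ D.T / Y.shape[1] + Q       # predicted covariance
    return m_pred, P_pred

# For a linear map the cubature prediction is exact:
A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = 0.1 * np.eye(2)
m1, P1 = ckf_predict(np.array([1.0, -1.0]), np.eye(2), lambda x: A @ x, Q)
```

Because the point set is symmetric about the mean, linear dynamics are reproduced exactly (here m1 = A·m and P1 = A·P·Aᵀ + Q), and no Jacobians are needed for a nonlinear f — the property exploited by the SCKF/SCKS.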
Here we can also compare our results with the results obtained in (Riera et al., 2004), where the LL-innovation technique was used with a constrained nonlinear optimization algorithm (Matlab's fmincon.m function) to estimate the neuronal activation. In our simulations the neuronal input was parameterized by a set of RBFs, regularly spaced with an inter-distance interval equal to TR, where the amplitudes of the RBFs together with the first three hemodynamic model parameters, including noise variances, were subject to estimation. The resulting estimate is depicted by the solid green line at the bottom of Fig. 8. It is obvious that this only captures the outer envelope of the neuronal activation. Although this approach represented the most advanced technique at the time of its introduction (2004), its use is limited to relatively short time-series that ensure the number of parameters to be estimated is tractable.

We conclude that inversion schemes like DEM and especially SCKS can efficiently reconstruct the dynamics of neuronal signals from the fMRI signal, affording a considerable improvement in effective temporal resolution.

Discussion

We have proposed a nonlinear Kalman filtering scheme based on an efficient square-root cubature Kalman filter (SCKF) and RTS smoother (SCKS) for the inversion of nonlinear stochastic dynamic causal models. We have illustrated its application by estimating neuronal activity by (so called) blind deconvolution from fMRI data. Using simulations of different stochastic dynamic systems, including validation via Monte Carlo simulations, we have demonstrated its estimation and identification capabilities. Additionally, we have compared its performance with an established (DEM) scheme, previously validated in relation to EKF and particle filtering (Friston et al., 2008).

In particular, using a nonlinear model based on the Lorenz attractor, we have shown that SCKF and SCKS outperform DEM when the initial conditions and model parameters are unknown. The double-well model turned out (as anticipated) to be difficult to invert. In this case, both SCKF and SCKS could invert both states and input correctly, i.e. track their true trajectories, in about 70% of the simulations (unlike DEM). Both the Lorenz attractor and the double-well system are frequently used for testing the robustness of new nonlinear filtering methods, and they provide a suitable forum to conclude that SCKF and SCKS show higher performance in nonlinear and non-Gaussian settings than DEM. The third system we considered was a linear convolution model, where the performance of both inversion schemes was comparable. In contrast to the previous models, the SCKF alone was not sufficient for successful estimation of the states and input. Although DEM provided a better estimate of the input, the SCKS was more precise in tracking hidden states and inferring unknown model parameters.

We then turned to the hemodynamic model proposed by Buxton et al. (1998) and completed by Friston et al. (2000), which comprises nonlinear state and observation equations. The complexity of this model, inherent in a series of nonlinear differential equations (i.e. higher order ODEs), makes the inversion problem fairly difficult. If the input is unknown, it cannot be easily solved by a forward pass of the SCKF or any other standard nonlinear recursive filter. It was precisely this difficulty that motivated Friston et al. (2008) to develop DEM by formulating the deconvolution problem in generalized coordinates of motion. The same problem motivated us to derive a square-root formulation of the Rauch–Tung–Striebel smoother and solve the same problem with a recursive scheme.

Both DEM and SCKS (SCKF) use an efficient LL-scheme for the numerical integration of non-autonomous multidimensional stochastic differential equations (Jimenez et al., 1999). Using simulations, we have demonstrated that for a successful inversion of the hemodynamic model, SCKS requires an integration step of at least Δt = TR/2 for the accurate estimation of hidden states, and preferably a smaller integration step for accurate inference on the neuronal input. Unlike SCKS, DEM provides its best estimates of the input when the integration step is Δt = TR/2. This is because it uses future and past observations to optimize a path or trajectory of hidden states, in contrast to recursive schemes that update in a discrete fashion. Nevertheless, with smaller integration steps, SCKS affords more precise estimates of the underlying neuronal signal than DEM under any integration step. Additionally, in the case of the more realistic hemodynamic simulations we have shown that with a smaller integration step of about Δt = TR/10 we were able to recover the true dynamics of neuronal activity that cannot be observed (or estimated) at the temporal resolution of the measured signal. This takes us beyond the limits of the temporal resolution of the hemodynamics underlying the fMRI signal.

An interesting aspect of inversion schemes is their computational cost. Efficient implementations of SCKS with an integration step of Δt = TR/10 (including parameter estimation) are about 1.3 times faster than DEM (with an integration step of Δt = TR/2 and a temporal embedding of n = 6 and d = 3). If the integration step is the same, then SCKS is about 5 times faster, which might have been anticipated, given that DEM is effectively dealing with six times the number of (generalized) hidden states.

We have also examined the properties of parameter identification of the hemodynamic model under the SCKS framework. Based on previous experience (Deneux and Faugeras, 2006; Riera et al., 2004), we constrained the hemodynamic model by allowing three parameters to vary, i.e. the rate of signal decay, the rate of flow-dependent elimination, and the mean transit time. The remaining parameters were kept (nearly) constant, because they had only minor effects on the hemodynamic response function.

Our procedure for parameter identification uses a joint estimation scheme, where both hidden states and parameters are concatenated into a single state vector and inferred simultaneously in a dynamic fashion. The SCKS is iterated until the parameters converge. Moreover, the convergence is enhanced by a stochastic Robbins–Monro approximation of the parameter noise covariance matrix. This enabled very efficient parameter identification in all of the stochastic models we considered, including the hemodynamic model. However, specifically in the case of the hemodynamic model, we witnessed a particular phenomenon, which was also reported by Deneux and Faugeras (2006). Put simply, the effects of some parameters on the hemodynamic response are degenerate, in that different combinations can still provide accurate predictions of observed responses. In this context, we have shown in Fig. 7A that different sets of parameters can produce a very similar hemodynamic response function. This degeneracy or redundancy is a ubiquitous aspect of model inversion and is usually manifest as conditional dependency among the parameter estimates. The problem of conditional dependencies is usually finessed by optimizing the model in terms of its evidence. Model evidence ensures that conditional dependences are suppressed by minimizing complexity (which removes redundant parameters). In our setting, we are estimating both states and parameters and have to contend with possible conditional dependences between the states and parameters. In principle, this can be resolved by comparing the evidence for different models and optimizing the parameterization to provide the most parsimonious model. We will pursue this in a subsequent paper, in which we examine the behavior of model evidence, as estimated under cubature smoothing. It should be noted that this work uses models that have already been optimized over the past few years, so that they provide the right balance of accuracy and complexity when trying to explain typical fMRI data. However, we may have to revisit this issue when trying to estimate the hidden neuronal states as well as the parameters.

There are further advantages of SCKS compared to DEM. Since DEM performs inference on states and input in a forward manner only, it is sensitive to misspecification of the initial conditions. Critically, recent implementations of DEM (Friston et al., 2008) start each iteration with the same initial values of the states and the input, resulting in significant error at the initial phase of deconvolution. This is not the case for SCKS, which, by applying a backward smoothing step, minimizes the initial error and converges to the true initial value over iterations. Next, DEM can produce sharp undershoots in the input estimate when the hidden states or their causes change too quickly. The SCKS does not have this problem. However, the use of generalized motion enables DEM to be applied online. Additionally, this framework also allows DEM to model temporal dependencies in the innovations or fluctuations of the hidden states, which might be more plausible for biological systems. In Kalman filtering, these fluctuations are generally assumed to be Markovian. Having said this, it is possible to cast dynamical models in generalized coordinates of motion as classical Markovian models, where the innovations are successively colored before entering the state equation (see Eq. (3) in (Friston, 2008b)).

Based on our MC simulations, we conclude that in general SCKS provided a more accurate inversion of nonlinear dynamic models, including estimation of the states, input and parameters, than DEM. Since DEM has been shown to outperform EKF and particle filtering, this makes SCKS the most efficient blind nonlinear deconvolution scheme for dynamic state-space models.

Finally, all evaluations of the proposed approach, including the comparison with DEM, were performed under the assumption that the SCKS algorithm had access to the true precision parameter on the measurement noise and DEM had access to precisions on all noise components. However, for application to real data we have to be able to estimate these precision parameters as well. DEM is formulated as a hierarchical dynamic model, which allows for an elegant triple inference on hidden states, input, parameters and hyperparameters. In the case of SCKS we have introduced dynamic approximation techniques for the efficient estimation of the parameter state-noise covariance matrices. We also observed that the input noise variance can be considered time-invariant, with a reasonable value (for the hemodynamic model) of about V = 0.1. This value seemed to be consistent over different levels of noise and different inputs. The last outstanding unknown quantity is the measurement noise covariance. We have found a robust solution (Särkkä and Nummenmaa, 2009) that combines the variational Bayesian method with the nonlinear Kalman filtering algorithm for the joint estimation of states and time-varying measurement noise covariance in a nonlinear state-space model. We have implemented this approach for our SCKS scheme with a minimal increase in computational cost. Although this variational Bayesian extension was not utilized in our
M. Havlicek et al. / NeuroImage 56 (2011) 2109–2128 2127
proposal (for simplicity), it is now part of SCKS algorithm for future References
application to the real data.
There are several application domains we hope to explore within Aguirre, G.K., Zarahn, E., D'esposito, M., 1998. The variability of human, BOLD
hemodynamic responses. Neuroimage 8, 360–369.
our framework: Since SCKS can recover the underlying time course Arasaratnam, I., Haykin, S., 2008. Nonlinear Bayesian filters for training recurrent neural
of synaptic activation, we can model effective connectivity at networks. MICAI 2008: Advances in Artificial Intelligence, pp. 12–33.
synaptic (neuronal) level. Because no knowledge about the input Arasaratnam, I., Haykin, S., 2009. Cubature Kalman filters. IEEE Trans. Autom. Control
54, 1254–1269.
is necessary, one can use this scheme to invert the dynamic causal Attwell, D., Buchan, A.M., Charpak, S., Lauritzen, M., MacVicar, B.A., Newman, E.A., 2010.
models on the resting state data, or pursue connectivity analyses in Glial and neuronal control of brain blood flow. Nature 468, 232–243.
the brain regions that are dominated by endogenous activity Berns, G.S., Song, A.W., Mao, H., 1999. Continuous functional magnetic resonance
imaging reveals dynamic nonlinearities of “dose–response” curves for finger
fluctuations, irrespective of task-related responses. We will also opposition. J. Neurosci. 19, 1–6.
consider conventional approaches to causal inference that try to Birn, R.M., Saad, Z.S., Bandettini, P.A., 2001. Spatial heterogeneity of the nonlinear
identify the direction of the information flow between different dynamics in the FMRI BOLD response. Neuroimage 14, 817–826.
Biscay, R., Jimenez, J.C., Riera, J.J., Valdes, P.A., 1996. Local linearization method for the
brain regions (e.g. Granger causality, dynamic Bayesian networks,
numerical solution of stochastic differential equations. Ann. Inst. Stat. Math. 48,
etc.). In this context, one can compare the analysis of deconvolved 631–644.
hidden (neuronal) states with explicit model comparison within the Biswal, B., Yetkin, F.Z., Haughton, V.M., Hyde, J.S., 1995. Functional connectivity in the
DCM framework. Another challenge would be to exploit the motor cortex of resting human brain using echo-planar MRI. Magn. Reson. Med. 34,
537–541.
similarity among neighboring voxels in relation to their time Bucy, R.S., Senne, K.D., 1971. Digital synthesis of non-linear filters. Automatica 7,
courses. There are thousands of voxels in any volume of the 287–298.
human brain, and the judicious pooling of information from Buxton, R.B., Wong, E.C., Frank, L.R., 1998. Dynamics of blood flow and oxygenation
changes during brain activation: the balloon model. Magn. Reson. Med. 39,
multiple voxels may help to improve accuracy of our deconvolution 855–864.
schemes. Last but not least, we hope to test variants of the Buxton, R.B., Uludag, K., Dubowitz, D.J., Liu, T.T., 2004. Modeling the hemodynamic
hemodynamic model, starting with extension proposed by Buxton response to brain activation. Neuroimage 23, S220–S233.
David, O., in press. fMRI connectivity, meaning and empiricism Comments on:
et al. (2004), which accounts for non-steady-state relationships Roebroeck et al. The identification of interacting networks in the brain using
between CBF and CBV arising due to viscoelastic effects. This is fMRI: Model selection, causality and deconvolution. Neuroimage.
particularly interesting here, because we can, in principle, charac- David, O., Guillemain, I., Saillet, S., Reyt, S., Deransart, C., Segebarth, C., Depaulis, A.,
2008. Identifying neural drivers with functional MRI: an electrophysiological
terize these inconstant relationships in terms of time-varying validation. PLoS Biol. 6, 2683–2697.
parameter estimates afforded by our recursive schemes. Deneux, T., Faugeras, O., 2006. Using nonlinear models in fMRI data analysis: model
The Matlab code for our methods (including estimation of selection and activation detection. Neuroimage 32, 1669–1689.
Doucet, A., De Freitas, N., Gordon, N., 2001. Sequential Monte Carlo methods in practice.
measurement noise covariance), which is compatible with the
Springer Verlag.
subroutines and variable structures used by the DEM in SPM8, is Fernandez-Prades, C., Vila-Valls, J., 2010. Bayesian nonlinear filtering using quadrature
available from the authors upon request. and cubature rules applied to sensor data fusion for positioning. Proceeding of IEEE
International Conference on Communications, pp. 1–5.
Friston, K.J., 2002. Bayesian estimation of dynamical systems: an application to fMRI.
Neuroimage 16, 513–530.
Conclusion Friston, K., 2008a. Variational filtering. Neuroimage 41, 747–766.
Friston, K.J., 2008b. Hierarchical models in the brain. PLoS Comput. Biol. 4, e1000211.
Friston, K., in press. Dynamic casual modeling and Granger causality Comments on: the
In this paper, we have introduced a robust blind deconvolution technique based on the nonlinear square-root cubature Kalman filter and Rauch–Tung–Striebel smoother, which allows inference on hidden states, input, and model parameters. This approach is very general and can be applied to the inversion of any nonlinear continuous dynamic model that is formulated with stochastic differential equations. This first description of the technique focused on the estimation of neuronal synaptic activation by generalized deconvolution from observed fMRI data. We were able to estimate the true underlying neuronal activity with significantly improved temporal resolution, compared to the observed fMRI signal. This speaks to new possibilities for fMRI signal analysis, especially in effective connectivity and dynamic causal modeling of unknown neuronal fluctuations (e.g., resting state data).

We validated the inversion scheme using difficult nonlinear and linear stochastic dynamic models and compared its performance with dynamic expectation maximization, one of the few methods capable of this sort of model inversion. Our approach afforded the same or better estimates of states, input, and model parameters, at reduced computational cost.
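For orientation, the core predict/update cycle of a cubature Kalman filter can be sketched as follows. This is a generic illustration of the third-degree cubature rule (2n equally weighted points at ±√n along the columns of a matrix square root of the covariance), written in plain rather than square-root form, with all function and variable names our own:

```python
import numpy as np

def cubature_points(x, P):
    """Generate the 2n cubature points for mean x and covariance P."""
    n = x.size
    S = np.linalg.cholesky(P)                             # matrix square root
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # unit cubature points
    return x[:, None] + S @ xi                            # shape (n, 2n)

def ckf_step(x, P, z, f, h, Q, R):
    """One predict/update cycle: dynamics f, observer h, noises Q, R."""
    n = x.size
    # time update: push cubature points through the dynamics and re-average
    X = np.apply_along_axis(f, 0, cubature_points(x, P))
    x_pred = X.mean(axis=1)
    P_pred = X @ X.T / (2 * n) - np.outer(x_pred, x_pred) + Q
    # measurement update: fresh points through the observation function
    Xp = cubature_points(x_pred, P_pred)
    Z = np.apply_along_axis(h, 0, Xp)
    z_pred = Z.mean(axis=1)
    Pzz = Z @ Z.T / (2 * n) - np.outer(z_pred, z_pred) + R   # innovation cov.
    Pxz = Xp @ Z.T / (2 * n) - np.outer(x_pred, z_pred)      # cross-cov.
    K = Pxz @ np.linalg.inv(Pzz)                             # cubature gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ Pzz @ K.T
    return x_new, P_new
```

The square-root variant used in the paper propagates Cholesky factors of the covariances directly (via QR updates) for numerical robustness, but the fixed, equally weighted point set above is what distinguishes the cubature rule from unscented sigma-point schemes.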
Acknowledgments
This work was supported by the research frames no. MSM0021630513 and no. MSM0021622404 and also sponsored by the research center DAR no. 1M0572, all funded by the Ministry of Education of the Czech Republic. Additional funding was provided by NIH grant no. R01EB000840 from the USA. KJF was funded by the Wellcome Trust. We would like to thank Jorge Riera for providing his implementation of the LL-innovation algorithm.
2128 M. Havlicek et al. / NeuroImage 56 (2011) 2109–2128