
NeuroImage xxx (2010) xxx–xxx

Contents lists available at ScienceDirect

NeuroImage
journal homepage: www.elsevier.com/locate/ynimg

Multivariate dynamical systems models for estimating causal interactions in fMRI

Srikanth Ryali a,⁎, Kaustubh Supekar b,c, Tianwen Chen a, Vinod Menon a,d,e,⁎

a Department of Psychiatry & Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
b Graduate Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA 94305, USA
c Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA 94305, USA
d Program in Neuroscience, Stanford University School of Medicine, Stanford, CA 94305, USA
e Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA

Article info

Article history:
Received 5 June 2010
Revised 15 September 2010
Accepted 21 September 2010
Available online xxxx

Keywords: Causality; Dynamical systems; Variational Bayes; Bilinear; Expectation maximization; Kalman smoother; Deconvolution

Abstract

Analysis of dynamical interactions between distributed brain areas is of fundamental importance for understanding cognitive information processing. However, estimating dynamic causal interactions between brain regions using functional magnetic resonance imaging (fMRI) poses several unique challenges. For one, fMRI measures Blood Oxygenation Level Dependent (BOLD) signals, rather than the underlying latent neuronal activity. Second, regional variations in the hemodynamic response function (HRF) can significantly influence estimation of causal interactions between them. Third, causal interactions between brain regions can change with experimental context over time. To overcome these problems, we developed a novel state-space Multivariate Dynamical Systems (MDS) model to estimate intrinsic and experimentally induced modulatory causal interactions between multiple brain regions. A probabilistic graphical framework is then used to estimate the parameters of MDS as applied to fMRI data. We show that MDS accurately takes into account regional variations in the HRF and estimates dynamic causal interactions at the level of latent signals. We develop and compare two estimation procedures using maximum likelihood estimation (MLE) and variational Bayesian (VB) approaches for inferring model parameters. Using extensive computer simulations, we demonstrate that, compared to Granger causal analysis (GCA), MDS exhibits superior performance for a wide range of signal-to-noise ratios (SNRs), sample lengths and network sizes. Our simulations also suggest that GCA fails to uncover causal interactions when there is a conflict between the direction of intrinsic and modulatory influences. Furthermore, we show that MDS estimation using VB methods is more robust and performs significantly better at low SNRs and shorter time series than MDS with MLE. Our study suggests that VB estimation of MDS provides a robust method for estimating and interpreting causal network interactions in fMRI data.

© 2010 Published by Elsevier Inc.
Introduction

Functional magnetic resonance imaging (fMRI) has emerged as a powerful tool for investigating human brain function and dysfunction. fMRI studies of brain function have primarily focused on identifying brain regions that are activated during performance of perceptual or cognitive tasks. There is growing consensus, however, that localization of activations provides a limited view of how the brain processes information and that it is important to understand functional interactions between brain regions that form part of a neurocognitive network involved in information processing (Bressler and Menon, 2010; Friston, 2009c; Fuster, 2006). Furthermore, evidence is now accumulating that the key to understanding the functions of any specific brain region lies in disentangling how its connectivity differs from the pattern of connections of other functionally related brain areas (Passingham et al., 2002). A critical aspect of this effort is to better understand how causal interactions between specific brain areas and networks change dynamically with cognitive demands (Abler et al., 2006; Deshpande et al., 2008; Friston, 2009b; Goebel et al., 2003; Mechelli et al., 2003; Roebroeck et al., 2005; Sridharan et al., 2008). These and other related studies in the literature highlight the importance of dynamic causal interactions for understanding brain function at the systems level.

In recent years, several methods have been developed to estimate causal interactions in fMRI data (Deshpande et al., 2008; Friston et al., 2003; Goebel et al., 2003; Guo et al., 2008; Rajapakse and Zhou, 2007; Ramsey et al., 2009; Roebroeck et al., 2005; Seth, 2005; Smith et al., 2009; Valdes-Sosa et al., 2005). Of these, Granger causal analysis (GCA) (Roebroeck et al., 2005; Seth, 2005) and dynamic causal modeling (DCM) (Friston et al., 2003) are among the more commonly used approaches thus far. There is a growing debate about the relative merits and demerits of these approaches for estimating causal interactions using fMRI data (Friston, 2009a,b; Roebroeck et al., 2009). The main

⁎ Corresponding authors. Department of Psychiatry & Behavioral Sciences, 780 Welch Rd, Room 201, Stanford University School of Medicine, Stanford, CA 94305-5778, USA. Fax: +1 650 736 7200.
E-mail addresses: [email protected] (S. Ryali), [email protected] (V. Menon).

1053-8119/$ – see front matter © 2010 Published by Elsevier Inc.


doi:10.1016/j.neuroimage.2010.09.052

Please cite this article as: Ryali, S., et al., Multivariate dynamical systems models for estimating causal interactions in fMRI, NeuroImage
(2010), doi:10.1016/j.neuroimage.2010.09.052

limitations of GCA highlighted by this debate are that: (1) GCA estimates causal interactions in the observed Blood-Oxygenation-Level-Dependent (BOLD) signals, rather than in the underlying neuronal responses; (2) GCA may not be able to accurately recover causal interactions because of regional variations in hemodynamic response; and (3) GCA does not take into account experimentally induced modulatory effects while estimating causal interactions (Friston, 2009a,b). The main limitations of DCM highlighted in this debate are that: (1) DCM is a confirmatory method wherein several causal models are tested and the model with the highest evidence is chosen. This is problematic when the number of regions under investigation is large, since the number of models that need to be tested increases exponentially with the number of regions; (2) conventional DCM uses a deterministic model to describe the dynamics of the latent neuronal signals, which may not be adequate to capture the dynamics of the underlying neuronal processes (a stochastic version was recently proposed (Daunizeau et al., 2009)); and (3) the assumptions used by DCM for deconvolution of the hemodynamic response have not yet been adequately verified (Roebroeck et al., 2009). Here, we develop a new method that incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.
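Limitation (2) of GCA can be illustrated with a small simulation. The sketch below is our illustration, not code from this paper: the gamma-shaped HRFs, the coupling weight of 0.8 and the one-sample neuronal lag are assumed values chosen to make the confound visible. A latent signal in a "source" region drives a "target" region, but because the source's HRF peaks later than the target's, a simple lag-based measure applied to the convolved BOLD-like signals points in the wrong direction.

```python
import numpy as np

rng = np.random.default_rng(0)

def hrf(t, shape, scale):
    # Gamma-shaped impulse response with mode at (shape - 1) * scale samples;
    # an illustrative stand-in for a regional HRF, not the canonical HRF.
    h = t ** (shape - 1) * np.exp(-t / scale)
    return h / h.sum()

T = 2000
x = rng.standard_normal(T)            # latent activity in the source region
y = np.zeros(T)
y[1:] = 0.8 * x[:-1]                  # source drives target with a 1-sample lag
y += 0.1 * rng.standard_normal(T)     # small target-level noise

t = np.arange(30.0)
bold_x = np.convolve(x, hrf(t, 6.0, 2.0))[:T]  # slow HRF at source (peak near 10)
bold_y = np.convolve(y, hrf(t, 3.0, 2.0))[:T]  # fast HRF at target (peak near 4)

def peak_lag(a, b, max_lag=20):
    # Lag maximizing corr(a[t], b[t + lag]); positive means a leads b.
    best_c, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.corrcoef(a[:len(a) - lag], b[lag:])[0, 1]
        else:
            c = np.corrcoef(a[-lag:], b[:lag])[0, 1]
        if c > best_c:
            best_c, best_lag = c, lag
    return best_lag

lag_latent = peak_lag(x, y)           # latent signals give the true direction
lag_bold = peak_lag(bold_x, bold_y)   # BOLD-level lag reverses the direction
```

With these settings the latent-level lag comes out positive (source leads target), while the BOLD-level lag is negative: the slower HRF at the source makes the target appear to lead. This is the scenario in which deconvolving to latent signals matters.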
We propose a novel multivariate dynamical systems (MDS) approach (Bishop, 2006) for modeling causal interactions in fMRI data. MDS is based on a state-space approach which can be used to overcome many of the aforementioned problems associated with estimating causal interactions in fMRI data. State-space models have been successfully used in engineering applications of control systems and machine learning (Bishop, 2006), but their use in neuroscience has been limited. Notable examples of state-space models include Hidden Markov models (HMMs), which are widely used in speech recognition applications (Rabiner, 1989), and Kalman filters for object tracking (Koller and Friedman, 2009). Critically, state-space models can be represented as probabilistic graphical models (Koller and Friedman, 2009), which, as we show below (Fig. 1), greatly facilitates representation and inference for causal modeling of fMRI data.

Critically, MDS estimates causal interactions in the underlying latent signals, rather than the observed BOLD-fMRI signals. In order to estimate causal interactions from the observed fMRI data, it is important to take into account variations in the hemodynamic response function (HRF) across different brain regions (David et al., 2008). MDS is a state-space model in which a "state equation" is used to model the unobserved states of the system and an "observation equation" is used to model the observed data as a function of latent state signals (Fig. 1). The state equation is a vector autoregressive model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. The observation model produces BOLD-fMRI signals as a linear convolution of latent signals and basis functions spanning the space of variations in the HRF.

The latent signals and unknown parameters that characterize causal interactions between brain regions are estimated using two different approaches. In the first approach, we use expectation maximization (EM) to obtain maximum likelihood estimates (MLE) of the parameters and test the statistical significance of the estimated causal relationships between brain regions using a nonparametric approach. We refer to this approach as MDS-MLE. In the second approach, we use a Variational Bayes (VB) approach to compute the posterior distribution of latent variables and parameters, which cannot be computed analytically using a fully Bayesian approach. We refer to this approach as MDS-VB. By representing MDS as a probabilistic graphical network (Fig. 1), we show that MDS-VB provides an elegant analytical solution for computing the posterior distributions and deriving causal connectivity estimates which are sparse and more readily interpretable.

We first describe our MDS model and discuss MLE and VB approaches for estimating intrinsic and modulatory causal interactions between multiple brain regions. We test the performance of MDS using computer-simulated data sets as a function of network size, fMRI time points and signal-to-noise ratio (SNR). We evaluate the performance of our MDS models with extensive computer simulations and examine several metrics, including sensitivity, false positive rate and accuracy, in terms of correctly identifying both intrinsic and modulatory causal interactions. Finally, we contrast our results with those obtained with GCA.

Fig. 1. Probabilistic graphical model for the multivariate dynamical system (MDS). All conditional interdependencies in MDS can be inferred from this model. The state variables s(t) are modeled as a linear dynamical system. The non-diagonal elements of matrices A and C represent the intrinsic and modulatory connection strengths, respectively. The diagonal elements of D represent the weight of the external stimulus at the i-th node. Q(m,m) is the state noise variance at the m-th node. Each element of A, C and D has precision α. Each element of α follows a Gamma distribution with parameters co and do. The prior for 1/Q(m,m) follows a Gamma distribution with parameters ao and bo. y(t) represents the observed BOLD signal, the elements of B represent the weights corresponding to the basis functions for the HRFs, and R(m,m) is the observation noise variance at the m-th node. Each element of B has precision α. Each element of α follows a Gamma distribution with parameters co and do. The prior for 1/R(m,m) follows a Gamma distribution with parameters ao and bo. Random variables are indicated as open circles and deterministic quantities as rectangles.

Methods

Notation: In the following sections, we represent matrices by upper-case letters and scalars and vectors by lower-case letters. Random matrices are represented by bold-face upper-case letters, whereas random vectors and scalars are represented by bold-face lower-case letters.

MDS Model

Consider the following state-space model to represent the multivariate fMRI time series:

s(t) = A s(t−1) + Σ_{j=1}^{J} v_j(t) C_j s(t−1) + D u(t) + w(t)    (1)

x_m(t) = [s_m(t) s_m(t−1) … s_m(t−L+1)]′    (2)

y_m(t) = b_m Φ x_m(t) + e_m(t)    (3)

In Eq. (1), s(t) is an M × 1 vector of latent signals at time t in M regions, and A is an M × M connection matrix wherein A(m, n) denotes the strength of the intrinsic causal connection (which is independent of


external stimuli or task condition) from the n-th region to the m-th region. Cj is an M × M connection matrix induced by the modulatory input vj(t), and J is the number of modulatory inputs. The non-diagonal elements of Cj represent the coupling of brain regions in the presence of the modulatory input vj(t). Therefore, the latent signal s(t) in the M regions at time t is a bilinear function of the modulatory inputs vj(t) and its previous state s(t−1). D is an M × M diagonal matrix wherein D(i, i) denotes the strength of the external stimulus to the i-th region. u(t) is an M × 1 binary vector whose m-th element represents the external stimulus to the m-th region under investigation. w(t) is an M × 1 state noise vector whose distribution is assumed to be Gaussian with covariance matrix Q (w(t) ∼ N(0, Q)). Additionally, the state noise vectors at time instances 1, 2, …, T (w(1), w(2), …, w(T)) are assumed to be independently and identically distributed (iid). Eq. (1) represents the time evolution of latent signals in the M brain regions. More specifically, the latent signal at time t, s(t), is expressed as a linear combination of the latent signals at time t−1, the external stimulus at time t (u(t)), a bilinear combination of the modulatory inputs vj(t), j = 1, 2, …, J, and the previous state, plus state noise w(t). The latent dynamics modeled in Eq. (1) give rise to the observed fMRI time series represented by Eqs. (2) and (3).

We model the fMRI time series in region m as a linear convolution of the HRF and the latent signal sm(t) in that region. To represent this linear convolution model as an inner product of two vectors, the past L values of sm(t) are stored as a vector: xm(t) in Eq. (2) represents an L × 1 vector with the L past values of the latent signal at the m-th region.

In Eq. (3), ym(t) is the observed BOLD signal at time t in the m-th region. Φ is a p × L matrix whose rows contain bases for the HRF. bm is a 1 × p coefficient vector representing the weights for each basis function in explaining the observed BOLD signal ym(t). Therefore, the HRF in the m-th region is represented by the product bmΦ. The BOLD response in this region is obtained by convolving the HRF (bmΦ) with the L past values of the region's latent signal (xm(t)) and is represented mathematically by the vector inner product bmΦxm(t). Uncorrelated observation noise em(t) with zero mean and variance σ²m is then added to generate the observed signal ym(t). em(t) is also assumed to be uncorrelated with w(τ) at all t and τ. Eq. (3) represents the linear convolution between the embedded latent signal xm(t) and the basis vectors for the HRF. Here, we use the canonical HRF and its time derivative as bases, as is common in most fMRI studies (Penny et al., 2005; Smith et al., 2009).

Eqs. (1)–(3) together represent a state-space model for estimating the causal interactions in latent signals based on the observed multivariate fMRI time series. This model can be seen both as a multivariate extension of univariate time series models (Makni et al., 2008; Penny et al., 2005) and as an extension of GCA wherein a vector autoregressive model for latent, rather than BOLD-fMRI, signals is used to model the causal interactions among brain regions. Furthermore, our MDS model also takes into account variations in the HRF as well as the influences of modulatory and external stimuli in estimating causal interactions between the brain regions.

Estimating causal interactions between the M regions specified in the model is equivalent to estimating the unknown parameters A and Cj, j = 1, 2, …, J. In order to estimate A and the Cj's, the other unknown parameters D, Q, {bm} (m = 1, …, M) and {σ²m} (m = 1, …, M), as well as the latent signals {s(t)} (t = 1, …, T), where T is the total number of time samples, need to be estimated from the observations {ym(t)}, m = 1, …, M, t = 1, 2, …, T. We use the following MLE and VB methods for estimating the parameters of the MDS model.

Maximum Likelihood Estimation (MLE)

Estimation
Maximum likelihood estimates of the MDS model parameters and latent signals are obtained by maximizing the log-likelihood of the observed fMRI data. We use the EM algorithm to estimate the unknown parameters and latent variables of the model. The EM algorithm is an iterative method consisting of two steps, the E-step and the M-step. In the E-step, the posterior distribution of the latent variables is computed given the current estimates of the parameters. In the M-step, given the current posterior distribution of the latent variables, the parameters of the model are estimated by maximizing the conditional expectation of the log of the complete likelihood given the data. The E and M steps are repeated until convergence. The log-likelihood of the data is guaranteed to increase or remain the same with every iteration of the E and M steps. Also, the EM algorithm asymptotically gives maximum likelihood estimates of the parameters. In the E-step, the posterior distributions are obtained using Kalman filtering and smoothing algorithms (Bishop, 2006). The detailed equations for the E and M steps are given in Appendix A. We refer to this solution of MDS using MLE as MDS-MLE.

Inference
The statistical significance of the intrinsic (A(m, n)) and modulatory (Cj(m, n), j = 1, 2, …, J) causal connections estimated using the EM approach was tested using a bootstrap method. In this approach, the distribution of connection strengths under the null hypothesis that there are no connections between the regions was generated by estimating A and C from 100 surrogate data sets constructed from the observed data. A surrogate data set was obtained by applying a Fourier transform to the observed signal at the m-th region and then randomizing its phase response by adding a random phase shift at every frequency. The phase shifts were obtained by sampling uniformly at random in the interval [0, 2π]. An inverse Fourier transform was then applied to generate one instance of surrogate data (Prichard and Theiler, 1994). Randomization of the phase response destroys the causal interactions between the brain regions while preserving their power spectra. The EM algorithm was then run on this surrogate data to obtain A and C under the null hypothesis. This procedure was repeated on 100 surrogate data sets and the empirical distributions under the null hypothesis were obtained for the elements of A and C. The statistical significance of each connection was then estimated using these distributions at a p value of 0.01 with Bonferroni correction to account for multiple comparisons.

Variational Bayes (VB)

Estimation of posterior distributions
In this approach, we use a VB framework to obtain the posterior distributions of the unknown parameters and latent variables. Let Θ = {A, C1, …, CJ, D, Q, R, B} represent the unknown parameters and S = {s(t), t = 1, 2, …, T} be the latent variables of the model. Given the observations Y = {y(t), t = 1, 2, …, T} and the probabilistic model, the Bayesian approach aims to find the joint posterior p(S, Θ|Y). However, obtaining this posterior distribution using a fully Bayesian approach is analytically intractable for most models, including MDS. In the VB approach, we make an analytical approximation to p(S, Θ|Y). Let q(S, Θ|Y) be any arbitrary probability distribution; then the log of the marginal distribution of the observations Y can be written as (Bishop, 2006)

log P(Y) = L(q) + KL(q||p)    (4)

where

L(q) = ∫ dS dΘ q(S, Θ|Y) log [ p(Y, S, Θ) / q(S, Θ|Y) ]    (5)

KL(q||p) = −∫ dS dΘ q(S, Θ|Y) log [ p(S, Θ|Y) / q(S, Θ|Y) ]    (6)

KL(q||p) is the Kullback–Leibler divergence between q(S, Θ|Y) and p(S, Θ|Y). KL(q||p) ≥ 0, with equality if and only if q(S, Θ|Y) = p(S, Θ|Y). Therefore, L(q) serves as a lower bound on the log of the evidence (log P(Y)). The maximum of this lower bound occurs when the KL divergence is zero, for which the optimal choice of q(S, Θ|Y) is


p(S, Θ|Y). Since p(S, Θ|Y) is not tractable, certain assumptions on the form of q(S, Θ|Y) are made and the optimal distribution is then found by maximizing the lower bound L(q). In this work, we assume that the posterior distribution q(S, Θ|Y) factorizes over S and Θ, i.e.,

q(S, Θ|Y) = qS(S|Y) qΘ(Θ|Y)    (7)

We note that no further assumptions are made on the functional form of the distributions qS(S|Y) and qΘ(Θ|Y). These quantities are obtained by taking functional derivatives of L(q) with respect to qS(S|Y) and qΘ(Θ|Y). It can be shown that

log qS(S|Y) ∝ EΘ(log p(Y, S, Θ))    (8)

log qΘ(Θ|Y) ∝ ES(log p(Y, S, Θ))    (9)

Eqs. (8) and (9) are, respectively, the VB-E and VB-M steps. Expectations are computed with respect to qΘ(Θ|Y) in Eq. (8) and with respect to qS(S|Y) in Eq. (9). In the VB-E step, the distribution of the latent signal s(t), for each t, is updated given the current distribution of the parameters Θ. For reasons described below, s(t) has a Gaussian distribution, and in this step updating the distribution amounts to updating the mean and variance of that Gaussian distribution. Therefore, in the VB-E step, estimating the mean of s(t) at every t is equivalent to estimating the latent signals. In the VB-M step, the distributions of the model parameters Θ are updated given the updated distributions of the latent signals s(t). The VB-E and VB-M steps are repeated until convergence. Note that we do not make any further assumptions about the factorization of Θ and S. Any further conditional independencies in these sets are derived from the probabilistic graphical model of MDS shown in Fig. 1. The details of the derivation of the posterior probabilities using the graphical model are given in Appendix B. Fig. 2 shows a flow chart of the various steps involved in both the MDS-VB and MDS-MLE methods.

Fig. 2. Flow chart showing the major steps in the implementation of MDS. Wiener deconvolution is used to obtain an initial estimate of the latent signals, and a least-squares estimation procedure is used to find an initial estimate of the model parameters. The estimates of the latent signals and model parameters are refined in the E and M steps, respectively. These steps are repeated until convergence. The significance of the model parameters is then assessed in the inference step.

Choice of priors and inference
The Bayesian approach allows the specification of both informative and non-informative priors on the model parameters. The specification of these priors helps in regularizing the solution and avoids overfitting when the number of parameters to be estimated is large compared to the number of observations, as is generally the case when the number of brain regions to be modeled is large. Since we do not have a priori information on these parameters, we specify non-informative conjugate priors. Here, we briefly explain the notion of non-informative conjugate priors (Bishop, 2006). Let z be a Gaussian random variable with mean μ and variance σ². If σ² → ∞, or equivalently the precision 1/σ² → 0, the distribution becomes flat and the random variable z can take any value between −∞ and ∞ with equal probability. Here, we refer to such distributions as non-informative. Let x and y be two random variables with probability distributions p(x) and p(y), respectively. p(y) is said to be a conjugate prior for p(x) if the functional form of the posterior p(x|y) is the same as that of p(y). Specifying conjugate priors leads to elegant analytical solutions, and it also allows us to specify priors in such a way that we obtain sparse and interpretable solutions. For example, one can specify Gaussian priors on the elements of the connection matrices A and C. The prior on each element of A (or Cj) is

A(i,j) ∼ N(0, 1/λ(i,j))    (10)

where λ(i,j) is the prior precision for A(i,j). Such a specification of priors helps in automatic relevance determination (ARD) of the connections A(i,j) between the regions (Tipping, 2001). During the learning of A and the λ's, a significant proportion of the λ(i,j)'s go towards infinity, and the corresponding connections A(i,j) have posterior distributions whose mean values shrink towards the prior mean, which is zero. The elements of the matrix A which do not have significant values become very small, and only the elements which are significant survive. Therefore, adopting this procedure helps in automatically identifying the relevant entries of the matrix A, hence the name "automatic relevance determination". This is very important because, unlike in the MLE approach, inference on the connection weights (A and the Cj's) is now straightforward. The details of the prior specification for the various parameters are given in Appendix B. We test the significance of the parameters by thresholding the corresponding posterior probabilities at a p-value of 0.01 with Bonferroni correction to account for multiple comparisons.

Simulated data sets

Data sets with modulatory effects and external stimuli
We assess the performance of MDS using a number of computer-simulated data sets generated at various SNRs (10, 5 and 0 dB), for different numbers of brain regions or nodes (M = 2, 3 and 5) and for different numbers of time samples (T = 200, 300 and 500).

Fig. 3 shows the intrinsic and modulatory connectivity of three networks with 2, 3 and 5 nodes. For example, in the two-node network (Fig. 3A), node 1 receives an external input and there is an intrinsic causal connection from node 1 to node 2 with a weight of −0.3 (A(2,1) = −0.3). A modulatory input induces a connection from node 1 to node 2 with a weight of 0.5 (C(2,1) = 0.5), whose sign is opposite to that of the intrinsic connection. Similarly, in the five-node structure (Fig. 3C), node 1 receives the external input and has causal influences on nodes 2, 3 and 4 (matrix A). Nodes 4 and 5 have bidirectional influences. Modulatory inputs induce causal influences from node 1 to 2 and from node 3 to 2 (matrix C). Note that all three networks have intrinsic and modulatory connections from node 1 to 2 with weights −0.3 and 0.5, respectively. We simulated networks with these weights to explicitly test whether MDS could recover these interactions, which could be missed by GCA because of the opposing signs of the weights of the intrinsic and modulatory connections. We set the autocorrelations of the time series (the diagonal elements of matrix A) to 0.7 (Ge et al., 2009; Roebroeck et al., 2005). Our simulations also modeled variations in HRFs across regions. Fig. 4 shows the simulated HRFs at each of the nodes in each network structure. The HRFs were constructed in such a way that the direction of hemodynamic delays is confounded with the direction of latent signal delay, making the task of recovering network parameters more challenging. For example, in the 5-node network, node 1 drives nodes 2, 3 and 4 at the level of latent signals, but the HRF at node 1 peaks


later than that in nodes 2, 3 and 4. These HRFs were simulated using a linear combination of the canonical HRF and its temporal derivative. Analogous to most fMRI data acquisition paradigms, we assume that the sampling interval, also referred to as the repetition time (TR), is 2 s. This corresponds to an embedding dimension of L = 16, which is also the length of the HRF. Note that the duration of the canonical HRF is approximately 32 s.

Fig. 5 shows the experimental and modulatory inputs applied to the nodes shown in Fig. 3. The external input was simulated to reflect an event-related fMRI design with stimuli occurring at random instances, with the constraint that the time difference between two consecutive events is at least 2 TR (Fig. 5A). This input can also be a slow event-related or block design; in the MDS framework, there is no restriction on the nature of the experimental design. The modulatory input is assumed to be a boxcar function (Fig. 5B). The modulatory inputs indicate the time periods wherein the network configuration could change because of context-specific influences such as changes in attention, alertness and explicit experimental manipulations.

The simulated data sets were generated using the model described in Eqs. (1)–(3). The latent noise covariance was fixed at Q = 0.1 I_M, where I_M is the identity matrix of size M. The observed noise variance at the m-th region for a given SNR was computed as

σ²_m = Var(y_m) 10^(−0.1 SNR)    (11)

We assume that the canonical HRF and its temporal derivative span the space of HRFs. Therefore, they constitute the rows of Φ, which is a 2 × 16 matrix. The coefficients of the matrices A and C for each network structure are shown in Fig. 3.

We generated 25 data sets for each SNR, network structure and number of time samples. The performance of the method was assessed using the performance metrics described in the next section.

Fig. 3. Simulated models with intrinsic and experimentally induced modulatory connections for (A) 2-node, (B) 3-node and (C) 5-node networks. Intrinsic connections are shown as solid lines; modulatory connections are shown as broken lines and highlighted with connecting black circles. A(i,j) and C(i,j) are the weights of the intrinsic and modulatory connections, respectively, between nodes i and j. D(i,i) is the strength of the external stimulus to the i-th node.

Data sets without modulatory effects and external stimuli
To examine the performance of the methods when there are no modulatory or external stimulus effects, we simulated 25 data sets for the 3-node network shown in Fig. 3B at 5 dB SNR. We set the weights to the same values as in the previous case except that there are no causal interactions from modulatory inputs. The weights corresponding to external stimuli were all set to zero. The diagonal

Fig. 4. Variable regional hemodynamic response functions used in the simulations for each node in the (A) 2-, (B) 3- and (C) 5-node networks shown in Fig. 3.

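The noise calibration in Eq. (11) can be made concrete with a short sketch (our own illustration, not the authors' code; `obs_noise_variance` is a hypothetical helper name):

```python
import numpy as np

def obs_noise_variance(y, snr_db):
    """Observation-noise variance implied by Eq. (11):
    sigma_m^2 = Var(y_m) * 10**(-0.1 * SNR)."""
    return np.var(y) * 10.0 ** (-0.1 * snr_db)
```

At SNR = 0 dB the noise variance equals the signal variance; every additional 10 dB shrinks it by a factor of 10.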
Please cite this article as: Ryali, S., et al., Multivariate dynamical systems models for estimating causal interactions in fMRI, NeuroImage
(2010), doi:10.1016/j.neuroimage.2010.09.052

Fig. 5. Onset and duration of event related experimental stimuli (A) and modulatory inputs (B) used in the simulations.
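The two input types shown in Fig. 5 can be generated with a minimal sketch (our illustration; the function names and the rejection-sampling scheme are ours, assuming a discrete time grid in units of TR):

```python
import numpy as np

def event_input(n_scans, n_events, min_gap=2, seed=0):
    """Binary event-related stimulus vector (in TR units): random onsets
    with at least `min_gap` TRs between consecutive events (cf. Fig. 5A)."""
    rng = np.random.default_rng(seed)
    onsets, candidates = [], list(range(n_scans))
    while len(onsets) < n_events and candidates:
        t = int(rng.choice(candidates))
        onsets.append(t)
        # rejection step: drop candidate onsets closer than min_gap to t
        candidates = [c for c in candidates if abs(c - t) >= min_gap]
    u = np.zeros(n_scans)
    u[onsets] = 1.0
    return u

def boxcar_input(n_scans, start, stop):
    """Boxcar modulatory input: 1 during [start, stop), 0 elsewhere (cf. Fig. 5B)."""
    v = np.zeros(n_scans)
    v[start:stop] = 1.0
    return v
```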

elements (autocorrelations) in matrix A were set to 0.8, 0.7 and 0.6, respectively. These data sets were created to provide a more appropriate, albeit less general, comparison of MDS with GCA.

Effects of fMRI down-sampling

Interactions between brain regions occur at finer time scales than the sampling intervals of fMRI signals: fMRI signals are typically sampled at TR = 1 or 2 s, while neuronal processes occur at millisecond resolution. To investigate the effects of fMRI down-sampling on MDS performance, we adopted the approach described by Deshpande et al. (2009). We generated data sets with a 1 ms sampling interval at 0 dB SNR for the 2-node network shown in Fig. 3A. We obtained neuronal signals in node 1 and node 2 with a delay of dn milliseconds between them. In this case, node 1 drives node 2 under both intrinsic and modulatory conditions, with the weights shown in Fig. 3A. The autocorrelations in nodes 1 and 2 were set to 0.8 and 0.7, respectively. We then convolved the neuronal signal at node 1 with a canonical HRF, also generated at a 1 kHz sampling rate, and re-sampled it to a sampling interval of TR = 2 s to obtain the fMRI signal. In node 2, we convolved the "neuronal" signal with the HRF delayed by dh milliseconds with respect to the HRF in node 1, and again re-sampled to TR = 2 s to obtain the fMRI signal. We obtained simulated data sets at various neuronal delays dn = {0, 200, 400, 600, 800, 1000} ms and HRF delays dh = {0, 500, 2500} ms (Deshpande et al., 2009). We also examined two cases for the HRF delays: (1) the HRF delay is in the same direction as the neuronal delay, and (2) the HRF delay is in the opposite direction to the neuronal delay. The second case represents the scenario where the HRF confounds the causal interactions at the neuronal level. We generated 25 simulated data sets for each combination of dn and dh. Supplementary Table S1 summarizes the characteristics of each data set used for evaluating the performance of MDS.

Performance metrics

The performance of MDS in discovering the intrinsic and modulatory causal interactions in the simulated data sets was assessed using performance metrics such as sensitivity, false positive rate and accuracy in correctly identifying causal intrinsic and modulatory interactions, where:

sensitivity = TP / (TP + FN)    (12)

false positive rate = FP / (TN + FP)    (13)

accuracy = (TP + TN) / (TP + FP + FN + TN)    (14)

where TP is the number of true positives, TN is the number of true negatives, FN is the number of false negatives and FP is the number of false positives. These performance metrics were computed for each of the 25 data sets and then averaged to obtain the overall performance.

Results

Applying MDS – An example

We first illustrate the performance of MDS-MLE and MDS-VB by computing the estimated intrinsic and modulatory connections and the deconvolved (or estimated) latent signals of the five-node network simulated at 5 dB SNR shown in Fig. 3C. The MDS approach, using either MLE or VB, simultaneously estimates the latent signals and the unknown parameters of the model using the E and M steps, respectively. The left and right panels of Fig. 6 show, respectively, the actual and estimated latent signals and the actual and estimated BOLD signals at the five nodes of the network using MDS-MLE and MDS-VB. The estimated BOLD signal ŷ_m at the m-th node was computed as

ŷ_m = b̂′_m Φ x̂_m(t)    (15)

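Eq. (15) amounts to predicting each BOLD sample by applying an HRF-basis filter to the embedded latent signal. A minimal numpy sketch follows (our own illustration with a toy basis; in the paper, the rows of Φ are the canonical HRF and its temporal derivative):

```python
import numpy as np

def predict_bold(x, Phi, b):
    """Predict BOLD as y_hat(t) = b' * Phi * x(t), where x(t) stacks the
    L most recent latent samples [x(t), x(t-1), ..., x(t-L+1)] (cf. Eq. 15).
    Phi: (n_basis, L) HRF basis matrix; b: (n_basis,) regional weights."""
    n_basis, L = Phi.shape
    T = len(x)
    y_hat = np.zeros(T)
    for t in range(T):
        seg = np.zeros(L)          # zero-padded before t = 0
        n = min(t + 1, L)
        seg[:n] = x[t::-1][:n]     # time-reversed recent samples
        y_hat[t] = b @ (Phi @ seg)
    return y_hat
```

Because b′Φ is just a length-L filter, this is equivalent to `np.convolve(x, b @ Phi)[:len(x)]`, i.e. a convolution of the latent signal with the effective regional HRF.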

where b̂′_m are the estimated coefficients (using MLE or VB) corresponding to the basis functions spanning the subspace of HRFs, and x̂_m(t) (using MLE or VB) is the estimated latent signal at the m-th node. As shown in Fig. 6, both MDS-MLE and MDS-VB were able to recover the latent and BOLD signals at this SNR. Table 1 shows the mean square error (MSE) between the estimated and actual latent signals and between the estimated and actual BOLD-fMRI responses at each node using these two methods. The MSE in estimating these signals is very low for both methods.

Table 1
Mean square error (MSE) between the actual and estimated neuronal signals and between the actual and estimated BOLD signals using MDS-MLE and MDS-VB at the five nodes of the network.

          Neuronal signals        BOLD signals
Nodes     MLE       VB            MLE       VB
1         0.024     0.023         0.027     0.027
2         0.024     0.024         0.015     0.014
3         0.019     0.019         0.025     0.024
4         0.017     0.017         0.021     0.02
5         0.018     0.017         0.02      0.02

Fig. 7A and B show, respectively, the intrinsic and modulatory causal interactions estimated by MDS-MLE and MDS-VB in the simulated five-node network. MDS-VB correctly identified both the intrinsic (solid lines) and modulatory (dotted lines) connections in this network, as shown in Fig. 7B. MDS-MLE also correctly recovered both the intrinsic and modulatory networks but introduced an additional false modulatory connection from node 3 to node 1, as shown in Fig. 7A.

We next compare the performance of MDS with that of GCA using the same simulated data. This analysis was performed using the multivariate GCA toolbox developed by Seth (Seth, 2010). We applied GCA on the same data set to verify whether it can recover the causal connections (either intrinsic or modulatory). As shown in Fig. 7C, GCA missed both the intrinsic and modulatory interactions from node 1 to 2, but it was able to recover the modulatory interaction from node 3 to 4 in addition to other connections. However, unlike MDS, GCA cannot distinguish between intrinsic and modulatory interactions. GCA missed the connection from node 1 to 2 because these nodes have both intrinsic and modulatory interactions with opposing actions. Since GCA does not model these interactions separately, the net connection strength between these nodes is not significant. MDS, on the other hand, models these interactions explicitly and is therefore able to recover both types of connections. This example demonstrates that GCA cannot recover all the connections under these conditions, while both MDS methods could recover all the connections and at the same time differentiate between the different types of interactions.

Performance of MDS on simulated data with modulatory effects and external stimuli

We evaluated the performance of MDS-MLE and MDS-VB on simulated data sets by computing sensitivity, false positive rate and

Fig. 6. Left panel: actual and estimated latent signals at each of the nodes of the 5-node network shown in Fig. 3C. Right panel: estimated and actual BOLD-fMRI signals at each node. MDS, using both the MLE and VB approaches, accurately recovered the latent signals and predicted the fMRI signals based on the estimated model parameters and latent signals.

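The sensitivity, false positive rate and accuracy used throughout the Results (Eqs. (12)–(14)) can be computed directly from binary true/estimated connection matrices. The sketch below is our own minimal illustration (variable names are ours):

```python
import numpy as np

def connection_metrics(true_adj, est_adj):
    """Sensitivity, false positive rate and accuracy (cf. Eqs. 12-14)
    from binary true and estimated connection matrices."""
    t = np.asarray(true_adj, dtype=bool).ravel()
    e = np.asarray(est_adj, dtype=bool).ravel()
    tp = np.sum(t & e)      # true positives
    tn = np.sum(~t & ~e)    # true negatives
    fp = np.sum(~t & e)     # false positives
    fn = np.sum(t & ~e)     # false negatives
    sensitivity = tp / (tp + fn)
    fpr = fp / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, fpr, accuracy
```

In the paper's setting, these values would be computed per simulated data set and averaged over the 25 replications.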

accuracy in finding intrinsic and modulatory causal interactions as a function of SNR, network size and the number of time samples. Figs. 8–10, respectively, show the performance of MDS-MLE and MDS-VB for time samples T = 500, 300 and 200. For each T and network size, the performance of MDS-MLE and MDS-VB was evaluated at SNR = 0, 5 and 10 dB. In each of these figures, panels A, B and C show the performance of MDS-MLE and MDS-VB with respect to sensitivity, false positive rate and accuracy in identifying the intrinsic and modulatory interactions for the 2, 3 and 5 node networks, respectively. The performance of these methods improved with increasing SNR and number of time samples (T).

Fig. 7. (A) Intrinsic and modulatory connections estimated by MDS using maximum likelihood estimates (MDS-MLE) and (B) variational Bayes estimates (MDS-VB). (C) Causal interactions estimated by Granger causal analysis (GCA). MDS-VB correctly identified both intrinsic and modulatory connections. MDS-MLE correctly estimated all the intrinsic and modulatory connections in the five node network but also introduced a false modulatory connection from node 3 to node 1. GCA missed both the intrinsic and modulatory connections from node 1 to 2 for reasons described in the text.

Between the two MDS methods, MDS-VB showed superior performance compared to MDS-MLE across all SNRs, time samples and network sizes. MDS-VB showed significantly greater performance at low SNR and for shorter time series. For example, for SNR = 0 dB, T = 200 time points and 5 nodes (Fig. 10C), the sensitivity of MDS-VB in recovering intrinsic and modulatory interactions was about 0.75 and 0.6, respectively, while MDS-MLE had a sensitivity of only about 0.3 and 0.5, respectively. The accuracy of MDS-VB is also high (>0.8) under all conditions (panel C in Figs. 8–10) because the sensitivities are high and the false positive rates of this method are very low (panels A and B in Figs. 8–10). More generally, in cases with high noise, shorter sample length and larger network size, MDS-VB consistently outperforms MDS-MLE.

Comparison of MDS and GCA on simulated data with modulatory effects and external stimuli

Finally, we compared the performance of GCA with the MDS methods on the same data sets and performance metrics. Figs. 8–10, respectively, show the comparative performance of GCA with MDS-VB and MDS-MLE for T = 500, 300 and 200 at various SNRs and network sizes. The results suggest that the performance of GCA is poor compared to both MDS methods with respect to sensitivity and accuracy in identifying causal interactions between brain nodes. Since GCA cannot distinguish between intrinsic and modulatory interactions, we computed the performance metrics by considering both connection types. The performance of GCA declined with decreasing SNR for networks of size 3 and 5 (Figs. 8–10). The performance of GCA is worse even for the 2-node network, as shown in Figs. 8A, 9A and 10A. The sensitivity of GCA for this network is less than 10% because the intrinsic and modulatory interactions have weights with opposite signs. Since GCA does not model these interactions explicitly, it does not detect the interactions between the two nodes. On the other hand, both MDS methods showed better sensitivity and accuracy in identifying both types of interactions at all SNRs for this network.

Comparison of MDS and GCA on simulated data in the absence of modulatory effects and external stimuli

Table 2 shows the relative performance of MDS-VB, MDS-MLE and GCA on 25 data sets simulated for the 3-node network at 5 dB SNR without any modulatory effects and external stimuli. The performance of GCA improved: it recovered the causal network with a sensitivity of 0.9, FPR of 0 and accuracy of 0.98. In this case, the performance of GCA is comparable to MDS, suggesting that in the absence of modulatory effects and external stimuli GCA can perform as well as MDS even in the presence of HRF variations.

Effects of fMRI down-sampling on MDS performance

We examined the performance of MDS-VB on simulated data in which latent signals were generated at various delays using a sampling interval of 1 ms and convolved with HRFs at various delays. Causal interactions were then estimated based on the observed time series obtained at a sampling interval of 2 s. We examined the performance of MDS-VB under four different cases:

No latent signal delay but HRF delayed by 500 and 2500 ms
In this case, there are no causal interactions between the two nodes with respect to the latent signals, but the observed fMRI time series are delayed with respect to each other because of delays in their respective HRFs. MDS-VB was accurate in that it did not recover any causal interactions (either intrinsic or modulatory) despite the variations in HRF.

No HRF delays
In this case, the HRFs in both nodes are identical. As shown in Fig. S1(A), the sensitivity of MDS-VB in recovering both intrinsic and modulatory interactions is above 0.9 (left panel), with FPRs below 0.1 (middle panel) and therefore accuracies above 0.9 (right panel), at all latent signal delays. The performance of MDS-VB improved with increasing latent signal delay.

Latent and HRF delays in the same direction
In this scenario, the HRF delays do not confound the causal interactions between the nodes at the latent signal level. For HRF delays of 500 and 2500 ms, the performance metrics shown in Figs. S1(B) and (C) suggest that MDS-VB is able to recover both intrinsic and modulatory causal interactions reliably. For the HRF delay of 2500 ms, there is a small drop in sensitivity for latent signal delays of 800 and 1000 ms.

HRF delays oppose the delays in latent signals
This is the most difficult situation for any method because the HRF delays confound the causal interactions at the latent signal level. Fig. S2(B) shows the performance of MDS-VB when the HRF in node 2 peaks 500 ms before the HRF in node 1 while node 1 drives node 2 at the latent signal level. The performance of MDS-VB improved with increasing latent signal delay. Fig. S2(C) shows the performance of MDS-VB for an HRF delay of 2500 ms. Although the sensitivities for latent signal delays from 200 to 600 ms are higher (left panel), they are accompanied by more false positives (middle panel) and therefore poorer accuracies (right panel). The performance of MDS-VB in recovering causal interactions improved for latent signal delays of 800 and 1000 ms.

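The down-sampling scenarios above can be sketched as follows. This is our own simplified stand-in (a single-gamma HRF, circular shifts and a short signal for brevity), not the exact pipeline of Deshpande et al. (2009):

```python
import numpy as np

def gamma_hrf(dt=0.001, length=32.0):
    """Crude single-gamma HRF on a `dt`-second grid (a simplified
    stand-in for the canonical HRF; the shape is illustrative only)."""
    t = np.arange(0.0, length, dt)
    h = t ** 5 * np.exp(-t)    # peaks near t = 5 s
    return h / h.max()

def simulate_pair(dn_ms=400, dh_ms=0, T_ms=20_000, tr_s=2.0, seed=0):
    """Node 1 drives node 2 with neuronal delay dn_ms; node 2's HRF is
    shifted by dh_ms. Returns both fMRI series sampled every TR.
    Circular shifts are used for brevity (wrap-around is negligible)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(T_ms)                    # 1 ms latent signal
    x2 = np.roll(x1, dn_ms) + 0.1 * rng.standard_normal(T_ms)
    h = gamma_hrf()
    y1 = np.convolve(x1, h)[:T_ms]
    y2 = np.convolve(x2, np.roll(h, dh_ms))[:T_ms]    # delayed HRF, node 2
    step = int(tr_s * 1000)                           # 1 ms grid -> TR grid
    return y1[::step], y2[::step]
```

Sweeping dn_ms over {0, 200, ..., 1000} and dh_ms over {0, ±500, ±2500} reproduces, in spirit, the four scenarios examined above.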

Fig. 8. (A) Sensitivity, false positive rate (FPR) and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and
10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA
for all 3 networks and SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. Sample size T = 500 time points.

Discussion

We have developed a novel dynamical systems method to model intrinsic and modulatory interactions in fMRI data. MDS uses a vector autoregressive state-space model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. Our proposed MDS method overcomes key limitations of commonly used methods for estimating causal relations from fMRI data. Critically, causal interactions in MDS are modeled at the level of latent signals, rather than at the level of the observed BOLD-fMRI signals. Our simulations clearly demonstrate that this has the added effect of eliminating the confounding effects of regional variability in the HRF. The parameters and latent variables of the state-space model were estimated using two different methods. In the MDS-MLE method, the statistical significance of the parameters of the state equation, which represent the causal interactions between multiple brain nodes, was tested using a bootstrap method. In the MDS-VB method, we used non-informative priors to facilitate automatic relevance detection. We first discuss findings from our simulations, and show that MDS-VB provides robust and accurate solutions even at low SNRs and with smaller numbers of observed samples (time points). We then contrast the performance of MDS with the widely used GCA method. In this context, we highlight instances where GCA works reasonably well and where it fails. Finally, we discuss several important conceptual issues concerning the investigation of dynamic causal interactions in fMRI, contrasting MDS with other recently developed methods.

Performance of MDS on simulated data sets—contrasting MLE and VB approaches

In the following sections, we evaluate and discuss the performance of MDS under various scenarios. Importantly, we demonstrate, for the first time, that VB approaches provide better estimates of the model parameters than MLE-based approaches. We investigated the performance of MDS-MLE and MDS-VB on simulated data sets generated at SNRs of 0, 5 and 10 dB for network structures of sizes 2, 3 and 5 and time samples of 200, 300 and 500. We simulated regional HRF variations in such a way that the hemodynamic response delays were in the opposite direction to the delays in the latent signals (Fig. 4). HRF delays could therefore influence the estimation of causal interactions when applied directly on the observed BOLD-fMRI


Fig. 9. (A) Sensitivity, FPR and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-
MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networks
and SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 300 time points.

signals. This makes the problem of estimating causal interactions particularly challenging and provides novel insights into the strengths and weaknesses of the approaches used here.

The performance of MDS was found to be robust when tested under various simulated conditions. Specifically, MDS was able to reliably recover both intrinsic and modulatory causal interactions from the simulated data sets, and its performance was found to be superior to conventional approaches such as GCA. Among the MDS methods, the performance of MDS-VB was found to be superior to MDS-MLE with respect to performance metrics such as sensitivity, false positive rate and accuracy in identifying intrinsic and modulatory causal interactions (Figs. 7–10). MDS-VB showed significantly improved performance over MDS-MLE, especially under adverse conditions such as low SNR, large network size and a small number of observed samples (Fig. 10C).

The superior performance of MDS-VB can be attributed to the regularization imposed by the priors in this method. The priors not only regularized the solution but also helped in achieving sparse solutions. By using sparsity-promoting priors, the weights corresponding to insignificant links are driven towards zero, enabling automatic relevance detection (Tipping, 2001). This approach is not only useful for regularizing solutions when the number of unknown parameters is high, but also for providing sparse and interpretable solutions. This feature of VB can be especially important in analyzing networks with a large number of nodes, an aspect often overlooked in most analyses of causality in complex networks.

Another advantage of Bayesian analysis lies in computing the statistical significance of the network connections estimated by the MDS methods. In the MLE approach, we need to resort to a bootstrap procedure, which can be computationally expensive. MDS-VB, on the other hand, provides posterior probabilities for each model parameter, as opposed to the point estimates of MLE, which can be used to compute their statistical significance. From a computational perspective, MDS-VB is several orders of magnitude faster than MDS-MLE because it does not require nonparametric tests for statistical significance testing. Taken together, these findings suggest that MDS-VB is a superior and more powerful method than MDS-MLE.

Comparison with GCA

We demonstrated the importance of modeling the influence of both external and modulatory stimuli for estimating causal networks by applying GCA to a five-node network. On this data set, GCA failed to detect both the modulatory and intrinsic connections


Fig. 10. (A) Sensitivity, FPR and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-
MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networks
and SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 200 time points.

between nodes 1 and 2 (Fig. 7C). As mentioned earlier, GCA missed this connection because the network has both intrinsic and modulatory connections between these nodes but with weights of opposite signs. Therefore, in GCA the net strength of this connection is very small and it did not survive a conservative test of statistical significance. Our MDS methods, on the other hand, uncovered both of these connections. This phenomenon is most obvious in our simulations of the 2-node network, wherein GCA could not find causal interactions between the nodes (Figs. 8A, 9A and 10A). In this network, too, both the intrinsic and modulatory connections have weights with opposite signs. These results demonstrate the importance of explicitly modeling the influence of external and modulatory stimuli. Overall, the performance of GCA, when applied to the data sets generated at various SNRs, networks and numbers of observations, was found to be inferior to both MDS methods. This was true with respect to both sensitivity and accuracy in identifying causal interactions between multiple brain nodes (Figs. 8–10). Compared to MDS, the performance of GCA drops significantly at lower SNRs. These results suggest that MDS is more robust against observation noise than GCA. Our simulations therefore suggest that MDS-VB outperforms GCA for networks consisting of fewer than 6 nodes. More extensive simulations, however, are needed to compare the performance of MDS with GCA for larger networks.

Table 2
Relative performance of MDS and GCA in the absence of modulatory effects.

Method      Sensitivity     FPR     Accuracy
MDS-VB      0.98            0.02    0.98
MDS-MLE     0.92            0.03    0.96
GCA         0.9             0       0.98

Conventional GCA methods do not take into account dynamic changes in modulatory inputs and their effect on context-dependent causal interactions. In order to compare GCA more directly with MDS, we examined causal interactions in the absence of modulatory influences. As expected, in this case, the performance of GCA was comparable to that of MDS. Together, these findings suggest that GCA can accurately recover causal interactions in the absence of modulatory effects. Although newer dynamic GCA methods have been proposed, they appear to be designed more for improving the estimation of causal interactions than for examining context-dependent dynamic changes in causal interactions (Havlicek et al., 2010; Hemmelmann et al., 2009; Hesse et al., 2003; Sato et al., 2006).
Hemmelmann et al., 2009; Hesse et al., 2003; Sato et al., 2006). 748
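For orientation, the pairwise test at the heart of GCA-style analyses can be sketched as a lag-regression F-test. This is our own minimal bivariate illustration, not the multivariate toolbox of Seth (2010):

```python
import numpy as np

def granger_f(x, y, p=1):
    """F-statistic for 'x Granger-causes y' at lag order p: compare the
    residual sum of squares of y regressed on its own p lags (restricted)
    vs. its own lags plus p lags of x (full)."""
    T = len(y)
    Y = y[p:]
    own = np.column_stack([y[p - k - 1:T - k - 1] for k in range(p)])
    full = np.column_stack([own] + [x[p - k - 1:T - k - 1] for k in range(p)])
    def rss(Z):
        Z = np.column_stack([np.ones(len(Y)), Z])   # add intercept
        beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        r = Y - Z @ beta
        return r @ r
    rss_r, rss_f = rss(own), rss(full)
    df1, df2 = p, len(Y) - 2 * p - 1
    return ((rss_r - rss_f) / df1) / (rss_f / df2)
```

A large F relative to the F(p, df2) null distribution indicates that past values of x improve the prediction of y. Note that such a test fuses intrinsic and modulatory influences into a single net effect, which is precisely the limitation discussed above.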


Further simulation studies are needed to assess how well dynamic GCA can estimate context-specific modulatory effects.

We next contrast our findings using GCA and MDS in the context of the equivalence of ARIMA and structural time series models. GCA is based on the autoregressive integrated moving average (ARIMA) models proposed by Box and Jenkins, whereas MDS is a structural time series model (Box et al., 1994). In the econometrics literature, it is well known that linear structural time series models have equivalent ARIMA model representations (Box et al., 1994). This equivalence has been under-appreciated in the neuroimaging literature, as demonstrated by the recent discussion regarding the relative merits of GCA and DCM (Friston, 2009a,b; Roebroeck et al., 2009). Our detailed simulations suggest that, under certain conditions, GCA is able to recover much of the causal network structure in spite of the presence of HRF delay confounds. This is most clearly illustrated by the simulations shown in Fig. 7C, where we found that GCA could recover the network structure except for the intrinsic/modulatory connection from node 1 to 2. Our simulations also suggest that GCA may not be able to uncover causal connections when there is a conflict between intrinsic and modulatory connections (Figs. 7C, 8–10), but in other cases it is able to recover the underlying networks. In estimating the causal interactions, the model order estimated for GCA using the Akaike information criterion (AIC) was more than 3. Note that in our simulations, the causal interactions at the latent signal level were generated using a VAR model of order 1 (Eq. (1)). It is plausible that in GCA the higher model order is used to compensate for variations in HRF delay and experimental effects such as context-specific modulatory connections (Deshpande et al., 2009). Our simulations suggest that this is indeed the case and that optimal model order selection in GCA results in improved estimation of causal interactions between nodes. Nevertheless, structural time series based models like MDS and DCM can provide a better interpretation of network structure since they can distinguish between intrinsic and context-specific modulatory causal interactions in the latent signals.

Effects of down-sampling on MDS performance

In most fMRI studies, data are typically acquired at sampling intervals of about 2 s (i.e., TR = 2 s). However, dynamical interactions between brain regions occur at faster time scales of 10–100 ms. To examine the effects of down-sampling fMRI data on the performance of MDS, we first simulated interactions between nodes at a sampling rate of 1 kHz and then re-sampled the time series to 0.5 Hz after convolving with region-specific HRFs. MDS-VB was then applied to these data sets to estimate causal interactions between nodes. We also examined the influence of HRF delays on the estimation of causal interactions under four scenarios (Figs. S1 and S2), similar to the strategy used by Deshpande and colleagues (Deshpande et al., 2009) to study the effect of HRF variability on GCA. In the first scenario, there were no causal interactions between nodes but the HRFs were delayed between the nodes. In this case, MDS-VB performed accurately and did not infer any false causal interactions. This shows that MDS-VB can model and remove the effects of HRF variation while estimating causal interactions at the latent signal level. In the second scenario, we introduced causal interactions between nodes, but without HRF variations. MDS reliably estimated causal interactions for various delays in latent signals (Fig. S1A). In the third scenario, we introduced causal interactions between

Comparison of MDS with other approaches

As noted above, like GCA, MDS can be used to estimate causal interactions between a large number of brain nodes. Unlike GCA, however, causal interactions are estimated on the underlying latent signals while simultaneously accounting for regional variations in the HRF. Furthermore, unlike GCA, MDS can differentiate between intrinsic and stimulus-induced modulatory interactions. Like DCM, MDS takes into account regional variations in the HRF while estimating the causal interactions between brain regions. And like DCM, MDS also explicitly models external and modulatory inputs, allowing us to simultaneously estimate intrinsic and modulatory causal interactions between brain regions. Unlike DCM, however, MDS does not require the investigator to test multiple models and choose the one with the highest model evidence. This overcomes an important limitation of DCM: as the number of brain regions of interest increases, an exponentially large number of models needs to be examined; as a result, the computational burden of evaluating these models and identifying the appropriate model can become prohibitively high. MDS overcomes such problems and, as our study illustrates, incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.

Both DCM and MDS are state-space models, but DCM uses a deterministic state model (although a stochastic version has been developed recently (Daunizeau et al., 2009)) whereas MDS employs a stochastic model. Modeling latent interactions as a stochastic process is important for taking into account intrinsic variations in latent signals that are not induced by experimental stimuli. Another important difference is that MDS uses empirical basis functions to model variations in the HRF whereas DCM uses the biophysical Balloon model (Friston et al., 2003). Since the Balloon model is nonlinear, several approximations are required to solve it. In contrast, empirical HRF basis functions allow MDS to use a linear dynamical systems framework. The relative accuracy of these approaches is currently not known.

One important advantage of MDS is that it does not assume that the fMRI time series is stationary, unlike methods based on vector autoregressive modeling such as GCA. This is important because the dynamics of the latent signals can be altered significantly by experimental stimuli, leading to highly non-stationary signals. In GCA, the time series is tested for stationarity either by examining the autocorrelation of the time series or by investigating the presence of unit roots. If the time series is found to be nonstationary, one commonly used approach to remove the non-stationarity is to replace the original time series with the difference of the current and previous time points (Seth, 2010). A problem with such a manipulation is that it acts as a high-pass filter that can significantly distort the estimated causal interactions (Bressler and Seth, 2010).

Two methods based on a dynamical systems approach for modeling fMRI data have been proposed recently (Ge et al., 2009; Smith et al., 2009). Smith and colleagues used a switching linear dynamical systems model wherein modulatory inputs were treated as random variables (Smith et al., 2009). In contrast, MDS models them as deterministic quantities that are known for a given fMRI experiment. Modeling modulatory inputs as unknown random variables is useful for fMRI experiments in which the occurrence of modulatory inputs is unknown. However, for most fMRI studies modulatory inputs are known and modeling them as unknown quantities unnecessarily
804 nodes and also varied HRFs such that delays in latent signals and HRFs increases the number of parameters to be estimated. Also, the 869
805 were in the same direction. In this case, MDS was able to recover both switching dynamical systems model makes additional assumptions 870
806 intrinsic and modulatory causal interactions accurately (Figs. S1B, in computing the probability distributions of the state variables 871
807 S1C). In the fourth scenario, when delays in latent signal and HRF (Murphy, 1998). Further, Smith and colleagues used an MLE approach 872
808 opposed each other performance dropped significantly, just as with to estimate latent signals and model parameters. As we show in this 873
809 GCA (Deshpande et al., 2009). Further research is needed to examine study, compared to MLE, a VB approach yields more robust, 874
810 whether causal interactions under this scenario are inherently computationally efficient and accurate model estimation even when 875
811 unresolvable by MDS and other techniques such as DCM. the SNR and the number of time points are low, as is generally the case 876

Please cite this article as: Ryali, S., et al., Multivariate dynamical systems models for estimating causal interactions in fMRI, NeuroImage
(2010), doi:10.1016/j.neuroimage.2010.09.052
with fMRI data. Another difference is that Smith and colleagues combine the intrinsic and modulatory matrices, i.e., for every j-th modulatory input, the connection matrix (A_j = A + C_j) is estimated from the data. In MDS, we estimate the intrinsic matrix A and the modulatory matrices C_j separately, which explicitly dissociates intrinsic and modulatory effects on causal interactions between brain regions. Another difference lies in testing the statistical significance of the estimated causal connections. In MDS-MLE, we use a non-parametric approach, whereas in MDS-VB, posterior probabilities of the model parameters are used for testing the significance of the causal interactions. Finally, it should be noted that the performance of this method under varying SNRs and sample sizes is not known, since no simulations were performed.

Ge et al. (2009) used a different state-space approach to estimate causal interactions in the presence of external stimuli. They used vector autoregressive modeling for the state equation to model causal interactions among brain regions, whereas the observation model was nonlinear. They used an extended Kalman filtering approach to estimate the state variables and model parameters. This method was applied to local field potential data, so its usefulness for fMRI data is unclear. However, there are several differences between MDS and this approach. MDS has been developed explicitly for fMRI data to account for HRF variations in brain regions while simultaneously estimating causal interactions. In the work of Ge and colleagues, both state variables and unknown model parameters were treated as state variables, and extended Kalman filtering was used to obtain maximum likelihood estimates of these variables (Ge et al., 2009). In MDS, we have taken a different approach: state variables are separated from model parameters. This allowed us to use sparsification-promoting priors in the MDS-VB approach. Our results on simulated data suggest that MDS-VB outperforms MDS-MLE, especially at low SNRs and smaller numbers of temporal observations. Additionally, Ge and colleagues used Kalman filtering to estimate state variables, while in MDS we used Kalman smoothing to estimate the latent signals (Ge et al., 2009). In Kalman smoothing, both past and future data are used to estimate latent signals, whereas filtering uses only past values to estimate the current values. In general, smoothing provides better estimates of latent signals than the filtering approach (Bishop, 2006). Finally, although Ge and colleagues validated their approach on two three-node toy models, the performance of this method is not known under varying conditions such as different SNRs, network sizes and numbers of data samples.

Conclusions

The Bayesian multivariate dynamical system framework we have developed here provides a robust method for estimating and interpreting causal network interactions in simulated BOLD-fMRI data. Extensive computer simulations demonstrate that this MDS method is more accurate and robust than GCA, and among the MDS methods developed here MDS-VB exhibits superior performance over MDS-MLE. Critically, MDS estimates both intrinsic and experimentally induced modulatory interactions in the latent signals, rather than in the observed BOLD-fMRI signals. Unlike DCM, our proposed MDS framework does not require testing multiple models and may therefore be more useful for analyzing networks with a large number of nodes and connections. One limitation of this work is that our simulations were based on data sets created using the same model as the one used to estimate causal interactions. In this vein, preliminary analysis using simulations with delayed latent signals at millisecond temporal resolution suggests that MDS can accurately recover intrinsic and modulatory causal interactions in the presence of confounding delays in HRF. Future studies will examine the performance of MDS using more realistic simulations in which causal influences are generated independently of any one particular model, as well as the application of MDS to real experimental fMRI data (Menon et al., in preparation).

Acknowledgments

This research was supported by grants from the National Institutes of Health (HD047520, HD059205, HD057610) and the National Science Foundation (BCS-0449927).

Appendix A

In this appendix, we provide detailed equations for estimating the model parameters and latent states of MDS using an expectation-maximization (EM) algorithm.

Solving MDS Using Maximum Likelihood Estimation

The state-space and observation Eqs. (1)–(3) can be expressed in standard state form so that the Kalman filtering and smoothing recursive equations can be used to estimate the probability distribution of the latent signals, which constitutes the E-step of our EM algorithm (Penny et al., 2005).

Let

x(t) = [s′(t) s′(t−1) … s′(t−L+1)]′   (A.1)

Eqs. (1)–(3) can then be written in terms of x(t) as

x(t) = G̃(t) x(t−1) + D̃ U(t) + w̃(t)   (A.2)

y(t) = B Φ̃ x(t) + e(t)   (A.3)

where

G(t) = A + Σ_{j=1}^{J} v_j(t) C_j,  G̃(t) = Ã + Σ_{j=1}^{J} v_j(t) C̃_j,
Ã = [ A , 0_{M(L−1)} ; Ψ , 0_M ],  C̃_j = [ C_j , 0_{M(L−1)} ; 0 , 0_M ]   (A.4)

Ψ is the M(L−2) × ML delay matrix that fills the lower rows of Ã. Similarly,

D̃ = [ D , 0_{M(L−1)} ; 0_{M(L−2)×ML} , 0_M ]   (A.5)

U(t) = [u′(t) 0_{1,M(L−1)}]′   (A.6)

w̃(t) = [w′(t) 0_{1,M(L−1)}]′   (A.7)

w̃(t) ∼ N(0, Q̃)   (A.8)

Q̃ = [ Q , 0_{M(L−1)} ; 0_{M(L−2)×ML} , 0_M ]   (A.9)

B = [ b_1 … 0 ; ⋮ ⋱ ⋮ ; 0 … b_M ]   (A.10)

Φ̃ = [ ⋯ ; ⋮ ⋱ ⋮ ; ⋯ ]   (A.11)

e(t) = [e_1(t) e_2(t) … e_M(t)]′   (A.12)

e(t) ∼ N(0, R)   (A.13)
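As a concrete illustration of the embedding in Eqs. (A.1)–(A.9), the sketch below builds G̃(t) from A and the C_j and propagates the embedded state x(t). This is a hedged sketch, not the authors' code: all dimensions are toy values, and an identity shift block is used in the role of Ψ, which is an assumption where the printed layout of Ã is ambiguous.

```python
import numpy as np

# Toy construction of the embedded (companion-form) state matrices of
# Eqs. (A.1)-(A.9). All dimensions are illustrative; the identity shift
# block standing in for Psi is an assumption, not the paper's exact layout.

M, L, J = 2, 4, 1                        # nodes, embedding length, inputs
rng = np.random.default_rng(1)
A = 0.5 * np.eye(M)                      # intrinsic connection matrix
C = [0.1 * np.ones((M, M))]              # modulatory matrices C_j

def embed(top):
    """Place `top` (M x M) in the first block row of an ML x ML matrix;
    the lower rows shift s(t-1), ..., s(t-L+1) down by one block."""
    G = np.zeros((M * L, M * L))
    G[:M, :M] = top
    G[M:, :-M] = np.eye(M * (L - 1))
    return G

def G_tilde(v):
    """G~(t) = A~ + sum_j v_j(t) C~_j, cf. Eqs. (A.2) and (A.4)."""
    return embed(A + sum(v[j] * C[j] for j in range(J)))

x = np.zeros(M * L)                      # x(t) = [s(t); ...; s(t-L+1)]
for t in range(5):
    w = np.r_[0.05 * rng.standard_normal(M), np.zeros(M * (L - 1))]
    x = G_tilde([1.0]) @ x + w           # state recursion of Eq. (A.2)
print(x.shape)                           # (8,)
```

Only the first M entries of the noise vector are nonzero, mirroring the zero-padding of w̃(t) in Eq. (A.7).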

R = diag(σ_1², σ_2², …, σ_M²)   (A.14)

Let x(0) be the initial state, uncorrelated with the state noise w̃(t) and the observation noise e(t), and normally distributed with mean μ_o and covariance Σ_o. Eqs. (A.2) and (A.3) are now in standard linear state-space form. Therefore, Kalman filtering and smoothing recursions can be used to carry out the E-step.

E-step

In the E-step, the probability of the latent signal x(t), t = 1, 2, …, T given the observed data y(t), t = 1, 2, …, T is computed. Since the state noise, the observation noise and the initial state x(0) are assumed to be Gaussian, the latent signals x(t), t = 1, 2, …, T are also normally distributed, and their means and covariances can be estimated by forward (filtering) and backward (smoothing) recursion steps in the Kalman estimation framework. In the filtering step, the mean and covariance of x(t) are computed given the observations y(τ), τ = 1, 2, …, t. In the smoothing step, these means and covariances are updated so that they represent the mean and covariance at each time t given all the data y(t), t = 1, 2, …, T.

Kalman filtering

In the filtering step, the goal is to compute the following posterior distribution of x(t) given the observations y(τ), τ = 1, 2, …, t and the parameters of the model:

p(x(t)|y(1), y(2), …, y(t)) = N(x_t^t, Σ_t^t)   (A.15)

The mean and covariance of this distribution can be computed using the following forward recursive steps:

x_t^{t−1} = G̃(t) x_{t−1}^{t−1} + D̃ U(t)   (A.16)

Σ_t^{t−1} = G̃(t) Σ_{t−1}^{t−1} G̃′(t) + Q̃   (A.17)

K(t) = Σ_t^{t−1} E′ (E Σ_t^{t−1} E′ + R)^{−1}   (A.18)

where E = B Φ̃,

x_t^t = x_t^{t−1} + K(t) (y(t) − E x_t^{t−1})   (A.19)

Σ_t^t = Σ_t^{t−1} − K(t) E Σ_t^{t−1}   (A.20)

The above recursion is initialized using x_1^0 = μ_o and Σ_1^0 = Σ_o.

Kalman Smoothing

In the smoothing step, the goal is to compute the posterior distribution of x(t) given all the observations y(τ), τ = 1, 2, …, T and the parameters:

p(x(t)|y(1), y(2), …, y(T)) = N(x_t^T, Σ_t^T)   (A.21)

The mean x_t^T and covariance Σ_t^T at each time t can be estimated using the following backward recursions:

x_t^T = x_t^t + J_t (x_{t+1}^T − G̃(t+1) x_t^t)   (A.22)

Σ_t^T = Σ_t^t + J_t (Σ_{t+1}^T − Σ_{t+1}^t) J_t′   (A.23)

where J_t is defined as

J_t = Σ_t^t G̃′(t+1) (Σ_{t+1}^t)^{−1}   (A.24)

The backward recursions are initialized by noting that, for t = T, x_T^T and Σ_T^T are obtained directly from Eqs. (A.19) and (A.20), respectively.

M-step

In the M-step, the goal is to find the unknown parameters Θ = {A, C_1, …, C_J, D, Q, B, R} given the data and the current posterior distributions of x(t), t = 1, 2, …, T. The parameters Θ are estimated by maximizing the expected complete log-likelihood of the data, and the resulting estimates are therefore called maximum likelihood estimates.

The complete log-likelihood of the data is given by

L = log p(x(1), x(2), …, x(T), y(1), y(2), …, y(T)|Θ) = log p(x(1)|Θ) + Σ_{t=2}^{T} log p(x(t)|x(t−1), Θ) + Σ_{t=1}^{T} log p(y(t)|x(t), Θ)   (A.25)

Estimation of A, Cj's and D

The part of the complete log-likelihood that depends on the parameters A, C_j and D is given by

L(A, C_1, …, C_J, D) ∝ −0.5 Σ_{t=2}^{T} [x_s(t) − (A + Σ_{j=1}^{J} v_j(t) C_j) F x(t−1) − d u(t)]′ Q^{−1} [x_s(t) − (A + Σ_{j=1}^{J} v_j(t) C_j) F x(t−1) − d u(t)]   (A.26)

where F = [I_M 0_{M(L−1)}], d = diag(D) and x_s(t) = x(1:M, t) = s(t).

Taking expectations of Eq. (A.26) with respect to p(x(t)|y(1), y(2), …, y(T)) and then differentiating the resulting expression with respect to A, C_j and d results in the following coupled linear equations:

[A C_1 … C_J d] [ Σ_{t=2}^{T} F(t) P(t−1) F(t)′ , Σ_{t=2}^{T} F(t) x_{t−1}^T u(t) ; Σ_{t=2}^{T} u(t) (x_{t−1}^T)′ F(t)′ , Σ_{t=2}^{T} u(t)² ] = [ Σ_{t=2}^{T} P_s(t, t−1) F(t)′ , Σ_{t=2}^{T} m_s(t) u(t) ]   (A.27)

where

P(t) = Σ_t^T + x_t^T (x_t^T)′   (A.28)

m_s(t) = x_t^T(1:M)   (A.29)

F(t) = [I_M v_1(t) I_M … v_J(t) I_M]′ F   (A.30)

(m_s(t) is the first M elements of x_t^T), and

P(t, t−1) = J_{t−1} (Σ_t^T + x_t^T (x_t^T)′)   (A.31)

P_s(t, t−1) is the first M × M submatrix of P(t, t−1).
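For intuition, the forward and backward recursions of Eqs. (A.15)–(A.24) can be sketched for a generic linear-Gaussian state-space model. This is an illustrative implementation under simplifying assumptions (time-invariant G̃, D̃ and E, and a toy one-dimensional system at the end), not the authors' code.

```python
import numpy as np

# Kalman filter and RTS smoother for a generic time-invariant
# linear-Gaussian model, following Eqs. (A.15)-(A.24); K and J
# mirror the gains in the appendix. Illustrative sketch only.

def kalman_smooth(y, G, D, u, E, Q, R, mu0, S0):
    T, p = y.shape[0], mu0.size
    xf = np.zeros((T, p)); Pf = np.zeros((T, p, p))   # filtered moments
    xp = np.zeros((T, p)); Pp = np.zeros((T, p, p))   # predicted moments
    m, P = mu0, S0
    for t in range(T):                                # forward pass
        xp[t] = G @ m + D @ u[t]                      # Eq. (A.16)
        Pp[t] = G @ P @ G.T + Q                       # Eq. (A.17)
        K = Pp[t] @ E.T @ np.linalg.inv(E @ Pp[t] @ E.T + R)   # (A.18)
        m = xp[t] + K @ (y[t] - E @ xp[t])            # Eq. (A.19)
        P = Pp[t] - K @ E @ Pp[t]                     # Eq. (A.20)
        xf[t], Pf[t] = m, P
    xs, Ps = xf.copy(), Pf.copy()
    for t in range(T - 2, -1, -1):                    # backward pass
        Jt = Pf[t] @ G.T @ np.linalg.inv(Pp[t + 1])   # Eq. (A.24)
        xs[t] = xf[t] + Jt @ (xs[t + 1] - xp[t + 1])  # Eq. (A.22)
        Ps[t] = Pf[t] + Jt @ (Ps[t + 1] - Pp[t + 1]) @ Jt.T    # (A.23)
    return xs, Ps

# toy 1-D random walk observed in noise
rng = np.random.default_rng(2)
T = 50
x_true = np.cumsum(0.1 * rng.standard_normal(T))
y = (x_true + 0.2 * rng.standard_normal(T)).reshape(-1, 1)
xs, Ps = kalman_smooth(y, np.eye(1), np.zeros((1, 1)), np.zeros((T, 1)),
                       np.eye(1), 0.01 * np.eye(1), 0.04 * np.eye(1),
                       np.zeros(1), np.eye(1))
print(xs.shape)   # (50, 1)
```

Because the smoother conditions on both past and future observations, its posterior variances Ps are no larger than the filtered ones, which is the advantage over pure filtering noted in the comparison with Ge et al. above.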
Estimation of Q

Taking expectations of Eq. (A.26) with respect to p(x(t)|y(1), y(2), …, y(T)) and then differentiating the resulting expression with respect to Q, the estimate of Q is given by

Q̂ = (1/(T−1)) Σ_{t=2}^{T} [ P_s(t) − P_s(t, t−1) F′ G(t)′ − m_s(t) u(t) d′ − G(t) F P_s(t, t−1)′ + G(t) F P(t−1) F′ G(t)′ + G(t) F x_{t−1}^T u(t) d′ − d u(t) m_s(t)′ + d u(t) (x_{t−1}^T)′ F′ G(t)′ + d u(t)² d′ ]   (A.32)

where P_s(t) is the first M × M submatrix of P(t). Note that the estimated values of A (Â), C_j (Ĉ_j) and d (d̂) obtained by solving Eq. (A.27) are used in Eq. (A.32) in place of A, C_j and d.

Estimation of B

Each row vector b_m, m = 1, 2, …, M can be estimated (we assume the noise covariance matrix R to be diagonal) by maximizing the conditional expectation of the complete log-likelihood given in Eq. (A.25). Taking the derivative of the conditional expectation and equating it to zero, the estimate of b_m is given by

b̂_m′ = (Φ Σ_{t=1}^{T} P_m(t) Φ′)^{−1} Φ Σ_{t=1}^{T} y_m(t) x_t^T(m)   (A.33)

where P_m(t) = E(s_m(t) s_m′(t)|y(1), y(2), …, y(T)), which can easily be obtained from P(t); y_m(t) and x_t^T(m) are the m-th elements of the vectors y(t) and x_t^T, respectively.

Estimation of R

The diagonal observation covariance matrix R can be estimated by maximizing the conditional expectation of the complete log-likelihood given in Eq. (A.25). The estimates of the diagonal components of R are given by

R̂(m, m) = (1/T) Σ_{t=1}^{T} [ y_m(t)² − 2 b̂_m′ Φ y_m(t) x_t^T(m) + trace(b̂_m′ Φ P_m(t) Φ′ b̂_m) ],  m = 1, 2, …, M   (A.34)

Estimation of μo and Σo

The maximum likelihood estimates of the initial state mean μ_o and covariance Σ_o are given by

μ̂_o = x_1^T   (A.35)

Σ̂_o = Σ_1^T   (A.36)

The above E and M steps are repeated until the change in the log-likelihood of the data between two iterations falls below a specified threshold.

Appendix B

Solving MDS using VB framework

In VB, the goal is to find the posterior distributions of the latent variables q_S(S|Y) and of the parameters q_Θ(Θ|Y) by maximizing the lower bound on the log evidence L(q) given in Eq. (5).

VB-E-step

In this step, the posterior distributions of the latent variables q_S(S|Y) are estimated given the current posterior probability of the model parameters q_Θ(Θ|Y). As in the MLE approach, we compute the posteriors of the embedded latent signals x(t), from which the posterior of s(t) can be obtained. The distribution over these latent variables is obtained using a sequential algorithm similar to the Kalman smoothing used in the E-step of the MLE approach. In the VB version of Kalman smoothing, the point estimates of the parameters are replaced by expectations of the type E(ZWZ′), where Z is some parameter of the model and W a matrix. Although these expectations are straightforward to compute, they are computationally expensive for higher-order models. We therefore use the approximation E(AWA′) = E(A)WE(A′), which gives qualitatively similar results and is computationally efficient. This approach was also taken by Cassidy and Penny (2002). As a result, the VB-E step is the same as the E-step in the MLE approach, and the mean and covariance of x(t) are given by x_t^T and Σ_t^T.

VB-M step

In this step, the posterior distributions of the model parameters q_Θ(Θ|Y) are estimated given the current posterior probability of the latent variables q_S(S|Y). Using the probabilistic graphical model in Fig. 1, one can show that the joint posterior distribution of the parameters q_Θ(Θ|Y) further factorizes as

q_Θ(Θ|Y) = q(A, C_1, …, C_J, D, Q) q(B, R)   (B.1)

In this work, we also assume the state and observation noise covariance matrices (Q and R) to be diagonal. Therefore, the distributions of the elements in the rows of A, C_1, …, C_J, D and B can be inferred separately. Consider the state equation for the m-th node:

s_m(t) = (a_m + Σ_{j=1}^{J} v_j(t) c_{j,m}) s(t−1) + d_m u(t) + w_m(t),  w_m(t) ∼ N(0, β_m^{−1})   (B.2)

where a_m and c_{j,m} are the m-th rows of A and C_j, respectively, and β_m^{−1} = Q(m, m). In terms of the embedded signal x(t), the above equation can be written as

s_m(t) = θ_m′ [F(t) x(t−1); u(t)] + w_m(t)   (B.3)

where θ_m′ = [a_m, c_{1,m}, …, c_{J,m}, d_m] and F(t) = [I_M v_1(t) I_M … v_J(t) I_M]′ F. We assume the following Gaussian-Gamma conjugate priors for θ_m and β_m:

p(θ_m, β_m|α) ∼ N(0, (β_m A_α)^{−1}) Ga(a_o, b_o)   (B.4)

where α = [α_1, α_2, …, α_{2M+1}] are the hyperpriors on each element of θ_m and A_α = diag(α).

Let the prior on α be

p(α) = Π_{i=1}^{2M+1} Ga(c_o, d_o)   (B.5)

Therefore, by applying Eq. (9), the joint posterior for θ_m and β_m is given by

q(θ_m, β_m|Y) = N(θ_m, β_m^{−1} Σ_m) Ga(a_{m,N}, b_{m,N})   (B.6)
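The conjugate update in Eq. (B.6), combined with the hyperparameter re-estimation of Eqs. (B.12), (B.13) and (B.24), amounts to an automatic-relevance-determination (ARD) regression in which irrelevant connection weights are shrunk toward zero. The sketch below illustrates the idea on toy data under simplifying assumptions that are ours, not the paper's: the noise precision is held fixed at its true value, and a generic design matrix Z stands in for the regressors [F(t)x(t−1); u(t)] of Eq. (B.3).

```python
import numpy as np

# Toy ARD regression sketch, cf. Eqs. (B.6)-(B.8), (B.12), (B.13), (B.24).
# Assumptions (not the paper's implementation): known noise precision,
# generic regressors Z, weight-posterior updated jointly with the
# hyperpriors alpha in a simple fixed-point loop.

rng = np.random.default_rng(3)
T, P = 200, 6
Z = rng.standard_normal((T, P))
theta_true = np.array([1.0, 0.0, -0.7, 0.0, 0.0, 0.4])   # sparse weights
s = Z @ theta_true + 0.1 * rng.standard_normal(T)

c0 = d0 = 1e-3                       # vague Gamma hyperpriors (cf. 10^-3)
beta = 1.0 / 0.01                    # noise precision, assumed known here
alpha = np.ones(P)                   # initial hyperpriors alpha_i

for _ in range(50):
    # posterior over weights given current alpha, cf. Eqs. (B.6)-(B.8)
    Sigma = np.linalg.inv(beta * Z.T @ Z + np.diag(alpha))
    theta = beta * Sigma @ Z.T @ s
    # hyperprior re-estimation, cf. Eqs. (B.12), (B.13), (B.24)
    c_N = c0 + 0.5
    d_N = d0 + 0.5 * (theta ** 2 + np.diag(Sigma))
    alpha = c_N / d_N                # large alpha_i prunes weight i
print(np.round(theta, 2))
```

The alternation between the weight posterior and the α update is the sparsification-promoting mechanism referred to in the comparison with Ge et al.: elements of θ with no support in the data acquire large α_i and are driven toward zero.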
where

Σ_m^{−1} = [ Σ_{t=2}^{T} F(t) P(t−1) F(t)′ , Σ_{t=2}^{T} F(t) x_{t−1}^T u(t) ; Σ_{t=2}^{T} u(t) (x_{t−1}^T)′ F(t)′ , Σ_{t=2}^{T} u(t)² ] + A_α   (B.7)

θ_m = Σ_m [ Σ_{t=2}^{T} F(t) E(s_m(t) x(t−1)) ; Σ_{t=2}^{T} u(t) E(s_m(t)) ]   (B.8)

a_{m,N} = a_o + (T + 2M)/2   (B.9)

b_{m,N} = b_o + 0.5 [ Σ_{t=2}^{T} E(s_m(t)²) − θ_m′ Σ_m^{−1} θ_m ]   (B.10)

The posterior for the hyperparameters α is given by

q(α|Y) = Π_{i=1}^{2M+1} Ga(c_N, d_{Ni})   (B.11)

where

c_N = c_o + 1/2   (B.12)

d_{Ni} = d_o + (1/2) [ θ_m(i)² a_{m,N}/b_{m,N} + Σ_m(i, i) ]   (B.13)

The posteriors for θ_m, β_m and α are estimated for each m = 1, 2, …, M, from which the posteriors for A, C_1, …, C_J, D and Q are computed.

Similarly, the posterior distribution for the model parameters in the output equation is computed. Since R is assumed to be diagonal, the observation equation for the m-th node is given by

y_m(t) = b_m Φ x_m(t) + e_m(t),  e_m(t) ∼ N(0, λ_m^{−1})   (B.14)

where λ_m^{−1} = R(m, m). Again assuming Gaussian-Gamma conjugate priors for b_m and λ_m,

p(b_m, λ_m|α) ∼ N(0, (λ_m A_α)^{−1}) Ga(a_o, b_o)   (B.15)

where α = [α_1, α_2, …, α_P] are the hyperpriors on each element of b_m and A_α = diag(α).

Let the prior on α be

p(α) = Π_{i=1}^{P} Ga(c_o, d_o)   (B.16)

By applying Eq. (9), the joint posterior for b_m and λ_m is given by

q(b_m, λ_m|Y) = N(b_m, λ_m^{−1} V_m) Ga(a_{m,N}, b_{m,N})   (B.17)

where

V_m^{−1} = Φ Σ_{t=1}^{T} P_m(t) Φ′ + A_α   (B.18)

b_m = V_m Φ Σ_{t=1}^{T} y_m(t) x_t^T(m)   (B.19)

a_{m,N} = a_o + (T + P − 1)/2   (B.20)

b_{m,N} = b_o + 0.5 [ Σ_{t=1}^{T} E(y_m(t)²) − b_m′ V_m^{−1} b_m ]   (B.21)

c_N = c_o + 1/2   (B.22)

d_{Ni} = d_o + (1/2) [ b_m(i)² a_{m,N}/b_{m,N} + V_m(i, i) ]   (B.23)

A_α(i, i) = c_N / d_{Ni}   (B.24)

The posteriors for b_m, λ_m and α are estimated for each m = 1, 2, …, M. In this work, we set the hyperparameters a_o, b_o, c_o and d_o to 10^{−3}. The VB-E and VB-M steps are repeated until convergence.

Appendix C

Initialization

The above iterative procedure (E and M steps) needs to be initialized. In this work, we obtain initial values of the latent signals ŝ(t) by estimating them at each node using the Wiener deconvolution method of Glover (1999), wherein the canonical HRF is used for the deconvolution step. We then estimate initial values of A, C, d and Q by solving Eq. (1) by least squares, assuming that the ŝ(t)'s are the true values. Similarly, the parameters B and R are estimated from Eq. (3) by a least squares approach. The EM algorithm is then run from these initial values until the required convergence is obtained.

Appendix D. Supplementary data

Supplementary data to this article can be found online at doi:10.1016/j.neuroimage.2010.09.052.

References

Abler, B., Roebroeck, A., Goebel, R., Hose, A., Schonfeldt-Lecuona, C., Hole, G., Walter, H., 2006. Investigating directed influences between activated brain areas in a motor-response task using fMRI. Magn. Reson. Imaging 24, 181–185.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer.
Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 1994. Time Series Analysis: Forecasting and Control. Pearson Education.
Bressler, S.L., Menon, V., 2010. Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci.
Bressler, S.L., Seth, A.K., 2010. Wiener–Granger causality: a well established methodology. Neuroimage.
Cassidy, M.J., Penny, W.D., 2002. Bayesian nonstationary autoregressive models for biomedical signal analysis. IEEE Trans. Biomed. Eng. 49, 1142–1152.
Daunizeau, J., Friston, K.J., Kiebel, S.J., 2009. Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Physica D 238, 2089–2118.
Deshpande, G., Hu, X., Stilla, R., Sathian, K., 2008. Effective connectivity during haptic perception: a study using Granger causality analysis of functional magnetic resonance imaging data. Neuroimage 40, 1807–1814.
Deshpande, G., Sathian, K., Hu, X., 2009. Effect of hemodynamic variability on Granger causality analysis of fMRI. Neuroimage.
Friston, K., 2009a. Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS Biol. 7, e33.
Friston, K., 2009b. Dynamic causal modeling and Granger causality comments on: The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. Neuroimage.
Friston, K.J., 2009c. Modalities, modes, and models in functional neuroimaging. Science 326, 399–403.
Friston, K.J., Harrison, L., Penny, W., 2003. Dynamic causal modelling. Neuroimage 19, 1273–1302.
Fuster, J.M., 2006. The cognit: a network model of cortical representation. Int. J. Psychophysiol. 60, 125–132.
Ge, T., Kendrick, K.M., Feng, J., 2009. A novel extended Granger Causal Model approach demonstrates brain hemispheric differences during face recognition learning. PLoS Comput. Biol. 5, e1000570.
Glover, G.H., 1999. Deconvolution of impulse response in event-related BOLD fMRI. Neuroimage 9, 416–429.
Goebel, R., Roebroeck, A., Kim, D.S., Formisano, E., 2003. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magn. Reson. Imaging 21, 1251–1261.
Guo, S., Wu, J., Ding, M., Feng, J., 2008. Uncovering interactions in the frequency domain. PLoS Comput. Biol. 4, e1000087.
Havlicek, M., Jan, J., Brazdil, M., Calhoun, V.D., 2010. Dynamic Granger causality based on Kalman filter for evaluation of functional network connectivity in fMRI data. Neuroimage.
Hemmelmann, D., Ungureanu, M., Hesse, W., Wustenberg, T., Reichenbach, J.R., Witte, O.W., Witte, H., Leistritz, L., 2009. Modelling and analysis of time-variant directed interrelations between brain regions based on BOLD-signals. Neuroimage.
Hesse, W., Moller, E., Arnold, M., Schack, B., 2003. The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. J. Neurosci. Methods 124, 27–44.
Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
Makni, S., Beckmann, C., Smith, S., Woolrich, M., 2008. Bayesian deconvolution of [corrected] fMRI data using bilinear dynamical systems. Neuroimage 42, 1381–1396.
Mechelli, A., Price, C.J., Noppeney, U., Friston, K.J., 2003. A dynamic causal modeling study on category effects: bottom-up or top-down mediation? J. Cogn. Neurosci. 15, 925–934.
Murphy, K.P., 1998. Switching Kalman Filters. Technical report, DEC/Compaq Cambridge Research Labs.
Passingham, R.E., Stephan, K.E., Kotter, R., 2002. The anatomical basis of functional localization in the cortex. Nat. Rev. Neurosci. 3, 606–616.
Penny, W., Ghahramani, Z., Friston, K., 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 983–993.
Prichard, D., Theiler, J., 1994. Generating surrogate data for time series with several simultaneously measured variables. Phys. Rev. Lett. 73, 951–954.
Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–285.
Rajapakse, J.C., Zhou, J., 2007. Learning effective brain connectivity with dynamic Bayesian networks. Neuroimage 37, 749–760.
Ramsey, J.D., Hanson, S.J., Hanson, C., Halchenko, Y.O., Poldrack, R.A., Glymour, C., 2009. Six problems for causal inference from fMRI. Neuroimage 49, 1545–1558.
Roebroeck, A., Formisano, E., Goebel, R., 2005. Mapping directed influence over the brain using Granger causality and fMRI. Neuroimage 25, 230–242.
Roebroeck, A., Formisano, E., Goebel, R., 2009. The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. Neuroimage.
Sato, J.R., Junior, E.A., Takahashi, D.Y., de Maria Felix, M., Brammer, M.J., Morettin, P.A., 2006. A method to produce evolving functional connectivity maps during the course of an fMRI experiment using wavelet-based time-varying Granger causality. Neuroimage 31, 187–196.
Seth, A.K., 2005. Causal connectivity of evolved neural networks during behavior. Network 16, 35–54.
Seth, A.K., 2010. A MATLAB toolbox for Granger causal connectivity analysis. J. Neurosci. Methods 186, 262–273.
Smith, J.F., Pillai, A., Chen, K., Horwitz, B., 2009. Identification and validation of effective connectivity networks in functional magnetic resonance imaging using switching linear dynamic systems. Neuroimage.
Sridharan, D., Levitin, D.J., Menon, V., 2008. A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proc. Natl. Acad. Sci. U. S. A. 105, 12569–12574.
Tipping, M., 2001. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244.
Valdes-Sosa, P.A., Sanchez-Bornot, J.M., Lage-Castellanos, A., Vega-Hernandez, M., Bosch-Bayard, J., Melie-Garcia, L., Canales-Rodriguez, E., 2005. Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 969–981.