The Effect of Machine Learning Regression Algorithms and Sample Size On Individualized Behavioral Prediction With Functional Connectivity Features
NeuroImage
journal homepage: www.elsevier.com/locate/neuroimage
ARTICLE INFO

Keywords:
Individualized prediction
Machine learning
Regression algorithm
Sample size
Functional magnetic resonance imaging (MRI)
Resting-state functional connectivity

ABSTRACT

Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) was extracted as prediction features. Twenty-five sample sizes (ranging from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in predictions using re-test fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations.
* Corresponding author. State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China.
E-mail address: [email protected] (G. Gong).
https://doi.org/10.1016/j.neuroimage.2018.06.001
Received 28 February 2018; Received in revised form 31 May 2018; Accepted 1 June 2018
Available online 2 June 2018
1053-8119/© 2018 Elsevier Inc. All rights reserved.
Z. Cui, G. Gong NeuroImage 178 (2018) 622–637
2005; Friston, 1994). Particularly, significant associations between this measure and behavior/cognition have been repeatedly observed (Dubois and Adolphs, 2016; Liu et al., 2017; Smith et al., 2015). Importantly, rsFC has been demonstrated as an effective feature for predicting the characteristics of individuals, such as biological age (Dosenbach et al., 2010), visual/verbal memory ability (Siegel et al., 2016), attention ability (Rosenberg et al., 2015), and intelligence quotient (Finn et al., 2015), using ML regression algorithms.

To date, neuroimaging studies have employed a series of different regression algorithms. The most frequently used algorithms include ordinary least squares (OLS) regression (Rosenberg et al., 2015; Shen et al., 2017), least absolute shrinkage and selection operator (LASSO) regression (Wager et al., 2013), ridge regression (Siegel et al., 2016), elastic-net regression (Cui et al., 2018), linear support vector regression (LSVR) (Ullman et al., 2014), and relevance vector regression (RVR) (Gong et al., 2014). These algorithms differ in how parameters are optimized, such as the form of the loss function and the regularization technique (Hastie et al., 2001; Schölkopf and Smola, 2002). Unsurprisingly, the prediction performances of these algorithms also differ in practice, depending on how well the implicit assumption of each algorithm holds true in the data. This raises an open question about which algorithms are favorable for an rsFC-feature-based prediction. To answer this, a systematic comparison between these algorithms is necessary, which can provide instructive information for the choice of regression algorithms for specific behavioral/cognitive prediction.

Along this line, there have been a few attempts to look at differences in prediction performance across ML regression algorithms. Two separate studies reported that RVR consistently outperforms SVR when predicting individual age (Franke et al., 2010) or clinical scores (Wang et al., 2010) with brain tissue volume as features. Another study by Chu and colleagues found that ridge regression using functional activation features performed slightly better than RVR on average for predicting various types of feature ratings during a virtual reality task (Chu et al., 2011). To date, however, a comprehensive assessment of performance differences between all these frequently used regression algorithms remains scarce, particularly a comparison of their ability to predict individual behavioral/cognitive abilities using rsFC-related features.

In addition to the choice of ML regression algorithm, another open question is what sample size is large enough for a robust behavioral/cognitive prediction. In the context of MRI-based subject classification/discrimination, a few studies found increasing classification accuracies with increasing sample size (Chu et al., 2012; Klöppel et al., 2008). A recent thorough review indicated that studies with small sample sizes tend to report a relatively high discriminative accuracy when classifying brain disease patients from controls, which is likely due to the overfitting issue related to the small sample size (Arbabshirani et al., 2017). However, these relevant assessments of the sample size effect are confined to classification algorithms, and it remains unexplored whether and how the prediction accuracies of ML regression algorithms are affected by sample size.

The present study aims to comprehensively compare rsFC feature-based prediction among 6 ML regression algorithms (i.e., OLS regression, LASSO regression, ridge regression, elastic-net regression, LSVR, and RVR) and further evaluate the effect of sample size on prediction accuracies, which should be able to inform future investigations applying rsFC features to specific behavioral/cognitive predictions at the individual level. We confined our analyses to these six regression algorithms because they are the most commonly used ones to date in the neuroimaging field (see Supplementary Table 1 for a summary). To evaluate the influence of feature dimension, we also included rsFC strength (rsFCS) as an rsFC-extracted lower-dimensional feature, which is simply defined as the sum of all linked rsFC values for each brain region and putatively can capture the global communication ability of brain regions (Beucke et al., 2013; Buckner et al., 2009; Zuo et al., 2012). Specifically, a resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used to compare the algorithms. Using whole-brain rsFC and rsFCS as features, the 6 regression algorithms were implemented to make specific behavioral predictions. A range of sample sizes from 20 to 700 was utilized, and the included subjects for each given sample size were randomly selected from the full HCP sample set. Using the HCP test-retest rs-fMRI dataset, we thoroughly assessed the test-retest reproducibility of how the algorithm and sample size influence behavioral predictions.

Materials and methods

Participants

The publicly available dataset from the HCP S900 release was used in the present study (Van Essen et al., 2012, 2013). Please refer to the study by Van Essen et al. (2013) for subject inclusion/exclusion criteria. Two rs-fMRI sessions were acquired over two days for each subject, denoted REST1 and REST2. As in previous studies (Liao et al., 2017; Zalesky et al., 2014), fMRI acquisition with left-to-right phase-encoding was used. In the originally released dataset, REST1 and REST2 data were available for a total of 873 and 838 subjects, respectively.

Two subjects were excluded because they had a large posterior cranial fossa arachnoid cyst, and 3 and 7 subjects were excluded from the REST1 and REST2 sessions, respectively, due to incomplete data acquisition (less than 1200 time points). In addition, 74 and 51 subjects were excluded from the REST1 and REST2 sessions, respectively, due to severe head motion (displacement > 3 mm, rotation > 3°). Finally, a total of 794 subjects (345 males; 22–35 years; Table 1) for the REST1 session and 778 subjects (343 males; 22–35 years) for the REST2 session were used in our predictive analyses, and their HCP IDs are provided in Supplementary Tables 2 and 3.

Behavioral/cognitive scores

The HCP dataset includes a battery of behavioral/cognitive tests (Barch et al., 2013). In the present study, the scores of 4 behavioral/cognitive tests were used as the prediction targets for individuals. The tests included one motor-related test (Grip Strength Dynamometry Test [GSDT]), two language-related tests (Oral Reading Recognition Test [ORRT] and Picture Vocabulary Test [PVT]), and one spatial orientation-related test (Variable Short Penn Line Orientation Test [VSPLOT]). The details of each test are described in Supplementary Table 4. All the tests were applied using the NIH Cognition Battery toolbox, and the raw scores for each test were further transformed into age-adjusted scores with a mean of 100 and a standard deviation (SD) of 15 using the NIH National Norms toolbox. Please see Slotkin et al. (2012) for testing and scoring details. Among the subjects whose imaging data qualified for inclusion, one subject lacked the GSDT score, and 5 subjects lacked the VSPLOT score.

In the present study, we chose the GSDT score as the main prediction score, as it provides the highest overall prediction accuracy relative to the other three behavioral/cognitive scores. The predictions of the other three scores were included in the validation results to evaluate the generalizability of our observed algorithm and sample size effects.

Table 1
The characteristics of the HCP S900 sample subjects in our study.

Characteristic                           REST1 (N = 794)   REST2 (N = 778)
Age (y, Mean (SD))                       28.79 (3.67)      28.76 (3.69)
Gender (Male, %)                         43.45             44.09
Race (White, %)^a                        75.57             74.42
Ethnicity (Not Hispanic/Latino, %)^b     90.81             91.00

^a Race was coded as White, Black or African American, American Indian/Alaskan Native, Asian/Native Hawaiian/Other Pacific Islander, or More than one.
^b Ethnicity was coded as Hispanic/Latino or Not Hispanic/Latino.
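As a concrete illustration of the whole-brain rsFC and rsFCS features introduced above (pairwise Pearson correlations over a 246-region parcellation, and per-region sums of those correlations), a minimal sketch follows. The regional time series here are a random placeholder, not HCP data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_timepoints = 246, 1200                  # brainnetome regions, rs-fMRI volumes
ts = rng.standard_normal((n_regions, n_timepoints))  # placeholder regional mean time series

# Whole-brain rsFC: Pearson correlation between every pair of regional time series
rsfc_matrix = np.corrcoef(ts)                        # 246 x 246 symmetric matrix

# Lower-triangle elements form the rsFC feature vector (246*245/2 = 30,135 features)
il = np.tril_indices(n_regions, k=-1)
rsfc_features = rsfc_matrix[il]

# rsFCS: for each region, the sum of its rsFC values with the other 245 regions
# (subtracting the diagonal removes each region's self-correlation of 1)
rsfcs_features = rsfc_matrix.sum(axis=1) - np.diag(rsfc_matrix)

print(rsfc_features.shape, rsfcs_features.shape)
```

Each subject then contributes one 30,135-dimensional rsFC vector and one 246-dimensional rsFCS vector to the prediction analyses.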
MRI acquisition and preprocessing

In the HCP, high-resolution (2-mm isotropic voxels) fMRI images under resting state were acquired using a customized Siemens Skyra 3-T scanner with a 32-channel head coil. The functional images were first preprocessed by the fMRIVolume pipeline, which included gradient distortion correction, motion correction, echo-planar imaging (EPI) distortion correction, registration to Montreal Neurological Institute (MNI) space, intensity normalization to a global mean, and masking out non-brain voxels. For details on data acquisition and preprocessing, see the study by Glasser and colleagues (Glasser et al., 2013).

After the above preprocessing, DPARSFA (part of DPABI) (Yan and Zang, 2010; Yan et al., 2016) was applied to remove the linear trend and several nuisance signals, including Friston's 24 head motion parameters (Friston et al., 1996), the global signal, and the average white matter (WM) and cerebrospinal fluid (CSF) signals. Finally, temporal bandpass filtering (0.01–0.1 Hz) was performed voxel-by-voxel.

The human brainnetome atlas (http://atlas.brainnetome.org/) was applied to parcellate the entire gray matter into 246 regions (123 in each hemisphere) consisting of 210 cortical and 36 subcortical regions (Fan et al., 2016). This atlas is connectivity-based and, therefore, is recommended for regional functional connectivity and brain network analyses (Dresler et al., 2017). For each subject, a regional mean time series was calculated by averaging the time series over all voxels within the region, and a total of 246 regional mean time series were therefore yielded. The rsFC between each pair of regions (30,135 pairs in total) was computed using the Pearson correlation to yield a whole-brain rsFC feature vector of 30,135 features for each subject (Fig. 1). For each region, the rsFCS was calculated, which corresponds to the centrality measure in graph theory and is simply defined as the sum of the rsFC values between that region and all the other regions (245 in total) (Buckner et al., 2009; Liu et al., 2017). A whole-brain rsFCS feature vector was then extracted for each subject, which can be taken as a lower-dimensional feature of rsFC. The whole-brain rsFC and rsFCS feature vectors were independently used in the prediction analysis (Fig. 1). The motivation for including both rsFC and rsFCS features in the present study was to see whether and how feature dimensionality influences our results.

ML regression algorithms

The present study included six ML linear regression algorithms commonly used in the neuroimaging field: OLS regression, LASSO regression, ridge regression, elastic-net regression, LSVR, and RVR. We confined our analyses to linear model algorithms due to their interpretability and resilience to overfitting in high-dimensional datasets (Kragel et al., 2012). Theoretically, SVR/RVR can be a non-linear regression model by applying a non-linear kernel and mapping the inputs into high-dimensional feature spaces (Tipping, 2001; Vapnik, 2000). Here, we selected a linear kernel for both SVR and RVR, therefore leading to a linear model for the SVR and RVR in our present study.

These linear regression models can be formulated as follows:

\hat{Y} = \sum_{j=1}^{p} \hat{\beta}_j X_j + \hat{\beta}_0

where \hat{Y} = (\hat{y}_1, …, \hat{y}_n)^T, and \hat{y}_i (i = 1, …, n) is the predicted value of the behavioral score for the ith subject; X_j = (x_{1,j}, …, x_{n,j})^T, and x_{i,j} is the value of the jth feature for the ith subject; and \hat{\beta}_j is the regression coefficient of the jth feature. Suppose we are given training data ((x_1, y_1), …, (x_N, y_N)), where N is the number of training samples, x_i is a high-dimensional feature vector (x_{i,1}, …, x_{i,p}), p is the number of features, and y_i is the actual behavioral score. The six linear regression algorithms share the common goal of finding a function f(x_i) = \sum_{j=1}^{p} \beta_j x_{i,j} + \beta_0 that best predicts the actual behavioral score y_i, but they differ in how they fit the regression coefficients using the training data.

OLS regression

The OLS regression algorithm fits a linear model by minimizing the residual sum of squares between the observed y_i in the training dataset and the values f(x_i) predicted by the linear model. The objective function takes the form below:

\min_{\beta} \sum_{i=1}^{N} (f(x_i) - y_i)^2

where y_i is the actual value of the behavioral score. The Moore-Penrose pseudo-inverse approach was used to solve the minimization problem of this objective function, and the singular value decomposition (SVD) was used to find the pseudo-inverse (Casanova et al., 2012; MacAusland, 2014; Mosic and Djordjevic, 2009). If X is full column rank, \beta has the general analytical solution below:

\hat{\beta} = X^{+} y = (X^T X)^{-1} X^T y

where X^{+} = (X^T X)^{-1} X^T denotes the Moore-Penrose pseudo-inverse, and X is an N × p matrix in which each row is the feature vector of one subject.

However, OLS regression tends to over-fit when the data are noisy; that is, the acquired model performs well when predicting the training samples but fails when predicting a new/unseen sample. In contrast, ridge regression, LASSO regression, elastic-net regression, LSVR, and RVR apply various regularization techniques to maximize the generalizability of predicting unseen samples in noisy data (Smola and Scholkopf, 2004; Tipping, 2001; Zou and Hastie, 2005).

Ridge regression

Ridge regression develops a model that minimizes the sum of the squared prediction error in the training data and an L2-norm regularization term, i.e., the sum of the squares of the regression coefficients (Hoerl and Kennard, 1970). The objective function is below:

\min_{\beta} \sum_{i=1}^{N} (f(x_i) - y_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

This technique can shrink the regression coefficients, resulting in better generalizability for predicting unseen samples. In this algorithm, a regularization parameter λ is used to control the trade-off between the prediction error on the training data and the L2-norm regularization, i.e., a trade-off of penalties between the bias and variance. A large λ corresponds to more penalty on variance, and a small λ corresponds to more penalty on bias (Zou and Hastie, 2005). Compared with OLS, ridge regression can better deal with the problem of multicollinearity (Vinod, 1978) and avoid overfitting through this bias-variance trade-off.

LASSO regression

LASSO regression applies L1-norm regularization to the OLS loss function, aiming to minimize the sum of the absolute values of the regression coefficients (Tibshirani, 1996). The objective function takes the form below:

\min_{\beta} \sum_{i=1}^{N} (f(x_i) - y_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|

This L1-norm regularization typically sets most coefficients to zero and retains one random feature among the correlated ones (Zou and Hastie, 2005). Thus, LASSO regression results in a very sparse predictive
Fig. 1. Schematic overview of the analysis framework. The human brainnetome atlas (http://atlas.brainnetome.org/) was applied to parcellate the entire gray matter into 246 regions. Using a resting-state fMRI dataset, a 246 × 246 symmetric rsFC matrix was first obtained, and all lower-triangle elements of the matrix (30,135 in total) were extracted as the whole-brain rsFC feature vector for each subject. For each region, the rsFC strength (rsFCS) was calculated as the sum of the rsFC values of that region with all other regions. These rsFCS values (246 in total) were then combined as a whole-brain rsFCS feature vector for each subject. Both whole-brain rsFC and rsFCS features were applied to separately predict individual behavioral/cognitive scores by six regression algorithms.

model, which facilitates optimization of the predictors and reduces the model complexity. Notably, LASSO can only select a maximum of N − 1 features in the final model, where N is the sample size (Efron et al., 2004; Ryali et al., 2012). This can be problematic for a regression with few samples but a large number of features. Likewise, an algorithm parameter λ is used to control the trade-off between the prediction error on the training data and the L1-norm regularization, i.e., the trade-off of penalties between the bias and variance.

Elastic-net regression

Elastic-net regression aims to overcome the limitations of the LASSO method by combining L1-norm and L2-norm regularizations in the OLS loss function (Zou and Hastie, 2005). The objective function takes the form below:

\min_{\beta} \sum_{i=1}^{N} (f(x_i) - y_i)^2 + \lambda \sum_{j=1}^{p} \left( \alpha |\beta_j| + \frac{1}{2} (1 - \alpha) \beta_j^2 \right)

Therefore, elastic-net regression is essentially a combination of LASSO regression and ridge regression, which allows the number of selected features to be larger than the sample size while achieving a sparse model (Carroll et al., 2009; Zou and Hastie, 2005). Again, a regularization parameter λ is used to control the trade-off between the prediction error on the training data and the regularization, i.e., the trade-off of penalties between the bias and variance. In addition, a mixing
parameter α is used to control the relative weighting of the L1-norm and L2-norm contributions.

LSVR

In contrast to the squared loss function in the above methods, LSVR applies Vapnik's ε-insensitive loss function to fit the linear model (Smola and Scholkopf, 2004; Vapnik, 2000). Specifically, it aims to find a function f(x_i) whose predicted value deviates by no more than ε from the actual y_i for all the training data while maximizing the flatness of the function. Flatness maximization is implemented via an L2-norm regularization by minimizing the squared sum of the regression coefficients. The objective function takes the form below:

\min_{\beta} \frac{1}{2} \sum_{j=1}^{p} \beta_j^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)

subject to

y_i - f(x_i) \le \varepsilon + \xi_i
f(x_i) - y_i \le \varepsilon + \xi_i^*
\xi_i, \xi_i^* \ge 0

where l is the number of 'support vectors', i.e., the samples that deviate by more than ε from the actual y_i and are used to fit the model. A weight (i.e., α_s) is generated for each of these 'support vectors' by the algorithm, and the regression coefficients of all features are calculated as the weighted sum of the feature vectors of these samples. Specifically,

f(x_i) = \sum_{j=1}^{p} \beta_j x_{i,j} + \beta_0 = \sum_{j=1}^{p} \sum_{s=1}^{l} \alpha_s x_{s,j} x_{i,j} + \beta_0 = \sum_{s=1}^{l} \alpha_s (x_i \cdot x_s) + \beta_0

where x_s \cdot x_i is called the linear kernel. A parameter C controls the trade-off between how strongly the samples that deviate by more than ε are tolerated and the flatness of the regression line, i.e., the trade-off of penalties between the bias and variance. A large C corresponds to more penalty on bias, and a small C corresponds to more penalty on variance.

Relevance vector regression (RVR)

RVR is formulated in a Bayesian framework and has an identical functional form to SVR (Tipping, 2001). The function also takes the form below:

f(x_i) = \sum_{s=1}^{l} \beta_s (x_i \cdot x_s) + \beta_0

Like LSVR, only some samples (l < N), termed the 'relevance vectors', are used to fit the model in RVR. Here, the predicted target value t = (t_1, …, t_N)^T is given by

t_i = \sum_{s=1}^{l} \beta_s \phi_s(x_i) + \beta_0 + \epsilon_i

where \phi_s(x) = x \cdot x_s. Similarly, the regression coefficients of all features are determined as the weighted sum of the feature vectors of all 'relevance vector' samples. Notably, this algorithm has no algorithm-specific parameter and, therefore, does not require extra computational resources to estimate the optimal algorithm-specific parameters.

The scikit-learn library (version: 0.16.1) was used to implement OLS regression, LASSO regression, ridge regression, and elastic-net regression (http://scikit-learn.org/) (Pedregosa et al., 2011); the LIBSVM function in MATLAB was used to implement LSVR (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) (Chang and Lin, 2011); and the PRoNTo toolbox (http://www.mlnl.cs.ucl.ac.uk/pronto/) was used to implement RVR (Schrouff et al., 2013).

Individualized prediction framework

A schematic overview of our prediction framework is shown in Fig. 1 and Supplementary Fig. 1. The 6 regression algorithms were applied separately to the whole-brain rsFC and rsFCS features. To quantify the prediction accuracy, we applied 5-fold cross-validation (5F-CV) in all algorithms. For LASSO regression, ridge regression, elastic-net regression, and LSVR, a nested 5F-CV was applied, with the outer 5F-CV loop estimating the generalizability of the model and the inner 5F-CV loop determining the optimal parameters (e.g., λ, α, or C) for these algorithms. This nested 5F-CV procedure is elaborated below.

Outer 5F-CV
In the outer 5F-CV, all subjects were divided into 5 subsets. Here, we sorted the subjects according to their behavioral scores and then assigned individuals with a rank of (1st, 6th, 11th, …) to the first subset, (2nd, 7th, 12th, …) to the second subset, (3rd, 8th, 13th, …) to the third subset, (4th, 9th, 14th, …) to the fourth subset, and (5th, 10th, 15th, …) to the fifth subset. This splitting approach prevented random bias between subsets and, more importantly, avoided the overwhelmingly intensive computation due to the multiple repetitions in a random splitting scheme (Cui et al., 2018). Of the five subsets, four were combined as the training set, and the remaining subset was used as the testing set. To avoid features in greater numeric ranges dominating those in smaller numeric ranges, each feature was linearly scaled to the range of 0–1 across the training dataset, and the scaling parameters were also applied to scale the testing dataset (Cui et al., 2018; Erus et al., 2015; Hsu et al., 2003). A prediction model was constructed using all the training samples and then used to predict the scores of the testing samples. The Pearson correlation coefficient and mean absolute error (MAE) between the actual scores and the predicted scores were computed to quantify the accuracy of the prediction (Cui et al., 2018; Erus et al., 2015; Siegel et al., 2016). The training and testing procedures were repeated 5 times so that each of the 5 subsets was used once as the testing set. To yield the final accuracies, we averaged the correlations and MAEs across the five iterations, respectively.
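The outer 5F-CV just described (rank-based fold assignment, 0–1 feature scaling fit on the training folds only, and correlation/MAE accuracy metrics) can be sketched as follows. The data here are random placeholders, and ridge regression via scikit-learn (one of the six algorithms, with a fixed λ rather than the inner-loop-optimized one) stands in for whichever model is being evaluated:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 246))          # placeholder rsFCS-like features
y = 2 * X[:, 0] + rng.standard_normal(100)   # placeholder behavioral scores

# Rank-based fold assignment: sort by score, deal ranks (1st, 6th, 11th, ...)
# into subset 0, (2nd, 7th, 12th, ...) into subset 1, and so on
order = np.argsort(y)
fold = np.empty(len(y), dtype=int)
fold[order] = np.arange(len(y)) % 5

rs, maes = [], []
for k in range(5):
    train, test = fold != k, fold == k
    scaler = MinMaxScaler().fit(X[train])    # 0-1 scaling from the training set only
    model = Ridge(alpha=1.0).fit(scaler.transform(X[train]), y[train])
    pred = model.predict(scaler.transform(X[test]))
    rs.append(pearsonr(y[test], pred)[0])    # correlation accuracy for this fold
    maes.append(float(np.mean(np.abs(y[test] - pred))))

# Final accuracies: correlation r and MAE averaged across the five folds
print("mean r = %.3f, mean MAE = %.3f" % (np.mean(rs), np.mean(maes)))
```

For the four algorithms with free parameters, the study additionally ran an inner 5F-CV within each training set to pick λ, α, or C before the final fit; that inner loop is omitted here for brevity.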
subsets were selected to train the model under a given parameter C (LSVR) or λ (LASSO and ridge regression), or a given parameter set (λ, α) (elastic-net regression), and the remaining subset was used to test the model. This procedure was repeated 5 times such that each subset was used once as the testing dataset, resulting in 5 inner 5F-CV loops in total. For each inner 5F-CV loop, one correlation r and one mean absolute error (MAE) were generated for each parameter or parameter set, and a mean value across the 5 inner loops was then obtained for the MAE and correlation r, respectively. The sum of the mean correlation r and the reciprocal of the mean MAE was defined as the inner prediction accuracy, and the parameter or parameter set with the highest inner prediction accuracy was chosen as the optimal parameter or parameter set (Cui et al., 2018). Of note, the mean correlation r and the reciprocal of the mean MAE cannot be summed directly, because the scales of the raw values of these two measures are quite different. Therefore, we normalized the mean correlation r and the reciprocal of the mean MAE across all values (i.e., 16 values in LASSO regression, ridge regression, and LSVR, and 176 values in elastic-net regression) and then summed the resultant normalized values.

Accordingly, each loop of the outer 5F-CV yielded a specific optimal parameter or parameter set. The optimal parameter λ or C, or parameter set (λ, α), was then used to estimate the final predictive model with the training set of the outer 5F-CV loop. Notably, OLS regression and RVR do not have algorithm-specific parameters, and therefore, the above inner 5F-CVs were not applied in these two algorithms.

Sample size and sub-sampling

To explore the effect of sample size on prediction accuracies, we sampled subsets with different sample sizes from the full cohort. Putatively, ML prediction performance should be less sensitive to sample size differences as the sample size increases. To reduce the computational burden, subset sample sizes were therefore chosen in increments of 20 from 20 to 300 and then in increments of 40 from 300 to 700. This procedure resulted in 25 sample sizes. For each sample size, we carried out random sampling 50 times, each time sampling without replacement. The mean and SD of the resulting 50 prediction accuracies (i.e., correlation r) were then yielded.

It is important to ascertain an approximate pattern/function of how the mean or SD of the prediction accuracies is influenced by the sample size, i.e., which model/function can fit our observed data well. Here, we applied two candidate model/function forms to fit the mean or SD of the prediction accuracies: a linear function, f(x) = a*x + b, and an exponential function, f(x) = a * exp(−x/b) + c. These two candidate models/functions are widely used, and visual inspection strongly suggested an exponential model. To evaluate the goodness-of-fit, r² values were used (a value closer to 1 indicates a better fit).

Generalizability of the algorithm and sample size effect

Notably, the rsFC and rsFCS features in our behavioral/cognitive predictions were extracted after applying global signal regression (GSR), which is a controversial processing step in resting-state functional connectivity analyses (Fox et al., 2009; Murphy et al., 2009; Murphy and Fox, 2017). To explore how GSR influenced our main results, we computed the rsFC and rsFCS features without applying GSR and reran the same procedure to predict individual GSDT scores.

To evaluate whether the patterns of how the ML regression algorithm and sample size influence the GSDT prediction can be generalized to other behavioral/cognitive predictions, we applied the same procedure to predict ORRT, PVT, and VSPLOT scores using either whole-brain rsFC or rsFCS features (with GSR applied).

Test-retest validation of individualized predictions

In our dataset, both qualified REST1 and REST2 data were available for 722 subjects. Each of these subjects can have two predicted scores, one from the REST1 data and the other from the REST2 data. To cross-validate the test-retest predictions between the REST1 and REST2 datasets, we correlated the two predicted scores across all 722 subjects using the Pearson correlation. In addition, the intra-class correlation coefficient (ICC) of the test-retest predicted scores was calculated for all algorithms (Braun et al., 2012; Shrout and Fleiss, 1979; Zhao et al., 2015).

Algorithm similarity in individualized predictions

To explore the individual prediction similarity among the six algorithms, we correlated the predicted scores across all individuals between each pair of algorithms, resulting in an algorithm-by-algorithm similarity matrix. This 6 × 6 similarity matrix was converted into a distance matrix (i.e., 1 − similarity), and a hierarchical clustering method (i.e., the average linkage agglomerative algorithm) was then applied (Legendre and Legendre, 2012; Zhong et al., 2015).

Algorithm similarity in the spatial pattern of feature importance

In a linear prediction model, the absolute value of the weight/regression coefficient represents the importance of the corresponding feature in a prediction (Erus et al., 2015; Mourao-Miranda et al., 2005; Siegel et al., 2016). For each algorithm, we trained a prediction model using all the samples (i.e., 794 subjects for REST1 and 778 for REST2). The resultant absolute weights were then correlated across all features between each pair of algorithms. As above, the average linkage agglomerative algorithm was then applied to the distance matrix. It should be noted that some weighted combination of all features is the driving source for regression predictions, and complex interactions exist among these features. These interactions make it very difficult to mathematically quantify such weighted patterns/combinations of all features. Our adopted comparative strategy here (i.e., the linear correlation across feature weight vectors between algorithms) is a sub-optimal one that simply treats each feature in isolation.

Computational cost

Given the huge number of features and samples, computational cost is becoming an increasingly important factor of concern when selecting the appropriate ML regression algorithm. To quantify the computational cost, we recorded the running time of the six algorithms by running each algorithm with the above-specified procedure on a single core of the same server. To evaluate the effect of the parameter-optimizing procedure, we tested the running time of the algorithms under two conditions: 1) when the parameter or parameter set was pre-determined; and 2) when the optimal parameter or parameter set was determined using the inner 5F-CV.

The Python/Matlab functions/scripts for the six regression algorithms have been made available online: https://github.com/ZaixuCui/Pattern_Regression.

Results

Algorithm effect

In Fig. 2 and Supplementary Fig. 2, the correlation r and MAE of each algorithm were plotted as a function of sample size. Here, the correlation r was taken as the main metric of prediction accuracy. Using the whole-brain rsFC features (30,135 in total) of the REST1 dataset, LASSO regression yielded markedly lower GSDT prediction accuracies for sample sizes greater than 60 (Fig. 2A). The other five regression algorithms performed similarly, exhibiting very large ranges of overlap in GSDT prediction accuracy regardless of the sample size. In contrast, using the whole-brain rsFCS features (246 in total), OLS regression performed much worse than the other algorithms when the sample size was greater
Z. Cui, G. Gong NeuroImage 178 (2018) 622–637
Fig. 2. The mean and SD of GSDT prediction accuracies (correlation r) of the 50 sample subsets. The six regression algorithms are marked with different colors, and 25
sample sizes were selected from 20 to 700. (A) rsFC-based prediction using the REST1 dataset; (B) rsFC-based prediction using the REST2 dataset; (C) rsFCS-based
prediction using the REST1 dataset; (D) rsFCS-based prediction using the REST2 dataset. See Supplementary Fig. 2 for the mean absolute error (MAE) changes.
than approximately 160 (Fig. 2C). Among the other five algorithms, LASSO regression and LSVR corresponded to slightly lower prediction accuracies than ridge regression, RVR, and elastic-net regression as the sample size increased. It should be noted that the whole-brain rsFC features yielded relatively higher prediction accuracies overall than the whole-brain rsFCS features. The mean prediction accuracy (mean r of the 50 sampling subsets) reached a maximum greater than 0.5 for the rsFC feature but a maximum lower than 0.3 for the rsFCS feature.

As shown, these observed differences in patterns among the six algorithms and between feature types were highly repeatable for the GSDT predictions using the REST2 dataset (Fig. 2B and D, Supplementary Fig. 2B and 2D). In addition, to evaluate the reproducibility of these results in an independent dataset, we re-ran the above analyses using the rs-fMRI data of new HCP subjects, which were included in the latest HCP S1200 dataset but not in the S900 dataset (236 individuals in total; see more details in Supplementary Tables 5-7). As shown in Supplementary Fig. 3, the resultant patterns across the same range of sample sizes are highly compatible with the main results of HCP S900: LASSO performed worse than the other algorithms for the rsFC feature, and OLS performed worse than the other algorithms for the rsFCS feature.

Sample size effect

As clearly shown (Fig. 2 and Supplementary Fig. 3), the mean GSDT prediction accuracies of the 50 sampling subsets increased with the sample size, regardless of the algorithm. Conversely, the SD decreased as the sample size increased. Except for OLS regression, the exponential form (i.e., f(x) = a * exp(x/b) + c) fitted the mean and SD of prediction accuracies better, using either the REST1 or REST2 dataset, regardless of the algorithm and feature type (Fig. 3). In particular, the exponential model explained more than 87% of the variance of the mean and SD (r2 > 0.87), indicating excellent fitting performance and that the relationship between the prediction accuracy and sample size was captured well. As for OLS regression, there is a replicable dip of accuracy for the rsFCS-based but not the rsFC-based predictions, centered around a sample size of 300, which likely represents overfitting of the rsFCS-based OLS regression to noise in the data.

Generalizability of the algorithm and sample size effects

As illustrated in Supplementary Fig. 4, the GSDT prediction results remain quite similar when we applied random divisions for the five folds during the cross-validation. In addition, the individual GSDT scores were re-predicted using rsFC or rsFCS features derived without applying the GSR preprocessing step. The resultant patterns of the algorithm and sample size effects were well preserved (Fig. 4 and Supplementary Fig. 5). This indicated a limited effect of the GSR step on our results, supporting the generalizability of our observed algorithm and sample size effects.

In addition to GSDT prediction, ORRT, PVT, and VSPLOT scores were also predicted using either the whole-brain rsFC or rsFCS features (REST1). As illustrated in Fig. 5 and Supplementary Fig. 6, for these three behavioral/cognitive scores, the pattern of performance differences among the algorithms was quite similar to the GSDT prediction: LASSO regression corresponded to significantly lower accuracies for the rsFC feature, and OLS regression performed significantly worse than the other algorithms for the rsFCS features. Similarly, the mean and SD of the prediction accuracies of the 50 sampling subsets increased and decreased, respectively, as the sample size increased, regardless of the algorithm. These results indicated that the above observed pattern of how the algorithm and sample size influence predictions was not confined to the GSDT prediction but can be generalized to predictions of other behavioral/cognitive scores.
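The repeated-subsampling evaluation summarized above (draw many random subsets of a given size, run 5-fold CV within each, then take the mean and SD of the resulting correlations) can be sketched as follows; the data are synthetic and ridge regression stands in for the six algorithms:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Synthetic stand-ins for connectivity features and behavioral scores.
n_total, n_feat = 700, 100
X_all = rng.standard_normal((n_total, n_feat))
y_all = X_all @ rng.standard_normal(n_feat) + 5.0 * rng.standard_normal(n_total)

def cv_accuracy(X, y, n_splits=5):
    """Out-of-sample correlation r between actual and 5F-CV-predicted scores."""
    pred = np.empty_like(y)
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        pred[test] = Ridge(alpha=1.0).fit(X[train], y[train]).predict(X[test])
    return np.corrcoef(y, pred)[0, 1]

def subsample_stats(size, n_subsets=15):
    """Mean and SD of accuracy over repeated random subsets of a given size."""
    rs = []
    for _ in range(n_subsets):
        idx = rng.choice(n_total, size=size, replace=False)
        rs.append(cv_accuracy(X_all[idx], y_all[idx]))
    return float(np.mean(rs)), float(np.std(rs))

mean_small, sd_small = subsample_stats(60)
mean_large, sd_large = subsample_stats(500)
print(mean_small, sd_small, mean_large, sd_large)
```

On this toy problem the larger subsets give a higher mean accuracy and a smaller SD, mirroring the qualitative pattern reported above.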
Fig. 3. Model fitting for the mean and SD of GSDT prediction accuracies (correlation r) of the 50 sample subsets. (A) Scatter plots for the REST1 dataset; (B) Scatter plots for the REST2 dataset. Two candidate functions were fitted: a linear function, f(x) = a*x + b, and an exponential function, f(x) = a * exp(x/b) + c. The r2 was calculated as the goodness-of-fit. Except for OLS regression using the rsFCS feature, the exponential function fitted the mean and SD significantly better.
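The two candidate fits in the caption above can be reproduced with SciPy; the data points below are synthetic accuracies that saturate with sample size (illustrative values, not the measurements behind Fig. 3):

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic mean accuracies that plateau as the sample size grows.
sizes = np.array([20, 40, 60, 100, 150, 200, 300, 400, 500, 700], dtype=float)
mean_r = 0.45 - 0.35 * np.exp(-sizes / 80.0)

def linear(x, a, b):
    return a * x + b

def exponential(x, a, b, c):
    # Same form as in the text: f(x) = a * exp(x/b) + c
    return a * np.exp(x / b) + c

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

p_lin, _ = curve_fit(linear, sizes, mean_r)
p_exp, _ = curve_fit(exponential, sizes, mean_r, p0=(-0.35, -80.0, 0.45))

r2_lin = r_squared(mean_r, linear(sizes, *p_lin))
r2_exp = r_squared(mean_r, exponential(sizes, *p_exp))
print(r2_lin, r2_exp)  # the exponential form fits the saturating data far better
```

Note that a negative b in the exponential form yields exactly the saturating shape described in the Results.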
Test-retest validation of individualized prediction

To cross-validate individual predictions between the REST1 and REST2 datasets, 722 subjects with available quantified REST1 and REST2 data were included. Fig. 6A and B show scatter plots between the actual GSDT scores and the GSDT scores predicted by each algorithm. Regardless of the algorithm, the whole-brain rsFC feature-based prediction outperformed the rsFCS feature-based prediction using either the REST1 or REST2 dataset. Furthermore, the predicted GSDT scores using the REST1 dataset were significantly correlated with the predicted scores using the REST2 dataset (Fig. 6C). The ICC values for the test-retest predicted scores were 0.65, 0.66, 0.59, 0.67, 0.67, and 0.67 for the rsFC-based OLS regression, ridge regression, LASSO regression, elastic-net regression, LSVR, and RVR, respectively. For the rsFCS-based prediction, the ICC values were 0.27, 0.38, 0.32, 0.36, 0.40, and 0.35, respectively. These results together suggested a decent test-retest robustness of the individual prediction. Given the better prediction performance of the rsFC feature, the correlation of test-retest predicted scores based on the rsFC feature is expected to be higher than that based on the rsFCS features.

Algorithm similarity in individualized predictions

The hierarchical clustering of the six algorithms in terms of individual prediction similarity yielded the same clusters for both the REST1 and REST2 datasets (Fig. 6D and E). Specifically, for the rsFC-based prediction, LSVR and RVR together formed one cluster (i.e., showing high similarity among within-cluster algorithms but relatively low similarity with algorithms outside their cluster), and elastic-net regression, ridge regression, and OLS regression formed another cluster. In contrast, LASSO regression showed a very low degree of similarity with all other algorithms, which is compatible with its significantly lower prediction
Fig. 4. The accuracies of GSDT prediction using the FC features that were derived without applying GSR during the processing procedure. (A) rsFC-based prediction
using the REST1 dataset; (B) rsFC-based prediction using the REST2 dataset; (C) rsFCS-based prediction using the REST1 dataset; (D) rsFCS-based prediction using the
REST2 dataset. See Supplementary Fig. 5 for the mean absolute error (MAE) changes.
accuracies relative to the other algorithms (Fig. 2A). The rsFCS feature-specific algorithm prediction similarity exhibited a slightly different pattern (Fig. 6F and G): while LSVR and RVR also belonged to one cluster, LASSO regression was clustered with elastic-net and ridge regression. In particular, OLS regression was isolated from all the other algorithms, which corresponded to its relatively lower rsFCS-based prediction performance (Fig. 2C).

Algorithm similarity in the spatial pattern of feature importance

The 6 * 6 algorithm similarity matrices of the whole-brain rsFC and rsFCS features are illustrated in Fig. 7 and Fig. 8, respectively. For the rsFC-based GSDT prediction, the spatial patterns of the importance of the rsFC features exhibited relatively high between-algorithm correlations (r > 0.95) among OLS regression, ridge regression, elastic-net regression, and LSVR across the similarity matrix. While RVR exhibited smaller correlations with these four algorithms relative to their within-algorithm correlations, the absolute correlation value was still relatively high (r > 0.85). In contrast, LASSO regression had a relatively low correlation (r ≈ 0.20) with the other five algorithms, which is in line with the significantly lower prediction accuracies of LASSO regression relative to those of the other algorithms. The spatial distribution of the top 100 rsFC features with the highest absolute weights is illustrated in Fig. 7. As expected, OLS regression, ridge regression, elastic-net regression, and LSVR showed similar patterns. Specifically, the rsFC features with the greatest contributions were largely connected with the superior parietal area, primary motor area, middle/inferior temporal gyrus, superior/middle/inferior frontal gyrus, middle occipital gyrus, basal ganglia, and thalamus.

Regarding the similarities of the rsFCS feature importance pattern among algorithms (Fig. 8), the correlations of OLS regression and RVR with the other algorithms were relatively low (mostly r < 0.70) across the similarity matrix. In contrast, LASSO regression, ridge regression, elastic-net regression, and LSVR corresponded to relatively higher correlations among themselves (mostly r > 0.75). Furthermore, the top 10 rsFCS features (regions) with the highest absolute weights were similar among LASSO regression, ridge regression, elastic-net regression, and LSVR. These regions with the greatest contributions primarily involved the inferior frontal gyrus, the supplementary motor area, the insula, the inferior temporal gyrus, and the superior parietal gyrus.

Again, the observed results were highly consistent for both the REST1 and REST2 datasets.

Computational cost

The running times of the six algorithms are listed in Table 2. As expected, the whole-brain rsFCS feature-based predictions were faster than the rsFC feature-based predictions, given the markedly fewer rsFCS features. It was clearly observed that optimizing the parameter or parameter set with inner 5F-CVs dramatically increased the running time for ridge regression, elastic-net regression, LASSO regression, and LSVR. Overall, OLS regression and RVR had the shortest running times, as they did not include the parameter optimization step. Notably, some software-related bias might be introduced into our results, given that the six algorithms were implemented in different software packages.

Discussion

Using the large HCP dataset, the present study compared rsFC/rsFCS-based predictions among 6 commonly used ML regression algorithms and evaluated the effects of sample size on prediction performance. The results showed that ridge regression, elastic-net regression, LSVR, and RVR performed quite similarly for both rsFC- and rsFCS-based predictions. However, LASSO regression performed remarkably worse than the other algorithms based on rsFC features, while OLS regression performed markedly worse than the other algorithms based on rsFCS features. Particularly, the prediction performance of all algorithms became increasingly stable and better on average as the sample size increased.
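The weight-pattern comparison described above (correlating each algorithm's feature-importance map with every other's, then inspecting the top features by absolute weight) can be sketched with three scikit-learn models on synthetic data; the problem sizes and penalty values are illustrative choices of ours:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(2)

# Synthetic problem with 10 truly informative features out of 200.
X = rng.standard_normal((300, 200))
w_true = np.zeros(200)
w_true[:10] = rng.uniform(1.0, 2.0, 10)
y = X @ w_true + rng.standard_normal(300)

models = {
    "OLS": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=0.1),
}
weights = {name: m.fit(X, y).coef_ for name, m in models.items()}

# Pairwise similarity of the feature-importance (weight) patterns.
names = list(weights)
sim = np.corrcoef([weights[n] for n in names])

# Top-10 features by absolute weight, per algorithm.
top10 = {n: set(np.argsort(np.abs(weights[n]))[-10:]) for n in names}
overlap = len(top10["ridge"] & top10["LASSO"])
print(np.round(sim, 2), overlap)
```

Here the weight maps of the different algorithms correlate strongly and largely agree on the top features, loosely mirroring the between-algorithm agreement reported for the well-performing models.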
Fig. 5. The mean and SD of prediction accuracies of the 50 sample subsets for other 3 behavioral scores: ORRT, PVT, and VSPLOT. (A) rsFC-based ORRT prediction;
(B) rsFCS-based ORRT prediction; (C) rsFC-based PVT prediction; (D) rsFCS-based PVT prediction; (E) rsFC-based VSPLOT prediction; (F) rsFCS-based VSPLOT
prediction. See Supplementary Fig. 6 for the mean absolute error (MAE) changes.
The specific patterns of the observed algorithm and sample size effects were replicated using retest fMRI data and a different imaging preprocessing method (i.e., without applying GSR). The predictions of different behavioral/cognitive scores were also replicated, which strongly supports the robustness/generalizability of these observed effects. These findings provide important reference information that can be used in choosing an appropriate ML regression algorithm or sample size in individualized behavioral/cognitive prediction studies.

Regression algorithm differences

First, the relative performances of rsFC- and rsFCS-based predictions using OLS regression significantly differed. Notably, OLS regression does not apply any regularization techniques within the algorithm. In contrast, ridge regression and LSVR both apply L2-norm regularization, LASSO regression includes L1-norm regularization, elastic-net regression includes both L1-norm and L2-norm regularization, and RVR applies regularization through a Gaussian prior. The lack of regularization in the OLS regression model may account for its unstable prediction performance relative to the other algorithms. Notably, the need for regularization in high-dimensional linear regression problems may depend on the conditioning of the associated matrices (Casanova et al., 2012). Often, ill-conditioning is associated with sensitivity of the prediction to data noise, which can be effectively overcome by using regularization. Therefore, the impact of regularization on the prediction performance of a regression algorithm will depend on the degree of ill-conditioning and the noise level of a given problem. OLS has been reported previously to produce performance comparable to other regularized algorithms in the context of some classification problems (Casanova et al., 2012; Raizada et al., 2010). In our work, we observed that rsFC-based prediction using OLS regression performed comparably with the other algorithms, but the rsFCS-based prediction using OLS regression performed worse than the other algorithms, possibly due to its sensitivity to data noise. In addition, the observed dip of OLS accuracy around 300 samples, which is specific to the rsFCS-based prediction, may also be attributable to its sensitivity to data noise. OLS regression therefore should be applied with extreme caution in the case of a high-dimensional feature space and a small sample size, because of its unstable performance.

In addition to OLS regression, the prediction performance of LASSO regression also exhibited a dependence on the feature type: it performed markedly worse than the other algorithms when the rsFC features were applied but similarly to the others when the rsFCS features were applied.
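The ill-conditioning argument can be made concrete with a toy example: when features are nearly collinear, unregularized OLS weights blow up and amplify noise at test time, while a ridge penalty suppresses this. The sizes and noise levels below are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)

def make_data(n, p=30, feat_noise=0.01, y_noise=0.1):
    """Each feature is a noisy copy of one latent signal -> X is ill-conditioned."""
    latent = rng.standard_normal(n)
    X = latent[:, None] + feat_noise * rng.standard_normal((n, p))
    y = latent + y_noise * rng.standard_normal(n)
    return X, y, latent

X_tr, y_tr, _ = make_data(50)
X_te, _, latent_te = make_data(500)

ols = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)

# OLS fits the training noise with huge, mutually canceling weights; those
# weights then amplify the independent feature noise of the test set.
mse_ols = np.mean((ols.predict(X_te) - latent_te) ** 2)
mse_ridge = np.mean((ridge.predict(X_te) - latent_te) ** 2)
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
print(mse_ols, mse_ridge)
```

On this construction the ridge weights stay small and the regularized model recovers the latent signal more accurately, which is the mechanism the paragraph above invokes for the rsFCS-based OLS failures.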
Fig. 6. The test-retest cross-validation and algorithm-to-algorithm similarity of individualized GSDT predictions. (A) Scatter plots between the actual and predicted
scores using the REST1 dataset; (B) Scatter plots between the actual and predicted scores using the REST2 dataset; (C) Scatter plots of predicted scores between the
REST1 and REST2 datasets; (D) The 6 * 6 matrix representing the algorithm-to-algorithm similarity of individual rsFC-based predicted GSDT scores using the REST1
dataset. Hierarchical clustering dendrograms of the algorithms are illustrated on the right. (E) The algorithm-to-algorithm similarity matrix of individual rsFC-based
predicted GSDT scores using the REST2 dataset. (F) The algorithm-to-algorithm similarity matrix of individual rsFCS-based predicted GSDT scores using the
REST1 dataset. (G) The algorithm-to-algorithm similarity matrix of individual rsFCS-based predicted GSDT scores using the REST2 dataset.
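The similarity-plus-dendrogram analysis of Fig. 6 can be sketched with SciPy's hierarchical clustering; the predicted-score vectors below are synthetic stand-ins whose grouping mimics, but is not, the paper's result:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)

# Hypothetical predicted-score vectors for six algorithms: five share a
# common component, while the "LASSO" stand-in is nearly unrelated.
base = rng.standard_normal(200)
preds = {
    "OLS": base + 0.3 * rng.standard_normal(200),
    "ridge": base + 0.3 * rng.standard_normal(200),
    "elastic-net": base + 0.3 * rng.standard_normal(200),
    "LSVR": base + 0.35 * rng.standard_normal(200),
    "RVR": base + 0.35 * rng.standard_normal(200),
    "LASSO": rng.standard_normal(200),
}
names = list(preds)
sim = np.corrcoef([preds[n] for n in names])   # 6 x 6 similarity matrix

# Turn similarity into a distance, then cluster (average linkage).
dist = 1.0 - sim
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(names, labels)))  # the LASSO stand-in ends up in its own cluster
```

The same linkage matrix Z can be passed to scipy.cluster.hierarchy.dendrogram to draw the tree shown alongside each similarity matrix.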
Due to the nature of L1-norm regularization, LASSO generally selects only one random feature from among a set of correlated features and achieves a final sparse model, which is easy to optimize and provides better generalization. In practice, LASSO can only select a maximum of N-1 features in the final model, where N is the sample size (Efron et al., 2004; Ryali et al., 2012). In the present study, the number of whole-brain rsFC features (i.e., 30,135) was much larger than the entire sample size (<800). For a sampling subset of 20 subjects, LASSO can select a maximum of 20 features from among the 30,135 features. Therefore, many useful rsFC features could be discarded, which might contribute to the underperformance of the LASSO algorithm in rsFC-based prediction. In contrast, there existed only 246 whole-brain rsFCS features in total, and LASSO regression corresponded to prediction performance comparable with the other algorithms for the rsFCS features. However, it should be noted that a larger number of features does not necessarily lead to poor performance of LASSO regression relative to the other algorithms. For example, the rsFC-based LASSO regression without applying GSR exhibited prediction accuracies similar to the other algorithms (Fig. 4), and performed slightly better than the rsFC-based LASSO after applying GSR. This may be attributable to a greater degree of between-feature correlation for the rsFC without applying GSR, which reduced the chance of discarding useful features in LASSO's sparse model and ultimately led to better performance. Future work is desired to address this GSR-relevant issue.

To overcome the limitation of LASSO regression, a reduction in feature dimensionality (e.g., principal component analysis) can be applied before using the LASSO algorithm (Wager et al., 2013).
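The sample-size saturation described above can be verified directly: with far more features than samples, the LARS implementation of the LASSO path keeps at most N predictors, whereas the elastic net (discussed next) can keep many more. A sketch on synthetic data, with arbitrary penalty values:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LassoLars

rng = np.random.default_rng(4)

n, p = 20, 2000   # far more features than samples, as with whole-brain rsFC
X = rng.standard_normal((n, p))
y = X @ (rng.standard_normal(p) / np.sqrt(p)) + 0.1 * rng.standard_normal(n)

# LASSO (LARS path): the active set can never exceed the sample size.
lasso_k = np.count_nonzero(LassoLars(alpha=1e-3).fit(X, y).coef_)

# Elastic net: the L2 term lets far more than n features stay in the model.
enet = ElasticNet(alpha=1e-3, l1_ratio=0.5, max_iter=50000).fit(X, y)
enet_k = np.count_nonzero(enet.coef_)

print(lasso_k, enet_k)
```

With 2000 candidate features and only 20 samples, the LASSO model retains at most 20 of them, which is the mechanism behind the discarded-features argument above.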
Fig. 7. Algorithm similarity in the spatial pattern of rsFC feature importance. The 6 * 6 matrix representing algorithm-to-algorithm similarity of the spatial pattern of
rsFC feature importance in GSDT predictions. The hierarchical clustering dendrograms of the algorithms are illustrated on the right. The top 100 rsFC features
(connections) with the highest absolute weights are displayed below. BrainNet Viewer was used for 3D surface visualization (www.nitrc.org/projects/bnv) (Xia
et al., 2013).
Elastic-net regression, which includes both L1-norm and L2-norm penalizations for regularization, can also be used and can be considered as a combination of LASSO and ridge regression (Cui et al., 2018; Zou and Hastie, 2005). Elastic-net regression can yield a sparse model and simultaneously permits the number of selected features to be larger than the sample size.

Regression algorithm similarity

In terms of individual prediction, the scores predicted by RVR and LSVR were quite correlated, regardless of feature type (rsFC or rsFCS). This can be expected because these two algorithms are conceptually
Fig. 8. Algorithm similarity in the spatial pattern of rsFCS feature importance. The 6 * 6 matrix representing the algorithm-to-algorithm similarity of the spatial
pattern of rsFCS feature importance in GSDT prediction. The hierarchical clustering dendrograms of the algorithms are illustrated on the right. The top 10 rsFCS
features (regions) with the highest absolute weights are displayed below.
similar, and RVR was introduced by Tipping (2001) as a Bayesian alternative to LSVR. RVR has a few advantages over LSVR: RVR has far fewer 'relevance vectors' than the number of 'support vectors' in LSVR, which can reduce the model complexity and computational cost. In addition, RVR has no within-algorithm parameters, therefore requiring no extra computation for parameter optimization. Consistent with this, RVR greatly outperformed LSVR in our comparisons of computational cost among algorithms (Table 2). Given the similar prediction performance but the difference in computational cost, RVR is recommended over LSVR for behavioral/cognitive prediction investigations.
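RVR itself is not part of scikit-learn (the paper's RVR scripts are in the linked repository); in the sketch below, ARDRegression (a related Bayesian linear model with a Gaussian automatic-relevance prior and no free parameter to tune) stands in for it next to LinearSVR. The comparison and all settings are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.svm import LinearSVR

rng = np.random.default_rng(6)
X = rng.standard_normal((200, 50))
y = X @ rng.standard_normal(50) + rng.standard_normal(200)
X_new = rng.standard_normal((100, 50))

# LSVR needs its C parameter chosen (the paper tunes it with an inner 5F-CV)...
lsvr = LinearSVR(C=1.0, max_iter=100000).fit(X, y)

# ...while the Bayesian model optimizes its prior hyperparameters internally.
ard = ARDRegression().fit(X, y)

r = np.corrcoef(lsvr.predict(X_new), ard.predict(X_new))[0, 1]
print(r)  # two well-fit linear models typically give highly correlated predictions
```

The absence of an outer tuning loop for the Bayesian model is what removes the inner-CV cost discussed above.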
Table 2
The running time of the six algorithms.

                                                  OLS    LASSO   Ridge   Elastic-net   LSVR     RVR
Whole-brain rsFC feature-based prediction
  Pre-determined parameters (λ/C = 1, α = 0.5)    11s    5s      8s      7s            252s     60s
  Parameters determined with inner 5F-CVs         -      2049s   124s    40106s        11595s   -
Whole-brain rsFCS feature-based prediction
  Pre-determined parameters (λ/C = 1, α = 0.5)    1s     1s      1s      1s            1s       12s
  Parameters determined with inner 5F-CVs         -      17s     5s      196s          1179s    -

Notably, some software-related bias might be introduced into our results, given that the six algorithms were implemented in different software.

On the other hand, OLS regression, ridge regression, LASSO regression, and elastic-net regression are conceptually the same types of algorithms. Specifically, OLS regression fits parameters by minimizing the squared loss function. LASSO regression and ridge regression are based on OLS but further apply L1-norm and L2-norm regularization, respectively, on the squared loss function. Elastic-net regression applies both L1-norm and L2-norm regularization. The common loss function might contribute to the correlated individual predicted scores among these four algorithms.

Regarding the spatial pattern of feature importance, the similarities among the six algorithms were on average higher for rsFC-based predictions than for rsFCS-based predictions, which is possibly related to the overall higher prediction accuracy of rsFC-based predictions. Particularly, LASSO regression has the poorest performance among the rsFC-based predictions, and the correlation of its spatial pattern of feature importance with the other algorithms is quite low. Consistently, OLS regression had the poorest performance in rsFCS-based prediction and corresponded to very low correlations of the spatial pattern of feature importance with the other algorithms. It should be noted that the algorithm similarities in terms of individual predictions and the spatial pattern of feature importance are two relatively independent measures. A high similarity of individual predictions between two algorithms does not necessarily indicate a similar spatial pattern of feature importance, as an algorithm can achieve similar predictions with different combinations of features.

It is worth noting that the currently observed discriminative regions from both rsFC- and rsFCS-based GSDT prediction have largely been found to be hand-motor related (the main functional domain of the GSDT) in previous studies. For example, the supplementary motor area and insula were found to be activated in a hand grip task (Ward et al., 2008), and the degree of activation of the primary motor area and insula was correlated with performance on a hand dynamometer (Loubinoux et al., 2007). In hand movement tasks, the primary motor cortex and superior parietal area were involved (Tombari et al., 2004). Finally, the basal ganglia, thalamus, and frontal cortex were important for the function of precision grip force control (Prodoehl et al., 2009; Spraker et al., 2007; Vaillancourt et al., 2007), which is highly related to the GSDT.

Sample size effect

A large sample size is always intuitively desired for ML classification or regression studies, and a larger sample size theoretically can minimize the empirical risk, provided that the data are sampled from the same distribution (Vapnik, 2000). A small sample size may not fully represent the entire spectrum of a population, therefore limiting the generalizability of the predicted results to other independent sample sets. In addition, many machine learning methods suffer from model overfitting if the number of data samples is limited, further causing failure of generalization to independent datasets (Pereira et al., 2009). The effect of sample size on neuroimaging-based ML classification performance has been explored, and larger sample sizes can increase classification accuracy (Chu et al., 2012; Klöppel et al., 2008). Moreover, neuroimaging-based model classification was reported to be unstable for small training sample sizes (Nieuwenhuis et al., 2012). Complementary to these ML classification studies, our present study provides strong evidence of the sample size effect on the prediction performance of ML regression. Our main findings are highly consistent with those of ML classification: average prediction accuracies improved with increasing sample sizes, regardless of the algorithm. Additionally, the variance of the prediction accuracies across the 50 sampling subsets markedly decreased for larger sample sizes, consistent with the unstable prediction performance for small sample sizes. It should be noted that the average accuracies of the rsFCS feature for predicting the ORRT, PVT and VSPLOT scores did not appear to increase significantly with increasing sample size, while their variances decreased like those of the other predictions. This observation may reflect an inherently limited power of the rsFCS to predict these behavioral scores, accounting for poor prediction accuracies regardless of sample size. An improved prediction performance with a larger sample size therefore should not be taken for granted, and the inherent power of the features for predicting a specific behavior plays an important role in this matter.

Importantly, our present study further revealed that the ML regression prediction accuracy and its stability (standard deviation) were both exponentially related to the sample size. That is, greater improvements in prediction accuracy and its stability are achieved when the sample size is increased from an initially small sample size, whereas smaller improvements are observed when the sample size is increased from an initially large sample size. According to Fig. 3, the average prediction accuracy and its stability appear to plateau at sample sizes of 200–300, regardless of the algorithm. Therefore, a minimum sample size of 200 is recommended for ML regression prediction studies of behavior/cognition. Although achieving this sample size may be challenging for some researchers, the increasing prevalence of multi-site data sharing and big-project data release in the neuroimaging community, e.g., the HCP (http://www.humanconnectome.org/) and the UK BioBank (https://www.ukbiobank.ac.uk/), can be utilized to address these sample size issues.

Limitations

A few issues related to the current study should be noted. First, the prediction accuracy across all regression algorithms showed relatively low r values, even with a large sample size (rsFC feature, mean value: 0.1–0.5; rsFCS feature, mean value: 0.1–0.3), while most of them are significantly higher than chance (Supplementary Fig. 7). Technically, a specific feature pre-processing or data-driven feature selection procedure might be useful to improve prediction accuracy, with the actual degree of improvement depending on the specific regression algorithm. This potential interaction effect between these procedures and regression algorithms would confound the comparison of prediction performance among regression algorithms. Therefore, the present study did not include any of these accuracy-benefiting procedures, partly accounting for the relatively low r values. In addition to this technical consideration, the biologically limited power of rsFC/rsFCS features for predicting cognitive/behavioral scores might serve as another inherent source. Putatively, there exist very complex neural mechanisms underlying any human cognition/behavior, which involve a variety of brain features. The rsFC/rsFCS represent only a very limited aspect of human brain features; it is therefore unsurprising that rsFC/rsFCS features alone could at most capture only a small portion of the cognitive/behavioral variance, ultimately leading to a low r value. Combining rsFC/rsFCS with other brain features may effectively improve the prediction accuracy (i.e., a higher r value), which warrants further investigation in the future.

Next, the predictive features of the present study were confined to rsFC and rsFCS, with the rsFCS being included as an example lower-dimensional feature derived from rsFC. Compared with the rsFC, the rsFCS
feature overall performed remarkably worse in predictions, suggesting a significantly negative impact of this rsFC-to-rsFCS feature extraction/dimensionality reduction procedure. Caution therefore should be exercised whether the strategy of feature extraction/dimensionality reduction is pre-defined (e.g., rsFCS and local clustering coefficient) (Wee et al., 2012) or data-driven (e.g., singular value decomposition) (Zhan et al., 2015). In addition, other commonly used imaging features from structural MRI or diffusion MRI were not included in the present study. It remains unclear whether our currently observed algorithm and sample size effects can be generalized to individual behavioral/cognitive predictions using non-rsFC/rsFCS neuroimaging features, which needs to be explicitly explored in the future.

Finally, while the present study validated relevant findings across multiple aspects, all analyses were based on the HCP dataset. It is important to further validate our results with independent non-HCP datasets in the future to determine whether the fMRI protocol had an impact on our findings. Moreover, the present study compared only six machine learning regression algorithms that have been commonly used to date in the neuroimaging field. Obviously, there are a number of other widely used regression algorithms, e.g., partial least squares (Yoo et al., 2018), random forest (Kesler et al., 2017), Bayesian additive regression trees (BART), and ensemble learning. Future investigations are desired to evaluate those algorithms as well.

Conclusions

The present study revealed non-trivial effects of the chosen ML regression algorithm and sample size on behavioral/cognitive individual predictions using resting-state functional connectivity features. Among the commonly used algorithms evaluated here, OLS regression and LASSO regression should be applied with caution. Compared with the

References

Essen, D.C., Consortium, W.U.-M.H., 2013. Function in the human connectome: task-fMRI and individual differences in behavior. Neuroimage 80, 169–189.
Beucke, J.C., Sepulcre, J., Talukdar, T., Linnman, C., Zschenderlein, K., Endrass, T., Kaufmann, C., Kathmann, N., 2013. Abnormally high degree connectivity of the orbitofrontal cortex in obsessive-compulsive disorder. JAMA Psychiatry 70, 619–629.
Biswal, B., Yetkin, F.Z., Haughton, V.M., Hyde, J.S., 1995. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn. Reson. Med. 34, 537–541.
Braun, U., Plichta, M.M., Esslinger, C., Sauer, C., Haddad, L., Grimm, O., Mier, D., Mohnke, S., Heinz, A., Erk, S., Walter, H., Seiferth, N., Kirsch, P., Meyer-Lindenberg, A., 2012. Test-retest reliability of resting-state connectivity network characteristics using fMRI and graph theoretical measures. Neuroimage 59, 1404–1412.
Buckner, R.L., Sepulcre, J., Talukdar, T., Krienen, F.M., Liu, H., Hedden, T., Andrews-Hanna, J.R., Sperling, R.A., Johnson, K.A., 2009. Cortical hubs revealed by intrinsic functional connectivity: mapping, assessment of stability, and relation to Alzheimer's disease. J. Neurosci. 29, 1860–1873.
Carroll, M.K., Cecchi, G.A., Rish, I., Garg, R., Rao, A.R., 2009. Prediction and interpretation of distributed neural activity with sparse models. Neuroimage 44, 112–122.
Casanova, R., Hsu, F.C., Espeland, M.A., Alzheimer's Disease Neuroimaging, I., 2012. Classification of structural MRI images in Alzheimer's disease from the perspective of ill-posed problems. PLoS One 7, e44877.
Chang, C.C., Lin, C.J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intelligent Syst. Technol. 2, 27.
Yan, C.G., Zang, Y.F., 2010. DPARSF: a MATLAB toolbox for "pipeline" data analysis of resting-state fMRI. Front. Syst. Neurosci. 4, 13.
Chu, C., Hsu, A.-L., Chou, K.-H., Bandettini, P., Lin, C., 2012. Does feature selection improve classification accuracy? Impact of sample size and feature selection on classification using anatomical magnetic resonance images. Neuroimage 60, 59–70.
Chu, C., Ni, Y., Tan, G., Saunders, C.J., Ashburner, J., 2011. Kernel regression for fMRI pattern prediction. Neuroimage 56, 662–673.
Cui, Z., Su, M., Li, L., Shu, H., Gong, G., 2018. Individualized prediction of reading comprehension ability using gray matter volume. Cereb. Cortex 28, 1656–1672.
Dosenbach, N.U.F., Nardos, B., Cohen, A.L., Fair, D.A., Power, J.D., Church, J.A., Nelson, S.M., Wig, G.S., Vogel, A.C., Lessov-Schlaggar, C.N., 2010. Prediction of individual brain maturity using fMRI. Science 329, 1358–1361.
Dresler, M., Shirer, W.R., Konrad, B.N., Muller, N.C., Wagner, I.C., Fernandez, G., Czisch, M., Greicius, M.D., 2017. Mnemonic training reshapes brain networks to support superior memory. Neuron 93, 1227–1235 e1226.
Dubois, J., Adolphs, R., 2016. Building a science of individual differences from fMRI.
other algorithms, OLS regression is the fastest algorithm and performs Trends Cognit. Sci. 20, 425–443.
similarly when using rsFC feature, but underperforms when using rsFCS Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., 2004. Least angle regression. Ann. Stat.
32, 407–499.
feature. In contrast, the LASSO performs comparable with other algo- Erus, G., Battapady, H., Satterthwaite, T.D., Hakonarson, H., Gur, R.E., Davatzikos, C.,
rithms for rsFCS feature but underperforms for rsFC feature. Regardless Gur, R.C., 2015. Imaging patterns of brain development and their relationship to
of the algorithm, the prediction accuracy and its stability exponentially cognition. Cerebr. Cortex 25, 1676–1684.
Fan, L., Li, H., Zhuo, J., Zhang, Y., Wang, J., Chen, L., Yang, Z., Chu, C., Xie, S., Laird, A.R.,
increased with the sample size. These findings are of significant value for
Fox, P.T., Eickhoff, S.B., Yu, C., Jiang, T., 2016. The human brainnetome atlas: a new
understanding how the selected ML regression algorithm and sample size brain atlas based on connectional architecture. Cerebr. Cortex 26, 3508–3526.
influence individualized behavior/cognition predictions and for Finn, E.S., Shen, X., Scheinost, D., Rosenberg, M.D., Huang, J., Chun, M.M.,
Papademetris, X., Constable, R.T., 2015. Functional connectome fingerprinting:
choosing the most appropriate ML regression algorithm or sample size in
identifying individuals using patterns of brain connectivity. Nat. Neurosci. 18,
relevant investigations. 1664–1671.
Fox, M.D., Snyder, A.Z., Vincent, J.L., Corbetta, M., Van Essen, D.C., Raichle, M.E., 2005.
The human brain is intrinsically organized into dynamic, anticorrelated functional
Acknowledgements
networks. Proc. Natl. Acad. Sci. U. S. A. 102, 9673–9678.
Fox, M.D., Zhang, D., Snyder, A.Z., Raichle, M.E., 2009. The global signal and observed
This work was supported by the National Science Foundation of anticorrelated resting state brain networks. J. Neurophysiol. 101, 3270–3283.
China (81671772, 91732101), the 863 program (2015AA020912), the Franke, K., Ziegler, G., Kl€oppel, S., Gaser, C., 2010. Estimating the age of healthy subjects
from T1-weighted MRI scans using kernel methods: exploring the influence of various
Fundamental Research Funds for the Central Universities. Data were parameters. Neuroimage 50, 883–892.
provided by the Human Connectome Project, WU-Minn Consortium Fransson, P., 2005. Spontaneous low-frequency BOLD signal fluctuations: an fMRI
(Principal Investigators: David Van Essen and Kamil Ugurbil; investigation of the resting-state default mode of brain function hypothesis. Hum.
Brain Mapp. 26, 15–29.
1U54MH091657) funded by the 16 NIH Institutes and Centers that Friston, K.J., 1994. Functional and effective connectivity in neuroimaging: a synthesis.
support the NIH Blueprint for Neuroscience Research; and by the Hum. Brain Mapp. 2, 56–78.
McDonnell Center for Systems Neuroscience at Washington University. Friston, K.J., Williams, S., Howard, R., Frackowiak, R.S., Turner, R., 1996. Movement-
related effects in fMRI time-series. Magn. Reson. Med. 35, 346–355.
The authors thank Diego G. Davila for English editing. The authors have Gabrieli, J.D.E., Ghosh, S.S., Whitfield-Gabrieli, S., 2015. Prediction as a humanitarian
no conflicts of interest to declare. and pragmatic contribution from human cognitive neuroscience. Neuron 85, 11–26.
Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B., Andersson, J.L.,
Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., Van Essen, D.C., Jenkinson, M., W. U-
Appendix A. Supplementary data Minn HCP Consortium, 2013. The minimal preprocessing pipelines for the Human
Connectome Project. Neuroimage 80, 105–124.
Supplementary data related to this article can be found at https://doi. Gong, Q.Y., Li, L.J., Du, M.Y., Pettersson-Yeo, W., Crossley, N., Yang, X., Li, J.,
Huang, X.Q., Mechelli, A., 2014. Quantitative prediction of individual
org/10.1016/j.neuroimage.2018.06.001. psychopathology in trauma survivors using resting-state fMRI.
Neuropsychopharmacology 39, 681–687.
References Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning.
Springer series in statistics, New York.
Haynes, J.D., 2015. A primer on pattern-based approaches to fMRI: principles, pitfalls,
Arbabshirani, M.R., Plis, S., Sui, J., Calhoun, V.D., 2017. Single subject prediction of brain
and perspectives. Neuron 87, 257–270.
disorders in neuroimaging: promises and pitfalls. Neuroimage 145, 137–165.
Hoerl, A.E., Kennard, R.W., 1970. Ridge regression: biased estimation for nonorthogonal
Barch, D.M., Burgess, G.C., Harms, M.P., Petersen, S.E., Schlaggar, B.L., Corbetta, M.,
problems. Technometrics 12, 55–67.
Glasser, M.F., Curtiss, S., Dixit, S., Feldt, C., Nolan, D., Bryant, E., Hartley, T.,
Footer, O., Bjork, J.M., Poldrack, R., Smith, S., Johansen-Berg, H., Snyder, A.Z., Van
Z. Cui, G. Gong NeuroImage 178 (2018) 622–637
Hsu, C.-W., Chang, C.-C., Lin, C.-J., 2003. A Practical Guide to Support Vector Classification.
Kesler, S.R., Rao, A., Blayney, D.W., Oakley-Girvan, I.A., Karuturi, M., Palesh, O., 2017. Predicting long-term cognitive outcome following breast cancer with pre-treatment resting state fMRI and random forest machine learning. Front. Hum. Neurosci. 11, 555.
Klöppel, S., Stonnington, C.M., Chu, C., Draganski, B., Scahill, R.I., Rohrer, J.D., Fox, N.C., Ashburner, J., Frackowiak, R.S., 2008. A plea for confidence intervals and consideration of generalizability in diagnostic studies. Brain 132, e102.
Kragel, P.A., Carter, R.M.K., Huettel, S.A., 2012. What makes a pattern? Matching decoding methods to data in multivariate pattern analysis. Front. Neurosci. 6.
Legendre, P., Legendre, L.F., 2012. Numerical Ecology. Elsevier.
Liao, X., Cao, M., Xia, M., He, Y., 2017. Individual differences and time-varying features of modular brain architecture. Neuroimage 152, 94–107.
Liu, J., Xia, M., Dai, Z., Wang, X., Liao, X., Bi, Y., He, Y., 2017. Intrinsic brain hub connectivity underlies individual differences in spatial working memory. Cereb. Cortex 27, 5496–5508.
Loubinoux, I., Dechaumont-Palacin, S., Castel-Lacanal, E., De Boissezon, X., Marque, P., Pariente, J., Albucher, J.F., Berry, I., Chollet, F., 2007. Prognostic value of FMRI in recovery of hand function in subcortical stroke patients. Cereb. Cortex 17, 2980–2987.
MacAusland, R., 2014. The Moore-Penrose Inverse and Least Squares. Math 420: Advanced Topics in Linear Algebra.
Mosic, D., Djordjevic, D.S., 2009. Moore-Penrose-invertible normal and Hermitian elements in rings. Linear Algebra Appl. 431, 732–745.
Mourao-Miranda, J., Bokde, A.L., Born, C., Hampel, H., Stetter, M., 2005. Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data. Neuroimage 28, 980–995.
Murphy, K., Birn, R.M., Handwerker, D.A., Jones, T.B., Bandettini, P.A., 2009. The impact of global signal regression on resting state correlations: are anti-correlated networks introduced? Neuroimage 44, 893–905.
Murphy, K., Fox, M.D., 2017. Towards a consensus regarding global signal regression for resting state functional connectivity MRI. Neuroimage 154, 169–173.
Nieuwenhuis, M., van Haren, N.E., Hulshoff Pol, H.E., Cahn, W., Kahn, R.S., Schnack, H.G., 2012. Classification of schizophrenia patients and healthy controls from structural MRI scans in two large independent samples. Neuroimage 61, 606–612.
Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V., 2006. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cognit. Sci. 10, 424–430.
Orrù, G., Pettersson-Yeo, W., Marquand, A.F., Sartori, G., Mechelli, A., 2012. Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci. Biobehav. Rev. 36, 1140–1152.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Pereira, F., Mitchell, T., Botvinick, M., 2009. Machine learning classifiers and fMRI: a tutorial overview. Neuroimage 45, S199–S209.
Prodoehl, J., Corcos, D.M., Vaillancourt, D.E., 2009. Basal ganglia mechanisms underlying precision grip force control. Neurosci. Biobehav. Rev. 33, 900–908.
Raizada, R.D., Tsao, F.M., Liu, H.M., Holloway, I.D., Ansari, D., Kuhl, P.K., 2010. Linking brain-wide multivoxel activation patterns to behaviour: examples from language and math. Neuroimage 51, 462–471.
Rosenberg, M.D., Finn, E.S., Scheinost, D., Papademetris, X., Shen, X., Constable, R.T., Chun, M.M., 2015. A neuromarker of sustained attention from whole-brain functional connectivity. Nat. Neurosci. 19, 165–171.
Ryali, S., Chen, T.W., Supekar, K., Menon, V., 2012. Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. Neuroimage 59, 3852–3861.
Schölkopf, B., Smola, A.J., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, Mass.
Schrouff, J., Rosa, M.J., Rondina, J.M., Marquand, A.F., Chu, C., Ashburner, J., Phillips, C., Richiardi, J., Mourao-Miranda, J., 2013. PRoNTo: pattern recognition for neuroimaging toolbox. Neuroinformatics 11, 319–337.
Shen, X., Finn, E.S., Scheinost, D., Rosenberg, M.D., Chun, M.M., Papademetris, X., Constable, R.T., 2017. Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nat. Protoc. 12, 506–518.
Shrout, P.E., Fleiss, J.L., 1979. Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86, 420–428.
Siegel, J.S., Ramsey, L.E., Snyder, A.Z., Metcalf, N.V., Chacko, R.V., Weinberger, K., Baldassarre, A., Hacker, C.D., Shulman, G.L., Corbetta, M., 2016. Disruptions of network connectivity predict impairment in multiple behavioral domains after stroke. Proc. Natl. Acad. Sci. U. S. A. 113, E4367–E4376.
Slotkin, J., Nowinski, C., Hays, R., Beaumont, J., Griffith, J., Magasi, S., Salsman, J., Gershon, R., 2012. NIH Toolbox Scoring and Interpretation Guide. National Institutes of Health, Washington (DC), pp. 6–7.
Smith, S.M., Nichols, T.E., Vidaurre, D., Winkler, A.M., Behrens, T.E., Glasser, M.F., Ugurbil, K., Barch, D.M., Van Essen, D.C., Miller, K.L., 2015. A positive-negative mode of population covariation links brain connectivity, demographics and behavior. Nat. Neurosci. 18, 1565–1567.
Smola, A.J., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14, 199–222.
Spraker, M.B., Yu, H., Corcos, D.M., Vaillancourt, D.E., 2007. Role of individual basal ganglia nuclei in force amplitude generation. J. Neurophysiol. 98, 821–834.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 267–288.
Tipping, M.E., 2001. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244.
Tombari, D., Loubinoux, I., Pariente, J., Gerdelat, A., Albucher, J.F., Tardy, J., Cassol, E., Chollet, F., 2004. A longitudinal fMRI study: in recovering and then in clinically stable sub-cortical stroke patients. Neuroimage 23, 827–839.
Ullman, H., Almeida, R., Klingberg, T., 2014. Structural maturation and brain activity predict future working memory capacity during childhood development. J. Neurosci. 34, 1592–1598.
Vaillancourt, D.E., Yu, H., Mayka, M.A., Corcos, D.M., 2007. Role of the basal ganglia and frontal cortex in selecting and producing internally guided force pulses. Neuroimage 36, 793–803.
Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K., WU-Minn HCP Consortium, 2013. The WU-Minn Human Connectome Project: an overview. Neuroimage 80, 62–79.
Van Essen, D.C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T.E., Bucholz, R., Chang, A., Chen, L., Corbetta, M., Curtiss, S.W., Della Penna, S., Feinberg, D., Glasser, M.F., Harel, N., Heath, A.C., Larson-Prior, L., Marcus, D., Michalareas, G., Moeller, S., Oostenveld, R., Petersen, S.E., Prior, F., Schlaggar, B.L., Smith, S.M., Snyder, A.Z., Xu, J., Yacoub, E., WU-Minn HCP Consortium, 2012. The Human Connectome Project: a data acquisition perspective. Neuroimage 62, 2222–2231.
Vapnik, V., 2000. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Vinod, H.D., 1978. A survey of ridge regression and related techniques for improvements over ordinary least squares. Rev. Econ. Stat. 121–131.
Wager, T.D., Atlas, L.Y., Lindquist, M.A., Roy, M., Woo, C.W., Kross, E., 2013. An fMRI-based neurologic signature of physical pain. N. Engl. J. Med. 368, 1388–1397.
Wang, Y., Fan, Y., Bhatt, P., Davatzikos, C., 2010. High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables. Neuroimage 50, 1519–1535.
Ward, N.S., Swayne, O.B., Newton, J.M., 2008. Age-dependent changes in the neural correlates of force modulation: an fMRI study. Neurobiol. Aging 29, 1434–1446.
Wee, C.Y., Yap, P.T., Zhang, D., Denny, K., Browndyke, J.N., Potter, G.G., Welsh-Bohmer, K.A., Wang, L., Shen, D., 2012. Identification of MCI individuals using structural and functional connectivity networks. Neuroimage 59, 2045–2056.
Xia, M., Wang, J., He, Y., 2013. BrainNet Viewer: a network visualization tool for human brain connectomics. PLoS One 8, e68910.
Yan, C.G., Wang, X.D., Zuo, X.N., Zang, Y.F., 2016. DPABI: data processing & analysis for (resting-state) brain imaging. Neuroinformatics 14, 339–351.
Yoo, K., Rosenberg, M.D., Hsu, W.T., Zhang, S., Li, C.R., Scheinost, D., Constable, R.T., Chun, M.M., 2018. Connectome-based predictive modeling of attention: comparing different functional connectivity features and prediction methods across datasets. Neuroimage 167, 11–22.
Zalesky, A., Fornito, A., Cocchi, L., Gollo, L.L., Breakspear, M., 2014. Time-resolved resting-state brain networks. Proc. Natl. Acad. Sci. U. S. A. 111, 10341–10346.
Zhan, L., Liu, Y., Wang, Y., Zhou, J., Jahanshad, N., Ye, J., Thompson, P.M., Alzheimer's Disease Neuroimaging Initiative, 2015. Boosting brain connectome classification accuracy in Alzheimer's disease using higher-order singular value decomposition. Front. Neurosci. 9, 257.
Zhao, T., Duan, F., Liao, X., Dai, Z., Cao, M., He, Y., Shu, N., 2015. Test-retest reliability of white matter structural brain networks: a multiband diffusion MRI study. Front. Hum. Neurosci. 9, 59.
Zhong, S., He, Y., Gong, G., 2015. Convergence and divergence across construction methods for human brain white matter networks: an assessment based on individual differences. Hum. Brain Mapp. 36, 1995–2013.
Zou, H., Hastie, T., 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B Stat. Methodol. 67, 301–320.
Zuo, X.N., Ehmke, R., Mennes, M., Imperati, D., Castellanos, F.X., Sporns, O., Milham, M.P., 2012. Network centrality in the human functional connectome. Cereb. Cortex 22, 1862–1875.