A Stepwise Approach For High-Dimensional Gaussian Graphical Models
Francisco Nogales
Universidad Carlos III de Madrid, Spain
Marcelo Ruiz
Universidad Nacional de Río Cuarto, Argentina
Ruben Zamar
University of British Columbia, Canada
January 12, 2020
Abstract
We present a stepwise approach to estimate high-dimensional Gaussian graphical models.
We exploit the relation between the partial correlation coefficients and the distribution of
the prediction errors, and parametrize the model in terms of the Pearson correlation coeffi-
cients between the prediction errors of the nodes’ best linear predictors. We propose a novel
stepwise algorithm for detecting pairs of conditionally dependent variables. We compare the
proposed algorithm with existing methods including graphical lasso (Glasso), constrained
ℓ1-minimization (CLIME) and equivalent partial correlation (EPC), via simulation studies
and real life applications. In our simulation study we consider several model settings and
report the results using different performance measures that look at desirable features of the
recovered graph.
Keywords: Covariance Selection, Gaussian Graphical Model, Forward and Backward Selection,
Partial Correlation Coefficient.
* The authors thank the generous support of NSERC, Canada, the Institute of Financial Big Data, University Carlos III of Madrid, and the CSIC, Spain.
1 Introduction
High-dimensional Gaussian graphical models (GGM) are widely used in practice to represent the
linear dependency between variables. The underlying idea in GGM is to measure linear dependen-
cies by estimating partial correlations to infer whether there is an association between a given pair
of variables, conditionally on the remaining ones. Moreover, there is a close relation between the
nonzero partial correlation coefficients and the nonzero entries in the inverse of the covariance ma-
trix. Covariance selection procedures take advantage of this fact to estimate the GGM conditional
dependence structure given a sample (Dempster, 1972; Lauritzen, 1996; Edwards, 2000).
When the dimension p is larger than the number n of observations, the sample covariance
matrix S is not invertible and the maximum likelihood estimate (MLE) of Σ does not exist. When
p/n ≤ 1, but close to 1, S is invertible but ill-conditioned, increasing the estimation error (Ledoit
and Wolf, 2004). To deal with this problem, several covariance selection procedures have been
proposed based on the assumption that the inverse of the covariance matrix, Ω, called the precision matrix, is sparse. In this paper we propose a forward-backward algorithm, which we call StepGraph. Our procedure takes advantage of the relation between the partial correlation and the Pearson correlation coefficient of the residuals.
Existing methods to estimate the GGM can be classified into three classes: nodewise regression methods, maximum likelihood methods and limited-order partial correlation methods. The
nodewise regression method was proposed by Meinshausen and Bühlmann (2006). This method
estimates a lasso regression for each node in the graph. See for example Peng et al. (2009), Yuan
(2010), Liu and Wang (2012), Zhou et al. (2011) and Ren et al. (2015). Penalized likelihood methods include Yuan and Lin (2007), Banerjee et al. (2008), Friedman et al. (2008), Johnson et al. (2011) and Ravikumar et al. (2011), among others. Cai et al. (2011) propose an estimator called CLIME that estimates the precision matrix by solving a constrained ℓ1-minimization problem. Limited-order partial correlation procedures use lower-order partial correlations
to test for conditional independence relations. See Spirtes et al. (2000), Kalisch and Bühlmann
(2007), Rütimann et al. (2009), Liang et al. (2015) and Huang et al. (2016).
The rest of the article is organized as follows. Section 2 introduces the stepwise approach
along with some notation. Section 3 gives simulations results and a real data example. Section 4
presents some concluding remarks. Appendix A reports a detailed description of the cross-validation procedure used to determine the required parameters in our StepGraph algorithm, and Appendix B reports additional simulation results.

2 The stepwise approach
In this section we review some definitions and technical concepts needed later on. Let G = (V, E)
be a graph where V ≠ ∅ is the set of nodes or vertices and E ⊆ V × V = V² is the set of edges.
For simplicity we assume that V = {1, . . . , p}. The graph G is undirected, that is, (i, j) ∈ E if
and only if (j, i) ∈ E. Two nodes i and j are called connected, adjacent or neighbors if (i, j) ∈ E.
A graphical model (GM) is a graph such that V indexes a set of variables {X_1, . . . , X_p} and E is defined by
$$(i, j) \notin E \text{ if and only if } X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}. \qquad (1)$$
Here ⊥⊥ denotes conditional independence.
For each node i ∈ V, let
$$A_i = \{ j \in V : (i, j) \in E \} \qquad (2)$$
denote its neighborhood. Notice that A_i gives the nodes directly connected with i, and therefore a GM can be effectively described by the family of neighborhoods A_1, . . . , A_p.
We further assume that (X_1, . . . , X_p)^⊤ ∼ N(0, Σ), where Σ = (σ_{ij})_{i,j=1,...,p} is a positive-definite covariance matrix. In this case the graph is called a Gaussian graphical model (GGM). The matrix Ω = Σ^{-1} is called the precision matrix.
There exists an extensive literature on GM and GGM. For a detailed treatment of the theory
see for instance Lauritzen (1996), Edwards (2000), and Bühlmann and Van De Geer (2011).
In a GGM the set of edges E represents the conditional dependence structure of the vector (X_1, . . . , X_p)^⊤. To characterize E we need some well-known results from classical multivariate analysis. For an exhaustive treatment of these results see, for instance, Lauritzen (1996).
Fix a pair of nodes i ≠ l and set X_1 = (X_i, X_l)^⊤ and X_2 = X_{V∖{i,l}}. Note that X = (X_1^⊤, X_2^⊤)^⊤ has multivariate normal distribution with mean 0 and covariance matrix
$$\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \qquad (3)$$
such that Σ_{11} has dimension 2 × 2, Σ_{12} has dimension 2 × (p − 2) and so on. The matrix in (3) is a partition of a permutation of the original covariance matrix Σ, and will also be denoted by Σ.
Moreover, we set
$$\Omega = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.$$
Then, by (B.2) of Lauritzen (1996), the blocks Ω_{ij} can be written explicitly in terms of Σ_{ij} and Σ_{ij}^{-1}. In particular,
$$\Omega_{11} = \left( \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right)^{-1}, \quad \text{where} \quad \Omega_{11} = \begin{pmatrix} \omega_{ii} & \omega_{il} \\ \omega_{li} & \omega_{ll} \end{pmatrix}$$
is the submatrix of Ω with rows i and l and columns i and l. Hence,
$$\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Omega_{11}^{-1} = \frac{1}{\omega_{ii}\omega_{ll} - \omega_{il}\omega_{li}} \begin{pmatrix} \omega_{ll} & -\omega_{il} \\ -\omega_{li} & \omega_{ii} \end{pmatrix} \qquad (4)$$
and
$$\operatorname{corr}\left( X_i, X_l \mid X_{V \setminus \{i,l\}} \right) = -\frac{\omega_{il}}{\sqrt{\omega_{ii}\,\omega_{ll}}}. \qquad (5)$$
This gives the standard parametrization of E in terms of the support of the precision matrix Ω:
$$\operatorname{supp}(\Omega) = \left\{ (i, l) \in V^2 : \omega_{il} \neq 0,\ i \neq l \right\}. \qquad (6)$$
We now introduce another parametrization of E, which we need to define and implement our stepwise algorithm. Let X̂_1 = β^⊤X_2, with β = Σ_{22}^{-1}Σ_{21}, be the best linear predictor of X_1 based on X_2. Consider the prediction error
$$\varepsilon = X_1 - \widehat X_1 = X_1 - \beta^\top X_2$$
and let ε_i and ε_l denote the entries of ε (i.e. ε^⊤ = (ε_i, ε_l)). The regression error ε is independent of X̂_1 and has normal distribution with mean 0 and covariance matrix Ψ_{11} with elements denoted by
$$\Psi_{11} = \begin{pmatrix} \psi_{ii} & \psi_{il} \\ \psi_{li} & \psi_{ll} \end{pmatrix}. \qquad (7)$$
A straightforward calculation shows that
$$\Psi_{11} = \operatorname{cov}(X_1) + \operatorname{cov}\left(\widehat X_1\right) - 2\operatorname{cov}\left(X_1, \widehat X_1\right) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$
Therefore, by this equality, (4) and (5), the partial correlation coefficient and the conditional correlation coincide:
$$\rho_{il\cdot V\setminus\{i,l\}} = \operatorname{corr}\left( X_i, X_l \mid X_{V\setminus\{i,l\}} \right) = \frac{\psi_{il}}{\sqrt{\psi_{ii}\,\psi_{ll}}}.$$
Summarizing, the problem of determining the conditional dependence structure in a GGM (represented by E) is equivalent to finding the pairs of nodes of V that belong to the set
$$\left\{ (i, l) \in V^2 : \rho_{il\cdot V\setminus\{i,l\}} \neq 0,\ i \neq l \right\},$$
which is equal to the support of the precision matrix, supp(Ω), defined by (6).
Remark 1 As noticed above, under normality, partial and conditional correlation are the same (see Lawrance, 1976).
Remark 2 Let β_{i,l} be the regression coefficient of X_l in the regression of X_i versus X_{V∖{i}} and, similarly, let β_{l,i} be the regression coefficient of X_i in the regression of X_l versus X_{V∖{l}}. Then it follows that
$$\rho_{il\cdot V\setminus\{i,l\}} = \operatorname{sign}\left( \beta_{l,i} \right) \sqrt{\beta_{l,i}\,\beta_{i,l}}.$$
This allows for another popular parametrization of E. Moreover, let ε_i be the error term in the regression of the ith variable on the remaining ones. Then by Lemma 1 in Peng et al. (2009) we have that cov(ε_i, ε_l) = ω_{il}/(ω_{ii}ω_{ll}) and var(ε_i) = 1/ω_{ii}.
Conditionally on its neighbors, X_i is independent of all the other variables. Therefore, given a pair of nodes i and l, the partial correlation between X_i and X_l can be obtained by the following procedure: (i) regress X_i on X_{A_i} and compute the regression residual ε_i; (ii) regress X_l on X_{A_l} and compute the regression residual ε_l; (iii) calculate the Pearson correlation between ε_i and ε_l.
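To fix ideas, this residual-based computation can be carried out as in the following R sketch (an illustration of steps (i)-(iii) with a toy 4-variable precision matrix of our own choosing, conditioning on all remaining variables):

```r
# Illustration: partial correlation of X1 and X2 recovered from regression
# residuals, on data simulated from a toy 4-variable GGM with one edge (1,2).
set.seed(1)
p <- 4; n <- 5000
Omega <- diag(p); Omega[1, 2] <- Omega[2, 1] <- 0.4   # toy precision matrix
Sigma <- solve(Omega)
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)       # n draws from N(0, Sigma)

e1 <- resid(lm(X[, 1] ~ X[, -(1:2)]))                 # (i)  residuals of X1 on the rest
e2 <- resid(lm(X[, 2] ~ X[, -(1:2)]))                 # (ii) residuals of X2 on the rest
cor(e1, e2)                                           # (iii) close to -0.4
-Omega[1, 2] / sqrt(Omega[1, 1] * Omega[2, 2])        # closed form from (5): -0.4
```

The empirical correlation of the residuals matches the closed-form partial correlation up to sampling error.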
This reasoning motivates the StepGraph algorithm. At each step k of StepGraph, we have a working family of estimated neighborhoods Â_1^k, . . . , Â_p^k. For every pair of currently non-connected nodes j and l we expect, under this working assumption, that the empirical partial correlation coefficient ρ̂_{jl·Â_j^k} is close to zero. If
the maximum absolute partial correlation computed this way is large, then we conclude that the
working system of neighborhoods needs to be updated. We then add the most likely new edge,
the one with the largest partial correlation. This constitutes the forward step. In the backward
step, if the minimum absolute partial correlation coefficient between presently connected nodes j and l falls below the backward threshold, the corresponding edge is deleted. More precisely, the algorithm proceeds as follows.

Input: the (centered) data {x_1, . . . , x_n}, and the forward and backward thresholds α_f and α_b.

Initialization: set k = 0 and Â_j^0 = ∅ for all j ∈ V.

Iteration Step. Given Â_1^k, Â_2^k, . . . , Â_p^k we compute Â_1^{k+1}, Â_2^{k+1}, . . . , Â_p^{k+1} as follows.

1. Forward step. For every pair (j, l) with j ∈ V and l ∉ Â_j^k:

(a) Regress the jth variable on the variables with subscript in the set Â_j^k and compute the regression residuals e_j^k = (e_{1j}^k, e_{2j}^k, . . . , e_{nj}^k).

(b) Regress the lth variable on the variables with subscript in the set Â_l^k and compute the regression residuals e_l^k = (e_{1l}^k, e_{2l}^k, . . . , e_{nl}^k).

(c) Obtain the partial correlation f_{jl}^k by calculating the Pearson correlation between e_j^k and e_l^k.

If
$$\max_{l \notin \widehat A_j^k,\, j \in V} \bigl| f_{jl}^k \bigr| = \bigl| f_{j_0 l_0}^k \bigr| \geq \alpha_f,$$
add the edge (j_0, l_0), that is, set Â_{j_0}^{k+1} = Â_{j_0}^k ∪ {l_0} and Â_{l_0}^{k+1} = Â_{l_0}^k ∪ {j_0}. If
$$\max_{l \notin \widehat A_j^k,\, j \in V} \bigl| f_{jl}^k \bigr| = \bigl| f_{j_0 l_0}^k \bigr| < \alpha_f, \quad \text{stop.}$$

2. Backward step. For every pair (j, l) with l ∈ Â_j^{k+1}:

(a) Regress the jth variable on the variables with subscript in the set Â_j^{k+1} ∖ {l} and compute the regression residuals r_j^k = (r_{1j}^k, r_{2j}^k, . . . , r_{nj}^k).

(b) Regress the lth variable on the variables with subscript in the set Â_l^{k+1} ∖ {j} and compute the regression residuals r_l^k = (r_{1l}^k, r_{2l}^k, . . . , r_{nl}^k).

(c) Compute the partial correlation b_{jl}^k by calculating the Pearson correlation between r_j^k and r_l^k.

If
$$\min_{l \in \widehat A_j^{k+1},\, j \in V} \bigl| b_{jl}^k \bigr| = \bigl| b_{j_0 l_0}^k \bigr| \leq \alpha_b,$$
delete the edge (j_0, l_0), that is, remove l_0 from Â_{j_0}^{k+1} and j_0 from Â_{l_0}^{k+1}.

Output: the final family of estimated neighborhoods Â_1, . . . , Â_p (equivalently, the estimated edge set) and the residual vectors e_i, for i = 1, . . . , p, where e_i is the vector of the prediction errors in the regression of the ith variable on the variables in Â_i.
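A schematic R implementation of one full forward-backward iteration is sketched below. This is an illustrative simplification under our own naming (the functions step_graph and resid_on are not part of any package): residuals are recomputed from scratch at every step and the optional screening step is omitted.

```r
# Illustrative sketch of the StepGraph forward-backward iteration.
# X: centered n x p data matrix; returns a list of estimated neighborhoods.
resid_on <- function(X, j, nbrs) {
  if (length(nbrs) == 0) return(X[, j])        # empty neighborhood: residual is X_j
  resid(lm(X[, j] ~ X[, nbrs, drop = FALSE]))
}

step_graph <- function(X, alpha_f, alpha_b, max_steps = 500) {
  p <- ncol(X)
  A <- rep(list(integer(0)), p)                # k = 0: all neighborhoods empty
  for (k in seq_len(max_steps)) {
    # Forward step: residuals given current neighborhoods, then strongest new edge
    E <- sapply(seq_len(p), function(j) resid_on(X, j, A[[j]]))
    f <- abs(cor(E)); diag(f) <- 0
    for (j in seq_len(p)) f[j, A[[j]]] <- 0    # consider non-connected pairs only
    if (max(f) < alpha_f) break                # stopping rule
    jl <- which(f == max(f), arr.ind = TRUE)[1, ]
    A[[jl[1]]] <- union(A[[jl[1]]], jl[2])
    A[[jl[2]]] <- union(A[[jl[2]]], jl[1])
    # Backward step: weakest currently connected pair, leaving the pair out
    b <- matrix(Inf, p, p)
    for (j in seq_len(p)) for (l in A[[j]]) if (j < l)
      b[j, l] <- abs(cor(resid_on(X, j, setdiff(A[[j]], l)),
                         resid_on(X, l, setdiff(A[[l]], j))))
    if (min(b) <= alpha_b) {
      jl <- which(b == min(b), arr.ind = TRUE)[1, ]
      A[[jl[1]]] <- setdiff(A[[jl[1]]], jl[2])
      A[[jl[2]]] <- setdiff(A[[jl[2]]], jl[1])
    }
  }
  A
}
```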
2.4 Threshold selection by cross-validation

Suppose we have n observations. We randomly partition the dataset {x_i}_{1≤i≤n} into K disjoint subsets of approximately equal sizes, the tth subset being of size n_t ≥ 2 with ∑_{t=1}^{K} n_t = n. For every t, let {x_i^{(t)}}_{1≤i≤n_t} be the tth validation subset, and its complement {x̃_i^{(t)}}_{1≤i≤n−n_t} the tth training subset. For every t and for every pair (α_f, α_b) of threshold parameters, let Â_1^{(t)}, . . . , Â_p^{(t)} be the estimated neighborhoods given by StepGraph using the tth training subset. For every j = 1, . . . , p, let β̂_{Â_j^{(t)}} be the estimated coefficient of the regression of the variable X_j on the neighborhood Â_j^{(t)}.

Consider now the tth validation subset. For every j, using β̂_{Â_j^{(t)}}, we obtain the vector of predicted values X̂_j^{(t)}(α_f, α_b) and define the K-fold cross-validation function
$$CV(\alpha_f, \alpha_b) = \frac{1}{n} \sum_{t=1}^{K} \sum_{j=1}^{p} \left\| X_j^{(t)} - \widehat X_j^{(t)}(\alpha_f, \alpha_b) \right\|^2,$$
where ‖·‖ is the L2-norm or Euclidean distance in R^{n_t}. Hence the K-fold cross-validation forward-backward thresholds (α̂_f, α̂_b) are
$$(\widehat\alpha_f, \widehat\alpha_b) := \operatorname*{argmin}_{(\alpha_f, \alpha_b) \in H} CV(\alpha_f, \alpha_b),$$
where H is a grid of ordered pairs (α_f, α_b) in [0, 1] × [0, 1] over which we perform the search. Further details are given in Appendix A.
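Schematically, the grid search can be implemented as in the following R sketch (illustrative only; it reuses the step_graph function sketched above, and the grid H shown in the usage comment is arbitrary):

```r
# Illustrative K-fold cross-validation for (alpha_f, alpha_b).
cv_stepgraph <- function(X, grid, K = 5) {
  n <- nrow(X); p <- ncol(X)
  fold <- sample(rep(seq_len(K), length.out = n))   # random partition into K subsets
  score <- apply(grid, 1, function(ab) {
    cv <- 0
    for (t in seq_len(K)) {
      tr <- X[fold != t, , drop = FALSE]            # t-th training subset
      va <- X[fold == t, , drop = FALSE]            # t-th validation subset
      A  <- step_graph(tr, ab[1], ab[2])
      for (j in seq_len(p)) {
        pred <- 0                                   # empty neighborhood: predict 0 (centered data)
        if (length(A[[j]]) > 0) {
          beta <- coef(lm(tr[, j] ~ tr[, A[[j]], drop = FALSE] - 1))
          pred <- va[, A[[j]], drop = FALSE] %*% beta
        }
        cv <- cv + sum((va[, j] - pred)^2)          # validation prediction error
      }
    }
    cv / n
  })
  grid[which.min(score), ]                          # (alpha_f, alpha_b) minimizing CV
}

# Usage with an arbitrary grid H:
# H <- as.matrix(expand.grid(alpha_f = seq(0.05, 0.50, 0.05),
#                            alpha_b = seq(0.01, 0.30, 0.04)))
# best <- cv_stepgraph(X, H, K = 5)
```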
2.5 Example
To illustrate the algorithm we consider the GGM with 16 edges given in the first panel of Figure 1. We draw n = 1000 independent observations from this model (see the next section for details). The values of the threshold parameters, α_f = 0.17 and α_b = 0.09, are determined by 5-fold cross-validation. The figure also displays the selected pairs of edges at each step in a sequence of successive updates of Â_j^k, for k = 1, 4, 9, 12 and the final step k = 16, showing that the estimated graph recovers the true model.
[Figure 1: The true GGM with p = 20 nodes and 16 edges (first panel), and the StepGraph estimates after steps k = 1, 4, 9, 12 and the final step k = 16.]
3 Numerical results and real data example
In this section we report the results of a Monte Carlo simulation study and a numerical experiment using real data.
3.1 Simulation study

Simulated Models
We consider three dimension values p = 50, 100, 150 and three different models for Ω:
Model 1. Autoregressive model of order 1, denoted AR(1). In this case Σ_{ij} = 0.4^{|i−j|} for i, j = 1, . . . , p.
Model 2. Nearest-neighbors model of order 2, denoted NN(2). For each node we randomly select two neighbors and choose the corresponding pairs of symmetric entries of Ω using the NeighborOmega procedure.
Model 3. Block-diagonal matrix model with q blocks of size p/q, denoted BG. For p = 50, 100 and 150, we use q = 10, 20 and 30 blocks, respectively, each block being of size p/q = 5.
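For instance, a sample from Model 1 can be generated as in the following R sketch (illustrative; the NN(2) and BG generators are analogous and omitted here):

```r
# Illustrative generation of one sample from Model 1 (AR(1)): Sigma_ij = 0.4^|i-j|.
set.seed(123)
p <- 100; n <- 100
Sigma <- 0.4^abs(outer(1:p, 1:p, "-"))            # AR(1) covariance matrix
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)   # n draws from N(0, Sigma)
Omega <- solve(Sigma)                             # its tridiagonal precision matrix
```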
For each p and each model we generate R = 50 random samples of size n = 100. These graph
models are widely used in the genetic literature to model gene expression data. See for example
Lee and Liu (2015) and Li and Gui (2006). Figure 2 displays graphs from Models 1-3 with
p = 100 nodes.
Figure 2: Graphs of AR(1), NN(2) and BG graphical models for p = 100 nodes.
Methods

We compare StepGraph with the graphical lasso (Glasso), constrained ℓ1-minimization for inverse matrix estimation (CLIME) and equivalent partial correlation (EPC), proposed by Friedman et al. (2008), Cai et al. (2011) and Liang et al. (2015), respectively. More precisely, we consider the following procedures.

1. The Glasso method, which maximizes the ℓ1-penalized Gaussian log-likelihood. In our simulations and examples we use the R-package CVglasso with the tuning parameter selected by cross-validation.

2. The CLIME method, which estimates Ω by solving
$$\min |\Omega|_1 \quad \text{subject to} \quad |S\Omega - I|_\infty \leq \lambda,$$
where S is the sample covariance, I is the identity matrix, |·|_∞ is the elementwise ℓ∞ norm, and λ is a tuning parameter. For computations, we use the R-package clime with the tuning parameter selected by cross-validation.

3. The EPC method, which performs multiple hypothesis tests based on an equivalent measure to the partial correlation coefficient. This method starts with a screening step to determine a reduced set of candidate edges. For computations, we use the R-package equSA.

4. The proposed method StepGraph, with the forward and backward thresholds, α_f > α_b, selected by cross-validation as described in Section 2.4 and Appendix A. StepGraph also allows for an optional screening step, as in EPC, and the resulting method is then denoted by StepGraph².
To evaluate the graph recovery we compute the Matthews correlation coefficient (Matthews, 1975)
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}, \qquad (11)$$
the Specificity = TN/(TN + FP) and the Sensitivity = TP/(TP + FN). Here TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. Larger values of MCC, Sensitivity and Specificity indicate better performances (Fan et al., 2009; Baldi et al., 2000).
To assess the estimation of the precision matrix we also consider the Kullback-Leibler divergence
$$D_{KL} = \frac{1}{2}\left\{ \operatorname{tr}\left( \Omega \widehat\Omega^{-1} \right) - \log\left[ \det\left( \Omega \widehat\Omega^{-1} \right) \right] - p \right\},$$
where Ω̂ is the estimate of Ω. Table 2 reports the summary measures mNKL and mF, based on this divergence and on the Frobenius norm, respectively.
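Both sets of measures can be computed with a few lines of R, as in the following sketch (illustrative helper functions with our own naming; the adjacency matrices are 0-1 and Omega_hat is assumed positive definite):

```r
# Illustrative computation of MCC, Sensitivity and Specificity from the true
# and estimated adjacency matrices, and of the Kullback-Leibler divergence.
graph_metrics <- function(adj_true, adj_hat) {
  up <- upper.tri(adj_true)                      # count each candidate edge once
  TP <- as.numeric(sum(adj_hat[up] == 1 & adj_true[up] == 1))
  TN <- as.numeric(sum(adj_hat[up] == 0 & adj_true[up] == 0))
  FP <- as.numeric(sum(adj_hat[up] == 1 & adj_true[up] == 0))
  FN <- as.numeric(sum(adj_hat[up] == 0 & adj_true[up] == 1))
  c(MCC = (TP * TN - FP * FN) /
          sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)),
    Sensitivity = TP / (TP + FN),
    Specificity = TN / (TN + FP))
}

kl_div <- function(Omega, Omega_hat) {           # D_KL as defined above
  M <- Omega %*% solve(Omega_hat)
  as.numeric(0.5 * (sum(diag(M)) - determinant(M)$modulus - ncol(M)))
}
```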
Results
Table 1 shows the MCC performance of the competing methods under Models 1-3. For Models 1 and 2, StepGraph and EPC clearly outperform the other two methods, with CLIME being only slightly better than Glasso. EPC is slightly better than StepGraph and worse than StepGraph². Moreover, the equSA package often crashes in the case of Model 3 (NA values reported in the table). Cai et al. (2011) pointed out that a procedure yielding a sparser Ω̂ is preferable because this facilitates interpretation of the data. The sensitivity and specificity results, reported in Table 4 in Appendix B, show that in general StepGraph, StepGraph² and EPC produce sparser graphs than CLIME and Glasso, yielding fewer false positives (higher specificity) but a few more false negatives (lower sensitivity). Table 2 shows that all the methods are roughly comparable under AR(1) and show equally poor performances under NN(2). StepGraph and StepGraph² perform comparatively well under the block model BG.
The axes in the panels of Figure 3 display the p nodes of the graph in a fixed order. Each cell displays a gray level proportional to the frequency with which the corresponding pair of nodes appears in the estimated graph over the R = 50 simulation runs. Hence a white color in a given cell (i, j) means that nodes i and j are never adjacent in the estimated graph; conversely, a pair of nodes that are always adjacent is given a black color. Notice that the sparsity patterns estimated by StepGraph and StepGraph² best match those of the true models. As noticed before, EPC results are missing for the case of BG. Figures 4-6 in Appendix B display similar heatmaps for p = 50, 100 and 150.
[Figure 3: Heatmaps of the frequency of adjacency for each pair of nodes, for models AR(1), NN(2) and BG, with p = 50. Columns: True, StepGraph, StepGraph², Glasso, CLIME, EPC; rows: AR(1), NN(2), BG.]
Table 1: Comparison of means and standard deviations (in brackets) of MCC over R = 50 replicates.

Model  p      StepGraph       StepGraph²      Glasso          CLIME           EPC
AR(1)  50     0.741 (0.009)   0.863 (0.005)   0.419 (0.016)   0.492 (0.006)   0.831 (0.005)
       100    0.751 (0.004)   0.847 (0.005)   0.433 (0.020)   0.464 (0.004)   0.803 (0.005)
       150    0.730 (0.004)   0.837 (0.004)   0.474 (0.017)   0.499 (0.003)   0.778 (0.004)
NN(2)  50     0.751 (0.004)   0.857 (0.006)   0.404 (0.014)   0.401 (0.007)   0.870 (0.004)
       100    0.802 (0.005)   0.875 (0.005)   0.382 (0.006)   0.407 (0.005)   0.862 (0.000)
       150    0.695 (0.007)   0.799 (0.004)   0.337 (0.008)   0.425 (0.003)   0.762 (0.004)
BG     50     0.898 (0.005)   0.832 (0.028)   0.356 (0.009)   0.482 (0.005)   NA
       100    0.857 (0.005)   0.857 (0.005)   0.348 (0.004)   0.462 (0.002)   NA
       150    0.780 (0.008)   0.780 (0.008)   0.314 (0.003)   0.408 (0.003)   NA
Table 2: Comparison of means and standard deviations (in brackets) of mF and mNKL over R = 50 replicates.

              StepGraph         StepGraph²        Glasso            CLIME             EPC
Model  p      mNKL    mF        mNKL    mF        mNKL    mF        mNKL    mF        mNKL    mF
AR(1)  50     0.70    3.82      0.66    3.59      0.64    3.90      0.63    3.91      0.67    3.75
              (0.00)  (0.00)    (0.00)  (0.03)    (0.00)  (0.02)    (0.00)  (0.01)    (0.00)  (0.03)
       100    0.83    5.73      0.81    5.24      0.80    5.72      0.79    5.75      0.82    5.56
              (0.00)  (0.00)    (0.00)  (0.03)    (0.00)  (0.02)    (0.00)  (0.01)    (0.00)  (0.03)
       150    0.89    7.16      0.87    6.53      0.86    7.21      0.86    7.25      0.88    7.03
              (0.00)  (0.00)    (0.00)  (0.03)    (0.02)  (0.02)    (0.01)  (0.01)    (0.00)  (0.02)
NN(2)  50     0.99    6.98      0.99    6.88      0.99    6.65      0.99    6.64      1.00    6.39
              (0.00)  (0.00)    (0.00)  (0.01)    (0.00)  (0.01)    (0.00)  (0.00)    (0.00)  (0.00)
       100    1.00    10.11     1.00    10.09     1.00    9.64      1.00    9.60      1.00    9.30
              (0.00)  (0.00)    (0.00)  (0.01)    (0.00)  (0.01)    (0.00)  (0.01)    (0.00)  (0.00)
       150    1.00    12.37     1.00    12.34     1.00    11.90     1.00    11.79     1.00    11.51
              (0.00)  (0.00)    (0.00)  (0.01)    (0.00)  (0.01)    (0.00)  (0.00)    (0.00)  (0.00)
3.2 Analysis of Breast Cancer Data
In preoperative chemotherapy, the complete eradication of all invasive cancer cells is referred to as pathological complete response, abbreviated as pCR. It is known in medicine that pCR is associated with the long-term cancer-free survival of a patient. Gene expression profiling (GEP) – the measurement of the activity (expression level) of genes in a patient – could in principle be a useful tool to predict whether a patient may achieve pCR.
Using normalized gene expression data of patients in stages I-III of breast cancer, Hess et al.
(2006) aim to identify patients that may achieve pCR under sequential anthracycline paclitaxel
preoperative chemotherapy. When a patient does not achieve the pCR state, the patient is classified in the group of residual disease (RD), indicating that cancer still remains. Their data consist of 22283 gene expression levels for 133 patients, with 34 pCR and 99 RD. Following Fan et al. (2009) and
gene expression levels for 133 patients, with 34 pCR and 99 RD. Following Fan et al. (2009) and
Cai et al. (2011) we randomly split the data into a training set and a testing set. The testing set
is formed by randomly selecting 5 pCR patients and 16 RD patients (roughly 1/6 of the subjects)
and the remaining patients form the training set. From the training set, a two-sample t-test is performed to select the 50 most significant genes. The data are then standardized using the means and standard deviations estimated from the training set.
We apply a linear discriminant analysis (LDA) to predict whether a patient may achieve
pathological complete response (pCR), based on the estimated inverse covariance matrix of the
gene expression levels. We label with r = 1 the pCR group and r = 2 the RD group and assume
that data are normally distributed, with common covariance matrix Σ and different means µr .
The linear discriminant score of an observation x for group r is computed as
$$\delta_r(x) = x^\top \widehat\Omega\, \widehat\mu_r - \frac{1}{2}\, \widehat\mu_r^\top \widehat\Omega\, \widehat\mu_r + \log \widehat\pi_r, \quad r = 1, 2, \qquad (12)$$
where π̂_r is the proportion of group r subjects in the training set and μ̂_r is the group r sample mean. The classification rule assigns x to the group with the largest score, r̂(x) = argmax_{r=1,2} δ_r(x).
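Schematically, the rule can be coded as in the following R sketch (illustrative; Omega_hat, mu1, mu2, pi1 and pi2 stand for the estimated precision matrix and the training-set group means and proportions):

```r
# Illustrative LDA scores (12) and classification rule based on an
# estimated precision matrix Omega_hat.
delta <- function(x, Omega_hat, mu, prior)
  drop(t(x) %*% Omega_hat %*% mu) - 0.5 * drop(t(mu) %*% Omega_hat %*% mu) + log(prior)

classify <- function(x, Omega_hat, mu1, mu2, pi1, pi2)
  if (delta(x, Omega_hat, mu1, pi1) >= delta(x, Omega_hat, mu2, pi2)) 1 else 2

# e.g. predicted labels for the rows of a test matrix:
# labels_hat <- apply(test_X, 1, classify, Omega_hat, mu1, mu2, pi1, pi2)
```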
For every method we use 5-fold cross-validation on the training data to select the tuning constants. Table 3 displays the means and standard errors (in brackets) of Sensitivity, Specificity, MCC and the number of selected edges over the random splits. The classification performances of StepGraph and CLIME are similar. However, notice that StepGraph is preferable because the recovered graph is much sparser. On the other hand, the performances of Glasso and EPC are similarly poor. The results of StepGraph² are similar to those of StepGraph and are therefore omitted.
Table 3: Comparison of means and standard deviations (in brackets) of Sensitivity, Specificity, MCC and number of selected edges.
4 Concluding remarks

This paper introduces a stepwise procedure, called StepGraph, to perform covariance selection in high-dimensional Gaussian graphical models. StepGraph estimates the edge set of the Gaussian graphical model based on Pearson correlations between the best-linear-predictors prediction errors. The algorithm begins with a family of empty neighborhoods and, using basic forward and backward steps, adds or deletes edges until appropriate thresholds are reached. These thresholds are selected by K-fold cross-validation.
StepGraph is compared with Glasso, CLIME and EPC under different Gaussian graphical models (AR(1), NN(2) and BG) and using different performance measures regarding network recovery and sparse estimation of the precision matrix Ω. StepGraph is shown to have good support recovery performance and to produce sparser models than Glasso and CLIME (i.e., StepGraph yields fewer false positives). Both versions of our procedure (with and without the pre-processing correlation screening step) compare well with standard procedures including Glasso, CLIME and EPC. Particularly good simulation results are obtained under block models, where the competing method EPC fails to run. We apply StepGraph to the analysis of breast cancer data and show that our method is a useful alternative in this type of application, achieving classification performance comparable to CLIME with a much sparser recovered graph.
References
Baldi, P., S. Brunak, Y. Chauvin, C. Andersen, and H. Nielsen (2000). Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16 (5), 412–424.
Banerjee, O., L. El Ghaoui, and A. d'Aspremont (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research 9, 485–516.
Bühlmann, P. and S. Van De Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Science & Business Media.
Cai, T., W. Liu, and X. Luo (2011). A constrained ℓ1 minimization approach to sparse precision
matrix estimation. Journal of the American Statistical Association 106 (494), 594–607.
Dempster, A. P. (1972). Covariance selection. Biometrics 28 (1), 157–175.
Edwards, D. (2000). Introduction to Graphical Modelling. Springer Science & Business Media.
Fan, J., Y. Feng, and Y. Wu (2009). Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics 3 (2), 521–541.
Friedman, J., T. Hastie, and R. Tibshirani (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 (3), 432–441.
Huang, S., J. Jin, and Z. Yao (2016). Partial correlation screening for estimating large precision matrices, with applications to classification. The Annals of Statistics 44 (5), 2018–2057.
Johnson, C. C., A. Jalali, and P. Ravikumar (2011). High-dimensional sparse inverse covariance estimation using greedy methods.
Kalisch, M. and P. Bühlmann (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. The Journal of Machine Learning Research 8, 613–636.
Lawrance, A. J. (1976). On conditional and partial correlation. The American Statistician 30 (3),
146–149.
Li, H. and J. Gui (2006). Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics 7 (2), 302–317.
Lee, W. and Y. Liu (2015). Joint estimation of multiple precision matrices with common structures. The Journal of Machine Learning Research 16, 1035–1062.
Liang, F., Q. Song, and P. Qiu (2015). An equivalent measure of partial correlation coefficients for high-dimensional Gaussian graphical models. Journal of the American Statistical Association 110 (511), 1248–1265.
Liu, H. and L. Wang (2012). TIGER: A tuning-insensitive approach for optimally estimating Gaussian graphical models.
Matthews, B. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405 (2), 442–451.
Meinshausen, N. and P. Bühlmann (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34 (3), 1436–1462.
Peng, J., P. Wang, N. Zhou, and J. Zhu (2009). Partial correlation estimation by joint sparse
regression models. Journal of the American Statistical Association 104 (486), 735–746.
Ravikumar, P., M. J. Wainwright, G. Raskutti, and B. Yu (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics 5, 935–980.
Ren, Z., T. Sun, C.-H. Zhang, and H. H. Zhou (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. The Annals of Statistics 43 (3), 991–1026.
Rütimann, P. and P. Bühlmann (2009). High dimensional sparse covariance estimation via directed acyclic graphs. Electronic Journal of Statistics 3, 1133–1160.
Spirtes, P., C. N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search. MIT
press.
Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research 11, 2261–2286.
Yuan, M. and Y. Lin (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 (1), 19–35.
Zhou, S., P. Rütimann, M. Xu, and P. Bühlmann (2011). High-dimensional covariance estimation
based on gaussian graphical models. The Journal of Machine Learning Research 12, 2975–3026.
A Selection of the threshold parameters by cross-validation
In this section we describe the selection of the forward and backward thresholds for StepGraph. Let X denote the n × p matrix whose rows are the observations x_1, . . . , x_n. For each j = 1, . . . , p, let X_j = (x_{1j}, . . . , x_{nj})^⊤ denote the jth column of the matrix X.
We randomly partition the dataset {x_i}_{1≤i≤n} into K disjoint subsets of approximately equal size, the tth subset being of size n_t ≥ 2 with ∑_{t=1}^{K} n_t = n. For every t, let {x_i^{(t)}}_{1≤i≤n_t} be the tth validation subset, and its complement {x̃_i^{(t)}}_{1≤i≤n−n_t} the tth training subset.

For every t = 1, . . . , K and threshold parameters (α_f, α_b) ∈ [0, 1] × [0, 1], let Â_1^{(t)}, . . . , Â_p^{(t)} be the estimated neighborhoods given by StepGraph using the tth training subset {x̃_i^{(t)}}_{1≤i≤n−n_t}, with x̃_i^{(t)} = (x̃_{i1}^{(t)}, . . . , x̃_{ip}^{(t)}), 1 ≤ i ≤ n − n_t. Consider for every node j the estimated neighborhood Â_j^{(t)} = {l_1, . . . , l_q} and let β̂_{Â_j^{(t)}} be the estimated coefficient of the regression of X̃_j^{(t)} = (x̃_{1j}^{(t)}, . . . , x̃_{n−n_t,j}^{(t)})^⊤ on X̃_{Â_j^{(t)}}, the matrix with rows (x̃_{il_1}^{(t)}, . . . , x̃_{il_q}^{(t)}), 1 ≤ i ≤ n − n_t, shown in red in (15). Using β̂_{Â_j^{(t)}} and the validation matrix X_{Â_j^{(t)}} with rows (x_{il_1}^{(t)}, . . . , x_{il_q}^{(t)}), 1 ≤ i ≤ n_t, shown in blue in (15), we obtain the vector of predicted values X̂_j^{(t)}(α_f, α_b). If the neighborhood Â_j^{(t)} = ∅ we define X̂_j^{(t)}(α_f, α_b) = 0 (recall that the data are centered).
We define the K-fold cross-validation function as
$$CV(\alpha_f, \alpha_b) = \frac{1}{n} \sum_{t=1}^{K} \sum_{j=1}^{p} \left\| X_j^{(t)} - \widehat X_j^{(t)}(\alpha_f, \alpha_b) \right\|^2,$$
where ‖·‖ is the L2-norm or Euclidean distance in R^{n_t}. Hence the K-fold cross-validation forward-backward thresholds (α̂_f, α̂_b) are
$$(\widehat\alpha_f, \widehat\alpha_b) := \operatorname*{argmin}_{(\alpha_f, \alpha_b) \in H} CV(\alpha_f, \alpha_b) \qquad (14)$$
where H is a grid of ordered pairs (α_f, α_b) in [0, 1] × [0, 1] over which we perform the search.
$$\begin{array}{c}
\text{tth training subset} \\[2pt]
\begin{pmatrix}
\cdots & \tilde x_{1j}^{(t)} & \cdots & \tilde x_{1 l_1}^{(t)} & \cdots & \tilde x_{1 l_q}^{(t)} & \cdots \\
& \vdots & & \vdots & & \vdots & \\
\cdots & \tilde x_{n-n_t, j}^{(t)} & \cdots & \tilde x_{n-n_t, l_1}^{(t)} & \cdots & \tilde x_{n-n_t, l_q}^{(t)} & \cdots
\end{pmatrix} \\[8pt]
\text{tth validation subset} \\[2pt]
\begin{pmatrix}
\cdots & x_{1j}^{(t)} & \cdots & x_{1 l_1}^{(t)} & \cdots & x_{1 l_q}^{(t)} & \cdots \\
& \vdots & & \vdots & & \vdots & \\
\cdots & x_{n_t j}^{(t)} & \cdots & x_{n_t l_1}^{(t)} & \cdots & x_{n_t l_q}^{(t)} & \cdots
\end{pmatrix}
\end{array} \qquad (15)$$
Remark 3 Matrix (15) represents, for every node j, the comparison between estimated and predicted values used for cross-validation: β̂_{Â_j^{(t)}} is computed using the observations X̃_j^{(t)} = (x̃_{1j}^{(t)}, . . . , x̃_{n−n_t,j}^{(t)})^⊤ and the matrix X̃_{Â_j^{(t)}} with rows (x̃_{il_1}^{(t)}, . . . , x̃_{il_q}^{(t)}), i = 1, . . . , n − n_t, in the tth training subset (red), while the predictions are evaluated on the tth validation subset (blue).
B Additional simulation results

In this section we give additional simulation results. Table 4 reports additional Specificity and Sensitivity results from our simulation study. Figures 4-6 display the heatmaps for the three models and the three dimensions considered.
Table 4: Comparison of means and standard deviations (in brackets) of Specificity (TN%), Sensitivity (TP%) and MCC over R = 50 replicates.

              StepGraph               StepGraph²              Glasso                  CLIME                   EPC
Model  p      TP%    TN%    MCC       TP%    TN%    MCC       TP%    TN%    MCC       TP%    TN%    MCC       TP%    TN%    MCC
AR(1)  50     0.756  0.988  0.741     0.812  0.997  0.863     0.994  0.823  0.419     0.988  0.891  0.492     0.750  0.998  0.831
              (0.015)(0.002)(0.009)   (0.011)(0.000)(0.005)   (0.002)(0.012)(0.016)   (0.002)(0.003)(0.006)   (0.011)(0.000)(0.005)
       100    0.632  0.999  0.751     0.771  0.999  0.847     0.989  0.897  0.433     0.983  0.934  0.464     0.689  0.999  0.803
              (0.007)(0.000)(0.004)   (0.008)(0.000)(0.005)   (0.002)(0.009)(0.020)   (0.002)(0.001)(0.004)   (0.009)(0.000)(0.005)
       150    0.607  0.999  0.730     0.749  0.999  0.837     0.981  0.943  0.474     0.972  0.964  0.499     0.636  1.000  0.778
              (0.006)(0.000)(0.004)   (0.007)(0.000)(0.004)   (0.002)(0.007)(0.017)   (0.002)(0.001)(0.003)   (0.007)(0.000)(0.004)
NN(2)  50     0.632  0.999  0.751     0.787  0.999  0.857     0.971  0.864  0.404     0.984  0.875  0.401     0.798  0.999  0.870
              (0.007)(0.000)(0.004)   (0.012)(0.000)(0.006)   (0.004)(0.010)(0.014)   (0.003)(0.004)(0.007)   (0.008)(0.000)(0.004)
       100    0.730  0.999  0.802     0.831  0.999  0.875     0.987  0.924  0.382     0.985  0.937  0.407     0.791  0.999  0.862
              (0.008)(0.000)(0.005)   (0.007)(0.000)(0.005)   (0.002)(0.004)(0.006)   (0.002)(0.001)(0.005)   (0.007)(0.000)(0.000)
       150    0.555  0.999  0.695     0.693  0.999  0.799     0.952  0.936  0.337     0.934  0.965  0.425     0.621  1.000  0.762
              (0.017)(0.000)(0.007)   (0.006)(0.000)(0.004)   (0.004)(0.002)(0.008)   (0.003)(0.001)(0.003)   (0.007)(0.000)(0.004)
BG     50     0.994  0.981  0.898     0.904  0.983  0.832     0.867  0.697  0.356     0.962  0.807  0.482     NA     NA     NA
              (0.002)(0.001)(0.005)   (0.039)(0.001)(0.028)   (0.032)(0.021)(0.009)   (0.004)(0.005)(0.005)
       100    0.949  0.989  0.857     0.949  0.989  0.857     0.569  0.908  0.348     0.818  0.920  0.462     NA     NA     NA
              (0.007)(0.000)(0.005)   (0.007)(0.000)(0.005)   (0.039)(0.011)(0.004)   (0.005)(0.005)(0.002)
       150    0.782  0.994  0.780     0.782  0.994  0.780     0.426  0.952  0.314     0.626  0.959  0.408     NA     NA     NA
              (0.021)(0.000)(0.008)   (0.021)(0.000)(0.008)   (0.035)(0.006)(0.003)   (0.006)(0.001)(0.003)
[Figure 4: Model AR(1). Heatmaps of the frequency of adjacency for each pair of nodes, for p = 50, 100, 150. The axes display the p nodes of the graph in a fixed order. Columns: True, StepGraph, StepGraph², Glasso, CLIME, EPC.]
[Figure 5: Model NN(2). Heatmaps of the frequency of adjacency for each pair of nodes, for p = 50, 100, 150. The axes display the p nodes of the graph in a fixed order. Columns: True, StepGraph, StepGraph², Glasso, CLIME, EPC.]
[Figure 6: Model BG. Heatmaps of the frequency of adjacency for each pair of nodes, for p = 50, 100, 150. The axes display the p nodes of the graph in a fixed order. Columns: True, StepGraph, StepGraph², Glasso, CLIME.]