Exploratory Graph Analysis
Author manuscript
Psychol Methods. Author manuscript; available in PMC 2021 June 01.
Abstract
Exploratory graph analysis (EGA) is a new technique that was recently proposed within the
tutorial on how to apply and interpret EGA, using scores from a well-known psychological
instrument: the Marlowe-Crowne Social Desirability Scale.
Keywords
exploratory graph analysis; number of factors; dimensionality; exploratory factor analysis; parallel
analysis
Correspondence concerning this article should be addressed to Hudson Golino, 485 McCormick Road, Gilmer Hall, Room 102,
Charlottesville, VA 22903. [email protected].
Investigating the number of latent factors or dimensions that underlie multivariate data is an
Since the 1960s, several techniques have been developed to estimate the number of underlying
dimensions in psychological data, such as parallel analysis (PA; Horn, 1965), the K1 rule
(Kaiser, 1960), and the scree test (Cattell, 1966). Simulation studies, however, have
consistently shown that each technique has its own limitations (e.g., see Garrido, Abad, &
Ponsoda, 2013; Lubbe, 2019), indicating a need for new dimensionality assessment methods
that can provide more accurate estimates. Furthermore, the factor-analytic techniques also
present challenges beyond the estimation of the number of dimensions such as the rotation
of the loadings matrix and the subjective interpretation of the factor loadings (Sass &
Schmitt, 2010).
Recently, Golino and Epskamp (2017) proposed an alternative approach, Exploratory Graph
Analysis (EGA), to identify the dimensions of psychological constructs from the network
psychometrics perspective. Network psychometrics is a recent addition to the field of
quantitative psychology, which applies the network modeling framework to study
psychological constructs (Epskamp, Rhemtulla, & Borsboom, 2017). The network
psychometric perspective is provided by the Gaussian graphical model (GGM: Lauritzen,
1996), which estimates the joint distribution of random variables (i.e., nodes in the network)
by modeling the inverse of the variance-covariance matrix (Epskamp et al., 2017). Nodes
(e.g., test items) are connected by edges or links, which indicate the strength of the
association between the variables (Epskamp & Fried, 2018). Edges are typically partial
correlation coefficients (Epskamp & Fried, 2018). Absent edges represent zero partial
correlations (conditionally independent variables) while non-absent edges represent the
remaining association between two variables after controlling for all other variables
(Epskamp & Fried, 2018; Epskamp et al., 2017). Importantly, absent edges in the model will
only correspond to conditional independence if the data is multivariate normal. EGA
combines the GGM model with a clustering algorithm for weighted networks (walktrap;
Pons & Latapy, 2006) to assess the dimensionality of the items in psychological constructs.
Preliminary investigations of EGA via simulation studies have shown that it’s a promising
alternative technique to assess the dimensionality of constructs (Golino & Epskamp, 2017).
Despite the promising initial evidence, the original EGA technique (Golino & Epskamp,
2017) is not expected to work well with unidimensional structures, because of limitations
related to the walktrap algorithm (Pons & Latapy, 2006). Specifically, the modularity
measure (used to quantify the quality of dimensions in the algorithm) penalizes network
structures that have only one dimension (Newman, 2004). As a consequence, the original
EGA algorithm would almost always identify more than one factor, even if the data is
generated from a unidimensional structure. To overcome this limitation, the current paper
will present a new EGA algorithm that leverages the walktrap’s tendency to find multiple
clusters in weighted networks. This new EGA algorithm is expected to work well in both
unidimensional and multidimensional structures (i.e., when the underlying dimensionality is
comprised of one or more factors). An in-depth analysis, however, is necessary to check the
suitability of this new EGA algorithm to estimate the number of simulated factors across
different conditions and compared to traditional factor-analytic techniques.
Present Research
The aim of the current paper is threefold. First, it aims to systematically investigate, via a
Monte-Carlo simulation study, the performance of the new EGA algorithm in recovering the
number of simulated factors under different conditions. Previous studies have shown that the
interfactor correlations, number of items per factor, and sample size each have an impact on
the original EGA’s performance (Golino & Epskamp, 2017), but little is known about the
impact of factor loadings on the accuracy of EGA. It is well established in the literature that
factor loadings are one of the most important elements that affect the accuracy of traditional
dimensionality assessment methods (Garrido et al., 2013). Skewness has also not been
considered in previous simulations involving EGA, which has only used unskewed
dichotomous data (Golino & Epskamp, 2017). To better resemble practical settings in
psychological data, we examined continuous (i.e., multivariate normal) and dichotomous
data with skew.
Second, this study also investigates an alternative network estimation method for EGA, the
Triangulated Maximally Filtered Graph approach (TMFG; Massara, Di Matteo, & Aste,
2016), hereafter named EGAtmfg. By replacing the GGM model with the TMFG algorithm,
the EGAtmfg method can potentially overcome some of the limitations of the former
method. One of the advantages of the TMFG is that it is not restricted to multivariate normal
distributions and partial correlation measures (i.e., any association measure can be used),
and it can potentially make stable comparisons across sample sizes (Christensen, Kenett,
Aste, Silvia, & Kwapil, 2018). We investigated the performance of the EGAtmfg method in
this study, and compared it to the new EGA algorithm, which uses the GGM model. We
discuss the performance of both approaches and suggest practical recommendations for
them. Also, while preliminary studies have compared traditional factor analytic methods
with EGA (Golino & Epskamp, 2017; Golino & Demetriou, 2017), there is a need to
compare the performance of EGA with different types of parallel analysis as well as
techniques based on the scree test (Cattell, 1966), which are among the most widely known
methods historically applied in psychology.
Lastly, this article provides a tutorial on how to implement the EGA techniques using R.
With this tutorial, researchers from different fields interested in estimating the
dimensionality of their tests, questionnaires, and other types of instruments can readily apply
EGA. EGA may be especially relevant for those working in the area of aging research, who
need to use dimensionality assessment/reduction techniques to investigate the structure of
multiple scales, questionnaires, and tests.1
The tutorial uses data from the Virginia Cognitive Aging Project (VCAP; Salthouse, 2018)
and verifies the dimensionality of the Social Desirability Scale (SDS; Crowne & Marlowe,
1960). A key part of our tutorial will showcase the new EGA algorithm by demonstrating
how it can be used to first estimate dimensionality and then verify the unidimensionality of
the dimensions in the SDS.
detail in Appendix A.
$$K = \Sigma^{-1} \tag{1}$$
Each element $k_{ij}$ of K can be standardized to yield the partial correlation between two variables $y_i$
and $y_j$, given all other variables in y, $\mathbf{y}_{-(i,j)}$ (Epskamp, Waldorp, Mõttus, & Borsboom,
2018):
$$\mathrm{Cor}\left(Y_i, Y_j \mid \mathbf{y}_{-(i,j)}\right) = -\frac{k_{ij}}{\sqrt{k_{ii}\,k_{jj}}}. \tag{2}$$
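As an illustration of equations (1) and (2), the following minimal Python sketch inverts a small covariance matrix and standardizes the resulting precision matrix into partial correlations. The helper names (`invert`, `partial_correlations`) and the three-variable example matrix are hypothetical and chosen only for this illustration; the sketch assumes the matrix is invertible.

```python
def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(sigma):
    """Equation (1): K = inverse of sigma; equation (2): -k_ij / sqrt(k_ii * k_jj)."""
    k = invert(sigma)
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Hypothetical example: y1 and y3 are related only through y2.
sigma = [[1.00, 0.50, 0.25],
         [0.50, 1.00, 0.50],
         [0.25, 0.50, 1.00]]
pcor = partial_correlations(sigma)
```

In this example the zero-order correlation between the first and third variables is .25, but their partial correlation is zero, which corresponds to an absent edge in the network.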
Epskamp et al. (2018) point out that modeling K in a way that every nonzero element is
treated as a freely estimated parameter generates a sparse model for Σ. The sparse model of
the variance-covariance matrix is the GGM (Epskamp et al., 2018). The level of sparsity of
the GGM can be set using different methods. The most common approach in network
psychometrics is to apply a variant of the least absolute shrinkage and selection operator
(LASSO; Tibshirani, 1996) termed graphical LASSO (GLASSO; Friedman, Hastie, &
Tibshirani, 2008). The GLASSO is a regularization technique that quickly estimates
both the model structure and the parameters of a sparse GGM (Epskamp et al., 2018). It has
a tuning parameter (λ) that can be chosen to minimize the extended Bayesian
information criterion (EBIC; Chen & Chen, 2008), which is used to estimate optimal model
fit and has been shown to accurately retrieve the true network structure in simulation studies
(Epskamp & Fried, 2018; Foygel & Drton, 2010).
1The current paper is part of an international effort to develop new techniques, methods and metrics for healthy aging launched in
2017 by the World Health Organization (International Consortium on Metrics and Evidence for Healthy Ageing).
Now, we’ll connect the GGM with factor models, and show how network psychometrics can
y = Λη + ε, (3)
Σ = ΛΨΛ⊤ + Θ, (4)
Golino and Epskamp (2017) showed a decomposition (using the Woodbury matrix identity;
Woodbury, 1950) leads to two important properties connecting GGM and factor model to
orthogonal factors, with the resulting GGM being composed of unconnected clusters, while
for oblique factors, the resulting GGM is composed of weighted clusters that are connected
for each factor. These two characteristics can be explained as follows. Let the inverse of the
variance-covariance matrix be the precision matrix K, as shown in equation (1), therefore
(following Woodbury, 1950):
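$$K = \left(\Lambda\Psi\Lambda^{\top} + \Theta\right)^{-1} = \Theta^{-1} - \Theta^{-1}\Lambda\left(\Psi^{-1} + \Lambda^{\top}\Theta^{-1}\Lambda\right)^{-1}\Lambda^{\top}\Theta^{-1}. \tag{5}$$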
If $X = (\Psi^{-1} + \Lambda^{\top}\Theta^{-1}\Lambda)$, and knowing that $\Lambda^{\top}\Theta^{-1}\Lambda$ is diagonal, then K is a block matrix in
which every block is the inner product of factor loadings and residual variances, with
diagonal blocks scaled by diagonal elements of X and off-diagonal blocks scaled by the off-
diagonal elements of X. As Golino and Epskamp (2017) argue, constraining the diagonal
values of X to one will not lead to information loss. Furthermore, the absolute off-diagonal
elements of X will be smaller than one. Considering the formation of X, its off-diagonal
values will equal zero if the latent factors are orthogonal (Golino & Epskamp, 2017).
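The relation between the factor model and the precision matrix can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: it uses a hypothetical two-factor simple structure with four items, loadings of .70, and a diagonal Θ, and computes K through the Woodbury form of equation (5). All function and variable names are made up for this example.

```python
def implied_covariance(loadings, psi, theta):
    """Equation (4): Sigma = Lambda Psi Lambda^T + Theta (theta holds the diagonal)."""
    p, m = len(loadings), len(psi)
    return [[sum(loadings[i][a] * psi[a][b] * loadings[j][b]
                 for a in range(m) for b in range(m))
             + (theta[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def inverse_2x2(mat):
    """Closed-form inverse of a 2 x 2 matrix."""
    det = mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]
    return [[mat[1][1] / det, -mat[0][1] / det],
            [-mat[1][0] / det, mat[0][0] / det]]


def woodbury_precision(loadings, psi, theta):
    """Equation (5) for two factors and a diagonal residual matrix."""
    p = len(loadings)
    psi_inv = inverse_2x2(psi)
    # X = Psi^-1 + Lambda^T Theta^-1 Lambda
    x = [[psi_inv[a][b]
          + sum(loadings[i][a] * loadings[i][b] / theta[i] for i in range(p))
          for b in range(2)] for a in range(2)]
    x_inv = inverse_2x2(x)
    scaled = [[loadings[i][a] / theta[i] for a in range(2)] for i in range(p)]
    return [[(1.0 / theta[i] if i == j else 0.0)
             - sum(scaled[i][a] * x_inv[a][b] * scaled[j][b]
                   for a in range(2) for b in range(2))
             for j in range(p)] for i in range(p)]


# Hypothetical structure: items 0-1 load on factor 1, items 2-3 on factor 2.
loadings = [[0.7, 0.0], [0.7, 0.0], [0.0, 0.7], [0.0, 0.7]]
theta = [0.51] * 4  # residual variances, 1 - .70^2

k_orthogonal = woodbury_precision(loadings, [[1.0, 0.0], [0.0, 1.0]], theta)
k_oblique = woodbury_precision(loadings, [[1.0, 0.5], [0.5, 1.0]], theta)
```

With orthogonal factors the elements of K that link items from different factors are zero (unconnected clusters), whereas with an inter-factor correlation of .50 they are nonzero (connected clusters), in line with the two properties described above.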
In sum, network modeling and factor modeling are closely connected (Epskamp et al.,
2018), and the use of network psychometrics for dimensionality assessment is a direct
consequence of the two properties pointed to earlier. If the resulting GGM of orthogonal
factors is a network with unconnected clusters (often referred to as communities) and the
resulting GGM of oblique factors is a set of connected weighted clusters for each factor,
then a community detection algorithm for weighted networks (which detects these clusters)
can be applied to transform a network psychometric model into a dimensionality assessment
technique.
Golino and Epskamp (2017) proposed the use of the Walktrap algorithm (Pons & Latapy,
2006) to detect the number of dimensions (i.e., communities) in a network. The algorithm
uses “random walks” or a stochastic number of steps from one node, across an edge, to
another. The number of steps the random walks take can be adjusted but for current
estimation purposes, EGA always applies the default number of four. The choice of using
four steps comes from previous simulation studies that have shown that the Walktrap
algorithm outperforms other community detection algorithms for weighted networks using
four steps (Gates, Henry, Steinley, & Fair, 2016; Yang, Algesheimer, & Tessone, 2016).
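To make the idea of random walks more concrete, the following Python sketch computes four-step transition probabilities in a small weighted network and uses them to obtain a distance between two nodes, in the spirit of the node distance of Pons and Latapy (2006). It is a simplified illustration that assumes absolute edge weights and omits the merging steps of the full algorithm; the function names and the toy network are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_walk_distance(weights, i, j, steps=4):
    """Distance between nodes i and j based on random walks of length `steps`."""
    n = len(weights)
    # Node strength: sum of absolute edge weights (assumption: signs are dropped).
    strength = [sum(abs(w) for w in row) for row in weights]
    # One-step transition probabilities: P[r][k] = |w_rk| / strength_r.
    p = [[abs(w) / strength[r] for w in row] for r, row in enumerate(weights)]
    pt = p
    for _ in range(steps - 1):
        pt = matmul(pt, p)
    # Nodes whose walks end up in the same places are close to each other.
    return sum((pt[i][k] - pt[j][k]) ** 2 / strength[k] for k in range(n)) ** 0.5


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by one weak edge.
weights = [[0.0] * 6 for _ in range(6)]
for group in ((0, 1, 2), (3, 4, 5)):
    for a in group:
        for b in group:
            if a != b:
                weights[a][b] = 0.4
weights[2][3] = weights[3][2] = 0.05

within = random_walk_distance(weights, 0, 1)   # same cluster
between = random_walk_distance(weights, 0, 5)  # different clusters
```

Nodes in the same densely connected group obtain a smaller distance than nodes in different groups, which is the information a community detection algorithm of this kind exploits when forming clusters.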
is not expected to work well with unidimensional structures. An overview of the Walktrap
algorithm and why the modularity index penalizes unidimensional structures can be found in
Appendix A. A new EGA algorithm that takes advantage of this characteristic and that could
potentially be used in both unidimensional and multidimensional structures will be presented
in a later section.
EGA Performance
Golino and Epskamp (2017) studied the accuracy in estimating the number of dimensions of
EGA along with six traditional techniques: very simple structure (VSS; Revelle & Rocklin,
1979), minimum average partial (MAP; Velicer, 1976), Bayesian information criterion
(BIC), EBIC, K1, and PA with generalized weighted least squares extraction and random
data generation from a multivariate normal distribution. The authors simulated 32,000 data
sets to fit known factor structures, systematically manipulating four variables: number of
factors (2 and 4), number of items (5 and 10), sample size (100, 500, 1000 and 5000), and
correlation between factors (0, .20, .50 and .70). The results of Golino and Epskamp (2017)
showed that the accuracies of the different techniques, in ascending order, were: 39% for
VSS, 50% for MAP, 81% for K1, 81% for BIC, 82% for EBIC, 89% for PA, and 93% for
EGA. EGA was especially superior to the traditional techniques in the cases of larger
structures (4 factors) and very high factor correlations (.70), achieving an accuracy of 71%
which was much higher than the next best method (PA = 40%). Golino and Epskamp (2017)
ascertained that EGA was the most robust method because its accuracy was less affected by
the manipulated variables than those of the other methods.
The higher accuracy of EGA, when compared to traditional factor analytic methods, might
be explained by the network psychometrics approach's focus on the unique variance between
pairs of variables rather than the variance shared across all variables. When a dataset is
simulated following a traditional factor model, the dimensionality structure becomes clearer
when a network of regularized partial correlations is estimated. Figure 1 shows two
simulated five-factor models (population correlations): one with loadings of .70, inter-factor
correlations of .70, and eight items per factor, and the other with loadings of .70, orthogonal
factors and eight items per factor. In this figure, the population correlation matrix is plotted
In this layout, nodes with stronger edges (e.g., high correlations) are placed closer than nodes
with weak edges (e.g., low correlations). The two-dimensional layout helps to visually
inspect groupings of variables, since variables with higher correlations are plotted together.
The colors of the nodes represent the factors. On the left side of the figure, the population
correlation matrix is shown; on the right side the estimated EGA structure is shown. The
high correlation structure is shown in the top of the figure, and the orthogonal structure in
the bottom. Estimating a network using regularized partial correlations results in a clearer
structure with five groups of variables for the high correlation structure. Also, the
regularized partial correlations are stronger within clusters than between clusters for the
high correlation structure (top), making the true simulated five-factor structure easier to
discern, even if the true correlation between factors is high.
The algorithm starts by simulating a unidimensional structure with four variables and
loadings of .70. Then, it binds the simulated data with the empirical (user-provided) data.
The next step is the estimation of the GGM (if the network model is set to be a GGM). The
correlation matrix is computed using the cor_auto function of the qgraph package (Epskamp,
Cramer, Waldorp, Schmittmann, & Borsboom, 2012). The EBICglasso function (from
qgraph) is then used to estimate the GGM. The EBICglasso function will search for the
optimal level of sparsity (using the λ parameter in the glasso algorithm) in a network by
choosing a value of λ that minimizes the extended Bayesian information criterion (EBIC;
Chen & Chen, 2008). Following Foygel and Drton (2010), 100 values of λ are chosen.
These values are logarithmically evenly spaced between λMax (the smallest value which will
result in a completely empty network—that is, no edges between the nodes) and λMax/100.
The ratio of the lowest λ value compared to λMax is set to 0.1. A hyperparameter (γ;
gamma) of EBICglasso controls the severity of the model selection. EBIC is computed for
values of gamma larger than zero. However, when gamma is zero, BIC is computed instead
(for more details, see Chen & Chen, 2008).
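The search over λ and the role of gamma can be sketched as follows. This Python example is a minimal illustration and not the EBICglasso implementation: `lambda_grid` builds logarithmically spaced candidate values, with the ratio of the smallest value to λMax left as a parameter because the text mentions both λMax/100 and a ratio of 0.1, and `ebic` follows the form given by Foygel and Drton (2010), in which the criterion reduces to the BIC when gamma is zero. The function names and the input values are hypothetical.

```python
import math


def lambda_grid(lambda_max, min_ratio=0.01, n_lambda=100):
    """Logarithmically spaced values from lambda_max down to lambda_max * min_ratio."""
    high = math.log(lambda_max)
    low = math.log(lambda_max * min_ratio)
    step = (high - low) / (n_lambda - 1)
    return [math.exp(high - k * step) for k in range(n_lambda)]


def ebic(log_likelihood, n_edges, n_obs, n_nodes, gamma=0.5):
    """EBIC = -2L + E*log(N) + 4*gamma*E*log(P); gamma = 0 gives the BIC."""
    return (-2.0 * log_likelihood
            + n_edges * math.log(n_obs)
            + 4.0 * gamma * n_edges * math.log(n_nodes))


# Hypothetical inputs, for illustration only.
grid = lambda_grid(lambda_max=0.9)
bic_value = ebic(log_likelihood=-100.0, n_edges=10, n_obs=500, n_nodes=20, gamma=0.0)
ebic_value = ebic(log_likelihood=-100.0, n_edges=10, n_obs=500, n_nodes=20, gamma=0.5)
```

Because the extra penalty grows with the number of edges, larger values of gamma favor sparser networks for the same log-likelihood.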
In the implementation of the EGA algorithm, the gamma hyperparameter of the EBICglasso
function is initially set to 0.5. If the resulting network has a node with a strength of zero (i.e.,
disconnected from the rest of the network), then gamma is lowered to 0.25. The process repeats
until all nodes are connected in the resulting network or until gamma reaches zero. In
this last case, the EBIC is equal to the regular BIC.
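The procedure described above can be summarized in a short Python sketch. It is a hedged illustration only: the schedule of gamma values (0.5, 0.25, 0) is an assumption based on the description, and `toy_estimator` is a hypothetical stand-in for a network estimator that simply keeps more edges as gamma decreases. It is not the EBICglasso function.

```python
def estimate_until_connected(estimate_network, gammas=(0.5, 0.25, 0.0)):
    """Lower gamma until no node has zero strength or gamma reaches zero."""
    for gamma in gammas:
        network = estimate_network(gamma)
        # Node strength: sum of the absolute edge weights of each node.
        strengths = [sum(abs(w) for w in row) for row in network]
        if min(strengths) > 0 or gamma == 0.0:
            break
    return network, gamma


def toy_estimator(gamma):
    """Hypothetical estimator: larger gamma removes more of the weak edges."""
    partial_cors = [[0.00, 0.30, 0.05],
                    [0.30, 0.00, 0.15],
                    [0.05, 0.15, 0.00]]
    threshold = 0.4 * gamma
    return [[w if abs(w) > threshold else 0.0 for w in row]
            for row in partial_cors]


network, gamma_used = estimate_until_connected(toy_estimator)
```

In this toy case the third node is disconnected when gamma is 0.5, so the network is re-estimated with gamma equal to 0.25, at which point every node has at least one edge.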
In the next step, the walktrap algorithm is used. If the number of estimated clusters in the
network is equal to or lower than two, then the empirical data are considered unidimensional. This is one of
the most important parts of the new EGA algorithm. Since the walktrap algorithm will
penalize networks with only one cluster, by adding a simulated dataset with a known
unidimensional structure, the walktrap algorithm will estimate at least two clusters: one
composed of the simulated data, and the other of the empirical or user-provided dataset. In
this case, the estimated number of factors/clusters in the empirical data is one, since the
other cluster is composed of the simulated data. If the number of clusters is greater than
two, then the new EGA algorithm will re-estimate the network, and apply the walktrap
algorithm as described above. The final clustering solution is defined by all clusters with at
least two variables (or nodes/items). The resulting network plot will show the estimated
network with the nodes colored by cluster/factor. If a variable (or node) is the only member
of its cluster, this variable won’t be colored in the plot. This strategy helps
the user identify if there are any variables that do not pertain to any cluster in the network.
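The decision rule of the new algorithm can be sketched in Python as follows. This is an illustration under assumptions, not the EGAnet code: `ega_decision` and its arguments are hypothetical names, the cluster memberships are supplied directly rather than estimated, and the re-estimation step is assumed to use the empirical variables only.

```python
from collections import Counter


def ega_decision(combined_membership, reestimate_empirical, n_simulated=4):
    """Return (number of dimensions, membership of the empirical variables).

    combined_membership: cluster labels for the empirical variables followed by
        the simulated unidimensional variables.
    reestimate_empirical: callable returning cluster labels for the empirical
        variables (assumption: the network is re-estimated without the
        simulated variables).
    """
    n_items = len(combined_membership) - n_simulated
    # Two clusters or fewer: one for the simulated data, one for the empirical data.
    if len(set(combined_membership)) <= 2:
        return 1, [1] * n_items
    membership = reestimate_empirical()
    sizes = Counter(membership)
    # Clusters with a single variable are dropped from the final solution.
    cleaned = [c if sizes[c] >= 2 else None for c in membership]
    return len({c for c in cleaned if c is not None}), cleaned


# Hypothetical memberships for five empirical and four simulated variables.
unidimensional = ega_decision([1, 1, 1, 1, 1, 2, 2, 2, 2], lambda: [1] * 5)

# Hypothetical memberships for seven empirical variables, one of them isolated.
multidimensional = ega_decision([1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
                                lambda: [1, 1, 1, 2, 2, 2, 3])
```

In the second example the seventh variable forms a cluster by itself, so it is left unassigned (`None`) and the estimated number of dimensions is two.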
Another difference from the original EGA method is related to the gamma parameter of the
EBICglasso function. Originally, Golino and Epskamp (2017) used the default of 0.5. This
modification, together with the removal of clusters with single nodes, makes the result of
EGA more likely to be stable, in the sense that it will generate fewer extreme results with the
number of clusters approaching the number of variables.
cliques (i.e., sets of connected nodes; a triangle and tetrahedron, respectively). The TMFG
method constructs a network using zero-order correlations and the resulting network can be
associated with the inverse covariance matrix (yielding a GGM; Barfuss, Massara, Di
Matteo, & Aste, 2016). Notably, the TMFG can use any association measure and thus does
not assume the data is multivariate normal.
Construction begins by forming a tetrahedron (Figure 3) of the four nodes that have the
highest sum of correlations that are greater than the average correlation in the correlation
matrix, which is defined as:
$$\bar{c} = \frac{\sum_i \sum_j c_{ij}}{n}, \tag{6}$$
where $c_{ij}$ is the correlation between node i and node j, $\bar{c}$ is the average correlation of the
correlation matrix (6), and $w_i$ is the sum of the correlations greater than the average
correlation for node i (7).
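The selection of the initial tetrahedron can be sketched in Python. The example follows the verbal definitions above, with two assumptions that are not taken from the text: the average correlation is computed over the off-diagonal entries of the matrix, and the four nodes with the largest sums are chosen. The function name and the correlation values are hypothetical.

```python
def tetrahedron_seed(corr):
    """Pick the four nodes with the largest sum of above-average correlations."""
    n = len(corr)
    # Assumption: the average is taken over the off-diagonal correlations.
    off_diagonal = [corr[i][j] for i in range(n) for j in range(n) if i != j]
    mean_corr = sum(off_diagonal) / len(off_diagonal)
    # For each node, sum only the correlations that exceed the average.
    w = [sum(corr[i][j] for j in range(n) if j != i and corr[i][j] > mean_corr)
         for i in range(n)]
    ranked = sorted(range(n), key=lambda i: w[i], reverse=True)
    return sorted(ranked[:4])


# Hypothetical matrix: nodes 0-3 correlate .60 with each other, nodes 4-5 only .10.
n_nodes = 6
corr = [[1.0 if i == j else (0.6 if i < 4 and j < 4 else 0.1)
         for j in range(n_nodes)] for i in range(n_nodes)]
seed = tetrahedron_seed(corr)
```

Here the average off-diagonal correlation is .30, so only the correlations of .60 contribute to the sums and the first four nodes form the tetrahedron.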
Next, the algorithm iteratively identifies the node that maximizes its sum of correlations to a
connected set of three nodes (triangles) already included in the network and then adds that
node to the network. In equation (8), this is mathematically defined as the maximum gain of
the score function (S; e.g., sum of correlations) for each node (v) with each node in a set of
triangles (t1, t2, t3) in the network (Figure 4):
The process is completed once every node is connected in the network. In this process, the
algorithm automatically generates what’s called a planar network. A planar network is a
network that could be drawn on a sphere with no edges crossing (Figure 3; often, however,
the networks are depicted with edges crossing; Tumminello, Aste, Di Matteo, & Mantegna,
2005).
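The iterative insertion step can be illustrated with a compact Python sketch. It is a simplified rendering of the construction described above, using the sum of correlations as the score function and a hypothetical correlation matrix; the function name `tmfg_edges` is made up for this example, and the initial tetrahedron is passed in directly.

```python
from itertools import combinations


def tmfg_edges(corr, seed):
    """Build the edge set by inserting each remaining node into a triangle."""
    edges = set(combinations(sorted(seed), 2))       # the tetrahedron's 6 edges
    triangles = list(combinations(sorted(seed), 3))  # its 4 triangular faces
    remaining = [v for v in range(len(corr)) if v not in seed]
    while remaining:
        # Find the (node, triangle) pair with the largest sum of correlations.
        gain, v, tri = max(((sum(corr[v][u] for u in t), v, t)
                            for v in remaining for t in triangles),
                           key=lambda item: item[0])
        remaining.remove(v)
        triangles.remove(tri)
        for u in tri:
            edges.add((min(u, v), max(u, v)))
        a, b, c = tri
        # The chosen face is replaced by three new faces that include the node.
        triangles += [(a, b, v), (a, c, v), (b, c, v)]
    return edges


# Hypothetical correlation matrix for six variables.
corr = [[1.0, 0.6, 0.6, 0.6, 0.5, 0.1],
        [0.6, 1.0, 0.6, 0.6, 0.4, 0.1],
        [0.6, 0.6, 1.0, 0.6, 0.3, 0.2],
        [0.6, 0.6, 0.6, 1.0, 0.1, 0.3],
        [0.5, 0.4, 0.3, 0.1, 1.0, 0.2],
        [0.1, 0.1, 0.2, 0.3, 0.2, 1.0]]
edges = tmfg_edges(corr, seed=(0, 1, 2, 3))
```

Each inserted node adds exactly three edges, so a network with n nodes built this way has 3n − 6 edges (12 in this example), which is the maximum number of edges a planar network can have.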
An intriguing property of planar networks is that they form a “nested hierarchy” within the
overall network (Song, Di Matteo, & Aste, 2011). This simply means that sub-networks are
nested within larger sub-networks of the overall network. The constituent elements of these
sub-networks are 3-node cliques (i.e., triangles), which form an emergent hierarchy in the
overall network (Song, Di Matteo, & Aste, 2012). Research that compared a novel
algorithm, which exploited this hierarchical structure, to several traditional methods of
hierarchical clustering (e.g., complete linkage and k-medoids) found that the novel algorithm
outperformed the traditional methods, retrieving more information with fewer clusters (Song
et al., 2012). Similar to EGA, EGAtmfg first constructs the network (using the TMFG
method) and then applies the walktrap algorithm.
Parallel analysis was originally proposed by Horn (1965) as a modification of the K1 rule
(Kaiser, 1960) that took into account the sampling variability of the latent roots. The
rationale behind this method is that the true dimensions should have sample eigenvalues that
are larger than those obtained from random variables that are uncorrelated at the population
level. Parallel analysis has been one of the most studied and accurate dimensionality
assessment methods for continuous and categorical variables to date (Crawford et al., 2010;
Garrido et al., 2013; Garrido, Abad, & Ponsoda, 2016; Ruscio & Roche, 2012; Timmerman
& Lorenzo-Seva, 2011).
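The rationale of parallel analysis can be expressed as a simple retention rule. The Python sketch below assumes that the sample eigenvalues and the eigenvalues from random data have already been computed, and it uses the mean of the random eigenvalues as the reference, which is one of several possible choices. The function name and all numeric values are hypothetical.

```python
def parallel_analysis_count(sample_eigenvalues, random_eigenvalue_sets):
    """Count leading eigenvalues that exceed the mean random eigenvalue."""
    n_sets = len(random_eigenvalue_sets)
    retained = 0
    for position, observed in enumerate(sample_eigenvalues):
        reference = sum(s[position] for s in random_eigenvalue_sets) / n_sets
        if observed <= reference:
            break  # stop at the first eigenvalue that is not above the reference
        retained += 1
    return retained


# Hypothetical eigenvalues for six variables (sorted in descending order).
sample = [2.7, 1.5, 0.6, 0.5, 0.4, 0.3]
random_sets = [[1.20, 1.10, 1.02, 0.96, 0.90, 0.82],
               [1.24, 1.12, 1.00, 0.94, 0.88, 0.82]]
n_factors = parallel_analysis_count(sample, random_sets)
```

In this example the first two sample eigenvalues are larger than their random counterparts and the third is not, so two dimensions would be retained.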
Although Horn (1965) based PA on the eigenvalues obtained from the full correlation matrix
using principal component analysis (PApca), Humphreys and Ilgen (1969) suggested that a
more precise estimate of the number of common factors could be obtained by computing the
eigenvalues from a reduced correlation matrix with estimates of communalities in its
diagonal using principal axis factoring (PApaf). As a communality estimate, they chose the
squared multiple correlations between each variable and all the others. Even though these
two variants of PA have not been compared frequently, Crawford et al. (2010) found that for
continuous variables their overall accuracies were similar for structures of one, two, and four
factors (60% for PApca and 65% for PApaf), with neither method being superior to the other
across all the studied conditions. With categorical variables (two to five response options),
however, Timmerman and Lorenzo-Seva (2011) found that PApca clearly outperformed
PApaf for structures of one and three major factors (overall accuracies of 95% for PApca and
70% for PApaf).
methods. Similarly, Ruscio and Roche (2012) showed that the OC (74%), PA (76%), and the
Akaike Information Criterion (73%) had comparable accuracies that were notably higher
than other methods including the BIC (60%), MAP (60%), the chi-square test of model fit
(59%), the AF (46%), and K1 (9%).
Method
Design
In order to evaluate the performance of the different dimensionality methods, six relevant
variables were systematically manipulated using Monte Carlo methods: the number of
factors, factor loadings, variables per factor, factor correlations, number of response options,
and sample size. For each of these, their levels were chosen to represent conditions that are
encountered in empirical research and that could produce differential levels of accuracy for
the dimensionality procedures.
Purc-Stephenson, 2009). Additionally, these levels are in line with typical simulation studies
in the area of dimensionality (e.g., Auerswald & Moshagen, 2019; Garrido et al., 2016).
Factor loadings: factor loadings were simulated with the levels of .40, .55, .70, and .85.
According to Comrey and Lee (2016), loadings of .40, .55, and .70 can be considered as
poor, good, and excellent, respectively, thus representing a wide range of factor saturations.
In addition, loadings of .85 were also simulated, which although not frequently encountered
in psychological data, allow for the evaluation of the dimensionality methods under ideal
conditions.
Variables per factor: the factors generated were composed of 3, 4, 8, and 12 indicators with
salient loadings. Three items are the minimum required for factor identification (Anderson,
1958), 4 items per factor represents a slightly overidentified model, while factors composed
of 8 and 12 items may be considered as moderately strong and highly overidentified,
respectively (Velicer, 1976; Widaman, 1993). It should be noted that the condition of 12
variables per factor was simulated for unidimensional structures only.
Factor correlations: factor correlations were simulated with the levels of .00, .30, .50,
and .70. This includes the orthogonal condition (.00), as well as medium (.30) and large
(.50) correlation levels, according to Cohen (1988). Further, although factor correlations
of .70 are very large, in some areas within psychology (e.g., intelligence), researchers
sometimes have to distinguish between constructs that are this highly correlated (e.g., Kane,
Hambrick, & Conway, 2005).
Number of response options: normal continuous and dichotomous types of data were
generated. The level of association between the continuous variables was measured using
Pearson’s correlations, while tetrachoric correlations were used for the dichotomous
variables.
Sample size: datasets with 500, 1,000, and 5,000 observations were simulated. Sample sizes
of 500 and 1,000 can be considered as medium and large, respectively (Li, 2016), while a
sample of 5,000 observations allows for the evaluation of the dimensionality methods in
conditions that can approximate their population performance. Further, these sample sizes
were selected by taking into account that tetrachoric correlations require large sample sizes
to achieve acceptable sampling errors, especially when the item difficulties vary
substantially (such as when the data are skewed; Timmerman & Lorenzo-Seva, 2011).
In order to generate more realistic factor structures, several steps were undertaken. First, the
factor loading for each item was drawn randomly from a uniform distribution with values
within ±.10 of the specified level manipulated (e.g., for the level of .40 the loadings
were drawn from the range of .30 to .50). Second, as it is common in practice to find
complex structures in which items present non-zero loadings on multiple factors, we
generated cross-loadings consistent with those commonly found in real data. The cross-
loadings were generated following the procedure described in Meade (2008) and Garcia-
Garzon, Abad, and Garrido (2019a): cross-loadings were randomly drawn from a normal
distribution, N(0, .05), for all the items. Third, the magnitude of skewness for each item was
randomly drawn with equal probability from a range of −2 to 2 in increments of .50,
As the simulation design of the current study is not completely crossed (e.g., there are no
factor correlations for unidimensional structures), it can be broken down into two parts: (a)
the unidimensional conditions with a 4 × 4 × 2 × 3 (factor loadings × variables per factor ×
number of response options × sample size) design, for a total of 96 condition combinations;
and (b) the multidimensional conditions with a 4 × 3 × 4 × 2 × 3 (factor loadings × variables
per factor × factor correlations × number of response options × sample size) design, for a
total of 288 condition combinations. For each of these 384 condition combinations, 500
replicates were simulated.
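The number of condition combinations stated above can be verified by enumerating the two parts of the design. The Python sketch below crosses the levels exactly as they are listed in the text; the variable names are hypothetical.

```python
from itertools import product

# Levels of the manipulated variables as described in the Design section.
loadings = (0.40, 0.55, 0.70, 0.85)
factor_correlations = (0.00, 0.30, 0.50, 0.70)
response_options = ("continuous", "dichotomous")
sample_sizes = (500, 1000, 5000)

# (a) Unidimensional part: 3, 4, 8, or 12 variables per factor.
unidimensional = list(product(loadings, (3, 4, 8, 12),
                              response_options, sample_sizes))

# (b) Multidimensional part: 3, 4, or 8 variables per factor.
multidimensional = list(product(loadings, (3, 4, 8), factor_correlations,
                                response_options, sample_sizes))

n_conditions = len(unidimensional) + len(multidimensional)
```

The enumeration yields 96 unidimensional and 288 multidimensional combinations, for the total of 384 reported above.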
Data Generation
For each simulated condition, 500 sample data matrices were generated according to the
common factor model. A detailed description of the data simulation approach can be found
in Appendix C. The resulting continuous variables were also dichotomized by applying a set
of thresholds according to specific levels of skewness (Garrido et al., 2013). For each sample
data matrix generated, the convergence of EGA with GLASSO estimation was verified (see
the convergence rate in Appendix D). If the analysis did not generate a numeric estimate
(i.e., the number of factors), the sample data matrix was discarded and a new one was generated,
until we obtained 500 sample data matrices per condition.
Data analysis
We used R (R Core Team, 2017) for all our analyses. The AF and OC techniques were
computed using the nFactors package (Raiche, 2010), while PA with resampling was applied
using the fa.parallel function contained in the psych package (Revelle, 2018). Both versions
of EGA were applied using the EGAnet package (Golino & Christensen, 2019). The figures
were generated using the ggplot2 (Wickham, 2016) and ggpubr (Kassambara, 2017)
packages.2
2The paper was written following a reproducible approach, integrating text and code into two sets of files. The first set has all the code
used in the simulation. The second set contains an R Markdown file integrating the manuscript text and code used for the statistical
and graphical analysis presented in the results section. The papaja package (Aust & Barth, 2018) was used to easily create a
document following the APA guidelines. Two other methods that are available in R and that may be used by applied researchers are
Velicer’s MAP (Velicer, 1976) and the very simple structure (VSS; Revelle & Rocklin, 1979), with both being implemented in the
psych package (Revelle, 2018). Since Golino and Epskamp (2017) already compared EGA with VSS and MAP, the current paper
won’t present and discuss these two methods. However, readers interested in comparing EGA and EGAtmfg with MAP and VSS can
find a summary of the results in Appendix E.
The third criterion (MAE) is similar to MBE, but uses the absolute value of the difference
Finally, analyses of variance (ANOVA) were conducted to investigate how the factor levels
and their combinations impacted the accuracy of the dimensionality methods. The PC and
MAE were set (separately) as the dependent variables and the manipulated variables
constituted the independent factors. The partial eta squared (η²) measure of effect size was
used to assess the magnitude of the main effects and interactions, per technique. According
to Cohen (1988), η² values of 0.01, 0.06, and 0.14 can be considered as small, medium, and
large effect sizes, respectively. It is important to note that all the code used in the current
study is available at an Open Science Framework repository, for reproducibility purposes:
https://osf.io/e9f2c/?view_only=3732b311ef304b1793ee92613dcb0fe7.
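As a brief illustration of the effect size measure, the Python sketch below computes partial eta squared from sums of squares using the standard definition, SS_effect / (SS_effect + SS_error), and labels the result with the benchmarks cited above. The function names and the sums of squares are hypothetical.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of effect-plus-error variance attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def effect_size_label(eta_squared):
    """Classify an effect using the benchmarks of 0.01, 0.06, and 0.14."""
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"


# Hypothetical sums of squares for two effects.
eta_a = partial_eta_squared(ss_effect=20.0, ss_error=80.0)  # 0.20
eta_b = partial_eta_squared(ss_effect=3.0, ss_error=97.0)   # 0.03
labels = (effect_size_label(eta_a), effect_size_label(eta_b))
```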
Results
Overall Performance
The overall performance of the dimensionality methods, as well as their performance across
the levels of the independent variables, is presented in Table 1. According to the accuracy of
the methods shown in the table, the methods can be classified into three groups: low (below
70%; AF and OC), moderate (between 70% and 80%; EGAtmfg and K1), and high accuracy (> 80%;
PApaf, PApca and EGA). In terms of the PC criterion, the methods from best to worst were:
EGA (M = 87.91%, SD = 32.60%), PApca (M = 83.01%, SD = 37.55%), PApaf (M =
81.88%, SD = 38.52%), K1 (M = 79.46%, SD = 40.40%), EGAtmfg (M = 74.61%, SD =
43.52%), OC (M = 66.36%, SD = 47.25%) and AF (M = 54.59%, SD = 49.79%).
In terms of the MBE, the EGA method showed the least overall bias, with a very small tendency
to overfactor (0.02), followed by EGAtmfg (MBE = −0.12), PApaf (−0.25) and PApca
(−0.29), which had a moderate tendency to underfactor. The rest of the methods had
considerably larger MBEs, with OC (−0.61) and AF (−0.97) underfactoring, and K1 (0.33)
overfactoring. Regarding the MAE, the two best methods were EGA (0.27) and PApca
(0.30), followed by PApaf (0.32) and EGAtmfg (0.32). The remaining methods, K1 (0.46),
OC (0.71) and AF (0.97), produced MAEs that were markedly worse.
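To make the three criteria concrete, the following minimal R sketch computes PC, MBE and MAE for a small set of made-up estimates. The vectors true_nfact and est_nfact are hypothetical values chosen for illustration and are not taken from the simulation. The sketch assumes that MBE is the mean of the signed difference between the estimated and the simulated number of factors, so that positive values indicate overfactoring and negative values underfactoring.

```r
# Hypothetical example values (not taken from the simulation)
true_nfact <- c(1, 2, 3, 4, 2, 3)   # simulated (population) number of factors
est_nfact  <- c(1, 2, 2, 4, 3, 3)   # number of factors estimated by a method

# Percentage of correct estimates (PC)
pc <- 100 * mean(est_nfact == true_nfact)

# Mean bias error (MBE): positive = overfactoring, negative = underfactoring
mbe <- mean(est_nfact - true_nfact)

# Mean absolute error (MAE): average size of the error, ignoring its direction
mae <- mean(abs(est_nfact - true_nfact))

round(c(PC = pc, MBE = mbe, MAE = mae), 2)
```

In this toy example one underestimate and one overestimate cancel out, so the MBE is zero while the MAE is not, which illustrates why the two criteria are reported side by side.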
Unidimensional Structures
Figure 5 shows the accuracy of the methods per sample size, factor loadings and number of
variables for continuous (Figure 5A) and dichotomous (Figure 5B) data. In each plot, a
dashed gray line represents an accuracy of 90%. Inspecting Figure 5 reveals several notable
trends. First, while most methods presented an accuracy higher than 90% in the continuous
data condition (Figure 5A), EGAtmfg fails considerably when the number of variables per
factor is 12 (M = 26.20%). Second, K1 presents a low accuracy for a sample size of 500,
loadings of .40 and 12 variables per factor (M = 11.75%). Third, PApaf performs poorly
when the factor loadings are .40 and the number of items is 3 or 4 (M = 0.35%), improving
significantly for 3 or 4 variables per factor and loadings of .55 (M = 57.52%).
In the dichotomous data condition, the scenario is slightly more nuanced for the percentage
of correct dimensionality estimates. AF and PApca are the two most accurate methods
(99.78% and 99.27%, respectively), followed by OC (M = 94.57%) and EGA (M = 92.54%).
The accuracy of K1 and OC decreases with an increase in the number of variables, for factor
loadings of .40 and .55 and sample sizes of 500 and 1000. EGAtmfg once again presents a
very low accuracy when the number of variables is 12 (M = 11.38%), although presenting a
high accuracy for 3, 4 or 8 items (M = 97.29%). It is also notable that PApaf presents a much
lower percentage of correct estimates for loadings of .40 (M = 40.87%) and .55 (M =
40.87%), especially when compared with EGA (M = 91.22% for loadings of .40 and M = 95.87%
for loadings of .55).
Figure 6 shows the absolute bias (MAE) for continuous (Figure 6A) and dichotomous data
(Figure 6B). In the continuous data condition, PApca, OC and AF presented a MAE of zero,
while EGA had a MAE of 0.04, K1 0.05, PApaf 0.20, and EGAtmfg 0.24.
Except for loadings of .40 and .55, EGAtmfg presented higher bias for conditions with 12
items, in general (MAE = 0.26). PApaf had higher MAE for loadings of .40 and three or four
variables per factor (MAE = 1.00), and for loadings of .55 and 3 variables per factor (MAE
= 0.71). Also, EGA, K1 and EGAtmfg presented an increased bias in the conditions with
factor loadings of .40, 12 variables per factor and sample size of 500.
Bias increased in the dichotomous data conditions (Figure 6B). The order of MAE (from
worst to best), however, remained the same: EGAtmfg (MAE = 0.24), PApaf (MAE = 0.20),
K1 (MAE = 0.05) and EGA (MAE = 0.04). OC (MAE = 0), AF (MAE = 0) and PApca (MAE
= 0) presented the lowest bias.
Table 2 shows the effect sizes per condition simulated. K1 and PApaf were the methods that
presented the highest effect sizes, in general. Both methods are strongly affected, in terms of
accuracy and bias, by the variability in the number of variables, factor loadings and the
interaction between factor loadings and number of variables. EGAtmfg is also strongly affected
by the number of variables per factor, both in terms of accuracy and bias.
Multidimensional Structures
Figure 7 shows the accuracy of the methods per sample size, factor loadings, interfactor
correlation and number of variables for continuous (Figure 7A) and dichotomous data
(Figure 7B), for the five most accurate techniques (PApaf, EGA, EGAtmfg, K1 and PApca).
In each plot, a dashed gray line represents an accuracy of 90%. For the continuous data
condition, the order of the methods in terms of percentage of correct dimensionality
estimates is: PApaf (M = 88.18%), EGA (M = 87.20%), K1 (M = 83.29%), PApca (M =
81.02%) and EGAtmfg (M = 76.33%).
The first notable trend in Figure 7 is the very high accuracy (above 90%) in the continuous
data condition (Figure 7A) for loadings from .55 to .85 and interfactor correlation from zero
to .50 for most methods, with the following exceptions. For loadings of .55, orthogonal
factors and three variables per factor, the accuracy of PApaf is lower than 75%. The accuracy
of K1 is also below 75% in conditions with eight items and samples of 500, as well as PApca
in conditions with 3 or 4 items, samples of 500 and interfactor correlation of .50. EGAtmfg
presents a PC lower than 75% irrespective of sample size when the interfactor correlation
is .50 and there are three variables per factor.
It is important to note that the accuracy of K1 goes down with the increase in the number of
variables per factor, in conditions with loadings of .40 and sample sizes of 500 or 1000. The
accuracy of EGA is almost always lower than 75% with loadings of .40 and sample size of
500. It is also notable that PApaf has very low PCs in conditions with loadings of .40 and 3
or 4 variables per factor.
In the conditions where the interfactor correlation is .70, factor loading is .40, and number of
variables per factor is eight, PApaf presented a mean percentage of correct estimates of
92.13% and 99.87% for sample size of 1000 and 5000, while EGA presented an accuracy of
66.07% for sample size of 1000 and 98.06% for sample size of 5000. In the same conditions,
EGAtmfg presented an accuracy of 69.67% and 92.73% for sample sizes of 1000 and 5000,
while PApca presented an accuracy of 48.60% and 100%, and K1 7.73% and 95.33%
respectively for samples of 1000 and 5000.
In conditions with interfactor correlation of .70 and factor loadings of .55, PApca and K1
only presented percentages of correct dimensionality estimates above 90% with eight
variables per factor and sample sizes of 1000 and 5000. EGA and EGAtmfg presented an
accuracy higher than 90% irrespective of sample size with eight variables per factor, for a
loading of .55 and interfactor correlation of .70. EGA (86.07%) and PApaf (99.59%), on the
other hand, presented high PCs for loadings varying from .55 to .85 and sample sizes of 1000
and 5000, irrespective of the number of variables per factor when the interfactor correlation
is .70.
The accuracy for EGA and PApaf for factor loadings of .70, across all conditions, is 98.83%
and 99.99%, respectively. For factor loadings of .85, it is 100% for both EGA and PApaf. At
the same time, EGAtmfg presented an accuracy of 82.12% for loadings of .70 and 85.54%
for loadings of .85, while K1 presented an accuracy of 91.27% and 92.01%, and PApca of
84.99% and 87.78% for loadings of .70 and .85, respectively.
In the dichotomous data condition, the scenario is, again, more nuanced in terms of accuracy
than in the continuous data condition (Figure 7B). EGA is the most accurate method (M =
81.47%), followed by PApaf (M = 78.74%), PApca (M = 70.23%), EGAtmfg (M = 69.38%)
and K1 (M = 65.78%).
Figure 7B reveals two general tendencies. One is the increase of PC with the increase of
number of variables per factor, sample size and factor loadings. The second one is the
decrease in accuracy as the interfactor correlation increases from zero to .70. With loadings
of .40, most techniques present accuracies lower than 90%, except in the following
conditions. For a sample size of 1000, eight items per factor and orthogonal factors, EGA,
PApca and EGAtmfg presented an accuracy greater than 90%. For a sample size of 5000 and
orthogonal factors, EGA and PApca achieved an accuracy higher than 90% irrespective of
the number of variables per factor, while PApaf increased the accuracy with the increase in
the number of variables and K1 decreased the accuracy with the number of items going from
3 to 8. With an interfactor correlation of .30, PApca achieved an accuracy higher than 90%
with eight items and a sample of 1000, and with a sample size of 5000, the accuracy was
above 90% irrespective of the number of variables, while EGA achieved the same level of
accuracy only with four or eight variables per factor. With an interfactor correlation of .50,
EGA, EGAtmfg, PApaf and PApca presented accuracies above 90% with eight items and
sample size of 5000. When the correlation was .70, only EGA presented an accuracy higher
than 90%, with a sample size of 500 and eight variables per factor.
As the factor loadings increase, the accuracy of the methods also increases, even if the
interfactor correlation is .70. EGA presented an accuracy of 23.92% for loadings of .40,
54.21% for loadings of .55, 89.32% for loadings of .70 and 99.22% for loadings of .85.
PApaf presented a similar pattern, with PC of 44.64% for loadings of .40, 80.46% for
loadings of .55, 94.59% for loadings of .70 and 98.99% for loadings of .85.
Figure 9 shows the bias (MAE) for continuous (Figure 9A) and dichotomous data (Figure
9B). In the continuous data condition, PApaf presented the lowest bias (MAE = 0.28),
followed by EGAtmfg (MAE = 0.29), K1 (MAE = 0.32), PApca (MAE = 0.33) and EGA
(MAE = 0.45). The bias of the techniques increases with the increase of interfactor
correlation, but decreases with higher sample sizes and higher factor loadings. Interestingly,
while EGA presented a mean absolute error of 1.62 for loadings of .40, it shrank to 0.15 for
loadings of .55 and to 0.01 for loadings of .70 or .85. PApaf had a similar pattern, presenting
a mean absolute error of 1.01 for loadings of .40, 0.11 for loadings of .55 and 0 for loadings
of .70 or .85. In contrast, PApca presented a mean absolute error of 0.50, 0.33 and 0.24 for
loadings of .40, .55 and .70 or .85, respectively.
Finally, in the dichotomous data condition, EGA presented the lowest bias (MAE = 0.27),
followed by EGAtmfg (MAE = 0.38), PApaf (MAE = 0.44), PApca (MAE = 0.52) and K1
(MAE = 0.89). Similarly to the continuous variables, the bias of the techniques increases
with the increase of interfactor correlation, but decreases with higher sample sizes and
higher factor loadings.
Table 3 shows the effect size for the five most accurate methods (a heatmap version of Table
3 is available in Appendix F). It is interesting to note that EGA presents a high effect size for
factor loading, both in terms of accuracy and bias. EGAtmfg presents a high effect size for
the number of variables and interfactor correlation, while PApaf is more affected by factor
loadings. PApca presents a high effect size for interfactor correlation and factor loadings. As
with the unidimensional structures, K1 presented a higher number of moderate and high
effect sizes.
In sum, the results revealed that AF and OC presented high accuracy only in the
unidimensional conditions, K1 and EGAtmfg presented a moderately good accuracy in both
unidimensional and multidimensional structures, and EGA, PApaf, and PApca presented higher
accuracies in general. The most accurate technique was EGA, with a mean accuracy of 88%
across conditions, followed by PApca (83%) and PApaf (82%).
Social Desirability Scale (SDS; Crowne & Marlowe, 1960) during the first measurement
occasion (between 2001 and 2017). The participants’ (64.8% women) age ranged from 18 to
97 years old (M = 50.72, SD = 18.73) and had an average of 15.65 years of education.
To start, the EGAnet package can be downloaded and installed from CRAN, for example by running install.packages("EGAnet") in R.
The EGAnet package was developed as a simple and easy way to implement the exploratory
graph analysis technique. The package has several functions but we will focus on the new
EGA algorithm in this tutorial. This function simultaneously integrates the algorithm to
# EGA arguments
EGA(data, model = c("glasso", "TMFG"), plot.EGA = TRUE,
n, steps = 4, nvar = 4, nfact = 1, load = .70, ...)
The next three arguments, nvar, nfact, and load, are parameters used to simulate data for
detecting unidimensionality. nvar sets the number of variables (defaults to 4), nfact sets the
number of factors (defaults to 1), and load sets the item loadings on each factor (defaults
to .70). We recommend using the default values when estimating multidimensional
structures but adjusting the nvar value for unidimensional structures. Our tutorial will
provide recommendations for how to do so. Finally, the ... argument is used to pass
additional network estimation arguments into glasso or TMFG functions. Links to these
functions are provided in the EGA function’s documentation.
Tutorial
The first step is to load the EGAnet package. Then, the dataset should be imported into R. In
this case, the SDS dataset composed of dichotomous (TRUE/FALSE) variables is saved as
a .csv file in the local directory, so the function to import the dataset into R is the read.csv
function. An object named sds can be created to store the data and, as a last step, the EGA
function is used. It is important to note that before importing the dataset the reversed items
had been recoded so that all the items have the same direction.
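The steps just described can be sketched in R as follows. This is a minimal illustration rather than the authors' exact script: the file name sds.csv is a hypothetical placeholder, the items are assumed to be stored numerically (e.g., 0/1) with reversed items already recoded, and the EGA arguments follow the function signature shown earlier in this section. The call is guarded so that the sketch only runs when the EGAnet package and the data file are available.

```r
# Hypothetical file name; the manuscript does not give the actual path
sds_path <- "sds.csv"

if (requireNamespace("EGAnet", quietly = TRUE) && file.exists(sds_path)) {
  # Step 1: load the package
  library(EGAnet)

  # Step 2: import the (already recoded) SDS items
  sds <- read.csv(sds_path)

  # Step 3: estimate the dimensionality with EGA (GGM estimated via glasso)
  ega.sds <- EGA(sds, model = "glasso", plot.EGA = TRUE)

  # Number of dimensions and the dimension assigned to each item
  print(ega.sds$n.dim)
  print(ega.sds$wc)
}
```

The object name ega.sds matches the one used in the later code of this tutorial, where ega.sds$wc holds the dimension membership of each item.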
The results in Figure 10 show five dimensions for the SDS, which can be interpreted as
follows. The first dimension (red nodes) reflects behaviors and attitudes that are egoistic,
insouciant, a little bit manipulative and resentful, with items such as item 19: I sometimes
try to get even rather than forgive and forget. The second reflects behaviors and attitudes of
cautious and well-mannered people, with items similar to item 27: I never make a long trip
without checking the safety of my car. The third factor, in turn, indicates a trait of integrity
and credibility, with items such as item 24: I would never think of letting someone else be
punished for my wrongdoings. The fourth factor indicates a trait of sympathy, generally
exhibited by people that are easy to get along with, with items such as item 4: I have never
intensely disliked anyone. Finally, the fifth factor reflects a low self-esteem trait,
with items such as item 5: On occasion I have had doubts about my ability to succeed in life.
The results above differ from the most common dimensionality structure of the SDS scale,
proposed by Millham (1974), who suggested two constructs of social desirability: one
involving self-denial of undesirable characteristics (denial) and another involving a tendency
to attribute socially desirable characteristics (attribution; Ventimiglia & MacDonald, 2012).
To check which structure presents a better fit to the data, the CFA function from the EGAnet
package can be used. This function takes the object generated by the EGA function, and fits
the corresponding confirmatory factor model using lavaan (Rosseel, 2012). The CFA
The fit of the CFA model can be inspected using cfa.ega.sds$fit.measures, and a plot can be
called using plot(cfa.ega.sds). The five-factor structure estimated using EGA presented
the highest CFI (0.97) and the lowest RMSEA (0.03) compared to the theoretical two-factor
(attribution-denial) model (CFI = 0.95, RMSEA = 0.03).
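A call of the following form would produce the cfa.ega.sds object referred to above. This is a hedged sketch, not the authors' exact code: the argument names (ega.obj, estimator, plot.CFA, data) are assumed to follow the EGAnet release contemporary with this manuscript and may differ in later versions, the WLSMV estimator is an assumption motivated by the dichotomous items, and sds.csv is a hypothetical file name.

```r
# Hypothetical file name; the manuscript does not give the actual path
sds_path <- "sds.csv"

if (requireNamespace("EGAnet", quietly = TRUE) && file.exists(sds_path)) {
  library(EGAnet)
  sds <- read.csv(sds_path)

  # EGA solution whose structure will be tested
  ega.sds <- EGA(sds, model = "glasso", plot.EGA = FALSE)

  # Fit the confirmatory factor model implied by the EGA solution
  # (argument names and the WLSMV estimator are assumptions, see above)
  cfa.ega.sds <- CFA(ega.obj = ega.sds, estimator = "WLSMV",
                     plot.CFA = TRUE, data = sds)

  # Inspect the fit measures (e.g., CFI and RMSEA)
  print(cfa.ega.sds$fit.measures)
}
```

The competing two-factor (attribution-denial) model would need to be specified and fitted separately, for example directly in lavaan, before the fit measures of the two structures can be compared.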
To determine whether the SDS dimensions described above are unidimensional, we can
apply EGA and adjust the nvar argument for data generation. The default value of 4 was used
in the simulation to keep the argument consistent across the conditions. We recommend,
however, adjusting this value when testing whether data are unidimensional, by
setting nvar to the number of variables that are in the dimension being tested. Factor one, for
example, had 14 items (Figure 10), so nvar should be set to 14. Factor two had 6 items, factors
three and four had 5 items each and factor five had 3 items, so nvar should be set to 6, 5, 5, and 3,
respectively. We also computed parallel analysis with PAF and PCA using tetrachoric
correlations and data generation via resampling from the psych package (Revelle, 2018). To
demonstrate how to implement this procedure, the following code can be applied:
# Load psych for fa.parallel (EGAnet is assumed to be loaded already)
library(psych)
# Initialize result vectors (one entry per EGA dimension)
# EGA
ega.res <- vector("numeric", length = max(ega.sds$wc))
# PApaf
papaf.res <- vector("numeric", length = max(ega.sds$wc))
# PApca
papca.res <- vector("numeric", length = max(ega.sds$wc))
# Run 'for' loop to determine dimensions
for(i in 1:max(ega.sds$wc))
{
  # Identify target items
  target <- which(ega.sds$wc == i)
  # Estimate dimensions
  # EGA, with nvar set to the number of items in the dimension
  ega.res[i] <- max(EGA(sds[,target], model = "glasso", plot.EGA = FALSE,
                        nvar = length(target))$wc)
  # Parallel analysis via resampling (capture.output hides printed messages)
  cap <- capture.output(pa <- fa.parallel(sds[,target], sim = FALSE,
                                          cor = "poly", plot = FALSE))
  # PApaf
  papaf.res[i] <- pa$nfact
  # PApca
  papca.res[i] <- pa$ncomp
}
# Combine and name results
res <- rbind(ega.res, papaf.res, papca.res)
row.names(res) <- c("EGA", "PApaf", "PApca")
colnames(res) <- paste("Factor",1:5)
# Return results
res
As the results in Table 4 show, EGA and PApca estimated unidimensional structures for all 5
factors, while PApaf only estimated one factor as unidimensional.
These results are consistent with our simulation findings, suggesting that EGA and PApca
are effective, while PApaf is inaccurate at estimating unidimensionality in dichotomous data.
This tutorial demonstrates how EGA can first be used to detect the number of dimensions in
a multidimensional construct. Then, it shows how EGA can be applied to the dimensions
identified in a construct to verify that each dimension is indeed unidimensional. For applied
researchers, the steps demonstrated in this tutorial are particularly useful for applying EGA
to their own dimensional assessments. This has particular implications for scale
development and psychometric assessment practices. EGA appears to be robust for both
multidimensional and unidimensional assessments, whereas traditional methods such as
PApaf and PApca would be necessary to estimate multidimensional and unidimensional
structures, respectively. Thus, applied researchers can use EGA as a single, all-around
dimension identification approach.
Discussion
The present study examined the dimensionality identification accuracy of two new
exploratory graph analysis methods (one that can deal with both unidimensional and
multidimensional structures, and the other that implements a new network estimation), as
well as several traditional factor-analytic techniques, using an extensive Monte Carlo
simulation. Aside from manipulating salient variables across ranges of plausible values that
may be found in applied settings, all the structures that were generated had varying main
factor loadings, cross-loadings, and skewness across items in order to enhance the ecological
validity of the simulation. Additionally, previous studies comparing EGA with traditional
factor-analytic methods only included dichotomous variables in the simulation design. The
current paper also included continuous data, expanding our knowledge about the suitability
of EGA as a dimensionality assessment technique compared to traditional methods.
Method Performance
The results from the simulation study revealed that the methods could be classified into three
groups: those with high accuracy only in the unidimensional conditions (AF and OC), those
with a moderately good accuracy in both unidimensional and multidimensional structures
(K1, EGAtmfg) and those with higher accuracies in general (EGA, PApaf, PApca). Of the
high performing methods, none was the best across every condition and criterion, and all
showed strengths and weaknesses.
Overall, the new EGA algorithm presented the highest accuracy to correctly estimate the
number of simulated factors, and the lowest mean bias error. It is important to note that the
new EGA algorithm can adequately deal with unidimensional structures, a condition that the
original EGA method proposed by Golino and Epskamp (2017) could not handle. At the
same time, the new EGA algorithm was implemented in a way that does not change the
original EGA method if the data presents more than two factors. Both EGA and EGAtmfg
performed similarly to the most accurate traditional technique, parallel analysis, in a number
of conditions.
The new EGA algorithm (using the GGM model) was the most accurate method with
medium (.55), and the second best with high (.70) and very high (.85) factor loadings,
followed closely by PApaf. Also, of the five best methods, EGA and PApaf were the two
most robust to the factor correlations, sustaining the smallest decreases in accuracy with
higher factor correlations. The excellent performance of EGA in these conditions is in line
with previous research (Golino & Epskamp, 2017). With low loadings (.40) combined with
smaller samples (500), however, the performance of EGA was lower, but still presented rates
of correct estimates that were in line with those of the other well performing methods.
Recent developments in the area of network psychometrics seem to improve the estimation
of the GGM model to deal with low sample sizes and large numbers of variables (Williams,
2018; Williams & Rast, 2019). Future studies should investigate how these new GGM
estimation procedures can improve the accuracy of EGA, especially in conditions with low
sample size, low factor loadings and moderate or high interfactor correlation.
EGA with TMFG provided correct dimensionality estimates just below that of the other high
performing methods, but its most notable characteristic was that its estimates, along with those of
the new EGA and PApaf, were the closest to the population values. In comparison to the
other good performing methods, EGAtmfg was at its best in the unidimensional structures
for fewer variables per factor, and in the multidimensional conditions it was best for
structures with weaker factor correlations (≤ .50), and eight variables per factor. In contrast, the
biggest limitations of EGAtmfg came from structures that were composed of many variables
per factor, and with highly correlated factors. It is likely that these conditions create
problems for EGAtmfg due to the way it constructs the network, through the formation of
tetrahedrons (groups of four nodes), which severely limits (or enforces) cross-dimension
connections. Future simulations should examine a new method that constructs the network in
a similar way as the TMFG but eliminates its artificial structural constraint (i.e., 3- and 4-
node cliques; Massara & Aste, 2019).
In terms of the two PA methods, they generally performed well, thus extending the vast
literature supporting the accuracy of this procedure (e.g., Garrido et al., 2013, 2016;
Timmerman & Lorenzo-Seva, 2011). Comparing both parallel analysis methods, it is
interesting to point out that while PApca was more accurate in the unidimensional conditions,
PApaf was more robust in the multidimensional conditions, especially with higher interfactor
correlations. These two methods complemented each other, with one being stronger where
the other was weaker, and vice versa (e.g., for factor loadings, variables per factor, and factor
correlations). In the case of PApca, the method showed a clear bias in the condition of
multiple factors, few variables per factor (3 or 4) combined with moderate (.50) or very high
factor correlations (.70). In these cases the method will generally produce a one-factor
estimate regardless of the actual dimensionality of the data. The reason for this is simple: the
population eigenvalues after that corresponding to the first factor will be lower than one, and
thus, asymptotically PApca is not able to retain them. In terms of PApaf, it produced
its comparatively poorest performance with low factor loadings (.40).
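The argument about population eigenvalues can be checked numerically. The following R sketch builds the population correlation matrix implied by one illustrative configuration (two factors, three variables per factor, loadings of .70, interfactor correlation of .70, no cross-loadings) and inspects its eigenvalues. The object names and this particular configuration are assumptions made for illustration; the structures simulated in the study also included cross-loadings and varying loadings, which are ignored here.

```r
# Hypothetical population structure: 2 factors, 3 variables per factor,
# loadings of .70, interfactor correlation of .70, no cross-loadings
n_fact  <- 2
n_var   <- 3
loading <- 0.70
phi     <- 0.70

# Loading matrix (variables x factors) with simple structure
Lambda <- kronecker(diag(n_fact), matrix(loading, n_var, 1))

# Factor correlation matrix
Phi <- matrix(phi, n_fact, n_fact)
diag(Phi) <- 1

# Implied population correlation matrix (unit variances on the diagonal)
R <- Lambda %*% Phi %*% t(Lambda)
diag(R) <- 1

# Eigenvalues of the full correlation matrix (the PCA eigenvalues)
ev <- eigen(R, symmetric = TRUE)$values
round(ev, 3)
```

Under these assumptions the second eigenvalue is approximately 0.95, that is, below one even though a second factor exists in the population, so a criterion based on the eigenvalues of the full correlation matrix would tend to retain a single component.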
model used to compute the reference eigenvalues only constitutes a strictly adequate
reference for the first observed eigenvalue (Braeken & Van Assen, 2017). The values of
subsequent eigenvalues for the data under consideration are conditional upon the structure in
the data captured by previous eigenvalues. Particularly, when factors are highly correlated
and the number of variables is small, the first eigenvalue will be very large, whereas
succeeding eigenvalues will be necessarily notably smaller (as the sum of the eigenvalues is
always constrained to be equal to the total variance). This situation will give rise to scenarios
where the eigenvalues from major factors after the first will be lower than the reference
eigenvalues at the population level, thus limiting the accuracy of the method for these
It is also interesting to note that the automated scree methods presented a very high accuracy
in the unidimensional conditions, but moderately low accuracies in the multidimensional
conditions. Their percentage of correct estimates was between 20% and 30% below that
of the EGA and PA methods. The AF method was one of the most accurate methods for
orthogonal structures and for single factors (unidimensional structures), but its accuracy
shrinks as the interfactor correlation increases. In the case of K1, the method tended to
overestimate the population dimensionality by very large amounts, as has been widely
documented in the literature (Costello & Osborne, 2005). Surprisingly, the accuracy of K1 in
the current simulation was not bad. This can be explained by the use of three and four
variables per factor in the simulation design, a condition in which K1 presents higher
accuracies. However, the results of the present study show very clearly that the K1 technique
should be avoided in situations where the number of variables per factor is relatively high,
and the factor loadings are small or moderate. A similar pattern was identified for MAP and
VSS (see Appendix D). MAP presented a moderately low accuracy for 2 (52.5%), 3 (47.4%)
and 4 (44.4%) factors, while VSS presented very low accuracies (14.7%, 7.3% and 5.9%,
respectively). However, MAP presented a very high accuracy for unidimensional structures
(99.7%), and VSS followed in the same direction (91%).
The current paper presents limitations that should be addressed in future studies. A question
that remains open regards the accuracy of the EGA techniques compared to PApaf and
PApca when the simulated data has a complex structure where items have large loadings on
more than one factor. Also, little is known about the accuracy of EGA in the presence of
population error. Lim and Jahng (2019), for example, investigated several variants of parallel
analysis, and discovered that the majority of the PA methods presented much lower
accuracies in the presence of population error. Both the issue of complex factor structures
and population error should be addressed in future studies comparing EGA and PA
techniques.
EGA in Practice—Which EGA method should be used with empirical data? In this section
we will provide some practical recommendations to guide researchers in the implementation
of EGA and EGAtmfg. On one hand, it is useful to always compute both EGA and
EGAtmfg and see if their estimates agree. In our simulation, 58.0% of the cases where EGA
erred it did so by overfactoring, while in 85.6% of the cases that EGAtmfg erred it was due
to underfactoring. Thus, when the methods agree it is likely because they have found the
optimal solution. For example, in this study EGA and EGAtmfg provided the same estimate
for 78% of the datasets, and for these, their accuracy was nearly perfect (PC = 91.85%,
MAE = 0.10). Therefore, if both EGA and EGAtmfg produce the same dimensionality
estimate researchers can have increased confidence that the solution suggested is optimal, or
if not, very close to it. On the other hand, when the two methods disagreed in the present
study the accuracy of EGA decreased (PC = 73.73%, MAE = 0.82) and that of EGAtmfg decreased
substantially (PC = 12.94%, MAE = 1.07). In these instances when EGA and EGAtmfg
provide different estimates in practice, researchers can look at the line plots presented in
Figures 5 and 7 to see the method that is likely to perform better in the conditions that they
think most apply to their data. Additionally, in these cases where EGA and EGAtmfg
disagree, it is important to more strongly consider potential alternative solutions (with fewer or
more dimensions, respectively) to those suggested by the methods. In particular, to help
researchers decide which dimensionality estimate is better, a fit index was recently
developed specifically for EGA (Golino et al., 2019) and could be used to check which
dimensionality structure (i.e., estimated using EGA or EGAtmfg) fits the data better.
Lastly, researchers could also use PApaf to check if the number of factors matches the
number of factors estimated using the EGA techniques (Garcia-Garzon et al., 2019b).
Conclusion
This paper describes the EGA method and shows, through an extensive simulation, that it
performs as well as the best factor-analytic techniques. On top of excellent performance,
EGA possesses several advantages over traditional methods. First, with EGA, researchers do
not need to decipher a factor loading matrix but instead can immediately interpret which
items belong to which factor with the color-coded network plot. Second, EGA does not
require the researcher to make any decisions about the type of rotation to use for the factor
structure. There are an enormous number of factor rotations for researchers to choose from,
which can make it difficult for researchers to know whether they are using the appropriate
rotation method. Third, EGA is a single step approach and does not require additional steps
to verify factors, while with traditional methods, the number of dimensions is estimated
first, followed by exploratory factor analysis with the specified number of
dimensions. These last two advantages ultimately reduce the number of researcher degrees
of freedom and eliminate most of the potential for bias and errors. In sum, we show that
EGA is a promising method for accurate dimensionality estimation.
Acknowledgement
J. Amuthavalli Thiyagarajan and R. Sadana are staff members of the World Health Organization. All listed authors
alone are responsible for the views expressed in this publication and they do not necessarily represent the decisions,
policy, or views of the World Health Organization. Research reported in this publication was supported by the
National Institute on Aging of the National Institutes of Health under award number R01AG024270.
Appendix A
transition probability, $P_{ij} = \frac{A_{ij}}{NS(i)}$, which forms the transition matrix, P.
To determine the communities that the nodes belong to, the transition matrix is used to
compute a distance metric, r, which measures the structural similarity between nodes (1).
This structural similarity is defined as (Pons & Latapy, 2006):
$$r_{ij} = \sqrt{\sum_{k=1}^{n} \frac{(P_{ik} - P_{jk})^2}{NS(k)}} \tag{A1}$$
This distance can be generalized to the distance between nodes and communities by
beginning the random walk at a random node in a community, C. This can be defined as:
$$P_{Cj} = \frac{1}{|C|} \sum_{i \in C} P_{ij}. \tag{A2}$$
Finally, this can be further generalized to the distance between two communities:
$$r_{C_1 C_2} = \sqrt{\sum_{k=1}^{n} \frac{(P_{C_1 k} - P_{C_2 k})^2}{NS(k)}}, \tag{A3}$$
where this definition is consistent with the distance between nodes in the network (Eq. A1).
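The distances in Eqs. A1 to A3 can be illustrated with a small base R sketch. The toy network below (two triangles joined by a weak edge) and the helper names node_distance, community_P and community_distance are hypothetical and introduced only for this example. The sketch uses the one-step transition matrix as written in the equations and includes the square root used by Pons and Latapy (2006); the walktrap algorithm itself works with random walks of several steps (cf. the steps argument of the EGA function), which is not reproduced here.

```r
# Small hypothetical weighted network: two triangles joined by one weak edge
A <- matrix(0, 6, 6)
A[1, 2] <- A[1, 3] <- A[2, 3] <- 0.5   # first cluster
A[4, 5] <- A[4, 6] <- A[5, 6] <- 0.5   # second cluster
A[3, 4] <- 0.1                          # bridge between the clusters
A <- A + t(A)                           # make the matrix symmetric

NS <- rowSums(A)   # node strength, NS(i)
P  <- A / NS       # transition matrix, P_ij = A_ij / NS(i)

# Distance between nodes i and j (Eq. A1)
node_distance <- function(i, j) {
  sqrt(sum((P[i, ] - P[j, ])^2 / NS))
}

# Transition probabilities starting from a random node in community C (Eq. A2)
community_P <- function(C) {
  colMeans(P[C, , drop = FALSE])
}

# Distance between two communities (Eq. A3)
community_distance <- function(C1, C2) {
  sqrt(sum((community_P(C1) - community_P(C2))^2 / NS))
}

node_distance(1, 2)            # two nodes in the same triangle
node_distance(1, 5)            # two nodes in different triangles
community_distance(1:3, 4:6)   # distance between the two triangles
```

Nodes within the same triangle end up closer to each other than nodes in different triangles, and the community distance reduces to the node distance when each community contains a single node, in line with the consistency noted above.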
Algorithm
The algorithm begins by having each node as a cluster (i.e., n clusters). The distances, r, are
computed between all adjacent nodes, and the algorithm then begins to iteratively choose
two clusters. These two clusters chosen are then merged into a new cluster, updating the
distances between the node(s) and cluster(s) with each merge (in each k = n − 1 steps).
Clusters are only merged if they are adjacent to one another (i.e., an edge between them).
The merging method is based on Ward’s agglomerative clustering approach (Ward, 1963)
that depends on the estimation of the squared distances between each node and its
community (σk) at each step k of the algorithm. Since computing σk is computationally
expensive, Pons and Latapy (2006) adopted an efficient approximation that depends only on
the nodes and the communities rather than on the k steps. The approximation seeks to minimize
the variation of σ that would be induced if two clusters (C1 and C2) are merged into a new
cluster (C3):
$$\Delta\sigma(C_1, C_2) = \frac{1}{n}\left(\sum_{i \in C_3} r_{iC_3}^2 - \sum_{i \in C_1} r_{iC_1}^2 - \sum_{i \in C_2} r_{iC_2}^2\right). \tag{A4}$$
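The merge criterion in Eq. A4 can be sketched as follows. This is an illustrative plain-Python example under stated assumptions, not the authors' code: the six-node network and all function names are hypothetical, and r_{iC} is taken to be the distance between node i's row of P and the community profile of Eq. A2. The sketch also evaluates the closed form reported by Pons and Latapy (2006), Δσ = (1/n) · |C1||C2| / (|C1| + |C2|) · r²_{C1C2}, which avoids recomputing the three sums.

```python
def profile(P, C):
    """Eq. A2: mean transition probabilities over the nodes in C."""
    return [sum(P[i][k] for i in C) / len(C) for k in range(len(P))]

def sq_dist(p, q, strength):
    """Squared strength-weighted distance between two profiles."""
    return sum((a - b) ** 2 / s for a, b, s in zip(p, q, strength))

def within_ss(P, C, strength):
    """Sum over i in C of r_{iC}^2 (squared node-to-community distances)."""
    centre = profile(P, C)
    return sum(sq_dist(P[i], centre, strength) for i in C)

def delta_sigma(P, C1, C2, strength):
    """Eq. A4 evaluated directly, with C3 the union of C1 and C2."""
    C3 = C1 + C2
    total = (within_ss(P, C3, strength)
             - within_ss(P, C1, strength)
             - within_ss(P, C2, strength))
    return total / len(P)

def delta_sigma_closed(P, C1, C2, strength):
    """Closed form from Pons and Latapy (2006); should match delta_sigma."""
    weight = len(C1) * len(C2) / (len(C1) + len(C2))
    return weight * sq_dist(profile(P, C1), profile(P, C2), strength) / len(P)

# Hypothetical network: triangles 0-1-2 and 3-4-5 joined by a weak 2-3 edge.
w, b = 0.6, 0.1
A = [
    [0, w, w, 0, 0, 0],
    [w, 0, w, 0, 0, 0],
    [w, w, 0, b, 0, 0],
    [0, 0, b, 0, w, w],
    [0, 0, 0, w, 0, w],
    [0, 0, 0, w, w, 0],
]
NS = [sum(row) for row in A]
P = [[a / s for a in row] for row, s in zip(A, NS)]

same_triangle = delta_sigma(P, [0], [1], NS)   # merge two nodes of one triangle
across_bridge = delta_sigma(P, [2], [3], NS)   # merge the two bridge nodes
print(same_triangle, across_bridge)
```

In this toy network the merge within a triangle induces the smaller increase in σ, so a Ward-style rule would pick it before merging across the bridge.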
Since Ward’s approximation adopted by Pons and Latapy (2006) only merges adjacent
clusters, the total number of times Δσ is updated is not very large, and the resulting values
of the edges linking j and k is $e_{jk} = \tfrac{1}{2}p$, so that the total fraction of edges between the two
clusters is ejk + ekj (Newman, 2004). On the other hand, ejj represents the fraction of edges
that fall within cluster j, whose sum equals one: ∑j ejj = 1. Newman (2004) points out that a
division of networks into clusters is meaningful if the value of the sums of ejj and eii is
maximized. However, in cases where only one cluster is present, the maximal value will
be one, which is also the value of ∑j ejj. Therefore, for networks composed of only one
cluster this index is not informative. A solution Newman (2004) proposed was to calculate
an index that takes ∑j ejj and subtracts from it the value that it would take if edges were
placed at random. For a given cluster j, the modularity is calculated as:
where aj is given by ∑k ejk. Therefore, the modularity index penalizes network structures
with only one cluster, since in this condition the value of Q would be zero (Newman, 2004).
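A minimal sketch of this index, assuming the standard form given by Newman (2004), Q = ∑j (ejj − aj²) with aj = ∑k ejk, for an unweighted, undirected network. The function name, the edge list, and the two partitions are hypothetical and serve only to show that a single-cluster partition yields Q = 0.

```python
def modularity(edges, membership):
    """Newman's Q = sum_j (e_jj - a_j^2) for an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> cluster label.
    """
    m = len(edges)
    labels = sorted(set(membership.values()))
    index = {label: pos for pos, label in enumerate(labels)}
    k = len(labels)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        cu, cv = index[membership[u]], index[membership[v]]
        # Split each edge in half so that e[j][k] + e[k][j] is the total
        # fraction of edges between clusters j and k.
        e[cu][cv] += 0.5 / m
        e[cv][cu] += 0.5 / m
    a = [sum(row) for row in e]
    return sum(e[j][j] - a[j] ** 2 for j in range(k))

# Hypothetical network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(q_two, q_one)
```

With every node in one cluster, e11 = 1 and a1 = 1, so the two terms cancel; the two-triangle partition scores higher because most edges fall within clusters.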
Appendix B
For p number of variables, the OC procedure aims to identify the actual factors by
computing p–2 two-point regression models, and verifying if the eigenvalue in question is
greater than the one estimated by these models. The last positive verification, starting from
the second eigenvalue, and continuing without interruption, is used to determine the number
of factors to retain. The predicted eigenvalue λi, known as the optimal coordinate, is
estimated through the linear regression model using only the last eigenvalue and the (i + 1)th
eigenvalue so that
with
and
On the other hand, the AF method searches for the point in the eigenvalue plot where the
slope of the curve changes abruptly. In order to achieve this, the AF evaluates an
Additionally, Raiche, Walls, Magis, Riopel, and Blais (2013) complement the OC and AF
methods with the K1 rule or PApca, such that no eigenvalues are retained that are below one
(K1) or below the eigenvalue obtained from independent variates (PApca).
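The OC procedure described above can be sketched as follows. This is one reading of the verbal description, not the nFactors implementation: for each i from 1 to p − 2, a line through the points (i + 1, λi+1) and (p, λp) is extrapolated back to position i, and eigenvalue i is kept while it is at least as large as both that prediction and a threshold (1.0 for the K1 complement). The function name, the stopping rule at the first failure, and the example eigenvalues are assumptions made for illustration.

```python
def optimal_coordinates(eigenvalues, threshold=1.0):
    """Count leading eigenvalues that exceed their optimal coordinate.

    eigenvalues must be sorted in decreasing order. threshold=1.0 mimics the
    K1 complement; eigenvalues from a parallel analysis could be compared
    instead, which this sketch does not do.
    """
    eig = list(eigenvalues)
    p = len(eig)
    retained = 0
    for i in range(1, p - 1):                  # i = 1, ..., p - 2 (1-based)
        lam_next = eig[i]                      # eigenvalue i + 1
        lam_last = eig[p - 1]                  # eigenvalue p
        slope = (lam_last - lam_next) / (p - (i + 1))
        predicted = lam_next - slope           # line extrapolated back to i
        if eig[i - 1] >= predicted and eig[i - 1] >= threshold:
            retained = i
        else:
            break                              # stop at the first failure
    return retained

# Hypothetical eigenvalues of a 6-variable correlation matrix (they sum to 6).
example = [3.0, 2.0, 0.4, 0.3, 0.2, 0.1]
n_oc = optimal_coordinates(example)
print(n_oc)
```

For these eigenvalues the first two exceed both their predicted values (2.475 and 0.5) and the threshold, while the third falls below one, so two factors are retained.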
Appendix C
RR = ΛΦΛ′, (C1)
where RR is the reproduced population correlation matrix, lambda (Λ) is the measurement
model (i.e. a k × r factor loading matrix for k variables and r factors) and phi (Φ) is the
structure matrix of the latent variables (i.e. an r × r matrix of correlations among factors). The
population correlation matrix RP was then obtained by inserting unities in the diagonal of
RR, thereby raising the matrix to full rank. The next step was performing a Cholesky
decomposition of RP, such that:
RP = U′U . (C2)
If either RP was not positive definite (i.e., at least one eigenvalue was ≤ 0) or an item’s
communality was greater than 0.90, the Λ matrix was replaced and a new RP matrix was
computed following the same procedure. Subsequently, the sample data matrix of
continuous variables was computed as:
X = ZU, (C3)
where Z is a matrix of random standard normal deviates with rows equal to the sample size
and columns equal to the number of variables.
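The data-generation steps in Eqs. C1 to C3 can be sketched in plain Python as follows. This is a hedged illustration rather than the simulation code used in the study: the loading matrix, factor correlation, sample size, seed, and helper names are hypothetical, and the step that replaces Λ when a communality exceeds 0.90 is omitted.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cholesky_upper(R):
    """Return upper-triangular U with R = U'U (Eq. C2).

    Raises ValueError when R is not positive definite.
    """
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return transpose(L)

def simulate(loadings, phi, n_obs, seed=0):
    """Generate continuous data X = ZU from a loading matrix and factor correlations."""
    rng = random.Random(seed)
    RR = matmul(matmul(loadings, phi), transpose(loadings))        # Eq. C1
    k = len(RR)
    RP = [[1.0 if i == j else RR[i][j] for j in range(k)] for i in range(k)]
    U = cholesky_upper(RP)                                         # Eq. C2
    Z = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_obs)]
    return matmul(Z, U), RP, U                                     # Eq. C3

# Hypothetical condition: 2 factors, 3 items each, loadings .70, correlation .30.
loadings = [[0.7, 0.0]] * 3 + [[0.0, 0.7]] * 3
phi = [[1.0, 0.3], [0.3, 1.0]]
X, RP, U = simulate(loadings, phi, n_obs=2000, seed=1)
print(len(X), len(X[0]), RP[0][1], RP[0][3])
```

Because Z has independent standard normal columns, the columns of X = ZU have population covariance U′U = RP, so sample correlations computed from X approach the entries of RP as the sample size grows.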
Appendix D
Overall, the convergence rates (CRs) of the EGA analysis are high across most conditions.
Those with lower CRs are small factor loading conditions (i.e., loadings = 0.4) associated
with small to medium sample size (i.e., N=500 or 1000). This is expected as the results are
consistent with the performance of EGA, where EGA works best with medium to high factor
loadings or small loadings with large sample size. We think the reason for the
nonconvergence could be related to the GLASSO regularization procedure. This pattern is
Among the small loading and small sample conditions, in multidimensional conditions, the
number of factors affects the CRs. The more factors there are, the lower the CRs tend to be.
Furthermore, consistent with the performance of EGA, CRs for medium to high factor
loading conditions (i.e., loadings = 0.55, 0.7 or 0.85) are very high, with occasionally a few
non-converged conditions when loadings = 0.55 and sample size is small. All unidimensional
cases with medium to high loadings have 100% CRs. In sum, the CR was 97% for the
Appendix E
Table E1.
Table E2.
Mean Bias Error (MBE) for EGA, EGAtmfg, VSS and MAP
Table E3.
Mean Absolute Error (MAE) for EGA, EGAtmfg, VSS and MAP
Appendix F
Figure F1.
References
Anderson TW, & Rubin H (1958). Statistical inference in factor analysis. In Proceedings of the 3rd
Berkeley symposium on mathematics, statistics, and probability (Vol. 5, pp. 111–150).
Auerswald M, & Moshagen M (2019). How to determine the number of factors to retain in exploratory
factor analysis: A comparison of extraction methods under realistic conditions. Psychological
Methods, 24, 468–491. 10.1037/met0000200 [PubMed: 30667242]
Aust F, & Barth M (2018). papaja: Create APA manuscripts with R Markdown. Retrieved from
https://github.com/crsh/papaja
Barfuss W, Massara GP, Di Matteo T, & Aste T (2016). Parsimonious modeling with information
filtering networks. Physical Review E, 94(6), 062306. [PubMed: 28085404]
Beierl ET, Bühner M, & Heene M (2018). Is that measure really one-dimensional? Nuisance parameters can mask severe
model misspecification when assessing factorial validity. Methodology, 14(4), 188–196.
Braeken J, & Van Assen MA (2017). An empirical kaiser criterion. Psychological Methods, 22(3), 450.
[PubMed: 27031883]
Cattell RB (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2),
245–276. [PubMed: 26828106]
Chen J, & Chen Z (2008). Extended bayesian information criteria for model selection with large model
spaces. Biometrika, 95(3), 759–771.
Christensen AP, Kenett YN, Aste T, Silvia PJ, & Kwapil TR (2018). Network structure of the
wisconsin schizotypy scales–Short forms: Examining psychometric network filtering approaches.
Behavior Research Methods, 50(6), 2531–2550. https://doi.org/10.3758/s13428-018-1032-9
[PubMed: 29520631]
Cohen J (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Comrey AL, & Lee HB (2016). A first course in factor analysis. New York: Routledge.
Costello AB, & Osborne JW (2005). Best practices in exploratory factor analysis: Four
recommendations for getting the most from your analysis. Practical Assessment, Research &
Evaluation, 10(7), 1–9.
Crawford AV, Green SB, Levy R, Lo W-J, Scott L, Svetina D, & Thompson MS (2010). Evaluation of
parallel analysis methods for determining the number of factors. Educational and Psychological
Measurement, 70(6), 885–901.
Crowne D, & Marlowe D (1960). A new scale of social desirability independent of psychopathology.
Journal of Consulting Psychology, 24(4), 349. [PubMed: 13813058]
Epskamp S, & Fried E (2018). A tutorial on regularized partial correlation networks. Psychological
Methods, 23(4), 617–634. 10.1037/met0000167 [PubMed: 29595293]
Epskamp S, Cramer AOJ, Waldorp LJ, Schmittmann VD, & Borsboom D (2012). qgraph: Network
visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4), 1–18.
Garrido LE, Abad FJ, & Ponsoda V (2016). Are fit indices really fit to estimate the number of factors
with categorical variables? Some cautionary findings via monte carlo simulation. Psychological
strongly related constructs: Comment on ackerman, beier, and boyle (2005). Psychological
Bulletin, 131, 66–77. [PubMed: 15631552]
Kassambara A (2017). Ggpubr: ‘Ggplot2’ based publication ready plots. Retrieved from
https://CRAN.R-project.org/package=ggpubr
Lauritzen SL (1996). Graphical models (Vol. 17). Oxford: Clarendon Press.
Li C-H (2016). Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood
and diagonally weighted least squares. Behavior Research Methods, 48(3), 936–949. [PubMed:
26174714]
Lim S, & Jahng S (2019). Determining the number of factors using parallel analysis and its recent
variants. Psychological Methods, 24(4), 452–467. [PubMed: 31180694]
Lubbe D (2019). Parallel analysis with categorical variables: Impact of category probability
proportions on dimensionality assessment accuracy. Psychological Methods, 24(3), 339–351.
[PubMed: 29745684]
Massara GP, & Aste T (2019). Learning clique forests. arXiv. Retrieved from
https://arxiv.org/abs/1905.02266
Massara GP, Di Matteo T, & Aste T (2016). Network filtering for big data: Triangulated maximally
filtered graph. Journal of Complex Networks, 5(2), 161–178.
Meade AW (2008). Power of afi’s to detect cfa model misfit. Paper presented at the 23rd annual
conference of the society for industrial and organizational psychology, San Francisco, CA.
Retrieved from pdfs.semanticscholar.org/a23c/45ca18db70125a9a0ad983926513d40fa32b.pdf
Meyers LS, Gamst G, & Guarino AJ (2016). Applied multivariate research: Design and interpretation.
Thousand Oaks: SAGE Publications.
Millham J (1974). Two components of need for approval score and their relationship to cheating
following success and failure. Journal of Research in Personality, 8(4), 378–392.
Muthén B, & Kaplan D (1992). A comparison of some methodologies for the factor analysis of non-
normal likert variables: A note on the size of the model. British Journal of Mathematical and
Statistical Psychology, 45(1), 19–30.
Newman M (2004). Fast algorithm for detecting community structure in networks. Physical Review E,
69, 066133. 10.1103/PhysRevE.69.066133
Pons P, & Latapy M (2006). Computing communities in large networks using random walks. J. Graph
Algorithms Appl, 10(2), 191–218.
R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R
Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Raiche G (2010). An r package for parallel analysis and non graphical solutions to the cattell scree test.
Retrieved from http://CRAN.R-project.org/package=nFactors
Raiche G, Walls TA, Magis D, Riopel M, & Blais J-G (2013). Non-graphical solutions for cattell’s
scree test. Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 9(1), 23–29. 10.1027/1614-2241/a000051
Revelle W (2018). Psych: Procedures for psychological, psychometric, and personality research.
Evanston, Illinois: Northwestern University. Retrieved from
https://CRAN.R-project.org/package=psych
Revelle W, & Rocklin T (1979). Very simple structure: An alternative procedure for estimating the
optimal number of interpretable factors. Multivariate Behavioral Research, 14(4), 403–414.
[PubMed: 26804437]
Rosseel Y (2012). lavaan: An R package for structural equation modeling. Journal of Statistical
Software, 48(2), 1–36. Retrieved from http://www.jstatsoft.org/v48/i02/
Ruscio J, & Roche B (2012). Determining the number of factors to retain in an exploratory factor
analysis using comparison data of known factorial structure. Psychological Assessment, 24(2),
282–292. 10.1037/a0025697 [PubMed: 21966933]
Salthouse T (2018). The virginia cognitive aging project. Retrieved from http://www.mentalaging.com
Sass DA, & Schmitt TA (2010). A comparative investigation of rotation criteria within exploratory
factor analysis. Multivariate Behavioral Research, 45(1), 73–103. [PubMed: 26789085]
Song W-M, Di Matteo T, & Aste T (2011). Nested hierarchies in planar graphs. Discrete Applied
Mathematics, 159(17), 2135–2146.
Song W-M, Di Matteo T, & Aste T (2012). Hierarchical information clustering by means of
topologically embedded graphs. PLoS One, 7(3), e31929. [PubMed: 22427814]
Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical
Society. Series B (Methodological), 58(1), 267–288.
Timmerman ME, & Lorenzo-Seva U (2011). Dimensionality assessment of ordered polytomous items
with parallel analysis. Psychological Methods, 16(2), 209–220. [PubMed: 21500916]
Tumminello M, Aste T, Di Matteo T, & Mantegna RN (2005). A tool for filtering information in
complex systems. Proceedings of the National Academy of Sciences of the United States of
America, 102(30), 10421–10426. 10.1073/pnas.0500298102 [PubMed: 16027373]
Velicer WF (1976). Determining the number of components from the matrix of partial correlations.
Psychometrika, 41(3), 321–327.
Ventimiglia M, & MacDonald DA (2012). An examination of the factorial dimensionality of the
marlowe crowne social desirability scale. Personality and Individual Differences, 52(4), 487–491.
Wickham H (2016). Ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved
from http://ggplot2.org
Widaman KF (1993). Common factor analysis versus principal component analysis: Differential bias
in representing model parameters? Multivariate Behavioral Research, 28(3), 263–311. [PubMed:
26776890]
Williams DR (2018). Bayesian inference for gaussian graphical models: Structure learning,
explanation, and prediction. PsyArXiv. 10.31234/osf.io/x8dpr
Williams DR, & Rast P (2019). Back to the basics: Rethinking partial correlation network
methodology. British Journal of Mathematical and Statistical Psychology [Epub Ahead of Print].
10.1111/bmsp.12173
Woodbury M (1950). Inverting modified matrices (Vol. 42, pp. 99–117). Statistical Research Group,
Memo. Rep. no. 42, Princeton University, Princeton, N. J.
Ward JH (1963). Hierarchical grouping to optimize an objective function. Journal of the American
Statistical Association, 58, 236–244. 10.2307/2282967
Yang Z, Algesheimer R, & Tessone CJ (2016). A comparative analysis of community detection
algorithms on artificial networks. Scientific Reports, 6, 30750. [PubMed: 27476470]
Figure 1.
Simulated five factor model with loadings of .70 and 5,000 observations with interfactor
correlation of .70 (top) and zero (bottom). The left side shows the population correlation
matrix plotted as a network of zero-order correlations, while the right side shows the EGA
estimation of the population correlation matrix. Nodes represent variables, edges represent
correlations, and the node colors indicate the simulated factors.
Figure 2.
New EGA algorithm for unidimensional and multidimensional structures
Figure 3.
A depiction of a network tetrahedron (left) and a tetrahedron drawn so that no edges are
crossing (right)
Figure 4.
A depiction of how TMFG constructs a network. Starting with the tetrahedron, the node with
the largest sum to three other nodes in the network is added (top left). This process continues
until all nodes are included in the network.
Figure 5.
Accuracy per sample size, factor loadings and number of variables (NVAR) for
unidimensional factors with continuous (A) and dichotomous (B) data.
Figure 6.
Mean Absolute Error (MAE) per sample size, factor loadings and number of variables
(NVAR) for unidimensional factors with continuous (A) and dichotomous (B) data.
Figure 7.
Accuracy per sample size, factor loadings and number of variables (NVAR) for
multidimensional factors with continuous (A) and dichotomous (B) data.
Figure 8.
Boxplot comparing the percentage of correct estimates between EGA, PApaf and PApca in
multidimensional structures with dichotomous data by interfactor correlation and
factor loadings.
Figure 9.
Mean Absolute Error (MAE) per sample size, factor loadings and interfactor correlation for
unidimensional factors with continuous (A) and dichotomous (B) data.
Figure 10.
EGA dimensional structure of the Social Desirability Scale.
Table 1
Performance of the dimensionality methods across the levels of the independent variables and in total
Items per factor Sample Size Number of factors Factor Loadings Factor correlation Data
Methods 3 4 8 12 500 1000 5000 1 2 3 4 0.4 0.55 0.7 0.85 0 0.3 0.5 0.7 Cont Dic. Total
Percentage Correct (PC)
EGA 0.82 0.89 0.93 0.87 0.81 0.89 0.93 0.96 0.84 0.86 0.82 0.68 0.88 0.97 0.98 0.95 0.93 0.87 0.76 0.91 0.85 0.88
EGAtmfg 0.58 0.85 0.95 0.19 0.71 0.74 0.79 0.79 0.72 0.77 0.69 0.64 0.75 0.78 0.81 0.91 0.82 0.68 0.57 0.78 0.71 0.73
OC 0.50 0.63 0.80 0.94 0.65 0.66 0.68 0.97 0.73 0.49 0.37 0.62 0.68 0.69 0.67 0.57 0.78 0.76 0.56 0.69 0.64 0.67
AF 0.51 0.51 0.51 1.00 0.54 0.55 0.56 1.00 0.53 0.28 0.24 0.52 0.55 0.56 0.56 0.97 0.55 0.36 0.31 0.56 0.54 0.56
K1 0.82 0.86 0.71 0.78 0.70 0.78 0.91 0.91 0.82 0.74 0.68 0.55 0.79 0.90 0.94 0.84 0.84 0.84 0.66 0.87 0.72 0.79
PApaf 0.70 0.79 0.94 0.94 0.79 0.82 0.85 0.78 0.82 0.84 0.84 0.45 0.85 0.98 0.99 0.80 0.83 0.84 0.79 0.86 0.78 0.82
PApca 0.71 0.80 0.94 1.00 0.77 0.82 0.89 1.00 0.82 0.75 0.70 0.72 0.83 0.87 0.90 0.98 0.96 0.85 0.53 0.87 0.79 0.83
Mean bias error (MBE)
EGA −0.23 −0.07 0.30 0.23 0.25 −0.10 −0.10 0.07 −0.04 0.00 0.02 0.18 −0.12 −0.01 0.02 0.11 0.10 0.02 −0.17 0.10 −0.06 0.02
EGAtmfg −0.53 −0.16 0.04 1.05 −0.12 −0.12 −0.12 0.27 −0.24 −0.27 −0.38 −0.28 −0.14 −0.05 −0.01 0.06 −0.04 −0.19 −0.32 −0.12 −0.12 −0.09
OC −1.07 −0.75 −0.15 0.07 −0.50 −0.58 −0.72 0.03 −0.32 −0.89 −1.44 −0.40 −0.54 −0.69 −0.77 −0.99 −0.34 −0.37 −0.70 −0.69 −0.51 −0.59
AF −1.04 −1.04 −1.04 0.00 −0.96 −0.96 −0.96 0.00 −0.46 −1.43 −2.27 −0.98 −0.96 −0.95 −0.95 −0.03 −1.09 −1.33 −1.38 −0.95 −0.97 −0.94
K1 −0.14 0.14 0.99 0.40 0.63 0.37 0.00 0.15 0.19 0.40 0.66 1.12 0.31 −0.01 −0.08 0.41 0.41 0.39 0.13 0.09 0.58 0.34
PApaf −0.57 −0.28 0.02 0.07 −0.28 −0.25 −0.22 −0.11 −0.29 −0.31 −0.34 −0.86 −0.14 0.00 0.00 −0.37 −0.24 −0.16 −0.23 −0.23 −0.27 −0.24
PApca −0.52 −0.34 −0.09 0.00 −0.39 −0.30 −0.18 0.00 −0.18 −0.41 −0.68 −0.46 −0.30 −0.23 −0.18 −0.01 −0.05 −0.22 −0.88 −0.23 −0.36 −0.29
Mean Absolute Error (MAE)
EGA 0.29 0.18 0.35 0.23 0.54 0.17 0.10 0.07 0.22 0.35 0.50 0.86 0.16 0.04 0.02 0.16 0.19 0.27 0.47 0.32 0.22 0.27
Note. AF = scree test acceleration factor; OC = scree test optimal coordinate; K1 = eigenvalues-greater-than-one rule; PApca = parallel analysis with principal component analysis; PApaf = parallel analysis
with principal axis factoring; EGA = exploratory graph analysis with the graphical LASSO; EGAtmfg = exploratory graph analysis with the triangulated maximally filtered graph approach. The best column
values are bolded and underlined (highest PC), highlighted in grey (MBE equal to or greater than the average) and highlighted and bolded (MAE one standard deviation below average).
Table 2
ANOVA partial eta squared (ηp2) effect sizes for the percentage correct (PC) and mean absolute error (MAE)
N:NVAR 0.00 0.00 0.04 0.01 0.02 0.01 0.07 0.11 0.01 0.01 0.00 0.00 0.00 0.00
N:LOAD 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.13 0.03 0.03 0.00 0.00 0.00 0.00
NVAR:LOAD 0.00 0.00 0.00 0.00 0.06 0.04 0.19 0.28 0.04 0.04 0.20 0.16 0.00 0.00
N:Data 0.00 0.00 0.02 0.00 0.00 0.00 0.03 0.05 0.02 0.02 0.00 0.00 0.00 0.00
NVAR:Data 0.00 0.00 0.05 0.01 0.01 0.01 0.05 0.11 0.03 0.03 0.00 0.00 0.00 0.00
LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.13 0.06 0.06 0.00 0.00 0.01 0.01
N:NVAR:LOAD 0.00 0.00 0.00 0.00 0.01 0.01 0.06 0.13 0.02 0.02 0.00 0.00 0.00 0.00
N:NVAR:Data 0.00 0.00 0.03 0.00 0.00 0.00 0.01 0.04 0.01 0.01 0.00 0.00 0.00 0.00
N:LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.05 0.03 0.03 0.00 0.00 0.00 0.00
NVAR:LOAD:Data 0.00 0.00 0.01 0.01 0.01 0.01 0.03 0.12 0.04 0.04 0.00 0.00 0.00 0.00
N:NVAR:LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.02 0.02 0.00 0.00 0.00 0.00
Note. AF = scree test acceleration factor; EGA = exploratory graph analysis with the graphical LASSO; EGAtmfg = EGA with the triangulated
maximally filtered graph approach; K1 = Kaiser-Guttman eigenvalue rule; OC = scree test optimal coordinate; PApca = parallel analysis with
principal component analysis; PApaf = parallel analysis with principal axis factoring. N = sample size; LOAD = factor loading; NVAR= variables
per factor; CORF= factor correlation; Data = Continuous/dichotomous. Large effect sizes (ηp2 ≥ 0.14) are bolded and highlighted in dark grey;
moderate effect sizes (ηp2 between 0.06 and 0.13) are highlighted in light grey.
Table 3
ANOVA partial eta squared (ηp2) effect sizes for the percentage correct (PC) and mean absolute error (MAE)
NFAC:N 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.01
NFAC:NVAR 0.00 0.00 0.00 0.01 0.00 0.09 0.00 0.00 0.00 0.04
N:NVAR 0.00 0.01 0.00 0.00 0.03 0.20 0.00 0.00 0.00 0.00
NFAC:LOAD 0.00 0.01 0.00 0.00 0.01 0.10 0.01 0.00 0.01 0.02
N:LOAD 0.03 0.02 0.00 0.00 0.05 0.17 0.01 0.00 0.01 0.01
NVAR:LOAD 0.02 0.01 0.01 0.02 0.10 0.45 0.11 0.13 0.00 0.00
NFAC:CORF 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.00 0.01 0.13
N:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.02
NVAR:CORF 0.04 0.00 0.10 0.11 0.05 0.02 0.01 0.02 0.13 0.15
LOAD:CORF 0.09 0.01 0.01 0.03 0.01 0.01 0.00 0.03 0.02 0.02
NFAC:Data 0.00 0.00 0.00 0.00 0.00 0.03 0.01 0.01 0.00 0.01
N:Data 0.00 0.00 0.00 0.00 0.01 0.05 0.01 0.00 0.01 0.01
NVAR:Data 0.00 0.01 0.00 0.00 0.02 0.18 0.00 0.00 0.00 0.00
LOAD:Data 0.00 0.01 0.00 0.00 0.03 0.14 0.02 0.01 0.01 0.01
CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.01
NFAC:N:NVAR 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00
NFAC:N:LOAD 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00
NFAC:NVAR:LOAD 0.00 0.00 0.00 0.00 0.00 0.10 0.01 0.00 0.00 0.00
N:NVAR:LOAD 0.01 0.01 0.00 0.00 0.00 0.15 0.00 0.00 0.00 0.00
NFAC:N:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
NFAC:NVAR:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03
N:NVAR:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:LOAD:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N:LOAD:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00
NVAR:LOAD:CORF 0.02 0.00 0.00 0.00 0.01 0.00 0.01 0.03 0.01 0.01
NFAC:N:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:NVAR:Data 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00
N:NVAR:Data 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00
NFAC:LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00
N:LOAD:Data 0.00 0.01 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00
NVAR:LOAD:Data 0.00 0.01 0.00 0.00 0.00 0.11 0.00 0.00 0.00 0.00
NFAC:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NVAR:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:NVAR:LOAD 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00
NFAC:N:NVAR:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:LOAD:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:NVAR:LOAD:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N:NVAR:LOAD:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01
NFAC:N:NVAR:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:NVAR:LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00
N:NVAR:LOAD:Data 0.00 0.01 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00
NFAC:N:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:NVAR:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N:NVAR:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NVAR:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:NVAR:LOAD:CORF 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:NVAR:LOAD:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:NVAR:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:NVAR:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N:NVAR:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NFAC:N:NVAR:LOAD:CORF:Data 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Note. EGA = exploratory graph analysis with the graphical LASSO; EGAtmfg = EGA with the triangulated maximally filtered graph approach; K1
= Kaiser-Guttman eigenvalue rule; PApca = parallel analysis with principal component analysis; PApaf = parallel analysis with principal axis
factoring. N = sample size; LOAD = factor loading; NVAR= variables per factor; CORF= factor correlation; Data = Continuous/dichotomous.
Large effect sizes (ηp2 ≥ 0.14) are bolded and highlighted in dark grey; moderate effect sizes (ηp2 between 0.06 and 0.13) are highlighted in light
grey.
Table 4