Sample Size Guideline For Exploratory Factor Analysis When Using Small Sample
Mohamad Adam Bujang, Puzziawati Ab Ghani
Faculty of Computer & Mathematical Sciences
Universiti Teknologi MARA
Selangor, Malaysia

Mohamad Adam Bujang, Shahrul Aiman Soelar, Nor Aizura Zulkifli
Biostatistics Unit, Clinical Research Centre
Kuala Lumpur, Malaysia
Abstract—Sample size guidelines for exploratory factor analysis (EFA) are long established; however, none has investigated the effect of differences in measurement scales. The authors are concerned that researchers may prefer to use the minimum sample size from these guidelines when conducting EFA, especially in the clinical setting, where it is difficult to obtain a sufficient sample. Here, the authors present a guideline for sample size requirements according to various types of measurement scales, and also suggest guidance for any researcher who plans to apply the rule of thumb that proposes the smallest sample size for EFA.

Keywords: Exploratory factor analysis, measurement scales, sample size

I. INTRODUCTION

Exploratory factor analysis (EFA) is widely used in clinical studies to measure a specific latent variable. Questionnaires such as Quality of Life (SF36) [1], Children Depression Index (CDI) [2], Adolescent Coping Scale (ACS) [3], Depression Anxiety and Stress Scales (DASS) [4] and The Summary of Diabetes Self-care Activities (SDSCA) [5] are all examples of clinical psychometric instruments designed and developed through EFA. The issue that most investigators face is determining a sufficient sample size to run the EFA, since it is not easy to recruit subjects in a clinical setting. There are rules of thumb regarding the sample size for factor analysis, and an extensive range of recommendations concerning sample size in factor analysis has been proposed.

In one of the earliest rules of thumb, Cattell [6] suggested a minimum sample of 1:3 per item (three subjects per item), and Gorsuch [7] proposed a minimum of 1:5 per item. Barrett et al. [8] found that, for 16 variables from the Sixteen Personality Factor Questionnaire, good recovery was obtained from a subsample of n = 48, that is, three times the number of variables. MacCallum et al. [9] then conducted a Monte Carlo study on sample size effects and found excellent recovery of the population factor structure with a sample size of 60 and 20 variables. However, this result was obtained when the level of communality (over 0.7 on average) and overdetermination (3 loaded factors) were high.

There are many reported sample size guidelines for EFA; however, none of them addresses the effect of various measurement scales. Therefore, the aims of this article are, firstly, to determine a guideline for the sample size sufficient for EFA to produce a correct and stable factor solution across various measurement scales and, secondly, to propose the relevant statistics to be achieved, such as internal consistency, corrected item-to-total correlation (CITC), communalities, total variance explained and factor loadings, as indications of a reliable and valid factor solution from EFA when using the minimum sample size guideline with numerical variables and various types of measurement scales.

II. METHODOLOGY

We used data on patients with type 2 diabetes mellitus from the registry database of An Audit of Diabetes Control and Management (ADCM) 2009. The methodology of data collection was described by Mastura et al. [10]. We extracted the data of the first 160 patients without missing values for the variables weight, waist circumference (WC), body mass index (BMI), random blood glucose (RBG), fasting blood glucose (FBG), glycated hemoglobin (HbA1C), total cholesterol (TC) and low-density lipoprotein (LDL-C). Clinically, the eight variables can be grouped into three major categories: Factor 1, body structure (weight, WC and BMI); Factor 2, glucose measurement (FBG and HbA1C); and Factor 3, lipid profile (TC and LDL-C). When EFA is applied to the original values of the eight variables, the factor solution also yields three factors, and the allocation of the eight variables to the factors is the same as in the clinical categorization.

All eight numerical variables were divided proportionately into two to ten categories, and we then ran the EFA on each of the two- to ten-category versions using sample sizes from 1:2 per variable to 1:20 per variable. For the mixture of measurement scales, we used two categories for Factor 1, three categories for Factor 2 and four categories for Factor 3. The authors chose only one combination because, as a secondary objective, this paper aims to show that EFA can work with a mixed combination of measurement scales.

EFA was performed using the principal component analysis extraction method with varimax rotation. The number of factors in the solution was based on eigenvalues
greater than one. Prior to that, the internal consistency, with the respective corrected item-to-total correlation (CITC) for each domain and the correlations among the items within a particular domain, was also reported for every solution and served as a guideline for conducting EFA with the minimum sample size. All analyses were carried out using PASW version 18.0 (SPSS Inc., Chicago, IL).

III. RESULT

Table 1 shows the results of the EFA simulation across 19 successive sample size ratios and 11 measurement scales; these results are summarized in Table 2 and Table 3. For the minimum sample size ratio of 1:2, the factor solution is valid only if the variables are numerical, and the statistics for this solution are as follows: Cronbach's alpha is 0.651, minimum CITC is 0.296, minimum communality is 0.60, total variance explained is 86.0% and minimum factor loading is 0.68. For a sample size ratio of 1:3, the factor solution is credible only if the measurement scale is at least a four-point Likert scale, and the statistics for this solution are: Cronbach's alpha is 0.622, minimum CITC is 0.239, minimum communality is more than 0.46, total variance explained is more than 80.6% and factor loading is more than 0.49. For a sample size ratio of 1:4, the minimum measurement scale is a three-point Likert scale, and the criteria for this solution are: Cronbach's alpha is 0.525, minimum CITC is 0.166, minimum communality is more than 0.53, total variance explained is 76.4% and minimum factor loading is more than 0.61. For sample size ratios of 1:5 to 1:20, all types of measurement scales can be applied, including dichotomous variables. Regardless of the ratio within the range 1:5 to 1:20, the minimum statistics for these solutions were: Cronbach's alpha more than 0.519, minimum CITC of 0.123, minimum communality of 0.26 and minimum factor loading of 0.48.

IV. DISCUSSION

With regard to the minimum sample size for EFA, our findings are consistent with the literature. A minimum sample size ratio for EFA of 1:3 per item was suggested by Cattell [6]. Several mathematicians and statisticians, such as Everitt [11], Nunnally [12] and Gorsuch [7], suggested sample size ratios of 1:5 to 1:10 per item. The larger the sample, the better in terms of representativeness. However, when developing a questionnaire in a clinical setting, subjects are difficult to recruit, perhaps because the disease is rare or because responding to a questionnaire is inconvenient owing to sickness. An ideal yet small sample size is preferable, but not to the extent that it violates the assumptions of the factor solution.

Based on our results, EFA can produce a correct factor solution even with a sample size ratio of 1:2 per item, but this applies only to numerical variables. Although measurement scales of three, five and nine categories also yielded the correct solution at this ratio (see Table 1), the results were not consistent across the other measurement scales.

We propose a minimum sample size ratio for EFA of 1:3 per variable, but only for measurement scales of four categories and above. However, the factor solution from the mixture of scales was incorrect at this ratio, and this may be because we used measurement scales of two and three categories in our simulation. A sample size ratio of 1:4 also produced the correct factor solution for all measurement scales except the dichotomous variable. For a sample size ratio of 1:5, we found that all factor solutions were correct for all types of measurement scales. Researchers have to justify that the items are good enough to represent their respective domains before applying the minimum sample size for the respective measurement scale. With more items, there is a tendency to have more unrelated items, and thus high cross-loadings; these have to be resolved first to obtain a better factor solution. In our analysis, we used only a few variables, and they are clinically inter-related.

Besides that, this study also proposes other parameters that can be referred to in justifying a reliable and valid factor solution. We found that, for all correct factor solutions, Cronbach's alpha was more than 0.519, CITC was more than 0.123, communalities were more than 0.26, total variance explained was more than 68.8% and factor loadings were more than 0.45. Although we found a minimum communality of 0.051 (see Table 2, solution for the sample size ratio of 1:2), we believe this happened by chance and do not take it as a basis for the minimum communality needed to obtain a correct factor solution. These values are consistent with the literature: Cronbach's alpha of more than 0.5 is acceptable while 0.7 or more is considered good [12], a corrected item-to-total correlation of more than 0.2 is acceptable [13], and factor loadings of 0.4 or more are considered good [14]. It is difficult to obtain high communalities and factor loadings; however, the values proposed here serve as a general guideline for achieving a strong, stable, reliable and valid factor solution. Based on our findings, we allow the CITC to be as low as 0.123.

Bear in mind that high factor loadings, communalities and total variance explained do not guarantee a valid factor solution. In our results, most solutions at a sample size of 1:2 per variable also showed good communalities, factor loadings and total variance explained; however, the factor solutions were not correct. Therefore, we conclude that communalities, factor loadings and total variance explained are meaningful only if we can confirm that the sample is sufficient for running the EFA and, prior to that, that all items are scientifically reliable and valid. In addition, this study contributes another important finding: we applied a mixed combination of measurement scales to run the EFA and found that such a combination can be used for EFA. Therefore, developing a questionnaire using a combination of measurement scales is feasible, although it is rare, or perhaps no instrument measuring a latent variable has yet applied a mixture of measurement scales.

This paper proposes a general guideline for determining whether a factor solution has enough evidence to be considered valid when researchers plan to apply the minimum sample size for EFA with various measurement scales, as in Table 3. Although this paper cannot provide the minimum statistics to be achieved for each factor solution, a factor solution that exceeds the minimum statistics we found (see Table 3) will have higher credibility. For researchers who plan to use the minimum sample size ratio of 1:2, we recommend that EFA be used only for numerical
TABLE I. THE RESULTS OF EFA FROM VARIOUS SAMPLE SIZE AND TYPES OF MEASUREMENT SCALES

Measurement scale                 2          3          4          5          6          7          8          9          10         Mix        Original

1:2 items
Reliability                       0.651      0.549      0.645      0.704      0.674      0.706      0.665      0.695      0.708      0.645      0.651
Corrected Item-Total Correlation  0.400      0.051      0.172      0.267      0.248      0.285      0.195      0.270      0.271      0.225      0.296
Min. Correlation                  0.296      0.253      0.212      0.310      0.268      0.305      0.221      0.302      0.286      0.258      0.400
Communalities                     0.66-0.94  0.40-0.95  0.51-0.98  0.55-0.95  0.52-0.96  0.59-0.97  0.48-0.97  0.53-0.96  0.53-0.97  0.57-0.88  0.60-0.95
Total Variance Explained          77.594     83.177     85.557     82.985     85.673     87.284     85.597     85.276     86.104     79.818     86.018
Range of Factor Loadings          0.65-0.91  0.45-0.95  0.66-0.95  0.48-0.96  0.57-0.97  0.65-0.96  0.52-0.97  0.49-0.98  0.50-0.97  0.60-0.92  0.68-0.97
Solution (a)                      ×          √          ×          √          ×          ×          ×          √          ×          ×          √

1:3 items
Reliability                       0.667      0.626      0.622      0.649      0.681      0.689      0.670      0.680      0.694      0.627      0.667
Corrected Item-Total Correlation  0.432      0.131      0.239      0.262      0.314      0.351      0.275      0.322      0.303      0.251      0.432
Min. Correlation                  0.478      0.324      0.308      0.383      0.389      0.408      0.350      0.403      0.394      0.329      0.478
Communalities                     0.66-0.94  0.46-0.92  0.46-0.93  0.60-0.95  0.54-0.94  0.56-0.95  0.55-0.95  0.58-0.94  0.58-0.95  0.47-0.90  0.66-0.94
Total Variance Explained          77.594     79.309     80.612     79.750     82.458     83.828     82.701     82.247     83.053     75.397     84.433
Factor loading                    0.65-0.91  0.48-0.95  0.49-0.95  0.56-0.96  0.50-0.97  0.55-0.97  0.52-0.97  0.58-0.97  0.56-0.96  0.50-0.94  0.75-0.97
Solution (a)                      ×          ×          √          √          √          √          √          √          √          ×          √

...

1:20 items
Reliability                       0.678      0.591      0.690      0.696      0.715      0.697      0.701      0.710      0.713      0.702      0.678
Corrected Item-Total Correlation  0.380      0.203      0.287      0.279      0.318      0.294      0.286      0.309      0.308      0.327      0.380
Min. Correlation                  0.458      0.247      0.384      0.404      0.426      0.412      0.404      0.426      0.426      0.437      0.458
Communalities                     0.31-0.80  0.53-0.84  0.47-0.89  0.48-0.88  0.50-0.89  0.49-0.90  0.49-0.89  0.51-0.89  0.51-0.89  0.53-0.88  0.56-0.91
Total Variance Explained          68.994     75.972     77.458     77.209     78.789     78.641     78.863     78.822     79.007     74.106     81.164
Factor loading                    0.56-0.89  0.73-0.92  0.68-0.94  0.70-0.94  0.70-0.94  0.69-0.94  0.70-0.94  0.71-0.94  0.71-0.94  0.73-0.94  0.74-0.96
Solution (a)                      √          √          √          √          √          √          √          √          √          √          √

a. √ indicates a correct factor solution; × indicates an incorrect one.
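
The Reliability and Corrected Item-Total Correlation rows in Table I are standard statistics. The following is a minimal sketch of how they are usually computed, assuming the textbook formulas (Cronbach's alpha from item and total-score variances, and CITC as the Pearson correlation between an item and the sum of the remaining items in its domain). It is not the authors' PASW procedure, and the function names and the scores in the demo are hypothetical.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def citc(items):
    """Corrected item-total correlation for every item in a domain."""
    result = []
    for i, item in enumerate(items):
        # Total score of the domain with item i removed.
        rest = [sum(scores) - scores[i] for scores in zip(*items)]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    # Hypothetical scores: three items answered by six respondents.
    domain = [
        [3, 4, 2, 5, 4, 1],
        [2, 4, 3, 5, 5, 1],
        [3, 5, 2, 4, 4, 2],
    ]
    print("alpha:", round(cronbach_alpha(domain), 3))
    print("CITC:", [round(r, 3) for r in citc(domain)])
```

The smallest value returned by `citc` for a domain corresponds to the minimum CITC that the text compares against its reported thresholds.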
TABLE II. RESULTS FROM FACTOR SOLUTION, REPORTING ONLY EFA THAT PRODUCED CORRECT FACTOR SOLUTION (CFS) IN THE SMALLEST AND ANY MEASUREMENT SCALES (MS) FOR EVERY SAMPLE SIZE RATIO

Ratio  MS producing CFS          Cronbach's alpha  Min. CITC    Min. communalities  Total variance explained (%)  Min. factor loading
1:4    3 – 10, mix and original  0.525/0.525       0.166/0.166  0.53/0.49           76.4/72.9                     0.61/0.52
1:5    2 – 10, mix and original  0.663/0.543       0.444/0.186  0.37/0.37           77.6/71.8                     0.48/0.48
1:6    2 – 10, mix and original  0.609/0.519       0.369/0.146  0.37/0.37           77.6/70.8                     0.48/0.48
1:7    2 – 10, mix and original  0.645/0.563       0.383/0.239  0.37/0.37           77.6/72.0                     0.48/0.48
1:8    2 – 10, mix and original  0.624/0.540       0.366/0.206  0.37/0.37           69.5/69.5                     0.54/0.54
1:9    2 – 10, mix and original  0.634/0.564       0.368/0.199  0.37/0.37           69.5/69.5                     0.54/0.54
1:10   2 – 10, mix and original  0.612/0.528       0.310/0.123  0.37/0.37           69.5/69.5                     0.54/0.54
1:11   2 – 10, mix and original  0.612/0.572       0.308/0.159  0.37/0.37           68.8/68.8                     0.55/0.55
1:12   2 – 10, mix and original  0.653/0.569       0.359/0.190  0.37/0.37           68.8/68.8                     0.55/0.55
1:13   2 – 10, mix and original  0.679/0.544       0.402/0.142  0.37/0.37           68.8/68.8                     0.55/0.55
1:14   2 – 10, mix and original  0.687/0.563       0.402/0.151  0.35/0.35           69.0/69.0                     0.56/0.56
1:15   2 – 10, mix and original  0.669/0.567       0.378/0.144  0.35/0.35           69.0/69.0                     0.56/0.56
1:16   2 – 10, mix and original  0.672/0.571       0.381/0.152  0.35/0.35           69.0/69.0                     0.56/0.56
1:17   2 – 10, mix and original  0.678/0.582       0.386/0.155  0.26/0.26           69.0/69.0                     0.51/0.51
1:18   2 – 10, mix and original  0.677/0.571       0.386/0.173  0.26/0.26           69.0/69.0                     0.51/0.51
1:19   2 – 10, mix and original  0.686/0.569       0.397/0.172  0.26/0.26           69.0/69.0                     0.51/0.51
1:20   2 – 10, mix and original  0.678/0.591       0.380/0.203  0.31/0.31           69.0/69.0                     0.56/0.56

For reporting the smallest measurement scale: for a sample size ratio of 1:2 the smallest MS is 3, for a ratio of 1:3 it is 4, for a ratio of 1:4 it is 3, and for ratios of 1:5 onwards it is 2. In each pair, the first value is for the smallest MS and the second value is derived from the factor solutions of any MS. CFS stands for correct factor solution, MS for measurement scales and CITC for corrected item-to-total correlation.
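
The guideline reported in the Result and Discussion sections can be written down as a small lookup. The sketch below is illustrative only: it assumes that a ratio of 1:k means k subjects per variable (consistent with the n = 48 for 16 variables example in the Introduction), it encodes the smallest ratio at which each type of measurement scale gave a correct solution in this study, and it uses the overall minimum statistics quoted in the Discussion. The names `minimum_ratio`, `required_sample_size` and `below_minimum` are hypothetical.

```python
# Overall minimum statistics quoted in the Discussion for correct solutions.
MIN_STATS = {
    "cronbach_alpha": 0.519,
    "citc": 0.123,
    "communality": 0.26,
    "total_variance_explained": 68.8,  # percent
    "factor_loading": 0.45,
}


def minimum_ratio(scale):
    """Smallest subjects-per-variable multiplier reported for a scale.

    `scale` is "numerical", "mix", or the number of categories (2 to 10).
    """
    if scale == "numerical":
        return 2  # ratio 1:2, numerical variables only
    if scale == "mix":
        return 4  # mixture of scales failed at 1:3, correct from 1:4
    if not isinstance(scale, int) or not 2 <= scale <= 10:
        raise ValueError(f"unsupported measurement scale: {scale!r}")
    if scale >= 4:
        return 3  # ratio 1:3, four categories and above
    if scale == 3:
        return 4  # ratio 1:4, three categories
    return 5      # ratio 1:5, dichotomous variables


def required_sample_size(n_variables, scale):
    """Minimum sample size implied by the guideline for `n_variables` items."""
    return n_variables * minimum_ratio(scale)


def below_minimum(stats):
    """Names of statistics that fall below the reported minimum values."""
    return [name for name, floor in MIN_STATS.items() if stats[name] < floor]


if __name__ == "__main__":
    # Eight variables, as in the study's data set.
    for scale in ("numerical", 4, 3, 2, "mix"):
        print(scale, required_sample_size(8, scale))
```

As the Discussion stresses, passing such a check does not by itself establish a valid factor solution; it only flags solutions whose statistics fall below the values observed here.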