
Sample Size Guideline For Exploratory Factor Analysis When Using Small Sample

This document discusses guidelines for determining sufficient sample sizes for exploratory factor analysis (EFA) based on different measurement scales. It presents results from simulations of EFA using various sample sizes and scale types. The key findings are: 1) For sample sizes of at least 5 times the number of variables, EFA can produce correct factor solutions for all measurement scale types, including dichotomous variables. 2) For smaller sample sizes, the measurement scales need to have more categories/response options to obtain valid solutions. Numerical scales require fewer observations than Likert or dichotomous scales. 3) Additional validity statistics like Cronbach's alpha, item correlations, communalities, variance explained and factor loadings


Sample Size Guideline for Exploratory Factor Analysis When Using Small Sample: Taking into Considerations of Different Measurement Scales

Mohamad Adam Bujang, Puzziawati Ab Ghani
Faculty of Computer & Mathematical Sciences, Universiti Teknologi MARA, Selangor, Malaysia

Mohamad Adam Bujang, Shahrul Aiman Soelar, Nor Aizura Zulkifli
Biostatistics Unit, Clinical Research Centre, Kuala Lumpur, Malaysia

Abstract: Sample size guidelines for exploratory factor analysis (EFA) are long established; however, none has investigated the effect of different measurement scales. The authors are concerned that researchers may prefer to use the minimum sample size from these guidelines when conducting EFA, especially in the clinical setting, where it is difficult to obtain a sufficient sample. Here, the authors present a guideline on sample size requirements according to various types of measurement scales and also suggest a guideline for any researcher who plans to apply the rule of thumb that proposes the smallest sample size for EFA.

Keywords: exploratory factor analysis; measurement scales; sample size

I. INTRODUCTION

Exploratory factor analysis (EFA) is widely used in clinical studies to measure a specific latent variable. Questionnaires such as the Quality of Life (SF-36) [1], the Children's Depression Inventory (CDI) [2], the Adolescent Coping Scale (ACS) [3], the Depression Anxiety Stress Scales (DASS) [4] and the Summary of Diabetes Self-care Activities (SDSCA) [5] are all examples of clinical psychometric instruments designed and developed through EFA. The issue that most investigators face is determining a sufficient sample size to run the EFA, since it is not easy to obtain a sample in a clinical setting. There are rules of thumb regarding the sample size for factor analysis, and an extensive range of recommendations has been proposed.

In one of the earliest rules of thumb, Cattell [6] suggested a minimum sample of 1:3 per item, and Gorsuch [7] proposed a minimum of 1:5 per item. Barrett and Kline [8] found that, for 16 variables from the Sixteen Personality Factor Questionnaire, good recovery was obtained from a subsample of n = 48, that is, three times the number of variables. MacCallum et al. [9] then conducted a Monte Carlo study on sample size effects and found excellent recovery of the population factor structure with a sample size of 60 and 20 variables. However, this result was obtained when the level of communality (over 0.7 on average) and the overdetermination (3 loaded factors) were high.

Although many sample size guidelines for EFA have been reported, none of them addresses the effect of different measurement scales. Therefore, the aims of this article are, firstly, to determine a guideline for the sample size that is sufficient for EFA to produce a correct and stable factor solution across various measurement scales and, secondly, to propose the relevant statistics to be achieved, such as internal consistency, corrected item-to-total correlation (CITC), communalities, total variance explained and factor loadings, as indications of a reliable and valid factor solution from EFA when using the minimum sample size guideline with numerical variables and various types of measurement scales.

II. METHODOLOGY

We used data on patients with type 2 diabetes mellitus from the registry database of An Audit of Diabetes Control and Management (ADCM) 2009. The data collection methodology was described by Mastura et al. [10]. We extracted the data of the first 160 patients without missing values for the variables weight, waist circumference (WC), body mass index (BMI), random blood glucose (RBG), fasting blood glucose (FBG), glycated hemoglobin (HbA1C), total cholesterol (TC) and low-density lipoprotein (LDL-C). Clinically, the eight variables can be grouped into three major categories: Factor 1, body structure (weight, WC and BMI); Factor 2, glucose measurement (FBG and HbA1C); and Factor 3, lipid profile (TC and LDL-C). Applying EFA to the original values of the eight variables also yielded three factors, and the allocation of the eight variables to these factors matched the clinical categorization.

All eight numerical variables were divided proportionately into two to ten categories, and we then ran the EFA for each scale from two to ten categories using sample sizes from 1:2 per variable to 1:20 per variable. For the mixture of measurement scales, we used two categories for Factor 1, three categories for Factor 2 and four categories for Factor 3. The authors chose only one combination because, as a secondary objective, this paper seeks only to show that EFA can work with a mixed combination of measurement scales.

EFA was carried out using the principal component analysis extraction method with varimax rotation. The number of factors in each solution was based on eigenvalues
greater than one. In addition, the internal consistency, the corrected item-to-total correlation (CITC) for each domain and the correlations among the items within each domain were reported for every solution and serve as a guideline for conducting EFA with the minimum sample size. All analyses were carried out using PASW version 18.0 (SPSS Inc., Chicago, IL).

III. RESULT

Table 1 shows the results from the simulation of EFA for 19 successive sample size ratios across 11 measurement scales; these results are summarized in Table 2 and Table 3. For the minimum sample size ratio of 1:2, the factor solution is valid only if the variables are numerical, and the statistics for this solution are as follows: Cronbach's alpha is 0.651, minimum CITC is 0.296, minimum communality is 0.60, total variance explained is 86.0% and minimum factor loading is 0.68. For a sample size ratio of 1:3, the factor solution is credible if the measurement scale is a Likert scale of at least four categories, and the statistics for this solution are: Cronbach's alpha is 0.622, minimum CITC is 0.239, minimum communality is more than 0.46, total variance explained is more than 80.6% and factor loading is more than 0.49. For a sample size ratio of 1:4, the minimum measurement scale is a Likert scale of three categories, and the criteria for this solution are: Cronbach's alpha is 0.525, minimum CITC is 0.166, minimum communality is more than 0.53, total variance explained is 76.4% and minimum factor loading is more than 0.61. For sample size ratios of 1:5 to 1:20, all types of measurement can be applied, including dichotomous variables. Across these ratios (1:5 to 1:20), the minimum statistics for a correct solution were: Cronbach's alpha more than 0.519, minimum CITC 0.123, minimum communality 0.26 and minimum factor loading 0.48.

IV. DISCUSSION

With regard to the minimum sample size for EFA, our findings are consistent with the literature. Cattell [6] suggested a minimum sample size ratio for EFA of 1:3 per item. Several mathematicians and statisticians, such as Everitt [11], Nunnally [12] and Gorsuch [7], suggested sample size ratios of 1:5 to 1:10 per item. The larger the sample, the better it is in terms of representativeness. However, when developing a questionnaire in a clinical setting, subjects are difficult to recruit, perhaps because the disease is rare or because sickness makes it inconvenient for them to respond to a questionnaire. A small sample size is therefore preferable, but not to the point where it violates the assumptions of the factor solution.

Based on our results, EFA can produce a correct factor solution even from a sample size ratio of 1:2 per item, but this applies only to numerical variables. Although measurement scales of three, five and nine categories also yielded the correct solution at this ratio (see Table 1), the results were not consistent across the other measurement scales.

We propose a minimum sample size ratio for EFA of 1:3 per variable, but only for measurement scales of four categories and above. However, the factor solution from the mixture of scales was incorrect at this ratio, which may be because the mixture in our simulation included measurement scales of two and three categories. A sample size ratio of 1:4 also produced the correct factor solution for all measurement scales except dichotomous variables. For a sample size ratio of 1:5, we found that all factor solutions were correct for all types of measurement scales. Researchers have to justify that the items are good enough to represent their respective domains before applying the minimum sample size for the respective measurement scale. With more items, there is a tendency to have more unrelated items, and the cross-loadings will then be high. These have to be resolved first to obtain a better factor solution. In our analysis, we used only a few variables, and they are clinically inter-related.

Besides that, this study also proposes other parameters that can be referred to in order to justify a reliable and valid factor solution. We found that, for all the correct factor solutions, Cronbach's alpha was more than 0.519, CITC was more than 0.123, communalities were more than 0.26, total variance explained was more than 68.8% and factor loadings were more than 0.45. Although we found a minimum CITC of 0.051 (see Table 2, sample size ratio of 1:2), we believe this happened by chance and do not use it as the basis for the minimum CITC needed to obtain a correct factor solution. These values are consistent with the literature: a Cronbach's alpha of more than 0.5 is acceptable while 0.7 or more is considered good [12], a corrected item-to-total correlation of more than 0.2 is acceptable [13], and factor loadings of 0.4 or more are considered good [14]. It is difficult to obtain high communalities and factor loadings; however, the values we propose here serve as a general guideline for achieving a strong, stable, reliable and valid factor solution. Based on our findings, we allow the CITC to be as low as 0.123.

Bear in mind that high factor loadings, communalities and total variance explained do not guarantee a valid factor solution. In our results, most of the solutions from the sample size ratio of 1:2 per variable also showed good communalities, factor loadings and total variance explained, yet the factor solutions were not correct. Therefore, we conclude that communalities, factor loadings and total variance explained are informative only if we can confirm that the sample is large enough to run the EFA and, before that, that all items are scientifically reliable and valid. In addition, this study contributes another important finding: we applied a mixed combination of measurement scales to run the EFA and found that such a combination can be used for EFA. Developing a questionnaire using a combination of measurement scales is therefore feasible, although instruments that measure a latent variable with a mixture of measurement scales are rare or perhaps non-existent.

This paper proposes a general guideline for determining whether a factor solution has enough evidence to be considered valid when researchers plan to apply the minimum sample size for EFA with various measurement scales, as in Table 3. Although this paper cannot provide the minimum statistics to be achieved for each factor solution, a factor solution that exceeds the minimum statistics we found (see Table 3) will have higher credibility. For researchers who plan to use the minimum sample size ratio of 1:2, we recommend that EFA be used only for numerical

TABLE I. THE RESULTS OF EFA FROM VARIOUS SAMPLE SIZE AND TYPES OF MEASUREMENT SCALES

                                 Measurement Scales
Statistic                        2          3          4          5          6          7          8          9          10         Mix        Original

1:2 items
Reliability                      0.651      0.549      0.645      0.704      0.674      0.706      0.665      0.695      0.708      0.645      0.651
Corrected Item-Total Correlation 0.400      0.051      0.172      0.267      0.248      0.285      0.195      0.270      0.271      0.225      0.296
Min. Correlation                 0.296      0.253      0.212      0.310      0.268      0.305      0.221      0.302      0.286      0.258      0.400
Communalities                    0.66-0.94  0.40-0.95  0.51-0.98  0.55-0.95  0.52-0.96  0.59-0.97  0.48-0.97  0.53-0.96  0.53-0.97  0.57-0.88  0.60-0.95
Total Variance Explained         77.594     83.177     85.557     82.985     85.673     87.284     85.597     85.276     86.104     79.818     86.018
Range of Factor Loadings         0.65-0.91  0.45-0.95  0.66-0.95  0.48-0.96  0.57-0.97  0.65-0.96  0.52-0.97  0.49-0.98  0.50-0.97  0.60-0.92  0.68-0.97
Solution (a)                     ×          √          ×          √          ×          ×          ×          √          ×          ×          √

1:3 items
Reliability                      0.667      0.626      0.622      0.649      0.681      0.689      0.670      0.680      0.694      0.627      0.667
Corrected Item-Total Correlation 0.432      0.131      0.239      0.262      0.314      0.351      0.275      0.322      0.303      0.251      0.432
Min. Correlation                 0.478      0.324      0.308      0.383      0.389      0.408      0.350      0.403      0.394      0.329      0.478
Communalities                    0.66-0.94  0.46-0.92  0.46-0.93  0.60-0.95  0.54-0.94  0.56-0.95  0.55-0.95  0.58-0.94  0.58-0.95  0.47-0.90  0.66-0.94
Total Variance Explained         77.594     79.309     80.612     79.750     82.458     83.828     82.701     82.247     83.053     75.397     84.433
Factor loading                   0.65-0.91  0.48-0.95  0.49-0.95  0.56-0.96  0.50-0.97  0.55-0.97  0.52-0.97  0.58-0.97  0.56-0.96  0.50-0.94  0.75-0.97
Solution (a)                     ×          ×          √          √          √          √          √          √          √          ×          √

...

1:20 items
Reliability                      0.678      0.591      0.690      0.696      0.715      0.697      0.701      0.710      0.713      0.702      0.678
Corrected Item-Total Correlation 0.380      0.203      0.287      0.279      0.318      0.294      0.286      0.309      0.308      0.327      0.380
Min. Correlation                 0.458      0.247      0.384      0.404      0.426      0.412      0.404      0.426      0.426      0.437      0.458
Communalities                    0.31-0.80  0.53-0.84  0.47-0.89  0.48-0.88  0.50-0.89  0.49-0.90  0.49-0.89  0.51-0.89  0.51-0.89  0.53-0.88  0.56-0.91
Total Variance Explained         68.994     75.972     77.458     77.209     78.789     78.641     78.863     78.822     79.007     74.106     81.164
Factor loading                   0.56-0.89  0.73-0.92  0.68-0.94  0.70-0.94  0.70-0.94  0.69-0.94  0.70-0.94  0.71-0.94  0.71-0.94  0.73-0.94  0.74-0.96
Solution (a)                     √          √          √          √          √          √          √          √          √          √          √

a. √ indicates a correct factor solution; × indicates an incorrect one.
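
The Reliability and Corrected Item-Total Correlation rows of Table I can be reproduced from an item score matrix. The following is a minimal sketch in Python with NumPy, not the authors' PASW procedure: the function names and the toy scores are hypothetical, and the formulas are the standard textbook definitions of Cronbach's alpha and the corrected item-to-total correlation.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items (CITC)."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])


# Hypothetical toy data: 6 subjects, 3 items on a 4-category scale.
scores = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 3, 4],
    [2, 1, 2],
    [4, 4, 3],
])
alpha = cronbach_alpha(scores)
citc = corrected_item_total(scores)
print(f"alpha = {alpha:.3f}, minimum CITC = {citc.min():.3f}")
```

In the paper these statistics are reported per domain, so under that reading the functions would be applied to the items of one factor at a time rather than to all eight variables together.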

TABLE II. RESULTS FROM FACTOR SOLUTION, REPORTING ONLY EFA THAT PRODUCED CORRECT FACTOR SOLUTION (CFS) IN THE SMALLEST AND ANY MEASUREMENT SCALES (MS) FOR EVERY SAMPLE SIZE RATIO

Ratio  MS that produces CS       Minimum Cronbach's alpha  Minimum CITC   Minimum communalities   Total variance explained (%)   Minimum factor loading
1:2    3, 5, 9 and original      0.549/0.549               0.051/0.051    0.40/0.40               83.2/83.0                      0.45/0.45
1:3    4 – 10 and original       0.622/0.622               0.239/0.239    0.46/0.46               80.6/79.8                      0.49/0.49
1:4    3 – 10, mix and original  0.525/0.525               0.166/0.166    0.53/0.49               76.4/72.9                      0.61/0.52
1:5    2 – 10, mix and original  0.663/0.543               0.444/0.186    0.37/0.37               77.6/71.8                      0.48/0.48
1:6    2 – 10, mix and original  0.609/0.519               0.369/0.146    0.37/0.37               77.6/70.8                      0.48/0.48
1:7    2 – 10, mix and original  0.645/0.563               0.383/0.239    0.37/0.37               77.6/72.0                      0.48/0.48
1:8    2 – 10, mix and original  0.624/0.540               0.366/0.206    0.37/0.37               69.5/69.5                      0.54/0.54
1:9    2 – 10, mix and original  0.634/0.564               0.368/0.199    0.37/0.37               69.5/69.5                      0.54/0.54
1:10   2 – 10, mix and original  0.612/0.528               0.310/0.123    0.37/0.37               69.5/69.5                      0.54/0.54
1:11   2 – 10, mix and original  0.612/0.572               0.308/0.159    0.37/0.37               68.8/68.8                      0.55/0.55
1:12   2 – 10, mix and original  0.653/0.569               0.359/0.190    0.37/0.37               68.8/68.8                      0.55/0.55
1:13   2 – 10, mix and original  0.679/0.544               0.402/0.142    0.37/0.37               68.8/68.8                      0.55/0.55
1:14   2 – 10, mix and original  0.687/0.563               0.402/0.151    0.35/0.35               69.0/69.0                      0.56/0.56
1:15   2 – 10, mix and original  0.669/0.567               0.378/0.144    0.35/0.35               69.0/69.0                      0.56/0.56
1:16   2 – 10, mix and original  0.672/0.571               0.381/0.152    0.35/0.35               69.0/69.0                      0.56/0.56
1:17   2 – 10, mix and original  0.678/0.582               0.386/0.155    0.26/0.26               69.0/69.0                      0.51/0.51
1:18   2 – 10, mix and original  0.677/0.571               0.386/0.173    0.26/0.26               69.0/69.0                      0.51/0.51
1:19   2 – 10, mix and original  0.686/0.569               0.397/0.172    0.26/0.26               69.0/69.0                      0.51/0.51
1:20   2 – 10, mix and original  0.678/0.591               0.380/0.203    0.31/0.31               69.0/69.0                      0.56/0.56

The smallest MS is 3 for a sample size ratio of 1:2, 4 for 1:3, 3 for 1:4, and 2 for 1:5 onwards. In each cell the first value is for the smallest MS; the second value (italic in the original) is derived from the factor solutions of any MS. CS: correct solution; MS: measurement scales; CITC: corrected item-to-total correlation.
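
The communalities, total variance explained and loadings summarized in Table II come from a principal component extraction that retains eigenvalues greater than one. A rough sketch of that extraction step in Python with NumPy is shown here under stated assumptions: it is not the authors' PASW output, `pca_kaiser` and the simulated data are hypothetical, and the varimax rotation used in the paper is omitted. Because varimax is an orthogonal rotation, it changes individual loadings but leaves the communalities and the total variance explained unchanged.

```python
import numpy as np


def pca_kaiser(data):
    """Principal component extraction keeping eigenvalues greater than one.

    Returns unrotated loadings, communalities and total variance explained (%).
    """
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)        # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                          # eigenvalue-greater-than-one rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    communalities = (loadings ** 2).sum(axis=1)
    variance_explained = 100.0 * eigvals[keep].sum() / corr.shape[0]
    return loadings, communalities, variance_explained


# Hypothetical simulated data: 200 subjects, 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
signal = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3)
data = signal + 0.3 * rng.normal(size=(200, 6))

loadings, h2, tve = pca_kaiser(data)
print(loadings.shape[1], "factors retained;",
      f"min communality = {h2.min():.2f}; variance explained = {tve:.1f}%")
```

The sign of each loading column is arbitrary in an eigendecomposition, so only the magnitudes should be compared with the ranges reported in the tables.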

TABLE III. GUIDELINE TO USE MINIMUM SAMPLE SIZE IN EFA

Ratio       Minimum measurement scales  Minimum Cronbach's alpha  Minimum CITC   Minimum communalities   Total variance explained (%)   Minimum factor loading
1:2         Numerical                   0.651                     0.296          0.60                    86.0                           0.68
1:3         Likert scale of 4           0.622                     0.239          0.46                    80.6                           0.49
1:4         Likert scale of 3           0.525                     0.166          0.53                    76.4                           0.61
1:5 – 1:20  Likert scale of 2           0.519                     0.186          0.26                    68.7                           0.48

CS for correct solution, MS for measurement scales and CITC for corrected item to total correlation.

variables. If researchers plan to use a sample size ratio of 1:3, the factor solution is credible if the measurement scale is a Likert scale of at least four categories. For a sample size ratio of 1:4, the measurement scale has to be a Likert scale of at least three categories, and for sample size ratios of 1:5 to 1:20, all types of measurement can be applied. Although we set a high standard for all the statistics in order to justify a valid solution, this does not mean that an EFA producing lower statistics than those proposed in this paper (as in Table 3) is not a valid factor solution, since we do not have the evidence to justify that. At the least, this paper provides a general guideline that will help researchers develop a strong and stable factor solution, especially in a clinical setting where it is always difficult to obtain a large sample. In general, we suggest a minimum Cronbach's alpha of 0.5, CITC of 0.15, communalities of 0.2, total variance explained of 65.0% and factor loading of 0.4 for a strong factor solution.

Most validated instruments use standard measurement scales such as Likert scales of three to five categories. Therefore, it will be difficult to expand or reduce the number of categories. However, in this study we used numerical data as the original values and from there proportionately divided the variables into a number of categories. Besides that, these eight variables are valid clinical variables and are scientifically inter-related. This justifies that the three-factor solution from the eight variables is a reliable and valid factor solution. However, a limitation of this study is that we could not evaluate the impact of having more variables, since we used only eight variables for the simulation.

V. CONCLUSION

In conclusion, different measurement scales have an impact on the minimum sample size for exploratory factor analysis. The guideline is summarized in Table 3. In addition, our findings show that EFA can be performed on a mixture of Likert scales.

ACKNOWLEDGMENT

The authors wish to thank the Director General of Health Malaysia for the permission to publish the paper, and the investigators of An Audit of Diabetes Control and Management (ADCM) 2009 and the centers that participated in the data collection.

REFERENCES

[1] McHorney, C. A., Ware, J. E. Jr., Lu, J. F., et al., “The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups,” Med Care, vol. 32, pp. 40-66, 1994.
[2] Kovacs, M., Children's depression inventory, North Tonawanda, N.Y.: Multi-Health System, 1992.
[3] Frydenberg, E. and Lewis, R., Adolescent coping scale, Administrator’s Manual, The Australian Council for Educational Research Ltd., Melbourne, Australia, 1995.
[4] Lovibond, P. F. and Lovibond, S. H., “The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories,” Behaviour Research and Therapy, vol. 33, pp. 335-343, 1995.
[5] Toobert, D. J. and Glasgow, R. E., “Assessing diabetes self-management: the summary of diabetes self care activities questionnaire,” in Handbook of Psychology and Diabetes, Bradley, C., Ed., Chur, Switzerland: Harwood Academic, pp. 351-375, 1994.
[6] Cattell, R. B., The scientific use of factor analysis in behavioral and life sciences, New York: Plenum, 1978.
[7] Gorsuch, R. L., Factor analysis, 2nd ed., Hillsdale, NJ: Erlbaum, 1983.
[8] Barrett, P. T. and Kline, P., “The observation to variable ratio in factor analysis,” Personality Study and Group Behaviour, vol. 1, pp. 23-33, 1981.
[9] MacCallum, R. C., Widaman, K. F., Zhang, S., and Hong, S., “Sample size in factor analysis,” Psychological Methods, vol. 4, pp. 84-99, 1999.
[10] Mastura Ismail, Chew Boon How, Lee Ping Yein, et al., “Control and Treatment Profiles of 70889 Adult Type 2 Diabetes Mellitus Patients in Malaysia - A Cross Sectional Survey in 2009,” International Journal of Collaborative Research on Internal Medicine & Public Health, vol. 3, pp. 98-113, 2011.
[11] Everitt, B. S., “Multivariate analysis: The need for data, and other problems,” British Journal of Psychiatry, vol. 126, pp. 237-240, 1975.
[12] Nunnally, J. C., Psychometric Theory, 2nd ed., New York: McGraw-Hill, 1978.
[13] Streiner, D. L. and Norman, G. R., Health measurement scales: A practical guide to their development and use, 2nd ed., Oxford: Oxford University Press, 1995.
[14] Raubenheimer, J. E., “An item selection procedure to maximize scale reliability and validity,” South African Journal of Industrial Psychology, vol. 30 (4), pp. 59-64, 2004.
