MULTIPLE REGRESSION AND PREDICTION
So far in this chapter we have discussed regression prediction based on only two variables, dependent and independent. That is, we have made use of linear correlation for deriving two regression equations, one for predicting Y from X scores and the other for predicting X from Y scores. However, in practical studies in education and psychology we quite often find that the dependent variable is jointly influenced by more than two variables; for example, academic performance is jointly influenced by variables like intelligence, hours devoted per week to studies, quality of teachers, facilities available in the school, parental education and socio-economic status.
In such a situation, we have to compute a multiple correlation coefficient (R) rather than a mere linear correlation coefficient (r). Accordingly, the line of regression is also to be set up in accordance with the concept of multiple R. Here, the resulting regression equation is called the multiple regression equation.
Let us now discuss the setting up of a multiple regression equation and its use in predicting the values of the dependent variable on the basis of the values of two or more independent variables.
Setting up of a Multiple Regression Equation
Suppose there is a dependent variable X1 (say, academic achievement) which is controlled by or dependent upon two variables designated as X2 and X3 (say, intelligence and number of hours studied per week). The multiple regression equation helps us predict the value of X1 by knowing the values of X2 and X3. The equation used for this is as follows:

x1 = b12.3 x2 + b13.2 x3   (in deviation form)

where

x1 = (X1 - M1), x2 = (X2 - M2), x3 = (X3 - M3)

Hence the equation becomes

(X1 - M1) = b12.3(X2 - M2) + b13.2(X3 - M3)

or

X1 = b12.3 X2 + b13.2 X3 + M1 - b12.3 M2 - b13.2 M3
   = b12.3 X2 + b13.2 X3 + K   (in the score form)
where K is a constant equal to M1 - b12.3 M2 - b13.2 M3. In this equation,

X1 = predicted value of the dependent variable
b12.3 = multiplying constant or weight for the X2 value
b13.2 = multiplying constant or weight for the X3 value

Both b12.3 and b13.2 are generally named partial regression coefficients. The partial regression coefficient b12.3 tells us how many units X1 increases for every unit increase in X2 while X3 is held constant; similarly, b13.2 tells us how many units X1 increases for every unit increase in X3 while X2 is held constant.
Computation of b coefficients or partial regression coefficients (b12.3 and b13.2):

b12.3 = (σ1/σ2) β12.3
b13.2 = (σ1/σ3) β13.2

Here, σ1, σ2 and σ3 are the standard deviations of the distributions of variables X1, X2 and X3, and β12.3 and β13.2 are called β coefficients (beta coefficients). These β coefficients are also called standard partial regression coefficients and are computed by using the following formulae:

β12.3 = (r12 - r13 r23) / (1 - r23²)
β13.2 = (r13 - r12 r23) / (1 - r23²)
Steps to Formulate a Regression Equation
The steps for framing a multiple regression equation for predicting the dependent variable value X1 with the help of the given values of independent variables X2 and X3 can be summarized as follows:

Step 1. Write the multiple regression equation

X1 = b12.3 X2 + b13.2 X3 + K

where

K = M1 - b12.3 M2 - b13.2 M3

Step 2. Write the formulae for the calculation of the partial regression coefficients

b12.3 = (σ1/σ2) β12.3
b13.2 = (σ1/σ3) β13.2

Step 3. Compute the values of the standard partial regression coefficients β12.3 and β13.2:

β12.3 = (r12 - r13 r23) / (1 - r23²)
β13.2 = (r13 - r12 r23) / (1 - r23²)

Step 4. Put the values of β12.3 and β13.2 in the formulae for computing the values of b12.3 and b13.2, along with the values of σ1, σ2 and σ3.

Step 5. Compute the value of K by putting the values of M1, M2, M3 (means of distributions X1, X2 and X3) and the computed values of b12.3 and b13.2 in the equation

K = M1 - b12.3 M2 - b13.2 M3

Step 6. Put the values of b12.3, X2 (the given value of independent variable X2), b13.2, X3 (the given value of independent variable X3) and the constant K in the multiple regression equation given in Step 1.
Now, the task of formulating regression equations can be illustrated with the help of a few examples.
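The steps above can be sketched in code. The following Python function is a minimal illustration (the function and variable names are my own, not from the text); it implements the β, b and K formulae and is checked against the data of Example 14.4.

```python
def multiple_regression(M1, M2, M3, s1, s2, s3, r12, r13, r23):
    """Return (b12_3, b13_2, K) for the equation X1 = b12_3*X2 + b13_2*X3 + K."""
    # Step 3: standard partial regression (beta) coefficients
    beta12_3 = (r12 - r13 * r23) / (1 - r23 ** 2)
    beta13_2 = (r13 - r12 * r23) / (1 - r23 ** 2)
    # Step 4: partial regression coefficients b = (sigma1 / sigma_k) * beta
    b12_3 = (s1 / s2) * beta12_3
    b13_2 = (s1 / s3) * beta13_2
    # Step 5: the constant K = M1 - b12.3*M2 - b13.2*M3
    K = M1 - b12_3 * M2 - b13_2 * M3
    return b12_3, b13_2, K

# Data of Example 14.4 (achievement, intelligence, study hours)
b12_3, b13_2, K = multiple_regression(101.71, 10.06, 3.35,
                                      13.65, 3.06, 2.02,
                                      0.41, 0.50, 0.16)
# Step 6: predict X1 for X2 = 12, X3 = 4
X1 = b12_3 * 12 + b13_2 * 4 + K
```

Working with unrounded intermediate values gives b12.3 ≈ 1.511, b13.2 ≈ 3.013, K ≈ 76.42 and a predicted X1 ≈ 106.6, agreeing with the worked example (which rounds the β's to three decimals) up to rounding.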
Example 14.4: Given the following data for a group of students:

X1 = Scores on achievement test
X2 = Scores on intelligence test
X3 = Scores showing study hours per week
M1 = 101.71, M2 = 10.06, M3 = 3.35
σ1 = 13.65, σ2 = 3.06, σ3 = 2.02
r12 = 0.41, r13 = 0.50, r23 = 0.16

(a) Make out a multiple regression equation involving the dependent variable X1 and independent variables X2 and X3.
(b) If a student scores 12 in the intelligence test X2 and 4 in X3 (study hours per week), what will be his estimated score in X1 (achievement test)?
Solution.
Step 1. Write the multiple regression equation

X1 = b12.3 X2 + b13.2 X3 + K

where

K = M1 - b12.3 M2 - b13.2 M3

Step 2. Obtain the partial regression coefficients

b12.3 = (σ1/σ2) β12.3
b13.2 = (σ1/σ3) β13.2

Step 3. Find the values of the standard partial regression coefficients

β12.3 = (r12 - r13 r23)/(1 - r23²) = (0.41 - 0.50 × 0.16)/(1 - (0.16)²)
      = (0.41 - 0.08)/(1 - 0.0256) = 0.3300/0.9744 = 0.338

β13.2 = (r13 - r12 r23)/(1 - r23²) = (0.50 - 0.41 × 0.16)/(1 - (0.16)²)
      = (0.50 - 0.0656)/0.9744 = 0.4344/0.9744 = 0.446

Step 4. Substituting the values of β12.3 and β13.2 in the relations in Step 2, we obtain

b12.3 = (13.65/3.06) × 0.338 = 1.507
b13.2 = (13.65/2.02) × 0.446 = 3.014

Step 5. Compute the value of the constant K:

K = M1 - b12.3 M2 - b13.2 M3
  = 101.71 - 1.507(10.06) - 3.014(3.35)
  = 101.710 - 15.160 - 10.097
  = 101.710 - 25.257 = 76.453

Step 6. The multiple regression equation, as laid down in Step 1, is

X1 = b12.3 X2 + b13.2 X3 + K

Putting the values of b12.3, b13.2 and K in the above equation, we get the required multiple regression equation

X1 = 1.507 X2 + 3.014 X3 + 76.453

Here, X2 = 12 and X3 = 4. Hence the predicted value of the X1 variable is

X1 = 1.507(12) + 3.014(4) + 76.453
   = 18.084 + 12.056 + 76.453
   = 106.593 = 107 (nearest whole number)
Example 14.5: Given the following data for a group of students:

X1 = Scores on an intelligence test
X2 = Scores on a memory sub-test
X3 = Scores on a reasoning sub-test
M1 = 78.00, M2 = 87.20, M3 = 32.80
σ1 = 10.21, σ2 = 6.02, σ3 = 10.35
r12 = 0.67, r13 = 0.75, r23 = 0.63

(a) Establish a multiple regression equation involving the dependent variable X1 and two independent variables X2 and X3.
(b) If a student obtains a score of 80 on the memory sub-test and a score of 40 on the reasoning sub-test, what can be his expected score on the total intelligence test?
Solution.
Step 1. Write the multiple regression equation

X1 = b12.3 X2 + b13.2 X3 + K

where

K = M1 - b12.3 M2 - b13.2 M3

Step 2. Compute the partial regression coefficients:

(i) b12.3 = (σ1/σ2) β12.3 = (10.21/6.02) β12.3
(ii) b13.2 = (σ1/σ3) β13.2 = (10.21/10.35) β13.2

Step 3. Calculate the standard partial regression coefficients:

β12.3 = (r12 - r13 r23)/(1 - r23²) = (0.67 - 0.75 × 0.63)/(1 - (0.63)²) = 0.1975/0.6031 = 0.327
β13.2 = (r13 - r12 r23)/(1 - r23²) = (0.75 - 0.67 × 0.63)/(1 - (0.63)²) = 0.3279/0.6031 = 0.543

Step 4. Put the values of β12.3 and β13.2 in the relations given in Step 2 and obtain

b12.3 = (10.21/6.02) × 0.327 = 1.7 × 0.327 = 0.556 approximately
b13.2 = (10.21/10.35) × 0.543 = 0.986 × 0.543 = 0.535 approximately

Step 5. Compute the value of the constant K:

K = M1 - b12.3 M2 - b13.2 M3
  = 78.00 - (0.556 × 87.2) - (0.535 × 32.8)
  = 78.00 - 48.483 - 17.548
  = 78.00 - 66.031 = 11.969 = 12 approximately

Step 6. The multiple regression equation, as laid down in Step 1, is

X1 = b12.3 X2 + b13.2 X3 + K = 0.556 X2 + 0.535 X3 + 12

Step 7. The predicted value of the X1 variable is

X1 = 0.556 × 80 + 0.535 × 40 + 12
   = 44.480 + 21.400 + 12
   = 77.88 = 78 (approximately)
Standard Error of Estimate
With the help of a multiple regression equation, we try to predict or estimate the value of X1 (the dependent variable) when the values of the independent variables X2, X3, ... are given. The spread of the differences between the actual values of X1 and the predicted or estimated values is measured by the standard error (SE) of the estimate, which can be computed by the formula

σ (estimated X1), or σ1.23 = σ1 √(1 - R²1.23)

Here, σ1 is the standard deviation of X1, the dependent variable, and R1.23 is the multiple correlation coefficient (the correlation between X1 and the combination of X2 and X3). This can be computed by using the formula

R²1.23 = (r12² + r13² - 2 r12 r13 r23) / (1 - r23²)

As discussed in Chapter 13, it can also be computed with the help of the β's (beta coefficients), which are obtained during the course of establishing the multiple regression equation. The formula for computing the multiple correlation coefficient with the help of the betas is

R1.23 = √(β12.3 r12 + β13.2 r13)

or

R²1.23 = β12.3 r12 + β13.2 r13

The SE of the estimate can then be computed by using the formula

σ (estimated X1), or σ1.23 = σ1 √(1 - R²1.23)

The above formulae can be illustrated with the help of examples.

Example 14.6: In Example 14.4, X1 represents the scores on the achievement test (dependent variable), and X2 and X3 indicate the scores on intelligence and study hours (independent variables). The other related values needed are:

(i) σ1 = 13.65, the SD of the scores on X1
(ii) r12 = 0.41, r13 = 0.50
(iii) β12.3 = 0.338, β13.2 = 0.446

Let us now compute the value of R²1.23:

R²1.23 = β12.3 r12 + β13.2 r13
       = (0.338 × 0.41) + (0.446 × 0.50)
       = 0.13858 + 0.22300 = 0.36158

Then, σ (estimated X1), or

σ1.23 = σ1 √(1 - R²1.23)
      = 13.65 √(1 - 0.36158)
      = 13.65 √0.63842
      = 13.65 × 0.799 = 10.9
Example 14.7: The related given and computed data in Example 14.5 are as follows:

(i) σ1 = 10.21
(ii) r12 = 0.67, r13 = 0.75
(iii) β12.3 = 0.327, β13.2 = 0.543

Let us first compute the value of R²1.23:

R²1.23 = β12.3 r12 + β13.2 r13 = (0.327 × 0.67) + (0.543 × 0.75)
       = 0.219 + 0.407 = 0.626

Then, σ (estimated X1), or

σ1.23 = σ1 √(1 - R²1.23)
      = 10.21 √(1 - 0.626)
      = 10.21 √0.374 = 10.21 × 0.612
      = 6.25 approximately
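As a check, the SE-of-estimate computations in the two examples above can be reproduced in a few lines of Python (a sketch; the function name is mine):

```python
def se_of_estimate(s1, r12, r13, beta12_3, beta13_2):
    """sigma_1.23 = sigma_1 * sqrt(1 - R^2_1.23), with R^2_1.23 = beta12.3*r12 + beta13.2*r13."""
    R_sq = beta12_3 * r12 + beta13_2 * r13
    return s1 * (1 - R_sq) ** 0.5, R_sq

# Example 14.6 (data from Example 14.4)
se_a, R_sq_a = se_of_estimate(13.65, 0.41, 0.50, 0.338, 0.446)
# Example 14.7 (data from Example 14.5)
se_b, R_sq_b = se_of_estimate(10.21, 0.67, 0.75, 0.327, 0.543)
```

The unrounded results, R² = 0.36158 with σ ≈ 10.91 and R² ≈ 0.626 with σ ≈ 6.24, match the worked examples up to rounding.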
SUMMARY
1. The coefficient of correlation helps us in finding the degree and direction of association between two variables. However, the concepts of regression lines and regression equations help us predict the value of one variable when the values of a correlated variable or variables are known to us.
2. In simple regression based on r, there are two regression equations:

(i) Y - My = r (σy/σx)(X - Mx)

(This equation helps us predict the score or value of the Y variable corresponding to any value of the X variable.)

(ii) X - Mx = r (σx/σy)(Y - My)

In the above equations, Mx and My represent the means of the X and Y distributions, and σx and σy are the standard deviations of these distributions.
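The two prediction equations in item 2 translate directly into code. A minimal Python sketch (function names and the illustrative means, SDs and r are my own, hypothetical values):

```python
def predict_y(x, Mx, My, sx, sy, r):
    """Predict Y from X: Y = My + r*(sy/sx)*(X - Mx)."""
    return My + r * (sy / sx) * (x - Mx)

def predict_x(y, Mx, My, sx, sy, r):
    """Predict X from Y: X = Mx + r*(sx/sy)*(Y - My)."""
    return Mx + r * (sx / sy) * (y - My)

# Hypothetical data: X has mean 50, SD 10; Y has mean 60, SD 12; r = 0.8
y_hat = predict_y(65, 50, 60, 10, 12, 0.8)   # predicted Y for X = 65
x_hat = predict_x(72, 50, 60, 10, 12, 0.8)   # predicted X for Y = 72
```

Both predictions regress toward the mean: because r < 1, a score 1.5 SDs above the mean on X predicts a Y score only 1.2 SDs above its mean.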
3. Multiple regression is based on multiple correlation R. It is used to predict the values of the dependent variable when the values of two or more independent variables (being associated with the dependent variable) are known to us. A general multiple regression equation has the following form:

X1 = b12.3 X2 + b13.2 X3 + K

where

K = M1 - b12.3 M2 - b13.2 M3

Here, b12.3 and b13.2 are partial regression coefficients. These values are computed as

b12.3 = (σ1/σ2) β12.3
b13.2 = (σ1/σ3) β13.2

where

β12.3 = (r12 - r13 r23)/(1 - r23²)
β13.2 = (r13 - r12 r23)/(1 - r23²)

and M1, M2, M3 and σ1, σ2, σ3 are the means and standard deviations of the distributions X1, X2 and X3. Now, from this equation, we can predict the value of X1, the dependent variable, when we are given the values of X2 and X3 (the independent variables).
4. There remain gaps and differences between the predicted values or scores and the observed values or scores. This deviation of the predicted score from the actual score is called the error in prediction. This error can be computed in the form of the SE of the estimate by using the following formulae:

(i) The SE of the estimate for predicting Y from X is

σyx = σy √(1 - r²)

(ii) The SE of the estimate for predicting X from Y is

σxy = σx √(1 - r²)

(iii) The SE of the estimate for predicting the dependent variable X1 from the given values of the independent variables X2 and X3 is

σ (estimated X1), or σ1.23 = σ1 √(1 - R²1.23)

where σ1 is the SD of the dependent variable X1 and R1.23 is the multiple correlation coefficient.
EXERCISES
1. What are the regression lines in a scatter diagram? How would you use them for the prediction of variables? Explain with the help of an example.

2. Given the following data for two tests:

   History (X)              Civics (Y)
   Mean = 25                Mean = 30
   SD = 1.7                 SD = 1.6
   Coefficient of correlation rxy = 0.95

   (a) Determine both the regression equations.
   (b) Predict the probable score in Civics of a student whose score in History is 40.
   (c) Predict the probable score in History of a student whose score in Civics is 50.
3. From the scatter diagram in Figure 7.8,

   (a) Calculate both the regression equations.
   (b) Predict the probable score on X when Y = 100.
4. A group of five students obtained the following scores on two achievement tests X and Y:

   Students            A    B    C    D    E
   Scores in X test   10   11   12    9    8
   Scores in Y test   12   18   20   10   10

   (a) Determine both the regression equations.
   (b) If a student scores 15 in test X, predict his probable score in test Y.
   (c) If a student scores 5 in test Y, predict his probable score in test X.
5. What is a multiple regression equation? How is it used for predicting the value of a dependent variable? Illustrate with the help of an example.

6. A researcher collected the following data during the course of his study:

   Dependent variable X1    Independent variable X2    Independent variable X3
   M1 = 78                  M2 = 55
   σ1 = 16                  σ2 = 10
   r12 = .70                r13 = .80                  r23 = .50

   (a) Set up the multiple regression equation for predicting the value of the dependent variable for the given values of both the independent variables.
   (b) If X2 = 60 and X3 = 40, predict the value of X1.
7. A researcher in psychology wanted to study the relationship of physical efficiency and hours per week devoted to practice with performance in athletics. He obtained the following results during the course of his study:

   Performance in athletics (X1)    Physical efficiency test (X2)    Hours practised per week (X3)
   M1 = 73.8                        M2 = 19.7                        M3 = 49.5
                                    σ2 = 9.1                         σ3 = 17.0
   r12 = .465                       r13 =                            r23 = .562

   (a) Set up the multiple regression equation for predicting performance in athletics on the basis of scores on the physical efficiency test and hours of practice per week.
   (b) If X2 = 20 and X3 = 42, predict the value of X1.
Multivariate ANOVA (MANOVA)

Multivariate ANOVA (MANOVA) extends the capabilities of analysis of variance (ANOVA) by assessing multiple dependent variables simultaneously. ANOVA statistically tests the differences between three or more group means. For example, if we have three different teaching methods and we want to evaluate the average scores for these groups, we can use ANOVA. However, ANOVA does have a drawback: it can assess only one dependent variable at a time. This limitation can be an enormous problem in certain circumstances because it can prevent us from detecting effects that actually exist.

MANOVA provides a solution for some studies. This statistical procedure tests multiple dependent variables at the same time. By doing so, MANOVA can offer several advantages over ANOVA.
ANOVA limitations

Regular ANOVA tests can assess only one dependent variable at a time in our model. Even when we fit a general linear model with multiple independent variables, the model considers only one dependent variable. The problem is that these models can't identify patterns across multiple dependent variables. This restriction can be very problematic in cases where a typical ANOVA won't be able to produce statistically significant results.
Comparison of MANOVA to ANOVA Using an Example

MANOVA can detect patterns between multiple dependent variables. It sounds complex, but graphs make it easy to understand. Here is an example that compares ANOVA to MANOVA.

Suppose we are studying three different teaching methods for a course. This variable is our independent variable. We also have student satisfaction scores and test scores. These variables are our dependent variables. We want to determine whether the mean scores for satisfaction and tests differ between the three teaching methods.

The graphs below display the scores by teaching method. One chart shows the test scores and the other shows the satisfaction scores. These plots represent how one-way ANOVA tests the data, one dependent variable at a time.
[Figure: individual value plots of test scores and satisfaction scores by teaching method]
Both of these graphs appear to show that there is no association between teaching method and either test scores or satisfaction scores. The groups seem to be approximately equal. Consequently, it's no surprise that the one-way ANOVA P-values for both test and satisfaction scores are insignificant (0.923 and 0.254). The teaching method isn't related to either satisfaction or test scores.
How MANOVA Assesses the Data

We can find patterns between the dependent variables, and see how they relate to the teaching method, by plotting the test and satisfaction scores together on a scatterplot and using the teaching method as the grouping variable. This multivariate approach represents how MANOVA tests the data. These are the same data, but sometimes how we look at them makes all the difference.
[Figure: scatterplot of test scores versus satisfaction scores, grouped by teaching method]
The graph displays a positive correlation between Test scores and Satisfaction. As student
satisfaction increases, test scores tend to increase as well. Moreover, for any given satisfaction
score, teaching method 3 tends to have higher test scores than methods 1 and 2. In other words,
students who are equally satisfied with the course tend to have higher scores with method 3.
MANOVA can test this pattern statistically to help ensure that it is not due to chance. In our preferred statistical software, we fit the MANOVA model with Method as the independent variable and Satisfaction and Test as the dependent variables.
The MANOVA results are below.
General Linear Model: Test, Satisfaction versus Method

MANOVA for Method
s = 2   m = -0.5   n = 21.0

Criterion            Test Statistic        F    Num DF   Denom DF
Wilks'                    0.51094      8.778         4         88
Lawley-Hotelling          0.95877     10.275         4         86
Pillai's                  0.40977      7.297         4         90
Roy's
Even though the one-way ANOVA results and graphs seem to indicate that there is nothing of interest, MANOVA produces statistically significant results, as signified by the minuscule P-values. We can conclude that there is an association between the teaching method and the relationship between the dependent variables.
Benefits of MANOVA

Use multivariate ANOVA when our dependent variables are correlated. The correlation structure between the dependent variables provides additional information to the model, which gives MANOVA the following enhanced capabilities:

• Greater statistical power: When the dependent variables are correlated, MANOVA can identify effects that are smaller than those that regular ANOVA can find.

• Assess patterns between multiple dependent variables: The factors in the model can affect the relationship between dependent variables instead of influencing a single dependent variable. As the example above shows, ANOVA tests with a single dependent variable can fail completely to detect these patterns.

• Limits the joint error rate: When we perform a series of ANOVA tests because we have multiple dependent variables, the joint probability of rejecting a true null hypothesis increases with each additional test. Instead, if we perform one MANOVA test, the error rate equals the significance level.
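The joint-error-rate point in the last bullet is easy to quantify: with k independent tests each run at significance level α, the familywise probability of at least one false rejection is 1 - (1 - α)^k. A quick Python illustration (a sketch; the independence assumption is an idealization):

```python
def familywise_error(alpha, k):
    """Probability of at least one Type I error across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

# Two separate ANOVAs at alpha = 0.05, as in the test/satisfaction example
rate2 = familywise_error(0.05, 2)   # already above the nominal 0.05
# Five dependent variables analysed one at a time
rate5 = familywise_error(0.05, 5)
```

A single MANOVA test at α = 0.05 keeps this joint rate at 0.05, whereas two separate ANOVAs already push it to about 0.0975.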
> Discriminant Analysis (Wilks' Lambda)

Wilks' lambda (Λ) is a test statistic that is reported in results from MANOVA, discriminant analysis, and other multivariate procedures. Other similar test statistics include Pillai's trace criterion and Roy's gcr (greatest characteristic root) criterion.

• In MANOVA, Λ tests whether there are differences between group means for a particular combination of dependent variables. It is similar to the F-test statistic in ANOVA. Lambda is a measure of the proportion of variance in the dependent variables not explained by differences in levels of the independent variable. A value of zero means that there isn't any variance left unexplained by the independent variable (which is ideal). In other words, the closer to zero the statistic is, the more the variable in question contributes to the model. We would reject the null hypothesis when Wilks' lambda is close to zero, although this should be done in combination with a small p-value.
• In discriminant analysis, Wilks' lambda tests how well each level of the independent variable contributes to the model. The scale ranges from 0 to 1, where 0 means total discrimination and 1 means no discrimination. Each independent variable is tested by putting it into the model and then taking it out, generating a Λ statistic. The significance of the change in Λ is measured with an F-test; if the F-value is greater than the critical value, the variable is kept in the model. This stepwise procedure is usually performed using software like Minitab, R, or SPSS. The following SPSS output shows which variables (from a list of a dozen or more) were kept in using this procedure.
[SPSS stepwise discriminant analysis output (Wilks' lambda): at each step the variable that minimizes the overall Wilks' lambda is entered (here HEIGHT first, then CANOPY, then a third variable), with the lambda statistic, degrees of freedom, exact F and significance reported for each step; the procedure caps the number of steps and applies minimum/maximum partial-F thresholds for entering and removing variables.]
Formula

Λ = |E| / |H + E|

where E is the error (within-groups) sum-of-squares-and-cross-products (SSCP) matrix and H is the hypothesis (between-groups) SSCP matrix. The quantity 1 - Λ is the proportion of variance in the dependent variables explained by the model's effect.

Caution should be used in interpreting results, as this statistic tends to be biased, especially for small samples.
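The formula Λ = |E| / |H + E| can be computed directly with NumPy. The sketch below (the data are hypothetical toy values) builds the between-groups SSCP matrix H and the within-groups SSCP matrix E for three groups measured on two dependent variables:

```python
import numpy as np

# Three groups, two dependent variables, three observations each (toy data)
groups = [
    np.array([[1., 2.], [2., 3.], [3., 4.]]),
    np.array([[4., 5.], [5., 7.], [6., 6.]]),
    np.array([[7., 9.], [8., 8.], [9., 10.]]),
]

grand_mean = np.vstack(groups).mean(axis=0)
H = np.zeros((2, 2))  # hypothesis (between-groups) SSCP
E = np.zeros((2, 2))  # error (within-groups) SSCP
for g in groups:
    m = g.mean(axis=0)
    d = (m - grand_mean).reshape(-1, 1)
    H += len(g) * (d @ d.T)          # n_g * (mean deviation) outer product
    centered = g - m
    E += centered.T @ centered       # within-group cross-products

wilks_lambda = np.linalg.det(E) / np.linalg.det(H + E)
```

For these toy data the group means are far apart relative to the within-group scatter, so Λ comes out small (about 0.085): most of the variance in the two dependent variables is explained by group membership.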
Output Components

Wilks' lambda output has several components, including:

• "Sig" or significance (p-value). If this is small (i.e. under .05), reject the null hypothesis.
• "Value" column in the output: the value of Wilks' lambda.
• "Statistic": the F-statistic associated with the listed degrees of freedom. It would be reported in APA format as F(df1, df2) = value. For example, if we had an F-value of 36.612 with 1 and 2 degrees of freedom, we would report that as F(1, 2) = 36.612.
> Factor analysis

Factor analysis is a way to take a mass of data and shrink it to a smaller data set that is more manageable and more understandable. It is a way to find hidden patterns, show how those patterns overlap, and show what characteristics are seen in multiple patterns. It is also used to create sets of variables for similar items (these sets of variables are called dimensions). It can be a very useful tool for complex sets of data involving psychological studies, socio-economic status and other involved concepts.

A "factor" is a set of observed variables that have similar response patterns; they are associated with a hidden variable (called a latent variable) that isn't directly measured. Factors are listed according to their factor loadings, or how much variation in the data they can explain.
There are two types: exploratory and confirmatory.

• Exploratory factor analysis is used if we don't have any idea about the structure of our data or how many dimensions are in a set of variables.
• Confirmatory factor analysis is used for verification, when we have a specific idea about the structure of our data or how many dimensions are in a set of variables.
Factor Loadings

Not all factors are created equal; some factors have more weight than others. In a simple example, imagine our bank conducts a phone survey for customer satisfaction and the results show the following factor loadings:

Variable      Factor 1    Factor 2    Factor 3
Question 1      0.885       0.121      -0.033
Question 2      0.829       0.078       0.157
Question 3      0.777       0.190       0.540
For each question, the factor with the highest loading (here Factor 1 in every case) affects it the most. Factor loadings are similar to correlation coefficients in that they can vary from -1 to 1. The closer loadings are to -1 or 1, the more the factor affects the variable. A factor loading of zero would indicate no effect.
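Loadings like those in the table can be extracted from a correlation matrix. The sketch below uses hypothetical data and principal-component extraction (only one of several extraction methods) via NumPy's eigendecomposition; with a full decomposition, each variable's squared loadings sum to 1, its total standardized variance:

```python
import numpy as np

# Toy data: 6 respondents answering 3 survey questions
X = np.array([[1., 2., 3.],
              [2., 1., 4.],
              [3., 4., 5.],
              [4., 3., 6.],
              [5., 6., 7.],
              [6., 5., 9.]])

R = np.corrcoef(X, rowvar=False)      # 3x3 correlation matrix of the questions
vals, vecs = np.linalg.eigh(R)        # eigenvalues in ascending order
vals = np.clip(vals, 0.0, None)       # guard against tiny negative round-off
loadings = vecs * np.sqrt(vals)       # column k = loadings on component k

# Communalities: sum of squared loadings per variable (1 for a full decomposition)
communalities = (loadings ** 2).sum(axis=1)
```

Because `loadings @ loadings.T` reproduces R exactly, every loading stays in [-1, 1], mirroring the correlation-coefficient analogy in the text.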
Multiple Factor Analysis

This subset of factor analysis is used when our variables are structured in variable groups. For example, we might have a student health questionnaire with several item groups, like sleep patterns, addictions, psychological health, or learning disabilities.
The two steps performed in Multiple Factor Analysis are:
1. Principal Component Analysis is performed on each set of data. This gives
an eigenvalue, which is used to normalize the data sets.
2. The new data sets are merged into a unique matrix and a second, global PCA is
performed.
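The two steps above can be sketched with NumPy. In this sketch (the data are hypothetical, and I normalize each block by the square root of its first eigenvalue, one common convention) each block's leading eigenvalue becomes 1 before the global PCA, so no single group dominates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Two variable groups measured on the same 20 students (toy data)
block1 = rng.normal(size=(n, 3))   # e.g. sleep-pattern items
block2 = rng.normal(size=(n, 4))   # e.g. psychological-health items

def first_eigenvalue(block):
    """Leading eigenvalue of the block's covariance matrix, via SVD."""
    c = block - block.mean(axis=0)
    s = np.linalg.svd(c, compute_uv=False)
    return s[0] ** 2 / (len(block) - 1)

# Step 1: PCA on each block; divide each centered block by sqrt of its first eigenvalue
scaled = [(b - b.mean(axis=0)) / np.sqrt(first_eigenvalue(b))
          for b in (block1, block2)]

# Step 2: merge the normalized blocks and run one global PCA
merged = np.hstack(scaled)
s_global = np.linalg.svd(merged, compute_uv=False)
global_eigenvalues = s_global ** 2 / (n - 1)
```

After Step 1, each scaled block has a leading eigenvalue of exactly 1, which is what makes the blocks comparable in the merged analysis.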
Performing Factor Analysis

Factor analysis is an extremely complex mathematical procedure and is performed with software such as Stata, Minitab, or SPSS.
Scanned with CamScannerConfirmatory Factor Analysis allows we to figure out if a relationship between a set
of observed variables (also known as manifest variables) and their underlying constructs exists.
It is similar to Exploratory Factor Analysis, The main difference between the two is:
‘+ If we want to explore patterns, use EFA.
‘* If we want to perform hypothesis testing, use CFA.
EFA provides information about the optimal number of factors required to represent the data
set. With Confirmatory Factor Analysis we can specify the number of factors required. For
example, CFA can answer questions like “Does my ten-question survey accurately measure
one specific factor?”. Although it is technically applicable to any discipline, it is typically used
in the social sciences.
Exploratory Factor Analysis (EFA) is used to find the underlying structure of a large set of variables. It reduces the data to a much smaller set of summary variables.

EFA is almost identical to Confirmatory Factor Analysis (CFA). Both techniques can (perhaps surprisingly) be used to confirm or explore. Similarities are:

• Assess the internal reliability of a measure.
• Examine factors or theoretical constructs represented by item sets. They assume the factors aren't correlated.
• Investigate the quality of individual items.

There are, however, some differences, mostly concerning how factors are treated and used. EFA is basically a data-driven approach, allowing all items to load on all factors, while with CFA we must specify which factors the items load on. EFA is a good choice if we don't have any idea about what common factors might exist. EFA can generate a large number of possible models for our data, something that may not be possible if a researcher has to specify the factors. If we do have an idea about what the models look like, and we want to test our hypotheses about the data structure, CFA is a better approach.