
Chapter 19, Factor Analysis

Factor analysis
Factor analysis is a general name denoting a class of procedures primarily used for data reduction
and summarization. In marketing research, there may be a large number of variables, most of which
are correlated and which must be reduced to a manageable level. Relationships among sets of many
interrelated variables are examined and represented in terms of a few underlying factors.

In analysis of variance, multiple regression, and discriminant analysis, one variable is considered as
the dependent or criterion variable, and the others as independent or predictor variables. However,
no such distinction is made in factor analysis. Rather, factor analysis is an interdependence
technique in that an entire set of interdependent relationships is examined.

Factor analysis is used in the following circumstances:

• To identify underlying dimensions, or factors, that explain the correlations among a set of
variables.
• To identify a new, smaller set of uncorrelated variables to replace the original set of
correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
• To identify a smaller set of salient variables from a larger set for use in subsequent
multivariate analysis.

All these uses are exploratory in nature and, therefore, factor analysis is also called exploratory
factor analysis (EFA). The technique has numerous applications in marketing research.

• It can be used in market segmentation for identifying the underlying variables on which to
group the customers.
• In product research, factor analysis can be employed to determine the brand attributes that
influence consumer choice.
• In advertising studies, factor analysis can be used to understand the media consumption
habits of the target market.
• In pricing studies, it can be used to identify the characteristics of price-sensitive consumers.

Factor Analysis Model


Mathematically, factor analysis is somewhat similar to multiple regression analysis, in that each
variable is expressed as a linear combination of underlying factors. The amount of variance a
variable shares with all other variables included in the analysis is referred to as communality. The
covariation among the variables is described in terms of a small number of common factors plus a
unique factor for each variable. These factors are not overtly observed. If the variables are
standardized, the factor model may be represented as:

Xi = Ai1F1 + Ai2F2 + Ai3F3 + . . . + AimFm + ViUi

where
Xi = ith standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
Fj = jth common factor
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
The unique factors are uncorrelated with each other and with the common factors. The common
factors themselves can be expressed as linear combinations of the observed variables:

Fi = Wi1X1 + Wi2X2 + Wi3X3 + . . . + WikXk

where
Fi = estimate of ith factor
Wi = weight or factor score coefficient
k = number of variables
It is possible to select weights or factor score coefficients so that the first factor explains the largest
portion of the total variance. Then a second set of weights can be selected, so that the second factor
accounts for most of the residual variance, subject to being uncorrelated with the first factor. This
same principle could be applied to selecting additional weights for the additional factors. Thus, the
factors can be estimated so that their factor scores, unlike the values of the original variables, are
not correlated. Furthermore, the first factor accounts for the highest variance in the data, the
second factor the second highest, and so on. A simplified graphical illustration of factor analysis in
the case of two variables is presented in Figure 19.2.

FIGURE 19.2 Graphical Illustration of Factor Analysis
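The extraction principle just described (the first factor accounts for the most variance, each subsequent factor for the most of what remains, all factors mutually uncorrelated) can be sketched in principal-components terms: the eigenvalues of the correlation matrix are the factor variances, and scaling the eigenvectors by the square roots of the eigenvalues gives the loadings. The correlation matrix below is hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical correlation matrix for three standardized variables.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# Eigendecomposition: each eigenvalue is the variance explained by a factor.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]      # sort factors by variance, descending
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Loadings: correlations between the variables and the factors.
loadings = eigenvectors * np.sqrt(eigenvalues)

print(eigenvalues)        # first factor largest, second next, and so on
print(eigenvalues.sum())  # equals the number of variables (3.0)
```

Note that the full set of loadings reproduces the correlation matrix exactly; data reduction comes from keeping only the first few factors.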

Statistics Associated with Factor Analysis


• Bartlett’s test of sphericity. Bartlett’s test of sphericity is a test statistic used to examine the
hypothesis that the variables are uncorrelated in the population. In other words, the
population correlation matrix is an identity matrix; each variable correlates perfectly with
itself (r = 1) but has no correlation with the other variables (r = 0).
• Correlation matrix. A correlation matrix is a lower triangular matrix showing the simple
correlations, r, between all possible pairs of variables included in the analysis. The diagonal
elements, which are all 1, are usually omitted.
• Communality. Communality is the amount of variance a variable shares with all the other
variables being considered. This is also the proportion of variance explained by the common
factors.
• Eigenvalue. The eigenvalue represents the total variance explained by each factor.
• Factor loadings. Factor loadings are simple correlations between the variables and the
factors.
• Factor loading plot. A factor loading plot is a plot of the original variables using the factor
loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings of all the variables on all the
factors extracted.
• Factor scores. Factor scores are composite scores estimated for each respondent on the
derived factors.
• Factor score coefficient matrix. This matrix contains the weights, or factor score
coefficients, used to combine the standardized variables to obtain factor scores.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The Kaiser-Meyer-Olkin (KMO)
measure of sampling adequacy is an index used to examine the appropriateness of factor
analysis. High values (between 0.5 and 1.0) indicate factor analysis is appropriate. Values
below 0.5 imply that factor analysis may not be appropriate.
• Percentage of variance. This is the percentage of the total variance attributed to each
factor.
• Residuals. Residuals are the differences between the observed correlations, as given in the
input correlation matrix, and the reproduced correlations, as estimated from the factor
matrix.
• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of
extraction.

Conducting Factor Analysis

Formulate the Problem


Problem formulation includes several tasks. First, the objectives of factor analysis should be
identified. The variables to be included in the factor analysis should be specified based on past
research, theory, and judgment of the researcher. It is important that the variables be appropriately
measured on an interval or ratio scale. An appropriate sample size should be used. As a rough
guideline, there should be at least four or five times as many observations (sample size) as there are
variables. In many marketing research situations, the sample size is small and this ratio is
considerably lower. In these cases, the results should be interpreted cautiously.

Construct the Correlation Matrix


The analytical process is based on a matrix of correlations between the variables. Valuable insights
can be gained from an examination of this matrix. For the factor analysis to be appropriate, the
variables must be correlated. In practice, this is usually the case. If the correlations between all the
variables are small, factor analysis may not be appropriate. We would also expect that variables that
are highly correlated with each other would also highly correlate with the same factor or factors.

Formal statistics are available for testing the appropriateness of the factor model. Bartlett’s test of
sphericity can be used to test the null hypothesis that the variables are uncorrelated in the
population; in other words, the population correlation matrix is an identity matrix. In an identity
matrix, all the diagonal terms are 1, and all off-diagonal terms are 0. The test statistic for sphericity is
based on a chi-square transformation of the determinant of the correlation matrix. A large value of
the test statistic will favor the rejection of the null hypothesis. If this hypothesis cannot be rejected,
then the appropriateness of factor analysis should be questioned. Another useful statistic is the
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. This index compares the magnitudes of
the observed correlation coefficients to the magnitudes of the partial correlation coefficients. Small
values of the KMO statistic indicate that the correlations between pairs of variables cannot be
explained by other variables and that factor analysis may not be appropriate. Generally, a value
greater than 0.5 is desirable.
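As an illustration, Bartlett's statistic can be computed directly from the determinant of the correlation matrix via the usual chi-square transformation; the correlation matrix and sample size below are hypothetical.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test statistic for H0: the population correlation
    matrix is an identity matrix (the variables are uncorrelated).
    R is the sample correlation matrix; n is the sample size."""
    p = R.shape[0]
    chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical correlation matrix from n = 100 respondents.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
chi_square, df = bartlett_sphericity(R, n=100)
print(chi_square, df)   # a large statistic favors rejecting H0
```

For an identity matrix the determinant is 1 and the statistic is exactly 0, so the null hypothesis would never be rejected and factor analysis would be inappropriate.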

Determine the Method of Factor Analysis


The approach used to derive the weights or factor score coefficients differentiates the various
methods of factor analysis. The two basic approaches are principal components analysis and
common factor analysis. In principal components analysis, the total variance in the data is
considered. The diagonal of the correlation matrix consists of unities, and full variance is brought
into the factor matrix. Principal components analysis is recommended when the primary concern is
to determine the minimum number of factors that will account for maximum variance in the data for
use in subsequent multivariate analysis. The factors are called principal components.

In common factor analysis, the factors are estimated based only on the common variance.
Communalities are inserted in the diagonal of the correlation matrix. This method is appropriate
when the primary concern is to identify the underlying dimensions and the common variance is of
interest. This method is also known as principal axis factoring.

Other approaches for estimating the common factors are also available. These include the methods
of unweighted least squares, generalized least squares, maximum likelihood, alpha method, and
image factoring. These methods are complex and are not recommended for inexperienced users.
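Computationally, the difference between the two basic approaches is mainly what is placed on the diagonal of the correlation matrix before extraction. A hypothetical sketch, using the squared multiple correlation of each variable with all the others as the initial communality estimate (a common choice, though not the only one):

```python
import numpy as np

# Hypothetical correlation matrix for three variables.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# Principal components analysis: unities on the diagonal (total variance).
R_pca = R.copy()

# Common factor analysis (principal axis factoring): replace the diagonal
# with initial communality estimates, here the squared multiple correlation
# of each variable with the others, computed from the inverse of R.
R_inv = np.linalg.inv(R)
smc = 1 - 1 / np.diag(R_inv)
R_paf = R.copy()
np.fill_diagonal(R_paf, smc)

print(np.diag(R_pca))   # [1. 1. 1.]: full variance enters the analysis
print(np.diag(R_paf))   # initial communalities, each below 1
```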

Determine the Number of Factors


It is possible to compute as many principal components as there are variables, but in doing so, no
parsimony is gained. In order to summarize the information contained in the original variables, a
smaller number of factors should be extracted. Several procedures have been suggested for
determining the number of factors.

• A PRIORI DETERMINATION Sometimes, because of prior knowledge, the researcher knows
how many factors to expect and thus can specify the number of factors to be extracted
beforehand. The extraction of factors ceases when the desired number of factors has been
extracted. Most computer programs allow the user to specify the number of factors,
allowing for an easy implementation of this approach.
• DETERMINATION BASED ON EIGENVALUES In this approach, only factors with eigenvalues
greater than 1.0 are retained; the other factors are not included in the model. An eigenvalue
represents the amount of variance associated with the factor. Hence, only factors with a
variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than
a single variable, because, due to standardization, each individual variable has a variance of
1.0. If the number of variables is less than 20, this approach will result in a conservative
number of factors.
• DETERMINATION BASED ON SCREE PLOT A scree plot is a plot of the eigenvalues against the
number of factors in order of extraction. The shape of the plot is used to determine the
number of factors. Typically, the plot has a distinct break between the steep slope of factors
with large eigenvalues and the gradual trailing off associated with the rest of the factors. This
gradual trailing off is referred to as the scree. Experimental evidence indicates that the point
at which the scree begins denotes the true number of factors. Generally, the number of
factors determined by a scree plot will be one or a few more than that determined by the
eigenvalue criterion.
• DETERMINATION BASED ON PERCENTAGE OF VARIANCE In this approach, the number of
factors extracted is determined so that the cumulative percentage of variance extracted by
the factors reaches a satisfactory level. What level of variance is satisfactory depends upon
the problem. However, it is recommended that the factors extracted should account for at
least 60 percent of the variance.
• DETERMINATION BASED ON SPLIT-HALF RELIABILITY The sample is split in half and factor
analysis is performed on each half. Only factors with high correspondence of factor loadings
across the two subsamples are retained.
• DETERMINATION BASED ON SIGNIFICANCE TESTS It is possible to determine the statistical
significance of the separate eigenvalues and retain only those factors that are statistically
significant. A drawback is that with large samples (size greater than 200), many factors are
likely to be statistically significant, although from a practical viewpoint many of these
account for only a small proportion of the total variance.
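The eigenvalue and percentage-of-variance criteria above can be sketched as follows; the eigenvalues are hypothetical values for a six-variable analysis.

```python
import numpy as np

# Hypothetical eigenvalues from a six-variable analysis, in extraction order.
eigenvalues = np.array([2.7, 1.4, 0.8, 0.5, 0.4, 0.2])

# Eigenvalue criterion: retain only factors with eigenvalues greater than 1.0.
n_by_eigenvalue = int((eigenvalues > 1.0).sum())

# Percentage-of-variance criterion: retain enough factors for the
# cumulative percentage of variance extracted to reach 60 percent.
cumulative_pct = np.cumsum(eigenvalues) / eigenvalues.sum() * 100
n_by_variance = int(np.argmax(cumulative_pct >= 60) + 1)

print(n_by_eigenvalue)   # 2
print(n_by_variance)     # 2 (the first two factors explain about 68 percent)
```

Here the two criteria agree on two factors; in practice they can disagree, and the researcher weighs them together with the scree plot and a priori expectations.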

Rotate Factors
An important output from factor analysis is the factor matrix, also called the factor pattern matrix.
The factor matrix contains the coefficients used to express the standardized variables in terms of the
factors. These coefficients, the factor loadings, represent the correlations between the factors and
the variables. A coefficient with a large absolute value indicates that the factor and the variable are
closely related. The coefficients of the factor matrix can be used to interpret the factors.

Although the initial or unrotated factor matrix indicates the relationship between the factors and
individual variables, it seldom results in factors that can be interpreted, because the factors are
correlated with many variables.

In rotating the factors, we would like each factor to have nonzero, or significant, loadings or
coefficients for only some of the variables. Likewise, we would like each variable to have nonzero or
significant loadings with only a few factors, if possible with only one. If several factors have high
loadings with the same variable, it is difficult to interpret them. Rotation does not affect the
communalities and the percentage of total variance explained. However, the percentage of variance
accounted for by each factor does change. This is seen in Table 19.3 by comparing “Extraction Sums
of Squared Loadings” with “Rotation Sums of Squared Loadings.” The variance explained by the
individual factors is redistributed by rotation. Hence, different methods of rotation may result in the
identification of different factors.

The rotation is called orthogonal rotation if the axes are maintained at right angles. The most
commonly used method for rotation is the varimax procedure. This is an orthogonal method of
rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing
the interpretability of the factors. Orthogonal rotation results in factors that are uncorrelated. The
rotation is called oblique rotation when the axes are not maintained at right angles, and the factors
are correlated. Sometimes, allowing for correlations among factors can simplify the factor pattern
matrix. Oblique rotation should be used when factors in the population are likely to be strongly
correlated. Rotation achieves simplicity and enhances interpretability.
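A minimal sketch of varimax rotation follows, using a common textbook form of the algorithm (iterative singular value decompositions) and a small hypothetical loading matrix; it is not tied to any particular software package. It also demonstrates the point made above: rotation leaves the communalities unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a factor loading matrix
    (a standard iterative-SVD formulation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    variance = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD step that maximizes the varimax criterion.
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        rotation = u @ vt
        new_variance = s.sum()
        if new_variance - variance < tol:
            break
        variance = new_variance
    return loadings @ rotation

# Hypothetical unrotated loadings: four variables, two factors.
L = np.array([
    [0.7,  0.5],
    [0.6,  0.6],
    [0.6, -0.5],
    [0.7, -0.6],
])
rotated = varimax(L)

# Communalities (row sums of squared loadings) are unaffected by rotation.
print((L ** 2).sum(axis=1))
print((rotated ** 2).sum(axis=1))
```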

Interpret Factors
Interpretation is facilitated by identifying the variables that have large loadings on the same factor.
That factor can then be interpreted in terms of the variables that load high on it. Another useful aid
in interpretation is to plot the variables using the factor loadings as coordinates. Variables at the end
of an axis are those that have high loadings on only that factor, and hence describe the factor.
Variables near the origin have small loadings on both the factors. Variables that are not near any of
the axes are related to both the factors. If a factor cannot be clearly defined in terms of the original
variables, it should be labeled as an undefined or a general factor.

Note that a negative coefficient on a negatively worded variable contributes positively to the interpretation of the factor.

Calculate Factor Scores


Following interpretation, factor scores can be calculated, if necessary. Factor analysis has its own
stand-alone value. However, if the goal of factor analysis is to reduce the original set of variables to a
smaller set of composite variables (factors) for use in subsequent multivariate analysis, it is useful to
compute factor scores for each respondent. A factor is simply a linear combination of the original
variables. The factor scores for the ith factor may be estimated as follows:

Fi = Wi1X1 + Wi2X2 + Wi3X3 + . . . + WikXk

The weights, or factor score coefficients, used to combine the standardized variables are obtained
from the factor score coefficient matrix. Most computer programs allow you to request factor
scores. Only in the case of principal components analysis is it possible to compute exact factor
scores. Moreover, in principal component analysis, these scores are uncorrelated. In common factor
analysis, estimates of these scores are obtained, and there is no guarantee that the factors will be
uncorrelated with each other. The factor scores can be used instead of the original variables in
subsequent multivariate analysis. The standardized variable values would be multiplied by the
corresponding factor score coefficients to obtain the factor scores.
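The multiplication described above is a single matrix product; both the standardized ratings and the factor score coefficient matrix below are hypothetical.

```python
import numpy as np

# Hypothetical standardized ratings: 3 respondents x 3 variables.
Z = np.array([
    [ 1.2,  0.8, -0.3],
    [-0.5, -1.0,  0.4],
    [-0.7,  0.2, -0.1],
])

# Hypothetical factor score coefficient matrix: 3 variables x 2 factors,
# as read from the factor score coefficient matrix output.
W = np.array([
    [0.45,  0.10],
    [0.40, -0.15],
    [0.05,  0.60],
])

# Each respondent's factor scores: weighted sums of standardized variables.
factor_scores = Z @ W
print(factor_scores.shape)   # (3, 2): one score per respondent per factor
```

These composite scores can then stand in for the original variables in a subsequent regression or discriminant analysis.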

Select Surrogate Variables


Sometimes, instead of computing factor scores, the researcher wishes to select surrogate variables.
Selection of substitute or surrogate variables involves singling out some of the original variables for
use in subsequent analysis. This allows the researcher to conduct subsequent analysis and interpret
the results in terms of original variables rather than factor scores. By examining the factor matrix,
one could select for each factor the variable with the highest loading on that factor. That variable
could then be used as a surrogate variable for the associated factor. This process works well if one
factor loading for a variable is clearly higher than all other factor loadings. However, the choice is not
as easy if two or more variables have similarly high loadings. In such a case, the choice between
these variables should be based on theoretical and measurement considerations. For example,
theory may suggest that a variable with a slightly lower loading is more important than one with a
slightly higher loading. Likewise, if a variable has a slightly lower loading but has been measured
more precisely, it should be selected as the surrogate variable.
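When no theoretical or measurement considerations intervene, the mechanical part of surrogate selection is simply the highest absolute loading in each column of the rotated factor matrix. The variable names and loadings below are hypothetical.

```python
import numpy as np

# Hypothetical rotated factor matrix: rows = variables, columns = factors.
variables = ["price", "quality", "service", "availability"]
loadings = np.array([
    [0.10, 0.88],
    [0.85, 0.20],
    [0.80, 0.15],
    [0.25, 0.75],
])

# For each factor, pick the variable with the highest absolute loading.
surrogates = [variables[i] for i in np.argmax(np.abs(loadings), axis=0)]
print(surrogates)   # ['quality', 'price']
```

Had "service" loaded 0.84 rather than 0.80, the choice between it and "quality" would rest on theory and measurement precision, as discussed above.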

Determine the Model Fit


The final step in factor analysis involves the determination of model fit. A basic assumption
underlying factor analysis is that the observed correlation between variables can be attributed to
common factors. Hence, the correlations between the variables can be deduced or reproduced from
the estimated correlations between the variables and the factors. The differences between the
observed correlations (as given in the input correlation matrix) and the reproduced correlations (as
estimated from the factor matrix) can be examined to determine model fit. These differences are
called residuals. If there are many large residuals (larger than 0.05), the factor model
does not provide a good fit to the data and the model should be reconsidered.
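The residual check can be sketched directly: with standardized variables and uncorrelated factors, the reproduced correlation matrix is the loading matrix times its transpose. The loadings and observed correlations below are hypothetical, chosen so that the model fits well.

```python
import numpy as np

# Hypothetical two-factor loading matrix for four variables.
L = np.array([
    [0.80, 0.10],
    [0.75, 0.15],
    [0.20, 0.85],
    [0.10, 0.80],
])

# Hypothetical observed correlation matrix (the analysis input).
R_observed = np.array([
    [1.00, 0.62, 0.25, 0.16],
    [0.62, 1.00, 0.28, 0.20],
    [0.25, 0.28, 1.00, 0.70],
    [0.16, 0.20, 0.70, 1.00],
])

# Reproduced correlations implied by the factor model: R is approximately L L'.
R_reproduced = L @ L.T

# Residuals on the off-diagonal; many values above 0.05 suggest poor fit.
residuals = R_observed - R_reproduced
off_diagonal = residuals[~np.eye(4, dtype=bool)]
print(np.abs(off_diagonal).max())   # well under 0.05 here, so the fit is good
```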
