Chapter 19, Factor Analysis
Factor analysis
Factor analysis is a general name denoting a class of procedures primarily used for data reduction
and summarization. In marketing research, there may be a large number of variables, most of which
are correlated and which must be reduced to a manageable level. Relationships among sets of many
interrelated variables are examined and represented in terms of a few underlying factors.
In analysis of variance, multiple regression, and discriminant analysis, one variable is considered as
the dependent or criterion variable, and the others as independent or predictor variables. However,
no such distinction is made in factor analysis. Rather, factor analysis is an interdependence
technique in that an entire set of interdependent relationships is examined. Factor analysis is used in
the following circumstances:
• To identify underlying dimensions, or factors, that explain the correlations among a set of
variables.
• To identify a new, smaller set of uncorrelated variables to replace the original set of
correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
• To identify a smaller set of salient variables from a larger set for use in subsequent
multivariate analysis.
All these uses are exploratory in nature and, therefore, factor analysis is also called exploratory
factor analysis (EFA). The technique has numerous applications in marketing research:
• It can be used in market segmentation for identifying the underlying variables on which to
group the customers.
• In product research, factor analysis can be employed to determine the brand attributes that
influence consumer choice.
• In advertising studies, factor analysis can be used to understand the media consumption
habits of the target market.
• In pricing studies, it can be used to identify the characteristics of price-sensitive consumers.
Mathematically, each standardized variable can be expressed as a linear combination of the common
factors plus a unique factor:

Xi = Ai1F1 + Ai2F2 + Ai3F3 + ... + AimFm + ViUi

where
Xi = ith standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
Fj = jth common factor
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
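To make the model concrete, here is a minimal NumPy sketch that simulates data from this factor
model for six variables and two common factors. The loading matrix A, the unique-factor weights V,
and the sample size are illustrative assumptions, not values from the chapter.

import numpy as np

rng = np.random.default_rng(0)
n, p, m = 500, 6, 2    # observations, variables, common factors (all assumed)

# Hypothetical standardized loading matrix A (p x m): variables 1-3 load
# mainly on factor 1, variables 4-6 mainly on factor 2
A = np.array([[0.8, 0.1], [0.7, 0.2], [0.9, 0.0],
              [0.1, 0.8], [0.2, 0.7], [0.0, 0.9]])
# Pick Vi so each Xi has unit variance: Ai1^2 + Ai2^2 + Vi^2 = 1
V = np.sqrt(1 - np.sum(A ** 2, axis=1))

F = rng.standard_normal((n, m))    # common factors, drawn independently
U = rng.standard_normal((n, p))    # unique factors, one per variable

X = F @ A.T + U * V                # Xi = Ai1*F1 + Ai2*F2 + Vi*Ui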
The unique factors are uncorrelated with each other and with the common factors. The common
factors themselves can be expressed as linear combinations of the observed variables.
Fi = Wi1X1 + Wi2X2 + Wi3X3 + ... + WikXk

where
Fi = estimate of ith factor
Wi = weight or factor score coefficient
k = number of variables
It is possible to select weights or factor score coefficients so that the first factor explains the largest
portion of the total variance. Then a second set of weights can be selected, so that the second factor
accounts for most of the residual variance, subject to being uncorrelated with the first factor. This
same principle could be applied to selecting additional weights for the additional factors. Thus, the
factors can be estimated so that their factor scores, unlike the values of the original variables, are
not correlated. Furthermore, the first factor accounts for the highest variance in the data, the
second factor the second highest, and so on. A simplified graphical illustration of factor analysis in
the case of two variables is presented in Figure 19.2.
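As a sketch of this weight-selection idea (using simulated correlated data, since the chapter's
dataset is not reproduced here), the snippet below extracts factors via an eigendecomposition of the
correlation matrix and confirms the two properties just described: the factor scores are
uncorrelated, and the shares of total variance decline from the first factor onward.

import numpy as np

rng = np.random.default_rng(1)
# Simulated correlated data: three latent drivers plus noise (assumed example)
X = (rng.standard_normal((500, 3)) @ rng.standard_normal((3, 6))
     + 0.5 * rng.standard_normal((500, 6)))
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize the variables

R = np.corrcoef(Z, rowvar=False)              # correlation matrix
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]              # largest-variance factor first
eigval, eigvec = eigval[order], eigvec[:, order]

scores = Z @ eigvec                           # factor (component) scores
print(np.round(np.corrcoef(scores, rowvar=False), 2))  # ~identity: uncorrelated
print(np.round(eigval / eigval.sum(), 2))     # descending shares of total variance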
Formal statistics are available for testing the appropriateness of the factor model. Bartlett’s test of
sphericity can be used to test the null hypothesis that the variables are uncorrelated in the
population; in other words, the population correlation matrix is an identity matrix. In an identity
matrix, all the diagonal terms are 1, and all off-diagonal terms are 0. The test statistic for sphericity is
based on a chi-square transformation of the determinant of the correlation matrix. A large value of
the test statistic will favor the rejection of the null hypothesis. If this hypothesis cannot be rejected,
then the appropriateness of factor analysis should be questioned. Another useful statistic is the
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. This index compares the magnitudes of
the observed correlation coefficients to the magnitudes of the partial correlation coefficients. Small
values of the KMO statistic indicate that the correlations between pairs of variables cannot be
explained by other variables and that factor analysis may not be appropriate. Generally, a value
greater than 0.5 is desirable.
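Both diagnostics are straightforward to compute. The sketch below implements the usual chi-square
form of Bartlett's test and the KMO index with NumPy and SciPy; the function names are ours, and
dedicated packages such as factor_analyzer ship comparable routines.

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Test H0: the population correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Chi-square transformation of the determinant of the correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)   # large stat, small p-value => reject H0

def kmo(X):
    """Compare observed correlations with partial correlations."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations obtained from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    P = -R_inv / d
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, p2 = np.sum(R[off] ** 2), np.sum(P[off] ** 2)
    return r2 / (r2 + p2)            # values above 0.5 are desirable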
In common factor analysis, the factors are estimated based only on the common variance.
Communalities are inserted in the diagonal of the correlation matrix. This method is appropriate
when the primary concern is to identify the underlying dimensions and the common variance is of
interest. This method is also known as principal axis factoring.
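A rough NumPy sketch of principal axis factoring follows, assuming squared multiple correlations as
the initial communality estimates and a fixed number of refinement passes (production
implementations add a convergence check):

import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=50):
    """Factor only the common variance of a (p x p) correlation matrix R."""
    # Initial communalities: squared multiple correlation of each variable
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)       # communalities on the diagonal
        eigval, eigvec = np.linalg.eigh(R_reduced)
        idx = np.argsort(eigval)[::-1][:n_factors]
        loadings = eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0))
        h2 = np.sum(loadings ** 2, axis=1)    # re-estimate the communalities
    return loadings, h2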
Other approaches for estimating the common factors are also available. These include the methods
of unweighted least squares, generalized least squares, maximum likelihood, alpha method, and
image factoring. These methods are complex and are not recommended for inexperienced users.
Rotate Factors
An important output from factor analysis is the factor matrix, also called the factor pattern matrix.
The factor matrix contains the coefficients used to express the standardized variables in terms of the
factors. These coefficients, the factor loadings, represent the correlations between the factors and
the variables. A coefficient with a large absolute value indicates that the factor and the variable are
closely related. The coefficients of the factor matrix can be used to interpret the factors.
Although the initial or unrotated factor matrix indicates the relationship between the factors and
individual variables, it seldom results in factors that can be interpreted, because the factors are
correlated with many variables. Therefore, through rotation, the factor matrix is transformed into a
simpler one that is easier to interpret.
In rotating the factors, we would like each factor to have nonzero, or significant, loadings or
coefficients for only some of the variables. Likewise, we would like each variable to have nonzero or
significant loadings with only a few factors, if possible with only one. If several factors have high
loadings with the same variable, it is difficult to interpret them. Rotation does not affect the
communalities and the percentage of total variance explained. However, the percentage of variance
accounted for by each factor does change. This is seen in Table 19.3 by comparing “Extraction Sums
of Squared Loadings” with “Rotation Sums of Squared Loadings.” The variance explained by the
individual factors is redistributed by rotation. Hence, different methods of rotation may result in the
identification of different factors.
The rotation is called orthogonal rotation if the axes are maintained at right angles. The most
commonly used method for rotation is the varimax procedure. This is an orthogonal method of
rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing
the interpretability of the factors. Orthogonal rotation results in factors that are uncorrelated. The
rotation is called oblique rotation when the axes are not maintained at right angles, and the factors
are correlated. Sometimes, allowing for correlations among factors can simplify the factor pattern
matrix. Oblique rotation should be used when factors in the population are likely to be strongly
correlated. Rotation achieves simplicity and enhances interpretability.
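For illustration, here is a compact NumPy implementation of the varimax criterion (a common port
of the standard algorithm; Kaiser normalization is omitted for brevity):

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (p x k) loading matrix by the varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)                  # accumulated rotation, starts at identity
    crit = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt
        if s.sum() < crit * (1 + tol):   # stop once the criterion stalls
            break
        crit = s.sum()
    return loadings @ R

Applying this to an unrotated loading matrix leaves each variable's communality (the row sum of
squared loadings) unchanged while redistributing variance across the factors, which is exactly the
behavior described above for Table 19.3.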
Interpret Factors
Interpretation is facilitated by identifying the variables that have large loadings on the same factor.
That factor can then be interpreted in terms of the variables that load high on it. Another useful aid
in interpretation is to plot the variables using the factor loadings as coordinates. Variables at the end
of an axis are those that have high loadings on only that factor, and hence describe the factor.
Variables near the origin have small loadings on both factors. Variables that are not near any of
the axes are related to both factors. If a factor cannot be clearly defined in terms of the original
variables, it should be labeled as an undefined or a general factor.
Note that a negative coefficient on a negatively worded variable leads to a positive interpretation.
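The loading plot described above takes only a few lines; the loadings and variable labels below are
hypothetical placeholders:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical rotated loadings for six variables on two factors
loadings = np.array([[0.85, 0.10], [0.78, 0.05], [0.80, 0.15],
                     [0.12, 0.82], [0.08, 0.75], [0.05, 0.88]])
labels = [f"V{i + 1}" for i in range(len(loadings))]   # placeholder names

fig, ax = plt.subplots()
ax.axhline(0, color="grey", lw=0.5)
ax.axvline(0, color="grey", lw=0.5)
ax.scatter(loadings[:, 0], loadings[:, 1])
for (x, y), name in zip(loadings, labels):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Factor 1 loading")
ax.set_ylabel("Factor 2 loading")
plt.show()   # V1-V3 cluster along factor 1's axis, V4-V6 along factor 2's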
Calculate Factor Scores
The weights, or factor score coefficients, used to combine the standardized variables are obtained
from the factor score coefficient matrix. Most computer programs allow you to request factor
scores. Only in the case of principal components analysis is it possible to compute exact factor
scores. Moreover, in principal component analysis, these scores are uncorrelated. In common factor
analysis, estimates of these scores are obtained, and there is no guarantee that the factors will be
uncorrelated with each other. The factor scores can be used instead of the original variables in
subsequent multivariate analysis. The standardized variable values would be multiplied by the
corresponding factor score coefficients to obtain the factor scores.
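As a sketch, assuming the regression method for the factor score coefficient matrix (W = R^-1 L, one
common choice; programs differ in how they estimate W):

import numpy as np

def regression_factor_scores(Z, loadings):
    """Factor scores: standardized values times factor score coefficients.

    Z: (n, p) matrix of standardized variable values.
    loadings: (p, m) factor loading matrix.
    """
    R = np.corrcoef(Z, rowvar=False)
    W = np.linalg.solve(R, loadings)   # factor score coefficient matrix
    return Z @ W                       # the multiplication described above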