Factor Analysis
Factor analysis is used to uncover the latent structure of a set of variables. It reduces the
attribute space from a large number of variables to a smaller number of factors and, as such, is a
non-dependent procedure: it does not require any variable to be designated as a dependent variable.
Common uses of factor analysis include:
1. To reduce a large number of variables to a smaller number of factors for modeling purposes, where
the large number of variables precludes modeling all the measures individually. As such, factor
analysis is integrated into structural equation modeling (SEM), where it helps create the latent
variables that SEM models.
2. To select a subset of variables from a larger set, based on which original variables have the
highest correlations with the principal component factors.
4) Local independence (i.e., conditional independence): Given the factor, the observed variables are
independent of one another: cov(Xj, Xk | F) = 0 for j ≠ k.
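A short worked example may make the condition concrete. It is only a sketch of a one-factor model: the loadings \lambda_j, the unique terms \varepsilon_j, and the unit-variance scaling of F are assumptions introduced here for illustration and do not come from the notes above.

```latex
\begin{aligned}
X_j &= \lambda_j F + \varepsilon_j, \qquad
  \operatorname{var}(F) = 1, \quad
  \varepsilon_j \text{ independent of } F, \quad
  \operatorname{cov}(\varepsilon_j, \varepsilon_k) = 0 \ (j \neq k) \\
\operatorname{cov}(X_j, X_k \mid F) &= \operatorname{cov}(\varepsilon_j, \varepsilon_k) = 0 \\
\operatorname{cov}(X_j, X_k) &= \lambda_j \lambda_k \operatorname{var}(F) = \lambda_j \lambda_k
\end{aligned}
```

Under these assumptions, all of the correlation between two observed variables is carried by the factor; once F is held fixed, nothing is left over.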
Factor analysis assumes certain conditions that must be met for the technique to be appropriate:
Interval or Ratio Scale: The variables involved should be continuous (interval or ratio
scale). Factor analysis is built on the correlations between variables, and correlation
coefficients are most meaningful for continuous data.
Linearity: Factor analysis assumes that there is a linear relationship between the
variables. This means that the variables should correlate in a straight-line fashion.
Correlation: Factor analysis is most useful when variables are correlated with each
other. If the variables are uncorrelated, there is little or no common variance to be
explained by factors.
Sample Size: A larger sample size is preferred to ensure the stability and reliability of the
factor analysis results. In general, a minimum sample size of 100-200 is recommended.
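Two of these conditions (sample size and the presence of correlation) can be checked roughly before running the analysis. The sketch below is only illustrative: the function name, the cutoff of 0.3 for a "sizable" correlation, and the simulated data are assumptions made for the example, and the minimum of 100 cases simply takes the lower end of the range quoted above.

```python
import numpy as np

def check_basic_suitability(data, min_cases=100, min_abs_r=0.3):
    """Rough pre-checks before factor analysis; the thresholds are illustrative."""
    X = np.asarray(data, dtype=float)
    n_cases, n_vars = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Off-diagonal correlations only (upper triangle, excluding the diagonal).
    off_diag = np.abs(R[np.triu_indices(n_vars, k=1)])
    return {
        "n_cases": n_cases,
        "enough_cases": n_cases >= min_cases,
        # Share of variable pairs whose |r| reaches the chosen cutoff.
        "share_correlated": float(np.mean(off_diag >= min_abs_r)),
    }

# Hypothetical data: 150 cases, 4 variables driven by one shared factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=150)
data = 0.8 * factor[:, None] + 0.6 * rng.normal(size=(150, 4))

report = check_basic_suitability(data)
print(report)
```

A low share of correlated pairs would suggest there is little common variance for factors to explain. Linearity and measurement scale are not checked here and still need to be judged from the data.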
The first task in factor analysis is to examine the relationships between the observed variables
through a correlation matrix. A correlation matrix displays the correlations between each pair
of variables. Factor analysis assumes that the underlying structure of the data is revealed through
these correlations.
The correlation matrix provides insight into how much of the variance in one variable can be
explained by the other variables. If the variables show high intercorrelations, it indicates that
they may share a common underlying factor. The correlation matrix is the starting point for
identifying the number of latent factors that can explain the correlations.
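As a minimal sketch of this step, the correlation matrix can be computed with NumPy. The ratings below are made-up values for three hypothetical survey items, used only to show the shape of the result.

```python
import numpy as np

def correlation_matrix(data):
    """Return the Pearson correlation matrix of the columns of `data`."""
    data = np.asarray(data, dtype=float)
    # rowvar=False: each column is a variable, each row an observation.
    return np.corrcoef(data, rowvar=False)

# Hypothetical survey: 6 respondents rating 3 items on a 1-5 scale.
ratings = np.array([
    [5, 4, 1],
    [4, 5, 2],
    [2, 1, 5],
    [1, 2, 4],
    [3, 3, 3],
    [5, 5, 1],
])

R = correlation_matrix(ratings)
print(np.round(R, 2))
```

In this toy data the first two items move together and the third moves against them, which is the kind of pattern that hints at a shared underlying factor.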
Step 3: Factor Extraction
Factor extraction is the process of identifying the underlying factors (latent variables) that
explain the relationships between the observed variables. The goal is to reduce the
dimensionality of the data by extracting a few factors that can summarize the data.
There are various methods for factor extraction, with the most common being:
1. Principal Component Analysis (PCA): PCA is a method that creates new variables
(principal components) by linearly combining the original variables. These components
are ordered by the amount of variance they explain in the data. The first principal
component accounts for the largest variance, and so on. PCA is often used in exploratory
factor analysis to determine how many factors should be retained.
2. Principal Axis Factoring (PAF): This method focuses on identifying the shared
variance among the observed variables. It aims to extract factors that explain the common
variance (the variance shared by the variables), as opposed to the total variance (which
includes both shared and unique variance).
3. Maximum Likelihood Estimation (MLE): MLE is another factor extraction method
that assumes the data follows a multivariate normal distribution. It estimates factors by
maximizing the likelihood that the observed data comes from a specific factor model.
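The first two methods can be sketched directly from a correlation matrix. The code below is an illustrative implementation under stated assumptions, not a reference one: the function names are hypothetical, principal axis factoring is started from squared multiple correlations (one common choice), and the correlation matrix is built from made-up loadings so that the expected answer is known. Maximum likelihood extraction is not sketched here.

```python
import numpy as np

def principal_component_loadings(R, n_factors):
    """Loadings from the top eigenpairs of a (possibly reduced) correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(eigvals)[::-1][:n_factors]   # largest eigenvalues first
    vals = np.clip(eigvals[order], 0.0, None)       # guard against tiny negatives
    loadings = eigvecs[:, order] * np.sqrt(vals)
    signs = np.sign(loadings.sum(axis=0))           # eigenvector sign is arbitrary
    signs[signs == 0] = 1.0
    return loadings * signs

def principal_axis_loadings(R, n_factors, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring: factor only the shared variance."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations (an assumed starting point).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)               # replace 1s with communalities
        loadings = principal_component_loadings(reduced, n_factors)
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings

# Hypothetical two-factor structure used to build a correlation matrix.
true_loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.7], [0.0, 0.6], [0.0, 0.5],
])
R = true_loadings @ true_loadings.T
np.fill_diagonal(R, 1.0)

pca_loadings = principal_component_loadings(R, 2)
paf_loadings = principal_axis_loadings(R, 2)
print(np.round(pca_loadings, 2))
print(np.round(paf_loadings, 2))
```

In this example the principal component loadings come out larger than the principal axis loadings, because components account for total variance (shared plus unique) while principal axis factoring targets only the shared part.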
Step 4: Factor Rotation
Once the factors are extracted, the next step is rotation, which makes the factors more
interpretable and meaningful. Unrotated factors are often hard to interpret because many variables
load moderately on several factors at once, so each factor looks like a mixture of different dimensions.
Rotation makes the factor loadings (the correlations between variables and factors) easier to
interpret. High factor loadings indicate a strong association between the variable and the factor,
while low loadings indicate a weak association.
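The notes do not name a particular rotation method. As one possible illustration, the sketch below assumes varimax, a widely used orthogonal rotation, implemented with a plain SVD-based iteration and no row normalization. The function names and the loadings are hypothetical: a clean two-factor pattern is deliberately turned by 30 degrees to imitate an unrotated solution in which every variable loads on both factors.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation (assumed method; SVD-based iteration)."""
    L = np.asarray(loadings, dtype=float)
    n_vars, n_factors = L.shape
    T = np.eye(n_factors)                 # rotation matrix, starts as identity
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ T
        col_ss = np.sum(rotated ** 2, axis=0)
        # Gradient of the varimax criterion with respect to the rotated loadings.
        target = rotated ** 3 - rotated * col_ss / n_vars
        U, s, Vt = np.linalg.svd(L.T @ target)
        T = U @ Vt                        # closest orthogonal matrix
        previous, objective = objective, np.sum(s)
        if previous != 0.0 and objective / previous < 1.0 + tol:
            break
    return L @ T, T

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    squared = np.asarray(loadings, dtype=float) ** 2
    return float(np.sum(np.var(squared, axis=0)))

# Hypothetical clean structure, turned by 30 degrees to mimic unrotated output.
simple = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.7], [0.0, 0.6], [0.0, 0.5],
])
angle = np.deg2rad(30)
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ turn

rotated, T = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the way that variance is split across factors changes, so that each variable ends up with one high loading and one near-zero loading.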
Step 5: Interpretation of Factors
After rotation, the factors are examined to understand what they represent. Each factor is
interpreted based on the variables that have high loadings on it. For example:
A factor that loads highly on variables like "Product Quality," "Customer Service," and
"Reliability" might be interpreted as a "Quality Factor".
A factor that loads highly on "Price Satisfaction," "Discounts," and "Value for Money"
might represent a "Pricing Factor".
The goal is to label and describe the latent factors in a way that is meaningful and useful for
further analysis or decision-making.
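A small helper can support this reading of the loadings by listing, for each factor, the variables that load most strongly on it. This is a sketch only: the function name, the cutoff of 0.5 for a "high" loading, and the loading values are assumptions chosen to mirror the two examples above.

```python
def group_by_salient_loading(names, loadings, threshold=0.5):
    """Map each factor index to the variables whose largest |loading| is on it."""
    groups = {}
    for name, row in zip(names, loadings):
        # Index of the factor with the largest absolute loading for this variable.
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[best]) >= threshold:   # ignore variables with no high loading
            groups.setdefault(best, []).append(name)
    return groups

names = [
    "Product Quality", "Customer Service", "Reliability",
    "Price Satisfaction", "Discounts", "Value for Money",
]
# Hypothetical rotated loadings (rows: variables, columns: factors).
loadings = [
    [0.82, 0.10],
    [0.75, 0.15],
    [0.78, 0.05],
    [0.12, 0.80],
    [0.08, 0.71],
    [0.20, 0.77],
]

groups = group_by_salient_loading(names, loadings)
for factor, members in sorted(groups.items()):
    print(f"Factor {factor + 1}: {', '.join(members)}")
```

The grouping only shows which variables belong together; choosing a label such as "Quality Factor" or "Pricing Factor" remains a judgment made by the analyst.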
Step 6: Factor Scores
Once the factors are identified and interpreted, factor scores are calculated for each observation
(or case) in the dataset. Factor scores represent an individual's position on each factor and can be
treated as new variables for further analysis. They provide a summary of the individual's
responses or characteristics as defined by the extracted factors.
Factor scores can be used in subsequent analyses (e.g., regression, clustering, or classification)
instead of the original observed variables, allowing for a reduced and simplified representation of
the data.
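The notes do not say how the scores are computed. One common choice is the regression method, in which the standardized data are multiplied by weights obtained from the correlation matrix and the loadings. The sketch below assumes that method; the function name, the simulated data, and the use of the generating loadings in place of estimated ones are all assumptions made for illustration.

```python
import numpy as np

def regression_factor_scores(data, loadings):
    """Regression-method factor scores: Z @ R^{-1} @ loadings (assumed method)."""
    X = np.asarray(data, dtype=float)
    # Standardize each variable to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # Solve R @ W = loadings instead of forming the inverse explicitly.
    weights = np.linalg.solve(R, np.asarray(loadings, dtype=float))
    return Z @ weights

# Hypothetical data simulated from a known two-factor model.
rng = np.random.default_rng(0)
loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.7], [0.0, 0.6], [0.0, 0.5],
])
n_cases = 500
factors = rng.normal(size=(n_cases, 2))
unique_sd = np.sqrt(1.0 - np.sum(loadings ** 2, axis=1))
data = factors @ loadings.T + rng.normal(size=(n_cases, 6)) * unique_sd

scores = regression_factor_scores(data, loadings)
print(np.round(scores[:3], 2))
```

Each row of `scores` has one value per factor, so the six original variables are replaced by two new ones that can be passed to a regression or clustering step.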