7 Stages of Factor Analysis PDF - Compressed
“Exploratory Factor Analysis (EFA) studies how different variables are related and finds hidden patterns
(called factors) that group them. It simplifies many variables into a few meaningful themes without dividing
them into causes and effects, since all variables are looked at together.”
Example:
In a smartphone preference study, we analyze variables like screen size, battery life, camera quality, and
price. When we run an Exploratory Factor Analysis (EFA) on this data, we might identify two underlying
factors: Functionality (battery life, camera quality, screen size) and Affordability (price). This shows
patterns—some customers value features (functionality), while others focus on cost (affordability). These
hidden factors help us understand what drives customer choices.
2. Q-Factor Analysis
Focuses on relationships among respondents.
Uses a correlation matrix of people.
Groups people with similar response patterns. Example: A personality survey may group people as
“introverts,” “extroverts,” etc., based on similar answers.
Q-factor analysis is used less often because of its complexity; cluster analysis is usually preferred for grouping people. Before
starting, the researcher must decide: are you analysing variables (R-factor) or respondents (Q-factor)?
Most studies use R-factor analysis.
1. Data Summarization
Data summarization is the process of simplifying complex data to identify patterns and group
similar data together. This helps in better understanding and describing the data. Variables can be examined
at two levels:
Detailed level – where each variable is considered individually.
General level – where similar variables are grouped and interpreted together.
Example:
We asked customers to rate smartphones based on:
Screen size
Battery life
Camera quality
Price
After applying Exploratory Factor Analysis (EFA), we discovered two underlying patterns:
Functionality — includes battery life, camera quality, and screen size
Affordability — includes price
This is data summarization because we are grouping 4 separate features into 2 main ideas (Functionality
and Affordability). This helps us understand the structure of customer preferences without removing any
data yet.
Factor analysis is an interdependence technique—all variables are treated equally, unlike dependence
techniques like regression or MANOVA which predict outcomes. The goal is to create a small number of
factors that still represent the original variables.
Analogy: each variable can be seen as a result of all the factors, or each factor can be seen as a summary of all
the variables. Factor analysis focuses on finding structure, not prediction.
2. Data Reduction
EFA helps reduce the number of variables while retaining their original meaning and usefulness.
This can be done in two main ways:
Selecting Key (Representative) Variables
Creating New Variables (Composite Factors)
These new variables combine related variables into one. This results in a parsimonious set—a simpler group
that still explains the data well. Theory and data both support using fewer variables if they are grouped
meaningfully.
Factor analysis supports:
Understanding how variables are grouped
Deciding whether to combine variables or pick representative ones
Example:
Now, instead of using all 4 original ratings for analysis, we create:
A Functionality score for each customer (based on their ratings of screen size, battery, and camera)
An Affordability score (based on their price rating)
This is data reduction because we’ve replaced the original variables with 2 factor scores, making our data
simpler but still meaningful for future analysis (like clustering or regression).
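As a minimal sketch of this step (with hypothetical 1–10 ratings, since no real data are given here), the two factor scores can be computed directly:

```python
import numpy as np

# Hypothetical 1-10 ratings for five customers:
# columns = screen size, battery life, camera quality, price
ratings = np.array([
    [8, 9, 7, 4],
    [6, 7, 8, 9],
    [9, 8, 9, 3],
    [5, 6, 5, 8],
    [7, 8, 6, 6],
], dtype=float)

# Functionality score: average of the three feature ratings
functionality = ratings[:, :3].mean(axis=1)
# Affordability score: the single price rating
affordability = ratings[:, 3]

reduced = np.column_stack([functionality, affordability])
print(reduced.shape)  # (5, 2): four variables reduced to two factor scores
```

The reduced two-column matrix can then feed into later analyses such as clustering or regression.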
Sample Size
The success of EFA depends heavily on having a large enough sample. You must have more observations
(people or cases) than variables (questions). While the absolute minimum is 50 responses, ideally you
should aim for:
- At least 5 to 10 responses per variable.
- For example, if you have 10 variables, you should collect data from at least 50–100 people.
- The more data you have, the more stable and reliable your factor analysis will be.
1. Common Variance
Common variance refers to the proportion of a variable’s variance that it shares with other variables
in the dataset. For example, in a psychological survey measuring anxiety, stress, and nervousness, these
three items likely have high common variance because they reflect a shared underlying condition. This
shared variance is what factor analysis tries to uncover through latent factors.
2. Unique Variance
Unique variance includes both specific and error variance. It captures aspects of the variable that
are not explained by the common factors.
Specific Variance
Specific variance represents characteristics of a variable that are not shared with others in the
analysis. Taking the same survey example, suppose a question about social anxiety is only weakly
related to other variables but still relevant—its variance is mostly specific.
Error Variance
Error variance is the randomness or noise introduced during data collection, like respondent
misunderstanding or imprecise measurement tools. For example, if some participants
misunderstood a survey question, their answers would introduce error variance, not reflecting the
true trait being measured.
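The variance partitioning above can be illustrated with a small sketch: communality (common variance) is the sum of a variable's squared factor loadings, and unique variance (specific plus error) is the remainder for a standardized variable. The loadings below are hypothetical:

```python
import numpy as np

# Hypothetical standardized loadings of three survey items on two factors
loadings = np.array([
    [0.85, 0.10],   # anxiety: mostly common variance
    [0.80, 0.15],   # stress: mostly common variance
    [0.30, 0.20],   # social anxiety: mostly unique variance
])

communality = (loadings ** 2).sum(axis=1)   # common variance per item
unique_var = 1.0 - communality              # specific + error variance
for name, h2, u2 in zip(["anxiety", "stress", "social anxiety"],
                        communality, unique_var):
    print(f"{name}: common={h2:.2f}, unique={u2:.2f}")
```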
1. A Priori Criterion
One basic approach is the a priori criterion, where the researcher decides in advance how many
factors to retain. This might be based on theoretical expectations, past studies, or the practical need for a
fixed number of categories.
4. Scree Test
The scree test offers a visual method. You graph the eigenvalues in descending order and look for
an “elbow”—the point where the curve begins to flatten. The idea is to retain all the factors before the
elbow, as these capture the most meaningful variance. For example, if you're analyzing student performance
across multiple subjects and the scree plot levels off after three factors, that suggests three underlying
abilities or themes.
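A sketch of the computation behind a scree plot, using simulated data in which two latent abilities drive six subjects (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated scores of 200 students on six subjects: the first three and
# the last three subjects each share one latent ability, plus noise
base = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.6, size=(200, 6))
scores = np.column_stack([base[:, [0]]] * 3 + [base[:, [1]]] * 3) + noise

corr = np.corrcoef(scores, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigenvalues, 2))  # the curve flattens (the "elbow") after two factors
```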
5. Parallel Analysis
Perhaps the most advanced method is parallel analysis. This method involves generating random datasets
with the same structure and then comparing the eigenvalues from your real data to those from the random
ones. You retain only the factors whose eigenvalues are larger than the averages from these random datasets.
For example, if your third factor has an eigenvalue of 1.2, but in the simulated data it averages at 1.4, you
discard it—because it’s not strong enough to stand out from random noise.
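A minimal implementation of parallel analysis along these lines (the dataset and simulation count are illustrative assumptions):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Retain factors whose eigenvalues exceed the average eigenvalues
    of random datasets with the same shape as the real data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        random_data = rng.normal(size=(n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(
            np.corrcoef(random_data, rowvar=False)))[::-1]
    threshold = sims.mean(axis=0)
    return int((real > threshold).sum())  # number of factors to retain

# Toy data: six variables driven by two latent factors, plus noise
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 2))
data = (np.column_stack([base[:, [0]]] * 3 + [base[:, [1]]] * 3)
        + rng.normal(scale=0.6, size=(200, 6)))
print(parallel_analysis(data))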
2. Factor Rotation
Rotation rearranges the factor structure to make it easier to understand. It does not change the data,
only the way the data are presented. Suppose:
Factor 1 → “Customer Service”
Factor 2 → “Product Quality”
Without rotation, a variable like “Return Policy” might load on both. After rotation: “Return Policy” loads
strongly on Factor 1 (Customer Service) only.
Factor Rotation
1. Process
The reference axes of the factors are turned about the origin until some other position has been
reached. Loadings of each variable remain fixed relative to other loadings.
2. Impact
The goal of rotating the factor matrix is to make things clearer:
It helps to separate the factors more cleanly.
You get a simple and meaningful pattern where each variable is strongly linked to one factor.
Example: Without rotation, “Fast service” might relate a little to Factor 1 and a little to Factor 2.
After rotation, it might clearly relate only to Customer Service (Factor 1), making it easier to interpret.
3. Alternative Methods
Orthogonal Rotation (e.g., VARIMAX)
Keeps the factors independent (not related to each other).
Most commonly used.
Simple structure.
Example: “Clean store” loads only on Factor 1 (Customer Experience), and doesn’t affect
Factor 2 at all.
Oblique Rotation
Allows correlation between factors (real-life factors are often related).
More flexible but more complex.
Example: “Product Quality” might relate to both Value for Money and Product Features —
oblique rotation allows you to show this relationship.
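The VARIMAX idea can be sketched with the standard SVD-based iteration; the loading matrix below is hypothetical:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal (VARIMAX) rotation of a factor loading matrix,
    using the standard SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Hypothetical unrotated loadings: every variable mixes both factors
L = np.array([[0.7, 0.5], [0.6, 0.6], [0.5, -0.6], [0.6, -0.5]])
print(np.round(varimax(L), 2))  # each row now dominated by one factor
```

After rotation each variable loads strongly on only one factor, which is exactly the "simple structure" goal described below.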
2. Factor Structure
A smaller loading is needed given either a larger sample size or a larger number of variables.
A larger loading is needed given a factor solution with a larger number of factors.
3. Statistical Significance
Check the required sample size for a given loading using the following table.
1. Simple Structure
The ideal result of a factor matrix is a simple structure, which means:
Each variable loads strongly on only one factor (not multiple).
Each factor only has strong loadings for a few variables (not all).
This makes it easier to name the factors and understand the results.
2. Five step process
Examine the factor matrix of loadings.
Identify the significant loading(s) for each variable.
Assess the communalities of the variables.
Respecify the factor model if needed.
Label the factors.
Ratio Assessment Interpretation
1.5 to 2.0 Potential cross-loading One factor is a bit stronger → maybe keep it if the interpretation makes sense
More than 2.0 Safe The stronger loading is clearly dominant → you can ignore the weaker one
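Following the ranges above, a simple check might look like the sketch below. The label for ratios below 1.5 is an assumption, since that row of the table is not shown here:

```python
def cross_loading_check(loading_a, loading_b):
    """Classify a cross-loading variable by the ratio of its squared
    loadings (larger over smaller), using the ranges in the table above."""
    hi, lo = sorted([loading_a ** 2, loading_b ** 2], reverse=True)
    ratio = hi / lo
    if ratio > 2.0:
        return "safe"                      # stronger loading clearly dominates
    if ratio >= 1.5:
        return "potential cross-loading"   # keep if interpretation makes sense
    return "problematic cross-loading"     # assumed label; not in the excerpt

print(cross_loading_check(0.70, 0.40))  # ratio of squared loadings ≈ 3.06
```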
Let’s take an example.
Each variable represents a customer opinion, and we see how strongly it connects to different store-related
factors.
Variable What it means
Var 1 – Friendly staff Customers rating staff behavior
Var 2 – Return policy Opinions on store return process
Var 3 – Checkout speed How fast the billing process feels
Var 4 – Product availability Whether items are always in stock
If a variable does not show a clean loading, there are several options:
Option Meaning
Ignore the variable Let it stay if it doesn’t hurt much
Delete the variable Remove it from your analysis
Try a different rotation Use another rotation method like Promax or Varimax
Use a different number of factors Extract more or fewer factors
Conceptual definition is the foundation for creating a summated scale. It clearly defines the concept
being measured in theoretical or practical terms relevant to the research context—such as satisfaction,
value, or image. This definition guides the selection of variables included in the scale. Content validity
(also called face validity) evaluates how well the selected variables align with the conceptual definition. It
is typically assessed through expert judgment, pretests, or reviews to ensure the items reflect both
theoretical and practical aspects of the concept—not just statistical criteria.
Dimensionality refers to the requirement that items in a summated scale must be unidimensional,
meaning they all represent a single concept and are strongly related. This ensures that the scale measures
one underlying factor. Factor analysis (exploratory or confirmatory) is used to assess dimensionality by
checking if all items load highly on a single factor. If multiple dimensions are involved, each should form
a separate factor and be measured by its own set of items.
Reliability measures the consistency of a variable across multiple observations. It ensures that results
are stable over time and that scale items are consistently measuring the same construct. There are two main
types:
Test-retest reliability: Checks stability over time by comparing responses from the same
individual at two points.
Internal consistency: Assesses how well items in a summated scale correlate with each other,
indicating they measure the same concept.
Key diagnostic tools for internal consistency include:
Item-to-total correlation (should exceed 0.50)
Inter-item correlation (should exceed 0.30)
Cronbach’s alpha (should be ≥ 0.70, or ≥ 0.60 for exploratory research)
Other advanced measures include composite reliability and average variance extracted from
confirmatory factor analysis. Reliability must be evaluated before testing the validity of any summated
scale.
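The most commonly reported internal-consistency diagnostic, Cronbach’s alpha, can be computed directly from an observations-by-items score matrix; the ratings below are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (observations x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 1-5 ratings on three items meant to measure one construct
scores = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
])
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # compare against the 0.70 threshold
```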
Validity is the extent to which a summated scale accurately represents the intended concept. After
confirming the scale's conceptual definition, uni-dimensionality, and reliability, validity must be assessed.
Besides content (face) validity, three major forms of empirical validity are:
Convergent validity: Measures the correlation between the scale and other measures of the
same concept; high correlation confirms accuracy.
Discriminant validity: Assesses whether the scale is distinct from other, conceptually similar
measures; low correlation indicates good distinction.
Nomological validity: Evaluates whether the scale behaves as expected within a theoretical
model by predicting relationships supported by prior research.
Various methods like MTMM matrices and structural equation modeling can be used to test these forms
of validity.
Calculating a summated scale involves either summing or averaging the items that load highly from factor
analysis. The most common method is averaging the items, which simplifies calculations for further
analysis.
When items within a factor have both positive and negative loadings, reverse scoring is necessary to ensure
all loadings are positive. Negative-loading items are reversed by subtracting their values from a maximum
score (e.g., 10). This process ensures that the scale correctly distinguishes between higher and lower scores,
preventing cancellation of differences between variables with opposing loadings.
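A minimal sketch of reverse scoring and summation, assuming a 0–10 rating scale and hypothetical data:

```python
import numpy as np

# Hypothetical 0-10 ratings; the third item loaded negatively on the factor
ratings = np.array([
    [8, 7, 2],
    [3, 4, 9],
    [9, 8, 1],
], dtype=float)

# Reverse-score the negative-loading item by subtracting from the
# maximum score (10), so all items point in the same direction
reversed_item = 10 - ratings[:, 2]

# Summated scale: average the consistently scored items per respondent
summated = np.column_stack(
    [ratings[:, 0], ratings[:, 1], reversed_item]).mean(axis=1)
print(summated)
```

Averaging (rather than summing) keeps the scale on the same 0–10 metric as the original items, which simplifies later analysis.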