7 Stages of Factor Analysis

What is EFA?

“Exploratory Factor Analysis (EFA) studies how different variables are related and finds hidden patterns
(called factors) that group them. It simplifies many variables into a few meaningful themes without dividing
them into causes and effects, since all variables are looked at together.”
Example:
In a smartphone preference study, we analyze variables like screen size, battery life, camera quality, and
price. When we run an Exploratory Factor Analysis (EFA) on this data, we might identify two underlying
factors: Functionality (battery life, camera quality, screen size) and Affordability (price). This shows
patterns—some customers value features (functionality), while others focus on cost (affordability). These
hidden factors help us understand what drives customer choices.

Exploratory Factor Analysis Decision Process

Stage 1: Objectives of Factor Analysis


Before starting factor analysis, we first need to clearly understand the research problem—what exactly we
are trying to study or find out. It focuses on:
• Is the analysis exploratory or confirmatory?
• If it is confirmatory, we typically move on to structural equation modeling; if it is exploratory, we identify the unit of analysis and then decide which analysis to use: (1) R-factor analysis or (2) Q-factor/cluster analysis.
• Selecting the objective: do we need a data summarization or a data reduction strategy?
This is the starting point for any statistical method. Factor analysis is a method used to simplify complex
data. Imagine you have a lot of variables (questions, traits, or measurements), and you want to reduce them
into a smaller number of groups (called factors) without losing important information. The goal is to find
hidden patterns or common themes in the data that explain the relationships among the original variables.
Main goals of factor analysis include:
 Specifying the Unit of Analysis (Choosing what you are analysing)
 Achieving Data Summarization Versus Data Reduction
 Variable Selection (Selecting the right variables to include)
 Using Factor Analysis with other Multivariate Techniques

Specifying the Unit of Analysis


Factor analysis can find patterns among variables or among respondents (people). It works by looking at
how things are related. There are two ways to use factor analysis:

1. R-Factor Analysis (Most Common)


 Understands structure among variables.
 Uses a correlation matrix of variables.
 Finds hidden (latent) dimensions that explain relationships among variables. Example: A survey
with 10 questions on customer satisfaction may reveal 3 key areas: "service quality", "product
satisfaction", and "price fairness".

2. Q-Factor Analysis
 Focuses on relationships among respondents.
 Uses a correlation matrix of people.
 Groups people with similar response patterns. Example: A personality survey may group people as
“introverts,” “extroverts,” etc., based on similar answers.
Q-Factor analysis is less used due to its complexity; cluster analysis is usually preferred for grouping people. Before
starting, the researcher must decide: are you analysing variables (R-Factor) or respondents (Q-Factor)?
Most studies use R-factor analysis.

Achieving Data Summarization Versus Data Reduction


Factor analysis gives two important results:

1. Data Summarization
Data summarization is the process of simplifying complex data to identify patterns and group
similar data together. This helps in better understanding and describing the data. Variables can be examined
at two levels:
 Detailed level – where each variable is considered individually.
 General level – where similar variables are grouped and interpreted together.
Example:
We asked customers to rate smartphones based on:
 Screen size
 Battery life
 Camera quality
 Price
After applying Exploratory Factor Analysis (EFA), we discovered two underlying patterns:
 Functionality — includes battery life, camera quality, and screen size
 Affordability — includes price
This is data summarization because we are grouping 4 separate features into 2 main ideas (Functionality
and Affordability). This helps us understand the structure of customer preferences without removing any
data yet.
Factor analysis is an interdependence technique—all variables are treated equally, unlike dependence
techniques like regression or MANOVA which predict outcomes. The goal is to create a small number of
factors that still represent the original variables.
Analogy: Each variable can be seen as a result of all the factors OR each factor can be a summary of all
variables. Factor analysis focuses on finding structure, not prediction.

2. Data Reduction
EFA helps reduce the number of variables while retaining their original meaning and usefulness.
This can be done in two main ways:
 Selecting Key (Representative) Variables
 Creating New Variables (Composite Factors)
These new variables combine related variables into one. This results in a parsimonious set—a simpler group
that still explains the data well. Theory and data both support using fewer variables if they are grouped
meaningfully.
Factor analysis supports:
 Understanding how variables are grouped
 Deciding whether to combine variables or pick representative ones
Example:
Now, instead of using all 4 original ratings for analysis, we create:
 A Functionality score for each customer (based on their ratings of screen size, battery, and camera)
 An Affordability score (based on their price rating)
This is data reduction because we’ve replaced the original variables with 2 factor scores, making our data
simpler but still meaningful for future analysis (like clustering or regression).
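
To make this concrete, here is a minimal pandas sketch of that data-reduction step. The column names and ratings are hypothetical, and the composites are built as simple averages rather than formal factor scores.

import pandas as pd

# Hypothetical customer ratings (1-5 scale) for the four smartphone attributes.
ratings = pd.DataFrame({
    "screen_size":    [4, 3, 5, 2],
    "battery_life":   [5, 3, 4, 2],
    "camera_quality": [4, 2, 5, 3],
    "price":          [2, 5, 1, 4],
})

# Data reduction: replace the four original ratings with two composite scores.
ratings["functionality"] = ratings[["screen_size", "battery_life", "camera_quality"]].mean(axis=1)
ratings["affordability"] = ratings["price"]          # single-item "factor"

print(ratings[["functionality", "affordability"]])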

Variable Selection (Choosing the Right Variables for Factor Analysis)


Whether you're summarizing or reducing data, choosing the right variables is key. Always think about the
meaning behind each variable:
 What does this variable represent?
 Is it relevant to the themes I want to explore?
Example: If you want to find out what shapes a store’s image but don’t ask about store staff, the factor
“store personnel” won’t appear.
Garbage in, garbage out: Factor analysis will always give some output. But if the input variables are random
or unrelated, the results may not make sense.
Why Good Variable Selection Matters:
 The quality of the results depends on the variables selected. Good input gives meaningful results.
 Even for data reduction, factor analysis works best with conceptually strong variables.
Using Factor Analysis with Other Multivariate Techniques
Factor analysis helps uncover hidden patterns in data by identifying how variables are grouped based on
their correlations. This makes it a valuable first step before applying other multivariate techniques like
regression, MANOVA, or discriminant analysis. It assists in selecting the most meaningful variables by
showing which ones are related and form common factors. This avoids redundancy in models where similar
variables may not all be included. Factor analysis also reduces data by creating new variables (e.g., factor
scores or representative variables), making complex datasets easier to handle. These new variables retain
the original meaning but improve efficiency, clarity, and accuracy in further analysis.
Stage 2: Designing an Exploratory Factor Analysis (EFA)
Exploratory Factor Analysis (EFA) is a technique used to uncover hidden structures or patterns among a set
of observed variables (questions or items). In Stage 2, we design how the EFA will be conducted, focusing
on selecting the right variables, ensuring a good sample size, and analyzing the relationships either among
variables or respondents.

Variable Selection and Measurement Issues


Factor analysis is most commonly done using metric variables. These are numerical values that have
meaningful quantitative relationships—such as age, income, or ratings on a scale (e.g., 1 to 5).
Dummy variables (like Yes = 1, No = 0) can be used in small amounts, but EFA works best when most
variables are metric. If your study aims to uncover the underlying structure of a set of factors, try to include
at least five variables (questions) for each expected factor.

Sample Size
The success of EFA depends heavily on having a large enough sample. You must have more observations
(people or cases) than variables (questions). While the absolute minimum is 50 responses, ideally you
should aim for:
- At least 5 to 10 responses per variable.
- For example, if you have 10 variables, you should collect data from at least 50–100 people.
- The more data you have, the more stable and reliable your factor analysis will be.

Correlation Among Variables or Respondents


This stage involves analyzing whether the data is suitable for factor analysis, based on the relationships
either among the variables (R analysis) or among the respondents (Q analysis).

1. R Analysis (Correlation Matrix)


This is the most common form of analysis. A correlation matrix is created to see how closely each
variable is related to the others. This helps identify which variables group together to form factors.
Use R analysis when your goal is to group related variables and uncover patterns in questions or
measurements.

2. Q Analysis (Similarity/Distance Matrix)


Q analysis focuses on the similarities between respondents, not the variables. It uses a similarity or
distance matrix to identify how closely different individuals respond. This is useful when you want to
segment respondents into groups with similar answer patterns.
Use Q analysis when you're more interested in identifying respondent types or profiles rather than
relationships among variables.
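
As an illustration of the two perspectives, the sketch below builds both matrices from the same hypothetical survey data in Python (using pandas): the R matrix correlates variables, while the Q matrix correlates respondents by transposing the data first.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical survey: 100 respondents answering 6 rating questions (1-5 scale).
data = pd.DataFrame(rng.integers(1, 6, size=(100, 6)),
                    columns=[f"q{i}" for i in range(1, 7)])

# R analysis: correlation matrix among the variables (6 x 6).
r_matrix = data.corr()

# Q analysis: correlation matrix among the respondents (100 x 100),
# obtained by correlating the transposed data.
q_matrix = data.T.corr()

print(r_matrix.shape, q_matrix.shape)   # (6, 6) (100, 100)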
Stage 3: Assumptions in Exploratory Factor Analysis
Exploratory Factor Analysis (EFA) is used to uncover the underlying structure of a relatively large set of
variables. However, its effective application relies on several key assumptions and diagnostics to ensure
that the data are suitable for factor extraction.

Bartlett’s Test of Sphericity


This test assesses whether the correlation matrix significantly differs from an identity matrix (where all
correlations are zero).
 Assumption: The variables should be sufficiently correlated.
 Criteria: A significant p-value (p < .05) suggests that factor analysis is appropriate.
 Note: Larger sample sizes improve the test’s sensitivity.

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy


The KMO statistic evaluates the proportion of variance among variables that might be common variance
(i.e., potentially explained by underlying factors).
Assumption: Sampling adequacy is required to proceed with EFA.
Acceptable Values:
 ≥ 0.80 – Meritorious
 ≥ 0.70 – Middling
 ≥ 0.60 – Mediocre
 ≥ 0.50 – Miserable
 < 0.50 – Unacceptable
Individual Variable KMO Values:
 Variables with KMO < 0.50 should be considered for removal.
 Delete variables one by one, starting with the lowest KMO, until all remaining variables meet
the threshold.
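
In practice, both diagnostics can be obtained in one step. The sketch below assumes the third-party Python package factor_analyzer and uses simulated survey data; the thresholds applied in the print-outs follow the guidelines above.

import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

rng = np.random.default_rng(1)
# Simulated data: 200 respondents, 8 items driven by two latent dimensions plus noise.
latent = rng.normal(size=(200, 2))
data = pd.DataFrame(latent @ rng.normal(size=(2, 8)) + rng.normal(scale=0.5, size=(200, 8)),
                    columns=[f"item{i}" for i in range(1, 9)])

# Bartlett's test of sphericity: p < .05 means the correlation matrix differs
# from an identity matrix, so factoring is appropriate.
chi_square, p_value = calculate_bartlett_sphericity(data)
print(f"Bartlett chi-square = {chi_square:.1f}, p = {p_value:.4f}")

# KMO: overall value should be at least .50; items with KMO < .50 are removal candidates.
kmo_per_item, kmo_overall = calculate_kmo(data)
print(f"Overall KMO = {kmo_overall:.2f}")
print("Items with KMO < .50:", list(data.columns[kmo_per_item < 0.50]))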

Partial Correlations & Anti-Image Matrix


 Assumption: Variables should share common variance, not partial correlations.
 High partial correlations (> 0.7) indicate that variables may not belong to the same underlying
factor, suggesting EFA is inappropriate.
 Exception: If two variables load highly on the same factor, their high partial correlation can be
tolerated.
 The anti-image matrix is commonly used in SPSS/SAS to observe the negative partial
correlations.
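
For readers working outside SPSS/SAS, partial correlations (the basis of the anti-image matrix) can be derived from the inverse of the correlation matrix. The following numpy sketch is a simplified illustration with a hypothetical 3 x 3 correlation matrix.

import numpy as np

def partial_correlations(R):
    # Partial correlation of each pair of variables, controlling for all others,
    # obtained from the inverse of the correlation matrix R.
    P = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(P), np.diag(P)))
    partial = -P / d
    np.fill_diagonal(partial, 1.0)
    return partial

# Hypothetical 3-variable correlation matrix.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# Off-diagonal values above roughly 0.7 would suggest EFA is inappropriate.
print(np.round(partial_correlations(R), 2))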

Correlation Matrix Inspection


 Assumption: Variables must be interrelated but not too highly.
 Guidelines:
o At least some correlations should exceed 0.30.
o If most values are low or uniform, the data may not support factor extraction.
Multicollinearity
 Assumption: Moderate multicollinearity is desirable—it indicates shared variance suitable for
uncovering latent factors.
 Caution: Excessive multicollinearity (very high correlations) may distort factor loadings.

Normality, Linearity, and Homoscedasticity


 Normality: While not a strict requirement, it is beneficial if statistical tests (e.g., significance of
loadings) are applied.
 Linearity: The relationship between variables should be linear, as correlation-based methods like
EFA rely on this property.
 Homoscedasticity: Equal variance across variables helps maintain the validity of correlations.
Stage 4: Deriving Factors and Assessing Overall Fit
Selecting the Factor Extraction Method
In factor analysis, selecting the correct factor extraction method is foundational to the integrity of your
results. This choice essentially sets the stage for how you interpret the relationships among the variables.
Factor extraction methods are statistical procedures used to identify latent factors—underlying variables
that cannot be measured directly but explain observed patterns in the data. Depending on your research
objectives, you may choose to extract factors that account for all variance or only shared variance. The
selection should align with whether your primary goal is to simplify data for predictive modeling or to
explore underlying theoretical constructs.

Partitioning the Variance


At the heart of factor analysis lies the concept of variance—how much a variable deviates from its mean.
The total variance of each observed variable is divided into several components: common variance,
specific variance, and error variance. Each component has a distinct role in explaining the relationship
between variables.

1. Common Variance
Common variance refers to the proportion of a variable’s variance that it shares with other variables
in the dataset. For example, in a psychological survey measuring anxiety, stress, and nervousness, these
three items likely have high common variance because they reflect a shared underlying condition. This
shared variance is what factor analysis tries to uncover through latent factors.

2. Unique Variance
Unique variance includes both specific and error variance. It captures aspects of the variable that
are not explained by the common factors.
 Specific Variance
Specific variance represents characteristics of a variable that are not shared with others in the
analysis. Taking the same survey example, suppose a question about social anxiety is only weakly
related to other variables but still relevant—its variance is mostly specific.
 Error Variance
Error variance is the randomness or noise introduced during data collection, like respondent
misunderstanding or imprecise measurement tools. For example, if some participants
misunderstood a survey question, their answers would introduce error variance, not reflecting the
true trait being measured.

How Extraction Method Affects the Variance Included


The method you choose for extracting factors determines how these types of variances are handled.

1. Principal Component Analysis (PCA)


PCA includes all the variance—common, specific, and error—when generating components. This
means PCA focuses on summarizing the data, not necessarily uncovering underlying constructs. It's
like compressing files into a zip archive to make them lighter while keeping all the content inside.

2. Common Factor Analysis (CFA)


CFA deliberately ignores specific and error variance, zooming in only on common variance. It’s
better suited for theoretical work where you're trying to model deeper patterns and latent traits that
explain the relationships among observed variables. Think of CFA as a sculptor chiseling away the
noise to reveal the hidden shape within a block of stone.
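
The difference can be seen directly in software. The sketch below uses scikit-learn on simulated data: PCA components absorb total variance, while sklearn's FactorAnalysis (one implementation of the common factor model) estimates a separate unique/noise variance for each variable instead of folding it into the factors.

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(2)
# Simulated data: 300 cases, 6 variables driven by 2 latent factors plus noise.
scores = rng.normal(size=(300, 2))
X = scores @ rng.normal(size=(2, 6)) + rng.normal(scale=0.7, size=(300, 6))

# PCA: components summarize total variance (common + specific + error).
pca = PCA(n_components=2).fit(X)
print("PCA variance explained:", pca.explained_variance_ratio_.round(2))

# Common factor analysis: models shared variance and estimates a separate
# unique (noise) variance for each observed variable.
fa = FactorAnalysis(n_components=2).fit(X)
print("Unique (noise) variance per variable:", fa.noise_variance_.round(2))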

When to Use PCA vs CFA


The choice between PCA and CFA depends on your research priorities.

1. Use PCA When You Need Data Reduction


PCA is highly effective when the goal is simply to reduce the number of variables while retaining
as much information as possible. For instance, in market research, if you're working with 20 consumer
behavior variables and want to reduce them to 3–5 main dimensions for segmentation, PCA is ideal. It
retains maximum variance regardless of its source.

2. Use CFA When Exploring Underlying Constructs


On the other hand, CFA is the method of choice when you're aiming to understand theoretical
constructs or develop psychological scales. For example, if you’re designing a questionnaire to measure
self-esteem, you would use CFA to ensure that the questions (variables) truly reflect one or more underlying
traits, not just any shared patterns due to coincidence or noise.

Stopping Criteria: Deciding the Number of Factors to Retain


Once factors have been extracted, the next big question is: how many should you keep? Stopping criteria
help researchers decide when to stop extracting more factors, ensuring a balance between model simplicity
and explanatory power.

1. A Priori Criterion
One basic approach is the a priori criterion, where the researcher decides in advance how many
factors to retain. This might be based on theoretical expectations, past studies, or the practical need for a
fixed number of categories.

2. Latent Root Criterion (Kaiser Rule)


A more statistical method is the latent root criterion, often called the Kaiser rule. Here, only
factors with eigenvalues greater than 1 are retained. Eigenvalues represent the amount of variance explained
by each factor, so a factor with an eigenvalue less than 1 explains less variance than an individual variable
would by itself—which defeats the purpose.

3. Percentage of Variance Explained


Another approach involves assessing the percentage of variance explained. Researchers might
decide to retain enough factors to explain at least 60–70% of the total variance. This ensures that the model
provides a meaningful summary of the original data without being too complex.

4. Scree Test
The scree test offers a visual method. You graph the eigenvalues in descending order and look for
an “elbow”—the point where the curve begins to flatten. The idea is to retain all the factors before the
elbow, as these capture the most meaningful variance. For example, if you're analyzing student performance
across multiple subjects and the scree plot levels off after three factors, that suggests three underlying
abilities or themes.
5. Parallel Analysis
Perhaps the most advanced method is parallel analysis. This method involves generating random datasets
with the same structure and then comparing the eigenvalues from your real data to those from the random
ones. You retain only the factors whose eigenvalues are larger than the averages from these random datasets.
For example, if your third factor has an eigenvalue of 1.2, but in the simulated data it averages at 1.4, you
discard it—because it’s not strong enough to stand out from random noise.
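
A rough numpy sketch of three of these stopping rules is shown below, using simulated data: the sorted eigenvalues are the values a scree plot would display, the Kaiser rule counts eigenvalues above 1, and a simple parallel analysis compares them against eigenvalues from random data of the same size.

import numpy as np

rng = np.random.default_rng(3)
n_obs, n_vars = 250, 8
# Simulated data with a few underlying dimensions.
X = rng.normal(size=(n_obs, 3)) @ rng.normal(size=(3, n_vars)) + rng.normal(size=(n_obs, n_vars))

# Eigenvalues of the correlation matrix, largest first (the values a scree plot shows).
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
print("Eigenvalues:", eigvals.round(2))

# Latent root (Kaiser) rule: retain factors with eigenvalues greater than 1.
print("Kaiser rule retains:", int((eigvals > 1).sum()), "factors")

# Parallel analysis: compare against average eigenvalues from random data of the same size.
n_sims = 100
random_eigs = np.empty((n_sims, n_vars))
for s in range(n_sims):
    R = np.corrcoef(rng.normal(size=(n_obs, n_vars)), rowvar=False)
    random_eigs[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
benchmark = random_eigs.mean(axis=0)

retain = 0
for real, rand in zip(eigvals, benchmark):
    if real > rand:
        retain += 1          # keep counting while real eigenvalues beat the random benchmark
    else:
        break
print("Parallel analysis retains:", retain, "factors")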

Combining Stopping Criteria: The Practical Approach


In real-world research, it's rarely wise to rely on just one stopping rule. For best results, researchers typically
use a combination of criteria. For instance, you might look at both the eigenvalues and the scree plot, then
run a parallel analysis to confirm your decision. It’s also useful to explore alternative solutions—say, one
with one more factor and one with one less—and compare how well each model fits your data.
Let’s say your data yields five factors with eigenvalues above 1.0, but the scree plot shows an elbow at the
third factor. Meanwhile, the parallel analysis supports retaining only the first three. You’d likely go with
three or four factors, unless theory strongly suggests otherwise.
In studies with diverse subgroups—like a national survey covering different regions—you might even retain
more factors to reflect the underlying heterogeneity.

Alternatives to PCA and CFA


In some cases, traditional PCA or CFA might not be suitable—especially when dealing with non-continuous
data.

1. Optimal Scaling for Categorical Data


For categorical variables, optimal scaling can be used to transform ordinal or nominal data into
interval-level data. This lets you perform factor analysis even on surveys with “yes/no” or ranked responses.
However, these transformations are dataset-specific and may not be reliable across other analyses.

2. Variable Clustering as an Alternative


Another option is variable clustering, particularly using procedures like PROC VARCLUS in
SAS. This approach treats variables more hierarchically, creating clusters that resemble components. It's
especially helpful when dealing with large variable sets where PCA might become unstable or when
identifying representative items within clusters is a priority.
Stage 5: Interpreting the Factors
The Three Processes of Factor Interpretation
1. Estimate the Factor Matrix
 Factor Matrix: A factor matrix is a table that shows how strongly each variable is related to each
factor.
 Calculating how much each variable is linked to each factor. These links are called factor loadings.
A factor loading is a number between -1 and +1:
o A value close to +1 or -1 = strong relationship
o A value close to 0 = weak or no relationship

2. Factor Rotation
Rotation rearranges the factor structure to make it easier to understand. It doesn't change the data,
only the way it is presented. Suppose:
 Factor 1 → “Customer Service”
 Factor 2 → “Product Quality”
Without rotation, a variable like “Return Policy” might load on both. After rotation: “Return Policy” loads
strongly on Factor 1 (Customer Service) only.

3. Factor Interpretation and Respecification


• Evaluate the (rotated) factor loadings for each variable in order to determine that variable's role and contribution to the factor structure.
• Respecification involves eliminating variables or extracting a different number of factors.

Factor Rotation
1. Process
The reference axes of the factors are turned about the origin until some other position is reached. The
variables' positions relative to one another stay fixed; rotating the axes only redistributes the loadings.

2. Impact
The goal of rotating the factor matrix is to make things clearer:
 It helps to separate the factors more cleanly.
 You get a simple and meaningful pattern where each variable is strongly linked to one factor.
Example: Without rotation, “Fast service” might relate a little to Factor 1 and a little to Factor 2.
After rotation, it might clearly relate only to Customer Service (Factor 1), making it easier to interpret.

3. Alternative Methods
Orthogonal Rotation (e.g., VARIMAX)
 Keeps the factors independent (not related to each other).
 Most commonly used.
 Simple structure.
 Example: “Clean store” loads only on Factor 1 (Customer Experience), and doesn’t affect
Factor 2 at all.
Oblique Rotation
 Allows correlation between factors (real-life factors are often related).
 More flexible but more complex.
 Example: “Product Quality” might relate to both Value for Money and Product Features —
oblique rotation allows you to show this relationship.

4. Oblique Rotation vs Orthogonal Rotation

• Oblique rotation: allows the factors to correlate; more flexible but more complex; suits real-life factors that influence one another.
• Orthogonal rotation: keeps the factors independent (uncorrelated); gives a simpler structure; the most commonly used approach, especially for data reduction.

5. Choosing Factor Rotation Methods
Orthogonal rotation methods
 Most widely used rotational methods.
 Preferred method when the research goal is data reduction to either a smaller number of variables
or a set of uncorrelated measures for subsequent use in other multivariate techniques.
 Suppose you are analyzing customer reviews and get:
o Factor 1 = “Customer Service”
o Factor 2 = “Product Price”
In orthogonal rotation, these two factors are assumed to be completely separate — meaning
if someone likes the service, it has no connection to how they feel about price.
Oblique rotation methods
 This method allows the factors to be related to each other.
 Useful when your factors influence one another (which is common in real life).
 Let’s say you are studying student behavior and find:
o Factor 1 = “Study Habits”
o Factor 2 = “Academic Performance”
Here, better study habits likely lead to better academic performance — so the two factors are correlated.
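
Assuming the Python factor_analyzer package and simulated data with two correlated factors, the sketch below runs the same extraction with varimax (orthogonal) and promax (oblique) rotation so the two loading patterns can be compared side by side.

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(4)
# Simulated data: 300 respondents, 6 items driven by two correlated factors.
factors = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=300)
pattern = np.array([[0.8, 0.7, 0.9, 0.0, 0.0, 0.1],
                    [0.1, 0.0, 0.0, 0.8, 0.7, 0.9]])
X = pd.DataFrame(factors @ pattern + rng.normal(scale=0.5, size=(300, 6)),
                 columns=[f"item{i}" for i in range(1, 7)])

# Orthogonal rotation: the factors are kept uncorrelated.
fa_varimax = FactorAnalyzer(n_factors=2, rotation="varimax")
fa_varimax.fit(X)
print("Varimax loadings:\n", np.round(fa_varimax.loadings_, 2))

# Oblique rotation: the factors are allowed to correlate, which is often more realistic.
fa_promax = FactorAnalyzer(n_factors=2, rotation="promax")
fa_promax.fit(X)
print("Promax loadings:\n", np.round(fa_promax.loadings_, 2))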
Judging the Significance of Factor Loadings
1. Practical Significance
Loading Value Meaning

Less than ±0.10 Useless / same as zero

±0.30 to ±0.40 Minimum acceptable level

±0.50 or more Strong enough to be useful

±0.70 or more Very strong / clearly defined factor

2. Factor Structure
o A smaller loading is needed given either a larger sample size or a larger number of
variables.
o A larger loading is needed given a factor solution with a larger number of factors.

3. Statistical Significance
Whether a loading is statistically significant depends on sample size: larger samples allow smaller loadings to be treated as significant (the guideline table of required sample sizes is not reproduced here).

Interpreting a Factor Matrix


When you're using EFA not just to reduce variables, but to understand patterns or label the factors,
you need to interpret the matrix.

1. Simple Structure
The ideal result of a factor matrix is a simple structure, which means:
 Each variable loads strongly on only one factor (not multiple).
 Each factor only has strong loadings for a few variables (not all).
This makes it easier to name the factors and understand the results.
2. Five step process
 Examine the factor matrix of loadings.
 Identify the significant loading(s) for each variable.
 Assess the communalities of the variables.
 Respecify the factor model if needed.
 Label the factors.

Examine the factor matrix of loadings


1. Starting Point
o For each variable, look for the highest or most meaningful loading.
o Identify if the variable loads strongly on just one factor or multiple factors.
o Cross-loading happens when a variable has a strong loading on more than one factor.
This creates confusion about which factor the variable really belongs to.
 Let’s say the variable “Product Availability” loads:
 0.60 on Factor 1 (Customer Service)
 0.58 on Factor 2 (Logistics)
 This is a cross-loading — and you may need to drop or reassess that variable.

2. Two Principles to Assist in Identifying Cross-loadings


o Compare Variances, Not Loadings
Instead of comparing loadings (e.g., 0.55 vs 0.45), square them first to get variance:
0.55² = 0.3025
0.45² = 0.2025
Variance tells us how much of a variable is explained by a factor — a more reliable measure.
o Compare Ratios of Variances
Relative magnitude expressed as the ratio of two variances (larger variance / smaller variance)
which “scales” the variance difference to the size of the two variances.

Process For Identifying Cross-Loadings


Identify potential cross-loadings: Check if a variable has two or more significant loadings
(typically ≥ 0.30 or 0.40) on different factors.
Compute the ratio of the squared loadings: Square each loading and then compute the ratio of the
larger squared loading to the smaller one.
Designate the pair of loadings as follows based on the ratio:
Ratio           Classification              What You Should Do
1.0 to 1.5      Problematic cross-loading   The two loadings are too close → the variable doesn’t clearly belong to one factor → remove or fix it
1.5 to 2.0      Potential cross-loading     One factor is a bit stronger → maybe keep the variable if the interpretation makes sense
More than 2.0   Ignorable (safe)            The stronger loading is clearly dominant → you can ignore the weaker one
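
The same decision rule is easy to script. The short Python sketch below squares a pair of loadings, forms the variance ratio, and applies the cutoffs from the table (the example loadings are the ones used earlier in this section).

def classify_cross_loading(loading_a, loading_b):
    # Classify a pair of significant loadings using the ratio of their squared values (variances).
    var_a, var_b = loading_a ** 2, loading_b ** 2
    ratio = max(var_a, var_b) / min(var_a, var_b)
    if ratio < 1.5:
        label = "problematic cross-loading"   # too close: consider removing or fixing the variable
    elif ratio < 2.0:
        label = "potential cross-loading"     # judgment call: keep if the interpretation makes sense
    else:
        label = "ignorable"                   # the stronger loading clearly dominates
    return round(ratio, 2), label

print(classify_cross_loading(0.55, 0.45))   # variances 0.3025 vs 0.2025, ratio ~1.49
print(classify_cross_loading(0.60, 0.58))   # very close loadings -> problematic
print(classify_cross_loading(0.70, 0.35))   # ratio 4.0 -> the weaker loading can be ignored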
Let’s Take an example

Each variable represents a customer opinion, and we see how strongly it connects to different store-related
factors.
Variable What it means
Var 1 – Friendly staff Customers rating staff behavior
Var 2 – Return policy Opinions on store return process
Var 3 – Checkout speed How fast the billing process feels
Var 4 – Product availability Whether items are always in stock

This table shows how these variables load on three factors:


 Customer Service
 Policy & Management
 Store Operations
The "Ratio" column helps judge whether a variable is clearly connected to one factor or confused
across multiple. The "Classification" tells you what to do with it:
 Ignorable → Safe to keep
 Potential → Use your judgment
 Problematic → Consider removing

Assess the Communalities of the Variables


What is Communality?
Communality is how much of a variable's variance is explained by all the factors combined. It is calculated
by adding the squared loadings of a variable across all the factors it loads on.
Use as an Evaluative Measure
You can use communalities to decide if a variable is useful or not.
• Look at each variable's communality (the share of its variance explained by the factors).
• If the factors explain less than 50% (0.50) of a variable's variance, it is probably not a good fit.
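
A quick numpy sketch of this check, using a hypothetical rotated loading matrix: communalities are the row sums of squared loadings, and anything below 0.50 is flagged for possible respecification.

import numpy as np

# Hypothetical rotated loading matrix: 4 variables x 2 factors.
loadings = np.array([
    [0.75, 0.10],   # friendly staff
    [0.68, 0.22],   # return policy
    [0.15, 0.80],   # checkout speed
    [0.30, 0.35],   # product availability
])

# Communality = sum of squared loadings across all factors for each variable.
communalities = (loadings ** 2).sum(axis=1)
print(np.round(communalities, 2))

# Variables whose factors explain less than 50% of their variance are
# candidates for respecification.
print("Low-communality variables (row index):", np.where(communalities < 0.50)[0])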

Respecify the Factor Model if Needed


What does “Respecify the Model” mean?
Make changes to your factor model if it isn’t working properly.
When should you consider respecifying?
Three situations:
 A variable has no strong loading (nothing above 0.40 or 0.50).
→ It’s not clearly connected to any factor.
 A variable has a strong loading, but communality is too low (less than 0.50).
→ It’s not well explained by the factors.
 A variable has cross-loading (loads on more than one factor).
→ It creates confusion in interpretation.
What can you do if Respecification is needed?

You have many options:

Option Meaning
Ignore the variable Let it stay if it doesn’t hurt much
Delete the variable Remove it from your analysis
Try a different rotation Use another rotation method like Promax or Varimax
Use a different number of factors Extract more or fewer factors

Label the Factors


Give each factor a meaningful name (label) based on what the grouped variables represent.
Variables with higher loadings are more important when deciding what to name the factor.
Considerations
 Signs (positive or negative):
 A positive sign means the variable is positively related to the factor.
 A negative sign means it’s negatively related.
 Example: “Long wait time” = -0.65 → negatively related to “Customer
Satisfaction”.
 The label is intuitively developed by the researcher based on its appropriateness for
representing the underlying dimensions of a particular factor.
Rules of Thumb – Interpreting the Factors
 An optimal structure exists when all variables have high loadings only on a single factor.
 Variables that cross-load (load highly on two or more factors) are usually deleted unless
theoretically justified or the objective is strictly data reduction.
 Variables should generally have communalities of greater than .50 to be retained in the analysis.
 Respecification of a factor analysis can include options such as:
o Deleting a variable(s),
o Changing rotation methods, and/or
o Increasing or decreasing the number of factors.
Stage 6: Validation of Exploratory Factor Analysis
Validation in Exploratory Factor Analysis (EFA) ensures that the discovered factors are reliable,
meaningful, and not just random artifacts of the data. This process involves three key steps: checking if
results can be replicated, assessing whether the factor structure holds up under different conditions, and
identifying if any unusual data points are distorting the findings.

Use of Replication or a Confirmatory Perspective


The strongest way to validate EFA results is to test whether the same factor structure appears in new data.
This can be done by splitting the original dataset into two parts and running separate EFAs on each half to
see if the factors match. Alternatively, researchers can collect a completely new sample and repeat the
analysis. If the factors remain consistent, they are more likely to represent true underlying patterns rather
than chance groupings. Another approach is Confirmatory Factor Analysis (CFA), which uses statistical
tests to check if a predefined factor model fits new data. While CFA is more rigorous, it requires specialized
software and stronger assumptions, making replication through split samples a practical first step.
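
A simple way to try split-sample replication in Python is sketched below, assuming the factor_analyzer package and simulated data. Each half gets its own EFA, and matching factors are compared with Tucker's congruence coefficient (values near 1 indicate the same factor reappeared); in real data the factors may first need to be reordered or sign-flipped before comparison.

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

def congruence(a, b):
    # Tucker's congruence coefficient between two loading vectors (near 1 = same factor).
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(5)
# Simulated dataset: 400 respondents, 6 items, two underlying factors.
latent = rng.normal(size=(400, 2))
pattern = np.array([[0.8, 0.7, 0.8, 0.0, 0.1, 0.0],
                    [0.0, 0.1, 0.0, 0.8, 0.7, 0.8]])
data = pd.DataFrame(latent @ pattern + rng.normal(scale=0.5, size=(400, 6)))

# Split-sample replication: run the same EFA on each random half.
half = data.sample(frac=0.5, random_state=0)
other = data.drop(half.index)
loadings = []
for subset in (half, other):
    fa = FactorAnalyzer(n_factors=2, rotation="varimax")
    fa.fit(subset)
    loadings.append(fa.loadings_)

# Compare matching factors across the halves; values above ~0.90 suggest a stable structure.
for k in range(2):
    print(f"Factor {k + 1} congruence:", round(congruence(loadings[0][:, k], loadings[1][:, k]), 2))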

Assessing Factor Structure Stability


A stable factor structure means the results don’t change drastically with minor variations in the data.
Stability depends heavily on having a large enough sample size—generally, more cases per variable lead to
more reliable factors. Researchers can test stability by randomly creating smaller subsets of the data and
comparing the factors across them. If the same factors keep appearing, the solution is considered robust.
Simpler models with fewer factors also tend to be more stable, as overly complex structures often break
down when tested in new samples.

Detecting Influential Observations


Sometimes, just a few extreme responses—outliers—can heavily influence the factors. To check for this,
researchers can run the EFA multiple times, removing suspicious cases each time, and observe whether the
factors change significantly. Statistical measures, such as covariance ratios, can also help flag observations
that have an unusually strong impact on the results. If removing an outlier makes the factor structure clearer
or more interpretable, it might be justified to exclude that case. However, outliers should only be dropped
if there’s a valid reason, such as measurement errors, as genuine but unusual responses can sometimes
provide important insights.
Stage 7: Additional Uses of Factor Analysis Results
Factor analysis results can be used beyond interpretation, depending on research goals. If the aim is to
understand variable interrelationships, factor interpretation alone may suffice. For further statistical
analysis, data reduction will be employed. The two options include the following:
• Selecting the variable with the highest factor loading as a surrogate representative for a particular factor dimension
• Replacing the original set of variables with an entirely new, smaller set of variables created either from summated scales or factor scores
These reduced variables can be used in techniques like regression, Multivariate Analysis of Variance
(MANOVA), or cluster analysis.

Selecting Surrogate Variables for Subsequent Analysis


When researchers aim to use factor analysis for selecting variables for further statistical techniques, one
option is to choose a surrogate variable—the variable with the highest loading on each factor. This method
is simple when one variable clearly dominates in loading, but it becomes difficult when multiple variables
have similar high loadings. In such cases, theoretical knowledge or variable reliability can guide the
selection, even if the chosen variable doesn’t have the absolute highest loading.
However, this approach has drawbacks. It may ignore measurement error and oversimplify complex
constructs by reducing them to a single variable, which can lead to misleading results. For example, if
price competitiveness, product quality, and value all load highly on a factor, selecting only one may distort
the interpretation. In such situations, it is often better to use summated scales or factor scores to more
accurately represent the factor.
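
Selecting surrogates is mechanical once the rotated loadings are available. The sketch below (hypothetical variable names and loadings) simply picks the variable with the highest absolute loading on each factor; note how close the top loadings on the first factor are, which is exactly the situation where a summated scale would be safer.

import numpy as np

item_names = ["price_competitiveness", "product_quality", "value_for_money", "delivery_speed"]
# Hypothetical rotated loadings: 4 variables x 2 factors.
loadings = np.array([
    [0.78, 0.12],
    [0.74, 0.20],
    [0.71, 0.15],
    [0.10, 0.82],
])

# Surrogate variable: the variable with the highest absolute loading on each factor.
for k in range(loadings.shape[1]):
    best = int(np.argmax(np.abs(loadings[:, k])))
    print(f"Factor {k + 1} surrogate: {item_names[best]} (loading = {loadings[best, k]:.2f})")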

Creating Summated Scales


A summated scale is a composite measure formed by averaging or summing variables that load highly on
a factor. This method helps replace multiple related variables with a single, simplified measure. It offers
two main benefits:
 Reduces measurement error by combining multiple indicators, lowering the impact of
inaccuracies in individual responses.
 Represents multiple aspects of a concept in a single variable, enhancing clarity and
parsimony in multivariate models by avoiding redundancy.
The construction of summated scales is grounded in theoretical and empirical practices across fields like
psychometrics and marketing. Four key issues in scale construction include:
 Conceptual definition
 Dimensionality
 Reliability
 Validity

Conceptual definition is the foundation for creating a summated scale. It clearly defines the concept
being measured in theoretical or practical terms relevant to the research context—such as satisfaction,
value, or image. This definition guides the selection of variables included in the scale. Content validity
(also called face validity) evaluates how well the selected variables align with the conceptual definition. It
is typically assessed through expert judgment, pretests, or reviews to ensure the items reflect both
theoretical and practical aspects of the concept—not just statistical criteria.
Dimensionality refers to the requirement that items in a summated scale must be unidimensional,
meaning they all represent a single concept and are strongly related. This ensures that the scale measures
one underlying factor. Factor analysis (exploratory or confirmatory) is used to assess dimensionality by
checking if all items load highly on a single factor. If multiple dimensions are involved, each should form
a separate factor and be measured by its own set of items.

Reliability measures the consistency of a variable across multiple observations. It ensures that results
are stable over time and that scale items are consistently measuring the same construct. There are two main
types:
 Test-retest reliability: Checks stability over time by comparing responses from the same
individual at two points.
 Internal consistency: Assesses how well items in a summated scale correlate with each other,
indicating they measure the same concept.
Key diagnostic tools for internal consistency include:
 Item-to-total correlation (should exceed 0.50)
 Inter-item correlation (should exceed 0.30)
 Cronbach’s alpha (should be ≥ 0.70, or ≥ 0.60 for exploratory research)
Other advanced measures include composite reliability and average variance extracted from
confirmatory factor analysis. Reliability must be evaluated before testing the validity of any summated
scale.
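
These internal-consistency checks are straightforward to compute by hand. The sketch below, using simulated items for a single construct, implements Cronbach's alpha from its standard formula along with item-to-total correlations; the thresholds in the comments follow the guidelines above.

import numpy as np
import pandas as pd

def cronbach_alpha(items):
    # alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(6)
# Simulated scale: 5 items tapping a single construct.
trait = rng.normal(size=300)
items = pd.DataFrame({f"item{i}": trait + rng.normal(scale=0.8, size=300) for i in range(1, 6)})

print("Cronbach's alpha:", round(cronbach_alpha(items), 2))        # aim for >= .70

# Item-to-total correlations (each item vs the sum of the remaining items); aim for >= .50.
for col in items.columns:
    rest = items.drop(columns=col).sum(axis=1)
    print(col, "item-to-total r =", round(items[col].corr(rest), 2))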

Validity is the extent to which a summated scale accurately represents the intended concept. After
confirming the scale's conceptual definition, uni-dimensionality, and reliability, validity must be assessed.
Besides content (face) validity, three major forms of empirical validity are:
 Convergent validity: Measures the correlation between the scale and other measures of the
same concept; high correlation confirms accuracy.
 Discriminant validity: Assesses whether the scale is distinct from other, conceptually similar
measures; low correlation indicates good distinction.
 Nomological validity: Evaluates whether the scale behaves as expected within a theoretical
model by predicting relationships supported by prior research.
Various methods like MTMM matrices and structural equation modeling can be used to test these forms
of validity.
Calculating a summated scale involves either summing or averaging the items that load highly from factor
analysis. The most common method is averaging the items, which simplifies calculations for further
analysis.
When items within a factor have both positive and negative loadings, reverse scoring is necessary to ensure
all loadings are positive. Negative-loading items are reversed by subtracting their values from a maximum
score (e.g., 10). This process ensures that the scale correctly distinguishes between higher and lower scores,
preventing cancellation of differences between variables with opposing loadings.
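
A minimal pandas sketch of this calculation, with hypothetical items on a 1-10 scale: the negative-loading item is reverse-scored by subtracting it from the maximum score, and the summated scale is the average of the aligned items.

import pandas as pd

# Hypothetical 1-10 ratings; "long_wait_time" loads negatively on the satisfaction factor.
responses = pd.DataFrame({
    "friendly_staff":   [9, 3, 7],
    "quick_resolution": [8, 4, 6],
    "long_wait_time":   [2, 9, 4],   # negative-loading item
})

scale_max = 10
# Reverse-score the negative-loading item so all items point in the same direction.
responses["long_wait_time_rev"] = scale_max - responses["long_wait_time"]

# Summated scale: average the aligned items into one composite score per respondent.
responses["customer_satisfaction"] = responses[
    ["friendly_staff", "quick_resolution", "long_wait_time_rev"]
].mean(axis=1)
print(responses["customer_satisfaction"])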

Computing Factor Scores


Factor scores are composite measures that represent an individual's degree of association with a factor,
based on their scores on variables with high loadings. Unlike a summated scale, which combines selected
variables, factor scores consider all variables' loadings on a factor. Statistical programs can easily compute
these scores for each respondent, but their main limitation is that they are study-specific and cannot be
easily replicated across different studies without substantial computational effort.
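
Assuming the factor_analyzer package and simulated survey data, the sketch below fits an EFA and then computes a factor score on each retained factor for every respondent; unlike a summated scale, these scores weight all items by their loadings.

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(7)
# Simulated survey: 250 respondents, 6 metric items driven by two factors.
latent = rng.normal(size=(250, 2))
pattern = np.array([[0.8, 0.7, 0.8, 0.1, 0.0, 0.1],
                    [0.1, 0.0, 0.1, 0.8, 0.7, 0.8]])
survey = pd.DataFrame(latent @ pattern + rng.normal(scale=0.5, size=(250, 6)),
                      columns=[f"q{i}" for i in range(1, 7)])

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(survey)

# Factor scores: one composite per retained factor for every respondent,
# weighting every item by its relationship to the factor.
scores = fa.transform(survey)
print(scores.shape)              # (250, 2)
print(np.round(scores[:3], 2))   # first three respondents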

Selecting Among the Three Methods


Selecting Among the Three Data Reduction Methods involves choosing based on research goals, data use,
and scale quality. Here are the key decision guidelines:
 Use factor scores if data will only be used within the original sample or if orthogonality
(independence between factors) must be preserved.
 Use summated scales or surrogate variables when generalizability or transferability across
samples is needed.
 Summated scales are the preferred option if they are well-constructed, reliable, and valid.
 If the summated scale is exploratory or lacks testing, and further refinement isn’t possible,
surrogate variables are a better choice.
Summary
Exploratory Factor Analysis (EFA) is a statistical method used to identify underlying patterns or factors
among related variables, simplifying complex datasets into meaningful themes without implying
causation. It supports data summarization and reduction by grouping variables based on correlations,
enhancing clarity for further analysis. Key stages of EFA include defining objectives, selecting suitable
variables, ensuring adequate sample size (typically 5–10 cases per variable), and determining the analysis
type—R-factor (variables) or Q-factor (respondents). Assumption checks, such as Bartlett’s test and KMO
measure, confirm data suitability by assessing correlations, multicollinearity, and sampling adequacy.
During extraction, researchers choose between PCA (for data reduction) and CFA (for identifying latent
constructs), using methods like the Kaiser rule, scree plot, and parallel analysis to determine the number
of factors to retain. Factor interpretation involves estimating loadings, rotating the factor matrix
(orthogonal or oblique), and refining the model based on cross-loadings and communalities, with clear
labels assigned to each factor. Validation ensures reliability through replication across datasets, testing
factor stability, and detecting influential outliers. EFA results are valuable beyond interpretation, aiding
further statistical analyses through data reduction methods such as surrogate variables, summated scales,
or factor scores. Summated scales are typically preferred for their reliability and generalizability, while
factor scores are more suited for within-sample use. The choice among methods depends on research
goals, data characteristics, and the intended application of results.
