Exploratory Data Analysis v3 Part3

Uploaded by

ahmedpandit48

Exploratory Data Analysis
Factor Analysis
Part 3
Recall
• Identifying the Right Data
• Clean the Data
• What is in the data?
Multivariate Analysis
• used to analyze data that involves multiple variables or observations
simultaneously

Customer ID  Age  Income   Shopping Frequency  Avg. Purchase Amount  Product Categories      Online Shopping  Loyalty Points
1            35   $60,000  Weekly              $100                  Electronics, Clothing   Yes              1200
2            45   $75,000  Monthly             $75                   Groceries, Electronics  No               800
3            28   $40,000  Rarely              $200                  Clothing, Jewelry       Yes              500
4            50   $90,000  Weekly              $150                  Groceries, Electronics  No               1500
5            22   $25,000  Monthly             $50                   Clothing, Shoes         Yes              300

Multivariate Analysis Techniques
Factor Analysis
• used to uncover underlying latent factors that influence observed variables.
• often used in psychology and social sciences to understand the relationships between variables.

Multivariate Analysis of Variance (MANOVA)


• extends ANOVA to multiple dependent variables.
• used to determine whether the means of multiple groups are equal when there are multiple response variables.

Canonical Correlation Analysis (CCA)


• examines the relationships between two sets of variables.
• helps to identify linear combinations of variables in one set that are maximally correlated with linear combinations of variables in another set.

Discriminant Analysis
• used to differentiate between two or more groups based on a set of predictor variables.
• often used in classification problems, such as distinguishing between different species based on multiple characteristics.

Multidimensional Scaling (MDS)


• used to visualize the similarity or dissimilarity between data points in a lower-dimensional space.
• often used in fields like psychology and marketing.
Factor Analysis
Observed VS Latent Variables
• a statistical technique used to uncover the underlying structure or latent factors that influence a set of observed variables.
• These latent factors are not directly observable but are inferred from the observed variables.
• Factor analysis is commonly used for data reduction and to simplify complex data by identifying underlying patterns.

ID  Age  Income  Education  Health  Spending  Savings
1   45   50000   12         5       800       1000
2   30   35000   10         4       600       500
3   50   60000   14         5       1000      1200
4   35   42000   12         3       700       400
5   40   55000   13         4       900       800
6   28   32000   9          3       500       300
7   60   70000   16         5       1200      1500
8   48   58000   14         4       1100      1300
9   55   65000   15         5       1300      1600
10  38   45000   11         3       750       600

Steps for Factor Analysis
• Data Collection
    • Collect data on a set of observed variables.
    • These variables may be related, for example because they are influenced by a smaller number of unobservable latent factors.
• Factor Extraction
    • Extract the underlying factors that contribute to the observed data.
    • Common extraction methods include Principal Component Analysis (PCA) and Maximum Likelihood Estimation (MLE).
• Factor Rotation
    • After extraction, factors are often rotated to make the results more interpretable.
    • Common rotation methods include Varimax and Promax.
• Interpretation
    • Interpret the rotated factor loadings.
    • Factor loadings represent the strength and direction of the relationship between observed variables and underlying factors.
• Factor Scores
    • Calculate factor scores for each individual to understand their position on each factor.
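
The extraction step can be sketched in plain Python. This is a minimal illustration only: it standardizes the ten-row table of observed variables, builds the correlation matrix, and extracts a single factor with the principal component method, using power iteration to find the dominant eigenvector. The function names (standardize, correlation_matrix, first_factor) are hypothetical helpers written for this sketch, and choosing one factor with no rotation is an assumption made to keep the code short. It is not an MLE fit.

```python
import math

# Observed variables from the ten-row example table.
NAMES = ["Age", "Income", "Education", "Health", "Spending", "Savings"]
DATA = [
    [45, 50000, 12, 5, 800, 1000],
    [30, 35000, 10, 4, 600, 500],
    [50, 60000, 14, 5, 1000, 1200],
    [35, 42000, 12, 3, 700, 400],
    [40, 55000, 13, 4, 900, 800],
    [28, 32000, 9, 3, 500, 300],
    [60, 70000, 16, 5, 1200, 1500],
    [48, 58000, 14, 4, 1100, 1300],
    [55, 65000, 15, 5, 1300, 1600],
    [38, 45000, 11, 3, 750, 600],
]


def standardize(rows):
    """Return the columns rescaled to mean 0 and sample standard deviation 1."""
    n = len(rows)
    z_columns = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z_columns.append([(v - mean) / sd for v in col])
    return z_columns


def correlation_matrix(z_columns):
    """Correlation matrix computed from standardized columns."""
    n = len(z_columns[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
        for ci in z_columns
    ]


def first_factor(corr, iterations=500):
    """Power iteration for the dominant eigenpair of the correlation matrix.

    Loadings follow the principal component method:
    loading_i = sqrt(eigenvalue) * eigenvector_i.
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue, [math.sqrt(eigenvalue) * x for x in v]


corr = correlation_matrix(standardize(DATA))
eigenvalue, loadings = first_factor(corr)
print(f"Share of total variance on factor 1: {eigenvalue / len(NAMES):.2f}")
for name, loading in zip(NAMES, loadings):
    print(f"{name:<10} {loading:.2f}")
```

In practice a statistics library would handle extraction, rotation, and factor scores together. The sketch only makes the extraction step concrete for one factor.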
Factor Analysis
Linear Combination
• Linear Combination
    • a * X₁ + b * X₂
    • X₁ & X₂ are the variables
    • a & b are weights
• For multiple variables (in Factor Analysis)
    • observed variables: X₁, X₂, X₃, …
    • underlying latent factors: F₁, F₂, F₃, …
    • error terms: U₁, U₂, U₃, …
    • X₁ = L₁₁ * F₁ + L₁₂ * F₂ + L₁₃ * F₃ + U₁
    • X₂ = L₂₁ * F₁ + L₂₂ * F₂ + L₂₃ * F₃ + U₂
• Factor loadings (L)
    • indicate how much each latent factor influences each observed variable.
    • High loadings indicate a strong influence, while low loadings indicate a weak influence.
• Error terms (U)
    • capture the variance in the observed variables that is not accounted for by the latent factors.
    • They represent measurement error and any unique or idiosyncratic variability in the data.
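Factor Analysis
Definition
• Set of p observations (observed variables) per individual
• n individuals
• k common factors ($f_{j,m}$)
• k < p
• Factor loading matrix: $L \in \mathbb{R}^{p \times k}$
• A single observation:
    $x_{i,m} - \mu_i = l_{i,1} f_{1,m} + \dots + l_{i,k} f_{k,m} + \varepsilon_{i,m}$
• $x_{i,m}$: ith observation of mth individual
• $\mu_i$: mean of ith observation
• $l_{i,j}$: loading for ith observation of the jth factor
• $f_{j,m}$: value of jth factor of the mth individual
• $\varepsilon_{i,m}$: (i,m)th unobserved stochastic error term with mean zero and finite variance.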
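Factor Analysis
Definition
• In matrix notation:
    $X - M = LF + \varepsilon$
• Where:
    • Observation matrix: $X \in \mathbb{R}^{p \times n}$
    • Loading matrix: $L \in \mathbb{R}^{p \times k}$
    • Factor matrix: $F \in \mathbb{R}^{k \times n}$
    • Error term matrix: $\varepsilon \in \mathbb{R}^{p \times n}$
    • Mean matrix: $M \in \mathbb{R}^{p \times n}$, where the (i,m)th element is simply $\mu_i$
• Assumptions:
    • F and $\varepsilon$ are independent
    • E(F) = 0; E: Expectation
    • Cov(F) = I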
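
These assumptions determine the covariance structure of the observed variables. The short derivation below is a sketch: it writes $\varepsilon$ for the error term matrix and adds one assumption not listed above, namely that the error terms have a diagonal covariance matrix, written here as $\Psi$.

```latex
\begin{align*}
\operatorname{Cov}(X - M)
  &= \operatorname{Cov}(LF + \varepsilon) \\
  &= L \operatorname{Cov}(F) L^{\top} + \operatorname{Cov}(\varepsilon)
     && \text{($F$ and $\varepsilon$ independent, so cross terms vanish)} \\
  &= L I L^{\top} + \Psi
     && \text{($\operatorname{Cov}(F) = I$; assumed $\operatorname{Cov}(\varepsilon) = \Psi$)} \\
  &= L L^{\top} + \Psi
\end{align*}
```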
Factor Analysis
Example

• Education Assessment
• Math (X₁), Science (X₂), and English (X₃)
• We suspect that these test scores are influenced by two underlying latent factors: "Academic Ability" (F₁) and "Study Habits" (F₂).
• Factor Loading Matrix

F1 F2
X1 0.9 0.2
X2 0.8 -0.1
X3 0.7 0.6
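
To read the loading matrix, it can help to plug in factor values for one student. The sketch below is only an illustration: the student's factor values (F₁ = 1.0, F₂ = -0.5) are hypothetical, the error terms are set to zero, and common_part is a helper name invented for this example. It computes the part of each test score explained by the two factors, on whatever standardized scale the loadings refer to.

```python
# Factor loading matrix from the example: one row per test, columns F1 and F2.
LOADINGS = {
    "Math (X1)": (0.9, 0.2),
    "Science (X2)": (0.8, -0.1),
    "English (X3)": (0.7, 0.6),
}


def common_part(loadings, factor_values):
    """Return L_i1 * F1 + L_i2 * F2 for each observed variable (error term omitted)."""
    return {
        name: sum(l * f for l, f in zip(row, factor_values))
        for name, row in loadings.items()
    }


# Hypothetical student: above-average Academic Ability, below-average Study Habits.
student = (1.0, -0.5)
scores = common_part(LOADINGS, student)

for name, value in scores.items():
    print(f"{name}: {value:+.2f}")
# Math (X1): +0.80
# Science (X2): +0.85
# English (X3): +0.40
```

English drops the most for this student because it has the largest loading on "Study Habits", while Science rises slightly because its loading on that factor is negative.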
Variance-Covariance Matrix
• The variance-covariance matrix of the observed variables can be expressed as a function of the factor loadings and the unique variances.
• Σ = LL' + Ψ
• Where:
    • Σ is the p x p observed variable covariance matrix.
    • L is the p x k factor loading matrix (k is the number of common factors).
    • L' is the transpose of the factor loading matrix.
    • Ψ is a diagonal matrix of unique variances.

Example variance-covariance matrix:

     X1    X2    X3
X1   15.0  3.0   4.0
X2   3.0   9.0   2.0
X3   4.0   2.0   8.0
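
The decomposition Σ = LL' + Ψ can be checked numerically. The sketch below reuses the loading matrix from the education example (Math, Science, English on two factors). The unique variances are not given in the source, so it assumes standardized variables and sets each one to 1 minus the sum of squared loadings for that row (the row's "communality", a term introduced here for the sketch). The function name implied_covariance is hypothetical.

```python
# Loading matrix from the education example: rows X1..X3, columns F1, F2.
L = [
    [0.9, 0.2],
    [0.8, -0.1],
    [0.7, 0.6],
]


def implied_covariance(loadings, unique_variances):
    """Return Sigma = L L' + Psi, where Psi is diagonal with the given entries."""
    p = len(loadings)
    sigma = [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(p)]
        for i in range(p)
    ]
    for i in range(p):
        sigma[i][i] += unique_variances[i]
    return sigma


# Assumption: each observed variable has variance 1, so the unique variance
# is whatever the common factors leave unexplained.
communalities = [sum(l * l for l in row) for row in L]
PSI = [1.0 - h for h in communalities]
SIGMA = implied_covariance(L, PSI)

for row in SIGMA:
    print("  ".join(f"{value:.2f}" for value in row))
# 1.00  0.70  0.75
# 0.70  1.00  0.50
# 0.75  0.50  1.00
```

Because Ψ is diagonal, it only affects the variances on the diagonal of Σ. Every covariance between two different variables comes from LL' alone, which is why shared factors are what make the observed variables correlate.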
