RES805-RM-Module 2
SCHOOL OF ENGINEERING
MODULE - 2
Principal Component and Factor Analysis
PRESENTED BY,
ANURAJ N. V
20233MAT0011
Overview
• Introduction
• Latent variable
• Assumptions of Factor Analysis
• Purpose of Factor Analysis
• Types of Factor Analysis
• Principal Component Analysis (PCA)
Introduction
• Factor Analysis is a technique used to reduce a large number of variables
to a smaller number of factors. It is a way to condense the information in many
variables into just a few variables
• Factor Analysis is also called Dimension Reduction
• It is an example of a latent variable model
(Diagram: observed variables such as FOOD, SERVICE, and QUALITY loading on a single underlying FACTOR.)
Latent Variables
• Latent Variables are variables that are not directly observed but are inferred
from other variables.
• Mathematical models that aim to explain observed variables in terms of
latent variables are called latent variable models.
Assumptions of Factor Analysis
• There are no outliers in the data
• The sample size is greater than the number of factors
• The variables must be interrelated
• Metric (interval or ratio) variables are expected
• Multivariate normality is not required
Purpose of Factor Analysis
• Data reduction
• Latent variable discovery
• Simplification of items into subsets of concepts
• Assessing dimensionality
Types of Factor Analysis
• Exploratory Factor Analysis (EFA): Used to discover the underlying structure
• Principal component analysis
• Common factor analysis
• Image factoring
• Maximum likelihood analysis
• Alpha factoring and weighted least squares
• Confirmatory Factor Analysis (CFA): Used to test whether the data fit an
a priori expectation about the data's structure. Uses structural equation modeling
Principal Component Analysis
• PCA is a dimensionality reduction (data reduction) technique
• PCA is used in exploratory data analysis and for building predictive models
How to reduce the dimensions?
• Each data point is projected onto only the first few principal
components to obtain lower-dimensional data while preserving as much of the
data's variation as possible.
• The principal components are the eigenvectors of the data's covariance matrix
Steps for dimensionality reduction using PCA
• Step 1: Standardize the data
• Step 2: Compute the covariance matrix
• Step 3: Calculate the eigenvectors and eigenvalues
• Step 4: Sort the eigenvalues in descending order and compute the principal
components
• Step 5: Reduce the dimensions of the data
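The five steps above can be sketched end to end in NumPy. This is a minimal illustration, not the slides' own code; the data matrix X is made up, with the second feature on a deliberately larger scale.

```python
import numpy as np

# Illustrative data: rows = samples, columns = features (values are made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2)) * [1.0, 50.0]   # second feature on a much larger scale

# Step 1: standardize each feature (z-score)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data
C = np.cov(Z, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: sort eigenvalues (and their eigenvectors) in descending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: project onto the first k principal components to reduce dimensions
k = 1
X_reduced = Z @ eigvecs[:, :k]
print(X_reduced.shape)   # (10, 1)
```

Each later slide elaborates one of these steps in turn.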
1. Data Standardization
• Data standardization is the process of converting data to a common scale so
that the data can be processed and analyzed properly.
• Example: consider 1000 samples of people's height (in metres, e.g. 0.5–2) and
weight (in pounds).
• Because weight values are numerically much larger, the weight feature dominates
the height feature.
How to Standardize the Data ?
• Data standardization is done by calculating a z-score (standard score),
given by

𝑍 = (𝑋 − 𝑋̄) / 𝜎

where 𝑋̄ is the mean and 𝜎 the standard deviation of the feature.

(Scatter plot: the raw data, spread over roughly 0 to 3.5 on both axes.)
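The z-score formula can be applied per feature as below. The height/weight sample values are made up for illustration; only the formula comes from the slide.

```python
import numpy as np

# Illustrative height/weight samples (values are made up, not from the slides)
height = np.array([0.5, 1.0, 1.5, 2.0, 2.5])         # metres
weight = np.array([2.0, 60.0, 120.0, 180.0, 240.0])  # pounds

def zscore(x):
    # Z = (X - mean) / standard deviation
    return (x - x.mean()) / x.std()

h_std, w_std = zscore(height), zscore(weight)
# After standardization both features have mean ~0 and standard deviation ~1,
# so the larger-scaled weight feature no longer dominates.
```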
How to Standardize the Data ?
(Scatter plot: after applying 𝑍 = (𝑋 − 𝑋̄) / 𝜎, the data are centered at zero,
ranging roughly from −2 to 2 on both axes.)
2. Covariance Matrix
• Covariance is always measured between two variables or features:

𝑐𝑜𝑣(𝑋₁, 𝑋₂) = Σᵢ₌₁ⁿ (𝑋₁ᵢ − 𝑋̄₁)(𝑋₂ᵢ − 𝑋̄₂) / (𝑛 − 1)

• It measures only the direction of the relationship between the two variables,
not the strength of the relationship.
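The sample-covariance formula can be checked directly against NumPy, which uses the same n − 1 denominator by default. The data values here are made up for illustration.

```python
import numpy as np

# Two illustrative features (values are made up, not the slides' 10 samples)
x1 = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
x2 = np.array([0.7, 1.1, 1.6, 2.2, 2.4])

# cov(X1, X2) = sum((X1 - mean1) * (X2 - mean2)) / (n - 1)
n = len(x1)
cov_manual = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / (n - 1)

# np.cov returns the full 2x2 covariance matrix; [0, 1] is cov(X1, X2)
cov_numpy = np.cov(x1, x2)[0, 1]
print(np.isclose(cov_manual, cov_numpy))  # True
```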
Example
• Given 10 samples of two features 𝑋₁ and 𝑋₂.
(Scatter plot of the 10 samples, spread over roughly 0 to 3.5 on both axes.)
Example
• Given 10 samples of two features 𝑋₁ and 𝑋₂:

𝐶𝑜𝑣(𝑋₁, 𝑋₂) = Σᵢ₌₁ⁿ (𝑋₁ᵢ − 𝑋̄₁)(𝑋₂ᵢ − 𝑋̄₂) / (𝑛 − 1)

(Worked table of deviation products; e.g. the first sample has 𝑋₁ = 0.5, 𝑋₂ = 0.7,
deviations −1.31 and −1.21, and product 1.5851.)
Example
• For two features, the covariance matrix is:

C = [ 𝐶𝑜𝑣(𝑋₁, 𝑋₁)   𝐶𝑜𝑣(𝑋₁, 𝑋₂)
      𝐶𝑜𝑣(𝑋₂, 𝑋₁)   𝐶𝑜𝑣(𝑋₂, 𝑋₂) ]
Example
• Given 10 samples of two features 𝑋₁ and 𝑋₂:

𝐶𝑜𝑣(𝑋₁, 𝑋₁) = Σᵢ₌₁ⁿ (𝑋₁ᵢ − 𝑋̄₁)² / (𝑛 − 1) = 0.616556

𝐶𝑜𝑣(𝑋₂, 𝑋₂) = Σᵢ₌₁ⁿ (𝑋₂ᵢ − 𝑋̄₂)² / (𝑛 − 1) = 0.7165556
3. Eigenvalues and Eigenvectors
• Eigenvalues and eigenvectors are linear algebra concepts that are
required to determine the principal components from the
covariance matrix.
• Eigenvectors help find the new directions along which the variance is
maximum.
• Eigenvectors are the vectors that keep the same direction when
multiplied by a matrix.
• Eigenvalues are the scalars by which the respective eigenvectors are scaled.
Calculate the Eigenvalues
• From the covariance matrix C, first find the eigenvalues 𝜆 by solving the
characteristic equation det(C − 𝜆I) = 0.
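For a 2×2 matrix the characteristic equation is the quadratic 𝜆² − tr(C)𝜆 + det(C) = 0, which can be checked against NumPy's eigendecomposition. The diagonal entries below are the example's variances (0.616556 and 0.7165556); the off-diagonal covariance is an assumed illustrative value, since the slides do not state it.

```python
import numpy as np

# Diagonals from the slides' example; the off-diagonal 0.61 is an assumption
C = np.array([[0.616556, 0.61],
              [0.61,     0.7165556]])

# Characteristic equation of a 2x2 matrix: lambda^2 - tr(C)*lambda + det(C) = 0
tr, det = np.trace(C), np.linalg.det(C)
roots = np.roots([1.0, -tr, det])

# NumPy's symmetric eigensolver gives the same eigenvalues
eigvals = np.linalg.eigvalsh(C)
print(np.allclose(sorted(roots), sorted(eigvals)))  # True
```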
Calculate the Eigenvectors
• Substitute each eigenvalue 𝜆 into (C − 𝜆I)v = 0 and solve for the components
of the corresponding eigenvector v.
Obtain Principal Components
• After sorting the eigenvalues in descending order, the feature vector is
formed from the corresponding eigenvectors.
Principal Components
• The feature vector is the matrix whose columns are the eigenvectors, each
accounting for a percentage of the total variance.
• The first principal component (PC1) is the first column, corresponding to the
highest eigenvalue.
Transforming and reducing the dimensions of the original data set
• The standardized data are multiplied by the feature vector to obtain the
transformed (reduced) data: Transformed Data = Standardized Data × Feature Vector.
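The projection step can be sketched as a single matrix product. The data here are made up; the feature vector W is obtained as in the earlier steps.

```python
import numpy as np

# Illustrative standardized data (values are made up)
rng = np.random.default_rng(1)
Z = rng.normal(size=(10, 2))

# Feature vector W: eigenvector columns sorted by eigenvalue, descending
eigvals, W = np.linalg.eigh(np.cov(Z, rowvar=False))
W = W[:, np.argsort(eigvals)[::-1]]

# Keep only PC1: project each sample onto the first eigenvector column
transformed = Z @ W[:, :1]
print(transformed.shape)  # (10, 1)
```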
Reconstructing the original data set
• The transformed data are multiplied by the transpose of the feature vector to
reconstruct the standardized data, and the standardization is then reversed to
obtain the reconstructed X.
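A minimal sketch of the reconstruction, continuing the same made-up data; with k smaller than the number of features the reconstruction is only approximate.

```python
import numpy as np

# Illustrative original data (values are made up)
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma

# Feature vector W, sorted by eigenvalue as before
eigvals, W = np.linalg.eigh(np.cov(Z, rowvar=False))
W = W[:, np.argsort(eigvals)[::-1]]

k = 1
T = Z @ W[:, :k]          # transformed (reduced) data
Z_hat = T @ W[:, :k].T    # reconstructed standardized data
X_hat = Z_hat * sigma + mu  # undo standardization -> reconstructed X
print(X_hat.shape)  # (10, 2)
```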