
PRESIDENCY UNIVERSITY

SCHOOL OF ENGINEERING

MODULE - 2
Principal Component and Factor Analysis
PRESENTED BY,

ANURAJ N. V
20233MAT0011

1
Overview
• Introduction
• Latent variable
• Assumption of Factor Analysis
• Purpose of Factor Analysis
• Types of Factor Analysis
• Principal Component Analysis (PCA)

2
Introduction
• Factor Analysis is a technique used to reduce a large number of variables
into a smaller number of factors. It is a way to condense the data in many
variables into just a few variables
• Factor Analysis is also called Dimension Reduction
• It is an example of a latent variable model
[Diagram: one latent FACTOR underlying Variable 1, Variable 2, Variable 3, Variable 4, Variable 5]

3
Example:

[Diagram: a latent FOOD SERVICE QUALITY factor underlying six observed variables: Waiting time, Cleanliness, Staff behavior, Food freshness, Food temperature, Taste of food]
4
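
To make this concrete, here is a minimal, hypothetical sketch of fitting a one-factor model to six such restaurant ratings with scikit-learn. The data, variable names, and loadings are invented purely for illustration; it assumes numpy and scikit-learn are installed.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Hypothetical: one latent "food service quality" score drives six observed ratings
quality = rng.normal(size=n)
cols = ["waiting_time", "cleanliness", "staff_behavior",
        "food_freshness", "food_temperature", "taste"]
loadings = np.array([-0.6, 0.7, 0.8, 0.9, 0.7, 0.9])  # waiting time loads negatively

# Observed variables = latent factor * loading + noise
X = quality[:, None] * loadings + 0.5 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=1)
scores = fa.fit_transform(X)   # estimated latent factor score per sample
print(dict(zip(cols, fa.components_.ravel().round(2))))  # estimated loadings
```

The six observed ratings collapse into a single factor whose estimated loadings roughly recover the ones used to generate the data.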
Latent Variables
• Latent Variables are variables that are not directly observed but are inferred
from other variables.
• Mathematical models that aim to explain observed variables in terms of
latent variables are called latent variable models.
• Examples: quality of life, business confidence, happiness

5
Assumptions of Factor Analysis
• There are no outliers in the data
• The sample size is expected to be greater than the number of factors
• Variables must be interrelated
• Metric variables are expected
• Multivariate normality is not required

6
Purpose of Factor Analysis
• Data reduction
• Latent variable discovery
• Simplification of items into subsets of concepts
• Assess dimensionality

7
Types of Factor Analysis
• Exploratory Factor Analysis (EFA): used to discover the underlying structure. Methods include:
• Principal Component Analysis
• Common Factor Analysis
• Image Factoring
• Maximum Likelihood Analysis
• Alpha Factoring and Weighted Least Squares
• Confirmatory Factor Analysis (CFA): used to test whether the data fit an a priori
expectation about the data structure. Uses structural equation modeling

8
Principal Component Analysis
• PCA is a dimensionality reduction (data reduction) technique
• PCA is used in exploratory data analysis and for making predictive models
How to reduce the dimensions?
• Each data point is projected onto only the first few principal
components to obtain lower-dimensional data while preserving as much of the
data's variation as possible.
• Principal components are the eigenvectors of the data's covariance matrix; the
corresponding eigenvalues give the amount of variance each component captures

9
Steps for dimensionality reduction using PCA
• Step 1: Standardize the data
• Step 2: Compute the covariance matrix
• Step 3: Calculate the eigenvalues and eigenvectors
• Step 4: Sort the eigenvalues in descending order and compute the principal
components
• Step 5: Reduce the dimensions of the data
(A code sketch of these five steps follows below.)

10
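
A minimal numpy sketch of the five steps, using the ten-sample dataset from the worked example later in this module. Following the slides, the covariance matrix is computed from the raw data and the standardized data are projected onto the leading eigenvector.

```python
import numpy as np

# Ten samples of two features (the worked example from the later slides)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Step 1: standardize (z-score, sample standard deviation with ddof=1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix of the raw data, as in the slides
C = np.cov(X, rowvar=False)

# Step 3: eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: sort the eigenvalues (and their eigenvectors) in descending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: project onto the first principal component (2-D -> 1-D)
# (eigenvector signs are arbitrary; flip the sign if it differs from the slides)
T = Z @ eigvecs[:, :1]
print(eigvals)     # ~[1.28403, 0.04908]
print(T.ravel())   # PC1 scores, up to sign: 1.0212, -2.1818, 1.1964, ...
```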
1. Data Standardization
• Data standardization is the process of converting data to a common scale so
that the data can be processed and analyzed properly.
• Example: consider 1000 samples of people's heights and weights, with height in
metres (roughly 0.5 to 2.5) and weight in pounds (roughly 2 to 200).
• The weight feature dominates the height feature, because weight values are
much bigger than the height values.
• To prevent this kind of problem, transform the features to a comparable
scale using standardization.

11
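
A small sketch of this scale-dominance problem; the uniform ranges are assumptions, chosen only to mimic the height/weight example.

```python
import numpy as np

rng = np.random.default_rng(0)
height = rng.uniform(0.5, 2.5, size=1000)    # metres
weight = rng.uniform(2.0, 200.0, size=1000)  # pounds

# Weight's variance dwarfs height's, so it would dominate any analysis
print(height.var(ddof=1), weight.var(ddof=1))

# After z-scoring, both features have mean ~0 and variance ~1
z = lambda a: (a - a.mean()) / a.std(ddof=1)
print(z(height).var(ddof=1), z(weight).var(ddof=1))
```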
How to Standardize the Data?
• Data standardization is done by calculating a z-score (standard score), given by

$$Z = \frac{X - \bar{X}}{\sigma}$$

• Example: ten samples of two features, $X_1$ and $X_2$.

[Scatter plot of the ten raw samples of $X_1$ and $X_2$]
12
How to Standardize the Data?
• Data standardization is done by calculating a z-score (standard score), given by

$$Z = \frac{X - \bar{X}}{\sigma}$$

• Example: the same ten samples of $X_1$ and $X_2$ after standardization.

[Scatter plot of the ten standardized samples, centred at the origin]
13
2. Covariance Matrix
• Covariance is always measured between two variables or features:

$$\mathrm{cov}(X_1, X_2) = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{n - 1}$$

• It measures only the direction of the relationship between two variables, not
the strength of the relationship between them.

14
Example
• Given 10 samples of two features $X_1$ and $X_2$:

[Scatter plot of the 10 samples]

15
Example
• Given 10 samples of two features $X_1$ and $X_2$ (with $\bar{X}_1 = 1.81$, $\bar{X}_2 = 1.91$):

X1     X2     X1-X̄1    X2-X̄2    (X1-X̄1)(X2-X̄2)
2.5    2.4     0.69      0.49      0.3381
0.5    0.7    -1.31     -1.21      1.5851
2.2    2.9     0.39      0.99      0.3861
1.9    2.2     0.09      0.29      0.0261
3.1    3.0     1.29      1.09      1.4061
2.3    2.7     0.49      0.79      0.3871
2.0    1.6     0.19     -0.31     -0.0589
1.0    1.1    -0.81     -0.81      0.6561
1.5    1.6    -0.31     -0.31      0.0961
1.1    0.9    -0.71     -1.01      0.7171

Sum of products = 5.539

$$\mathrm{cov}(X_1, X_2) = \frac{\sum_{i=1}^{n}(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{n-1} = \frac{5.539}{10-1} = 0.615444$$

$$\mathrm{cov}(X_1, X_2) = \mathrm{cov}(X_2, X_1) = 0.615444$$

16
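
As a quick check, a sketch that computes the same covariance directly from the definition:

```python
import numpy as np

x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# Sample covariance: sum of cross-deviations divided by n - 1
cov = ((x1 - x1.mean()) * (x2 - x2.mean())).sum() / (len(x1) - 1)
print(cov)   # 0.61544..., matching the slide
```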
Example
• For 2 features the covariance matrix is as follows:

$$C = \begin{bmatrix} \mathrm{cov}(X_1,X_1) & \mathrm{cov}(X_1,X_2) \\ \mathrm{cov}(X_2,X_1) & \mathrm{cov}(X_2,X_2) \end{bmatrix}$$

• For 3 features the covariance matrix is as follows:

$$C = \begin{bmatrix} \mathrm{cov}(X_1,X_1) & \mathrm{cov}(X_1,X_2) & \mathrm{cov}(X_1,X_3) \\ \mathrm{cov}(X_2,X_1) & \mathrm{cov}(X_2,X_2) & \mathrm{cov}(X_2,X_3) \\ \mathrm{cov}(X_3,X_1) & \mathrm{cov}(X_3,X_2) & \mathrm{cov}(X_3,X_3) \end{bmatrix}$$

17
Example
• For the same 10 samples, the variances (diagonal entries) are:

$$\mathrm{cov}(X_1, X_1) = \frac{\sum_{i=1}^{n}(X_{1i}-\bar{X}_1)^2}{n-1} = 0.616556$$

$$\mathrm{cov}(X_2, X_2) = \frac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}{n-1} = 0.716556$$

• So the covariance matrix is:

$$C = \begin{bmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{bmatrix}$$

18
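
numpy's np.cov reproduces this matrix in one call (with rowvar=False, rows are samples and columns are features):

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

print(np.cov(X, rowvar=False))
# [[0.61656 0.61544]
#  [0.61544 0.71656]]
```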
3. Eigenvalues and Eigenvectors
• Eigenvalues and eigenvectors are the linear algebra concepts required to
determine the principal components from the covariance matrix.
• Eigenvectors help in finding the new transformation directions along which
the variance is maximum.

• Eigenvectors are those vectors which keep the same direction when
multiplied by a matrix.
• Eigenvalues are the scalars by which the respective eigenvectors are scaled.

19
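
A short sketch of the defining property $Cv = \lambda v$, using the covariance matrix from the example:

```python
import numpy as np

C = np.array([[0.616556, 0.615444],
              [0.615444, 0.716556]])

vals, vecs = np.linalg.eigh(C)   # eigh: ascending eigenvalues of a symmetric matrix
v, lam = vecs[:, -1], vals[-1]   # largest eigenvalue and its eigenvector

print(C @ v)     # multiplying by C leaves the direction unchanged...
print(lam * v)   # ...the vector is only scaled by the eigenvalue
```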
Calculate the Eigenvalues
• From the covariance matrix, first find the eigenvalues by solving the characteristic equation $\det(C - \lambda I) = 0$:

$$\begin{vmatrix} 0.616556-\lambda & 0.615444 \\ 0.615444 & 0.716556-\lambda \end{vmatrix} = 0$$

$$\lambda^2 - 1.333112\,\lambda + 0.063026 = 0$$

$$\lambda_1 \approx 1.28403, \qquad \lambda_2 \approx 0.04908$$
20
Calculate the Eigenvectors
• Put $\lambda_1 = 1.28403$ into $(C - \lambda I)\,v = 0$ and solve for $v_1$ and $v_2$.

• So the (normalized) eigenvector is:

$$e_1 = \begin{bmatrix} 0.6779 \\ 0.7352 \end{bmatrix}$$

• Put $\lambda_2 = 0.04908$ into $(C - \lambda I)\,v = 0$ and solve for $v_1$ and $v_2$.

• So the (normalized) eigenvector is:

$$e_2 = \begin{bmatrix} -0.7352 \\ 0.6779 \end{bmatrix}$$
21
Obtain Principal Components
• After sorting the eigenvalues in descending order ($\lambda_1 = 1.28403 > \lambda_2 = 0.04908$),
the feature vector is formed from the corresponding eigenvectors as columns:

$$W = \begin{bmatrix} 0.6779 & -0.7352 \\ 0.7352 & 0.6779 \end{bmatrix}$$

• The eigenvector corresponding to the highest eigenvalue is the first principal
component, pointing in the direction of maximum variance.
22
Principal Components
• Feature vector:

$$W = \begin{bmatrix} 0.6779 & -0.7352 \\ 0.7352 & 0.6779 \end{bmatrix}$$

• Each component's share of the total variance is its eigenvalue divided by the sum
of the eigenvalues: PC1 explains $1.28403 / 1.33311 \approx 96.3\%$ and PC2 explains $\approx 3.7\%$.

• The first principal component (PC1) is the first column, corresponding to the
highest eigenvalue: $w_1 = [0.6779 \;\; 0.7352]^T$.
23
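
The variance shares follow directly from the eigenvalues; a two-line check:

```python
import numpy as np

eigvals = np.array([1.28403, 0.04908])   # sorted eigenvalues from the example
print(eigvals / eigvals.sum())           # [0.9632 0.0368] -> PC1 carries ~96.3%
```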
Transforming and reducing the dimensions of the original data set
• The original data are first standardized, then each standardized point is
projected onto the first principal component: $T = Z\,w_1$.

Standardized data (Z1, Z2)  ->  Transformed data T
 0.879    0.58   ->   1.021249
-1.67    -1.4    ->  -2.18176
 0.497    1.17   ->   1.196353
 0.115    0.34   ->   0.329514
 1.643    1.29   ->   2.060298
 0.624    0.93   ->   1.109042
 0.242   -0.4    ->  -0.10511
-1.03    -1.0    ->  -1.40272
-0.39    -0.4    ->  -0.53684
-0.9     -1.2    ->  -1.49003
24
Reconstructing the original data set
• The standardized data are reconstructed as $\hat{Z} = T\,w_1^{T}$ with
$w_1 = [0.6779 \;\; 0.7352]$, and the original scale is recovered as
$\hat{X} = \hat{Z}\,\sigma + \bar{X}$.

Transformed data T  ->  Reconstructed Ẑ (Z1, Z2)
 1.021249  ->   0.692    0.751
-2.18176   ->  -1.48    -1.6
 1.196353  ->   0.811    0.879
 0.329514  ->   0.223    0.242
 2.060298  ->   1.397    1.514
 1.109042  ->   0.752    0.815
-0.10511   ->  -0.07    -0.08
-1.40272   ->  -0.95    -1.03
-0.53684   ->  -0.36    -0.39
-1.49003   ->  -1.01    -1.1
25
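
A self-contained sketch of this round trip (standardize, project to 1-D, reconstruct). The reconstruction is approximate because PC2's roughly 3.7% of the variance is discarded.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mu) / sigma                 # standardize

w1 = np.array([[0.6779], [0.7352]])  # first principal component (slide 23)
T = Z @ w1                           # 1-D scores (transformed data)
Z_hat = T @ w1.T                     # back to standardized 2-D space
X_hat = Z_hat * sigma + mu           # undo the z-score

print(X_hat.round(2))                # close to X, up to the discarded ~3.7%
```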