
Principal Component Analysis
What is PCA?
• Principal Component Analysis (PCA) is a dimensionality reduction
technique used in data analysis and machine learning. Its main goal is
to simplify a dataset while retaining as much information (variance)
as possible.
Why use PCA?
• Datasets can have many variables (features), which can make analysis
hard.
• Some features may be correlated or redundant.
• PCA transforms the original data into a new set of uncorrelated
variables, called principal components, ordered by the amount of
variance they explain.
Steps
• Standardize the data (if required)
• Compute the covariance matrix
• Calculate the eigenvalues and eigenvectors
  Eigenvectors represent the directions (principal components).
  Eigenvalues show how much variance each principal component captures.
• Sort components by variance
  Keep the top k components that explain the most variance.
• Transform the data (project the observations onto the retained components); see the sketch below.
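A minimal NumPy sketch of these steps, assuming a small numeric matrix X with observations in rows and features in columns (the data and variable names are purely illustrative):

```python
import numpy as np

# Illustrative data: rows = observations, columns = features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# 1. Standardize (centre and scale each feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvalues and eigenvectors (eigh suits symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort components by variance, largest eigenvalue first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Transform: keep the top k components and project the data
k = 1
scores = X_std @ eigvecs[:, :k]

print(eigvals)   # variance captured by each component
print(scores)    # the data in the new (reduced) space
```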


Key outputs of PCA
• Explained Variance
• Principal Components (PCs)
• Loadings (component weights)
• Scores (transformed data)
1. Explained Variance / Eigenvalues
• Each principal component explains a percentage of the total variance
in the data.
• The first PC explains the most variance, the second explains the next
most, and so on.
• Use a scree plot (variance vs. component index) to decide how many PCs to keep (look for the “elbow”).
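As a sketch, scikit-learn reports the explained variance directly, and a scree plot can be drawn from it (the data here are a random placeholder; in practice X would be your own feature matrix):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(100, 6))  # placeholder data

X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

print(pca.explained_variance_ratio_)  # share of total variance per PC

# Scree plot: variance explained vs. component index
components = range(1, len(pca.explained_variance_ratio_) + 1)
plt.plot(components, pca.explained_variance_ratio_, marker="o")
plt.xlabel("Component index")
plt.ylabel("Proportion of variance explained")
plt.show()
```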
2. Principal Components (Axes of New Space)
• These are new features created by PCA.
• They’re linear combinations of your original features.
• They are interpreted based on the loadings
3. Loadings (Component Matrix)
• Loadings show how much each original variable contributes to a
principal component
• Large (positive or negative) loadings = strong influence
• Small loadings ≈ negligible effect.
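A sketch of one common way to compute loadings from a fitted scikit-learn PCA (scaling the unit-length components_ by the square root of each component's variance; the data are a random placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(1).normal(size=(50, 4))  # placeholder data
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_std)

# Loadings: one row per original variable, one column per PC;
# eigenvector weights scaled by sqrt(eigenvalue)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(loadings)
```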
4. Scores (Transformed Data Points)
• These are your original observations projected into the new PC space.
Plot them (e.g. PC1 vs PC2) to see patterns:
• Clusters = similar data points.
• Outliers = unusual observations.
• Trends = natural groupings or directions of change.
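A minimal sketch of obtaining scores and plotting PC1 against PC2 (random placeholder data; clusters, outliers, or trends would show up in this plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(2).normal(size=(80, 5))  # placeholder data
X_std = StandardScaler().fit_transform(X)

# Scores: the observations projected into PC space
scores = PCA(n_components=2).fit_transform(X_std)

plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```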
Summary
• Explained Variance: how much structure each PC captures; use it to select the number of PCs to retain.
• Loadings: the contribution of each original variable to a PC; use them to understand what the PCs represent.
• Scores: the transformed observations; use them to visualize clustering and patterns.
Eigenvalues
• In linear algebra, for a square matrix A, an eigenvalue λ and a corresponding eigenvector v satisfy the equation Av = λv.
• v is a non-zero vector whose direction does not change when transformed by A.
• λ is a scalar that stretches (or shrinks) the vector.
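A small NumPy check of this definition (the matrix A is just an illustrative example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # illustrative symmetric matrix

eigvals, eigvecs = np.linalg.eig(A)

# For every eigenpair, A @ v should equal lambda * v
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # True
```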
Connecting with PCA
• In PCA:
1. You compute the covariance matrix Σ of your dataset (or sometimes the correlation matrix).
2. Then you solve the eigenvalue equation for Σ: Σvᵢ = λᵢvᵢ
• λᵢ: eigenvalues — how much variance lies in the direction vᵢ.
• vᵢ: eigenvectors — the principal components (the new axes in PCA).
• In short: vᵢ is a principal direction, λᵢ is the amount of variance along it.
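A quick sketch of this correspondence: the eigenvalues of the covariance matrix should match scikit-learn's explained_variance_ (both use the n−1 denominator; the data are a random placeholder standing in for a standardized matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

X_std = np.random.default_rng(3).normal(size=(60, 4))  # placeholder data

# Eigenvalues of the covariance matrix, sorted largest first
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X_std, rowvar=False)))[::-1]

# Per-component variances reported by scikit-learn
pca = PCA().fit(X_std)

print(np.allclose(eigvals, pca.explained_variance_))  # True
```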


Loadings
• Suppose PC1 is a new axis created by PCA.
• The loading of variable A on PC1 is how much variable A "aligns" with that axis.
PC1 = a₁·X₁ + a₂·X₂ + … + aₙ·Xₙ
Here, a₁, a₂, …, aₙ are the loadings of the variables X₁, X₂, …, Xₙ.
• If you standardize the data, the loadings are the correlations between the original variables and the components.
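A sketch demonstrating this on standardized data: the correlation of each original variable with each PC score matches the loadings, up to the n vs. n−1 scaling convention (the data are a correlated random placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3)) @ np.array([[1.0, 0.5, 0.2],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 1.0]])  # correlated placeholder data
X_std = StandardScaler().fit_transform(X)

pca = PCA().fit(X_std)
scores = pca.transform(X_std)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Correlation of each standardized variable with each PC score
corr = np.array([[np.corrcoef(X_std[:, j], scores[:, k])[0, 1]
                  for k in range(scores.shape[1])]
                 for j in range(X_std.shape[1])])

# Equal up to the n vs. n-1 scaling convention, hence the small tolerance
print(np.allclose(corr, loadings, atol=1e-2))  # True
```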


Assumptions
• Normality: variables are multivariate normal.
• Linearity: linear relations between variables (check bivariate scatterplots).
• Factorability
1. Inter-item correlations (correlation adequacy): Bartlett's test of correlation adequacy should be significant (p < 0.05).
   H₀: the variables are orthogonal (not correlated).
2. Sample adequacy: Kaiser-Meyer-Olkin measure (KMO test) > 0.70, with a case-to-item ratio of about 20:1 (e.g., 20 items: about 400 cases; EFA is often done with 5:1).
KMO values
Bartlett’s test of correlation adequacy
• Bartlett's (1951) test of sphericity tests whether a matrix (of correlations) is significantly different from an identity matrix (1s on the diagonal, 0s everywhere else).
• The test computes the probability that the correlation matrix has significant correlations among at least some of the variables in the dataset, a prerequisite for factor analysis to work.
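Both checks are available in the third-party factor_analyzer package; a hedged sketch using its documented helper functions (df here is a placeholder pandas DataFrame; real item data would be used in practice):

```python
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Placeholder DataFrame of numeric items (substitute your own data)
rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(300, 6)),
                  columns=[f"item{i}" for i in range(1, 7)])

chi_square, p_value = calculate_bartlett_sphericity(df)  # want p < 0.05
kmo_per_item, kmo_total = calculate_kmo(df)              # want overall KMO > 0.70

print(p_value, kmo_total)
```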
Communalities and Eigenvalues
• Applies to: an eigenvalue describes a component (PC1, PC2, etc.); a communality describes a variable (e.g., Height, Weight).
• Tells you: an eigenvalue is how much total variance a component captures; a communality is how much of a variable's variance is explained by the components.
• Computed from: an eigenvalue is the sum of squared loadings for that component (down its column); a communality is the sum of squared loadings of that variable across the components (along its row).
• Units: eigenvalues are in total-variance units; communalities are a proportion of each variable's variance.
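A small sketch of both quantities computed from a loading matrix (the loadings below are made up for illustration):

```python
import numpy as np

# Illustrative loading matrix: rows = variables, columns = retained components
loadings = np.array([[0.80,  0.10],
                     [0.75, -0.20],
                     [0.15,  0.85],
                     [0.05,  0.70]])

# Eigenvalue of each component: sum of squared loadings down its column
eigenvalues = (loadings ** 2).sum(axis=0)

# Communality of each variable: sum of squared loadings along its row
communalities = (loadings ** 2).sum(axis=1)

print(eigenvalues)     # variance captured by each component
print(communalities)   # proportion of each variable's variance explained
```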
Eigenvalues explained
• We can use the eigenvalues to calculate the percentage of variance accounted for by each of the factors.
• Given that the sum of the eigenvalues equals the total number of variables in the analysis (when the correlation matrix is used), we can calculate the percentage of variance accounted for by dividing each eigenvalue by the total number of variables.
• % of variance explained = (eigenvalue / total number of variables) × 100. For example, with 5 variables, a component with an eigenvalue of 2.5 explains 2.5/5 = 50% of the total variance.
How to get simple structure (set up)
• Rotation: a process by which the solution is made better (smaller residuals) without changing its mathematical properties.
• An orthogonal rotation (method: varimax) holds the factors completely uncorrelated, as in PCA; a sketch follows below.
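If an orthogonal varimax rotation is wanted, the third-party factor_analyzer package provides one; a hedged sketch (the Rotator class and its "varimax" method are as documented in that package; the loading matrix is made up):

```python
import numpy as np
from factor_analyzer.rotator import Rotator

# Illustrative unrotated loading matrix: rows = variables, columns = components
loadings = np.array([[0.70, 0.35],
                     [0.65, 0.40],
                     [0.30, 0.75],
                     [0.25, 0.80]])

rotator = Rotator(method="varimax")               # orthogonal rotation
rotated_loadings = rotator.fit_transform(loadings)
print(rotated_loadings)
```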
How to tell if my setup achieved the simple structure
• Variable loadings exceed the threshold (> 0.40, or 0.30)
• Check the p-value
