Feature Engineering

PCA

PCA (Principal Component Analysis) is used when the data is multivariate and numeric.

PCA is a technique for reducing the dimensionality of numerical data while retaining as much variability as possible. It transforms the original variables into a new set of uncorrelated variables (principal components) that are ordered by the amount of variance they explain in the data.

It is not typically used with categorical or ordinal data directly, as it relies on numerical values to compute variances and covariances.
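
As a minimal sketch of PCA in practice, assuming scikit-learn is available and using a small synthetic dataset (both are illustrative assumptions, not part of the original notes):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # illustrative: 100 samples, 5 numeric features

X_std = StandardScaler().fit_transform(X)   # standardize the features first
pca = PCA(n_components=2)                   # keep the top 2 principal components
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_)        # variance explained by each component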

Performing PCA (Principal Component Analysis) proceeds as follows (a from-scratch sketch of these steps appears after the list):

• Standardize the data: This step ensures that each feature contributes equally to the analysis, especially if the features are on different scales.
• Generate the covariance matrix / correlation matrix for all the dimensions: After standardizing, compute the covariance or correlation matrix to understand the relationships between the variables.
• Perform eigen decomposition: Compute the eigenvalues and eigenvectors of the covariance or correlation matrix.
• Sort the eigen pairs in descending order of eigenvalues and select the ones with the largest values: Rank the eigenvectors by their corresponding eigenvalues and select the top components that capture the most variance.
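
A minimal NumPy sketch of these four steps; the random data matrix and the choice of two components are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # illustrative data matrix

# Step 1: standardize the data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# Step 3: eigen decomposition (eigh suits symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: sort eigen pairs in descending order and keep the top k
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
k = 2
X_reduced = X_std @ eigenvectors[:, :k]            # project onto the top-k components

print(eigenvalues / eigenvalues.sum())             # fraction of variance per component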

Eigenvectors (principal components) indicate directions of maximum variance, while eigenvalues reflect the amount of variance along those directions. Reducing the dimensionality therefore means discarding the less significant information, which is associated with the smaller eigenvalues.

A scree plot is a graphical tool used in Principal Component Analysis (PCA) to help determine the number of principal components to retain. It visualizes the eigenvalues associated with each principal component and helps to identify the "elbow" point, which indicates the optimal number of components to keep for analysis.
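
A minimal scree-plot sketch with matplotlib; the synthetic data is an assumption for illustration:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(X_std, rowvar=False)))[::-1]

# Scree plot: eigenvalues in descending order; look for the "elbow"
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()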

Factor Analysis
Factor analysis is a statistical technique used to identify
underlying relationships between variables by grouping them
into factors. These factors are latent variables that explain the
patterns of correlations observed in the data.

Factor analysis is often employed to simplify complex datasets, reduce dimensionality, and identify the underlying structure.
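
A minimal sketch using scikit-learn's FactorAnalysis; the synthetic two-factor data and n_components=2 are assumptions for illustration:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Illustrative data: 6 observed variables driven by 2 latent factors plus noise
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(X)        # factor scores for each observation
print(fa.components_.shape)         # (2, 6): loading of each variable on each factor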

Assumptions for factor analysis

• Linearity of Correlations
• Normality
• Absence of Multicollinearity: Factor analysis assumes that the variables are not too highly collinear, because highly correlated variables can complicate the extraction of distinct factors.
• Homoscedasticity: The variance should be roughly equal across variables.

Singular Value Decomposition (SVD) is a fundamental matrix factorization technique in linear algebra used for various applications in data analysis, machine learning, and signal processing. SVD decomposes a matrix A into three other matrices, A = U S V^T, where U and V have orthonormal columns and S is a diagonal matrix of non-negative singular values in descending order, providing insights into the structure and properties of the original matrix.
Applications
• Dimensionality Reduction: In techniques like Principal Component Analysis (PCA), SVD is used to reduce the dimensionality of data while preserving as much variance as possible.
• Data Compression: In image compression, SVD helps compress data by retaining only the most significant singular values, yielding a low-rank approximation of the original matrix.
• Noise Reduction: SVD can be used to filter out noise by reconstructing the matrix with only the largest singular values (see the sketch after this list).
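
A minimal NumPy sketch of low-rank reconstruction via SVD; the random matrix and the choice of rank k = 2 are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 6))                  # illustrative matrix

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Relative reconstruction error shrinks as k grows toward min(m, n)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))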

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a powerful dimensionality reduction technique used for visualizing high-dimensional data in a lower-dimensional space, typically 2D or 3D. It is particularly useful for exploring and understanding complex datasets, revealing patterns, clusters, and relationships that may not be evident in higher dimensions.

t-SNE is primarily designed for continuous numerical data. For categorical or mixed-type data, pre-processing and encoding are required, which might affect the quality of the results.
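
A minimal sketch with scikit-learn's TSNE; the digits dataset and the perplexity value are assumptions chosen for illustration:

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 64-dimensional digit images

# Embed into 2D; perplexity balances local vs. global structure
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE embedding of the digits dataset")
plt.show()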

Crowding Problem: In lower dimensions, t-SNE may struggle with the crowding problem, where the data points become too crowded and the representation may not capture global structures well.
