MAT 211: Lecture 07: Principal Component Analysis (PCA)
Ashiribo Senapon WUSU (Ph.D.)

Principal Component Analysis (PCA) was developed by Karl Pearson and Harold Hotelling to identify patterns in high-dimensional data and is widely used in fields such as psychology and machine learning. PCA reduces dimensionality while retaining important information, enhances data visualization, and improves computational efficiency, but it assumes linear relationships and is sensitive to outliers. Future directions include non-linear techniques and deep learning for feature extraction.

Historical Context of PCA

• PCA was introduced by Karl Pearson in 1901 as a technique for identifying patterns in high-dimensional data.
• Harold Hotelling further developed PCA in 1933 for multivariate analysis.
• Widely applied in psychology, economics, and machine learning for feature reduction.

Real-World Applications of PCA

• Image compression and recognition (e.g., facial recognition).
• Financial data analysis for reducing asset correlations.
• Genomics for identifying key gene expressions.
• Anomaly detection in cybersecurity systems.

Standardizing Data for PCA

• PCA is sensitive to the scales of the input features.
• Standardization involves transforming data to have mean 0 and variance 1:

  z = (x − µ) / σ

• Essential step to ensure all features contribute equally (see the sketch below).

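As a minimal sketch (the numbers here are made up purely for illustration), the standardization step can be carried out directly in NumPy; the same transformation is also available as sklearn.preprocessing.StandardScaler:

import numpy as np

# Toy data: rows are samples, columns are two features on different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# Standardize each feature: subtract its mean, divide by its standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]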
Example: Computing Covariance Matrix Manually

• Consider two variables X and Y :

X = [1, 2, 3, 4, 5], Y = [2, 4, 6, 8, 10]

• Compute means:
X̄ = 3, Ȳ = 6
• Covariance (sample covariance, divided by n − 1; a short numerical check follows below):

  Cov(X, Y) = (1 / (n − 1)) Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) = 5

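A short numerical check of this computation with NumPy (note that np.cov uses the same n − 1 denominator):

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 6, 8, 10])

# Sample covariance computed from the definition (divide by n - 1)
cov_manual = np.sum((X - X.mean()) * (Y - Y.mean())) / (len(X) - 1)
print(cov_manual)          # 5.0

# np.cov returns the full 2x2 covariance matrix; the off-diagonal entry agrees
print(np.cov(X, Y)[0, 1])  # 5.0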
Python Code: Visualizing PCA Components

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load dataset
iris = load_iris()
X = iris.data

# Apply PCA, keeping the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Visualize PCA results, colouring points by class label
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=iris.target)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('PCA on Iris Dataset')
plt.colorbar()
plt.show()

Interpreting Principal Components

• Each principal component is a linear combination of the original features.
• The weight (loading) of each feature indicates its contribution to the component.
• High loadings suggest a strong influence on that principal component (see the sketch below).

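Continuing with the Iris example above, one way to inspect the loadings is through the components_ attribute of a fitted PCA object:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

# Each row of components_ holds the loadings of one principal component
for i, loadings in enumerate(pca.components_):
    print(f"PC{i + 1}")
    for name, weight in zip(iris.feature_names, loadings):
        print(f"  {name}: {weight:.3f}")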
Visualizing Feature Contributions

• Loadings help identify which features contribute most to the variance.
• Visual representation using bar plots for each principal component, as sketched below.

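One possible way to produce such bar plots with Matplotlib, again for the Iris data (the figure layout here is only illustrative):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

# One bar plot of feature loadings per principal component
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for i, ax in enumerate(axes):
    ax.bar(iris.feature_names, pca.components_[i])
    ax.set_title(f'Loadings of PC{i + 1}')
    ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()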
Application: Image Compression using PCA

• PCA can reduce image size while retaining key features.
• Each image is treated as a point in a high-dimensional space, with its pixels flattened into a single vector.
• PCA reduces the dimensionality, storing only the most significant components.

Hands-On Activity: PCA on Image Data

• Use PCA to compress and reconstruct an image.
• Compare the original image with the PCA-reduced version.
• Analyze the quality retention with different numbers of components (a short sketch of this follows the code below).

Python Code: PCA Image Compression

from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

# Load digits dataset (8x8 grayscale images, 64 pixels per image)
digits = load_digits()
X = digits.data

# Apply PCA and reconstruct the data from the reduced representation
pca = PCA(64)
X_reduced = pca.fit_transform(X)
X_restored = pca.inverse_transform(X_reduced)

# Visualize original and reconstructed images side by side
plt.subplot(1, 2, 1)
plt.imshow(digits.images[0], cmap='gray')
plt.title('Original Image')

plt.subplot(1, 2, 2)
plt.imshow(X_restored[0].reshape(8, 8), cmap='gray')
plt.title('Restored Image')
plt.show()
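
As a follow-up to the hands-on activity, a small sketch of how reconstruction quality varies with the number of retained components (the chosen component counts are arbitrary):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data

# Compare retained variance and reconstruction error for several component counts
for k in (5, 10, 20, 40, 64):
    pca = PCA(n_components=k)
    X_restored = pca.inverse_transform(pca.fit_transform(X))
    mse = np.mean((X - X_restored) ** 2)
    print(f"{k:2d} components: variance retained = "
          f"{pca.explained_variance_ratio_.sum():.3f}, MSE = {mse:.2f}")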
Benefits of PCA

• Reduces dimensionality while retaining the most important information.
• Enhances data visualization.
• Improves computational efficiency.
• Helps in noise reduction and feature extraction.

Challenges with PCA

• PCA assumes linear relationships between variables.
• Sensitive to outliers and noise in the data.
• May lose interpretability of the transformed features.
• Requires feature scaling before application (illustrated in the sketch below).

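A small illustration of the scaling issue (synthetic data; the numbers are chosen only to exaggerate the effect): when one feature has a much larger scale, it dominates the first principal component unless the data are standardized first.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two independent features on very different scales
X = np.column_stack([rng.normal(size=200), 1000 * rng.normal(size=200)])

# Without scaling, the first PC points almost entirely along the large-scale feature
print(PCA(n_components=1).fit(X).components_)         # roughly [[0, 1]]

# After standardization, both features receive comparable weight
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).components_)  # roughly [[0.71, ±0.71]]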
Future Directions in Dimensionality Reduction

• Non-linear dimensionality reduction techniques:
  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • Uniform Manifold Approximation and Projection (UMAP)
• Deep learning approaches for feature extraction.

Conclusion

• PCA is a powerful tool for simplifying high-dimensional data.
• Understanding eigenvalues, eigenvectors, and SVD is essential (a brief sketch relating them to PCA follows below).
• Hands-on implementation enhances learning and practical understanding.

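To connect the last point to computation, a brief sketch showing that the eigenvalues of the covariance matrix match the per-component variances reported by scikit-learn's PCA (the Iris data is used only for concreteness):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
Xc = X - X.mean(axis=0)                    # centre the data

# Eigen-decomposition of the sample covariance matrix;
# the columns of eigvecs are the principal directions
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals = eigvals[::-1]                    # sort eigenvalues in decreasing order

# The sorted eigenvalues equal the variances explained by each principal component
pca = PCA().fit(X)
print(np.allclose(eigvals, pca.explained_variance_))   # True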