Principal Component Analysis
Reduce Dimensionality: When dealing with datasets that have many features, reducing
the number of dimensions simplifies the dataset without losing significant information.
Remove Multicollinearity: PCA can remove multicollinearity (high correlation between
features) by transforming correlated variables into uncorrelated principal components.
Data Visualization: PCA helps visualize high-dimensional data in 2D or 3D space,
making it easier to interpret patterns and relationships.
Speed Up Algorithms: By reducing the number of features, PCA can improve the speed
and performance of machine learning algorithms.
PCA finds new axes (principal components) in the data space such that:
1. The first principal component accounts for the maximum variance in the data.
2. The second principal component is orthogonal (uncorrelated) to the first and accounts for
the maximum remaining variance, and so on.
Steps in PCA:
1. Standardization: Since PCA is sensitive to the scales of the features, the data is first standardized so that each feature has a mean of 0 and a variance of 1.
2. Covariance Matrix Calculation: The covariance matrix shows how the features vary with respect to each other. For the standardized data matrix X with n samples, it is calculated as C = (1/(n-1)) XᵀX, where entry C_ij is the covariance between features i and j.
3. Eigenvectors and Eigenvalues Calculation: The eigenvectors represent the directions
(principal components) in which the data varies the most. The eigenvalues tell you how
much variance is explained by each principal component.
4. Sorting Eigenvectors by Eigenvalues: The eigenvectors are sorted by their
corresponding eigenvalues in descending order. The top eigenvectors form the new axes.
5. Projecting Data onto Principal Components: The original dataset is transformed by
projecting it onto the selected principal components to obtain the reduced-dimensional
dataset.
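These five steps can be sketched directly in NumPy. The code below is a minimal illustration via eigendecomposition of the covariance matrix; the function name pca_from_scratch and the small demo matrix are made up for this sketch, and the scikit-learn implementation used later in this section wraps the same steps.
# A from-scratch sketch of steps 1-5 using NumPy
import numpy as np

def pca_from_scratch(X, n_components):
    # Step 1: center each feature (scaling to unit variance is omitted for brevity)
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the centered data
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigenvalues and eigenvectors (eigh is suited to symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 4: sort eigenvectors by eigenvalue, largest first
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 5: project the centered data onto the top components
    return X_centered @ eigenvectors[:, :n_components], eigenvalues

# Hypothetical 5 x 3 data matrix, purely to exercise the function
X_demo = np.array([[2.5, 2.4, 0.5],
                   [0.5, 0.7, 1.2],
                   [2.2, 2.9, 0.3],
                   [1.9, 2.2, 0.8],
                   [3.1, 3.0, 0.1]])
X_reduced, eigvals = pca_from_scratch(X_demo, n_components=2)
print(X_reduced.shape)   # (5, 2)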
Mathematically, each principal component has an associated explained variance, given by its eigenvalue. This represents the amount of the total dataset's variance that is captured by that component. The explained variance ratio of the i-th component is λi / (λ1 + λ2 + ... + λp), where p is the number of features, and it is used to decide how many components to retain.
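Continuing the NumPy sketch above, the explained variance ratios fall straight out of the eigenvalues it returns:
# Explained variance ratio of each component (eigvals comes from the sketch above)
ratios = eigvals / eigvals.sum()
print(ratios)   # the i-th entry is λi divided by the sum of all eigenvalues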
Let's implement PCA using Python on a standard dataset, such as the Iris dataset.
1. Import the Required Libraries We start by importing NumPy, pandas, Matplotlib, and the scikit-learn modules used below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
2. Load the Dataset We’ll use the famous Iris dataset, which contains 150 samples with 4
features: sepal length, sepal width, petal length, and petal width.
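A minimal sketch of this step (continuing from the imports above), producing the X and y arrays used in the later steps:
# Load the Iris dataset into a feature matrix and a target vector
iris = load_iris()
X = iris.data        # shape (150, 4)
y = iris.target      # class labels: 0, 1, 2
print(iris.feature_names)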
3. Standardize the Data Since PCA is sensitive to the scales of the features, we standardize
the dataset to have a mean of 0 and a variance of 1.
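A sketch of the standardization step, assuming the X array from step 2 and producing the X_scaled array used by the PCA code in step 4:
# Standardize features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0).round(6))   # approximately 0 for every feature
print(X_scaled.std(axis=0).round(6))    # approximately 1 for every feature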
4. Apply PCA We will apply PCA to reduce the dataset to 2 principal components for
visualization.
# Applying PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
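5. Visualize the Results We plot the data in the space of the first two principal components, coloring each point by its class. A minimal sketch, assuming the X_pca, y, and iris variables from the earlier steps:
# Scatter plot of the data projected onto the first two principal components
plt.figure(figsize=(6, 5))
for label, name in enumerate(iris.target_names):
    mask = (y == label)
    plt.scatter(X_pca[mask, 0], X_pca[mask, 1], label=name)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.legend()
plt.title("Iris data projected onto the first two principal components")
plt.show()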
6. Determine the Explained Variance It’s important to check how much variance is
captured by the principal components.
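A sketch of this check, using the pca object fitted in step 4 (for the standardized Iris data, the first two components together capture roughly 95-96% of the total variance):
# Fraction of the total variance captured by each retained component
print(pca.explained_variance_ratio_)
print("Total:", pca.explained_variance_ratio_.sum())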
Explained Variance Ratio: This shows how much variance each of the principal
components captures. For example, if the first two components explain 95% of the
variance, they are sufficient to represent the data.
PCA Plot: The scatter plot shows the data points projected onto the first two principal
components. If PCA successfully separates the classes, different class clusters will be
visible in the plot.
Advantages:
Dimensionality Reduction: Reduces the number of features while retaining most of the
information.
Improved Performance: Reduces computation time and improves algorithm
performance.
Reduces Multicollinearity: By creating new uncorrelated principal components, PCA
removes multicollinearity issues.
Limitations:
Loss of Interpretability: The principal components are linear combinations of the original features, so they are harder to interpret than the original variables.
Assumption of Linearity: PCA only captures linear relationships between features; non-linear structure in the data may be lost.
Sensitivity to Scaling: Because PCA is variance-based, features must be standardized first, or features with large scales will dominate the components.
Revision
PCA is a powerful tool for dimensionality reduction, especially when dealing with high-
dimensional data. By transforming the data into principal components, PCA helps simplify
models, reduce noise, and improve computation times, all while preserving the most important
features of the dataset. However, it is important to carefully consider its limitations, especially
regarding interpretability and the assumption of linearity.
Let's go step by step through a simple numerical example to illustrate the mathematical concepts.
Step-by-Step Example:
PCA works best when the data is standardized, meaning each feature is centered around zero: the mean of each feature is calculated and subtracted from every observation of that feature.
To find the principal components, we calculate the eigenvalues and eigenvectors of the
covariance matrix. Eigenvectors give us the directions (principal components), and eigenvalues
give us the magnitude (importance) of these components.
The larger eigenvalue (λ1=2.8) corresponds to the first principal component, which captures the most variance: 2.8 / (2.8 + 0.025) ≈ 0.99, or about 99% of the total. The smaller eigenvalue (λ2=0.025) corresponds to the second principal component.
Next, we compute the eigenvectors by solving (C - λI)v = 0 for each eigenvalue. These unit-length vectors indicate the directions of the principal components.
Now, project the original data onto the principal components. To do this, multiply the centered data matrix X_centered by the matrix of eigenvectors: the dot product of each row of the centered data with the first eigenvector gives that observation's coordinate along the first principal component, and the dot product with the second eigenvector gives its coordinate along the second. This yields the new representation of the data in the principal component space.
Step 6: Final Results (Projected Data)
The final projected data lives in the space defined by the two principal components: each observation is now described by its coordinates along PC1 and PC2.
We can now reduce the dimensionality by keeping only the principal component(s) with the
largest eigenvalue(s). In this case, we might choose to keep just the first principal component
(corresponding to λ1) to reduce the 2D dataset to 1D while still capturing most of the variance.
In this example, PCA reduced the original 2D dataset to a 1D dataset by projecting the data onto the first principal component. The key steps involved centering the data, computing the covariance matrix, finding its eigenvalues and eigenvectors, sorting them by eigenvalue, and projecting the centered data onto the leading eigenvector. By keeping only the most important principal components, we can reduce the dimensionality of the dataset while preserving most of its variance.
Note that the projection onto two principal components still results in two dimensions. The idea of dimensionality reduction with PCA is that you choose how many dimensions (principal components) to keep based on the amount of variance each component explains.
In this example, we kept both principal components (PC1 and PC2), which keeps the data in 2D
space. If we want to reduce the dimensionality from 2D to 1D, we can choose to keep only the
first principal component (PC1), which explains the majority of the variance in the data.
The eigenvalues (λ1=2.8 and λ2=0.025) represent the amount of variance explained by each principal component. The larger the eigenvalue, the more variance that principal component captures.
The first principal component (PC1) explains much more variance (2.8) than the
second one (0.025), meaning it captures most of the important information in the data.
Reducing to 1D:
To reduce the data from 2D to 1D, we only keep PC1 (the first principal component) and ignore
PC2.
The result is a 1D representation of the original 2D data, and it preserves most of the variance. By
using only the first principal component, we have successfully reduced the data from 2D to 1D.
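As a minimal sketch of these mechanics, the code below runs the same 2D-to-1D pipeline in NumPy on a small hypothetical dataset; the numbers (and therefore the printed eigenvalues) are illustrative only and will not match λ1=2.8 and λ2=0.025 from the example above.
import numpy as np

# Hypothetical 2D dataset: 6 observations of 2 correlated features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7]])

# Center the data
X_centered = X - X.mean(axis=0)

# Covariance matrix and its eigendecomposition
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort by eigenvalue, descending, so that column 0 is PC1
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
print("Eigenvalues:", eigenvalues)

# Keep only PC1: project the 2D data onto a single axis (2D -> 1D)
pc1 = eigenvectors[:, 0]
X_1d = X_centered @ pc1   # shape (6,)
print("1D projection:", X_1d)
Keeping both columns of eigenvectors instead of only pc1 would leave the data in 2D, exactly as discussed above.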
Key Takeaway:
In PCA, you reduce the dimensionality by choosing how many principal components to
keep. In this case, keeping just PC1 reduces the 2D data to 1D while still retaining most
of the variance in the dataset.
If you keep both principal components, you stay in 2D. If you discard the second
principal component, you reduce the data to 1D.
This 1D projection captures most of the information, and you have effectively reduced the
dimensionality of the dataset.