
Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a widely used unsupervised technique for dimensionality reduction, feature extraction, and data visualization. PCA transforms the original features of a dataset into new, uncorrelated variables called principal components. These components capture the maximum variance in the data with the fewest dimensions, allowing you to reduce dimensionality while retaining the essential structure of the data.

1. Why Use PCA?

 Reduce Dimensionality: When dealing with datasets that have many features, reducing
the number of dimensions simplifies the dataset without losing significant information.
 Remove Multicollinearity: PCA removes multicollinearity (high correlation between features) by transforming correlated variables into uncorrelated components.
 Data Visualization: PCA helps visualize high-dimensional data in 2D or 3D space,
making it easier to interpret patterns and relationships.
 Speed Up Algorithms: By reducing the number of features, PCA can improve the speed
and performance of machine learning algorithms.

2. How PCA Works

PCA finds new axes (principal components) in the data space such that:

1. The first principal component accounts for the maximum variance in the data.
2. The second principal component is orthogonal (uncorrelated) to the first and accounts for
the maximum remaining variance, and so on.

Steps in PCA:

1. Standardization of Data: PCA is sensitive to the scale of the data. Therefore, it is important to standardize the data so that each feature has zero mean and unit variance.
2. Covariance Matrix Calculation: The covariance matrix shows how the features vary with respect to each other. For a standardized data matrix X with n samples, it is calculated as C = (1/(n - 1)) X^T X.
3. Eigenvectors and Eigenvalues Calculation: The eigenvectors represent the directions
(principal components) in which the data varies the most. The eigenvalues tell you how
much variance is explained by each principal component.
4. Sorting Eigenvectors by Eigenvalues: The eigenvectors are sorted by their
corresponding eigenvalues in descending order. The top eigenvectors form the new axes.
5. Projecting Data onto Principal Components: The original dataset is transformed by projecting it onto the selected principal components to obtain the reduced-dimensional dataset (a NumPy sketch of these steps follows below).
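
Here is that sketch: a minimal NumPy illustration of the five steps. The function name pca_numpy and the random demo data are made up for this example; this is not a production implementation.

import numpy as np

def pca_numpy(X, n_components):
    # 1. Standardize: zero mean, unit variance for each feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized data
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigenvalues and eigenvectors (eigh works for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort eigenvectors by eigenvalue in descending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # 5. Project the data onto the top principal components
    X_reduced = X_std @ eigenvectors[:, :n_components]
    return X_reduced, eigenvalues

# Demo on random data: reduce 5 features to 2 components
X_demo = np.random.rand(100, 5)
X_reduced, eigenvalues = pca_numpy(X_demo, n_components=2)
print(X_reduced.shape)  # (100, 2)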

3. Mathematical Concepts Behind PCA

 Variance: The spread of the data along a particular dimension.
 Covariance: Measures how much two variables change together. If variables are highly correlated, their covariance is large.
 Eigenvectors and Eigenvalues: Eigenvectors define the directions of the new feature
space (principal components), and eigenvalues represent the magnitude of variance along
these directions.

Mathematically:

 Let X be the standardized data matrix with n samples and p features.
 The covariance matrix of X is C = (1/(n - 1)) X^T X.
 The eigenvectors v and their corresponding eigenvalues λ are obtained by solving C v = λ v, i.e. by finding the roots of det(C - λI) = 0 (a numerical check of these formulas appears below).
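
The check below uses a small random matrix that is standardized first; np.cov should match the formula above, and np.linalg.eigh should return pairs satisfying C v = λ v. This is only a sketch with arbitrary synthetic data.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize: zero mean, unit variance

C_manual = X.T @ X / (X.shape[0] - 1)      # C = (1/(n-1)) X^T X
print(np.allclose(C_manual, np.cov(X, rowvar=False)))   # True

eigenvalues, eigenvectors = np.linalg.eigh(C_manual)
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(C_manual @ v, lam * v))  # True: C v = lambda v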

4. Explained Variance and Choosing the Number of Components

Each principal component has an associated explained variance. This represents the amount of
the total dataset's variance that is captured by each component. The explained variance ratio is
used to decide how many components to retain.

 Explained Variance Ratio: The proportion of the total variance explained by each component, i.e. the component's eigenvalue divided by the sum of all eigenvalues.
 Cumulative Explained Variance: The sum of the explained variance ratios up to the n-th principal component. It helps decide how many components are necessary to retain a desired amount of variance (see the short snippet after this list).
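
A minimal sketch of both quantities, using hypothetical eigenvalues that are already sorted in descending order:

import numpy as np

eigenvalues = np.array([2.8, 0.25, 0.12, 0.05])           # hypothetical values
explained_variance_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_variance_ratio)

# Smallest number of components that retains at least 95% of the variance
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print(explained_variance_ratio)
print(cumulative)
print(n_keep)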

5. Practical Implementation of PCA with Python

Let's implement PCA using Python on a standard dataset, such as the Iris dataset.

Step-by-Step PCA Implementation

1. Import Necessary Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

2. Load the Dataset We’ll use the famous Iris dataset, which contains 150 samples with 4
features: sepal length, sepal width, petal length, and petal width.

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Target (class labels)

3. Standardize the Data Since PCA is sensitive to the scales of the features, we standardize
the dataset to have a mean of 0 and a variance of 1.

# Standardizing the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

4. Apply PCA We will apply PCA to reduce the dataset to 2 principal components for
visualization.

# Applying PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Explained variance ratio (how much variance is explained by each component)
print(pca.explained_variance_ratio_)
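
It can also help to inspect how each principal component is built from the original features. Each row of pca.components_ holds the weights (loadings) of one component on the four Iris features; the small table below is just one illustrative way to display them.

# Loadings: weight of each original feature in each principal component
loadings = pd.DataFrame(pca.components_,
                        columns=iris.feature_names,
                        index=['PC1', 'PC2'])
print(loadings)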
5. Visualize the Results We can now plot the data projected onto the first two principal
components to visualize how well PCA separates the classes.

# Plotting the PCA result
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k', s=100)
plt.title('PCA of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar()
plt.show()

6. Determine the Explained Variance It’s important to check how much variance is
captured by the principal components.

# Cumulative explained variance
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative_variance)

# Plot cumulative explained variance
plt.figure(figsize=(6, 4))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o', linestyle='--')
plt.title('Explained Variance by PCA Components')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance')
plt.grid(True)
plt.show()
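
Instead of fixing the number of components up front, n_components can also be given as a fraction between 0 and 1, in which case scikit-learn keeps the smallest number of components whose cumulative explained variance reaches that fraction. A short sketch, continuing with the scaled Iris data:

# Keep as many components as needed to explain at least 95% of the variance
pca_95 = PCA(n_components=0.95)
X_pca_95 = pca_95.fit_transform(X_scaled)
print(pca_95.n_components_)                    # number of components kept
print(pca_95.explained_variance_ratio_.sum())  # total variance retained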

6. Interpreting the Results

 Explained Variance Ratio: This shows how much variance each of the principal
components captures. For example, if the first two components explain 95% of the
variance, they are sufficient to represent the data.
 PCA Plot: The scatter plot shows the data points projected onto the first two principal
components. If PCA successfully separates the classes, different class clusters will be
visible in the plot.

7. Real-World Applications of PCA

1. Image Compression: In high-dimensional image datasets, PCA is used to reduce the number of pixels (features) without significant loss of image quality.
2. Noise Reduction: PCA is used to denoise datasets by removing components with low variance, which are likely to represent noise rather than meaningful data (see the sketch after this list).
3. Gene Expression Analysis: In bioinformatics, PCA is applied to gene expression data to
reduce thousands of genes into a smaller number of principal components that capture the
most important variations.
4. Finance: PCA is used in finance to reduce the dimensionality of datasets containing
many correlated financial instruments, such as stock prices.
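
For the noise-reduction case, here is a hedged sketch on synthetic data (the rank-2 signal, the noise level of 0.3, and the random seed are arbitrary choices for this example): fitting PCA with a small number of components and mapping back with inverse_transform discards the low-variance directions where most of the noise lives.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
signal = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 20))  # rank-2 signal, 20 features
noisy = signal + 0.3 * rng.standard_normal(signal.shape)               # add Gaussian noise

pca = PCA(n_components=2)
denoised = pca.inverse_transform(pca.fit_transform(noisy))

print("MSE before:", np.mean((noisy - signal) ** 2))
print("MSE after: ", np.mean((denoised - signal) ** 2))   # noticeably smaller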

8. Advantages and Limitations of PCA

Advantages:

 Dimensionality Reduction: Reduces the number of features while retaining most of the
information.
 Improved Performance: Reduces computation time and improves algorithm
performance.
 Reduces Multicollinearity: By creating new uncorrelated principal components, PCA
removes multicollinearity issues.

Limitations:

 Loss of Interpretability: Principal components are linear combinations of the original features, which may not have direct physical or intuitive interpretations.
 Linear Assumption: PCA only captures linear relationships between features. Nonlinear
relationships are not handled by PCA.
 Sensitive to Scaling: PCA is affected by the scale of the features, so standardization is
crucial.

Revision

PCA is a powerful tool for dimensionality reduction, especially when dealing with high-
dimensional data. By transforming the data into principal components, PCA helps simplify
models, reduce noise, and improve computation times, all while preserving the most important
features of the dataset. However, it is important to carefully consider its limitations, especially
regarding interpretability and the assumption of linearity.

Mathematical Worked Example


Principal Component Analysis (PCA) is a popular dimensionality reduction technique used to
simplify datasets by reducing the number of features while preserving as much variability
(information) as possible. Mathematically, PCA transforms the data into a new coordinate
system where the axes (called principal components) correspond to the directions of maximum
variance in the data.
Key Mathematical Concepts Behind PCA:

1. Variance: PCA seeks to maximize the variance captured by each successive principal component.
2. Covariance Matrix: This measures the pairwise covariance between features.
3. Eigenvalues and Eigenvectors: These are computed from the covariance matrix and
represent the magnitude and direction of the principal components, respectively.
4. Projection: Data is projected onto the new axes (principal components).

Let's go step by step through a simple numerical example to illustrate the mathematical concepts.

Step-by-Step Example:

Step 1: Data Preparation

Let's consider a simple 2D dataset with two features.

 Here, each row of the data matrix is a data point, and each column is a feature.

Step 2: Standardize the Data

PCA works best when the data is standardized, meaning each feature is centered around zero (i.e., the mean is subtracted from each feature). First, calculate the mean of each of the two features.

Now, subtract the corresponding mean from each feature to obtain the centered data matrix Xcentered.

Step 3: Compute the Covariance Matrix


Next, compute the covariance matrix, which captures the relationships between the features: C = (1/(n - 1)) Xcentered^T Xcentered, where n is the number of data points.

Step 4: Compute Eigenvalues and Eigenvectors

To find the principal components, we calculate the eigenvalues and eigenvectors of the covariance matrix C. Eigenvectors give us the directions (principal components), and eigenvalues give us the magnitude (importance) of these components.

The eigenvalues λ are the roots of the characteristic equation det(C - λI) = 0, which for a 2×2 covariance matrix reduces to a quadratic in λ.

Solving this quadratic equation gives the eigenvalues λ1 = 2.8 and λ2 = 0.025.

The larger eigenvalue (λ1 = 2.8) corresponds to the first principal component, which captures the most variance. The smaller eigenvalue (λ2 = 0.025) corresponds to the second principal component.

Next, we compute the eigenvectors by solving (C - λI)v = 0 for each eigenvalue; the resulting unit vectors point in the directions of the principal components. A quick numerical check of this step is sketched below.
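
The 2×2 matrix below is a hypothetical covariance matrix chosen only for illustration (it is not the matrix from the example above, so its eigenvalues differ from 2.8 and 0.025):

import numpy as np

# Hypothetical 2x2 covariance matrix [[a, b], [b, c]]
C = np.array([[2.5, 0.8],
              [0.8, 0.3]])

# Characteristic equation: lambda^2 - (a + c)*lambda + (a*c - b^2) = 0
a, b, c = C[0, 0], C[0, 1], C[1, 1]
roots = np.roots([1.0, -(a + c), a * c - b * b])

# np.linalg.eigh gives the same eigenvalues plus the eigenvectors (directions)
eigenvalues, eigenvectors = np.linalg.eigh(C)
print(np.sort(roots), np.sort(eigenvalues))   # identical up to rounding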

Step 5: Project Data onto Principal Components

Now, project the original data onto the principal components. To do this, multiply the centered
data matrix Xcentered by the eigenvectors:

Projection onto First Principal Component (Eigenvector 1):

We take the dot product of each row of the centered data matrix with the first eigenvector.

This gives the coordinate of each data point along the first principal component.

Projection onto Second Principal Component (Eigenvector 2):

Now, let's project the data onto the second principal component:

This gives the coordinate of each data point along the second principal component.

This will give us the new representation of the data in the principal component space.
Step 6: Final Results (Projected Data)

The final projected data lives in the space defined by the two principal components: each original point is now described by its coordinates along PC1 and PC2.

Step 7: Reduce Dimensionality

We can now reduce the dimensionality by keeping only the principal component(s) with the
largest eigenvalue(s). In this case, we might choose to keep just the first principal component
(corresponding to λ1) to reduce the 2D dataset to 1D while still capturing most of the variance.
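
The whole worked procedure can be reproduced in a few lines. The 2D data below is hypothetical (the example's original numbers are not shown above), so the intermediate values will differ, but the steps are exactly those of the example:

import numpy as np

# Hypothetical 2D dataset: each row is a data point, each column a feature
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7]])

# Center each feature by subtracting its mean
X_centered = X - X.mean(axis=0)

# Covariance matrix of the centered data
C = np.cov(X_centered, rowvar=False)

# Eigenvalues and eigenvectors, sorted in descending order of eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Project onto the first principal component only (2D -> 1D)
X_1d = X_centered @ eigenvectors[:, 0]

print("Covariance matrix:\n", C)
print("Eigenvalues:", eigenvalues)
print("1D projection:", X_1d)
print("Variance retained:", eigenvalues[0] / eigenvalues.sum())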

Revisiting the Results

In this example, PCA reduced the original 2D dataset to a 1D dataset by projecting the data onto
the first principal component. The key steps involved:

1. Centering the data.


2. Computing the covariance matrix.
3. Finding the eigenvalues and eigenvectors.
4. Projecting the data onto the principal components.

By keeping only the most important principal components, we can reduce the dimensionality of
the dataset while preserving most of its variance.

Note that the projection onto two principal components still results in two dimensions. The idea of dimensionality reduction with PCA is that you can choose how many dimensions (principal components) to keep based on the amount of variance each component explains.

In this example, we kept both principal components (PC1 and PC2), which keeps the data in 2D
space. If we want to reduce the dimensionality from 2D to 1D, we can choose to keep only the
first principal component (PC1), which explains the majority of the variance in the data.

Variance Explained by Each Principal Component:

From the previous steps, we calculated the eigenvalues:


 Eigenvalue 1 (λ1) = 2.8
 Eigenvalue 2 (λ2) = 0.025

These eigenvalues represent the amount of variance explained by each principal component. The
larger the eigenvalue, the more variance that principal component captures.

 The first principal component (PC1) explains much more variance (2.8) than the
second one (0.025), meaning it captures most of the important information in the data.

Reducing to 1D:

To reduce the data from 2D to 1D, we only keep PC1 (the first principal component) and ignore
PC2.

The projection onto the first principal component assigns each data point a single coordinate along PC1.

This is a 1D representation of the original 2D data, and it preserves most of the variance. By
using only the first principal component, we have successfully reduced the data from 2D to 1D.

Key Takeaway:

 In PCA, you reduce the dimensionality by choosing how many principal components to
keep. In this case, keeping just PC1 reduces the 2D data to 1D while still retaining most
of the variance in the dataset.
 If you keep both principal components, you stay in 2D. If you discard the second
principal component, you reduce the data to 1D.

This 1D projection captures most of the information, and you have effectively reduced the
dimensionality of the dataset.
