
03 Principal Components Analysis

Principal Components Analysis (PCA) is a statistical method for reducing dimensionality while preserving variance by transforming original variables into uncorrelated principal components. The process involves standardization, covariance matrix computation, and calculating eigenvalues and eigenvectors to identify significant features. PCA is widely used in applications such as image compression, genomics, finance, and marketing.

Uploaded by meghanaalluri2
Copyright © All Rights Reserved

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variance as possible. PCA transforms the original variables into a new set of uncorrelated variables called principal components, ordered by the amount of original variance they explain.
Objectives of PCA
• Dimensionality Reduction: Reducing the number of variables in a dataset while retaining the essential information.
• Feature Extraction: Constructing new features (linear combinations of the original variables) that capture most of the variance in the data.
• Data Visualization: Simplifying the visualization of high-dimensional data by projecting it onto 2 or 3 principal components.
Mathematical Foundation of PCA
PCA involves the following steps:
• Standardization: Scaling the data so that each variable has a mean of zero and a standard deviation of one.
• Covariance Matrix Computation: Calculating the covariance matrix to understand the relationships between the variables.
• Eigenvalue and Eigenvector Computation: Decomposing the covariance matrix to find its eigenvalues and eigenvectors.
• Principal Components Calculation: Sorting the eigenvalues in descending order together with their corresponding eigenvectors. The eigenvectors with the largest eigenvalues are the principal components.
• Transformation: Projecting the standardized data onto the new set of principal components.
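The five steps above can be strung together in a short NumPy sketch. This is an illustrative implementation under stated assumptions, not the only way to compute PCA. It assumes that rows are observations and columns are variables, and it uses the sample (n − 1) convention for both the standard deviation and the covariance. The name `pca` is a hypothetical helper.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch (hypothetical helper) following the five steps.

    X: (n_samples, n_features) array; k: number of components to keep.
    Returns Z (n, k) projected data, W (p, k) top-k eigenvectors as columns,
    and all eigenvalues sorted in descending order.
    """
    X = np.asarray(X, dtype=float)
    # 1. Standardize each column to mean 0 and sample standard deviation 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data (p x p), n - 1 denominator.
    C = np.cov(Xs, rowvar=False)
    # 3. Eigendecomposition; eigh is intended for symmetric matrices like C.
    eigvals, eigvecs = np.linalg.eigh(C)
    # 4. Sort eigenvalues (and the matching eigenvector columns) descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Project the standardized data onto the first k eigenvectors.
    W = eigvecs[:, :k]
    Z = Xs @ W
    return Z, W, eigvals

# Illustrative random data with two strongly correlated columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=100)
Z, W, eigvals = pca(X, k=2)
print(Z.shape)                  # (100, 2)
print(eigvals / eigvals.sum())  # fraction of total variance per component
```

In practice, library implementations often compute the same components with a singular value decomposition instead of an explicit covariance matrix. The eigendecomposition route shown here simply mirrors the steps as listed.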
Key Concepts:
• Eigenvectors and Eigenvalues: Eigenvectors give the directions (axes) of the new feature space, while each eigenvalue equals the variance of the data along its eigenvector and so measures that component's importance.
• Variance Explained: The proportion of the dataset's total variance captured by each principal component, i.e. its eigenvalue divided by the sum of all eigenvalues.
• Scree Plot: A plot of the eigenvalues against component number that helps to determine how many principal components to retain.
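The explained-variance ratios, which are also the values a scree plot displays, follow directly from the sorted eigenvalues. The sketch below assumes the eigenvalues are already sorted in descending order. `explained_variance` is a hypothetical helper, and the eigenvalues are made-up example numbers.

```python
from itertools import accumulate

def explained_variance(eigvals):
    """Per-component and cumulative explained-variance ratios (hypothetical helper).

    Assumes eigvals are covariance-matrix eigenvalues sorted in descending order.
    """
    total = sum(eigvals)
    ratios = [lam / total for lam in eigvals]
    cumulative = list(accumulate(ratios))  # running sum of the ratios
    return ratios, cumulative

# Made-up eigenvalues from 4 standardized variables (they sum to 4).
ratios, cumulative = explained_variance([2.0, 1.0, 0.5, 0.5])
print(ratios)      # [0.5, 0.25, 0.125, 0.125]
print(cumulative)  # [0.5, 0.75, 0.875, 1.0]
```

A scree plot is these eigenvalues (or `ratios`) plotted against component number 1, 2, 3, …, so any plotting library can draw it from the same list.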
Steps in PCA
Step 1: Standardize the Data
$$X_{\text{std}} = \frac{X - \mu}{\sigma}$$
where X is the original data, μ is the mean, and σ is the standard deviation, each computed separately for every variable.
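In NumPy, the formula in Step 1 is one broadcast expression applied column by column. This sketch uses a small made-up data matrix and assumes the sample standard deviation (`ddof=1`). Libraries that use `ddof=0` differ only by a constant scale factor.

```python
import numpy as np

# Made-up data: 4 observations (rows) of 2 variables (columns).
X = np.array([[2.0, 10.0],
              [4.0, 20.0],
              [6.0, 30.0],
              [8.0, 40.0]])

mu = X.mean(axis=0)            # per-column means
sigma = X.std(axis=0, ddof=1)  # per-column sample standard deviations
X_std = (X - mu) / sigma       # broadcasts over every row (observation)

print(X_std.mean(axis=0))         # approximately [0, 0]
print(X_std.std(axis=0, ddof=1))  # approximately [1, 1]
```

Because the second column is just 5 times the first, both columns standardize to the same values. Standardization removes differences in scale, which is why PCA is usually run on standardized data when variables have different units.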
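Step 2: Compute the Covariance Matrix
$$C = \frac{1}{n-1} X_{\text{std}}^{\top} X_{\text{std}}$$
where $X_{\text{std}}$ is the standardized data matrix (n observations as rows, one column per variable) and C is the resulting covariance matrix.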
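Standardized data already has zero column means, so the covariance matrix in Step 2 reduces to a single matrix product. The sketch below uses random illustrative data and the n − 1 denominator. It checks the product against `np.cov` and shows that, for standardized variables, it coincides with the correlation matrix from `np.corrcoef`.

```python
import numpy as np

# Illustrative random data: 50 observations of 3 variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

n = X_std.shape[0]
# X_std is mean-centred, so this product is its (p x p) covariance matrix.
C = X_std.T @ X_std / (n - 1)

# Cross-check with NumPy's built-in (rowvar=False: columns are variables).
print(np.allclose(C, np.cov(X_std, rowvar=False)))   # True
# For standardized data, C equals the correlation matrix of the original X.
print(np.allclose(C, np.corrcoef(X, rowvar=False)))  # True
```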

Step 3: Compute Eigenvalues and Eigenvectors
$$C v = \lambda v$$
where C is the covariance matrix from Step 2, λ are the eigenvalues, and v are the eigenvectors.
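For a symmetric matrix such as C, NumPy's `np.linalg.eigh` returns real eigenvalues in ascending order and orthonormal eigenvectors as the columns of its second result. The 2×2 correlation matrix here is a made-up example. Its eigenvalues are 1 ± 0.8.

```python
import numpy as np

# Made-up correlation matrix of two standardized variables.
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# eigh: eigenvalues ascending; eigenvectors are the unit-length columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)  # approximately [0.2, 1.8]

for lam, v in zip(eigvals, eigvecs.T):  # iterate over columns of eigvecs
    # Each eigenpair satisfies C v = lambda v.
    assert np.allclose(C @ v, lam * v)
```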


Step 4: Sort Eigenvalues and Eigenvectors
Sort the eigenvalues in descending order and reorder the corresponding eigenvectors in the same way; these ordered eigenvectors are the principal components.
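Eigen-solvers such as `np.linalg.eigh` return eigenvalues in ascending order, so Step 4 amounts to one index permutation applied to both arrays. The key detail is that eigenvectors are stored as columns, so the columns, not the rows, must be reordered. The eigenvalues and identity-matrix "eigenvectors" below are placeholders chosen only to make the pairing easy to follow.

```python
import numpy as np

# Placeholder solver output: eigvecs[:, i] pairs with eigvals[i].
eigvals = np.array([0.5, 2.5, 1.0])
eigvecs = np.eye(3)  # unit vectors used only to track which column goes where

order = np.argsort(eigvals)[::-1]   # indices from largest to smallest eigenvalue
eigvals_sorted = eigvals[order]     # [2.5, 1.0, 0.5]
eigvecs_sorted = eigvecs[:, order]  # reorder columns to keep each pair together

k = 2
W = eigvecs_sorted[:, :k]  # columns are the first k principal components
```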
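Step 5: Project the Data
$$Z = X_{\text{std}} W$$
where Z is the transformed data, $X_{\text{std}}$ is the standardized data from Step 1, and W is the matrix whose columns are the top k principal components (eigenvectors) from Step 4.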
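The sketch below puts Steps 1 to 5 together and projects random illustrative data onto its first k = 2 principal components (k is an arbitrary choice here). It also checks the property stated in the introduction. The covariance matrix of Z is diagonal, so the new variables are uncorrelated, and each diagonal entry equals the corresponding eigenvalue.

```python
import numpy as np

# Illustrative data: 200 observations of 3 variables, with column 2 tied to column 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.5 * rng.normal(size=200)

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Step 1
C = np.cov(X_std, rowvar=False)                       # Step 2
eigvals, eigvecs = np.linalg.eigh(C)                  # Step 3
order = np.argsort(eigvals)[::-1]                     # Step 4
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
W = eigvecs[:, :k]  # (p, k) matrix of principal components
Z = X_std @ W       # Step 5: (n, k) transformed data

# Off-diagonal entries are ~0 (uncorrelated); the diagonal holds the top-k eigenvalues.
print(np.round(np.cov(Z, rowvar=False), 6))
```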
Choosing the Number of Principal Components
• Cumulative Variance: Select the smallest number of principal components whose combined explained variance reaches a high percentage (e.g., 95%) of the total variance.
• Scree Plot: Look for the "elbow" point where the eigenvalues start to level off.
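The cumulative-variance rule can be written as a small function that returns the smallest number of components meeting a threshold. `n_components_for` is a hypothetical helper and the eigenvalues are made-up. The 0.95 default mirrors the example percentage above.

```python
from itertools import accumulate

def n_components_for(eigvals, threshold=0.95):
    """Smallest k whose first k eigenvalues explain >= threshold of total variance.

    Hypothetical helper; assumes eigvals are sorted in descending order.
    """
    total = sum(eigvals)
    for k, cum in enumerate(accumulate(eigvals), start=1):
        if cum / total >= threshold:
            return k
    return len(eigvals)  # guard against floating-point shortfall at 100%

eigvals = [2.0, 1.0, 0.5, 0.5]  # made-up; cumulative shares 0.5, 0.75, 0.875, 1.0
print(n_components_for(eigvals, 0.80))  # 3
print(n_components_for(eigvals, 0.95))  # 4
```

The threshold is a judgment call. A scree-plot elbow and a cumulative-variance cutoff can suggest different values of k for the same data.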
Applications of PCA
• Image Compression: Representing images with fewer numbers than the original pixel values by keeping only the leading principal components.
• Genomics: Summarizing high-dimensional genetic data and highlighting the genes that contribute most to its variation.
• Finance: Reducing the number of financial indicators while maintaining the essential information.
• Marketing: Summarizing purchasing-behavior variables into a few components that can then be used to segment customers.
