Principal Components Analysis (PCA) is a statistical method for reducing dimensionality while preserving as much variance as possible, by transforming the original variables into uncorrelated principal components. The process involves standardizing the data, computing the covariance matrix, and calculating its eigenvalues and eigenvectors to identify the directions of greatest variance. PCA is widely used in applications such as image compression, genomics, finance, and marketing.
03 Principal Components Analysis
Principal Components Analysis (PCA)
Principal Components Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variance as possible. PCA transforms the original variables into a new set of uncorrelated variables called principal components, ordered by the amount of the original variance they explain.
Objectives of PCA
Dimensionality Reduction: Reducing the number of variables in a dataset while retaining the essential information.
Feature Extraction: Constructing new features (the principal components) as linear combinations of the original variables that capture most of the variance in the data.
Data Visualization: Simplifying the visualization of high-dimensional data by projecting it onto the first 2 or 3 principal components.
Mathematical Foundation of PCA
PCA involves the following steps:
Standardization: Scaling the data so that each variable has a mean of zero and a standard deviation of one.
Covariance Matrix Computation: Calculating the covariance matrix to capture how the variables vary together.
Eigenvalue and Eigenvector Computation: Performing an eigendecomposition of the covariance matrix to obtain its eigenvalues and eigenvectors.
Principal Components Calculation: Sorting the eigenvalues in descending order, together with their corresponding eigenvectors. The eigenvectors with the largest eigenvalues are the principal components.
Transformation: Projecting the standardized data onto the selected principal components.
Key Concepts
Eigenvectors and Eigenvalues: Eigenvectors define the directions of the new feature space. Each eigenvalue equals the variance of the data along its eigenvector and therefore measures that component's importance.
Variance Explained: The proportion of the dataset's total variance captured by each principal component. It is equal to that component's eigenvalue divided by the sum of all eigenvalues.
Scree Plot: A plot of the eigenvalues in descending order against component number. It is used to help decide how many principal components to retain.
Steps in PCA
Step 1: Standardize the Data
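$$X_{\text{std}} = \frac{X - \mu}{\sigma}$$
where X is the original data, μ is the mean of each variable, and σ is the standard deviation of each variable.
Step 2: Compute the Covariance Matrix
$$C = \frac{1}{n-1} X_{\text{std}}^{\top} X_{\text{std}}$$
where n is the number of observations (rows of X_std).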
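Steps 1 and 2 can be sketched in Python with NumPy. This is a minimal illustration and not code from the original notes. The helper names `standardize` and `covariance_matrix` and the small data matrix are hypothetical. The sketch also assumes the sample (n - 1) convention for both the standard deviation and the covariance.

```python
import numpy as np


def standardize(X):
    """Step 1 (hypothetical helper): scale each column to mean 0, sample std 1."""
    mu = X.mean(axis=0)               # mean of each variable
    sigma = X.std(axis=0, ddof=1)     # sample standard deviation of each variable
    return (X - mu) / sigma


def covariance_matrix(X_std):
    """Step 2 (hypothetical helper): C = X_std^T X_std / (n - 1) for centred data."""
    n = X_std.shape[0]                # number of observations (rows)
    return X_std.T @ X_std / (n - 1)


# Made-up data: 5 observations (rows) of 3 variables (columns).
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.2],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.5],
])

X_std = standardize(X)
C = covariance_matrix(X_std)
print(np.round(C, 3))
```

Under this convention, the covariance matrix of the standardized data is the correlation matrix of the original data, so every diagonal entry equals 1.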
Step 3: Compute Eigenvalues and Eigenvectors
$$C v = \lambda v$$
where λ are the eigenvalues and v are the eigenvectors of the covariance matrix C.
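For Step 3, a symmetric matrix such as a covariance matrix can be decomposed with NumPy's `np.linalg.eigh`. The sketch below repeats Steps 1 and 2 in condensed form so that it runs on its own. The data are made up for illustration. Note that `eigh` returns eigenvalues in ascending order, which is one reason a separate sorting step is needed.

```python
import numpy as np

# Illustrative data: 6 observations (rows) of 3 variables (columns).
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.0],
    [1.9, 2.2, 1.4],
    [3.1, 3.0, 0.8],
    [2.3, 2.7, 1.1],
])

# Steps 1-2, condensed: standardize the columns, then form the covariance matrix.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(X_std, rowvar=False)

# Step 3: solve C v = lambda v. Because C is symmetric, np.linalg.eigh applies.
# It returns real eigenvalues in ascending order and unit-length eigenvectors
# stored as the COLUMNS of `eigenvectors`.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Check the defining equation for each eigenpair (lambda_i, v_i).
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(C @ v, lam * v)

print("eigenvalues:", np.round(eigenvalues, 4))
```

The eigenvalues sum to the trace of C. For standardized data, that trace is simply the number of variables.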
Step 4: Sort Eigenvalues and Eigenvectors
Sort the eigenvalues in descending order and reorder the corresponding eigenvectors to match. Taken in this order, the eigenvectors are the principal components.
Step 5: Project the Data
$$Z = X_{\text{std}} W$$
where Z is the transformed data, X_std is the standardized data from Step 1, and W is the matrix whose columns are the eigenvectors of the k retained principal components.
5. Choosing the Number of Principal Components
Cumulative Variance: Keep the smallest number of principal components whose combined explained variance reaches a chosen threshold (e.g., 95% of the total variance).
Scree Plot: Look for the "elbow" point where the eigenvalues start to level off.
6. Applications of PCA
Image Compression: Representing an image with a small number of principal-component coefficients instead of all of its pixel values.
Genomics: Reducing high-dimensional genetic data to a few components and identifying the genes that contribute most to them.
Finance: Reducing a large set of correlated financial indicators to a few components while maintaining the essential information.
Marketing: Reducing customer purchasing-behavior variables to a few components, which can then be used to segment customers.
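The five steps and the cumulative-variance rule for choosing the number of components can be combined into one function. This is a minimal sketch that assumes NumPy. The function name `pca`, its `variance_threshold` parameter (defaulting to the 95% example) and the synthetic data are illustrative choices; they are not part of the original notes.

```python
import numpy as np


def pca(X, variance_threshold=0.95):
    """Minimal PCA sketch (hypothetical helper): returns (Z, W, explained_ratio)."""
    # Step 1: standardize each variable (column) to mean 0, sample std 1.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized variables.
    C = np.cov(X_std, rowvar=False)
    # Step 3: eigendecomposition; eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # Step 4: sort eigenvalues descending and reorder eigenvector columns to match.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Variance explained by each component, and the running total.
    explained_ratio = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(explained_ratio)
    # Smallest k whose cumulative explained variance reaches the threshold
    # (capped at the number of variables to guard against rounding).
    k = min(int(np.searchsorted(cumulative, variance_threshold)) + 1, len(eigenvalues))
    # Step 5: project onto the top-k principal components, Z = X_std W.
    W = eigenvectors[:, :k]
    Z = X_std @ W
    return Z, W, explained_ratio


# Synthetic data: 200 observations of 4 variables driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 4))

Z, W, ratio = pca(X)
print("components kept:", W.shape[1])
print("explained variance ratio:", np.round(ratio, 3))
```

Eigenvector signs are arbitrary, so other implementations may return components and projected scores with flipped signs. In scikit-learn, `sklearn.decomposition.PCA(n_components=0.95)` applies a similar variance-threshold rule. However, it only centres the data, so standardization would have to be done separately (for example with `StandardScaler`).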