Principal Component Analysis
Principal Component Analysis (PCA) is a statistical technique for reducing the dimensionality of
high-dimensional data while retaining its main trends and patterns. It does so by transforming the
data into a new coordinate system in which the direction of greatest variance lies along the first
coordinate (the first principal component), the direction of second greatest variance along the
second coordinate, and so on.
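1. Standardize the Data: If variables are measured on different scales, standardize each variable
to have zero mean and unit variance before applying PCA.
z = (x - μ) / σ
Where μ and σ are the mean and standard deviation of the variable.
2. Covariance Matrix: Compute the covariance matrix of the standardized data.
C = (1 / (n - 1)) Z^T Z
Where Z is the standardized data matrix and “n” is the number of observations.
3. Eigenvectors and Eigenvalues: Compute the eigenvectors and eigenvalues of the
covariance matrix. Each eigenvector gives the direction of a principal component, and its
eigenvalue gives the variance of the data along that direction.
C v = λ v
4. Transform Data: Project the standardized data onto the new coordinate system
defined by the principal components.
Y = ZV
Where V is the matrix whose columns are the eigenvectors with the largest eigenvalues.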
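The steps above can be sketched directly in NumPy. This is a minimal illustration, not the implementation used by scikit-learn: the function name pca_from_scratch, the synthetic data, and the choice of the sample standard deviation (ddof=1, to match a 1 / (n - 1) covariance divisor) are all assumptions made for the example. Note that StandardScaler divides by the population standard deviation instead, so its scaled values differ by a constant factor.

```python
import numpy as np


def pca_from_scratch(X, k):
    """Project X (n observations x p variables) onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Step 1: standardize each column to zero mean and unit variance.
    # ddof=1 is an assumption chosen to match the (n - 1) divisor below.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the standardized data.
    C = (Z.T @ Z) / (n - 1)

    # Step 3: eigendecomposition. eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order, so reorder them to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    V = eigenvectors[:, order[:k]]

    # Step 4: project the standardized data onto the top-k eigenvectors.
    Y = Z @ V

    # Fraction of the total variance captured by each retained component.
    explained_ratio = eigenvalues[:k] / eigenvalues.sum()
    return Y, V, explained_ratio


# Synthetic data (hypothetical): two strongly correlated columns plus one
# independent column, so the first component should dominate.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

Y, V, explained_ratio = pca_from_scratch(X, k=2)
print(Y.shape)           # (200, 2)
print(explained_ratio)   # variance share of each retained component
```

The signs of the columns of V are arbitrary, so projections computed this way may be mirrored relative to those from another PCA routine while describing the same components.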
Code Implementation Of PCA
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
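# Assuming your dataset has a column named 'SOC'
# Standardize the data (excluding the 'SOC' column)
# errors='ignore' leaves the DataFrame unchanged if 'SOC' is already absent
features = final_df_concat.drop(columns=['SOC'], errors='ignore')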
scaler = StandardScaler()
scaled_data = scaler.fit_transform(features)
# Apply PCA
pca = PCA(n_components=3)  # Reducing to 3 components for 3D visualization
principal_components = pca.fit_transform(scaled_data)