Principal Component Analysis For Data Science

Dimensionality reduction techniques like principal component analysis (PCA) can reduce the dimensionality of large datasets by transforming the data into a new coordinate system. The document demonstrates applying PCA to reduce the dimensions of economic trade data from multiple countries over years. PCA identifies the principal components that capture the most variance in the dataset and allows projecting the data onto a new 2D space for easier visualization and analysis.

Uploaded by

shivaybhargava33

Dimensionality Reduction

Most of the data is clustered in one area


Eigenvalue: magnitude of the variance captured along a new dimension
Eigenvector: direction of that new dimension (a slice through the data)
Multiple slices: one eigenvector per new dimension
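
The relationship between eigenvalues, eigenvectors, and the reduced data can be sketched with NumPy alone. This is a minimal illustration on made-up two-column data, not the data set used later in these notes; the variable names are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical toy data: 200 points stretched along one diagonal direction.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])

# Covariance matrix of the centered data (rows are samples, columns are features).
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]      # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]      # each column is one new axis

# Share of the total variance carried by each new axis.
explained_ratio = eigenvalues / eigenvalues.sum()

# Projecting onto the first eigenvector gives the data reduced to one dimension.
projected = X_centered @ eigenvectors[:, :1]
```

The variance of the projected column equals the largest eigenvalue, which is why the eigenvalue is read as the amount of information kept by that slice.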

PCA - Principal Component Analysis


Dimensionality Reduction (Country Data by Year)

Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os

Load Data Set


os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')
df = pd.read_csv('trans_us.csv', index_col = 0, thousands = ',')
df

Update Row and Column Headings


df.index.names = ['Country']
df.columns.names = ['Years']
df

Check for Null Values
df.isna().sum()
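
scikit-learn's PCA does not accept missing values, so any non-zero count from the check above has to be dealt with before fitting. The sketch below shows two common options on a small made-up table; the column names and numbers are hypothetical, and which option is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the country-by-year table; the values are made up.
toy = pd.DataFrame(
    {"1990": [10.0, np.nan, 30.0], "1991": [12.0, 22.0, 33.0]},
    index=pd.Index(["A", "B", "C"], name="Country"),
)

null_counts = toy.isna().sum()     # missing values per column

# Option 1: drop every row (country) that has a missing year.
dropped = toy.dropna(axis=0)

# Option 2: fill each gap with the mean of that column (year).
filled = toy.fillna(toy.mean())
```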

Create Model
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(df)

Transform the Data onto Two Principal Components


PCA1 = pca.transform(df)
pd.DataFrame(PCA1)

Convert output to Data Frame


PCA2 = pd.DataFrame(PCA1)
PCA2.index = df.index
PCA2.columns = ['PC1','PC2']
PCA2.head(50)
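
Once the scores are in a data frame with PC1 and PC2 columns, they can be drawn as a 2D scatter plot. The helper below is a hypothetical sketch, not part of the original notebook; it assumes a data frame shaped like PCA2 and is demonstrated on made-up numbers.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_components(scores, ax=None):
    """Scatter PC1 against PC2 and label each point with its index value."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(scores["PC1"], scores["PC2"])
    for label, row in scores.iterrows():
        ax.annotate(str(label), (row["PC1"], row["PC2"]), fontsize=8)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return ax

# Hypothetical scores standing in for PCA2; the numbers are made up.
demo = pd.DataFrame(
    {"PC1": [1.5, -0.7, -0.8], "PC2": [0.2, -0.9, 0.7]},
    index=pd.Index(["A", "B", "C"], name="Country"),
)
ax = plot_components(demo)
```

In the notebook, calling plot_components(PCA2) would label each point with its country name.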

Display the Explained Variance Ratio (fraction of total variance per component)


pd.DataFrame(pca.explained_variance_ratio_)
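
The values in explained_variance_ratio_ are fractions between 0 and 1, so they need to be multiplied by 100 to read as percentages. A cumulative column shows how much variance the first components keep together. The sketch below uses made-up data built from two hidden factors; the names toy, pca_toy, and summary are assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical data: 30 rows, 6 correlated columns built from 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(30, 2))
loadings = rng.normal(size=(2, 6))
toy = pd.DataFrame(factors @ loadings + 0.05 * rng.normal(size=(30, 6)))

pca_toy = PCA(n_components=2).fit(toy)

# Label each ratio and express it as a percentage, with a running total.
ratio = pd.Series(pca_toy.explained_variance_ratio_, index=["PC1", "PC2"])
summary = pd.DataFrame({
    "percent": (ratio * 100).round(2),
    "cumulative_percent": (ratio.cumsum() * 100).round(2),
})
```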

PCA with n_components = None


from sklearn.decomposition import PCA
pca = PCA()
pca.fit(df)

Transform and display data


PCA1 = pca.transform(df)
pd.DataFrame(PCA1)
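
With n_components left at None, PCA keeps every component, up to the smaller of the number of rows and the number of columns. The full set of variance ratios can then be used to decide how many components are worth keeping. The sketch below is one possible approach on made-up data; the 0.95 cutoff is an assumed value chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 40 rows, 8 columns driven by 3 hidden factors plus noise.
rng = np.random.default_rng(1)
toy = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(40, 8))

full = PCA().fit(toy)      # keeps min(n_samples, n_features) = 8 components
cumulative = np.cumsum(full.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the cutoff.
threshold = 0.95           # assumed cutoff, for illustration only
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

# scikit-learn also accepts a float between 0 and 1 as n_components
# and then selects the number of components itself.
auto = PCA(n_components=threshold).fit(toy)
```

Both routes should agree on the number of components; auto.n_components_ holds the count that scikit-learn selected.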
