Module 5.2 Principal Component Analysis - V1
1. The data has 4 predictors and 5 instances (a small number, chosen for demonstration purposes). Representing each data instance therefore requires a 4-dimensional space. Is it possible to represent the same data in a lower-dimensional space (fewer than 4 dimensions) without loss of information?
2. Normalize the data by subtracting the mean of each dimension and dividing by the standard deviation of that dimension: (x - x_mean) / x_std
Normalized data
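The normalization step above can be sketched in NumPy; the 5 x 4 data matrix below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical data: 5 instances (rows) x 4 predictors (columns).
X = np.array([[2.5, 2.4, 1.2, 0.5],
              [0.5, 0.7, 0.9, 1.1],
              [2.2, 2.9, 1.0, 0.4],
              [1.9, 2.2, 1.4, 0.6],
              [3.1, 3.0, 1.1, 0.9]])

# Z-score normalization: subtract each column's mean and divide by its
# standard deviation, i.e. (x - x_mean) / x_std.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this step every column of X_norm has mean 0 and standard deviation 1, so no predictor dominates the covariance computation merely because of its scale.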
3. Find the covariance matrix of the normalized data matrix:
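A minimal sketch of this step, reusing the hypothetical normalized matrix from above (note that np.cov divides by n - 1 by default, so the diagonal is n/(n-1) rather than exactly 1):

```python
import numpy as np

# Hypothetical data: 5 instances x 4 predictors, then z-score normalized.
X = np.array([[2.5, 2.4, 1.2, 0.5],
              [0.5, 0.7, 0.9, 1.1],
              [2.2, 2.9, 1.0, 0.4],
              [1.9, 2.2, 1.4, 0.6],
              [3.1, 3.0, 1.1, 0.9]])
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix: rowvar=False because each column is a variable
# (predictor) and each row an observation.
C = np.cov(X_norm, rowvar=False)   # shape (4, 4), symmetric
```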
4. Get the eigenvalues and eigenvectors of the covariance matrix:
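Since the covariance matrix is symmetric, np.linalg.eigh is the appropriate routine; it returns eigenvalues in ascending order, so they are reordered descending here (the data matrix is again a hypothetical example):

```python
import numpy as np

# Hypothetical data, normalized, and its covariance matrix (as before).
X = np.array([[2.5, 2.4, 1.2, 0.5],
              [0.5, 0.7, 0.9, 1.1],
              [2.2, 2.9, 1.0, 0.4],
              [1.9, 2.2, 1.4, 0.6],
              [3.1, 3.0, 1.1, 0.9]])
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_norm, rowvar=False)

# eigh handles symmetric matrices; reorder so the largest eigenvalue
# (most variance) comes first.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]   # column i is the eigenvector for eigvals[i]
```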
5. Determine the number of eigenvectors to retain based on the explained variance, and transform the data in terms of the retained eigenvectors, which results in dimensionality reduction.
If only the first two eigenvectors are used, around 80% of the variance in the data is retained, and the dimensionality is reduced from 4 to 2 (a 50% reduction). In practice, the number of dimensions retained is usually chosen so that they capture about 95% of the variance in the data.
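The explained-variance criterion described above can be sketched as follows; the data matrix and the 95% threshold are illustrative assumptions:

```python
import numpy as np

# Hypothetical data, normalized; eigendecomposition as in the earlier steps.
X = np.array([[2.5, 2.4, 1.2, 0.5],
              [0.5, 0.7, 0.9, 1.1],
              [2.2, 2.9, 1.0, 0.4],
              [1.9, 2.2, 1.4, 0.6],
              [3.1, 3.0, 1.1, 0.9]])
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_norm, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
eigvals = np.sort(eigvals)[::-1]   # descending

# Fraction of total variance explained by each component, and the
# running total as components are added.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative explained variance
# reaches the chosen threshold (95% here).
k = int(np.searchsorted(cumulative, 0.95)) + 1
```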
To train a machine learning algorithm, the normalized train and test data are transformed with the 2 retained eigenvectors (considered here) as:
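A sketch of this final transformation, with hypothetical train and test matrices; the key point is that the test data must be normalized with the *training* mean and standard deviation before projection onto the top-2 eigenvectors:

```python
import numpy as np

# Hypothetical train (5 x 4) and test (1 x 4) data.
X_train = np.array([[2.5, 2.4, 1.2, 0.5],
                    [0.5, 0.7, 0.9, 1.1],
                    [2.2, 2.9, 1.0, 0.4],
                    [1.9, 2.2, 1.4, 0.6],
                    [3.1, 3.0, 1.1, 0.9]])
X_test = np.array([[2.0, 2.1, 1.1, 0.7]])

# Normalize both sets with statistics computed on the training data only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
Xtr = (X_train - mu) / sigma
Xte = (X_test - mu) / sigma

# Eigendecomposition of the training covariance, keep the top-2
# eigenvectors as the projection matrix W (shape 4 x 2).
C = np.cov(Xtr, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]

# Project: each instance goes from 4 dimensions down to 2.
Z_train = Xtr @ W
Z_test = Xte @ W
```

Fitting the normalization and the eigenvectors on the training data alone, then applying them unchanged to the test data, avoids leaking test-set information into the transformation.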