Lab Assignment 7: Nishiv Singh (B20MT029)

This document summarizes two lab tasks on principal component analysis (PCA). In Task 1, PCA was applied to reduce the dimensionality of the MNIST handwritten-digit dataset from 784 to 149 components while retaining 95% of the variance. Task 2 developed a PCA class and applied it to a larger handwritten-digit dataset, visualizing how the reconstructed images sharpen as more components are included, leveling off around 300 components. Residual images and error metrics showed that 300 components capture over 90% of the variance.


Report

Lab Assignment 7
Nishiv Singh (B20MT029)

Google Colab notebook links:

Task 1:
https://colab.research.google.com/drive/1ApyHCr8cURrm1ZaTC1t5rBjCbRoR8_gV?usp=sharing

Task 2:
https://colab.research.google.com/drive/1Yf2VtE2CWPtW3wN-3jKJtW5J8LOymDTv?usp=sharing

Task 1:
> Downloaded the MNIST dataset in CSV format (the original source was not available online at the time) and loaded it from the file.
> Each row of the data holds the pixel values of a single image. Since there are 784 = 28*28 columns, an image can be visualized by reshaping its row to 28*28.
> Visualized the first image of the data, which is the digit 7, and saved it.
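The row-to-image reshaping can be sketched as follows. A random row stands in for the CSV data here (the real data would come from the downloaded file, with any label column dropped first):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                  # render off-screen
import matplotlib.pyplot as plt

# Stand-in for one CSV row of 784 pixel values.
row = np.random.default_rng(0).integers(0, 256, size=784)

image = row.reshape(28, 28)            # 784 pixels -> 28x28 grid
plt.imshow(image, cmap="gray")
plt.savefig("first_digit.png")         # save the figure, as the report does
```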
> Implemented PCA on the dataset using the sklearn library and plotted the number of components against the cumulative explained variance. The graph clearly shows that around 150 components capture 95% of the variance, which implies that keeping only about 150 components retains most of the information in the data while greatly reducing its dimensionality.
> Also calculated the number of principal components using the cumulative-variance method and got 149 components, which is quite close to the value read off the graph.
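The cumulative-variance calculation can be sketched with sklearn as below; random data stands in for the pixel matrix, so the resulting component count will differ from the report's 149:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data stands in for the (n_samples, 784) MNIST pixel matrix.
rng = np.random.default_rng(0)
X = rng.random((500, 100))

pca = PCA().fit(X)
cum_var = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance reaches 95%.
k = int(np.argmax(cum_var >= 0.95)) + 1

# sklearn can also pick k directly from a variance fraction:
pca_95 = PCA(n_components=0.95).fit(X)
print(k, pca_95.n_components_)
```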

Result and Observations:

PCA is a powerful tool for dimensionality reduction while keeping the data informative. The data here is not large, but in Task 2 the original data has dimension 70000*784. Working on data of that size is time-consuming, so applying PCA to reduce the columns from 784 to 149 is very helpful for further analysis.

Task 2:
> Created a class implementing PCA, with a fit_transform method to apply PCA, an inverse_transform method to map the data back to its original form and dimension, and some helper methods for useful information such as the feature matrix.
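A minimal version of such a class might look like the sketch below; the class and attribute names are illustrative, not the report's exact code:

```python
import numpy as np

class MyPCA:
    """Minimal PCA via eigendecomposition of the covariance matrix."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit_transform(self, X):
        # Center the data, then eigendecompose its covariance matrix.
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_
        cov = np.cov(Xc, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
        order = np.argsort(eigvals)[::-1]            # sort descending
        keep = order[: self.n_components]
        self.components_ = eigvecs[:, keep]          # (d, k) feature matrix
        self.explained_variance_ = eigvals[keep]
        return Xc @ self.components_                 # project onto components

    def inverse_transform(self, Z):
        # Map back to the original pixel space (a lossy reconstruction if k < d).
        return Z @ self.components_.T + self.mean_
```

With all components kept, inverse_transform reproduces the data exactly; with fewer, it yields the compressed reconstructions visualized in this task.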

> Loaded the dataset from the sklearn library. It has dimension 70000*784, with each column holding pixel values of the images.

> For further analysis, took a smaller subset of dimension 5000*784.

> Calculated the covariance matrix of the data and its eigenvectors and eigenvalues.

> Printed the covariance matrix and the first 5 eigenvalues.
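These quantities can be sketched with NumPy; a small random matrix stands in for the 5000*784 subset:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((500, 64))                # stand-in for the 5000x784 subset

Xc = X - X.mean(axis=0)                  # center the data first
cov = np.cov(Xc, rowvar=False)           # (64, 64) covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order for symmetric matrices
eigvals = eigvals[::-1]                  # largest first
eigvecs = eigvecs[:, ::-1]

print(cov.shape)
print(eigvals[:5])                       # first 5 (largest) eigenvalues
```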

> Applied PCA to the data with different numbers of principal components, ranging from 10 to 700.

> Visualized the first and second images after applying PCA alongside the originals. The images clearly become sharper as the number of components increases, and at around 300 components the image is already quite recognizable, so roughly 90-95% of the variance lies in the first 300-350 components.
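The reconstruction sweep can be sketched as follows; a small random matrix and smaller component counts stand in for the 5000*784 data and the 10-to-700 range:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.random((200, 64))    # stand-in; the report uses 5000x784 and k from 10 to 700

# Reconstruct the first image at increasing component counts.
recs = {}
for k in (5, 20, 64):
    pca = PCA(n_components=k).fit(X)
    recs[k] = pca.inverse_transform(pca.transform(X))[0]

# Each recs[k] would be reshaped to the image grid and plotted next to the
# original; the reconstruction sharpens as k grows.
```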

> Visualized the residual images (original minus reconstruction). With 10 components the residual closely resembles the original image, so 10 principal components miss a lot of information; with 300 components the residual is very faint, and at 700 it is barely visible.

> Also calculated the RMSE of the residual images: about 6.67 for 300 principal components, which is quite low compared with smaller component counts, and on the order of 10^-11 for 700 components, meaning that 700 components retain essentially all (99.9%+) of the information in the data.
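The residual RMSE described here can be sketched as below; random stand-in data is used, so the numbers will differ from the report's 6.67:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.random((200, 64))            # stand-in for the image matrix

def residual_rmse(X, k):
    """RMSE of the residual (original minus PCA reconstruction) with k components."""
    pca = PCA(n_components=k).fit(X)
    X_rec = pca.inverse_transform(pca.transform(X))
    return float(np.sqrt(np.mean((X - X_rec) ** 2)))

for k in (5, 30, 64):
    print(k, residual_rmse(X, k))    # RMSE falls toward 0 as k reaches full rank
```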

Result and Observations:

After applying PCA to the data, we can conclude that in many datasets only a limited number of components account for most of the variance. PCA lets us find those components and thereby reduce dimensionality without losing much information.
