Feature Exploration PCA MNIST

This document outlines an experiment using Principal Component Analysis (PCA) on the MNIST dataset to reduce dimensionality and visualize handwritten digit images. The goal is to extract meaningful features, visualize them in a 2D space, and understand the variance captured by principal components. The results indicate effective feature extraction, with clusters of similar digits observed in the scatter plot.

Uploaded by

editorvar4444

Feature Exploration using PCA on MNIST Dataset

Objective:

The objective of this experiment is to:

- Perform feature exploration using Principal Component Analysis (PCA) on the MNIST dataset.

- Reduce the dimensionality of handwritten digit images while preserving essential features.

- Visualize the reduced features to understand patterns and clusters in the data.

Application Domain:

Feature exploration using PCA is widely used in:

- Computer Vision: Reducing image dimensions for efficient classification and clustering.

- Data Preprocessing: Improving model performance by reducing noise and redundancy.

- Finance: Analyzing trends and patterns in stock market data.

- Healthcare: Identifying clusters in medical images or patient data.

Target:

The target of this lab is to:

- Extract meaningful features from the MNIST dataset using PCA.

- Visualize the extracted features in a 2D space.

- Understand the variance captured by principal components.

Dataset:

- Dataset Used: MNIST Handwritten Digits Dataset

- Description: MNIST consists of 70,000 grayscale images of handwritten digits (0-9), split into 60,000 training images and 10,000 test images. Each image is 28x28 pixels.

- Classes: 10 classes (Digits 0 to 9).


- Source: The dataset is available in TensorFlow/Keras and can be directly loaded using the Keras datasets module.

Dataset Loading Code:

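from tensorflow.keras.datasets import mnist

# Load the predefined train/test split (images are 28x28 uint8 arrays).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# Flatten each 28x28 image into a 784-element vector.
x_train_flat = x_train.reshape((x_train.shape[0], -1))
x_test_flat = x_test.reshape((x_test.shape[0], -1))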

Description:

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving the maximum variance. In this experiment, we will:

- Flatten the 2D images into 1D vectors.

- Apply PCA to reduce dimensions to 2 principal components.

- Visualize the reduced features in a 2D scatter plot to observe clusters and patterns.
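To make the steps listed above concrete, here is a minimal sketch of what PCA computes, written with NumPy's SVD rather than scikit-learn. The helper names `pca_fit` and `pca_transform` are hypothetical, and random arrays stand in for MNIST so the block runs without downloading anything. It is an illustration under those assumptions, not the implementation used in this lab.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA on the rows of X; return (mean, components, explained_variance_ratio)."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    return mean, Vt[:n_components], ratio[:n_components]


def pca_transform(X, mean, components):
    """Project the rows of X onto the fitted principal directions."""
    return (X - mean) @ components.T


# Synthetic stand-in for MNIST (assumption): 200 random 28x28 "images" in [0, 1).
rng = np.random.default_rng(0)
images = rng.random((200, 28, 28))
flat = images.reshape(images.shape[0], -1)  # flatten to shape (200, 784)

mean, components, ratio = pca_fit(flat, n_components=2)
projected = pca_transform(flat, mean, components)

print(flat.shape, projected.shape)
print("Explained variance ratio:", ratio)
```

Because `pca_fit` returns the mean and the components separately, the same projection can be applied to data that was not used for fitting, for example fitting on training images and then calling `pca_transform` on test images.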

Implementation:

1. Import Libraries:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from sklearn.decomposition import PCA

2. Load and Preprocess Dataset:


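# Load the predefined train/test split (images are 28x28 uint8 arrays).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# Flatten each 28x28 image into a 784-element vector.
x_train_flat = x_train.reshape((x_train.shape[0], -1))
x_test_flat = x_test.reshape((x_test.shape[0], -1))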

3. Apply PCA for Feature Extraction:

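# Fit PCA on the flattened test images and project them onto two components.
pca = PCA(n_components=2)
x_test_pca = pca.fit_transform(x_test_flat)
# Fraction of the total variance captured by each component.
print("Explained variance ratio:", pca.explained_variance_ratio_)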

4. Visualize Reduced Features:

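# Color each projected test image by its digit label (0-9).
scatter = plt.scatter(x_test_pca[:, 0], x_test_pca[:, 1], c=y_test, cmap='tab10', s=5)
# The colorbar maps colors back to digit labels.
plt.colorbar(scatter, label='Digit')
plt.title('Feature Visualization using PCA on MNIST')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()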
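As a numeric complement to the scatter plot, one option is to compute the centroid of each class in the 2D projection and the distances between centroids. The sketch below assumes synthetic stand-ins: `points` plays the role of `x_test_pca` and `labels` the role of `y_test`, with three artificial classes instead of ten digits.

```python
import numpy as np

# Synthetic stand-ins (assumptions): three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 50)
points = centers[labels] + rng.normal(scale=0.5, size=(150, 2))

# Mean position of each class in the 2D projection.
classes = np.unique(labels)
centroids = np.array([points[labels == c].mean(axis=0) for c in classes])

# Pairwise distances between centroids: small values flag classes that overlap.
diffs = centroids[:, None, :] - centroids[None, :, :]
distances = np.linalg.norm(diffs, axis=-1)

for c, centroid in zip(classes, centroids):
    print(f"class {c}: centroid = {np.round(centroid, 2)}")
print(np.round(distances, 2))
```

With the real projection, substituting `x_test_pca` for `points` and `y_test` for `labels` would give a 10x10 distance matrix, in which the smallest off-diagonal entries point to the digit pairs that are hardest to tell apart in two dimensions.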

Output (Complete):

- Explained Variance Ratio: Reports the fraction of the total variance captured by each of the two principal components.

Example Output:

Explained variance ratio: [0.097, 0.084]

- Feature Visualization: The scatter plot shows clusters of digits in the reduced feature space. Digits with similar shapes (e.g., 0s and 6s) fall close together, which suggests that the two components capture meaningful structure even though neighboring clusters can overlap.
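The example explained variance ratios above can be summed to see how much of the total variance the 2D projection keeps. This short calculation uses only the two values from the example output:

```python
# Values copied from the example output above.
explained_variance_ratio = [0.097, 0.084]

retained = sum(explained_variance_ratio)  # fraction kept by the 2D projection
discarded = 1.0 - retained                # fraction lost

print(f"Retained:  {retained:.1%}")   # Retained:  18.1%
print(f"Discarded: {discarded:.1%}")  # Discarded: 81.9%
```

In this example the two components keep roughly 18% of the variance, so the scatter plot is a coarse summary of the data rather than a near-lossless representation.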

Conclusion:

- PCA reduces each 784-dimensional MNIST image to two components while preserving the directions of greatest variance, although two components retain only part of the total variance.

- The clusters observed in the 2D scatter plot indicate that digits with similar shapes are grouped together, showing the capability of PCA in feature exploration.

- This experiment demonstrates the usefulness of PCA for unsupervised feature extraction and visualization.

Future Enhancements:

- Increase the number of principal components to capture more variance.

- Experiment with other dimensionality reduction techniques like t-SNE or UMAP for better visualization.

- Apply this method to other image datasets such as CIFAR-10 or Fashion MNIST.
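For the first enhancement, one way to pick the number of components is to take the smallest count whose cumulative explained variance reaches a chosen threshold. The sketch below assumes a 95% threshold, a hypothetical helper named `components_for_variance`, and synthetic data in place of MNIST. It is one possible approach, not a prescribed one.

```python
import numpy as np


def components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches `threshold`."""
    X_centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio is >= threshold, as a 1-based count.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    # Guard against rounding leaving the final cumulative value just below threshold.
    return min(k, len(cumulative)), cumulative


# Synthetic data (assumption): 50 features driven mostly by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 50))

k, cumulative = components_for_variance(X, threshold=0.95)
print("Components needed for 95% variance:", k)
```

Applied to the flattened MNIST arrays, the same function would return how many components are needed to reach the chosen threshold, which can then be passed as `n_components` to `PCA`.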
