Mla 3

This report explores applying K-Means clustering to the MNIST dataset of handwritten digits after performing dimensionality reduction via PCA. The code loads the MNIST data, performs PCA to reduce dimensions to 2, standardizes features, runs K-Means clustering with 3 clusters, and visualizes the results by plotting the clusters and outliers. The analysis demonstrates clustering techniques can uncover inherent structures in high-dimensional data and reveals distinct digit clusters in the MNIST data with robust handling of outliers.

Uploaded by

Renat Zhamilov
ASSIGNMENT

REPORT
HOMEWORK 3
Machine Learning Algorithms
(work name)

Student’s name: Zhamilov Renat


Group: CS-2201
Period: 14.02.2024 – 20.02.2024
Date: 19.02.2024
Supervisor: Aigul B. Mimenbayeva

Astana IT University, 2024


1. Introduction:
This report explores the application of K-Means clustering to the
MNIST dataset, aiming to identify patterns within handwritten digit
images. By leveraging Principal Component Analysis (PCA) for
dimensionality reduction, the code partitions the dataset into clusters
and visualizes the clustering results. Through this analysis, we aim to
demonstrate the effectiveness of clustering techniques in uncovering
inherent structures within high-dimensional datasets.
2. Procedure
The code clusters the MNIST dataset with the K-Means algorithm after
reducing the dimensionality of the data with Principal Component
Analysis (PCA). The steps are described below.

Import Libraries: The necessary libraries are imported: numpy for
numerical operations; matplotlib for plotting; PCA, KMeans, and
StandardScaler from scikit-learn; and the MNIST dataset from
TensorFlow.

Load MNIST Dataset: The MNIST dataset is loaded using the
mnist.load_data() function from TensorFlow Keras. This dataset consists
of 28x28 grayscale images of handwritten digits (0 through 9) and their
corresponding labels.

Display Sample Images: Some sample images from the dataset are
displayed using matplotlib for visualization purposes.

Prepare Data for Clustering:

- The training images are reshaped into a 2D array where each row
represents a flattened image (28x28 = 784 pixels).

- The pixel values are normalized to the range [0, 1] by dividing by
255.0.
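
The reshaping and normalization described above can be sketched as follows. This is a minimal illustration, not the report's original code: the random array x_train is a stand-in for the images returned by mnist.load_data(), and the names x_flat and x_norm are assumptions chosen for the example.

```python
import numpy as np

# Stand-in for the MNIST training images (assumption for this sketch).
# With the real data, x_train would come from mnist.load_data() and
# have shape (60000, 28, 28) with dtype uint8.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a single row of 784 pixel values.
x_flat = x_train.reshape(len(x_train), -1)

# Convert to float and scale pixel intensities from [0, 255] to [0, 1].
x_norm = x_flat.astype(np.float32) / 255.0

print(x_norm.shape, x_norm.dtype, x_norm.min(), x_norm.max())
```

Converting to a float type before dividing keeps the result in a compact float32 array instead of relying on implicit promotion from uint8.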
Principal Component Analysis (PCA):

- PCA is applied to reduce the dimensionality of the dataset to 2
dimensions (n_components=2).

- PCA helps in visualizing high-dimensional data in a lower-dimensional
space while preserving as much of the variance as possible.
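
A possible form of the PCA step is sketched below, using random data as a stand-in for the flattened, normalized images (the variable names and the stand-in data are assumptions). It also prints the share of variance that the two components retain, which is worth checking on the real data because a 2D projection keeps only part of the information in 784 pixels.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the flattened, normalized images (assumption for this sketch).
rng = np.random.default_rng(0)
x_norm = rng.random((200, 784))

# Project the 784-dimensional rows onto the first two principal components.
pca = PCA(n_components=2)
x_pca = pca.fit_transform(x_norm)

# Fraction of the total variance captured by the two components.
retained = pca.explained_variance_ratio_.sum()
print(x_pca.shape, f"variance retained: {retained:.3f}")
```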

Standardize Features:

- The features obtained from PCA are standardized using StandardScaler
to have zero mean and unit variance.
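
The standardization step might look like the sketch below. The two-column array x_pca is synthetic stand-in data (an assumption), built with different spreads per column to mimic principal components of unequal variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the PCA output: two columns with different spreads
# (assumption for this sketch).
rng = np.random.default_rng(0)
x_pca = rng.normal(size=(500, 2)) * np.array([5.0, 1.5])

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x_pca)

print(x_scaled.mean(axis=0), x_scaled.std(axis=0))
```

Because K-Means relies on Euclidean distance, giving both components unit variance means neither axis dominates the cluster assignment. It also removes the difference in spread between the first and second component, so this is a design choice rather than a requirement.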

Perform K-Means Clustering:

- K-Means clustering is applied to the standardized features.

- The number of clusters is set to 3 (n_clusters=3).
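
A minimal sketch of the clustering step follows. The three synthetic blobs stand in for the standardized PCA features, and n_init=10 and random_state=0 are assumptions added for reproducibility; the report only specifies n_clusters=3.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the standardized 2D features: three synthetic blobs
# (assumption for this sketch).
rng = np.random.default_rng(0)
centers_true = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
x_scaled = np.vstack(
    [c + 0.5 * rng.normal(size=(100, 2)) for c in centers_true]
)

# Fit K-Means with 3 clusters; n_init and random_state are assumed values.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(x_scaled)

# Cluster sizes and the shape of the centroid array.
counts = np.bincount(labels)
print(counts, kmeans.cluster_centers_.shape)
```

With 3 clusters and 10 digit classes, the clusters cannot correspond one-to-one to digits, so each cluster is expected to mix several digits.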

Visualize Clusters:

- A scatter plot is created in which each point represents a data point
in the reduced 2D space.

- Points are colored based on their assigned cluster label.

- Noisy points (outliers) are marked separately.

- The title of the plot includes the number of clusters and the number of
noisy points.

Print Cluster Information:

- The number of clusters and the number of noisy points (outliers) are
printed.
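
The plotting and reporting steps could be sketched as below. K-Means assigns every point to a cluster and does not label noise on its own, and the report does not state how noisy points are identified. The rule used here (distance to the assigned centroid above the 99th percentile) is therefore purely an assumption for illustration, as are the stand-in data and all variable names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the standardized PCA features (assumption for this sketch).
rng = np.random.default_rng(0)
x_scaled = rng.normal(size=(600, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x_scaled)
labels = kmeans.labels_

# Assumed outlier rule (not taken from the report): a point is "noisy" if
# its distance to its own centroid exceeds the 99th percentile.
dist = np.linalg.norm(x_scaled - kmeans.cluster_centers_[labels], axis=1)
is_noise = dist > np.percentile(dist, 99)

n_clusters = len(np.unique(labels))
n_noise = int(is_noise.sum())

# Color regular points by cluster label and mark noisy points separately.
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(x_scaled[~is_noise, 0], x_scaled[~is_noise, 1],
           c=labels[~is_noise], cmap="viridis", s=8)
ax.scatter(x_scaled[is_noise, 0], x_scaled[is_noise, 1],
           c="black", marker="x", s=30, label="noisy points")
ax.set_title(f"K-Means: {n_clusters} clusters, {n_noise} noisy points")
ax.set_xlabel("Component 1 (standardized)")
ax.set_ylabel("Component 2 (standardized)")
ax.legend()

print(f"Number of clusters: {n_clusters}")
print(f"Number of noisy points: {n_noise}")
```

In a notebook, the backend line can be dropped and plt.show() called at the end to display the figure.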

3. Code
4. Conclusion:
K-Means clustering offers insight into the structure of the MNIST
dataset and can support digit recognition and classification tasks. PCA
reduces the data to two dimensions, which makes the clusters easy to
visualize. With three clusters, the analysis groups the digit images
into distinct regions of the reduced space and marks noisy points
(outliers) separately. Future work could tune parameters such as the
number of clusters and the number of principal components, and compare
alternative clustering algorithms.

This study highlights clustering's role in understanding complex
datasets, facilitating data-driven applications in various domains.

Link:
https://colab.research.google.com/drive/1pwVt3uDCkKCdKUfi4xE_RgbYwOkplBoa?usp=sharing
