Lab6 Instruction

This lab focuses on using K-Means clustering to analyze the MNIST dataset, guiding students through data loading, normalization, reshaping, and clustering. Students will compare clustering accuracy with different numbers of clusters and visualize the cluster centers. Homework and competition questions encourage reflection on model performance and data preprocessing steps.

Uploaded by

dave1304963270

BU.330.775 Machine Learning: Design and Deployment

Lab 6. Image Clustering using K-Means

Learning Goal: Practice using an unsupervised machine learning model to cluster image data.

Background: We will use the MNIST dataset for this lab. Please refer to Lab 3 instructions for
information about the MNIST dataset.

a. Import the required packages.

from keras.datasets import mnist
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np

b. First, let’s load the MNIST dataset and check its size: the number of training images, the
number of testing images, the size of each image, and the minimum and maximum values of
the training data.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)
print(x_test.shape)
print(x_train.min())
print(x_train.max())

c. Then we will plot 9 sample images from the dataset.

plt.figure(figsize = (10,9)) # Adjusting figure size
plt.gray() # B/W Images (called after plt.figure so no extra empty figure is created)
# Displaying a grid of 3x3 images
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(x_train[i])

d. We convert the data to float type and normalize the pixel values from the range 0-255 to the
range 0-1 for computational efficiency. We will check the minimum and maximum values again
after normalization.
# Conversion to float
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalization
x_train = x_train/255.0
x_test = x_test/255.0
# Checking the minimum and maximum values of x_train
print(x_train.min())
print(x_train.max())
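
To see what the conversion and scaling do without loading MNIST, here is a minimal sketch on a tiny made-up array. The name fake_pixels is a hypothetical stand-in for x_train and is not part of the lab code.

```python
import numpy as np

# Hypothetical stand-in for x_train: a 2x3 "image" of uint8 pixel values
fake_pixels = np.array([[0, 51, 102], [153, 204, 255]], dtype=np.uint8)

# Same two steps as in the lab: convert to float32, then divide by 255
scaled = fake_pixels.astype('float32') / 255.0

print(scaled.dtype)   # float32
print(scaled.min())   # 0.0
print(scaled.max())   # 1.0
```

The smallest possible pixel value (0) maps to 0.0 and the largest (255) maps to 1.0, which is what the two print statements in step d should confirm on the real data.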
e. The original input data has 3 dimensions: (60000, 28, 28) for the training data and
(10000, 28, 28) for the testing data. The K-means clustering algorithm expects 2-dimensional
input (one row per sample), so we flatten each image into a vector. After reshaping, the
training data will have shape (60000, 784) and the testing data (10000, 784), since 28x28=784.
# Reshaping input data
X_train = x_train.reshape(len(x_train),-1)
X_test = x_test.reshape(len(x_test),-1)
# Checking the shape
print(X_train.shape)
print(X_test.shape)
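
The reshape in step e can be checked on a small made-up array. This is only a sketch: images is a hypothetical stand-in for x_train with 4 samples instead of 60000.

```python
import numpy as np

# Hypothetical stand-in for x_train: 4 "images" of size 28x28
images = np.arange(4 * 28 * 28, dtype='float32').reshape(4, 28, 28)

# Same call as in the lab: keep the first axis, flatten the rest
flat = images.reshape(len(images), -1)
print(flat.shape)  # (4, 784)

# Row-major flattening: pixel (row r, column c) lands at index r * 28 + c
r, c = 5, 7
print(flat[2, r * 28 + c] == images[2, r, c])  # True

# The flattening is lossless: reshaping back recovers the original images
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(restored, images))  # True
```

Because no pixel values are changed, the same reshape in reverse is what lets step j turn the 784-value cluster centers back into 28x28 images.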

f. Now we are ready to apply K-means. First, we define a helper function that maps each cluster
label to the most frequent class label (from y_train) in that cluster. Then we initialize the
K-means model with 10 clusters, using the mini-batch version of K-means.
def retrieve_info(cluster_labels,y_train):
    # Maps each cluster label to the most frequent true label in that cluster
    reference_labels = {}
    # Loop over the cluster labels passed in (not the global kmeans object)
    for i in np.unique(cluster_labels):
        index = np.where(cluster_labels == i,1,0)
        num = np.bincount(y_train[index==1]).argmax()
        reference_labels[int(i)] = int(num)
    return reference_labels

total_clusters = len(np.unique(y_train))
# Initialize the K-Means model
kmeans = MiniBatchKMeans(n_clusters = total_clusters)
# Fitting the model to training set
kmeans.fit(X_train)
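
The majority-vote mapping that retrieve_info performs can be traced by hand on a toy example. The arrays below are made up for illustration (toy_clusters, toy_truth and toy_mapping are hypothetical names), and the loop is a simplified sketch of the same idea rather than the lab's exact function.

```python
import numpy as np

# Made-up example: 8 samples, 3 clusters, true labels are digits
toy_clusters = np.array([0, 0, 0, 1, 1, 2, 2, 2])
toy_truth = np.array([7, 7, 1, 3, 3, 1, 1, 7])

toy_mapping = {}
for cluster in np.unique(toy_clusters):
    members = toy_truth[toy_clusters == cluster]  # true labels in this cluster
    counts = np.bincount(members)                 # counts[d] = occurrences of digit d
    toy_mapping[int(cluster)] = int(counts.argmax())

print(toy_mapping)  # {0: 7, 1: 3, 2: 1}
```

Nothing forces this mapping to be one-to-one: two clusters may map to the same digit, and a digit may end up with no cluster at all.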

g. Next, we map every sample's cluster label to a digit label and compare the first 20 of these
K-means predictions with the true labels.
reference_labels = retrieve_info(kmeans.labels_,y_train)
# Integer array holding the predicted digit for every training image
number_labels = np.zeros(len(kmeans.labels_), dtype=int)
for i in range(len(kmeans.labels_)):
    number_labels[i] = reference_labels[kmeans.labels_[i]]

# Comparing Predicted values and Actual values

print(number_labels[:20].astype('int'))
print(y_train[:20])

h. We can calculate the overall accuracy score.

# Calculating accuracy score
print(accuracy_score(number_labels,y_train))
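
As a sanity check on what the accuracy score measures, the sketch below applies a cluster-to-digit mapping and computes the fraction of matching labels with plain NumPy. All arrays are made up, and lookup, toy_pred and toy_accuracy are hypothetical names; accuracy_score returns the same fraction for the same inputs.

```python
import numpy as np

# Made-up cluster assignments, true labels, and cluster-to-digit mapping
toy_clusters = np.array([0, 0, 0, 1, 1, 2, 2, 2])
toy_truth = np.array([7, 7, 1, 3, 3, 1, 1, 7])
toy_mapping = {0: 7, 1: 3, 2: 1}

# Turn the dict into a lookup array so the mapping is one indexing operation
lookup = np.array([toy_mapping[c] for c in range(len(toy_mapping))])
toy_pred = lookup[toy_clusters]
print(toy_pred)  # [7 7 7 3 3 1 1 1]

# Accuracy is the fraction of samples whose mapped label equals the true label
toy_accuracy = np.mean(toy_pred == toy_truth)
print(toy_accuracy)  # 0.75
```

The lookup-array indexing is an alternative to the explicit for loop in step g; it assumes the cluster labels are the consecutive integers 0, 1, 2, and so on.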

i. Now let’s increase the number of clusters (the k value) to 50, and check whether the accuracy
improves.
# Increase to 50 clusters, and fit the model
kmeans = MiniBatchKMeans(n_clusters = 50)
kmeans.fit(X_train)

# Calculating the reference_labels

reference_labels = retrieve_info(kmeans.labels_,y_train)
# 'number_labels' is an integer array holding the predicted digit for every training image
number_labels = np.zeros(len(kmeans.labels_), dtype=int)
for i in range(len(kmeans.labels_)):
    number_labels[i] = reference_labels[kmeans.labels_[i]]
print('Accuracy score : {}'.format(accuracy_score(number_labels,y_train)))
print('\n')
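
To compare several values of k without repeating the code, the fit-map-score steps can be wrapped in one function. The sketch below makes several assumptions: it uses scikit-learn's small bundled 8x8 digits dataset (load_digits) as an offline stand-in for MNIST, fixes random_state and n_init for repeatability, and the helper majority_vote_accuracy is a hypothetical name. The numbers it prints are not MNIST results.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits

# Assumption: the bundled 8x8 digits stand in for MNIST so the sketch runs offline
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

def majority_vote_accuracy(k, X, y, seed=0):
    """Fit MiniBatchKMeans with k clusters and score the majority-label mapping."""
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=seed)
    clusters = model.fit_predict(X)
    predicted = np.empty_like(y)
    for cluster in np.unique(clusters):
        mask = clusters == cluster
        # Every sample in the cluster gets the cluster's most frequent true label
        predicted[mask] = np.bincount(y[mask]).argmax()
    return np.mean(predicted == y)

scores = {k: majority_vote_accuracy(k, X, y) for k in (10, 20, 50)}
for k, acc in scores.items():
    print('k = {:2d}  accuracy = {:.3f}'.format(k, acc))
```

Because MiniBatchKMeans starts from random initial centers, results on the real data can vary from run to run unless random_state is fixed as in this sketch.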

j. Finally, we can visualize the cluster centers to get a better idea about the algorithm.
# Cluster centroids are stored in 'centroids'
centroids = kmeans.cluster_centers_
print(centroids.shape)
centroids = centroids.reshape(50,28,28)
centroids = centroids * 255
plt.figure(figsize = (10,10))
# Leave room at the bottom of the figure (set once, and by keyword)
plt.subplots_adjust(bottom = 0.35)
for i in range(50):
    plt.subplot(5,10,i+1)
    plt.title('Num:{}'.format(reference_labels[i]),fontsize = 10)
    plt.imshow(centroids[i])
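
When reading the centroid images, it helps to recall that in standard K-means each centroid is the per-pixel mean of the points assigned to its cluster (MiniBatchKMeans approximates this with incremental updates). The sketch below illustrates that idea on three made-up 2x2 "images"; cluster_members is a hypothetical name.

```python
import numpy as np

# Three made-up 2x2 "images" assigned to the same cluster, already flattened
cluster_members = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
])

# The centroid is the per-pixel mean of the cluster's members
centroid = cluster_members.mean(axis=0)

# Reshape back to image form, as step j does with 28x28
centroid_image = centroid.reshape(2, 2)
print(centroid_image)
```

Pixels on which all members agree keep their full value (the top row here), while pixels on which they disagree take intermediate values, which is why centroid images tend to look like blurred digits.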

Homework Question 1 (1pt): Compare the accuracy with 10 clusters to the accuracy with 50
clusters. Which one is better?
Homework Question 2 (1pt): Inspect the centroids in step j and discuss why increasing the
number of clusters has a positive or negative impact on model performance in this case.
Homework Question 3 (1pt): Comment on the performance of K-means in MNIST image
clustering. What insight(s) can we draw?
Competition Question 1 (2pt): Describe your steps including data preprocessing and modeling
approaches.
Competition Question 2 (2pt): Evaluate your model performance compared to the baseline
model.

Submission: Complete and submit on Canvas by the beginning of Class 7. Use
homework6_yourname.ipynb and Competition_yourname.ipynb, respectively, as the file names.

Reference:
https://medium.com/@joel_34096/k-means-clustering-for-image-classification-a648f28bdc47
