Seminar 10

This document provides code snippets and exercises for k-means clustering. Exercise 1 uses the within-cluster sum of squares (WSS) to choose the number of clusters in artificial data. Exercise 2 clusters handwritten digit data: the code loads the digits, finds k-means clusters, interprets the 10 cluster centers as prototype digits and plots them, computes accuracy by assigning each data point the most common label of its cluster, and plots a confusion matrix comparing these cluster-based labels with the true labels.

Uploaded by Nishad Ahamed
Copyright © All Rights Reserved

UWL SCE S2

L-6 Databases and Analytics (CP60056E)

Seminar 10

This relates to Lecture 9

Exercise 1: Using your preferred editor (Colab is recommended), fill in the gaps in the code snippets.
The following is a simple demonstration of using the within-cluster sum of squares (WSS) to choose
the number of clusters and plot them with the k-means clustering algorithm.

## Import the necessary packages
#
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# make_blobs is imported from sklearn.datasets (the old samples_generator module was removed)
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

## Generate 6 artificial clusters for illustration purposes
## Hint: you may need the make_blobs and scatter functions; check the official
## scikit-learn and Matplotlib documentation for their usage
#
# Insert your code block here
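One possible way to fill this gap, offered as a sketch rather than the intended answer. The sample count (600), `cluster_std=0.60` and `random_state=0` are illustrative assumptions, not values given in the seminar; `y_true` is only kept so you can check the result and is not used by k-means.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Assumed settings: 600 two-dimensional points drawn around 6 centres.
# y_true records which generating blob each point came from.
X, y_true = make_blobs(n_samples=600, centers=6, cluster_std=0.60, random_state=0)

# Plot the raw, unlabelled points
plt.scatter(X[:, 0], X[:, 1], s=10)
plt.title("Six artificial clusters")
plt.show()
```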

## Implement the WSS method: try every number of clusters from 1 to 12 and
## plot WSS against the number of clusters.
## Hint: refer to the plots in the lecture slides; you may need the fit method
## of KMeans and its inertia_ attribute, which holds the WSS (also called WCSS)
#
wcss = []
for i in range(1, 13):  # the upper bound is exclusive, so this tries 1 to 12 clusters
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10,
                    random_state=0)
    # Insert your code block here
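A sketch of one way to complete the WSS loop, assuming the same illustrative six-blob data as above (regenerated here so the block runs on its own). After fitting, `inertia_` is the sum of squared distances from each point to its nearest centre, i.e. the WSS for that number of clusters; the "elbow" where the curve stops dropping sharply suggests a good choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): six 2-D blobs
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=0.60, random_state=0)

wcss = []
for i in range(1, 13):  # k = 1, 2, ..., 12
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10,
                    random_state=0)
    kmeans.fit(X)
    # inertia_: sum of squared distances of samples to their closest centre (WSS)
    wcss.append(kmeans.inertia_)

plt.plot(range(1, 13), wcss, marker='o')
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WSS')
plt.show()
```

With a single cluster the centre is the overall mean, so the first WSS value equals the total sum of squares of the data; each additional cluster can only lower it further, which is why the curve is read for its bend rather than its minimum.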

## Categorize the data using the optimum number of clusters (6) that we
## determined in the last step, and plot the fitting results.
## Hint: you may need to call fit_predict on kmeans, then scatter
#
kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
# Insert your code block here
plt.scatter(X[:, 0], X[:, 1])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='red')
plt.show()
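A possible completion, sketched under the same illustrative data assumptions as above. `fit_predict` fits the model and returns one cluster index per point; colouring the scatter by those indices (the `pred_y` name and `viridis` colour map are illustrative choices) makes the six groups visible, with the centres marked in red as in the original snippet.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): six 2-D blobs
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=0.60, random_state=0)

kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
# Fit the model and get the cluster index (0..5) of every point
pred_y = kmeans.fit_predict(X)

# Colour each point by its cluster and mark the fitted centres in red
plt.scatter(X[:, 0], X[:, 1], c=pred_y, s=10, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='red')
plt.show()
```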

Exercise 2: For the following code blocks and plots, run the code first, then provide your
interpretation or explanation for the marked parts.

k-means on digits
We will use k-means to try to identify similar digits without using the original label
information. This resembles a first step in extracting meaning from a new dataset for which
you have no a priori label information.
We will start by loading the digits and then finding the k-means clusters. The digits dataset
consists of 1,797 samples with 64 features, where each of the 64 features is the brightness
of one pixel in an 8×8 image.

import seaborn as sns; sns.set() # for plot styling


from sklearn.datasets import load_digits
digits = load_digits()
digits.data.shape
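To see how the 64 features relate to the 8×8 images, the sketch below reshapes one row of `digits.data` and compares it with the `images` array that `load_digits` also returns. The variable name `first` is illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)    # (1797, 64): one row of 64 pixel values per digit
print(digits.images.shape)  # (1797, 8, 8): the same pixels kept as 8x8 arrays

# A row of data is an 8x8 image read row by row
first = digits.data[0].reshape(8, 8)
print(np.array_equal(first, digits.images[0]))

# Per the dataset description, each pixel brightness is an integer in 0..16
print(digits.data.min(), digits.data.max())
```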

## Provide your interpretation/explanation for the following block


#
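# n_init=10 is set explicitly: it was the default before scikit-learn 1.4, which switched to n_init='auto'
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)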
clusters = kmeans.fit_predict(digits.data)
kmeans.cluster_centers_.shape
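A sketch for checking what this block produces: `fit_predict` returns one cluster index per image, and `cluster_centers_` holds 10 points in the same 64-dimensional pixel space. It also confirms that each image sits in the cluster whose centre is nearest (Euclidean distance, via `transform`). The names `sizes`, `dist` and `nearest` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
clusters = kmeans.fit_predict(digits.data)

# One index (0..9) per sample; one 64-value centre per cluster
print(clusters.shape, kmeans.cluster_centers_.shape)

# How many images fell into each cluster
sizes = np.bincount(clusters, minlength=10)
print(sizes)

# transform gives the distance from every sample to every centre: shape (1797, 10)
dist = kmeans.transform(digits.data)
nearest = dist.argmin(axis=1)
print((nearest == clusters).mean())  # fraction assigned to their nearest centre
```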

## Provide your interpretation/explanation for the following block


#
fig, ax = plt.subplots(2, 5, figsize=(8, 3))
centers = kmeans.cluster_centers_.reshape(10, 8, 8)
for axi, center in zip(ax.flat, centers):
    axi.set(xticks=[], yticks=[])
    axi.imshow(center, interpolation='nearest', cmap=plt.cm.binary)

from scipy.stats import mode


labels = np.zeros_like(clusters)
for i in range(10):
    mask = (clusters == i)
    labels[mask] = mode(digits.target[mask])[0]
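The loop above gives every image in a cluster the most common true digit within that cluster. As a sketch of the same majority vote without SciPy, `np.bincount` counts each digit and `argmax` picks the most frequent one; on ties `argmax` returns the smallest digit, which matches how `scipy.stats.mode` breaks ties. This is an alternative formulation, not the seminar's code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(digits.data)

labels = np.zeros_like(clusters)
for i in range(10):
    mask = (clusters == i)
    # Count occurrences of digits 0..9 in this cluster and take the most common
    labels[mask] = np.bincount(digits.target[mask], minlength=10).argmax()

print(labels[:20])
```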

from sklearn.metrics import accuracy_score


accuracy_score(digits.target, labels)
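`accuracy_score` is simply the fraction of positions where the predicted labels equal the true ones. A tiny worked example with hypothetical labels (not taken from the digits data):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy labels: 3 of the 4 predictions are correct
y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 2])

acc = accuracy_score(y_true, y_pred)
manual = (y_true == y_pred).mean()  # same quantity computed by hand
print(acc, manual)  # 0.75 0.75
```

Note that in this exercise the cluster-to-digit mapping itself uses the true labels, so the score measures how well the clusters line up with the digits rather than how well an unsupervised model would label unseen data.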

## Provide your interpretation/explanation for the following block


#
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(digits.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=digits.target_names,
            yticklabels=digits.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');
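A small worked example of how to read the matrix, using hypothetical toy labels. In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels; the heatmap code plots `mat.T`, which is why its y-axis is labelled "predicted label".

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: one true "2" is mislabelled as "1"
y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 2])

# Rows are true labels, columns are predicted labels
mat = confusion_matrix(y_true, y_pred)
print(mat)
# [[1 0 0]
#  [0 1 0]
#  [0 1 1]]

# Transposed: rows are predicted labels, columns are true labels
print(mat.T)
# [[1 0 0]
#  [0 1 1]
#  [0 0 1]]
```

Off-diagonal cells show which digits the clustering confuses; the diagonal holds the correctly labelled counts.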
