
PGM 7

This document applies the k-means clustering algorithm and expectation-maximization (EM) algorithm to cluster iris flower data. It compares the results of k-means and EM clustering by evaluating the accuracy and confusion matrices. Both algorithms are able to reasonably cluster the iris data into three groups corresponding to the three iris species, with EM achieving slightly better accuracy than k-means.

Uploaded by

badeni

12/26/21, 4:21 PM PGM7-EM-K-MEANS.ipynb - Colaboratory

#Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
#for clustering using k-Means algorithm. Compare the results of these two algorithms and
#comment on the quality of clustering. You can add Java/Python ML library classes/API in
#the program.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np

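l1 = [0, 1, 2]

def rename(s):
    # Relabel cluster ids in order of first appearance so that they line up
    # with the iris targets, which are stored sorted as 0, 1, 2.
    # Note: s is modified in place.
    l2 = []
    for i in s:
        if i not in l2:
            l2.append(i)

    for i in range(len(s)):
        pos = l2.index(s[i])
        s[i] = l1[pos]

    return s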
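
The rename helper above relies on the iris targets being sorted, so the first cluster id seen corresponds to class 0. As a hedged alternative, the sketch below matches clusters to classes by maximizing the overlap in the confusion matrix (Hungarian assignment). The function name align_labels is hypothetical and not part of the original notebook; it assumes both label arrays use the ids 0..k-1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def align_labels(y_true, y_pred):
    """Relabel cluster ids in y_pred to best match the classes in y_true.

    Hypothetical helper; assumes both arrays use labels 0..k-1.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Rows are true classes, columns are cluster ids.
    cm = confusion_matrix(y_true, y_pred)
    # Negate the counts so that the minimum-cost assignment maximizes overlap.
    class_idx, cluster_idx = linear_sum_assignment(-cm)
    mapping = {cluster: cls for cls, cluster in zip(class_idx, cluster_idx)}
    return np.array([mapping[c] for c in y_pred])

# Toy check: the clusters are a pure permutation of the classes.
aligned = align_labels([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
print(aligned)
```

Unlike rename, this returns a new array and does not depend on the order in which cluster ids first appear.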

# import some data to play with


iris = datasets.load_iris()

print("\n IRIS FEATURES :\n", iris.feature_names)
print("\n IRIS TARGET :\n", iris.target)
print("\n IRIS TARGET NAMES:\n", iris.target_names)

# Store the inputs as a Pandas Dataframe and set the column names
X = pd.DataFrame(iris.data)

#print(X)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']

#print(X.columns)
#print("X:", X)
#print("Y:", y)
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

# Set the size of the plot


plt.figure(figsize=(14,7))

# Create a colormap
colormap = np.array(['red', 'lime', 'black'])

# Plot Sepal
plt.subplot(1,2,1)
plt.scatter(X.Sepal_Length,X.Sepal_Width, c=colormap[y.Targets], s=40)
plt.title('Sepal')

plt.subplot(1,2,2)
plt.scatter(X.Petal_Length,X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Petal')
plt.show()

print("Actual Target is:\n", iris.target)

# K Means Cluster
model = KMeans(n_clusters=3)
model.fit(X)

# Set the size of the plot


plt.figure(figsize=(14,7))

# Create a colormap
colormap = np.array(['red', 'lime', 'black'])

# Plot the Original Classifications


plt.subplot(1,2,1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')

# Plot the Models Classifications


plt.subplot(1,2,2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')
plt.show()

km = rename(model.labels_)
print("\nWhat KMeans thought: \n", km)
print("Accuracy of KMeans is ", sm.accuracy_score(y, km))
print("Confusion Matrix for KMeans is \n", sm.confusion_matrix(y, km))
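
Accuracy only makes sense after the cluster ids have been mapped onto class ids. As a complementary check, the sketch below uses the adjusted Rand index, which is unchanged when cluster ids are permuted, so no relabeling is needed. The names km_model and ari_km, and the fixed random_state and n_init values, are assumptions added for reproducibility and are not in the original notebook.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()

# random_state and n_init are fixed here only to make the run repeatable.
km_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(iris.data)

# ARI compares two partitions directly; 1.0 means identical groupings.
ari_km = adjusted_rand_score(iris.target, km_model.labels_)
print("Adjusted Rand index for KMeans:", round(ari_km, 3))

# Permuting the label ids does not change the score.
ari_permuted = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print("ARI for a pure relabeling:", ari_permuted)
```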

#The GaussianMixture scikit-learn class can be used to model this problem
#and estimate the parameters of the distributions using the expectation-maximization algorithm.

from sklearn import preprocessing


scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns = X.columns)
print("\n", xs.sample(5))
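
A quick sanity check on the standardization step, as a minimal sketch: after StandardScaler each feature column should have mean close to 0 and standard deviation close to 1. The variable name scaled is illustrative and not from the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
scaled = StandardScaler().fit_transform(iris.data)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also the default for numpy's std.
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print("column means:", col_means.round(6))
print("column std devs:", col_stds.round(6))
```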

from sklearn.mixture import GaussianMixture


gmm = GaussianMixture(n_components=3)

gmm.fit(xs)

y_cluster_gmm = gmm.predict(xs)

plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()

em = rename(y_cluster_gmm)
print("\nWhat EM thought: \n", em)
print("Accuracy of EM is ", sm.accuracy_score(y, em))
print("Confusion Matrix for EM is \n", sm.confusion_matrix(y, em))
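
Note that the notebook fits KMeans on the raw features and the Gaussian mixture on the standardized ones, so the two accuracies are not measured on the same input. The sketch below is one possible like-for-like comparison: both models are fitted on the standardized data with a fixed seed and scored with the adjusted Rand index. It also prints the soft assignments that EM produces, which KMeans does not provide. The seeds, n_init, and variable names are assumptions, and no particular ranking of the two methods is implied.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
xs_std = StandardScaler().fit_transform(iris.data)

# Fit both models on the same standardized features.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(xs_std)
gmm_model = GaussianMixture(n_components=3, random_state=0).fit(xs_std)
gmm_labels = gmm_model.predict(xs_std)

scores = {
    "KMeans": adjusted_rand_score(iris.target, km_labels),
    "EM (GMM)": adjusted_rand_score(iris.target, gmm_labels),
}
for name, score in scores.items():
    print(f"{name}: ARI = {score:.3f}")

# EM yields a probability per component for every sample (rows sum to 1).
probs = gmm_model.predict_proba(xs_std)
print("Soft assignments for the first three samples:\n", probs[:3].round(3))
```

Results can vary with the seed and the covariance type, so it is worth repeating the run with several random_state values before commenting on which algorithm clusters better.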
