PMA Experiment 2

Uploaded by

siyebic418

K-Means Clustering

A Short Case Study

In this notebook, we take a short, not-too-deep dive into the basics of K-Means clustering on a small example dataset. By the end of
this study, I hope we will be able to answer the following questions about our problem:

1. What's a good way to segment our dataset into a small number of clusters?
2. How can we get quick results using the pandas, NumPy, Matplotlib (pyplot) and scikit-learn modules?
3. How do we select the best hyperparameters for K-Means clustering?
4. How do we display and visualize data in the most honest and friendly way for our stakeholders?
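
Before turning to the library, it may help to see what K-Means does on each iteration. The block below is a minimal sketch of the plain assign-then-update loop (Lloyd's algorithm), assuming random initial centroids rather than the k-means++ initialization used later in this notebook. The function name `kmeans_sketch` and the toy (age, spending score) pairs are made up for illustration; this is not how scikit-learn's `KMeans` is implemented.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations: assign points, then move centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random (simpler than k-means++).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of (age, spending score) pairs.
toy = np.array([[20, 80], [22, 85], [21, 78],
                [60, 20], [62, 25], [58, 18]], dtype=float)
toy_labels, toy_centroids = kmeans_sketch(toy, k=2)
print(toy_labels)
print(toy_centroids)
```

The cluster numbering depends on the random start, so only the grouping (first three rows together, last three rows together) is meaningful, not which group is called 0.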

Initialization

# Initialization

# Module Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.cluster import KMeans

# Style Definitions
plt.style.use('Solarize_Light2')

# Get dataframe from CSV file

df = pd.read_csv('customers.csv')
df.head()

   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40

df.shape

(200, 5)

df.describe()

       CustomerID         Age  Annual Income (k$)  Spending Score (1-100)
count  200.000000  200.000000          200.000000              200.000000
mean   100.500000   38.850000           60.560000               50.200000
std     57.879185   13.969007           26.264721               25.823522
min      1.000000   18.000000           15.000000                1.000000
25%     50.750000   28.750000           41.500000               34.750000
50%    100.500000   36.000000           61.500000               50.000000
75%    150.250000   49.000000           78.000000               73.000000
max    200.000000   70.000000          137.000000               99.000000

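plt.figure(1, figsize=(16, 4))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
for n, col in enumerate(['Age', 'Annual Income (k$)', 'Spending Score (1-100)'], start=1):
    plt.subplot(1, 3, n)
    # sns.distplot is deprecated in recent seaborn releases; histplot with a
    # KDE overlay and density scaling is the closest replacement
    sns.histplot(df[col], bins=32, kde=True, stat='density')
    plt.title(f'Histogram of {col}')
plt.show()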

# Assignment Stage

X1 = df.loc[:, ['Age', 'Spending Score (1-100)']].values

# Fit K-Means for k = 1..10 and record the inertia of each fit
inertia = []
for n in range(1, 11):
    model = KMeans(n_clusters=n,
                   init='k-means++',
                   max_iter=500,
                   random_state=42)
    model.fit(X1)
    inertia.append(model.inertia_)

# Elbow plot: inertia against the number of clusters
plt.figure(1, figsize=(15, 6))
plt.plot(np.arange(1, 11), inertia, 'o')
plt.plot(np.arange(1, 11), inertia, '-', alpha=0.5)
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()
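
The inertia plotted above is the sum of squared distances from each sample to its closest cluster centre, which is what scikit-learn exposes as `inertia_`. As a hedged illustration of why the curve falls as clusters are added, the sketch below computes that sum by hand on four made-up points. The helper name `inertia_by_hand` and the data are assumptions for this example only.

```python
import numpy as np

def inertia_by_hand(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

# Hypothetical data: a left pair and a right pair of points.
points = np.array([[1.0, 1.0], [1.0, 3.0], [9.0, 1.0], [9.0, 3.0]])

# k = 1: a single centroid at the overall mean (5, 2).
labels_k1 = np.zeros(4, dtype=int)
centroids_k1 = points.mean(axis=0, keepdims=True)

# k = 2: each pair gets its own centroid.
labels_k2 = np.array([0, 0, 1, 1])
centroids_k2 = np.array([[1.0, 2.0], [9.0, 2.0]])

inertia_k1 = inertia_by_hand(points, labels_k1, centroids_k1)
inertia_k2 = inertia_by_hand(points, labels_k2, centroids_k2)
print(inertia_k1)  # 68.0
print(inertia_k2)  # 4.0
```

Going from one cluster to two removes almost all of the inertia here because the data really has two groups. Adding further clusters can only shave off the remaining 4.0, and that flattening is the "elbow" to look for in the plot.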

model = KMeans(n_clusters=4,
               init='k-means++',
               max_iter=500,
               random_state=42)
# fit_predict fits the model once and returns the cluster label of every row
y_kmeans = model.fit_predict(X1)
labels = model.labels_
centroids = model.cluster_centers_

plt.figure(figsize=(20, 10))
# One colour per fitted cluster (the model above uses n_clusters=4)
colors = ['red', 'blue', 'green', 'cyan']
for k, color in enumerate(colors):
    plt.scatter(X1[y_kmeans == k, 0], X1[y_kmeans == k, 1],
                s=100, c=color, label=f'Cluster {k + 1}')
plt.title('Clusters of Customers - Age X Spending Score')
plt.xlabel('Age')
plt.ylabel('Spending Score')
plt.legend()
plt.show()

Second Clustering
By Annual Income and Spending Score

# Assignment Stage

X2 = df.loc[:, ['Annual Income (k$)', 'Spending Score (1-100)']].values

# Fit K-Means for k = 1..10 and record the inertia of each fit
inertia = []
for n in range(1, 11):
    model = KMeans(n_clusters=n,
                   init='k-means++',
                   max_iter=500,
                   random_state=42)
    model.fit(X2)
    inertia.append(model.inertia_)

# Elbow plot: inertia against the number of clusters
plt.figure(1, figsize=(20, 10))
plt.plot(np.arange(1, 11), inertia, 'o')
plt.plot(np.arange(1, 11), inertia, '-', alpha=0.5)
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()

model = KMeans(n_clusters=5,
               init='k-means++',
               max_iter=500,
               random_state=42)
# fit_predict fits the model once and returns the cluster label of every row
y_kmeans = model.fit_predict(X2)
labels = model.labels_
centroids = model.cluster_centers_

plt.figure(figsize=(20, 10))
# One colour per fitted cluster (the model above uses n_clusters=5)
colors = ['red', 'blue', 'green', 'cyan', 'magenta']
for k, color in enumerate(colors):
    plt.scatter(X2[y_kmeans == k, 0], X2[y_kmeans == k, 1],
                s=100, c=color, label=f'Cluster {k + 1}')
plt.title('Clusters of Customers - Annual Income (k$) X Spending Score')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score')
plt.legend()
plt.show()

# Assignment Stage

from sklearn.cluster import KMeans

X3 = df.loc[:, ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].values

# Fit K-Means for k = 1..10 and record the inertia of each fit
inertia = []
for n in range(1, 11):
    model = KMeans(n_clusters=n,
                   init='k-means++',
                   max_iter=500,
                   random_state=42)
    model.fit(X3)
    inertia.append(model.inertia_)

# Elbow plot: inertia against the number of clusters
plt.figure(1, figsize=(20, 10))
plt.plot(np.arange(1, 11), inertia, 'o')
plt.plot(np.arange(1, 11), inertia, '-', alpha=0.5)
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()

model = KMeans(n_clusters=6,
               init='k-means++',
               max_iter=500,
               random_state=42)
model.fit(X3)
labels = model.labels_
#centroids = model.cluster_centers_

# Attach each customer's cluster label to the dataframe
df['cluster'] = labels
df

     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)  cluster
0             1    Male   19                  15                      39        0
1             2    Male   21                  15                      81        1
2             3  Female   20                  16                       6        0
3             4  Female   23                  16                      77        1
4             5  Female   31                  17                      40        0
...         ...     ...  ...                 ...                     ...      ...
195         196  Female   35                 120                      79        4
196         197  Female   45                 126                      28        2
197         198    Male   32                 126                      74        4
198         199    Male   32                 137                      18        2
199         200    Male   30                 137                      83        4

200 rows × 6 columns
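
One way to make the `cluster` column easier to read for stakeholders is a per-cluster summary of size and average profile. The sketch below is only an illustration: it rebuilds a small dataframe from the ten rows displayed above (not the full 200-row dataset), and the names `sample`, `features` and `summary` are assumptions introduced here.

```python
import pandas as pd

# The ten rows displayed above (head and tail of the labelled dataframe).
sample = pd.DataFrame({
    'Age': [19, 21, 20, 23, 31, 35, 45, 32, 32, 30],
    'Annual Income (k$)': [15, 15, 16, 16, 17, 120, 126, 126, 137, 137],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 79, 28, 74, 18, 83],
    'cluster': [0, 1, 0, 1, 0, 4, 2, 4, 2, 4],
})

features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']

# Mean of each feature per cluster, rounded for display.
summary = sample.groupby('cluster')[features].mean().round(1)
# Number of customers in each cluster, placed as the first column.
summary.insert(0, 'customers', sample.groupby('cluster').size())
print(summary)
```

On the full dataframe the same pattern would be `df.groupby('cluster')[features].mean()`. The averages from these ten rows describe only the rows shown, not the complete clusters.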

fig = px.scatter_3d(df,
                    x="Age",
                    y="Annual Income (k$)",
                    z="Spending Score (1-100)",
                    color='cluster',
                    hover_data=["Age",
                                "Annual Income (k$)",
                                "Spending Score (1-100)"],
                    # The model above was fitted with 6 clusters, so list all six labels (0-5)
                    category_orders={"cluster": list(range(6))},
                    )

fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))

fig.show()
