Assignment 5

Assignment

Uploaded by

fclick717


10/17/24, 4:11 PM Assignmnet5

In [12]: #Sujal Jaju(T511053)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [13]: df = pd.read_csv("Mall_Customers.csv")

In [14]: df.head()

Out[14]:
   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40

In [15]: df.tail()

Out[15]:
     CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
195         196  Female   35                 120                      79
196         197  Female   45                 126                      28
197         198    Male   32                 126                      74
198         199    Male   32                 137                      18
199         200    Male   30                 137                      83

In [16]: df.shape

Out[16]: (200, 5)

In [17]: df.columns

Out[17]: Index(['CustomerID', 'Genre', 'Age', 'Annual Income (k$)',
       'Spending Score (1-100)'],
      dtype='object')

In [18]: df.drop("CustomerID",axis=1,inplace=True)

In [19]: df

file:///C:/Users/Student/Downloads/Assignmnet5 (6).html 1/11

Out[19]:
      Genre  Age  Annual Income (k$)  Spending Score (1-100)
0      Male   19                  15                      39
1      Male   21                  15                      81
2    Female   20                  16                       6
3    Female   23                  16                      77
4    Female   31                  17                      40
...     ...  ...                 ...                     ...
195  Female   35                 120                      79
196  Female   45                 126                      28
197    Male   32                 126                      74
198    Male   32                 137                      18
199    Male   30                 137                      83

200 rows × 4 columns

In [20]: print("Missing values:")

df.isnull().sum()

Missing values:
Out[20]:
Genre                     0
Age                       0
Annual Income (k$)        0
Spending Score (1-100)    0
dtype: int64

In [21]: df.describe()

Out[21]:
              Age  Annual Income (k$)  Spending Score (1-100)
count  200.000000          200.000000              200.000000
mean    38.850000           60.560000               50.200000
std     13.969007           26.264721               25.823522
min     18.000000           15.000000                1.000000
25%     28.750000           41.500000               34.750000
50%     36.000000           61.500000               50.000000
75%     49.000000           78.000000               73.000000
max     70.000000          137.000000               99.000000

In [22]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   Genre                   200 non-null    object
 1   Age                     200 non-null    int64
 2   Annual Income (k$)      200 non-null    int64
 3   Spending Score (1-100)  200 non-null    int64
dtypes: int64(3), object(1)
memory usage: 6.4+ KB

In [23]: df.nunique()

Out[23]:
Genre                      2
Age                       51
Annual Income (k$)        64
Spending Score (1-100)    84
dtype: int64

In [24]: df.hist(bins = 50,figsize = (10,6));

In [25]: df['Genre'].value_counts().plot(kind='pie',figsize=(5,5),autopct='%1.1f%%')
plt.title("Total Gender Count")
plt.show()

In [26]: sns.pairplot(df,hue="Genre");

In [27]: sns.set(style = 'whitegrid')

sns.scatterplot(y = 'Spending Score (1-100)',x ='Annual Income (k$)',data = df,h
plt.title('Mall_Customers')
plt.show()

In [28]: # LabelEncoder for encoding binary categories in a column

from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
le = LabelEncoder()
# Pass a single column so it is obvious what we want to encode
df["Genre"] = le.fit_transform(df["Genre"])
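As a rough illustration of what the encoding step does, the pure-Python sketch below mimics the sorted class ordering that `LabelEncoder` uses, applied to the first five `Genre` values shown by `df.head()`. The helper name `label_encode` is hypothetical and exists only for this example; it is not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    classes = sorted(set(values))  # alphabetical order of the distinct labels
    mapping = {cls: code for code, cls in enumerate(classes)}
    return [mapping[v] for v in values], classes


# First five Genre values from df.head()
genre_head = ["Male", "Male", "Female", "Female", "Female"]
codes, classes = label_encode(genre_head)

print(classes)  # ['Female', 'Male']
print(codes)    # [1, 1, 0, 0, 0]
```

The codes agree with the first five rows of the encoded frame shown in `Out[29]`, where `Female` becomes 0 and `Male` becomes 1.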

In [29]: df

Out[29]:
     Genre  Age  Annual Income (k$)  Spending Score (1-100)
0        1   19                  15                      39
1        1   21                  15                      81
2        0   20                  16                       6
3        0   23                  16                      77
4        0   31                  17                      40
...    ...  ...                 ...                     ...
195      0   35                 120                      79
196      0   45                 126                      28
197      1   32                 126                      74
198      1   32                 137                      18
199      1   30                 137                      83

200 rows × 4 columns

In [30]: # Finding the optimum number of clusters using k-means

data = df.copy()
x = data.iloc[:,[2,3]]

# Importing the KMeans model

from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    # appending the WCSS to the list
    # (kmeans.inertia_ holds the WCSS of the fitted model)
    wcss.append(kmeans.inertia_)
    print('k:',i ,"-> wcss:",kmeans.inertia_)

k: 1 -> wcss: 269981.28
k: 2 -> wcss: 183653.32894736837
k: 3 -> wcss: 106348.37306211118
k: 4 -> wcss: 73880.64496247197
k: 5 -> wcss: 44448.45544793371
k: 6 -> wcss: 40825.16946386946
k: 7 -> wcss: 33642.579220779226
k: 8 -> wcss: 26686.83778518779
k: 9 -> wcss: 24766.47160979344
k: 10 -> wcss: 23103.122085983916
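To make the WCSS figures above more concrete, here is a minimal sketch of the quantity for the k = 1 case: the sum of squared distances from every point to the single centroid. It uses only the five (income, spending score) pairs visible in `df.head()`, so the number differs from the full-data value printed above. The function name `wcss_single_cluster` is an assumption made for this example.

```python
def wcss_single_cluster(points):
    """Within-cluster sum of squares when all points share one centroid."""
    n = len(points)
    dims = len(points[0])
    # The centroid is the per-dimension mean of the points
    centroid = [sum(p[d] for p in points) / n for d in range(dims)]
    # Sum squared deviations over every point and every dimension
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dims))


# (Annual Income, Spending Score) for the first five customers
head_points = [(15, 39), (15, 81), (16, 6), (16, 77), (17, 40)]
print(round(wcss_single_cluster(head_points), 2))  # 3840.0
```

With more clusters, the same sum is taken per cluster against that cluster's own centroid and then added up, which is why the value can only shrink or stay flat as k grows.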

In [31]: # Plotting the results onto a line graph, allowing us to observe 'The elbow'

plt.plot(range(1,11),wcss,marker='o')
plt.title('The Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
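Reading the elbow off the plot is a judgment call. One common heuristic, sketched here under the assumption that it suits this curve, picks the k whose WCSS lies furthest below the straight line joining the first and last points. The values are copied from the `In [30]` output; `elbow_by_chord` is a hypothetical helper name.

```python
# WCSS values for k = 1..10, copied from the In [30] output
wcss = [
    269981.28, 183653.32894736837, 106348.37306211118, 73880.64496247197,
    44448.45544793371, 40825.16946386946, 33642.579220779226,
    26686.83778518779, 24766.47160979344, 23103.122085983916,
]


def elbow_by_chord(values, first_k=1):
    """Return the k whose value sits furthest below the first-to-last chord."""
    n = len(values)
    slope = (values[-1] - values[0]) / (n - 1)
    # Vertical gap between the chord and the curve at each position
    gaps = [values[0] + slope * i - v for i, v in enumerate(values)]
    best = max(range(n), key=lambda i: gaps[i])
    return first_k + best


print(elbow_by_chord(wcss))  # 5
```

On these values the heuristic lands on k = 5, which matches the number of clusters used in the next cell. Other heuristics can disagree, so it is best treated as a cross-check on the plot rather than a replacement for it.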

In [32]: #Taking 5 clusters

km1=KMeans(n_clusters=5)
#Fitting the input data
km1.fit(data)
#predicting the labels of the input data
y=km1.predict(data)
#adding the labels to a column named label
data["label"] = y
#The new dataframe with the clustering done
data.head()

Out[32]:
   Genre  Age  Annual Income (k$)  Spending Score (1-100)  label
0      1   19                  15                      39      4
1      1   21                  15                      81      4
2      0   20                  16                       6      2
3      0   23                  16                      77      4
4      0   31                  17                      40      2

In [33]: #Scatterplot of the clusters

plt.figure(figsize=(6,4))
sns.scatterplot(x = 'Annual Income (k$)',y = 'Spending Score (1-100)',hue="label
palette=['green','brown','orange','red','dodgerblue'],data = da
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Spending Score (1-100) vs Annual Income (k$)')
plt.show()

In [34]: X=data.iloc[:,:4]
y=data.iloc[:,-1]

In [35]: from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_

# Shape of train Test Split

print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)

(160, 4) (160,)
(40, 4) (40,)

In [36]: from sklearn.cluster import KMeans

km=KMeans(n_clusters=5)
km.fit(X_train)

#predicting the target value from the model for the samples
y_train_km = km.predict(X_train)
y_test_km = km.predict(X_test)
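For a fitted k-means model, predicting a label for a new sample amounts to finding the closest learned centroid. The sketch below shows that assignment step in plain Python. The centroids are made-up values chosen for illustration, not the ones learned in this notebook, and `predict_nearest` is a hypothetical helper name.

```python
def predict_nearest(points, centroids):
    """Assign each point the index of its closest centroid (squared Euclidean)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical centroids in (Annual Income, Spending Score) space
centroids = [(20.0, 40.0), (25.0, 80.0), (88.0, 17.0)]
points = [(15, 39), (15, 81), (137, 18)]

print(predict_nearest(points, centroids))  # [0, 1, 2]
```

Because the cluster indices depend on how the centroids happen to be ordered, two separately fitted models can describe the same grouping with different label numbers.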

In [37]: from sklearn.metrics.cluster import adjusted_rand_score

# Adjusted Rand index between the labels from In [32] and the refitted model;
# this measures agreement between two clusterings rather than a true accuracy
ari_train = adjusted_rand_score(y_train,y_train_km)
ari_test = adjusted_rand_score(y_test,y_test_km)

print("K mean : Accuracy on training Data: {:.3f}".format(ari_train))

print("K mean : Accuracy on test Data: {:.3f}".format(ari_test))

K mean : Accuracy on training Data: 0.692

K mean : Accuracy on test Data: 0.656
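The scores above come from the adjusted Rand index, which compares two labelings by counting pairs of samples that are grouped together in both, corrected for chance. A minimal from-scratch sketch follows, assuming at least two samples; it is meant to show why renumbering the clusters does not change the score, not to replace `adjusted_rand_score`. The function name `adjusted_rand_index` is chosen for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings of the same samples (n >= 2)."""
    n = len(labels_a)
    # Pairs grouped together in both labelings, via the contingency table
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each labeling on its own
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same grouping, different cluster numbers: perfect agreement
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# Groupings that cut across each other score below zero
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```

A value of 1.0 means identical groupings and values near 0 mean chance-level agreement, so the 0.692 and 0.656 reported above indicate substantial but not complete overlap between the two clusterings.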

In [38]: data = df.copy()

data = data.iloc[:,[2,3]]
data

Out[38]:
     Annual Income (k$)  Spending Score (1-100)
0                    15                      39
1                    15                      81
2                    16                       6
3                    16                      77
4                    17                      40
...                 ...                     ...
195                 120                      79
196                 126                      28
197                 126                      74
198                 137                      18
199                 137                      83

200 rows × 2 columns

In [39]: sns.scatterplot(x="Annual Income (k$)",y="Spending Score (1-100)",data = data );

In [40]: import scipy.cluster.hierarchy as shc

dendrogram = shc.dendrogram(shc.linkage(data,method="ward"))
plt.title("Dendrogram Plot")
plt.xlabel("Customer")
plt.ylabel("Euclidean Distance")
plt.grid(False)
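The `method="ward"` option builds the tree by repeatedly merging the pair of clusters whose union increases the within-cluster sum of squares the least. The sketch below computes that merge cost for two small clusters and checks it against a direct before-and-after calculation. The helper names are hypothetical, and the sketch does not claim to reproduce the exact heights SciPy draws on the dendrogram axis.

```python
def centroid(points):
    """Per-dimension mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum((x - m) ** 2 for p in points for x, m in zip(p, c))


def ward_merge_cost(a, b):
    """Increase in total SSE caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


# Two small clusters built from rows of df.head() (income, spending score)
cluster_a = [(15, 39), (17, 40)]
cluster_b = [(16, 6)]

cost = ward_merge_cost(cluster_a, cluster_b)
direct = sse(cluster_a + cluster_b) - sse(cluster_a) - sse(cluster_b)
print(round(cost, 3), round(direct, 3))
```

The two printed numbers agree, which is the identity that lets Ward linkage score a candidate merge from the cluster sizes and centroids alone.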

In [41]: from sklearn.cluster import AgglomerativeClustering

agc = AgglomerativeClustering(n_clusters=5)
data["label"] = agc.fit_predict(data)
data

Out[41]:
     Annual Income (k$)  Spending Score (1-100)  label
0                    15                      39      4
1                    15                      81      3
2                    16                       6      4
3                    16                      77      3
4                    17                      40      4
...                 ...                     ...    ...
195                 120                      79      2
196                 126                      28      0
197                 126                      74      2
198                 137                      18      0
199                 137                      83      2

200 rows × 3 columns

In [42]: #Scatterplot of the clusters

sns.scatterplot(x = 'Annual Income (k$)',y = 'Spending Score (1-100)',hue="label
palette=['green','brown','orange','red','dodgerblue'],data = da
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Spending Score (1-100) vs Annual Income (k$)')
plt.show()
