Btech1010622 Lab4

The document outlines a data analysis process that uses Python libraries to cluster mall-customer records. It includes preprocessing steps such as label encoding and feature scaling, followed by Agglomerative Clustering under two configurations (single and complete linkage). The binned spending score is held out as a ground-truth label, and the purity of each clustering is calculated to evaluate its effectiveness.

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import pairwise_distances

df = pd.read_csv("Live.csv")
df.head()

                          status_id status_type status_published  \
0  246675545449582_1649696485147474       video   4/22/2018 6:00
1  246675545449582_1649426988507757       photo  4/21/2018 22:45
2  246675545449582_1648730588577397       video   4/21/2018 6:17
3  246675545449582_1648576705259452       photo   4/21/2018 2:29
4  246675545449582_1645700502213739       photo   4/18/2018 3:22

   num_reactions  num_comments  num_shares  num_likes  num_loves  num_wows  \
0            529           512         262        432         92         3
1            150             0           0        150          0         0
2            227           236          57        204         21         1
3            111             0           0        111          0         0
4            213             0           0        204          9         0

   num_hahas  num_sads  num_angrys  Column1  Column2  Column3  Column4
0          1         1           0      NaN      NaN      NaN      NaN
1          0         0           0      NaN      NaN      NaN      NaN
2          1         0           0      NaN      NaN      NaN      NaN
3          0         0           0      NaN      NaN      NaN      NaN
4          0         0           0      NaN      NaN      NaN      NaN
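
The four trailing Column1-Column4 fields are NaN in every row shown above. Assuming they are empty throughout the file (they appear to be unnamed spillover columns), a minimal cleanup sketch would drop them before any further work with df:

# Drop the empty spillover columns (assumes they are all-NaN, as in the preview)
df = df.drop(columns=['Column1', 'Column2', 'Column3', 'Column4'])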

df1 = pd.read_csv("Mall_Customers.csv")
df1.head()

   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
df1['Spending Score (1-100)'] = pd.cut(df1['Spending Score (1-100)'],
                                       bins=[-1, 39, 64, 84, 100],
                                       labels=['L', 'M', 'H', 'VH'])
print(df1.head())

   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                       L
1           2    Male   21                  15                       H
2           3  Female   20                  16                       L
3           4  Female   23                  16                       H
4           5  Female   31                  17                       M
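
pd.cut uses right-closed intervals by default, which explains the labels above: a score of 39 falls in (-1, 39] and maps to 'L', while 40 falls in (39, 64] and maps to 'M'. A quick check of the boundary behaviour:

# Right-closed bins: (-1, 39] -> 'L', (39, 64] -> 'M', (64, 84] -> 'H', (84, 100] -> 'VH'
print(pd.cut([39, 40, 64, 65, 84, 85], bins=[-1, 39, 64, 84, 100],
             labels=['L', 'M', 'H', 'VH']).tolist())
# ['L', 'M', 'M', 'H', 'H', 'VH']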

from sklearn.preprocessing import LabelEncoder

# Encode the categorical Gender column as integers
label_encoder = LabelEncoder()
df1['Gender'] = label_encoder.fit_transform(df1['Gender'])
print(df1.head())

   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1       1   19                  15                       L
1           2       1   21                  15                       H
2           3       0   20                  16                       L
3           4       0   23                  16                       H
4           5       0   31                  17                       M
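
LabelEncoder assigns integer codes in sorted (alphabetical) order of the class names, so 'Female' becomes 0 and 'Male' becomes 1, matching the table above. The fitted encoder records this mapping:

print(label_encoder.classes_)  # ['Female' 'Male'] -> codes 0 and 1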

# Hold out the binned spending score as the ground-truth label; cluster on the rest
df1_cluster = df1.drop(columns=["Spending Score (1-100)"])

scaler = StandardScaler()
df1_scaled = scaler.fit_transform(df1_cluster)
print(df1_scaled[:5])

[[-1.7234121   1.12815215 -1.42456879 -1.73899919]
 [-1.70609137  1.12815215 -1.28103541 -1.73899919]
 [-1.68877065 -0.88640526 -1.3528021  -1.70082976]
 [-1.67144992 -0.88640526 -1.13750203 -1.70082976]
 [-1.6541292  -0.88640526 -0.56336851 -1.66266033]]
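
StandardScaler rescales every column to zero mean and unit variance, so all features contribute on the same scale. Note that the first scaled column is CustomerID, an arbitrary identifier; including it in the clustering is a questionable design choice and it would arguably be better dropped alongside the spending score. A quick sanity check on the result:

print(df1_scaled.mean(axis=0).round(6))  # approximately 0 for every column
print(df1_scaled.std(axis=0).round(6))   # approximately 1 for every column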

# Full Euclidean distance matrix between all pairs of scaled rows
distance_matrix = pairwise_distances(df1_scaled, metric='euclidean')
print(distance_matrix)

[[0.         0.14457468 2.01649422 ... 5.51941563 5.8580287  5.84708314]
 [0.14457468 0.         2.01627105 ... 5.48623951 5.82672938 5.81921472]
 [2.01649422 2.01627105 0.         ... 5.81691044 6.13642782 6.12756308]
 ...
 [5.51941563 5.48623951 5.81691044 ... 0.         0.42022084 0.44507011]
 [5.8580287  5.82672938 6.13642782 ... 0.42022084 0.         0.14457468]
 [5.84708314 5.81921472 6.12756308 ... 0.44507011 0.14457468 0.        ]]
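
pairwise_distances returns a full square matrix: for the 200 customers here it is 200 x 200, symmetric, with zeros on the diagonal (visible in the corners of the printout above). A quick structural check:

print(distance_matrix.shape)                            # (200, 200)
print(np.allclose(distance_matrix, distance_matrix.T))  # True: symmetric
print(np.allclose(np.diag(distance_matrix), 0))         # True: zero self-distance
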
from sklearn.cluster import AgglomerativeClustering

# 'affinity' was renamed to 'metric' in scikit-learn 1.2 (and removed in 1.4);
# on older versions, pass affinity='precomputed' instead.
agg_clustering_1 = AgglomerativeClustering(n_clusters=4,
                                           metric='precomputed',
                                           linkage='single')
agg_clustering_1.fit(distance_matrix)

agg_clustering_2 = AgglomerativeClustering(n_clusters=2,
                                           metric='precomputed',
                                           linkage='complete')
agg_clustering_2.fit(distance_matrix)

labels_1 = agg_clustering_1.labels_
labels_2 = agg_clustering_2.labels_
print("Cluster labels from the first run:")
print(labels_1)
print("Cluster labels from the second run:")
print(labels_2)

Cluster labels from the first run:
[0 0 1 1 1 1 1 1 0 1 0 1 1 1 0 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 1
 1 1 1 1 0 0 1 1 1 1 1 1 1 1 3 1 0 1 0 1 0 1 0 0 0 1 1 0 0 1 1 0 1 0 1 1 1
 0 0 1 0 1 1 0 0 0 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0 0
 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 1
 1 0 0 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 1
 0 1 0 1 1 1 1 0 1 2 1 2 0 0 0]
Cluster labels from the second run:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
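
The two runs behave very differently. Single linkage merges clusters by their closest pair of points, which is prone to chaining: above, nearly everything collapses into clusters 0 and 1, while labels 2 and 3 mark only a handful of outlying points. Complete linkage merges by the farthest pair, yielding the more balanced two-way split in the second run. Cluster sizes make this easy to see:

# Count how many points land in each cluster
print(np.bincount(labels_1))  # single linkage: two near-empty clusters
print(np.bincount(labels_2))  # complete linkage: a more balanced split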

from sklearn.metrics import confusion_matrix

def calculate_purity(true_labels, predicted_labels):
    # Rows of cm are true classes, columns are predicted clusters
    cm = confusion_matrix(true_labels, predicted_labels)
    # axis=1 takes each true class's best-matching cluster (inverse purity);
    # textbook purity would use np.amax(cm, axis=0) to take the dominant
    # true class within each predicted cluster.
    purity = np.sum(np.amax(cm, axis=1)) / np.sum(cm)
    return purity

true_labels = df1['Spending Score (1-100)'].map({'L': 0, 'M': 1, 'H': 2, 'VH': 3})
purity_1 = calculate_purity(true_labels, labels_1)
purity_2 = calculate_purity(true_labels, labels_2)
print(f"Purity for Clustering 1: {purity_1}")
print(f"Purity for Clustering 2: {purity_2}")

Purity for Clustering 1: 0.555
Purity for Clustering 2: 0.72
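
Purity rewards clusterings whose clusters are dominated by a single true class, but it has a floor: assigning every point to one cluster already scores the share of the most common class. Reading the 0.555 and 0.72 above against that baseline gives a fairer picture of how much the clustering actually recovers:

# Baseline: the majority-class share among the true spending-score labels
print(true_labels.value_counts(normalize=True).max())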
