Kmeansclustering Sales Dataset

Uploaded by

tryhackkme123


kmeansclustering-sales-dataset

November 6, 2024

[1]: import pandas as pd

C:\Users\ASUS\AppData\Local\Temp\ipykernel_24628\4080736814.py:1:
DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of
pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better
interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

import pandas as pd

[2]: # Read the Dataset

dataframe = pd.read_csv("sales_data_sample.csv", encoding="ISO-8859-1")

[3]: # Create a copy of the dataset; we will work on this copy

df = dataframe.copy()
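Plain assignment only binds a second name to the same DataFrame, whereas `.copy()` creates an independent one. The following is a minimal sketch of that difference, using a hypothetical two-row frame (the names `original`, `alias` and `independent` are illustrative and do not appear in the notebook):

```python
import pandas as pd

# Toy frame standing in for the sales data (illustrative values only)
original = pd.DataFrame({"ORDERLINENUMBER": [2, 5], "SALES": [2871.00, 2765.90]})

alias = original                 # a second name for the same object
independent = original.copy()    # a separate object with its own data

# Editing the independent copy leaves `original` untouched
independent.loc[0, "SALES"] = 0.0

same_object = alias is original
original_unchanged = original.loc[0, "SALES"] == 2871.00
print(same_object, original_unchanged)
```

Working on a real copy keeps the raw data available for later cells even if the working frame is filtered or modified.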

[4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2823 entries, 0 to 2822
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ORDERNUMBER 2823 non-null int64
1 QUANTITYORDERED 2823 non-null int64
2 PRICEEACH 2823 non-null float64
3 ORDERLINENUMBER 2823 non-null int64
4 SALES 2823 non-null float64
5 ORDERDATE 2823 non-null object
6 STATUS 2823 non-null object
7 QTR_ID 2823 non-null int64
8 MONTH_ID 2823 non-null int64
9 YEAR_ID 2823 non-null int64
10 PRODUCTLINE 2823 non-null object
11 MSRP 2823 non-null int64
12 PRODUCTCODE 2823 non-null object
13 CUSTOMERNAME 2823 non-null object
14 PHONE 2823 non-null object
15 ADDRESSLINE1 2823 non-null object
16 ADDRESSLINE2 302 non-null object
17 CITY 2823 non-null object
18 STATE 1337 non-null object
19 POSTALCODE 2747 non-null object
20 COUNTRY 2823 non-null object
21 TERRITORY 1749 non-null object
22 CONTACTLASTNAME 2823 non-null object
23 CONTACTFIRSTNAME 2823 non-null object
24 DEALSIZE 2823 non-null object
dtypes: float64(2), int64(7), object(16)
memory usage: 551.5+ KB

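[5]: # Keep only the two columns used for clustering

# .copy() avoids a SettingWithCopyWarning when the Cluster column is added later
df = df[['ORDERLINENUMBER', 'SALES']].copy()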

[6]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2823 entries, 0 to 2822
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ORDERLINENUMBER 2823 non-null int64
1 SALES 2823 non-null float64
dtypes: float64(1), int64(1)
memory usage: 44.2 KB

[7]: df.isna().sum()

[7]: ORDERLINENUMBER 0
SALES 0
dtype: int64

[8]: # Standard Preprocessing

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_values = scaler.fit_transform(df.values)

# This rescales each column to have mean 0 and standard deviation 1
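As a quick check of what the scaler does, the sketch below compares `StandardScaler` with the same calculation done by hand. It uses only the first five rows of ORDERLINENUMBER and SALES that are printed later in the notebook, so the numbers are illustrative rather than the full dataset; the names `X`, `scaled` and `manual` are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five (ORDERLINENUMBER, SALES) rows from the notebook output, for illustration
X = np.array(
    [[2, 2871.00], [5, 2765.90], [2, 3884.34], [6, 3746.70], [14, 5205.27]]
)

scaled = StandardScaler().fit_transform(X)

# Same result by hand: subtract the column mean, divide by the population
# standard deviation (ddof=0, which is NumPy's default for std)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

Scaling matters here because SALES is in the thousands while ORDERLINENUMBER is a small integer; without it, Euclidean distances would be dominated by SALES.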

[9]: # Import KMeansClustering

from sklearn.cluster import KMeans

[18]: # Finding k with the Elbow Method

# Within Cluster Sum of Squares of Distances

wcss = []

for i in range(1, 11):
    model = KMeans(n_clusters=i)
    model.fit_predict(scaled_values)
    wcss.append(model.inertia_)

wcss
# The inertia is computed as the sum of squared distances from each data point
# to the center of its assigned cluster

[18]: [5646.0,
3598.6969488881828,
2087.4819726029436,
1737.9042147878802,
1394.3494502342178,
1122.143072102116,
1000.5032931492917,
868.2475253342712,
794.2912840469619,
735.634443264441]
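The definition of inertia given in the comment can be verified directly. The sketch below recomputes it by hand on synthetic standardized data (an assumption, since the sales CSV is not bundled here) and compares it with `model.inertia_`. It also shows why the first value in the list above is exactly 5646.0: with a single cluster on standardized data, the inertia equals the number of samples times the number of features, and 2823 × 2 = 5646.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled_values (200 samples, 2 features)
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 2)))

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Squared distance from every point to the centre of its assigned cluster, summed
assigned_centres = model.cluster_centers_[model.labels_]
manual_inertia = ((X - assigned_centres) ** 2).sum()
print(manual_inertia, model.inertia_)

# With one cluster the centre is the overall mean, so on standardized data
# the inertia is n_samples * n_features
one_cluster = KMeans(n_clusters=1, n_init=10, random_state=0).fit(X)
print(one_cluster.inertia_, X.shape[0] * X.shape[1])
```

Because the notebook does not set `random_state`, the WCSS values for k > 1 can differ slightly between runs.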

[32]: # Plot the Elbow Plot

import matplotlib.pyplot as plt

plt.plot(range(1, 11), wcss, 'ro-')
plt.xlabel("Number of clusters (k)")
plt.ylabel("WCSS (inertia)")
plt.show()

[12]: # k = 7 seems to be a reasonable choice for the number of clusters
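Reading an elbow off a plot is subjective, so it can help to look at how much the WCSS falls with each additional cluster. The sketch below uses the WCSS values printed by the elbow loop above, rounded to two decimals; the names `drops` and `relative` are introduced here for illustration. It only tabulates the reductions and does not by itself decide which k is best.

```python
import numpy as np

# WCSS values from the elbow loop output, rounded to two decimals
wcss = [5646.00, 3598.70, 2087.48, 1737.90, 1394.35,
        1122.14, 1000.50, 868.25, 794.29, 735.63]

drops = -np.diff(wcss)                   # reduction in WCSS going from k to k + 1
relative = drops / np.array(wcss[:-1])   # the same reduction as a fraction of WCSS at k

for k, (d, r) in enumerate(zip(drops, relative), start=1):
    print(f"k={k} -> k={k + 1}: drop {d:8.2f} ({r:.1%})")
```

A candidate k is one after which the drops become small and roughly constant.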

[13]: kmeans_model = KMeans(n_clusters=7)

[14]: cluster = kmeans_model.fit_predict(scaled_values)

[33]: # import warnings

# warnings.filterwarnings('ignore')
df['Cluster'] = cluster

[37]: from sklearn.cluster import AgglomerativeClustering

import scipy.cluster.hierarchy as sch

# Dendrogram to visualize hierarchical clustering

plt.figure(figsize=(10, 7))
dendrogram = sch.dendrogram(sch.linkage(scaled_values, method='ward'))
plt.title("Dendrogram for Hierarchical Clustering")
plt.xlabel("Samples")
plt.ylabel("Euclidean distances")
plt.show()

# Implement Agglomerative Clustering

agg_cluster = AgglomerativeClustering(n_clusters=7, metric='euclidean', linkage='ward')

y_agg = agg_cluster.fit_predict(scaled_values)

[38]: y_agg

[38]: array([0, 0, 4, …, 4, 0, 2], dtype=int64)
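The label values from K-Means and agglomerative clustering are arbitrary ids, so the two arrays cannot be compared element by element. One way to compare the partitions themselves is the adjusted Rand index from `sklearn.metrics`. The sketch below is run on three synthetic, well-separated blobs (an assumption standing in for `scaled_values`), not on the sales data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

# Three well-separated synthetic blobs standing in for scaled_values
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
agg_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Cluster ids are arbitrary, so compare the partitions rather than raw label values
ari = adjusted_rand_score(km_labels, agg_labels)
print(f"Adjusted Rand index: {ari:.3f}")
```

A value close to 1 means the two methods group the points almost identically; a value near 0 means agreement is no better than chance. On the notebook's data the same call would be `adjusted_rand_score(cluster, y_agg)`.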

[34]: df

[34]: ORDERLINENUMBER SALES Cluster

0 2 2871.00 6
1 5 2765.90 6
2 2 3884.34 2
3 6 3746.70 2
4 14 5205.27 1
… … … …
2818 15 2244.40 3
2819 1 3978.51 2
2820 4 5417.57 5
2821 1 2116.16 6
2822 9 3079.44 0

[2823 rows x 3 columns]

[35]: plt.scatter(df['ORDERLINENUMBER'], df['SALES'],c=df['Cluster'])

[35]: <matplotlib.collections.PathCollection at 0x2407312e9f0>
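Alongside the scatter plot, a per-cluster summary table makes the groups easier to interpret. The sketch below uses only the ten rows visible in the truncated `df` output above, so the figures are illustrative and not statistics for the full 2823-row frame; the name `sample` and the summary column names are assumptions made for this example.

```python
import pandas as pd

# The ten rows visible in the truncated df output (not the full dataset)
sample = pd.DataFrame(
    {
        "ORDERLINENUMBER": [2, 5, 2, 6, 14, 15, 1, 4, 1, 9],
        "SALES": [2871.00, 2765.90, 3884.34, 3746.70, 5205.27,
                  2244.40, 3978.51, 5417.57, 2116.16, 3079.44],
        "Cluster": [6, 6, 2, 2, 1, 3, 2, 5, 6, 0],
    }
)

# Size and mean of each feature per cluster
summary = sample.groupby("Cluster").agg(
    n_rows=("SALES", "size"),
    mean_sales=("SALES", "mean"),
    mean_line_number=("ORDERLINENUMBER", "mean"),
)
print(summary)
```

On the full frame, the same `groupby("Cluster")` call applied to `df` would give the cluster sizes and centres in the original units.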
