Income (K-Means Clustering On A Sample Data Set)

The document outlines a clustering analysis using KMeans on a dataset containing age and income data. After preprocessing the data with MinMaxScaler, the analysis identifies three distinct clusters based on age and income. An elbow plot is generated to confirm that three is the optimal number of clusters for the data.
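The pipeline described above can be sketched end to end. This is a minimal sketch using a small hypothetical age/income sample (invented values, not the notebook's CSV): scale both features to [0, 1], then fit K-Means with three clusters.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Hypothetical age/income sample standing in for the notebook's income.csv.
df = pd.DataFrame({
    'Age':       [27, 29, 29, 28, 42, 39, 41, 38, 36, 35, 37, 26],
    'Income($)': [70000, 90000, 61000, 60000, 150000, 155000,
                  160000, 162000, 156000, 130000, 137000, 45000],
})

# Scale both features to [0, 1] so neither dominates the Euclidean distance.
scaler = MinMaxScaler()
df[['Age', 'Income($)']] = scaler.fit_transform(df[['Age', 'Income($)']])

# Fit K-Means with the three clusters the scatter plot suggests.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
df['cluster'] = km.fit_predict(df[['Age', 'Income($)']])
print(df['cluster'].value_counts())
```

Fixing `random_state` makes the cluster labels reproducible; `n_init=10` matches scikit-learn's classic default number of restarts.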


In [3]:

from sklearn.cluster import KMeans


import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline
df = pd.read_csv('/Users/atulchawla/Downloads/income.csv')
df.head()
Out[3]:

      Name  Age  Income($)
0      Rob   27      70000
1  Michael   29      90000
2    Mohan   29      61000
3   Ismail   28      60000
4     Kory   42     150000

In [4]:

plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')
Out[4]:

Text(0, 0.5, 'Income($)')

In [5]:
# We can see 3 clear clusters in the scatter plot
# Pre-processing: scale Age and Income($) to [0, 1]
scaler = MinMaxScaler()

scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])

scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
df.head()

Out[5]:

      Name       Age  Income($)
0      Rob  0.058824   0.213675
1  Michael  0.176471   0.384615
2    Mohan  0.176471   0.136752
3   Ismail  0.117647   0.128205
4     Kory  0.941176   0.897436
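The notebook fits the same `MinMaxScaler` twice, once per column. Because `MinMaxScaler` scales each column independently, a single `fit_transform` over both columns is equivalent. A small sketch with hypothetical values:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical three-row sample; MinMaxScaler maps each column's min to 0
# and max to 1 independently, so one call replaces the two fit/transform
# pairs in the notebook.
df = pd.DataFrame({'Age': [27, 29, 42], 'Income($)': [70000, 90000, 150000]})
df[['Age', 'Income($)']] = MinMaxScaler().fit_transform(df[['Age', 'Income($)']])
print(df)
```

This also avoids re-using one fitted scaler object for two different columns, which would silently overwrite the first fit's parameters.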

In [6]:
plt.scatter(df.Age,df['Income($)'])
Out[6]:
<matplotlib.collections.PathCollection at 0x7fa1615b2cd0>

In [7]:
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted
Out[7]:
array([0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
dtype=int32)

In [11]:
df['cluster']=y_predicted
df.head(3)
Out[11]:

      Name       Age  Income($)  cluster
0      Rob  0.058824   0.213675        0
1  Michael  0.176471   0.384615        0
2    Mohan  0.176471   0.136752        0

In [12]:
km.cluster_centers_
Out[12]:
array([[0.1372549 , 0.11633428],
[0.85294118, 0.2022792 ],
[0.72268908, 0.8974359 ]])

In [13]:
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.legend()
Out[13]:
<matplotlib.legend.Legend at 0x7fa161759c70>

In [14]:
# Elbow plot to verify the chosen number of clusters
sse = []
k_rng = range(1,10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age','Income($)']])
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng,sse)
Out[14]:
[<matplotlib.lines.Line2D at 0x7fa16184a880>]

In [ ]:
#We observe that the elbow point appears at K=3
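The elbow can also be checked numerically rather than by eye: the relative drop in SSE is large up to K=3 and much smaller afterwards. A sketch on hypothetical data with three well-separated blobs (mimicking the scaled age/income clusters):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three tight 2-D blobs around hypothetical centers in the unit square.
X = np.vstack([rng.normal(c, 0.05, size=(20, 2))
               for c in ([0.1, 0.1], [0.9, 0.2], [0.7, 0.9])])

# SSE (inertia) for K = 1..6, then the relative drop at each step.
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)]
drops = [(sse[i] - sse[i + 1]) / sse[i] for i in range(len(sse) - 1)]
print([round(d, 3) for d in drops])
```

On data like this, the drops from K=1 to K=2 and K=2 to K=3 are large, while the drop from K=3 to K=4 is small, which is exactly the elbow the plot shows.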
