ML Assignment No 5

Uploaded by

Prathamesh Pimpalkar


# Assignment No = 5

# Name : Prathamesh Dilip Pimpalkar

# Class : TE(IT)
# Roll No : 2233052.
# Batch : C

#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#reading mall_customers.csv file

df = pd.read_csv("/content/Mall_Customers.csv")
print(df)

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
.. ... ... ... ... ...
195 196 Female 35 120 79
196 197 Female 45 126 28
197 198 Male 32 126 74
198 199 Male 32 137 18
199 200 Male 30 137 83

[200 rows x 5 columns]

#dimensions of the dataset (rows, columns)

df.shape

(200, 5)

#names of all attributes

df.columns

Index(['CustomerID', 'Genre', 'Age', 'Annual Income (k$)',

'Spending Score (1-100)'],
dtype='object')

#represent top 5 rows of dataset

df.head()

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

#represent specific number of top rows of dataset

df.head(10)

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

5 6 Female 22 17 76

6 7 Female 35 18 6

7 8 Female 23 18 94

8 9 Male 64 19 3

9 10 Female 30 19 72

#represent bottom 5 rows of dataset

df.tail()

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

195 196 Female 35 120 79

196 197 Female 45 126 28

197 198 Male 32 126 74

198 199 Male 32 137 18

199 200 Male 30 137 83

#represent specific number of bottom rows of dataset

df.tail(10)

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

190 191 Female 34 103 23

191 192 Female 32 103 69

192 193 Male 33 113 8

193 194 Female 38 113 91

194 195 Female 47 120 16

195 196 Female 35 120 79

196 197 Female 45 126 28

197 198 Male 32 126 74

198 199 Male 32 137 18

199 200 Male 30 137 83

#display the data type and non-null count of every column

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 200 non-null int64
1 Genre 200 non-null object
2 Age 200 non-null int64
3 Annual Income (k$) 200 non-null int64
4 Spending Score (1-100) 200 non-null int64
dtypes: int64(4), object(1)
memory usage: 7.9+ KB

#display summary statistics for the numeric columns

df.describe()

CustomerID Age Annual Income (k$) Spending Score (1-100)

count 200.000000 200.000000 200.000000 200.000000
mean 100.500000 38.850000 60.560000 50.200000
std 57.879185 13.969007 26.264721 25.823522
min 1.000000 18.000000 15.000000 1.000000
25% 50.750000 28.750000 41.500000 34.750000
50% 100.500000 36.000000 61.500000 50.000000
75% 150.250000 49.000000 78.000000 73.000000
max 200.000000 70.000000 137.000000 99.000000

#find the missing values in the dataset

df.isna()

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 False False False False False

1 False False False False False

2 False False False False False

3 False False False False False

4 False False False False False

... ... ... ... ... ...

195 False False False False False

196 False False False False False

197 False False False False False

198 False False False False False

199 False False False False False

200 rows × 5 columns
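The `std` and quartile rows of the `df.describe()` summary can be checked by hand for the `CustomerID` column, because that column is simply the integers 1 to 200. The sketch below uses only the standard library and assumes that pandas reports the sample standard deviation (dividing by n - 1) and linearly interpolated quartiles, which is what the printed values suggest.

```python
import statistics

# CustomerID in this dataset runs from 1 to 200 (see df.head() / df.tail())
customer_ids = list(range(1, 201))

mean = statistics.mean(customer_ids)
sample_std = statistics.stdev(customer_ids)       # divides by n - 1
population_std = statistics.pstdev(customer_ids)  # divides by n

# method="inclusive" interpolates linearly between the min and max of the data
q1, median, q3 = statistics.quantiles(customer_ids, n=4, method="inclusive")

print("mean:", mean)
print("sample std:", round(sample_std, 6))
print("population std:", round(population_std, 6))
print("quartiles:", q1, median, q3)
```

The sample standard deviation agrees with the 57.879185 shown for `CustomerID`, while the population version comes out slightly smaller, so the `std` row is the n - 1 estimate.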

#to find the total number of missing values in the dataset

df.isna().sum()

CustomerID 0
Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64

#to flag every value in the dataset that is equal to zero (True where the value is 0)
df==0

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 False False False False False

1 False False False False False

2 False False False False False

3 False False False False False

4 False False False False False

... ... ... ... ... ...

195 False False False False False

196 False False False False False

197 False False False False False

198 False False False False False

199 False False False False False

200 rows × 5 columns

(df==0).sum()

CustomerID 0
Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64

(df==0).sum().sum()

#to get values of a particular column

df["CustomerID"]

0 1
1 2
2 3
3 4
4 5
...
195 196
196 197
197 198
198 199
199 200
Name: CustomerID, Length: 200, dtype: int64

#to get the mean of all values of the column

df["CustomerID"].mean()

100.5

#to get all values of a particular row (selected by index label)

df.loc[4]

CustomerID 5
Genre Female
Age 31
Annual Income (k$) 17
Spending Score (1-100) 40
Name: 4, dtype: object

#to get the max of all values of the column

df["CustomerID"].max()

200

#to access a specific row and a specific column using integer positions
df.iloc[3, 4]

#to get a specific row and a specific column using labels

df.loc[2, "CustomerID"]

#to get all rows and all columns

df.iloc[:,:]

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

... ... ... ... ... ...

195 196 Female 35 120 79

196 197 Female 45 126 28

197 198 Male 32 126 74

198 199 Male 32 137 18

199 200 Male 30 137 83

200 rows × 5 columns

#to get all columns but only one row

df.iloc[1,:]

CustomerID 2
Genre Male
Age 21
Annual Income (k$) 15
Spending Score (1-100) 81
Name: 1, dtype: object

#to get all rows but only one column

df.loc[:,"CustomerID"]

0 1
1 2
2 3
3 4
4 5
...
195 196
196 197
197 198
198 199
199 200
Name: CustomerID, Length: 200, dtype: int64

#to get all rows but only some specific columns

df.loc[:,["Age","CustomerID"]]

Age CustomerID

0 19 1

1 21 2

2 20 3

3 23 4

4 31 5

... ... ...

195 35 196

196 45 197

197 32 198

198 32 199

199 30 200

200 rows × 2 columns

#to get some specific rows but all columns

df.loc[[0,1,2],:]

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

df.dtypes

CustomerID int64
Genre object
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object

# Extracting Independent Variables

# Here we don't need any dependent variables for data pre-processing step as it is a
# we have no idea about what to determine
# get the 'Annual Income (k$)' and 'Spending Score (1-100)' features
x = df.iloc[:,[3,4]].values
plt.scatter(df['Annual Income (k$)'],df['Spending Score (1-100)'])

<matplotlib.collections.PathCollection at 0x7f078a127a50>

#finding optimal number of clusters using the elbow method

from sklearn.cluster import KMeans
wcss_list = [] #Initializing the list for values of WCSS
#Using a for loop for iterations from 1 to 10
for i in range(1,11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
plt.plot(range(1,11), wcss_list)
plt.title('The Elbow Method Graph')
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.show()
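The quantity plotted on the y-axis, WCSS (within-cluster sum of squares, exposed by scikit-learn as `inertia_`), is the sum of squared distances from every point to its nearest centroid. As a minimal sketch of that definition, the helper `wcss` and the four toy points below are hypothetical and not part of the assignment data:

```python
def wcss(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for px, py in points:
        total += min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids)
    return total

# Hypothetical toy data: two tight groups of two points each
toy_points = [(1, 1), (1, 3), (5, 5), (5, 7)]

wcss_k1 = wcss(toy_points, [(3, 4)])          # k=1: one centroid at the overall mean
wcss_k2 = wcss(toy_points, [(1, 2), (5, 6)])  # k=2: one centroid per group

print(wcss_k1, wcss_k2)  # 36.0 4.0
```

WCSS can only fall as k grows, so the elbow method looks for the k after which the drop becomes small rather than for the minimum.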

from sklearn.cluster import KMeans

#training the K-means model on a dataset

kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)

#centroid
print(" Cluster centroids are \n", kmeans.cluster_centers_)
print(" \n\n predicted clusters for data points are :")
y_predict

Cluster centroids are

[[55.2962963 49.51851852]
[88.2 17.11428571]
[26.30434783 20.91304348]
[25.72727273 79.36363636]
[86.53846154 82.12820513]]

predicted clusters for data points are :

array([2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3,
2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 0,
2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 1, 4, 0, 4, 1, 4, 1, 4,
0, 4, 1, 4, 1, 4, 1, 4, 1, 4, 0, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4,
1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4,
1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4,
1, 4], dtype=int32)
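Each label in the array above is the index of the centroid closest to that customer in (Annual Income, Spending Score) space. A minimal sketch of that assignment step, using the centroids printed above and four customers taken from the first and last rows of the dataset; the helper name `nearest_cluster` is an illustration, not a scikit-learn function:

```python
import math

# Centroids as printed by kmeans.cluster_centers_ (index = cluster label)
centroids = [
    (55.2962963, 49.51851852),
    (88.2, 17.11428571),
    (26.30434783, 20.91304348),
    (25.72727273, 79.36363636),
    (86.53846154, 82.12820513),
]

def nearest_cluster(point, centroids):
    """Return the index of the centroid with the smallest Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# (Annual Income (k$), Spending Score (1-100)) for customers 1, 2, 199 and 200
sample_customers = [(15, 39), (15, 81), (137, 18), (137, 83)]
labels = [nearest_cluster(p, centroids) for p in sample_customers]

print(labels)  # [2, 3, 1, 4]
```

These labels agree with the first two and last two entries of `y_predict`.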
# visualizing the clusters
plt.scatter(x[y_predict==0, 0], x[y_predict==0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(x[y_predict==1, 0], x[y_predict==1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(x[y_predict==2, 0], x[y_predict==2, 1], s=100, c='green', label='Cluster 3')
plt.scatter(x[y_predict==3, 0], x[y_predict==3, 1], s=100, c='cyan', label='Cluster 4')
plt.scatter(x[y_predict==4, 0], x[y_predict==4, 1], s=100, c='magenta', label='Cluster 5')

# plotting the centroids
# NOTE: the original line is cut off after c='y ; the color and label used here are assumed
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='y', label='Centroids')

plt.title('Clusters of Customers')
plt.xlabel('Annual Income(k$)')
plt.ylabel('Spending Score(1-100)')
plt.legend()
plt.show()

# display to which cluster customer belongs

df['cluster'] = y_predict
df
CustomerID Genre Age Annual Income (k$) Spending Score (1-100) cluster clust

0 1 Male 19 15 39 2

1 2 Male 21 15 81 3

2 3 Female 20 16 6 2

3 4 Female 23 16 77 3

4 5 Female 31 17 40 2

... ... ... ... ... ... ...

195 196 Female 35 120 79 4

196 197 Female 45 126 28 1

197 198 Male 32 126 74 4

198 199 Male 32 137 18 1

199 200 Male 30 137 83 4

200 rows × 7 columns
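Once every customer carries a cluster label, a natural next step is to profile each cluster by its average income and spending score. The sketch below does this with the standard library for the ten rows displayed above only, so the averages are illustrative and are not the true cluster means. On the full DataFrame the same idea would be `df.groupby('cluster')[['Annual Income (k$)', 'Spending Score (1-100)']].mean()`.

```python
from collections import defaultdict
from statistics import mean

# (Annual Income (k$), Spending Score (1-100), cluster) for the ten displayed rows
rows = [
    (15, 39, 2), (15, 81, 3), (16, 6, 2), (16, 77, 3), (17, 40, 2),
    (120, 79, 4), (126, 28, 1), (126, 74, 4), (137, 18, 1), (137, 83, 4),
]

# Group the (income, score) pairs by cluster label
groups = defaultdict(list)
for income, score, cluster in rows:
    groups[cluster].append((income, score))

# Average income and spending score per cluster
profile = {
    cluster: (mean([i for i, _ in members]), mean([s for _, s in members]))
    for cluster, members in sorted(groups.items())
}

for cluster, (avg_income, avg_score) in profile.items():
    print(cluster, round(avg_income, 1), round(avg_score, 1))
```

Even on this small sample the pattern of the scatter plot shows up: clusters 1 and 4 are high-income customers with low and high spending respectively, while clusters 2 and 3 are the low-income counterparts.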
