K Medoids

This notebook clusters the housing data in `kc_house_data.csv` with k-medoids. The features are first standardized, then `KMedoids` from the scikit-learn-extra library partitions the rows into 3 clusters, and the cluster labels are inserted back into the original data frame.

Uploaded by

prerna sharma


k-medoids

February 29, 2024

[1]: import numpy as np

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

[9]: df = pd.read_csv("kc_house_data.csv")
df.head()

[9]: id date price bedrooms bathrooms sqft_living \

0 7129300520 20141013T000000 221900.0 3 1.00 1180
1 6414100192 20141209T000000 538000.0 3 2.25 2570
2 5631500400 20150225T000000 180000.0 2 1.00 770
3 2487200875 20141209T000000 604000.0 4 3.00 1960
4 1954400510 20150218T000000 510000.0 3 2.00 1680

sqft_lot floors waterfront view … grade sqft_above sqft_basement \

0 5650 1.0 0 0 … 7 1180 0
1 7242 2.0 0 0 … 7 2170 400
2 10000 1.0 0 0 … 6 770 0
3 5000 1.0 0 0 … 7 1050 910
4 8080 1.0 0 0 … 8 1680 0

yr_built yr_renovated zipcode lat long sqft_living15 \

0 1955 0 98178 47.5112 -122.257 1340
1 1951 1991 98125 47.7210 -122.319 1690
2 1933 0 98028 47.7379 -122.233 2720
3 1965 0 98136 47.5208 -122.393 1360
4 1987 0 98074 47.6168 -122.045 1800

sqft_lot15
0 5650
1 7639
2 8062
3 5000
4 7503

[5 rows x 21 columns]

[10]: df.describe()

[10]: id price bedrooms bathrooms sqft_living \

count 2.161300e+04 2.161300e+04 21613.000000 21613.000000 21613.000000
mean 4.580302e+09 5.400881e+05 3.370842 2.114757 2079.899736
std 2.876566e+09 3.671272e+05 0.930062 0.770163 918.440897
min 1.000102e+06 7.500000e+04 0.000000 0.000000 290.000000
25% 2.123049e+09 3.219500e+05 3.000000 1.750000 1427.000000
50% 3.904930e+09 4.500000e+05 3.000000 2.250000 1910.000000
75% 7.308900e+09 6.450000e+05 4.000000 2.500000 2550.000000
max 9.900000e+09 7.700000e+06 33.000000 8.000000 13540.000000

sqft_lot floors waterfront view condition \

count 2.161300e+04 21613.000000 21613.000000 21613.000000 21613.000000
mean 1.510697e+04 1.494309 0.007542 0.234303 3.409430
std 4.142051e+04 0.539989 0.086517 0.766318 0.650743
min 5.200000e+02 1.000000 0.000000 0.000000 1.000000
25% 5.040000e+03 1.000000 0.000000 0.000000 3.000000
50% 7.618000e+03 1.500000 0.000000 0.000000 3.000000
75% 1.068800e+04 2.000000 0.000000 0.000000 4.000000
max 1.651359e+06 3.500000 1.000000 4.000000 5.000000

grade sqft_above sqft_basement yr_built yr_renovated \

count 21613.000000 21613.000000 21613.000000 21613.000000 21613.000000
mean 7.656873 1788.390691 291.509045 1971.005136 84.402258
std 1.175459 828.090978 442.575043 29.373411 401.679240
min 1.000000 290.000000 0.000000 1900.000000 0.000000
25% 7.000000 1190.000000 0.000000 1951.000000 0.000000
50% 7.000000 1560.000000 0.000000 1975.000000 0.000000
75% 8.000000 2210.000000 560.000000 1997.000000 0.000000
max 13.000000 9410.000000 4820.000000 2015.000000 2015.000000

zipcode lat long sqft_living15 sqft_lot15

count 21613.000000 21613.000000 21613.000000 21613.000000 21613.000000
mean 98077.939805 47.560053 -122.213896 1986.552492 12768.455652
std 53.505026 0.138564 0.140828 685.391304 27304.179631
min 98001.000000 47.155900 -122.519000 399.000000 651.000000
25% 98033.000000 47.471000 -122.328000 1490.000000 5100.000000
50% 98065.000000 47.571800 -122.230000 1840.000000 7620.000000
75% 98118.000000 47.678000 -122.125000 2360.000000 10083.000000
max 98199.000000 47.777600 -121.315000 6210.000000 871200.000000

[11]: # Drop one row by its index label, then renumber the index
df.drop(15870, axis=0, inplace=True)
df.reset_index(drop=True, inplace=True)
df.shape

[11]: (21612, 21)
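
Cell [11] removes row 15870 by a hard-coded index and does not say why. The `describe()` output above shows a maximum of 33 bedrooms, so one plausible reading is that this row is that outlier, but the notebook does not confirm it. A minimal sketch, on a made-up toy frame (the values are hypothetical, not taken from `kc_house_data.csv`), of locating such a row by condition rather than by position:

```python
import pandas as pd

# Hypothetical toy data standing in for the housing frame.
toy = pd.DataFrame({
    "bedrooms": [3, 2, 33, 4],
    "sqft_living": [1180, 770, 1620, 1960],
})

# Locate the row(s) holding the maximum bedroom count.
outlier_idx = toy.index[toy["bedrooms"] == toy["bedrooms"].max()]

# Drop them and renumber the index, mirroring cell [11].
cleaned = toy.drop(outlier_idx).reset_index(drop=True)
print(list(outlier_idx), cleaned.shape)
```

Selecting by condition keeps the step valid if the file is re-sorted or filtered before this cell runs.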

[12]: df[df.columns[df.isnull().sum()>0]].isnull().sum()

[12]: Series([], dtype: float64)

[13]: pip install scikit-learn-extra

Collecting scikit-learn-extra
Downloading scikit_learn_extra-0.3.0-cp311-cp311-win_amd64.whl.metadata (3.7
kB)
Requirement already satisfied: numpy>=1.13.3 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
scikit-learn-extra) (1.26.0)
Requirement already satisfied: scipy>=0.19.1 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
scikit-learn-extra) (1.11.4)
Requirement already satisfied: scikit-learn>=0.23.0 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
scikit-learn-extra) (1.3.2)
Requirement already satisfied: joblib>=1.1.1 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
scikit-learn>=0.23.0->scikit-learn-extra) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
scikit-learn>=0.23.0->scikit-learn-extra) (3.2.0)
Downloading scikit_learn_extra-0.3.0-cp311-cp311-win_amd64.whl (340 kB)
Installing collected packages: scikit-learn-extra
Successfully installed scikit-learn-extra-0.3.0
Note: you may need to restart the kernel to use updated packages.

[14]: df.drop(['date', 'id'], axis = 1, inplace = True)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
Clus_dataSet = scaler.fit_transform(df)
Clus_dataSet

[14]: array([[-0.86668617, -0.40692359, -1.44745951, …, -0.30611525,
        -0.94339773, -0.26072358],
       [-0.00567521, -0.40692359, 0.17558163, …, -0.74637458,
        -0.43272969, -0.18787744],
       [-0.98081575, -1.50829275, -1.44745951, …, -0.13569228,
        1.07009338, -0.17238527],
       …,
       [-0.37584455, -1.50829275, -1.77206774, …, -0.60435544,
        -1.41029422, -0.39414664],
       [-0.38156737, -0.40692359, 0.50018986, …, 1.02886466,
        -0.84126412, -0.42051628],
       [-0.58585659, -1.50829275, -1.77206774, …, -0.60435544,
        -1.41029422, -0.41795257]])
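
For readers unfamiliar with `StandardScaler`, the transformation in cell [14] is a per-column z-score: subtract the column mean and divide by the column's population standard deviation. A minimal NumPy sketch on a hypothetical four-row matrix (the `data` array is invented for illustration):

```python
import numpy as np

# Hypothetical toy matrix: rows are houses, columns are two features.
data = np.array([
    [221900.0, 3.0],
    [538000.0, 3.0],
    [180000.0, 2.0],
    [604000.0, 4.0],
])

# z-score each column: subtract the column mean, then divide by the
# population standard deviation (ddof=0), as StandardScaler does.
mean = data.mean(axis=0)
std = data.std(axis=0)  # NumPy's default is ddof=0
scaled = (data - mean) / std
print(scaled.round(3))
```

After scaling, every column has mean 0 and unit variance, so features measured in dollars and features measured in room counts contribute on a comparable scale to the distances that k-medoids uses.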

[15]: from sklearn_extra.cluster import KMedoids

kmedoids = KMedoids(n_clusters=3).fit(Clus_dataSet)
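
The notebook calls `KMedoids` as a black box. To illustrate the idea, here is a minimal sketch of an alternating k-medoids loop in plain NumPy. It is not scikit-learn-extra's implementation: the function name `k_medoids`, the naive first-`k` initialisation, and the toy points are all assumptions made for this example.

```python
import numpy as np

def k_medoids(points, k, n_iter=100):
    """Alternating k-medoids on a small array (illustrative only)."""
    # Full pairwise Euclidean distance matrix, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Naive initialisation: the first k points (an arbitrary choice).
    medoids = np.arange(k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the member with the smallest total distance
            # to the other members becomes the cluster's medoid.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped moving
        medoids = new_medoids
    # Final assignment against the final medoids.
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels

# Hypothetical toy data: two well-separated groups of three points.
pts = np.array([[0, 0], [0, 1], [1, 0],
                [10, 10], [10, 11], [11, 10]], dtype=float)
medoids, labels = k_medoids(pts, k=2)
print(medoids, labels)
```

Unlike k-means centroids, each medoid is an actual row of the data, so a cluster's representative is a real house. Note that this sketch builds a dense n × n distance matrix; for the 21,612 rows used here that would be about 4.67 × 10⁸ float64 entries, roughly 3.7 GB.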

[16]: df.insert(0, 'kmedoids Cluster Labels', kmedoids.labels_)

df.head()

[16]: kmedoids Cluster Labels price bedrooms bathrooms sqft_living \

0 1 221900.0 3 1.00 1180
1 0 538000.0 3 2.25 2570
2 2 180000.0 2 1.00 770
3 0 604000.0 4 3.00 1960
4 2 510000.0 3 2.00 1680

sqft_lot floors waterfront view condition grade sqft_above \

0 5650 1.0 0 0 3 7 1180
1 7242 2.0 0 0 3 7 2170
2 10000 1.0 0 0 3 6 770
3 5000 1.0 0 0 5 7 1050
4 8080 1.0 0 0 3 8 1680

sqft_basement yr_built yr_renovated zipcode lat long \

0 0 1955 0 98178 47.5112 -122.257
1 400 1951 1991 98125 47.7210 -122.319
2 0 1933 0 98028 47.7379 -122.233
3 910 1965 0 98136 47.5208 -122.393
4 0 1987 0 98074 47.6168 -122.045

sqft_living15 sqft_lot15
0 1340 5650
1 1690 7639
2 2720 8062
3 1360 5000
4 1800 7503

[17]: X = df.loc[:, df.columns != 'kmedoids Cluster Labels']
X.head()

[17]: price bedrooms bathrooms sqft_living sqft_lot floors waterfront \

0 221900.0 3 1.00 1180 5650 1.0 0
1 538000.0 3 2.25 2570 7242 2.0 0
2 180000.0 2 1.00 770 10000 1.0 0
3 604000.0 4 3.00 1960 5000 1.0 0
4 510000.0 3 2.00 1680 8080 1.0 0

view condition grade sqft_above sqft_basement yr_built yr_renovated \

0 0 3 7 1180 0 1955 0
1 0 3 7 2170 400 1951 1991
2 0 3 6 770 0 1933 0
3 0 5 7 1050 910 1965 0
4 0 3 8 1680 0 1987 0

zipcode lat long sqft_living15 sqft_lot15

0 98178 47.5112 -122.257 1340 5650
1 98125 47.7210 -122.319 1690 7639
2 98028 47.7379 -122.233 2720 8062
3 98136 47.5208 -122.393 1360 5000
4 98074 47.6168 -122.045 1800 7503

[18]: from sklearn import preprocessing

# Standardize X; these are the same 19 feature columns scaled in cell [14]
X = preprocessing.StandardScaler().fit(X).transform(X)
X[0:5]

[18]: array([[-0.86668617, -0.40692359, -1.44745951, -0.97984121, -0.22832648,
        -0.91546593, -0.08717466, -0.3057672 , -0.62914619, -0.55885272,
        -0.73474634, -0.65864212, -0.5449314 , -0.21013346, 1.87013949,
        -0.35252787, -0.30611525, -0.94339773, -0.26072358],
       [-0.00567521, -0.40692359, 0.17558163, 0.53360192, -0.18989137,
        0.93645991, -0.08717466, -0.3057672 , -0.62914619, -0.55885272,
        0.46079706, 0.2451683 , -0.68111108, 4.74656291, 0.87957332,
        1.16160686, -0.74637458, -0.43272969, -0.18787744],
       [-0.98081575, -1.50829275, -1.44745951, -1.42625249, -0.12330593,
        -0.91546593, -0.08717466, -0.3057672 , -0.62914619, -1.40959054,
        -1.22987038, -0.65864212, -1.29391966, -0.21013346, -0.93334967,
        1.28357482, -0.13569228, 1.07009338, -0.17238527],
       [ 0.17409931, 0.69444556, 1.14940631, -0.13057096, -0.2440192 ,
        -0.91546593, -0.08717466, -0.3057672 , 2.44468843, -0.55885272,
        -0.89173689, 1.39752658, -0.20448219, -0.21013346, 1.08516253,
        -0.28324429, -1.2718454 , -0.9142167 , -0.2845295 ],
       [-0.08194318, -0.40692359, -0.1490266 , -0.4354372 , -0.16965983,
        -0.91546593, -0.08717466, -0.3057672 , -0.62914619, 0.29188511,
        -0.13093654, -0.65864212, 0.54450607, -0.21013346, -0.07361299,
        0.40959143, 1.19928763, -0.27223402, -0.19285837]])

[19]: y = df["kmedoids Cluster Labels"]

y.head()

[19]: 0 1
1 0
2 2
3 0
4 2
Name: kmedoids Cluster Labels, dtype: int64

[20]: pip install pywaffle

Collecting pywaffle
Note: you may need to restart the kernel to use updated packages.

Downloading pywaffle-1.1.0-py2.py3-none-any.whl.metadata (2.6 kB)

Collecting fontawesomefree (from pywaffle)
Downloading fontawesomefree-6.5.1-py3-none-any.whl.metadata (824 bytes)
Requirement already satisfied: matplotlib in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
pywaffle) (3.8.0)
Requirement already satisfied: contourpy>=1.0.1 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (1.1.1)
Requirement already satisfied: cycler>=0.10 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (4.43.1)
Requirement already satisfied: kiwisolver>=1.0.1 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (1.4.5)
Requirement already satisfied: numpy<2,>=1.21 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (1.26.0)
Requirement already satisfied: packaging>=20.0 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (23.2)
Requirement already satisfied: pillow>=6.2.0 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (10.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (3.1.1)

Requirement already satisfied: python-dateutil>=2.7 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
matplotlib->pywaffle) (2.8.2)
Requirement already satisfied: six>=1.5 in
c:\users\lenovo\appdata\local\programs\python\python311\lib\site-packages (from
python-dateutil>=2.7->matplotlib->pywaffle) (1.16.0)
Downloading pywaffle-1.1.0-py2.py3-none-any.whl (30 kB)
Downloading fontawesomefree-6.5.1-py3-none-any.whl (25.6 MB)
Installing collected packages: fontawesomefree, pywaffle
Successfully installed fontawesomefree-6.5.1 pywaffle-1.1.0

[21]: Count = df.groupby(["kmedoids Cluster Labels"], as_index=False).count()[["kmedoids Cluster Labels", "price"]]
Count.columns = ["kmedoids Cluster Labels", "Count"]
Count

[21]: kmedoids Cluster Labels Count

0 0 6776
1 1 7964
2 2 6872
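
Cell [21] counts rows per cluster by calling `count()` on every column and then keeping only `price`. A shorter route to the same two-column table, sketched here on hypothetical labels (the `toy` frame is invented for illustration), is `groupby(...).size()`:

```python
import pandas as pd

# Hypothetical labels standing in for kmedoids.labels_.
toy = pd.DataFrame({"kmedoids Cluster Labels": [1, 0, 2, 0, 2, 1, 1]})

# size() counts rows per group directly, without touching other columns.
counts = (
    toy.groupby("kmedoids Cluster Labels")
       .size()
       .reset_index(name="Count")
)
print(counts)
```

`size()` counts rows regardless of missing values, whereas `count()` skips NaNs in each column; the two agree here because cell [12] found no nulls.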

[23]: from pywaffle import Waffle

fig = plt.figure(
    FigureClass=Waffle,
    figsize=(6, 8),
    rows=5,
    values=list(Count.Count/150),
    colors=("magenta", "yellow", "cyan"),
    legend={'loc': 'upper left', 'bbox_to_anchor': (1, 1)},
    icons='sticky-note', icon_size=18,
    icon_legend=True,
    title={'label': 'Number of Houses in each K-medoids Cluster', 'loc': 'center'},
    labels=list(Count['kmedoids Cluster Labels']))
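
Dividing the counts by 150 means each icon stands for roughly 150 houses. The arithmetic can be checked by hand; the sketch below assumes values are rounded to the nearest whole icon, which is an assumption about the chart rather than something the notebook shows.

```python
import math

# Cluster sizes reported by cell [21].
counts = [6776, 7964, 6872]
houses_per_icon = 150  # the divisor used in cell [23]
rows = 5               # the rows= argument used in cell [23]

# Scaled values passed to the chart, and their nearest whole numbers.
scaled = [c / houses_per_icon for c in counts]
icons = [round(v) for v in scaled]

# Columns needed to fit that many icons into 5 rows.
columns = math.ceil(sum(icons) / rows)
print(icons, sum(icons), columns)
```

Under that rounding assumption the three clusters get 45, 53, and 46 icons, 144 in total, laid out in 29 columns.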

[24]: labels = df.groupby(["kmedoids Cluster Labels"], as_index=False).mean()[["kmedoids Cluster Labels", "price"]]
labels

[24]: kmedoids Cluster Labels price

0 0 783367.394923
1 1 327354.355977
2 2 546731.293510
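
As a sanity check on these figures, weighting each cluster's mean price by its size from cell [21] should reproduce the overall mean price of the 21,612 remaining rows. A small arithmetic sketch using only numbers printed in the notebook:

```python
# Cluster sizes from cell [21] and mean prices from cell [24].
sizes = [6776, 7964, 6872]
mean_prices = [783367.394923, 327354.355977, 546731.293510]

# Size-weighted average of the cluster means.
total_rows = sum(sizes)
overall_mean = sum(n * p for n, p in zip(sizes, mean_prices)) / total_rows
print(total_rows, round(overall_mean, 1))
```

The result lands within a few dollars of the mean price of about 540,088 reported by `describe()` in cell [10], which was computed before one row was dropped. The cluster means themselves suggest the three groups separate into higher-, lower-, and mid-priced houses.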

[ ]:
