This project analyzes the AMESHOUSING3 dataset to identify clusters of houses based on their features and sale prices using clustering techniques. The analysis revealed four distinct groups of houses, with overall quality, size, and year built being significant indicators of pricing. The client is encouraged to use these insights for tailored pricing strategies to enhance market segmentation and sales performance.
4/18/25, 9:07 PM
lab 11 ml - Jupyter Notebook
INTRODUCTION
The real estate market often includes houses with varying characteristics,
making it challenging to analyze patterns across sales. The client seeks to
identify natural groupings within the AMESHOUSING3 dataset to better
understand the types of houses sold. This project aims to apply clustering
techniques to housing feature data to identify meaningful clusters and
relate them to the sale price of houses. Understanding these clusters can
help in pricing, marketing, and investment decisions in the real estate
domain.
In [2]:
# Load the dataset and perform initial data cleaning and exploration
import pandas as pd

df = pd.read_csv("ameshousing3.csv")
df.head()
Out[2]:
(first five rows, showing Obs, PID, Lot_Area, House_Style, Overall_Qual, Overall_Cond, Year_Built, Heating, …; 5 rows × 31 columns)
In [4]:
# Check for missing values, then drop incomplete rows
df.isnull().sum().sort_values(ascending=False).head(10)
df = df.dropna()
In [5]:
df.describe()
Out[5]: (summary statistics for the 266 remaining rows: count, mean, std, min, quartiles, and max for Fireplaces, Garage_Area, Basement_Area, Full_Bathroom, Half_Bathroom, …)
In [7]:
# Select relevant numeric variables based on their significance
features = ['Lot_Area', 'Overall_Qual', 'Year_Built', 'Gr_Liv_Area']
df_selected = df[features + ['SalePrice']].copy()
In [9]:
# Normalize the selected features for clustering
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df_selected[features])
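StandardScaler rescales each feature to zero mean and unit variance (z = (x − μ)/σ), which keeps large-scale features such as Lot_Area from dominating the distance computations in k-means. A minimal sketch of that property on toy data (not the housing matrix itself):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = StandardScaler().fit_transform(X)

# After scaling, each column has mean ~0 and standard deviation ~1
print(Z.mean(axis=0))
print(Z.std(axis=0))
```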
In [12]:
# Convert the data to ARFF format to use in WEKA
from scipy.io import arff
import numpy as np

# Convert to a DataFrame first, then save as CSV
df_scaled = pd.DataFrame(scaled_data, columns=features)
df_scaled.to_csv("housing_scaled.csv", index=False)
# Manually convert this CSV to ARFF using WEKA's CSVLoader

Load the ARFF file into WEKA and apply the k-means clustering algorithm.
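WEKA's CSVLoader handles the conversion from the GUI, but the ARFF format is simple enough to write directly from pandas. The `to_arff` helper below is a hypothetical sketch (not part of pandas or scipy) that assumes all columns are numeric, which holds for the scaled feature matrix:

```python
import pandas as pd

def to_arff(df: pd.DataFrame, relation: str, path: str) -> None:
    """Write a numeric-only DataFrame to a minimal ARFF file."""
    with open(path, "w") as f:
        f.write(f"@RELATION {relation}\n\n")
        for col in df.columns:
            f.write(f"@ATTRIBUTE {col} NUMERIC\n")
        f.write("\n@DATA\n")
        for _, row in df.iterrows():
            f.write(",".join(str(v) for v in row.values) + "\n")

# Tiny demo frame standing in for df_scaled
demo = pd.DataFrame({"Lot_Area": [0.5, -1.2], "Overall_Qual": [1.0, 0.3]})
to_arff(demo, "housing_scaled", "housing_scaled.arff")
```

Calling `to_arff(df_scaled, "housing_scaled", "housing_scaled.arff")` would then produce a file WEKA can open directly, skipping the manual CSVLoader step.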
In [13]:
# Visualize the distributions of key features
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(df['Gr_Liv_Area'], kde=True)
plt.title("Distribution of Above Ground Living Area")
plt.show()
[Histogram: Distribution of Above Ground Living Area — x-axis Gr_Liv_Area, y-axis Count]
Cluster Analysis Process
In [16]:
from sklearn.cluster import KMeans

# Choose number of clusters (e.g., k=4)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
clusters = kmeans.fit_predict(scaled_data)

# Add cluster labels back to the selected dataframe
df_selected['Cluster'] = clusters
In [17]:
# See the average values for each feature within each cluster
print(df_selected.groupby('Cluster').mean())
             Lot_Area  Overall_Qual   Year_Built  Gr_Liv_Area      SalePrice
Cluster
0         7381.975000      5.800000  1929.150000  1285.075000  126851.350000
1         7844.548387      4.623656  1953.666667     924.8602  115198.924731
2        11530.171429      5.557143  1967.128571  1249.242857  157602.142857
3         6001.301587      6.761905  1995.714286  1251.190476  171197.317460
In [18]:
sns.boxplot(x='Cluster', y='SalePrice', data=df_selected)
plt.title("Sale Price Distribution by Cluster")
plt.show()
[Boxplot: Sale Price Distribution by Cluster — SalePrice roughly 50,000–300,000 across Clusters 0–3]
In [19]:
inertia = []
k_range = range(1, 10)

for k in k_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(scaled_data)
    inertia.append(km.inertia_)

# Plot the elbow curve
plt.plot(k_range, inertia, marker='o')
plt.xlabel("Number of Clusters")
plt.ylabel("Inertia")
plt.title("Elbow Method - Optimal k")
plt.show()
[Elbow plot: Inertia falling from roughly 1100 to 300 as the Number of Clusters increases from 1 to 9]
After evaluating the cluster results for different k values, the model with k = 4 provided the most distinct and interpretable clusters. The selection is based on intra-cluster distance, cluster sizes, and meaningful group differentiation.
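The "most distinct clusters" criterion can be made quantitative with the silhouette score, which averages how much closer each point is to its own cluster than to the nearest other cluster (higher is better, range −1 to 1). A sketch on synthetic blobs standing in for the scaled housing features, since the actual matrix is not reproduced here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Two well-separated synthetic blobs as a stand-in for scaled_data
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
# For two true blobs, k=2 scores highest; splitting a blob lowers the score
```

Running the same loop on `scaled_data` for k = 2 through 9 would complement the elbow curve when justifying the choice of k = 4.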
The cluster analysis identified four main groups of houses with distinct
characteristics and sale price profiles. The findings show that overall
quality, size, and year built are strong indicators of pricing. The client
is advised to use these insights to tailor pricing strategies for different
property types. Implementing these findings can enhance market segmentation
and improve sales performance.