Real Estate Price Prediction Model

The document outlines a data analysis process using a dataset of Bengaluru house prices, including data cleaning, feature extraction, and outlier removal. Key steps include handling missing values, converting square footage ranges to numeric values, and calculating price per square foot. The final dataset is prepared for visualization and further analysis, focusing on the relationship between property size and price.

Uploaded by

mishraraj5732


import pandas as pd

import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import matplotlib
matplotlib.rcParams["figure.figsize"] = (20,10)

df1 = pd.read_csv(r"C:\Users\hp\Downloads\Bengaluru_House_Data.csv")
df1.head()

             area_type   availability                  location       size  society total_sqft  bath  balcony   price
0  Super built-up Area         19-Dec  Electronic City Phase II      2 BHK   Coomee       1056   2.0      1.0   39.07
1            Plot Area  Ready To Move          Chikka Tirupathi  4 Bedroom  Theanmp       2600   5.0      3.0  120.00
2        Built-up Area  Ready To Move               Uttarahalli      3 BHK      NaN       1440   2.0      3.0   62.00
3  Super built-up Area  Ready To Move        Lingadheeranahalli      3 BHK  Soiewre       1521   3.0      1.0   95.00
4  Super built-up Area  Ready To Move                  Kothanur      2 BHK      NaN       1200   2.0      1.0   51.00

Data Cleaning:
df1.groupby('area_type')['area_type'].agg('count')

area_type
Built-up Area 2418
Carpet Area 87
Plot Area 2025
Super built-up Area 8790
Name: area_type, dtype: int64

df2 = df1.drop(['area_type', 'society', 'balcony', 'availability'], axis='columns')
df2.head()

location size total_sqft bath price

0 Electronic City Phase II 2 BHK 1056 2.0 39.07
1 Chikka Tirupathi 4 Bedroom 2600 5.0 120.00
2 Uttarahalli 3 BHK 1440 2.0 62.00
3 Lingadheeranahalli 3 BHK 1521 3.0 95.00
4 Kothanur 2 BHK 1200 2.0 51.00
df2.isnull().sum()

location 1
size 16
total_sqft 0
bath 73
price 0
dtype: int64

# Take an explicit copy so that adding the 'bhk' column below does not
# raise SettingWithCopyWarning
df3 = df2.dropna().copy()
df3.isnull().sum()

location 0
size 0
total_sqft 0
bath 0
price 0
dtype: int64

df3.shape

(13246, 5)

# 'size' looks like "2 BHK" or "4 Bedroom"; keep the leading integer
df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))
df3.head()

location size total_sqft bath price bhk

0 Electronic City Phase II 2 BHK 1056 2.0 39.07 2
1 Chikka Tirupathi 4 Bedroom 2600 5.0 120.00 4
2 Uttarahalli 3 BHK 1440 2.0 62.00 3
3 Lingadheeranahalli 3 BHK 1521 3.0 95.00 3
4 Kothanur 2 BHK 1200 2.0 51.00 2

df3['bhk'].unique()

array([ 2,  4,  3,  6,  1,  8,  7,  5, 11,  9, 27, 10, 19, 16, 43, 14, 12,
       13, 18], dtype=int64)

df3[df3.bhk>20]

                       location        size total_sqft  bath  price  bhk
1718  2Electronic City Phase II      27 BHK       8000  27.0  230.0   27
4684                Munnekollal  43 Bedroom       2400  40.0  660.0   43

df3.total_sqft.unique()

array(['1056', '2600', '1440', ..., '1133 - 1384', '774', '4689'],
      dtype=object)

def is_float(x):
    # True when x parses as a float; False for ranges or unit-suffixed text
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True

df3[~df3['total_sqft'].apply(is_float)].head(10)

location size total_sqft bath price bhk

30 Yelahanka 4 BHK 2100 - 2850 4.0 186.000 4
122 Hebbal 4 BHK 3067 - 8156 4.0 477.000 4
137 8th Phase JP Nagar 2 BHK 1042 - 1105 2.0 54.005 2
165 Sarjapur 2 BHK 1145 - 1340 2.0 43.490 2
188 KR Puram 2 BHK 1015 - 1540 2.0 56.800 2
410 Kengeri 1 BHK 34.46Sq. Meter 1.0 18.500 1
549 Hennur Road 2 BHK 1195 - 1440 2.0 63.770 2
648 Arekere 9 Bedroom 4125Perch 9.0 265.000 9
661 Yelahanka 2 BHK 1120 - 1145 2.0 48.130 2
672 Bettahalsoor 4 Bedroom 3090 - 5002 4.0 445.000 4
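The same check can be sketched without a helper function, assuming `pd.to_numeric` with `errors='coerce'` is an acceptable stand-in for `is_float`. The sample values are copied from the output above; the variable names are illustrative only.

```python
import pandas as pd

# Sample values taken from the total_sqft column shown above
sample = pd.Series(['1056', '2100 - 2850', '34.46Sq. Meter', '4125Perch', '774'])

# errors='coerce' turns anything that is not a plain number into NaN
parsed = pd.to_numeric(sample, errors='coerce')
non_numeric = sample[parsed.isna()]
print(non_numeric.tolist())
```

On a full column this avoids a Python-level function call per row, though the two approaches may differ on edge cases such as the strings 'nan' or 'inf'.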

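def sqft_to_num(x):
    # Ranges such as "2100 - 2850" become the midpoint of the two bounds
    tokens = x.split('-')
    if len(tokens) == 2:
        return (float(tokens[0]) + float(tokens[1])) / 2
    try:
        return float(x)
    except ValueError:
        # Unit-suffixed values such as "34.46Sq. Meter" are left as missing
        return None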

sqft_to_num('1230-2342')

1786.0
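Values such as "34.46Sq. Meter" come back as None from `sqft_to_num` and are effectively lost. One possible extension, sketched here under assumptions, converts a couple of known units to square feet instead. The function name `sqft_with_units`, the exact unit spellings, and the choice to leave other units (for example "Perch") as missing are all hypothetical and not part of the original notebook.

```python
import re

# Conversion factors to square feet; only the units listed here are handled
UNIT_TO_SQFT = {
    'Sq. Meter': 10.7639,  # 1 square metre is about 10.7639 square feet
    'Sq. Yards': 9.0,      # 1 square yard is exactly 9 square feet
}

def sqft_with_units(x):
    """Hypothetical variant of sqft_to_num that also converts known units."""
    tokens = x.split('-')
    if len(tokens) == 2:
        try:
            return (float(tokens[0]) + float(tokens[1])) / 2
        except ValueError:
            return None
    match = re.fullmatch(r'\s*(\d+(?:\.\d+)?)\s*(Sq\. Meter|Sq\. Yards)\s*', x)
    if match:
        return float(match.group(1)) * UNIT_TO_SQFT[match.group(2)]
    try:
        return float(x)
    except ValueError:
        return None  # unknown unit: keep it missing rather than guess

print(sqft_with_units('34.46Sq. Meter'))
print(sqft_with_units('4125Perch'))
```

Whether the extra rows are worth recovering depends on how many there are; the output above shows only a few unit-suffixed rows among the first ten non-numeric values.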

df4 = df3.copy()
df4['total_sqft'] = df4['total_sqft'].apply(sqft_to_num)
df4.head()
location size total_sqft bath price bhk
0 Electronic City Phase II 2 BHK 1056.0 2.0 39.07 2
1 Chikka Tirupathi 4 Bedroom 2600.0 5.0 120.00 4
2 Uttarahalli 3 BHK 1440.0 2.0 62.00 3
3 Lingadheeranahalli 3 BHK 1521.0 3.0 95.00 3
4 Kothanur 2 BHK 1200.0 2.0 51.00 2

df4.loc[30]

location Yelahanka
size 4 BHK
total_sqft 2475.0
bath 4.0
price 186.0
bhk 4
Name: 30, dtype: object

df4.head(3)

location size total_sqft bath price bhk

0 Electronic City Phase II 2 BHK 1056.0 2.0 39.07 2
1 Chikka Tirupathi 4 Bedroom 2600.0 5.0 120.00 4
2 Uttarahalli 3 BHK 1440.0 2.0 62.00 3

df5 = df4.copy()
df5['price_per_sqft'] = df5['price']*100000/df5['total_sqft']
df5.head()

                   location       size  total_sqft  bath   price  bhk  price_per_sqft
0  Electronic City Phase II      2 BHK      1056.0   2.0   39.07    2     3699.810606
1          Chikka Tirupathi  4 Bedroom      2600.0   5.0  120.00    4     4615.384615
2               Uttarahalli      3 BHK      1440.0   2.0   62.00    3     4305.555556
3        Lingadheeranahalli      3 BHK      1521.0   3.0   95.00    3     6245.890861
4                  Kothanur      2 BHK      1200.0   2.0   51.00    2     4250.000000
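The factor of 100000 in the `price_per_sqft` formula suggests that `price` is recorded in lakhs of rupees (1 lakh = 100,000); that reading is an assumption, since the notebook does not state the unit. Under it, the first row can be checked by hand:

```python
# Row 0 from the table above: price 39.07 (assumed to be lakh rupees), 1056 sq ft
price_lakh = 39.07
total_sqft = 1056.0

# Convert lakhs to rupees, then divide by the area
price_per_sqft = price_lakh * 100000 / total_sqft
print(round(price_per_sqft, 2))
```

The rounded value agrees with the 3699.810606 shown for row 0.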

len(df5.location.unique())

1304

# Strip stray whitespace so that identical locations group together
df5.location = df5.location.apply(lambda x: x.strip())

location_stats = df5.groupby('location')['location'].agg('count').sort_values(ascending=False)

location_stats

location
Whitefield 535
Sarjapur Road 392
Electronic City 304
Kanakpura Road 266
Thanisandra 236
...
1 Giri Nagar 1
Kanakapura Road, 1
Kanakapura main Road 1
Karnataka Shabarimala 1
whitefiled 1
Name: location, Length: 1293, dtype: int64

len(location_stats[location_stats<=10])

1052

location_stats_less_than_10 = location_stats[location_stats<=10]
location_stats_less_than_10

location
Basapura 10
1st Block Koramangala 10
Gunjur Palya 10
Kalkere 10
Sector 1 HSR Layout 10
..
1 Giri Nagar 1
Kanakapura Road, 1
Kanakapura main Road 1
Karnataka Shabarimala 1
whitefiled 1
Name: location, Length: 1052, dtype: int64

len(df5.location.unique())

1293

# 'x in Series' tests membership in the index, which holds the location names here
df5.location = df5.location.apply(lambda x: 'other' if x in location_stats_less_than_10 else x)
len(df5.location.unique())

242
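The grouping of rare locations into 'other' can also be sketched with `value_counts` and `isin` rather than a per-row lambda. The toy data and the threshold of 2 are illustrative stand-ins for the notebook's data and its threshold of 10.

```python
import pandas as pd

# Toy data; the threshold of 2 stands in for the notebook's threshold of 10
toy = pd.DataFrame({
    'location': ['Whitefield'] * 3 + ['Hebbal'] * 3 + ['Kalkere', 'Basapura ']
})
toy['location'] = toy['location'].str.strip()

counts = toy['location'].value_counts()
rare = counts[counts <= 2].index

# Keep the name where the location is common, otherwise substitute 'other'
toy['location'] = toy['location'].where(~toy['location'].isin(rare), 'other')
print(toy['location'].value_counts().to_dict())
```

Collapsing rare categories this way keeps the number of one-hot columns manageable later on.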
df5.head(10)

                   location       size  total_sqft  bath   price  bhk  price_per_sqft
0  Electronic City Phase II      2 BHK      1056.0   2.0   39.07    2     3699.810606
1          Chikka Tirupathi  4 Bedroom      2600.0   5.0  120.00    4     4615.384615
2               Uttarahalli      3 BHK      1440.0   2.0   62.00    3     4305.555556
3        Lingadheeranahalli      3 BHK      1521.0   3.0   95.00    3     6245.890861
4                  Kothanur      2 BHK      1200.0   2.0   51.00    2     4250.000000
5                Whitefield      2 BHK      1170.0   2.0   38.00    2     3247.863248
6          Old Airport Road      4 BHK      2732.0   4.0  204.00    4     7467.057101
7              Rajaji Nagar      4 BHK      3300.0   4.0  600.00    4    18181.818182
8              Marathahalli      3 BHK      1310.0   3.0   63.25    3     4828.244275
9                     other  6 Bedroom      1020.0   6.0  370.00    6    36274.509804

Outlier Removal:
df5[df5.total_sqft / df5.bhk<300].head()

               location       size  total_sqft  bath  price  bhk  price_per_sqft
9                 other  6 Bedroom      1020.0   6.0  370.0    6    36274.509804
45           HSR Layout  8 Bedroom       600.0   9.0  200.0    8    33333.333333
58        Murugeshpalya  6 Bedroom      1407.0   4.0  150.0    6    10660.980810
68  Devarachikkanahalli  8 Bedroom      1350.0   7.0   85.0    8     6296.296296
70                other  3 Bedroom       500.0   3.0  100.0    3    20000.000000

df5.shape

(13246, 7)

df6 = df5[~(df5.total_sqft / df5.bhk<300)]

df6.shape

(12502, 7)

df6.price_per_sqft.describe()

count 12456.000000
mean 6308.502826
std 4168.127339
min 267.829813
25% 4210.526316
50% 5294.117647
75% 6916.666667
max 176470.588235
Name: price_per_sqft, dtype: float64
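The `describe()` count (12456) is lower than the row count of df6 (12502), which suggests that some rows carry a missing `price_per_sqft`, most plausibly those where `sqft_to_num` returned None. A comparison against NaN evaluates to False, so a filter of the form `~(... < 300)` keeps such rows. The sketch below shows this on toy data, with an explicit `dropna` as one possible remedy; that remedy is an assumption and not a step the notebook takes.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'total_sqft': [1200.0, 500.0, np.nan],
    'bhk': [2, 3, 2],
})

# NaN / 2 < 300 is False, so negating the mask keeps the NaN row
kept = toy[~(toy.total_sqft / toy.bhk < 300)]
print(len(kept))

# Dropping missing areas first removes the row explicitly
cleaned = toy.dropna(subset=['total_sqft'])
cleaned = cleaned[~(cleaned.total_sqft / cleaned.bhk < 300)]
print(len(cleaned))
```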

def remove_pps_outliers(df):
    # Per location, keep rows whose price_per_sqft lies within one
    # standard deviation of that location's mean
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        reduced_df = subdf[(subdf.price_per_sqft > (m - st)) & (subdf.price_per_sqft <= (m + st))]
        df_out = pd.concat([df_out, reduced_df], ignore_index=True)
    return df_out

df7 = remove_pps_outliers(df6)
df7.shape

(10241, 7)
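The per-location filter can also be sketched with `groupby(...).transform`, which avoids concatenating frames in a loop. This is a toy-data illustration under assumptions, not the notebook's own code: `ddof=0` is chosen to mirror the default of `np.std`, and the rows keep their original order instead of being regrouped by location.

```python
import pandas as pd

toy = pd.DataFrame({
    'location': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'price_per_sqft': [100.0, 110.0, 120.0, 500.0, 50.0, 52.0, 54.0],
})

grouped = toy.groupby('location')['price_per_sqft']
mean = grouped.transform('mean')
# ddof=0 mirrors np.std's default (population standard deviation)
std = grouped.transform(lambda s: s.std(ddof=0))

# Same one-standard-deviation band as the loop version: (mean - std, mean + std]
mask = (toy.price_per_sqft > mean - std) & (toy.price_per_sqft <= mean + std)
filtered = toy[mask].reset_index(drop=True)
print(filtered)
```

In location A only the 500.0 row falls outside the band; in location B the band is narrow enough that only 52.0 survives, which shows how aggressive a one-standard-deviation cut can be on small groups.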

def plot_scatter_chart(df, location):
    # Compare 2 BHK and 3 BHK homes within one location
    bhk2 = df[(df.location == location) & (df.bhk == 2)]
    bhk3 = df[(df.location == location) & (df.bhk == 3)]
    matplotlib.rcParams['figure.figsize'] = (15, 10)
    plt.scatter(bhk2.total_sqft, bhk2.price, color='blue', label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft, bhk3.price, marker='+', color='green', label='3 BHK', s=50)
    plt.xlabel("Total Square Feet Area")
    plt.ylabel("Price")
    plt.title(location)
    plt.legend()

plot_scatter_chart(df7,"Hebbal")

# We should also remove properties where, for the same location, the price of
# (for example) a 3-bedroom apartment is less than that of a 2-bedroom apartment
# with the same square-foot area. For each location we build a dictionary of
# statistics (mean, std, count) per bhk.

def remove_bhk_outliers(df):
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        # Collect price_per_sqft statistics for every bhk count in this location
        bhk_stats = {}
        for bhk, bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk] = {
                'mean': np.mean(bhk_df.price_per_sqft),
                'std': np.std(bhk_df.price_per_sqft),
                'count': bhk_df.shape[0]
            }
        # Drop homes priced (per sqft) below the mean of the next-smaller bhk group
        for bhk, bhk_df in location_df.groupby('bhk'):
            stats = bhk_stats.get(bhk - 1)
            if stats and stats['count'] > 5:
                exclude_indices = np.append(exclude_indices, bhk_df[bhk_df.price_per_sqft < (stats['mean'])].index.values)
    return df.drop(exclude_indices, axis='index')

df8 = remove_bhk_outliers(df7)
df8.shape

(7329, 7)

plot_scatter_chart(df8,"Hebbal")

matplotlib.rcParams["figure.figsize"] = (20,10)
plt.hist(df8.price_per_sqft,rwidth=0.8)
plt.xlabel("Price Per Square Feet")
plt.ylabel("Count")

Text(0, 0.5, 'Count')

df8.bath.unique()

array([ 4., 3., 2., 5., 8., 1., 6., 7., 9., 12., 16., 13.])

df8[df8.bath>10]

            location    size  total_sqft  bath  price  bhk  price_per_sqft
5277  Neeladri Nagar  10 BHK      4000.0  12.0  160.0   10     4000.000000
8486           other  10 BHK     12000.0  12.0  525.0   10     4375.000000
8575           other  16 BHK     10000.0  16.0  550.0   16     5500.000000
9308           other  11 BHK      6000.0  12.0  150.0   11     2500.000000
9639           other  13 BHK      5425.0  13.0  275.0   13     5069.124424

plt.hist(df8.bath,rwidth=0.8)
plt.xlabel("Number of Bathrooms")
plt.ylabel("Counts")

Text(0, 0.5, 'Counts')

df8[df8.bath>df8.bhk+2]

           location       size  total_sqft  bath   price  bhk  price_per_sqft
1626  Chikkabanavar  4 Bedroom      2460.0   7.0    80.0    4     3252.032520
5238     Nagasandra  4 Bedroom      7000.0   8.0   450.0    4     6428.571429
6711    Thanisandra      3 BHK      1806.0   6.0   116.0    3     6423.034330
8411          other      6 BHK     11338.0   9.0  1000.0    6     8819.897689

df9 = df8[df8.bath<df8.bhk+2]
df9.shape

(7251, 7)

df10 = df9.drop(['size','price_per_sqft'],axis = 'columns')

df10

location total_sqft bath price bhk

0 1st Block Jayanagar 2850.0 4.0 428.0 4
1 1st Block Jayanagar 1630.0 3.0 194.0 3
2 1st Block Jayanagar 1875.0 2.0 235.0 3
3 1st Block Jayanagar 1200.0 2.0 130.0 3
4 1st Block Jayanagar 1235.0 2.0 148.0 2
... ... ... ... ... ...
10232 other 1200.0 2.0 70.0 2
10233 other 1800.0 1.0 200.0 1
10236 other 1353.0 2.0 110.0 2
10237 other 812.0 1.0 26.0 1
10240 other 3600.0 5.0 400.0 4

[7251 rows x 5 columns]

dummies = pd.get_dummies(df10.location).astype(int)
dummies

       1st Block Jayanagar  1st Phase JP Nagar  2nd Phase Judicial Layout  \
0                        1                   0                          0
1                        1                   0                          0
2                        1                   0                          0
3                        1                   0                          0
4                        1                   0                          0
...                    ...                 ...                        ...
10232                    0                   0                          0
10233                    0                   0                          0
10236                    0                   0                          0
10237                    0                   0                          0
10240                    0                   0                          0

       2nd Stage Nagarbhavi  5th Block Hbr Layout  5th Phase JP Nagar  \
0                         0                     0                   0
1                         0                     0                   0
2                         0                     0                   0
3                         0                     0                   0
4                         0                     0                   0
...                     ...                   ...                 ...
10232                     0                     0                   0
10233                     0                     0                   0
10236                     0                     0                   0
10237                     0                     0                   0
10240                     0                     0                   0

       6th Phase JP Nagar  7th Phase JP Nagar  8th Phase JP Nagar  \
0                       0                   0                   0
1                       0                   0                   0
2                       0                   0                   0
3                       0                   0                   0
4                       0                   0                   0
...                   ...                 ...                 ...
10232                   0                   0                   0
10233                   0                   0                   0
10236                   0                   0                   0
10237                   0                   0                   0
10240                   0                   0                   0

       9th Phase JP Nagar  ...  Vishveshwarya Layout  Vishwapriya Layout  \
0                       0  ...                     0                   0
1                       0  ...                     0                   0
2                       0  ...                     0                   0
3                       0  ...                     0                   0
4                       0  ...                     0                   0
...                   ...  ...                   ...                 ...
10232                   0  ...                     0                   0
10233                   0  ...                     0                   0
10236                   0  ...                     0                   0
10237                   0  ...                     0                   0
10240                   0  ...                     0                   0

       Vittasandra  Whitefield  Yelachenahalli  Yelahanka  Yelahanka New Town  \
0                0           0               0          0                   0
1                0           0               0          0                   0
2                0           0               0          0                   0
3                0           0               0          0                   0
4                0           0               0          0                   0
...            ...         ...             ...        ...                 ...
10232            0           0               0          0                   0
10233            0           0               0          0                   0
10236            0           0               0          0                   0
10237            0           0               0          0                   0
10240            0           0               0          0                   0

       Yelenahalli  Yeshwanthpur  other
0                0             0      0
1                0             0      0
2                0             0      0
3                0             0      0
4                0             0      0
...            ...           ...    ...
10232            0             0      1
10233            0             0      1
10236            0             0      1
10237            0             0      1
10240            0             0      1

[7251 rows x 242 columns]

# One dummy column ('other') is dropped; it is implied when all remaining location columns are 0
df11 = pd.concat([df10, dummies.drop('other', axis='columns')], axis='columns')
df11.head(3)

              location  total_sqft  bath  price  bhk  1st Block Jayanagar  \
0  1st Block Jayanagar      2850.0   4.0  428.0    4                    1
1  1st Block Jayanagar      1630.0   3.0  194.0    3                    1
2  1st Block Jayanagar      1875.0   2.0  235.0    3                    1

   1st Phase JP Nagar  2nd Phase Judicial Layout  2nd Stage Nagarbhavi  \
0                   0                          0                     0
1                   0                          0                     0
2                   0                          0                     0

   5th Block Hbr Layout  ...  Vijayanagar  Vishveshwarya Layout  \
0                     0  ...            0                     0
1                     0  ...            0                     0
2                     0  ...            0                     0

   Vishwapriya Layout  Vittasandra  Whitefield  Yelachenahalli  Yelahanka  \
0                   0            0           0               0          0
1                   0            0           0               0          0
2                   0            0           0               0          0

   Yelahanka New Town  Yelenahalli  Yeshwanthpur
0                   0            0             0
1                   0            0             0
2                   0            0             0

[3 rows x 246 columns]

df12 = df11.drop(['location'],axis='columns')
df12.head(2)

   total_sqft  bath  price  bhk  1st Block Jayanagar  1st Phase JP Nagar  \
0      2850.0   4.0  428.0    4                    1                   0
1      1630.0   3.0  194.0    3                    1                   0

   2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  \
0                          0                     0                     0
1                          0                     0                     0

   5th Phase JP Nagar  ...  Vijayanagar  Vishveshwarya Layout  \
0                   0  ...            0                     0
1                   0  ...            0                     0

   Vishwapriya Layout  Vittasandra  Whitefield  Yelachenahalli  Yelahanka  \
0                   0            0           0               0          0
1                   0            0           0               0          0

   Yelahanka New Town  Yelenahalli  Yeshwanthpur
0                   0            0             0
1                   0            0             0

[2 rows x 245 columns]

X = df12.drop('price', axis='columns')  # X => independent variables
X.head()

   total_sqft  bath  bhk  1st Block Jayanagar  1st Phase JP Nagar  \
0      2850.0   4.0    4                    1                   0
1      1630.0   3.0    3                    1                   0
2      1875.0   2.0    3                    1                   0
3      1200.0   2.0    3                    1                   0
4      1235.0   2.0    2                    1                   0

   2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  \
0                          0                     0                     0
1                          0                     0                     0
2                          0                     0                     0
3                          0                     0                     0
4                          0                     0                     0

   5th Phase JP Nagar  6th Phase JP Nagar  ...  Vijayanagar  \
0                   0                   0  ...            0
1                   0                   0  ...            0
2                   0                   0  ...            0
3                   0                   0  ...            0
4                   0                   0  ...            0

   Vishveshwarya Layout  Vishwapriya Layout  Vittasandra  Whitefield  \
0                     0                   0            0           0
1                     0                   0            0           0
2                     0                   0            0           0
3                     0                   0            0           0
4                     0                   0            0           0

   Yelachenahalli  Yelahanka  Yelahanka New Town  Yelenahalli  Yeshwanthpur
0               0          0                   0            0             0
1               0          0                   0            0             0
2               0          0                   0            0             0
3               0          0                   0            0             0
4               0          0                   0            0             0

[5 rows x 244 columns]

y = df12.price # y => dependent variable

y.head()

0 428.0
1 194.0
2 235.0
3 130.0
4 148.0
Name: price, dtype: float64

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

from sklearn.linear_model import LinearRegression

lr_clf = LinearRegression()
lr_clf.fit(X_train,y_train)
lr_clf.score(X_test,y_test)

0.8452277697874324

from sklearn.model_selection import ShuffleSplit

from sklearn.model_selection import cross_val_score

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

cross_val_score(LinearRegression(), X, y, cv=cv)

array([0.82430186, 0.77166234, 0.85089567, 0.80837764, 0.83653286])
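The five fold scores are easier to compare with later results once they are reduced to a mean and a spread. A minimal sketch, with the scores copied from the output above (for a regressor, `cross_val_score` reports R² by default):

```python
import numpy as np

# Scores copied from the cross_val_score output above
scores = np.array([0.82430186, 0.77166234, 0.85089567, 0.80837764, 0.83653286])

print(f"mean R^2: {scores.mean():.4f}")
print(f"std of R^2: {scores.std():.4f}")
```

The mean is the figure to set against the single train/test split score reported earlier, and the spread indicates how much that score depends on the particular split.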

from sklearn.linear_model import Lasso, LinearRegression

from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.preprocessing import StandardScaler
import pandas as pd

def find_best_model_using_gridsearchcv(X, y):
    # Candidate models and the hyperparameter grid searched for each
    algos = {
        'linear_regression': {
            'model': LinearRegression(),
            'params': {
                # Removed 'normalize' parameter
                'fit_intercept': [True, False]
            }
        },
        'lasso': {
            'model': Lasso(),
            'params': {
                'alpha': [1, 2],
                'selection': ['random', 'cyclic']
            }
        },
        'decision_tree': {
            'model': DecisionTreeRegressor(),
            'params': {
                # 'mse' was renamed 'squared_error'; recent scikit-learn rejects 'mse'
                # with InvalidParameterError. In the original run every 'mse' fit failed,
                # so the decision_tree score reported below comes from 'friedman_mse' only.
                'criterion': ['squared_error', 'friedman_mse'],
                'splitter': ['best', 'random']
            }
        }
    }

    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

    for algo_name, config in algos.items():
        gs = GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)
        gs.fit(X, y)
        scores.append({
            'model': algo_name,
            'best_score': gs.best_score_,
            'best_params': gs.best_params_
        })

    return pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])

# Example usage: assuming X and y are defined
find_best_model_using_gridsearchcv(X, y)

               model  best_score                                        best_params
0  linear_regression    0.819001                           {'fit_intercept': False}
1              lasso    0.687436                {'alpha': 2, 'selection': 'random'}
2      decision_tree    0.709353  {'criterion': 'friedman_mse', 'splitter': 'best'}

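def predict_price(location, sqft, bath, bhk):
    # Locations without a dummy column (for example those grouped into 'other')
    # leave every location slot at 0 instead of raising IndexError
    matches = np.where(X.columns == location)[0]
    x = np.zeros(len(X.columns))
    x[0] = sqft
    x[1] = bath
    x[2] = bhk
    if matches.size > 0:
        x[matches[0]] = 1

    return lr_clf.predict([x])[0]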

X.columns

Index(['total_sqft', 'bath', 'bhk', '1st Block Jayanagar',
       '1st Phase JP Nagar', '2nd Phase Judicial Layout',
       '2nd Stage Nagarbhavi', '5th Block Hbr Layout', '5th Phase JP Nagar',
       '6th Phase JP Nagar',
       ...
       'Vijayanagar', 'Vishveshwarya Layout', 'Vishwapriya Layout',
       'Vittasandra', 'Whitefield', 'Yelachenahalli', 'Yelahanka',
       'Yelahanka New Town', 'Yelenahalli', 'Yeshwanthpur'],
      dtype='object', length=244)

predict_price('1st Phase JP Nagar',1000,2,2)

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\base.py:439:
UserWarning: X does not have valid feature names, but LinearRegression
was fitted with feature names
warnings.warn(

83.49904677185246

predict_price('Indira Nagar',1000,2,2)

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\base.py:439:
UserWarning: X does not have valid feature names, but LinearRegression
was fitted with feature names
warnings.warn(

181.2781548400676
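The "X does not have valid feature names" warning appears because the model was fitted on a DataFrame but receives a bare array at prediction time. One way to avoid it, sketched here on toy data, is to pass a one-row DataFrame carrying the training column names. The names `X_toy`, `y_toy`, `model`, `predict_price_df`, and the 'Loc A' / 'Loc B' columns are hypothetical stand-ins for the notebook's `X`, `y`, and `lr_clf`.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training frame standing in for X and y; the column names are hypothetical
X_toy = pd.DataFrame({
    'total_sqft': [1000.0, 1500.0, 2000.0, 1200.0, 1800.0, 900.0],
    'bath': [2.0, 3.0, 3.0, 2.0, 3.0, 1.0],
    'bhk': [2.0, 3.0, 4.0, 2.0, 3.0, 1.0],
    'Loc A': [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    'Loc B': [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
})
y_toy = pd.Series([60.0, 95.0, 150.0, 70.0, 110.0, 40.0])
model = LinearRegression().fit(X_toy, y_toy)

def predict_price_df(location, sqft, bath, bhk):
    # Build a one-row frame with the training columns so feature names match
    row = pd.DataFrame(np.zeros((1, len(X_toy.columns))), columns=X_toy.columns)
    row.loc[0, 'total_sqft'] = sqft
    row.loc[0, 'bath'] = bath
    row.loc[0, 'bhk'] = bhk
    if location in row.columns:
        row.loc[0, location] = 1
    return model.predict(row)[0]

with warnings.catch_warnings():
    warnings.simplefilter('error', UserWarning)  # a feature-name warning would raise here
    print(predict_price_df('Loc A', 1000, 2, 2))
```

The prediction itself should be the same as with the array-based version; only the warning is avoided.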

import pickle
# Persist the fitted model
with open('bangalore_home_prices_model.pickle', 'wb') as f:
    pickle.dump(lr_clf, f)

import json
# Persist the column order (lower-cased) needed to rebuild feature vectors
columns = {
    'data_columns': [col.lower() for col in X.columns]
}
with open("columns.json", "w") as f:
    f.write(json.dumps(columns))
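Because the column names are lower-cased before being written to `columns.json`, any code that reads the file back has to lower-case the location before looking it up. The sketch below round-trips a small, hypothetical column list through a temporary file and rebuilds a feature vector; `build_feature_vector` is an illustrative helper that does not appear in the notebook, and the pickled model would be loaded separately with `pickle.load`.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the saved column list
columns = {'data_columns': ['total_sqft', 'bath', 'bhk', '1st block jayanagar', 'indira nagar']}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'columns.json')
    with open(path, 'w') as f:
        f.write(json.dumps(columns))
    with open(path) as f:
        data_columns = json.load(f)['data_columns']

def build_feature_vector(location, sqft, bath, bhk):
    x = np.zeros(len(data_columns))
    x[0], x[1], x[2] = sqft, bath, bhk
    # Column names were lower-cased on save, so lower-case the lookup key too
    key = location.strip().lower()
    if key in data_columns:
        x[data_columns.index(key)] = 1
    return x

print(build_feature_vector('Indira Nagar', 1000, 2, 2))
```

A location that is not in the list leaves all location slots at zero, which corresponds to the dropped 'other' category.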
