Deep Learning - House Price Prediction
May 5, 2025
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from tensorflow.keras import regularizers
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
# print(rows)
# Convert to a DataFrame and render.
import pandas as pd
df = pd.DataFrame.from_records(rows)
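Because `get_all_values` returns plain rows of strings, the header row ends up as row 0 of the frame rather than as column names. The sketch below shows one possible way to promote it, assuming the first row of the sheet holds the headers. The `rows` list is a small made-up stand-in for the real worksheet contents.

```python
import pandas as pd

# Made-up stand-in for worksheet.get_all_values(): a list of rows of strings,
# with the header row first (an assumption about the sheet layout).
rows = [
    ["Id", "LotArea", "Street", "SalePrice"],
    ["1", "8450", "Pave", "208500"],
    ["2", "9600", "Pave", "181500"],
]

# Use the first row as column names and the remaining rows as data.
df = pd.DataFrame.from_records(rows[1:], columns=rows[0])

print(df.columns.tolist())
print(df.head())  # every value is still a string at this point
```

All values remain strings after this step, which is consistent with the all-`object` dtypes reported by `x.info()` later in the notebook.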
1 0 TA TA CBlock Gd TA Gd
2 162 Gd TA PConc Gd TA Mn
3 0 TA TA BrkTil TA Gd No
4 350 Gd TA PConc Gd TA Av
0 ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold \
0 0 0 NA NA NA 0 2 2008
1 0 0 NA NA NA 0 5 2007
2 0 0 NA NA NA 0 9 2008
3 0 0 NA NA NA 0 2 2006
4 0 0 NA NA NA 0 12 2008
[6]: # Dropping id
df.drop('Id', axis=1, inplace=True)
[9]: x.shape
[10]: cols_drop = []
MSSubClass 0
MSZoning 0
LotFrontage 259
LotArea 0
Street 0
LotShape 0
LandContour 0
Utilities 0
LotConfig 0
LandSlope 0
Neighborhood 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
OverallQual 0
OverallCond 0
YearBuilt 0
YearRemodAdd 0
RoofStyle 0
RoofMatl 0
Exterior1st 0
Exterior2nd 0
MasVnrType 8
MasVnrArea 8
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 37
BsmtCond 37
BsmtExposure 38
BsmtFinType1 37
BsmtFinSF1 0
BsmtFinType2 38
BsmtFinSF2 0
BsmtUnfSF 0
TotalBsmtSF 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 1
1stFlrSF 0
2ndFlrSF 0
LowQualFinSF 0
GrLivArea 0
BsmtFullBath 0
BsmtHalfBath 0
FullBath 0
HalfBath 0
BedroomAbvGr 0
KitchenAbvGr 0
KitchenQual 0
TotRmsAbvGrd 0
Functional 0
Fireplaces 0
FireplaceQu 690
GarageType 81
GarageYrBlt 81
GarageFinish 81
GarageCars 0
GarageArea 0
GarageQual 81
GarageCond 81
PavedDrive 0
WoodDeckSF 0
OpenPorchSF 0
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
MiscVal 0
MoSold 0
YrSold 0
SaleType 0
SaleCondition 0
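The null counts above can be produced, and turned into a list of columns to drop, roughly as sketched here. This is a sketch on toy data, not the notebook's actual cell: the 50% threshold and the conversion of the literal string `"NA"` to a missing value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame; the string "NA" mimics how missing values arrive from a sheet.
x = pd.DataFrame({
    "LotFrontage": ["65", "NA", "68", "60"],
    "PoolQC": ["NA", "NA", "NA", "Ex"],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
})

# Treat the literal string "NA" as missing so isnull() can see it.
x = x.replace("NA", np.nan)

null_counts = x.isnull().sum()
print(null_counts)

# Hypothetical rule: drop columns with more than half of their values missing.
threshold = 0.5
cols_drop = null_counts[null_counts / len(x) > threshold].index.tolist()
x = x.drop(columns=cols_drop)
print(cols_drop)
```

A ratio-based rule like this keeps moderately sparse columns (to be imputed later) and removes only those that carry almost no information.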
[14]: x.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 75 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSSubClass 1460 non-null object
1 MSZoning 1460 non-null object
2 LotFrontage 1201 non-null object
3 LotArea 1460 non-null object
4 Street 1460 non-null object
5 LotShape 1460 non-null object
6 LandContour 1460 non-null object
7 Utilities 1460 non-null object
8 LotConfig 1460 non-null object
9 LandSlope 1460 non-null object
10 Neighborhood 1460 non-null object
11 Condition1 1460 non-null object
12 Condition2 1460 non-null object
13 BldgType 1460 non-null object
14 HouseStyle 1460 non-null object
15 OverallQual 1460 non-null object
16 OverallCond 1460 non-null object
17 YearBuilt 1460 non-null object
18 YearRemodAdd 1460 non-null object
19 RoofStyle 1460 non-null object
20 RoofMatl 1460 non-null object
21 Exterior1st 1460 non-null object
22 Exterior2nd 1460 non-null object
23 MasVnrType 1452 non-null object
24 MasVnrArea 1452 non-null object
25 ExterQual 1460 non-null object
26 ExterCond 1460 non-null object
27 Foundation 1460 non-null object
28 BsmtQual 1423 non-null object
29 BsmtCond 1423 non-null object
30 BsmtExposure 1422 non-null object
31 BsmtFinType1 1423 non-null object
32 BsmtFinSF1 1460 non-null object
33 BsmtFinType2 1422 non-null object
34 BsmtFinSF2 1460 non-null object
35 BsmtUnfSF 1460 non-null object
36 TotalBsmtSF 1460 non-null object
37 Heating 1460 non-null object
38 HeatingQC 1460 non-null object
39 CentralAir 1460 non-null object
40 Electrical 1459 non-null object
41 1stFlrSF 1460 non-null object
42 2ndFlrSF 1460 non-null object
43 LowQualFinSF 1460 non-null object
44 GrLivArea 1460 non-null object
45 BsmtFullBath 1460 non-null object
46 BsmtHalfBath 1460 non-null object
47 FullBath 1460 non-null object
48 HalfBath 1460 non-null object
49 BedroomAbvGr 1460 non-null object
50 KitchenAbvGr 1460 non-null object
51 KitchenQual 1460 non-null object
52 TotRmsAbvGrd 1460 non-null object
53 Functional 1460 non-null object
54 Fireplaces 1460 non-null object
55 FireplaceQu 770 non-null object
56 GarageType 1379 non-null object
57 GarageYrBlt 1379 non-null object
58 GarageFinish 1379 non-null object
59 GarageCars 1460 non-null object
60 GarageArea 1460 non-null object
61 GarageQual 1379 non-null object
62 GarageCond 1379 non-null object
63 PavedDrive 1460 non-null object
64 WoodDeckSF 1460 non-null object
65 OpenPorchSF 1460 non-null object
66 EnclosedPorch 1460 non-null object
67 3SsnPorch 1460 non-null object
68 ScreenPorch 1460 non-null object
69 PoolArea 1460 non-null object
70 MiscVal 1460 non-null object
71 MoSold 1460 non-null object
72 YrSold 1460 non-null object
73 SaleType 1460 non-null object
74 SaleCondition 1460 non-null object
dtypes: object(75)
memory usage: 855.6+ KB
Column MasVnrType contains non-numeric values
Column MasVnrArea converted to numeric
Column ExterQual contains non-numeric values
Column ExterCond contains non-numeric values
Column Foundation contains non-numeric values
Column BsmtQual contains non-numeric values
Column BsmtCond contains non-numeric values
Column BsmtExposure contains non-numeric values
Column BsmtFinType1 contains non-numeric values
Column BsmtFinSF1 converted to numeric
Column BsmtFinType2 contains non-numeric values
Column BsmtFinSF2 converted to numeric
Column BsmtUnfSF converted to numeric
Column TotalBsmtSF converted to numeric
Column Heating contains non-numeric values
Column HeatingQC contains non-numeric values
Column CentralAir contains non-numeric values
Column Electrical contains non-numeric values
Column 1stFlrSF converted to numeric
Column 2ndFlrSF converted to numeric
Column LowQualFinSF converted to numeric
Column GrLivArea converted to numeric
Column BsmtFullBath converted to numeric
Column BsmtHalfBath converted to numeric
Column FullBath converted to numeric
Column HalfBath converted to numeric
Column BedroomAbvGr converted to numeric
Column KitchenAbvGr converted to numeric
Column KitchenQual contains non-numeric values
Column TotRmsAbvGrd converted to numeric
Column Functional contains non-numeric values
Column Fireplaces converted to numeric
Column FireplaceQu contains non-numeric values
Column GarageType contains non-numeric values
Column GarageYrBlt converted to numeric
Column GarageFinish contains non-numeric values
Column GarageCars converted to numeric
Column GarageArea converted to numeric
Column GarageQual contains non-numeric values
Column GarageCond contains non-numeric values
Column PavedDrive contains non-numeric values
Column WoodDeckSF converted to numeric
Column OpenPorchSF converted to numeric
Column EnclosedPorch converted to numeric
Column 3SsnPorch converted to numeric
Column ScreenPorch converted to numeric
Column PoolArea converted to numeric
Column MiscVal converted to numeric
Column MoSold converted to numeric
Column YrSold converted to numeric
Column SaleType contains non-numeric values
Column SaleCondition contains non-numeric values
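Messages like the ones above can come from a loop that tries to convert each column and records the outcome. The following is a sketch of one such loop on toy data. The rule it uses (a column is numeric if coercion introduces no new missing values) is an assumption, and the notebook's own cell may differ.

```python
import pandas as pd

x = pd.DataFrame({
    "LotArea": ["8450", "9600", "11250"],
    "Street": ["Pave", "Pave", "Grvl"],
    "MasVnrArea": ["196", None, "162"],
})

num_features, non_num_features = [], []
for col in x.columns:
    # Values that cannot be parsed become NaN instead of raising an error.
    converted = pd.to_numeric(x[col], errors="coerce")
    # A column counts as numeric only if coercion created no new missing values.
    if converted.isna().sum() == x[col].isna().sum():
        x[col] = converted
        num_features.append(col)
        print(f"Column {col} converted to numeric")
    else:
        non_num_features.append(col)
        print(f"Column {col} contains non-numeric values")
```

Comparing missing-value counts before and after coercion lets columns with genuine gaps, such as `MasVnrArea`, still be recognized as numeric.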
[16]: x[num_features].head()
4 0 0 0 0 0 12 2008
[17]: x[non_num_features].head()
[18]: len(non_num_features)
[18]: 39
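[19]: ordinal_features = ['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond',
                    'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'HeatingQC',
                    'KitchenQual', 'FireplaceQu', 'GarageFinish', 'GarageQual',
                    'GarageCond', 'Functional']
x[ordinal_features].head()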
0 GarageCond Functional
0 TA Typ
1 TA Typ
2 TA Typ
3 TA Typ
4 TA Typ
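One point to watch with ordinal features: `OrdinalEncoder()` with default settings orders categories alphabetically, which need not match their quality ranking. The sketch below contrasts the default with an explicit order on a toy column. The worst-to-best order `Po < Fa < TA < Gd < Ex` is an assumption about how these codes are meant to be read, not something stated in this notebook.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed worst-to-best ranking of the quality codes.
quality_order = ["Po", "Fa", "TA", "Gd", "Ex"]
x = pd.DataFrame({"ExterQual": ["Gd", "TA", "Ex", "Fa"]})

default_enc = OrdinalEncoder()                            # alphabetical order
ordered_enc = OrdinalEncoder(categories=[quality_order])  # explicit ranking

default_codes = default_enc.fit_transform(x).ravel()
ordered_codes = ordered_enc.fit_transform(x).ravel()

print(default_codes)  # codes follow Ex, Fa, Gd, TA
print(ordered_codes)  # codes follow Po, Fa, TA, Gd, Ex
```

With the explicit order, a larger code always means a better rating, which is usually what an ordinal encoding is meant to express.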
[21]: x[cat_features].head()
4 Y WD Normal BrkFace VinylSd NoRidge
categorical_pipeline = Pipeline(steps=[
('imputer', SimpleImputer(strategy='constant', fill_value = 'Missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
]
)
numerical_pipeline = Pipeline(steps=[
('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler())
]
)
Column Transformer
[23]: # column Transformer
preprocessor = ColumnTransformer([
('ordinal', ordinal_pipeline, ordinal_features),
('categorical', categorical_pipeline, cat_features),
('numerical', numerical_pipeline, num_features)
])
# Display
preprocessor
[23]: ColumnTransformer(transformers=[('ordinal',
Pipeline(steps=[('imputer',
SimpleImputer(strategy='most_frequent')),
('ordina_encoder',
OrdinalEncoder())]),
['ExterQual', 'ExterCond', 'BsmtQual',
'BsmtCond', 'BsmtExposure', 'BsmtFinType1',
'BsmtFinType2', 'HeatingQC', 'KitchenQual',
'FireplaceQu', 'GarageFinish', 'GarageQual',
'GarageCond', 'Functional']),
('categorical…
'OverallQual', 'OverallCond', 'YearBuilt',
'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF',
'1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
'FullBath', 'HalfBath', 'BedroomAbvGr',
'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces',
'GarageYrBlt', 'GarageCars', 'GarageArea',
'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch',
…])])
[25]: len(x_train.columns)
[25]: 75
[27]: model_input_shape = x_train_transformed.shape[1]
model_input_shape
[27]: 223
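The jump from 75 input columns to 223 transformed columns is mostly the effect of one-hot encoding, which creates one output column per category. The sketch below illustrates this on toy data with a reduced preprocessor (categorical and numerical branches only). The column values are made up, and fitting on the training split before transforming the validation split is the usual practice, assumed here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

x_train = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave", np.nan],
    "Foundation": ["PConc", "CBlock", "BrkTil", "PConc"],
    "LotArea": [8450.0, 9600.0, np.nan, 9550.0],
})
x_val = pd.DataFrame({
    "Street": ["Pave"], "Foundation": ["Slab"], "LotArea": [14260.0],
})

preprocessor = ColumnTransformer([
    ("categorical", Pipeline([
        ("imputer", SimpleImputer(strategy="constant", fill_value="Missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["Street", "Foundation"]),
    ("numerical", Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), ["LotArea"]),
])

# Fit on the training split only, then reuse the fitted transformer.
x_train_transformed = preprocessor.fit_transform(x_train)
x_val_transformed = preprocessor.transform(x_val)
print(x_train.shape[1], "->", x_train_transformed.shape[1])
```

Here three input columns become seven: three categories for each of the two categorical columns (including the imputed `Missing` value) plus one scaled numeric column. The unseen validation category `Slab` is encoded as all zeros because of `handle_unknown="ignore"`.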
# nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
# nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.01)))  # hidden layer
# nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(1))  # Output layer
nn_model3.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.008),
                  loss='mse',
                  metrics=['mse', tf.keras.metrics.RootMeanSquaredError()])
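With `kernel_regularizer` set, the loss that Keras minimizes and reports is the MSE plus an L2 penalty on each regularized layer's kernel, so the `loss` value can be larger than the `mse` metric. The NumPy sketch below illustrates that arithmetic only. The kernels and prices are made up, and the penalty form `l2 * sum(kernel ** 2)` follows the Keras documentation for `regularizers.l2`.

```python
import numpy as np

l2_factor = 0.01  # same factor as regularizers.l2(0.01) in the layers above

# Made-up kernel matrices standing in for the Dense layers' weights.
rng = np.random.default_rng(0)
kernels = [rng.normal(size=(4, 3)), rng.normal(size=(3, 1))]

# Made-up prices and predictions, not results from the notebook.
y_true = np.array([200_000.0, 150_000.0, 310_000.0])
y_pred = np.array([195_000.0, 160_000.0, 300_000.0])

mse = np.mean((y_true - y_pred) ** 2)
# Each regularized layer adds l2_factor * sum(kernel ** 2) to the loss.
penalty = sum(l2_factor * np.sum(k ** 2) for k in kernels)
total_loss = mse + penalty

print(mse, penalty, total_loss)
```

Only the kernels are penalized in this setup. Biases are untouched because no `bias_regularizer` is specified.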
1.6 Evaluation
[133]: # Histories to data frames
training_histroy_df1 = pd.DataFrame(training_histroy1.history)
training_histroy_df2 = pd.DataFrame(training_histroy2.history)
training_histroy_df3 = pd.DataFrame(training_histroy3.history)
[134]: # Evaluation
rmse_model1 = nn_model1.evaluate(x_val_transformed, y_val, verbose=0)
rmse_model2 = nn_model2.evaluate(x_val_transformed, y_val, verbose=0)
rmse_model3 = nn_model3.evaluate(x_val_transformed, y_val, verbose=0)
[136]: # Display the RMSE (the third value returned by evaluate())
print(f'Model 1 RMSE: {rmse_model1[2]:,.0f}')
print(f'Model 2 RMSE: {rmse_model2[2]:,.0f}')
print(f'Model 3 RMSE: {rmse_model3[2]:,.0f}')
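For reference, the RMSE reported by the metric corresponds to the square root of the mean squared difference between actual and predicted prices. A minimal NumPy sketch of that computation follows. The numbers are made up and the helper name `rmse` is introduced here for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up prices, not results from the notebook.
y_val = [200_000, 150_000, 310_000]
y_hat = [195_000, 160_000, 300_000]
print(f"RMSE: {rmse(y_val, y_hat):,.0f}")
```

Because the error is expressed in the same unit as the target, an RMSE can be read directly as a typical prediction error in currency units.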
1.7 Root Mean Squared Error (RMSE) Summary
• Model 1 RMSE: 193,173. This model performs very poorly: its predictions are far from
the actual house prices, which most likely indicates underfitting.
• Model 2 RMSE: 48,126. A significant improvement over Model 1. The deeper architecture
with ReLU activations allowed the model to learn more complex relationships in the data.
• Model 3 RMSE: 32,417. This is the best-performing model. It combines:
1. A deeper architecture with more hidden layers and units, allowing the model to capture more
complex relationships in the data.
2. L2 regularization (factor 0.01) on the hidden layers, which limits overfitting by penalizing
large weights and helps the model generalize better.
3. Only a single active dropout layer (rate 0.4); the other dropout layers are commented out,
as the data set is relatively small and heavier dropout might not provide additional benefit.
4. A learning rate of 0.008 for the Adam optimizer, as set in the compile call, chosen to allow
more stable and refined updates.
Together, these lead to better generalization and a significantly lower prediction error on the
validation set.
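To put the three RMSE values in proportion, the relative reduction from one model to the next can be computed directly from the figures reported above. The helper `relative_reduction` is a name introduced here for illustration.

```python
# RMSE values as reported in the summary above.
rmse_by_model = {"Model 1": 193_173, "Model 2": 48_126, "Model 3": 32_417}

def relative_reduction(before, after):
    """Fraction by which the error dropped going from `before` to `after`."""
    return (before - after) / before

names = list(rmse_by_model)
for prev, curr in zip(names, names[1:]):
    drop = relative_reduction(rmse_by_model[prev], rmse_by_model[curr])
    print(f"{prev} -> {curr}: {drop:.1%} lower RMSE")
```

This makes it visible that the largest gain comes from the step between Model 1 and Model 2, with Model 3 adding a further, smaller improvement.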