Deep Learning - House Price Prediction

The document outlines an advanced machine learning assignment focused on housing price prediction using Python libraries such as Pandas, NumPy, and TensorFlow. It details the process of importing necessary libraries, loading data from Google Sheets, and performing data inspection and cleaning, including handling missing values. The assignment emphasizes preparing the dataset for modeling by identifying and dropping columns with excessive missing data.

Uploaded by

mohamad21sh

Advanced_Machine_Learning_Assignment

May 5, 2025

0.1 Importing Libraries


[1]: !pip install -Uq scikeras

[2]: import pandas as pd


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

import warnings
warnings.filterwarnings('ignore')

from sklearn.compose import ColumnTransformer


from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, \
LabelEncoder, OrdinalEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from tensorflow.keras import regularizers

1 Loading the Data


[3]: from google.colab import auth
auth.authenticate_user()
import gspread
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
worksheet = gc.open('Housing Price Prediction - Train').sheet1

# get_all_values returns a list of rows (each row is a list of strings).
rows = worksheet.get_all_values()
# print(rows)
# Convert to a DataFrame; the header row is promoted to column names in a later cell.
df = pd.DataFrame.from_records(rows)

1.1 Data Inspection and Cleaning


[4]: # Display all columns
pd.set_option('display.max_columns', None)

[5]: # setting first row as headers


df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)
df.head()

[5]: 0 Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \


0 1 60 RL 65 8450 Pave NA Reg
1 2 20 RL 80 9600 Pave NA Reg
2 3 60 RL 68 11250 Pave NA IR1
3 4 70 RL 60 9550 Pave NA IR1
4 5 60 RL 84 14260 Pave NA IR1

0 LandContour Utilities LotConfig LandSlope Neighborhood Condition1 \


0 Lvl AllPub Inside Gtl CollgCr Norm
1 Lvl AllPub FR2 Gtl Veenker Feedr
2 Lvl AllPub Inside Gtl CollgCr Norm
3 Lvl AllPub Corner Gtl Crawfor Norm
4 Lvl AllPub FR2 Gtl NoRidge Norm

0 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt \


0 Norm 1Fam 2Story 7 5 2003
1 Norm 1Fam 1Story 6 8 1976
2 Norm 1Fam 2Story 7 5 2001
3 Norm 1Fam 2Story 7 5 1915
4 Norm 1Fam 2Story 8 5 2000

0 YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType \


0 2003 Gable CompShg VinylSd VinylSd BrkFace
1 1976 Gable CompShg MetalSd MetalSd None
2 2002 Gable CompShg VinylSd VinylSd BrkFace
3 1970 Gable CompShg Wd Sdng Wd Shng None
4 2000 Gable CompShg VinylSd VinylSd BrkFace

0 MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure \


0 196 Gd TA PConc Gd TA No
1 0 TA TA CBlock Gd TA Gd
2 162 Gd TA PConc Gd TA Mn
3 0 TA TA BrkTil TA Gd No
4 350 Gd TA PConc Gd TA Av

0 BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF \


0 GLQ 706 Unf 0 150 856
1 ALQ 978 Unf 0 284 1262
2 GLQ 486 Unf 0 434 920
3 ALQ 216 Unf 0 540 756
4 GLQ 655 Unf 0 490 1145

0 Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF \


0 GasA Ex Y SBrkr 856 854 0
1 GasA Ex Y SBrkr 1262 0 0
2 GasA Ex Y SBrkr 920 866 0
3 GasA Gd Y SBrkr 961 756 0
4 GasA Ex Y SBrkr 1145 1053 0

0 GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr \


0 1710 1 0 2 1 3
1 1262 0 1 2 0 3
2 1786 1 0 2 1 3
3 1717 1 0 1 0 3
4 2198 1 0 2 1 4

0 KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu \


0 1 Gd 8 Typ 0 NA
1 1 TA 6 Typ 1 TA
2 1 Gd 6 Typ 1 TA
3 1 Gd 7 Typ 1 Gd
4 1 Gd 9 Typ 1 TA

0 GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual \


0 Attchd 2003 RFn 2 548 TA
1 Attchd 1976 RFn 2 460 TA
2 Attchd 2001 RFn 2 608 TA
3 Detchd 1998 Unf 3 642 TA
4 Attchd 2000 RFn 3 836 TA

0 GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch \


0 TA Y 0 61 0 0
1 TA Y 298 0 0 0
2 TA Y 0 42 0 0
3 TA Y 0 35 272 0
4 TA Y 192 84 0 0

0 ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold \
0 0 0 NA NA NA 0 2 2008
1 0 0 NA NA NA 0 5 2007
2 0 0 NA NA NA 0 9 2008
3 0 0 NA NA NA 0 2 2006
4 0 0 NA NA NA 0 12 2008

0 SaleType SaleCondition SalePrice


0 WD Normal 208500
1 WD Normal 181500
2 WD Normal 223500
3 WD Abnorml 140000
4 WD Normal 250000
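The stray `0` in the top-left corner of the printed tables is the name of the column index: assigning `df.iloc[0]` to `df.columns` carries over that row's label. A minimal sketch of one way to avoid it, assuming the sheet arrives as a list of string rows with the header first. The `rows` literal below is a hypothetical stand-in for `worksheet.get_all_values()`, not the real sheet.

```python
import pandas as pd

# Hypothetical stand-in for worksheet.get_all_values():
# a list of rows, header first, every cell a string.
rows = [
    ["Id", "LotArea", "SalePrice"],
    ["1", "8450", "208500"],
    ["2", "9600", "181500"],
]

# Pass the header row as column names directly instead of promoting row 0 later.
df_demo = pd.DataFrame(rows[1:], columns=rows[0])

# The column index has no name, so no extra "0" label is printed in the header.
print(df_demo.columns.name)
print(df_demo.head())
```

Setting `df.columns.name = None` after the notebook's own header promotion should have the same visual effect.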

[6]: # Dropping id
df.drop('Id', axis=1, inplace=True)

[7]: # Features and Target


x = df.drop('SalePrice', axis=1)
y = df['SalePrice']

[8]: # setting y as type float


y = y.astype(float)

[9]: x.shape

[9]: (1460, 79)

[10]: cols_drop = []

for col in x.columns:
    # count rows where the sheet holds the literal string 'NA'
    if df[df[col] == 'NA'].shape[0] > 1000:
        cols_drop.append(col)

[11]: # dropping the columns with more than 1000 'NA' entries
x.drop(cols_drop, axis=1, inplace=True)

[12]: for col in x.columns:
    print(col, df[df[col] == 'NA'].shape[0])

MSSubClass 0
MSZoning 0
LotFrontage 259
LotArea 0
Street 0
LotShape 0
LandContour 0
Utilities 0
LotConfig 0
LandSlope 0
Neighborhood 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
OverallQual 0
OverallCond 0
YearBuilt 0
YearRemodAdd 0
RoofStyle 0
RoofMatl 0
Exterior1st 0
Exterior2nd 0
MasVnrType 8
MasVnrArea 8
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 37
BsmtCond 37
BsmtExposure 38
BsmtFinType1 37
BsmtFinSF1 0
BsmtFinType2 38
BsmtFinSF2 0
BsmtUnfSF 0
TotalBsmtSF 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 1
1stFlrSF 0
2ndFlrSF 0
LowQualFinSF 0
GrLivArea 0
BsmtFullBath 0
BsmtHalfBath 0
FullBath 0
HalfBath 0
BedroomAbvGr 0
KitchenAbvGr 0
KitchenQual 0
TotRmsAbvGrd 0
Functional 0
Fireplaces 0
FireplaceQu 690
GarageType 81
GarageYrBlt 81
GarageFinish 81
GarageCars 0
GarageArea 0
GarageQual 81
GarageCond 81
PavedDrive 0
WoodDeckSF 0
OpenPorchSF 0
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
MiscVal 0
MoSold 0
YrSold 0
SaleType 0
SaleCondition 0
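The per-column loop above can also be written as one vectorized comparison. This is a sketch on a toy frame, not the notebook's data: `x_demo` and the threshold of 2 are hypothetical (the notebook uses 1000 out of 1460 rows).

```python
import pandas as pd

# Toy frame standing in for x; 'NA' is a literal string, as in the sheet.
x_demo = pd.DataFrame({
    "Alley": ["NA", "NA", "Grvl", "NA"],
    "LotFrontage": ["65", "NA", "68", "60"],
    "Street": ["Pave", "Pave", "Pave", "Pave"],
})

# Element-wise comparison gives a boolean frame; summing counts 'NA' per column.
na_counts = (x_demo == "NA").sum()

# Hypothetical threshold for the toy data.
threshold = 2
cols_drop_demo = na_counts[na_counts > threshold].index.tolist()

print(na_counts)
print(cols_drop_demo)
```

Expressing the threshold as a fraction of `len(x)` instead of a fixed count would keep the rule meaningful if the number of rows changes.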

[13]: # Replacing NA with nan


x.replace('NA', np.nan, inplace=True)

[14]: x.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 75 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSSubClass 1460 non-null object
1 MSZoning 1460 non-null object
2 LotFrontage 1201 non-null object
3 LotArea 1460 non-null object
4 Street 1460 non-null object
5 LotShape 1460 non-null object
6 LandContour 1460 non-null object
7 Utilities 1460 non-null object
8 LotConfig 1460 non-null object
9 LandSlope 1460 non-null object
10 Neighborhood 1460 non-null object
11 Condition1 1460 non-null object
12 Condition2 1460 non-null object
13 BldgType 1460 non-null object
14 HouseStyle 1460 non-null object
15 OverallQual 1460 non-null object
16 OverallCond 1460 non-null object
17 YearBuilt 1460 non-null object
18 YearRemodAdd 1460 non-null object
19 RoofStyle 1460 non-null object
20 RoofMatl 1460 non-null object
21 Exterior1st 1460 non-null object
22 Exterior2nd 1460 non-null object
23 MasVnrType 1452 non-null object
24 MasVnrArea 1452 non-null object
25 ExterQual 1460 non-null object
26 ExterCond 1460 non-null object
27 Foundation 1460 non-null object
28 BsmtQual 1423 non-null object
29 BsmtCond 1423 non-null object
30 BsmtExposure 1422 non-null object
31 BsmtFinType1 1423 non-null object
32 BsmtFinSF1 1460 non-null object
33 BsmtFinType2 1422 non-null object
34 BsmtFinSF2 1460 non-null object
35 BsmtUnfSF 1460 non-null object
36 TotalBsmtSF 1460 non-null object
37 Heating 1460 non-null object
38 HeatingQC 1460 non-null object
39 CentralAir 1460 non-null object
40 Electrical 1459 non-null object
41 1stFlrSF 1460 non-null object
42 2ndFlrSF 1460 non-null object
43 LowQualFinSF 1460 non-null object
44 GrLivArea 1460 non-null object
45 BsmtFullBath 1460 non-null object
46 BsmtHalfBath 1460 non-null object
47 FullBath 1460 non-null object
48 HalfBath 1460 non-null object
49 BedroomAbvGr 1460 non-null object
50 KitchenAbvGr 1460 non-null object
51 KitchenQual 1460 non-null object
52 TotRmsAbvGrd 1460 non-null object
53 Functional 1460 non-null object
54 Fireplaces 1460 non-null object
55 FireplaceQu 770 non-null object
56 GarageType 1379 non-null object
57 GarageYrBlt 1379 non-null object
58 GarageFinish 1379 non-null object
59 GarageCars 1460 non-null object
60 GarageArea 1460 non-null object
61 GarageQual 1379 non-null object
62 GarageCond 1379 non-null object
63 PavedDrive 1460 non-null object
64 WoodDeckSF 1460 non-null object
65 OpenPorchSF 1460 non-null object
66 EnclosedPorch 1460 non-null object
67 3SsnPorch 1460 non-null object
68 ScreenPorch 1460 non-null object
69 PoolArea 1460 non-null object
70 MiscVal 1460 non-null object
71 MoSold 1460 non-null object
72 YrSold 1460 non-null object
73 SaleType 1460 non-null object
74 SaleCondition 1460 non-null object
dtypes: object(75)
memory usage: 855.6+ KB

[15]: # Converting numeric columns to a numeric dtype
num_features = []
non_num_features = []
for col in x.columns:
    try:
        x[col] = pd.to_numeric(x[col], errors='raise')
        print(f'Column {col} converted to numeric')
        num_features.append(col)
    except ValueError:
        print(f'Column {col} contains non-numeric values')
        non_num_features.append(col)

Column MSSubClass converted to numeric


Column MSZoning contains non-numeric values
Column LotFrontage converted to numeric
Column LotArea converted to numeric
Column Street contains non-numeric values
Column LotShape contains non-numeric values
Column LandContour contains non-numeric values
Column Utilities contains non-numeric values
Column LotConfig contains non-numeric values
Column LandSlope contains non-numeric values
Column Neighborhood contains non-numeric values
Column Condition1 contains non-numeric values
Column Condition2 contains non-numeric values
Column BldgType contains non-numeric values
Column HouseStyle contains non-numeric values
Column OverallQual converted to numeric
Column OverallCond converted to numeric
Column YearBuilt converted to numeric
Column YearRemodAdd converted to numeric
Column RoofStyle contains non-numeric values
Column RoofMatl contains non-numeric values
Column Exterior1st contains non-numeric values
Column Exterior2nd contains non-numeric values
Column MasVnrType contains non-numeric values
Column MasVnrArea converted to numeric
Column ExterQual contains non-numeric values
Column ExterCond contains non-numeric values
Column Foundation contains non-numeric values
Column BsmtQual contains non-numeric values
Column BsmtCond contains non-numeric values
Column BsmtExposure contains non-numeric values
Column BsmtFinType1 contains non-numeric values
Column BsmtFinSF1 converted to numeric
Column BsmtFinType2 contains non-numeric values
Column BsmtFinSF2 converted to numeric
Column BsmtUnfSF converted to numeric
Column TotalBsmtSF converted to numeric
Column Heating contains non-numeric values
Column HeatingQC contains non-numeric values
Column CentralAir contains non-numeric values
Column Electrical contains non-numeric values
Column 1stFlrSF converted to numeric
Column 2ndFlrSF converted to numeric
Column LowQualFinSF converted to numeric
Column GrLivArea converted to numeric
Column BsmtFullBath converted to numeric
Column BsmtHalfBath converted to numeric
Column FullBath converted to numeric
Column HalfBath converted to numeric
Column BedroomAbvGr converted to numeric
Column KitchenAbvGr converted to numeric
Column KitchenQual contains non-numeric values
Column TotRmsAbvGrd converted to numeric
Column Functional contains non-numeric values
Column Fireplaces converted to numeric
Column FireplaceQu contains non-numeric values
Column GarageType contains non-numeric values
Column GarageYrBlt converted to numeric
Column GarageFinish contains non-numeric values
Column GarageCars converted to numeric
Column GarageArea converted to numeric
Column GarageQual contains non-numeric values
Column GarageCond contains non-numeric values
Column PavedDrive contains non-numeric values
Column WoodDeckSF converted to numeric
Column OpenPorchSF converted to numeric
Column EnclosedPorch converted to numeric
Column 3SsnPorch converted to numeric
Column ScreenPorch converted to numeric
Column PoolArea converted to numeric
Column MiscVal converted to numeric
Column MoSold converted to numeric
Column YrSold converted to numeric
Column SaleType contains non-numeric values
Column SaleCondition contains non-numeric values
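The conversion loop can be tried on a small frame to see what dtypes it produces. The sketch below uses a hypothetical `x_demo` with three columns. A column containing `NaN` becomes float, which is consistent with `LotFrontage` printing as `65.0` in the next cell, while fully populated columns become integers.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for x after 'NA' has been replaced with NaN.
x_demo = pd.DataFrame({
    "LotFrontage": ["65", np.nan, "68"],
    "MSZoning": ["RL", "RM", "RL"],
    "LotArea": ["8450", "9600", "11250"],
})

num_cols, non_num_cols = [], []
for col in x_demo.columns:
    try:
        # errors='raise' makes any non-numeric string abort the conversion,
        # so the column is left untouched and classified as non-numeric.
        x_demo[col] = pd.to_numeric(x_demo[col], errors="raise")
        num_cols.append(col)
    except (ValueError, TypeError):
        non_num_cols.append(col)

print(num_cols, non_num_cols)
print(x_demo.dtypes)
```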

[16]: x[num_features].head()

[16]: 0 MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt \


0 60 65.0 8450 7 5 2003
1 20 80.0 9600 6 8 1976
2 60 68.0 11250 7 5 2001
3 70 60.0 9550 7 5 1915
4 60 84.0 14260 8 5 2000

0 YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF \


0 2003 196.0 706 0 150 856
1 1976 0.0 978 0 284 1262
2 2002 162.0 486 0 434 920
3 1970 0.0 216 0 540 756
4 2000 350.0 655 0 490 1145

0 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath \


0 856 854 0 1710 1 0
1 1262 0 0 1262 0 1
2 920 866 0 1786 1 0
3 961 756 0 1717 1 0
4 1145 1053 0 2198 1 0

0 FullBath HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces \


0 2 1 3 1 8 0
1 2 0 3 1 6 1
2 2 1 3 1 6 1
3 1 0 3 1 7 1
4 2 1 4 1 9 1

0 GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF \


0 2003.0 2 548 0 61
1 1976.0 2 460 298 0
2 2001.0 2 608 0 42
3 1998.0 3 642 0 35
4 2000.0 3 836 192 84

0 EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold


0 0 0 0 0 0 2 2008
1 0 0 0 0 0 5 2007
2 0 0 0 0 0 9 2008
3 272 0 0 0 0 2 2006
4 0 0 0 0 0 12 2008

[17]: x[non_num_features].head()

[17]: 0 MSZoning Street LotShape LandContour Utilities LotConfig LandSlope \


0 RL Pave Reg Lvl AllPub Inside Gtl
1 RL Pave Reg Lvl AllPub FR2 Gtl
2 RL Pave IR1 Lvl AllPub Inside Gtl
3 RL Pave IR1 Lvl AllPub Corner Gtl
4 RL Pave IR1 Lvl AllPub FR2 Gtl

0 Neighborhood Condition1 Condition2 BldgType HouseStyle RoofStyle RoofMatl \


0 CollgCr Norm Norm 1Fam 2Story Gable CompShg
1 Veenker Feedr Norm 1Fam 1Story Gable CompShg
2 CollgCr Norm Norm 1Fam 2Story Gable CompShg
3 Crawfor Norm Norm 1Fam 2Story Gable CompShg
4 NoRidge Norm Norm 1Fam 2Story Gable CompShg

0 Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual \


0 VinylSd VinylSd BrkFace Gd TA PConc Gd
1 MetalSd MetalSd None TA TA CBlock Gd
2 VinylSd VinylSd BrkFace Gd TA PConc Gd
3 Wd Sdng Wd Shng None TA TA BrkTil TA
4 VinylSd VinylSd BrkFace Gd TA PConc Gd

0 BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC \


0 TA No GLQ Unf GasA Ex
1 TA Gd ALQ Unf GasA Ex
2 TA Mn GLQ Unf GasA Ex
3 Gd No ALQ Unf GasA Gd
4 TA Av GLQ Unf GasA Ex

0 CentralAir Electrical KitchenQual Functional FireplaceQu GarageType \


0 Y SBrkr Gd Typ NaN Attchd
1 Y SBrkr TA Typ TA Attchd
2 Y SBrkr Gd Typ TA Attchd
3 Y SBrkr Gd Typ Gd Detchd
4 Y SBrkr Gd Typ TA Attchd

0 GarageFinish GarageQual GarageCond PavedDrive SaleType SaleCondition


0 RFn TA TA Y WD Normal
1 RFn TA TA Y WD Normal
2 RFn TA TA Y WD Normal
3 Unf TA TA Y WD Abnorml
4 RFn TA TA Y WD Normal

[18]: len(non_num_features)

[18]: 39

[19]: ordinal_features = ['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond',
                    'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
                    'HeatingQC', 'KitchenQual', 'FireplaceQu', 'GarageFinish',
                    'GarageQual', 'GarageCond', 'Functional']

x[ordinal_features].head()

[19]: 0 ExterQual ExterCond BsmtQual BsmtCond BsmtExposure BsmtFinType1 \


0 Gd TA Gd TA No GLQ
1 TA TA Gd TA Gd ALQ
2 Gd TA Gd TA Mn GLQ
3 TA TA TA Gd No ALQ
4 Gd TA Gd TA Av GLQ

0 BsmtFinType2 HeatingQC KitchenQual FireplaceQu GarageFinish GarageQual \


0 Unf Ex Gd NaN RFn TA
1 Unf Ex TA TA RFn TA
2 Unf Ex Gd TA RFn TA
3 Unf Gd Gd Gd Unf TA
4 Unf Ex Gd TA RFn TA

0 GarageCond Functional
0 TA Typ
1 TA Typ
2 TA Typ
3 TA Typ
4 TA Typ

[20]: # cat features


cat_features = list(set(non_num_features) - set(ordinal_features))
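Because Python sets are unordered, `list(set(...) - set(...))` can return the categorical columns in a different order from one session to the next, which also changes the column order of the one-hot block. A minimal order-preserving sketch; the `_demo` lists are hypothetical and much shorter than the notebook's.

```python
# Hypothetical short versions of non_num_features and ordinal_features.
non_num_demo = ["MSZoning", "Street", "ExterQual", "LotShape", "KitchenQual"]
ordinal_demo = ["ExterQual", "KitchenQual"]

# Keep the original column order; the set is only used for fast membership tests.
ordinal_set = set(ordinal_demo)
cat_demo = [col for col in non_num_demo if col not in ordinal_set]

print(cat_demo)
```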

[21]: x[cat_features].head()

[21]: 0 Foundation LotShape Condition2 MSZoning Heating LandContour BldgType \


0 PConc Reg Norm RL GasA Lvl 1Fam
1 CBlock Reg Norm RL GasA Lvl 1Fam
2 PConc IR1 Norm RL GasA Lvl 1Fam
3 BrkTil IR1 Norm RL GasA Lvl 1Fam
4 PConc IR1 Norm RL GasA Lvl 1Fam

0 PavedDrive SaleType SaleCondition MasVnrType Exterior2nd Neighborhood \


0 Y WD Normal BrkFace VinylSd CollgCr
1 Y WD Normal None MetalSd Veenker
2 Y WD Normal BrkFace VinylSd CollgCr
3 Y WD Abnorml None Wd Shng Crawfor
4 Y WD Normal BrkFace VinylSd NoRidge

0 CentralAir Exterior1st Electrical LotConfig Street LandSlope GarageType \


0 Y VinylSd SBrkr Inside Pave Gtl Attchd
1 Y MetalSd SBrkr FR2 Pave Gtl Attchd
2 Y VinylSd SBrkr Inside Pave Gtl Attchd
3 Y Wd Sdng SBrkr Corner Pave Gtl Detchd
4 Y VinylSd SBrkr FR2 Pave Gtl Attchd

0 HouseStyle Utilities RoofMatl RoofStyle Condition1


0 2Story AllPub CompShg Gable Norm
1 1Story AllPub CompShg Gable Feedr
2 2Story AllPub CompShg Gable Norm
3 2Story AllPub CompShg Gable Norm
4 2Story AllPub CompShg Gable Norm

1.2 Machine Learning Pipeline


Pipelines
[22]: # Ordinal pipeline
ordinal_pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('ordina_encoder', OrdinalEncoder())
])

# Categorical pipeline
categorical_pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='Missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Numerical pipeline
numerical_pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])
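With no `categories` argument, `OrdinalEncoder` assigns integer codes in sorted (alphabetical) order, so the codes for quality columns such as `ExterQual` do not follow the quality ranking. The sketch below contrasts the default with an explicit ordering. The ranking `Po < Fa < TA < Gd < Ex` is an assumption about the quality scale and does not come from the notebook.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Assumed quality ranking, worst to best (not stated in the notebook).
quality_order = ["Po", "Fa", "TA", "Gd", "Ex"]

# One quality column with four example values.
values = np.array([["Gd"], ["TA"], ["Ex"], ["Fa"]], dtype=object)

# Default: categories are sorted alphabetically (Ex, Fa, Gd, TA).
default_codes = OrdinalEncoder().fit_transform(values).ravel()

# Explicit: codes follow the position in quality_order.
ordered_codes = (
    OrdinalEncoder(categories=[quality_order]).fit_transform(values).ravel()
)

print(default_codes)   # codes by alphabetical position
print(ordered_codes)   # codes by assumed quality rank
```

In the notebook's pipeline this would require one category list per ordinal column, since columns such as `BsmtExposure` and `Functional` use different label sets.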

Column Transformer
[23]: # column Transformer
preprocessor = ColumnTransformer([
    ('ordinal', ordinal_pipeline, ordinal_features),
    ('categorical', categorical_pipeline, cat_features),
    ('numerical', numerical_pipeline, num_features)
])

# Display
preprocessor

[23]: ColumnTransformer(transformers=[('ordinal',
Pipeline(steps=[('imputer',
SimpleImputer(strategy='most_frequent')),
('ordina_encoder',
OrdinalEncoder())]),
['ExterQual', 'ExterCond', 'BsmtQual',
'BsmtCond', 'BsmtExposure', 'BsmtFinType1',
'BsmtFinType2', 'HeatingQC', 'KitchenQual',
'FireplaceQu', 'GarageFinish', 'GarageQual',
'GarageCond', 'Functional']),
('categorical…
'OverallQual', 'OverallCond', 'YearBuilt',
'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF',
'1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
'FullBath', 'HalfBath', 'BedroomAbvGr',
'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces',
'GarageYrBlt', 'GarageCars', 'GarageArea',
'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch',
…])])

[24]: # splitting the data into train and val sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2,
                                                  random_state=42)

[25]: len(x_train.columns)

[25]: 75

[26]: x_train_transformed = preprocessor.fit_transform(x_train)


x_val_transformed = preprocessor.transform(x_val)
x_train_transformed[:5]

[26]: array([[ 3. , 2. , 3. , …, -0.09274033,


-0.13341669, 1.65006527],
[ 2. , 4. , 2. , …, -0.09274033,
-0.5080097 , 0.89367742],
[ 3. , 4. , 1. , …, -0.09274033,
-0.5080097 , 0.13728958],
[ 3. , 4. , 2. , …, -0.09274033,
-0.13341669, -0.61909827],
[ 3. , 4. , 3. , …, -0.09274033,
-0.5080097 , 1.65006527]])
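The preprocessor is fitted on the training split only and then applied to the validation split, so imputation values and scaling statistics never see validation rows. A minimal sketch of the same pattern with `StandardScaler` alone, on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature splits.
train = np.array([[1.0], [2.0], [3.0]])
val = np.array([[4.0]])

# fit() learns the mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(train)

# transform() reuses those training statistics on both splits.
train_scaled = scaler.transform(train)
val_scaled = scaler.transform(val)

print(scaler.mean_, scaler.scale_)
print(val_scaled)
```

The validation value lands outside the range of the scaled training values, which is expected: it is expressed in units of the training distribution rather than being re-centred on itself.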

[27]: model_input_shape = x_train_transformed.shape[1]
model_input_shape

[27]: 223

1.3 Model 1: Base


[28]: # building a Keras model with 2 hidden layers
nn_model1 = Sequential()
nn_model1.add(Dense(16, input_shape=(model_input_shape,)))  # first hidden layer (no activation)
nn_model1.add(Dense(32, activation='relu'))  # second hidden layer
nn_model1.add(Dense(1))  # output layer
nn_model1.compile(optimizer='Adam', loss='mse',
                  metrics=['mse', tf.keras.metrics.RootMeanSquaredError()])

# training the model
training_histroy1 = nn_model1.fit(x_train_transformed, y_train,
                                  epochs=10, batch_size=32,
                                  validation_data=(x_val_transformed, y_val),
                                  verbose=0)

1.4 Model 2: Deeper Layer


[29]: # building a Keras model with 4 hidden layers
nn_model2 = Sequential()
nn_model2.add(Dense(16, input_shape=(model_input_shape,)))  # first hidden layer (no activation)
nn_model2.add(Dense(32, activation='relu'))
nn_model2.add(Dense(64, activation='relu'))
nn_model2.add(Dense(128, activation='relu'))  # last hidden layer
nn_model2.add(Dense(1))  # output layer
nn_model2.compile(optimizer='Adam', loss='mse',
                  metrics=['mse', tf.keras.metrics.RootMeanSquaredError()])

# training the model
training_histroy2 = nn_model2.fit(x_train_transformed, y_train,
                                  epochs=10, batch_size=32,
                                  validation_data=(x_val_transformed, y_val),
                                  verbose=0)

1.5 Model 3: Regularized (Dropout + L2)


[132]: # building a Keras model with 4 hidden layers, L2 regularization and one dropout layer
nn_model3 = Sequential()
nn_model3.add(Dense(16, input_shape=(model_input_shape,), activation='relu',
                    kernel_regularizer=regularizers.l2(0.01)))  # first hidden layer
# nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(32, activation='relu',
                    kernel_regularizer=regularizers.l2(0.01)))
nn_model3.add(Dropout(0.4))  # the only active dropout layer
nn_model3.add(Dense(64, activation='relu',
                    kernel_regularizer=regularizers.l2(0.01)))
# nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(128, activation='relu',
                    kernel_regularizer=regularizers.l2(0.01)))  # last hidden layer
# nn_model3.add(Dropout(0.4))
nn_model3.add(Dense(1))  # output layer
nn_model3.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.008),
                  loss='mse',
                  metrics=['mse', tf.keras.metrics.RootMeanSquaredError()])

# training the model
training_histroy3 = nn_model3.fit(x_train_transformed, y_train,
                                  epochs=10, batch_size=32,
                                  validation_data=(x_val_transformed, y_val),
                                  verbose=0)
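As a rough illustration of the two regularizers used in Model 3, the NumPy sketch below computes an L2 penalty of the form `l2 * sum(w ** 2)` and applies inverted dropout (zero a fraction of activations, rescale the rest by `1 / (1 - rate)`). The helper names `l2_penalty` and `inverted_dropout` are hypothetical and written for this example; they are not Keras functions.

```python
import numpy as np

def l2_penalty(weights, l2=0.01):
    """Penalty added to the loss for one layer: l2 * sum of squared weights."""
    return l2 * np.sum(np.square(weights))

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

# Made-up weight matrix: squared entries sum to 1 + 4 + 0.25 = 5.25.
w = np.array([[1.0, -2.0], [0.5, 0.0]])
penalty = l2_penalty(w, l2=0.01)

# Dropout at rate 0.4 on a vector of ones, training-time behaviour only.
rng = np.random.default_rng(0)
a = np.ones(1000)
dropped = inverted_dropout(a, rate=0.4, rng=rng)

print(penalty)
print(dropped.mean())  # stays near 1 because survivors are scaled up
```

At inference time dropout is disabled, so `evaluate` and `predict` use all units unchanged.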

1.6 Evaluation
[133]: # Histories to data frames
training_histroy_df1 = pd.DataFrame(training_histroy1.history)
training_histroy_df2 = pd.DataFrame(training_histroy2.history)
training_histroy_df3 = pd.DataFrame(training_histroy3.history)

[134]: # Evaluation: evaluate() returns [loss, mse, rmse] for each model
rmse_model1 = nn_model1.evaluate(x_val_transformed, y_val, verbose=0)
rmse_model2 = nn_model2.evaluate(x_val_transformed, y_val, verbose=0)
rmse_model3 = nn_model3.evaluate(x_val_transformed, y_val, verbose=0)

[135]: nn_model1.evaluate(x_val_transformed, y_val, verbose=1)

10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 36596183040.0000 - mse: 36596183040.0000 - root_mean_squared_error: 191221.2500

[135]: [37315776512.0, 37315776512.0, 193172.921875]
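The returned list follows the order loss, then the compiled metrics, which is why the later cells read the RMSE from index 2. A small sketch using the values printed above; the names in `metric_names` are taken from the progress-bar line.

```python
import math

# Values copied from the evaluate() output shown above.
eval_result = [37315776512.0, 37315776512.0, 193172.921875]
metric_names = ["loss", "mse", "root_mean_squared_error"]

# Pair each value with its name instead of relying on positions.
named = dict(zip(metric_names, eval_result))

# The loss is MSE, so the first two entries match,
# and the RMSE is the square root of the MSE.
print(named["loss"] == named["mse"])
print(math.sqrt(named["mse"]), named["root_mean_squared_error"])
```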

[136]: # display
print(f'Model 1 RMSE: {rmse_model1[2]:,.0f}')
print(f'Model 2 RMSE: {rmse_model2[2]:,.0f}')
print(f'Model 3 RMSE: {rmse_model3[2]:,.0f}')

Model 1 RMSE: 193,173


Model 2 RMSE: 48,126
Model 3 RMSE: 32,417

1.7 Root Mean Squared Error (RMSE) Summary
• Model 1 RMSE: 193,173: This model performs very poorly. An RMSE this large is comparable to the sale prices themselves (the sample rows range from 140,000 to 250,000), so its predictions are far from the actual house prices. This is most likely underfitting: the network is small and was trained for only 10 epochs.
• Model 2 RMSE: 48,126: A significant improvement over Model 1. The deeper architecture with ReLU activations allowed the model to learn more complex relationships in the data.
• Model 3 RMSE: 32,417: This is the best performing model. The model combines:
1. A deeper architecture than Model 1, with the same layer sizes as Model 2 plus a ReLU activation on the first layer, allowing the model to capture more complex relationships in the data.
2. L2 regularization (factor 0.01) on every hidden layer, which limits overfitting by penalizing large weights and helps the model generalize better.
3. A single dropout layer (rate 0.4) after the 32-unit layer. The other dropout layers are commented out, as the data set is relatively small and additional dropout might not provide significant benefit in this case.
4. An explicit Adam learning rate of 0.008, higher than the Keras default of 0.001, which likely lets the model make more progress within the 10 training epochs.
Together, these changes lead to a significantly lower prediction error on the validation set.
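To put the RMSE figures in context, it helps to compare them with trivial baselines. The sketch below computes the RMSE of predicting zero and of predicting the mean, using only the five `SalePrice` values visible in the `df.head()` output. This is an illustration on five rows, not a result for the validation set.

```python
import numpy as np

# The five SalePrice values shown in the df.head() output.
y_demo = np.array([208500, 181500, 223500, 140000, 250000], dtype=float)

def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Baseline 1: predict 0 for every house.
rmse_zero = rmse(y_demo, np.zeros_like(y_demo))

# Baseline 2: predict the mean price for every house.
# This equals the population standard deviation of the prices.
rmse_mean = rmse(y_demo, np.full_like(y_demo, y_demo.mean()))

print(f"predict zero: {rmse_zero:,.0f}")
print(f"predict mean: {rmse_mean:,.0f}")
```

On these rows the zero baseline is of the same order as Model 1's RMSE, which is consistent with Model 1's outputs still being far below the price scale after 10 epochs. Computing the same baselines on `y_val` would show whether that holds on the actual validation data.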
