
Renan Moura - renanmf.com

Data Science and Machine Learning


Project - House Prices Dataset
Renan Moura


Table of Contents
1. Preface
2. Github Repository
3. Exploratory Data Analysis
4. Data Cleaning Script
5. Machine Learning Model
6. API
7. Conclusion


Preface
I’ve been working with software for almost 10 years now and do my best to share what I learned throughout the years.

I’m Renan Moura, and I write about Software Development on renanmf.com.

You can also find me as @renanmouraf on:

Twitter: https://twitter.com/renanmouraf
LinkedIn: https://www.linkedin.com/in/renanmouraf
Instagram: https://www.instagram.com/renanmouraf


Github Repository
You can download the complete code in the Github
Repository.

In the repository you will find:

requirements.txt: The packages you need to install using pip
raw_data.csv: The raw data we are using in this project
Exploratory-Data-Analysis-House-Prices.ipynb: The Jupyter Notebook with the Exploratory Data Analysis
data_cleaning.py: The script that cleans the data
train_model.py: The script that trains the Machine Learning model using the cleaned data
predict.py: The file with the HousePriceModel class that we use to load the ML model and make the predictions
api.py: The API created with the framework FastAPI
test_api.py: The script to test the API

To use the data and code in the repository, follow the steps in the next sections.

Environment and Packages

Create a virtual environment to isolate your Python project:

python3 -m venv venv

Activate the virtual environment:

source ./venv/bin/activate

Install the necessary packages:


pip install -r requirements.txt

You should see a message similar to this at the end:

Successfully installed Babel-2.9.0 Jinja2-2.11.3 MarkupSafe-
1.1.1 Pygments-2.8.0 Send2Trash-1.5.0 anyio-2.1.0 argon2-cffi-
20.1.0 async-generator-1.10 attrs-20.3.0 backcall-0.2.0
bleach-3.3.0 certifi-2020.12.5 cffi-1.14.5 chardet-4.0.0
click-7.1.2 decorator-4.4.2 defusedxml-0.6.0 entrypoints-0.3
fastapi-0.63.0 h11-0.12.0 idna-2.10 ipykernel-5.4.3 ipython-
7.20.0 ipython-genutils-0.2.0 jedi-0.18.0 joblib-1.0.1 json5-
0.9.5 jsonschema-3.2.0 jupyter-client-6.1.11 jupyter-core-
4.7.1 jupyter-server-1.3.0 jupyterlab-3.0.7 jupyterlab-
pygments-0.1.2 jupyterlab-server-2.2.0 mistune-0.8.4
nbclassic-0.2.6 nbclient-0.5.2 nbconvert-6.0.7 nbformat-5.1.2
nest-asyncio-1.5.1 notebook-6.2.0 numpy-1.20.1 packaging-20.9
pandas-1.2.2 pandocfilters-1.4.3 parso-0.8.1 pexpect-4.8.0
pickleshare-0.7.5 prometheus-client-0.9.0 prompt-toolkit-
3.0.16 ptyprocess-0.7.0 pycparser-2.20 pydantic-1.7.3
pyparsing-2.4.7 pyrsistent-0.17.3 python-dateutil-2.8.1 pytz-
2021.1 pyzmq-22.0.3 requests-2.25.1 scikit-learn-0.24.1 scipy-
1.6.0 six-1.15.0 sniffio-1.2.0 starlette-0.13.6 terminado-
0.9.2 testpath-0.4.4 threadpoolctl-2.1.0 tornado-6.1
traitlets-5.0.5 urllib3-1.26.3 uvicorn-0.13.3 wcwidth-0.2.5
webencodings-0.5.1

EDA (Exploratory Data Analysis)

To check the EDA:

jupyter-notebook Exploratory-Data-Analysis-House-Prices.ipynb

Then, with the Jupyter Notebook open go to Cell > Run All
to run all the commands.

Clean the Data, Machine Learning Model, Run API
The following steps must be executed in sequence.


Clean the Data

To perform the cleaning process on the raw data:

python data_cleaning.py

Expected output:

Original Data: (1168, 81)
Columns with missing values: 0
Series([], dtype: int64)
After Cleaning: (1168, 73)

This will generate the ‘cleaned_data.csv’.

Create the Machine Learning Model

To train the model:

python train_model.py

Expected output:

Train data for modeling: (934, 73)
Test data for predictions: (234, 73)
Training the model ...
Testing the model ...
Average Price Test: 175652.0128205128
RMSE: 11098.009355519898
Model saved at model.pkl

This will create the ‘train.csv’, ‘test.csv’, and ‘model.pkl’.

Run and test the API


To run the API:

uvicorn api:app

Expected output:

INFO: Started server process [56652]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

To test the API, on another terminal, activate the virtual environment again (this time you already have the packages installed):

source ./venv/bin/activate

And then run:

python test_api.py

Expected output:

The actual Sale Price: 109000
The predicted Sale Price: 109000.01144237864


Exploratory Data Analysis

Let’s begin with an Exploratory Data Analysis, a.k.a. EDA, of the dataset “House Prices: Advanced Regression Techniques”.

We will:

Understand the problem
Explore the data and deal with missing values
Select and transform variables, especially categorical ones

The Problem
This is the description of the problem on Kaggle:

"Ask a home buyer to describe their dream house, and they probably won’t begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition’s dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home."

So, we are going to explore the dataset, try to get some insights from it, and use some tools to transform the data into formats that make more sense.

Initial Exploration and First Insights

In this section, we are going to make an initial exploration of the dataset.

This EDA was performed on a Jupyter Notebook.

Importing Libraries

We begin by importing the libraries we are going to use:

The standard math module provides access to the mathematical functions.
The NumPy library is fundamental for any kind of scientific computing with Python.
pandas is a must-have tool for data analysis and manipulation.
matplotlib is the most complete package in Python when it comes to data visualizations.
seaborn is based on matplotlib as a higher-level set of visualization tools; not as powerful as matplotlib, but much easier to work with, it delivers a lot with less work.

import math
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

%matplotlib inline

Loading Data

Since we have tabular data, we are going to use pandas to load the data and take a first look at it.

To load the data, since the format is CSV (Comma-Separated Values), we use the read_csv() function from pandas.

Then we print its shape, which is 1168x81, meaning we have 1168 rows (records) and 81 columns (features).

Actually, we have 1169 rows in the CSV file, but the header that describes the columns doesn’t count.

And we effectively have 79 features, since one of the columns is SalePrice, the column we will try to predict in a model, and we will also not use the column Id and will get rid of it later.

train = pd.read_csv('raw_data.csv')
train.shape

(1168, 81)
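A quick way to confirm how read_csv() treats the header is a tiny in-memory CSV (a made-up two-column example, just for illustration, not the real dataset):

```python
import io

import pandas as pd

# A made-up CSV with 1 header line plus 3 data lines
csv_text = "a,b\n1,2\n3,4\n5,6\n"

df = pd.read_csv(io.StringIO(csv_text))

# The header names the columns and is not counted as a row
print(df.shape)  # (3, 2)
```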

Looking at the Data


First, I recommend you read this brief description of each column.

Using the head() function from pandas with an argument of 3, we can take a look at the first 3 records.

The .T means Transpose; this way we visualize rows as columns and vice-versa.

Notice how it doesn’t show all of the columns in the middle, displaying only ..., because there are too many of them.

train.head(3).T

0 1 2
Id 893 1106 414
MSSubClass 20 60 30
MSZoning RL RL RM
LotFrontage 70 98 56
LotArea 8414 12256 8960
… … … …
MoSold 2 4 3
YrSold 2006 2010 2010
SaleType WD WD WD
SaleCondition Normal Normal Normal
SalePrice 154500 325000 115000

81 rows × 3 columns
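The effect of .T can be seen on a small made-up DataFrame (not the real dataset):

```python
import pandas as pd

# A made-up frame with 3 records and 2 columns
df = pd.DataFrame({'Id': [893, 1106, 414], 'MSSubClass': [20, 60, 30]})

# .T swaps rows and columns: the features become the index
transposed = df.head(3).T

print(transposed.shape)        # (2, 3)
print(list(transposed.index))  # ['Id', 'MSSubClass']
```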

The info() method from pandas will give you a summary of the data.

Notice how Alley has only 70 non-null values, meaning it doesn’t have a value for most of the 1168 records.

We can also visualize the data types.

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1168 entries, 0 to 1167
Data columns (total 81 columns):
Id 1168 non-null int64
MSSubClass 1168 non-null int64
MSZoning 1168 non-null object
LotFrontage 964 non-null float64
LotArea 1168 non-null int64
Street 1168 non-null object
Alley 70 non-null object
LotShape 1168 non-null object
LandContour 1168 non-null object
Utilities 1168 non-null object
LotConfig 1168 non-null object
LandSlope 1168 non-null object
Neighborhood 1168 non-null object
Condition1 1168 non-null object
Condition2 1168 non-null object
BldgType 1168 non-null object
HouseStyle 1168 non-null object
OverallQual 1168 non-null int64
OverallCond 1168 non-null int64
YearBuilt 1168 non-null int64
YearRemodAdd 1168 non-null int64
RoofStyle 1168 non-null object
RoofMatl 1168 non-null object
Exterior1st 1168 non-null object
Exterior2nd 1168 non-null object
MasVnrType 1160 non-null object
MasVnrArea 1160 non-null float64
ExterQual 1168 non-null object
ExterCond 1168 non-null object
Foundation 1168 non-null object
BsmtQual 1138 non-null object
BsmtCond 1138 non-null object
BsmtExposure 1137 non-null object
BsmtFinType1 1138 non-null object
BsmtFinSF1 1168 non-null int64
BsmtFinType2 1137 non-null object
BsmtFinSF2 1168 non-null int64
BsmtUnfSF 1168 non-null int64
TotalBsmtSF 1168 non-null int64
Heating 1168 non-null object
HeatingQC 1168 non-null object
CentralAir 1168 non-null object
Electrical 1167 non-null object
1stFlrSF 1168 non-null int64
2ndFlrSF 1168 non-null int64
LowQualFinSF 1168 non-null int64
GrLivArea 1168 non-null int64
BsmtFullBath 1168 non-null int64
BsmtHalfBath 1168 non-null int64
FullBath 1168 non-null int64
HalfBath 1168 non-null int64
BedroomAbvGr 1168 non-null int64
KitchenAbvGr 1168 non-null int64
KitchenQual 1168 non-null object
TotRmsAbvGrd 1168 non-null int64
Functional 1168 non-null object
Fireplaces 1168 non-null int64
FireplaceQu 617 non-null object
GarageType 1099 non-null object
GarageYrBlt 1099 non-null float64
GarageFinish 1099 non-null object
GarageCars 1168 non-null int64
GarageArea 1168 non-null int64
GarageQual 1099 non-null object
GarageCond 1099 non-null object
PavedDrive 1168 non-null object
WoodDeckSF 1168 non-null int64
OpenPorchSF 1168 non-null int64
EnclosedPorch 1168 non-null int64
3SsnPorch 1168 non-null int64
ScreenPorch 1168 non-null int64
PoolArea 1168 non-null int64
PoolQC 4 non-null object
Fence 217 non-null object
MiscFeature 39 non-null object
MiscVal 1168 non-null int64
MoSold 1168 non-null int64
YrSold 1168 non-null int64
SaleType 1168 non-null object
SaleCondition 1168 non-null object
SalePrice 1168 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 739.2+ KB

The describe() method is good for getting the first insights into the data.

It automatically gives you descriptive statistics for each numeric feature: number of non-NA/null observations, mean, standard deviation, the min value, the quartiles, and the max value.

Note that the calculations don’t take NaN values into consideration.

For LotFrontage, for instance, it uses only the 964 non-null values, and excludes the other 204 null observations.

train.describe().T

count mean std min 25% 50% 75% max
Id 1168.0 720.240582 420.237685 1.0 355.75 716.5 1080.25 1460.0
MSSubClass 1168.0 56.699486 41.814065 20.0 20.00 50.0 70.00 190.0
LotFrontage 964.0 70.271784 25.019386 21.0 59.00 69.5 80.00 313.0
LotArea 1168.0 10597.720890 10684.958323 1477.0 7560.00 9463.0 11601.50 215245.0
OverallQual 1168.0 6.095034 1.403402 1.0 5.00 6.0 7.00 10.0
OverallCond 1168.0 5.594178 1.116842 1.0 5.00 5.0 6.00 9.0
YearBuilt 1168.0 1971.120719 30.279560 1872.0 1954.00 1972.0 2000.00 2009.0
YearRemodAdd 1168.0 1985.200342 20.498566 1950.0 1968.00 1994.0 2004.00 2010.0
MasVnrArea 1160.0 104.620690 183.996031 0.0 0.00 0.0 166.25 1600.0
BsmtFinSF1 1168.0 444.345890 466.278751 0.0 0.00 384.0 706.50 5644.0
BsmtFinSF2 1168.0 46.869863 162.324086 0.0 0.00 0.0 0.00 1474.0
BsmtUnfSF 1168.0 562.949486 445.605458 0.0 216.00 464.5 808.50 2336.0
TotalBsmtSF 1168.0 1054.165240 448.848911 0.0 792.75 984.0 1299.00 6110.0
1stFlrSF 1168.0 1161.268836 393.541120 334.0 873.50 1079.5 1392.00 4692.0
2ndFlrSF 1168.0 351.218322 437.334802 0.0 0.00 0.0 730.50 2065.0
LowQualFinSF 1168.0 5.653253 48.068312 0.0 0.00 0.0 0.00 572.0
GrLivArea 1168.0 1518.140411 534.904019 334.0 1133.25 1467.5 1775.25 5642.0
BsmtFullBath 1168.0 0.426370 0.523376 0.0 0.00 0.0 1.00 3.0
BsmtHalfBath 1168.0 0.061644 0.244146 0.0 0.00 0.0 0.00 2.0
FullBath 1168.0 1.561644 0.555074 0.0 1.00 2.0 2.00 3.0
HalfBath 1168.0 0.386130 0.504356 0.0 0.00 0.0 1.00 2.0
BedroomAbvGr 1168.0 2.865582 0.817491 0.0 2.00 3.0 3.00 8.0
KitchenAbvGr 1168.0 1.046233 0.218084 1.0 1.00 1.0 1.00 3.0
TotRmsAbvGrd 1168.0 6.532534 1.627412 2.0 5.00 6.0 7.00 14.0
Fireplaces 1168.0 0.612158 0.640872 0.0 0.00 1.0 1.00 3.0
GarageYrBlt 1099.0 1978.586897 24.608158 1900.0 1962.00 1980.0 2002.00 2010.0
GarageCars 1168.0 1.761130 0.759039 0.0 1.00 2.0 2.00 4.0
GarageArea 1168.0 473.000000 218.795260 0.0 318.75 479.5 577.00 1418.0
WoodDeckSF 1168.0 92.618151 122.796184 0.0 0.00 0.0 168.00 736.0
OpenPorchSF 1168.0 45.256849 64.120769 0.0 0.00 24.0 68.00 523.0
EnclosedPorch 1168.0 20.790240 58.308987 0.0 0.00 0.0 0.00 330.0
3SsnPorch 1168.0 3.323630 27.261055 0.0 0.00 0.0 0.00 407.0
ScreenPorch 1168.0 14.023116 52.498520 0.0 0.00 0.0 0.00 410.0
PoolArea 1168.0 1.934075 33.192538 0.0 0.00 0.0 0.00 648.0
MiscVal 1168.0 42.092466 538.941473 0.0 0.00 0.0 0.00 15500.0
MoSold 1168.0 6.377568 2.727010 1.0 5.00 6.0 8.00 12.0
YrSold 1168.0 2007.815068 1.327339 2006.0 2007.00 2008.0 2009.00 2010.0
SalePrice 1168.0 181081.876712 81131.228007 34900.0 129975.00 162950.0 214000.00 755000.0
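The NaN-exclusion behavior of describe() can be verified on a tiny made-up Series:

```python
import pandas as pd

# A toy LotFrontage-like column with one missing value
s = pd.Series([21.0, 59.0, None, 80.0])

stats = s.describe()

# Only the 3 non-null observations are counted
print(stats['count'])  # 3.0
# The mean is computed over non-null values only: (21 + 59 + 80) / 3
print(stats['mean'])
```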

Data Cleaning
In this section, we will perform some Data Cleaning.

The id column

The Id column is only an identifier with no correlation to SalePrice.

So let’s remove it:

train.drop(columns=['Id'], inplace=True)

Missing values

When we used info() to see the data summary, we could see that many columns had a lot of missing data.

Let’s see which columns have missing values and the proportion of missing values in each one of them.

isna() from pandas returns a boolean mask marking the missing values in each column; the sum() function then adds them up to give you a total per column.

columns_with_miss = train.isna().sum()
#filtering only the columns with at least 1 missing value
columns_with_miss = columns_with_miss[columns_with_miss!=0]
#The number of columns with missing values
print('Columns with missing values:', len(columns_with_miss))
#sorting the columns by the number of missing values descending
columns_with_miss.sort_values(ascending=False)

Columns with missing values: 19

PoolQC 1164
MiscFeature 1129
Alley 1098
Fence 951
FireplaceQu 551
LotFrontage 204
GarageYrBlt 69
GarageType 69
GarageFinish 69
GarageQual 69
GarageCond 69
BsmtFinType2 31
BsmtExposure 31
BsmtFinType1 30
BsmtCond 30
BsmtQual 30
MasVnrArea 8
MasVnrType 8
Electrical 1
dtype: int64
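The same isna()/sum() pattern can be tried on a small made-up DataFrame to see how the filter works:

```python
import pandas as pd

# Made-up columns: two with missing values, one complete
df = pd.DataFrame({
    'PoolQC': [None, None, 'Gd'],
    'Electrical': ['SBrkr', None, 'SBrkr'],
    'LotArea': [8414, 12256, 8960],
})

columns_with_miss = df.isna().sum()
# Keep only the columns with at least 1 missing value
columns_with_miss = columns_with_miss[columns_with_miss != 0]

print(len(columns_with_miss))  # 2
print(columns_with_miss.sort_values(ascending=False))
```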

Out of 80 columns, 19 have missing values.

Missing values per se are not a big problem, but columns with a high number of missing values can cause distortions.

This is the case for:

PoolQC: Pool quality
MiscFeature: Miscellaneous feature not covered in other categories
Alley: Type of alley access to property
Fence: Fence quality

Let’s drop them from the dataset for now.

# Removing columns
train.drop(columns=['PoolQC', 'MiscFeature',
                    'Alley', 'Fence'], inplace=True)

FireplaceQu has 551 missing values, which is also pretty high.

In this case, the missing values have a meaning, which is “No Fireplace”.

FireplaceQu has the following categories:

Ex Excellent - Exceptional Masonry Fireplace
Gd Good - Masonry Fireplace in main level
TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
Fa Fair - Prefabricated Fireplace in basement
Po Poor - Ben Franklin Stove
NA No Fireplace

Let’s check the correlation between FireplaceQu and SalePrice, to see how important this feature is for determining the price.

First, we will replace the missing values with 0.

Then, we encode the categories into numbers from 1 to 5.

train['FireplaceQu'].fillna(0, inplace=True)
train['FireplaceQu'].replace({'Po': 1, 'Fa': 2, \
'TA': 3, 'Gd': 4, 'Ex': 5}, inplace=True)
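The fillna()/replace() combination can be sketched on a toy Series (hypothetical values, not the real FireplaceQu column):

```python
import pandas as pd

# Hypothetical fireplace qualities; None means "no fireplace"
quality = pd.Series(['Gd', None, 'Ex', 'TA'])

quality = quality.fillna(0)
quality = quality.replace({'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5})

print(quality.tolist())  # [4, 0, 5, 3]
```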

Using a barplot, we can see how the category of the fireplace increases the value of SalePrice.

It is also worth noting how much higher the value is when the house has an Excellent fireplace.

This means we should keep FireplaceQu as a feature.

sns.set(style="whitegrid")
sns.barplot(x='FireplaceQu', y="SalePrice", data=train)

Missing values in numeric columns

Another feature with a high number of missing values is LotFrontage, with a count of 204.

Let’s see the correlation between the remaining features with missing values and SalePrice.


columns_with_miss = train.isna().sum()
columns_with_miss = columns_with_miss[columns_with_miss!=0]
c = list(columns_with_miss.index)
c.append('SalePrice')
train[c].corr()

LotFrontage MasVnrArea GarageYrBlt SalePrice
LotFrontage 1.000000 0.196649 0.089542 0.371839
MasVnrArea 0.196649 1.000000 0.253348 0.478724
GarageYrBlt 0.089542 0.253348 1.000000 0.496575
SalePrice 0.371839 0.478724 0.496575 1.000000
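As a sketch of what corr() computes, here is the same call on a small made-up DataFrame (illustrative values only):

```python
import pandas as pd

# Made-up values, just to illustrate what train[c].corr() returns
df = pd.DataFrame({
    'LotFrontage': [70.0, 98.0, 56.0, 80.0],
    'SalePrice': [154500, 325000, 115000, 200000],
})

corr = df.corr()

# The diagonal is always 1.0: a variable correlates perfectly with itself
print(corr.loc['LotFrontage', 'LotFrontage'])  # 1.0
# Off-diagonal entries measure the pairwise Pearson correlation
print(corr.loc['LotFrontage', 'SalePrice'])
```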

Note that LotFrontage, MasVnrArea, and GarageYrBlt have a positive correlation with SalePrice, but this correlation isn’t very strong.

To simplify this analysis, we will remove these columns for now:

cols_to_be_removed = ['LotFrontage', 'GarageYrBlt', 'MasVnrArea']
train.drop(columns=cols_to_be_removed, inplace=True)

Finally, these are the remaining columns with missing values:

columns_with_miss = train.isna().sum()
columns_with_miss = columns_with_miss[columns_with_miss!=0]
print(f'Columns with missing values: {len(columns_with_miss)}')
columns_with_miss.sort_values(ascending=False)

Columns with missing values: 11

GarageCond 69
GarageQual 69
GarageFinish 69
GarageType 69
BsmtFinType2 31
BsmtExposure 31

BsmtFinType1 30
BsmtCond 30
BsmtQual 30
MasVnrType 8
Electrical 1
dtype: int64

Categorical variables
Let’s work on the categorical variables of our dataset.

Dealing with missing values

Filling the categorical NaN values that we know how to fill, based on the description file:

# Fills 'NA' in place of NaN
for c in ['GarageType', 'GarageFinish', 'BsmtFinType2',
          'BsmtExposure', 'BsmtFinType1']:
    train[c].fillna('NA', inplace=True)

# Fills 'None' in place of NaN
train['MasVnrType'].fillna('None', inplace=True)

With this, we have only 5 columns with missing values left in our dataset.

columns_with_miss = train.isna().sum()
columns_with_miss = columns_with_miss[columns_with_miss!=0]
print(f'Columns with missing values: {len(columns_with_miss)}')
columns_with_miss.sort_values(ascending=False)

Columns with missing values: 5

GarageCond 69
GarageQual 69
BsmtCond 30
BsmtQual 30

Electrical 1
dtype: int64

Ordinal

Also by reading the description file, we can identify other variables that have a system similar to FireplaceQu to categorize quality: Poor, Good, Excellent, etc.

We are going to replicate the treatment we gave to FireplaceQu for these variables, according to the following descriptions:

ExterQual: Evaluates the quality of the material on the exterior

Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor

ExterCond: Evaluates the present condition of the material on the exterior

Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor

BsmtQual: Evaluates the height of the basement

Ex Excellent (100+ inches)
Gd Good (90-99 inches)
TA Typical (80-89 inches)
Fa Fair (70-79 inches)
Po Poor (< 70 inches)
NA No Basement

BsmtCond: Evaluates the general condition of the basement

Ex Excellent
Gd Good
TA Typical - slight dampness allowed
Fa Fair - dampness or some cracking or settling
Po Poor - Severe cracking, settling, or wetness
NA No Basement

HeatingQC: Heating quality and condition

Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor

KitchenQual: Kitchen quality

Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor

GarageQual: Garage quality

Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
NA No Garage


GarageCond: Garage condition

Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
NA No Garage

ord_cols = ['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond',
            'HeatingQC', 'KitchenQual', 'GarageQual', 'GarageCond']
for col in ord_cols:
    train[col].fillna(0, inplace=True)
    train[col].replace({'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,
                        'Ex': 5}, inplace=True)

Let’s now plot the correlation of these variables with SalePrice.

ord_cols = ['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond',
            'HeatingQC', 'KitchenQual', 'GarageQual', 'GarageCond']
f, axes = plt.subplots(2, 4, figsize=(15, 10), sharey=True)

for r in range(0, 2):
    for c in range(0, 4):
        sns.barplot(x=ord_cols.pop(), y="SalePrice",
                    data=train, ax=axes[r][c])

plt.tight_layout()
plt.show()


As you can see, the better the category of a variable, the higher the price, which means these variables will be important for a prediction model.

Nominal
Other categorical variables don’t seem to follow any clear
ordering.

Let’s see how many values these columns can assume:

cols = train.columns
num_cols = train._get_numeric_data().columns
nom_cols = list(set(cols) - set(num_cols))
print(f'Nominal columns: {len(nom_cols)}')

value_counts = {}
for c in nom_cols:
    value_counts[c] = len(train[c].value_counts())

sorted_value_counts = {k: v for k, v in
                       sorted(value_counts.items(), key=lambda item: item[1])}

sorted_value_counts

Nominal columns: 31

{'CentralAir': 2,
'Street': 2,
'Utilities': 2,
'LandSlope': 3,
'PavedDrive': 3,
'MasVnrType': 4,
'GarageFinish': 4,
'LotShape': 4,
'LandContour': 4,
'BsmtCond': 5,
'MSZoning': 5,
'Electrical': 5,
'Heating': 5,
'BldgType': 5,
'BsmtExposure': 5,
'LotConfig': 5,
'Foundation': 6,
'RoofStyle': 6,
'SaleCondition': 6,
'BsmtFinType2': 7,
'Functional': 7,
'GarageType': 7,
'BsmtFinType1': 7,
'RoofMatl': 7,
'HouseStyle': 8,
'Condition2': 8,
'SaleType': 9,
'Condition1': 9,
'Exterior1st': 15,
'Exterior2nd': 16,
'Neighborhood': 25}
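The dictionary-sorting idiom used above can be isolated in a tiny example (hypothetical counts, not the real ones):

```python
# Hypothetical value counts for a few nominal columns
value_counts = {'Neighborhood': 25, 'Street': 2, 'MSZoning': 5}

# sorted() orders the (key, value) pairs by the count, ascending
sorted_value_counts = {k: v for k, v in
                       sorted(value_counts.items(), key=lambda item: item[1])}

print(list(sorted_value_counts))  # ['Street', 'MSZoning', 'Neighborhood']
```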

Some categorical variables can assume several different values, like Neighborhood.

To simplify, let’s analyze only variables with 6 different values or less.


nom_cols_less_than_6 = []
for c in nom_cols:
    n_values = len(train[c].value_counts())
    if n_values < 7:
        nom_cols_less_than_6.append(c)

print(f'Nominal columns with less than 6 values: '
      f'{len(nom_cols_less_than_6)}')

Nominal columns with less than 6 values: 19

Plotting against SalePrice to have a better idea of how they affect it:

ncols = 3
nrows = math.ceil(len(nom_cols_less_than_6) / ncols)
f, axes = plt.subplots(nrows, ncols, figsize=(15, 30))

for r in range(0, nrows):
    for c in range(0, ncols):
        if not nom_cols_less_than_6:
            continue
        sns.barplot(x=nom_cols_less_than_6.pop(),
                    y="SalePrice", data=train, ax=axes[r][c])

plt.tight_layout()
plt.show()


We can see a good correlation of many of these columns with the target variable.

For now, let’s keep them.

We still have NaN in ‘Electrical’.

As we could see in the plot above, ‘SBrkr’ is the most frequent value in ‘Electrical’.

Let’s use this value to replace the NaN in Electrical.

# Imputes the most frequent value in place of NaN
train['Electrical'].fillna('SBrkr', inplace=True)
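We hard-coded ‘SBrkr’ because we read it off the plot; as a sketch, the same most-frequent value could be derived with mode() instead (shown on a toy Series, not the real column):

```python
import pandas as pd

# A toy version of the Electrical column
electrical = pd.Series(['SBrkr', 'FuseA', 'SBrkr', None, 'SBrkr'])

# mode() returns the most frequent value(s), ignoring NaN; take the first
most_frequent = electrical.mode()[0]
electrical = electrical.fillna(most_frequent)

print(most_frequent)            # SBrkr
print(electrical.isna().sum())  # 0
```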

Zero values

Another quick check is to see how many columns have a lot of data equal to 0.

train.isin([0]).sum().sort_values(ascending=False).head(25)

PoolArea 1164
LowQualFinSF 1148
3SsnPorch 1148
MiscVal 1131
BsmtHalfBath 1097
ScreenPorch 1079
BsmtFinSF2 1033
EnclosedPorch 1007
HalfBath 727
BsmtFullBath 686
2ndFlrSF 655
WoodDeckSF 610
Fireplaces 551
FireplaceQu 551
OpenPorchSF 534


BsmtFinSF1 382
BsmtUnfSF 98
GarageCars 69
GarageArea 69
GarageCond 69
GarageQual 69
TotalBsmtSF 30
BsmtCond 30
BsmtQual 30
FullBath 8
dtype: int64
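The isin([0]) counting trick works like this on a small made-up DataFrame:

```python
import pandas as pd

# Made-up values: two houses without a pool, one with
df = pd.DataFrame({
    'PoolArea': [0, 0, 648],
    'GrLivArea': [1518, 1133, 1775],
})

# isin([0]) marks the entries equal to 0; sum() counts them per column
zero_counts = df.isin([0]).sum().sort_values(ascending=False)

print(zero_counts['PoolArea'])   # 2
print(zero_counts['GrLivArea'])  # 0
```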

In this case, even though there are many 0’s, they have meaning.

For instance, PoolArea (pool area in square feet) equal to 0 means that the house doesn’t have any pool area.

This is important information about the house, and thus we are going to keep these columns.

Outliers
We can also take a look at the outliers in the numeric
variables.

# Get only numerical columns
numerical_columns = list(train.dtypes[train.dtypes == 'int64'].index)

len(numerical_columns)

42

# Create the plot grid
rows = 7
columns = 6

fig, axes = plt.subplots(rows, columns, figsize=(30, 30))


x, y = 0, 0

for i, column in enumerate(numerical_columns):
    sns.boxplot(x=train[column], ax=axes[x, y])

    if y < columns-1:
        y += 1
    elif y == columns-1:
        x += 1
        y = 0
    else:
        y += 1
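The row/column bookkeeping in the loop above can also be expressed with divmod, which maps a flat index straight to grid coordinates; a small sketch on the indices only:

```python
columns = 6

# divmod(i, columns) gives the (row, col) cell of the i-th plot
positions = [divmod(i, columns) for i in range(8)]

print(positions[0])  # (0, 0)
print(positions[5])  # (0, 5)
print(positions[6])  # (1, 0)
```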


There are a lot of outliers in the dataset.

But, if we check the data description file, we see that some numerical variables are actually categorical variables that were saved (codified) as numbers.

So, some of these data points that seem to be outliers are actually categorical data with only one example of some category.

Let’s keep these outliers.


Saving cleaned data

Let’s see what the cleaned data looks like and how many columns we have left.

We have no more missing values:

columns_with_miss = train.isna().sum()
columns_with_miss = columns_with_miss[columns_with_miss!=0]
print(f'Columns with missing values: {len(columns_with_miss)}')
columns_with_miss.sort_values(ascending=False)

Columns with missing values: 0

Series([], dtype: int64)

After cleaning the data, we are left with 73 columns out of the
initial 81.

train.shape

(1168, 73)

Let’s take a look at the first 3 records of the cleaned data.

train.head(3).T

0 1 2
MSSubClass 20 60 30
MSZoning RL RL RM
LotArea 8414 12256 8960
Street Pave Pave Pave
LotShape Reg IR1 Reg
… … … …
MoSold 2 4 3
YrSold 2006 2010 2010
SaleType WD WD WD

SaleCondition Normal Normal Normal
SalePrice 154500 325000 115000

73 rows × 3 columns

We can see a summary of the data showing that, for all the
1168 records, there isn’t a single missing (null) value.

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1168 entries, 0 to 1167
Data columns (total 73 columns):
MSSubClass 1168 non-null int64
MSZoning 1168 non-null object
LotArea 1168 non-null int64
Street 1168 non-null object
LotShape 1168 non-null object
LandContour 1168 non-null object
Utilities 1168 non-null object
LotConfig 1168 non-null object
LandSlope 1168 non-null object
Neighborhood 1168 non-null object
Condition1 1168 non-null object
Condition2 1168 non-null object
BldgType 1168 non-null object
HouseStyle 1168 non-null object
OverallQual 1168 non-null int64
OverallCond 1168 non-null int64
YearBuilt 1168 non-null int64
YearRemodAdd 1168 non-null int64
RoofStyle 1168 non-null object
RoofMatl 1168 non-null object
Exterior1st 1168 non-null object
Exterior2nd 1168 non-null object
MasVnrType 1168 non-null object
ExterQual 1168 non-null int64
ExterCond 1168 non-null int64
Foundation 1168 non-null object
BsmtQual 1168 non-null int64
BsmtCond 1168 non-null object
BsmtExposure 1168 non-null object
BsmtFinType1 1168 non-null object
BsmtFinSF1 1168 non-null int64
BsmtFinType2 1168 non-null object
BsmtFinSF2 1168 non-null int64

BsmtUnfSF 1168 non-null int64
TotalBsmtSF 1168 non-null int64
Heating 1168 non-null object
HeatingQC 1168 non-null int64
CentralAir 1168 non-null object
Electrical 1168 non-null object
1stFlrSF 1168 non-null int64
2ndFlrSF 1168 non-null int64
LowQualFinSF 1168 non-null int64
GrLivArea 1168 non-null int64
BsmtFullBath 1168 non-null int64
BsmtHalfBath 1168 non-null int64
FullBath 1168 non-null int64
HalfBath 1168 non-null int64
BedroomAbvGr 1168 non-null int64
KitchenAbvGr 1168 non-null int64
KitchenQual 1168 non-null int64
TotRmsAbvGrd 1168 non-null int64
Functional 1168 non-null object
Fireplaces 1168 non-null int64
FireplaceQu 1168 non-null int64
GarageType 1168 non-null object
GarageFinish 1168 non-null object
GarageCars 1168 non-null int64
GarageArea 1168 non-null int64
GarageQual 1168 non-null int64
GarageCond 1168 non-null int64
PavedDrive 1168 non-null object
WoodDeckSF 1168 non-null int64
OpenPorchSF 1168 non-null int64
EnclosedPorch 1168 non-null int64
3SsnPorch 1168 non-null int64
ScreenPorch 1168 non-null int64
PoolArea 1168 non-null int64
MiscVal 1168 non-null int64
MoSold 1168 non-null int64
YrSold 1168 non-null int64
SaleType 1168 non-null object
SaleCondition 1168 non-null object
SalePrice 1168 non-null int64
dtypes: int64(42), object(31)
memory usage: 666.2+ KB

Finally, let’s save the cleaned data in a separate file.

train.to_csv('train-cleaned.csv')
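Note that to_csv() writes the DataFrame index as an extra unnamed column by default; passing index=False avoids it. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'SalePrice': [154500, 325000]})

# Default: the index is written as an extra unnamed first column
with_index = df.to_csv()
# index=False: only the data columns are written
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,SalePrice
print(without_index.splitlines()[0])  # SalePrice
```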


Conclusions
We dealt with missing values and removed the following
columns: ‘Id’, ‘PoolQC’, ‘MiscFeature’, ‘Alley’, ‘Fence’,
‘LotFrontage’, ‘GarageYrBlt’, ‘MasVnrArea’.

We also:

Replaced the NaN with ‘NA’ in the following columns: ‘GarageType’, ‘GarageFinish’, ‘BsmtFinType2’, ‘BsmtExposure’, ‘BsmtFinType1’.
Replaced the NaN with ‘None’ in ‘MasVnrType’.
Imputed the most frequent value in place of NaN in ‘Electrical’.

Please note that the removed columns are not necessarily useless and may still contribute to the final model.

After the first round of analysis and hypothesis testing, if you ever need to improve your future model further, you can consider reevaluating these columns and understanding them better to see how they fit into the problem.

Data Analysis and Machine Learning is NOT a straight path.

It is a process where you iterate and keep testing ideas until you have the result you want, or until you find out that the result you need is not possible.

We are going to use this data to create our Machine Learning model and predict the house prices in the next chapter.


Data Cleaning Script


This chapter converts the cleaning decisions made during the Exploratory Data Analysis into a single Python script that reads the raw data in CSV format and writes the cleaned data out as a new CSV.

Code
You can save the script in a file ‘data_cleaning.py’ and execute it directly with python3 data_cleaning.py or python data_cleaning.py, depending on your installation.

The script expects a file named ‘train.csv’ containing the raw data.

The output will be a file named ‘cleaned_data.csv’.

It will also print the shape of the original data and the shape
of the new cleaned data.

Original Data: (1168, 81)
After Cleaning: (1168, 73)

The cleaning script:

import os
import pandas as pd

# writes the output to 'cleaned_data.csv' by default
def clean_data(df, output_file='cleaned_data.csv'):
    # Removes columns with missing value issues
    cols_to_be_removed = ['Id', 'PoolQC', 'MiscFeature',
                          'Alley', 'Fence', 'LotFrontage',
                          'GarageYrBlt', 'MasVnrArea']
    df.drop(columns=cols_to_be_removed, inplace=True)

    # Transforms ordinal columns to numerical
    ordinal_cols = ['FireplaceQu', 'ExterQual', 'ExterCond',
                    'BsmtQual', 'BsmtCond', 'HeatingQC',
                    'KitchenQual', 'GarageQual', 'GarageCond']
    for col in ordinal_cols:
        df[col].fillna(0, inplace=True)
        df[col].replace({'Po': 1, 'Fa': 2, 'TA': 3,
                         'Gd': 4, 'Ex': 5}, inplace=True)

    # Replaces the NaN with the 'NA' category
    for col in ['GarageType', 'GarageFinish',
                'BsmtFinType2', 'BsmtExposure', 'BsmtFinType1']:
        df[col].fillna('NA', inplace=True)

    # Replaces the NaN with 'None'
    df['MasVnrType'].fillna('None', inplace=True)

    # Imputes the most frequent value
    df['Electrical'].fillna('SBrkr', inplace=True)

    # Saves a copy
    df.to_csv(os.path.join(output_file))

    return df

if __name__ == "__main__":
    # Reads the file train.csv
    train_file = os.path.join('train.csv')

    if os.path.exists(train_file):
        df = pd.read_csv(train_file)
        print(f'Original Data: {df.shape}')
        cleaned_df = clean_data(df)
        print(f'After Cleaning: {cleaned_df.shape}')
    else:
        print(f'File not found: {train_file}')
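To see what the ordinal mapping inside clean_data does, here is a minimal sketch on a toy column; the data values are made up for illustration:

```python
import pandas as pd

# Toy quality column with one missing value
df = pd.DataFrame({'ExterQual': ['Gd', 'TA', None, 'Ex']})

# Same two-step mapping as in clean_data:
# missing values become 0, labels become 1..5
df['ExterQual'] = df['ExterQual'].fillna(0)
df['ExterQual'] = df['ExterQual'].replace(
    {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5})

print(df['ExterQual'].tolist())  # [4, 3, 0, 5]
```

The order matters: filling with 0 first guarantees the missing quality ranks below ‘Po’.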


Machine Learning Model


Now we are going to use the ‘cleaned_data.csv’ file produced by the data cleaning script to train the Machine Learning model.

Training the Machine Learning Model


You can save the script in a file train_model.py and execute it directly with python3 train_model.py or python train_model.py, depending on your installation.

It expects you to have a file called ‘cleaned_data.csv’.

The script will output three other files:

model.pkl: the model in binary format generated by pickle that we can reuse later
train.csv: the train data after the split of the original data into train and test
test.csv: the test data after the split of the original data into train and test

The output on the terminal will be similar to this:

Train data for modeling: (934, 74)
Test data for predictions: (234, 74)
Training the model ...
Testing the model ...
Average Price Test: 175652.0128205128
RMSE: 10552.188828855931
Model saved at model.pkl

This means the model used 934 data points for training and 234 data points for testing.
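The split itself is done with pandas’ sample; a minimal sketch on a toy frame (the column name and sizes are illustrative, not from the dataset):

```python
import pandas as pd

# Toy frame with 10 rows to illustrate an 80/20 split
data = pd.DataFrame({'x': range(10)})

# sample 80% for training; the rows NOT sampled become the test set
train = data.sample(frac=0.8, random_state=30)
test = data.drop(train.index)

print(len(train), len(test))  # 8 2
```

Dropping the sampled index labels guarantees the two sets are disjoint, so no row appears in both.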


The average Sale Price in the test set is 175k dollars.

The RMSE (root-mean-square error) is a good metric to understand the output because you can read it on the same scale as your dependent variable, which is Sale Price in this case.

An RMSE of 10552 means that, on average, we missed the correct Sale Price by a bit over 10k dollars.

Considering an average of 175k, missing the mark, on average, by 10k is not too bad.
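As a quick illustration of how RMSE is computed, here is a sketch with toy numbers (not from the dataset): square each error, average the squares, then take the square root.

```python
import numpy as np

# Hypothetical actual and predicted sale prices
y_true = np.array([100_000, 150_000, 200_000])
y_pred = np.array([110_000, 140_000, 195_000])

# RMSE: square the errors, average them, take the square root
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(round(rmse, 2))  # 8660.25
```

Because the errors are squared before averaging, large misses weigh more heavily than small ones, and the final square root brings the result back to dollars.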

The Training Script

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import pickle

def create_train_test_data(dataset):
    # samples 80% of the rows for training; the rows
    # not sampled become the test set
    data_train = dataset.sample(frac=0.8, random_state=30)
    data_test = dataset.drop(data_train.index).reset_index(drop=True)
    data_train = data_train.reset_index(drop=True)

    # save the data
    data_train.to_csv('train.csv', index=False)
    data_test.to_csv('test.csv', index=False)

    print(f"Train data for modeling: {data_train.shape}")
    print(f"Test data for predictions: {data_test.shape}")

def train_model(x_train, y_train):
    print("Training the model ...")

    model = Pipeline(steps=[
        ("one hot encoding",
         OneHotEncoder(handle_unknown='ignore')),
        ("linear regression", LinearRegression())
    ])
    model.fit(x_train, y_train)

    return model

def accuracy(model, x_test, y_test):
    print("Testing the model ...")
    predictions = model.predict(x_test)
    mse = mean_squared_error(y_test, predictions)
    rmse = np.sqrt(mse)
    return rmse

def export_model(model):
    # Save the model in binary format
    pkl_path = 'model.pkl'
    with open(pkl_path, 'wb') as file:
        pickle.dump(model, file)
    print(f"Model saved at {pkl_path}")

def main():
    # Load the whole data
    data = pd.read_csv('cleaned_data.csv',
                       keep_default_na=False, index_col=0)

    # Split train/test
    # Creates train.csv and test.csv
    create_train_test_data(data)

    # Loads the data for the model training
    train = pd.read_csv('train.csv', keep_default_na=False)
    x_train = train.drop(columns=['SalePrice'])
    y_train = train['SalePrice']

    # Loads the data for the model testing
    test = pd.read_csv('test.csv', keep_default_na=False)
    x_test = test.drop(columns=['SalePrice'])
    y_test = test['SalePrice']

    # Train and Test
    model = train_model(x_train, y_train)
    rmse_test = accuracy(model, x_test, y_test)

    print(f"Average Price Test: {y_test.mean()}")
    print(f"RMSE: {rmse_test}")

    # Save the model
    export_model(model)

if __name__ == '__main__':
    main()
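The pickle save/load cycle used by export_model can be sketched on its own; a plain dict stands in for the trained pipeline here, and the file name is only for the demo:

```python
import os
import pickle
import tempfile

# A stand-in object for the trained model
model = {'coef': [1.5, -2.0], 'intercept': 10.0}

# Save and reload it, as export_model and the API will do
path = os.path.join(tempfile.gettempdir(), 'model_demo.pkl')
with open(path, 'wb') as f:
    pickle.dump(model, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

Any picklable Python object round-trips this way, which is why the API in the next chapter can reconstruct the full scikit-learn pipeline from model.pkl.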


API
The output of the last chapter is the Machine Learning Model
that we are going to use in the API.

Class HousePriceModel
Save this script in a file named predict.py.

This file defines the class HousePriceModel, which loads the Machine Learning model and makes the predictions.

# the pickle lib is used to load the machine learning model
import pickle
import pandas as pd

class HousePriceModel():

    def __init__(self):
        self.model = self.load_model()
        self.preds = None

    def load_model(self):
        # uses the file model.pkl
        pkl_filename = 'model.pkl'

        try:
            with open(pkl_filename, 'rb') as file:
                pickle_model = pickle.load(file)
        except (OSError, pickle.UnpicklingError):
            print(f'Error loading the model at {pkl_filename}')
            return None

        return pickle_model

    def predict(self, data):
        # a plain dict becomes a single-row DataFrame
        if not isinstance(data, pd.DataFrame):
            data = pd.DataFrame(data, index=[0])

        # makes the predictions using the loaded model
        self.preds = self.model.predict(data)
        return self.preds
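The index=[0] argument in predict is what turns a plain dict of scalars into a single-row DataFrame; a minimal sketch (the keys here are just a few of the real features, for illustration):

```python
import pandas as pd

# A dict of scalar values has no index of its own,
# so index=[0] creates a one-row frame
row = {'MSSubClass': 20, 'LotArea': 7922, 'Street': 'Pave'}
df = pd.DataFrame(row, index=[0])

print(df.shape)  # (1, 3)
```

Without an explicit index, pandas raises "If using all scalar values, you must pass an index", so this conversion is what lets the API accept a simple JSON object.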

The API with FastAPI

Save the script below in a file named api.py. To run the API:

uvicorn api:app

Expected output:

INFO: Started server process [56652]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

The API is created with the FastAPI framework.

The “/predict” endpoint will give you a prediction based on a sample input.

from fastapi import FastAPI
from datetime import datetime
from predict import HousePriceModel

app = FastAPI()

@app.get("/")
def root():
    return {"status": "online"}

@app.post("/predict")
def predict(inputs: dict):
    model = HousePriceModel()

    # measures how long the prediction takes
    start = datetime.today()
    pred = model.predict(inputs)[0]
    dur = (datetime.today() - start).total_seconds()

    return pred

Testing the API

You can save the script in a file test_api.py and execute it directly with python3 test_api.py or python test_api.py, depending on your installation.

Remember to execute this test in a second terminal while the first one runs the server for the actual API.

Expected output:

The actual Sale Price: 109000
The predicted Sale Price: 109000.01144237864

The code to test the API:

# import requests library to make API calls
import requests

# a sample input with all the features we
# used to train the model
sample_input = {
    'MSSubClass': 20, 'MSZoning': 'RL',
    'LotArea': 7922, 'Street': 'Pave',
    'LotShape': 'Reg', 'LandContour': 'Lvl',
    'Utilities': 'AllPub', 'LotConfig': 'Inside',
    'LandSlope': 'Gtl', 'Neighborhood': 'NAmes',
    'Condition1': 'Norm', 'Condition2': 'Norm',
    'BldgType': '1Fam', 'HouseStyle': '1Story',
    'OverallQual': 5, 'OverallCond': 7,
    'YearBuilt': 1953, 'YearRemodAdd': 2007,
    'RoofStyle': 'Gable', 'RoofMatl': 'CompShg',
    'Exterior1st': 'VinylSd', 'Exterior2nd': 'VinylSd',
    'MasVnrType': 'None', 'ExterQual': 3,
    'ExterCond': 4, 'Foundation': 'CBlock',
    'BsmtQual': 3, 'BsmtCond': 3,
    'BsmtExposure': 'No', 'BsmtFinType1': 'GLQ',
    'BsmtFinSF1': 731, 'BsmtFinType2': 'Unf',
    'BsmtFinSF2': 0, 'BsmtUnfSF': 326,
    'TotalBsmtSF': 1057, 'Heating': 'GasA',
    'HeatingQC': 3, 'CentralAir': 'Y',
    'Electrical': 'SBrkr', '1stFlrSF': 1057,
    '2ndFlrSF': 0, 'LowQualFinSF': 0,
    'GrLivArea': 1057, 'BsmtFullBath': 1,
    'BsmtHalfBath': 0, 'FullBath': 1,
    'HalfBath': 0, 'BedroomAbvGr': 3,
    'KitchenAbvGr': 1, 'KitchenQual': 4,
    'TotRmsAbvGrd': 5, 'Functional': 'Typ',
    'Fireplaces': 0, 'FireplaceQu': 0,
    'GarageType': 'Detchd', 'GarageFinish': 'Unf',
    'GarageCars': 1, 'GarageArea': 246,
    'GarageQual': 3, 'GarageCond': 3,
    'PavedDrive': 'Y', 'WoodDeckSF': 0,
    'OpenPorchSF': 52, 'EnclosedPorch': 0,
    '3SsnPorch': 0, 'ScreenPorch': 0,
    'PoolArea': 0, 'MiscVal': 0, 'MoSold': 1,
    'YrSold': 2010, 'SaleType': 'WD',
    'SaleCondition': 'Abnorml'}

def run_prediction_from_sample():
    url = "http://127.0.0.1:8000/predict"
    headers = {"Content-Type": "application/json",
               "Accept": "text/plain"}

    response = requests.post(url, headers=headers,
                             json=sample_input)
    print("The actual Sale Price: 109000")
    print(f"The predicted Sale Price: {response.text}")

if __name__ == "__main__":
    run_prediction_from_sample()


Conclusion
That’s it!

Congratulations on reaching the end.

I want to thank you for reading this book.

If you want to learn more, check out my blog renanmf.com.

Let me know if you have any suggestions by reaching out to me at [email protected].

You can also find me as @renanmouraf on:

Twitter: https://twitter.com/renanmouraf
Linkedin: https://www.linkedin.com/in/renanmouraf
Instagram: https://www.instagram.com/renanmouraf
