0% found this document useful (0 votes)

35 views

Machine Learning Prediction

This document discusses predicting housing prices using a regression model with a neural network. It uses a dataset containing information about Boston suburbs in the 1970s to predict median home prices. The dataset has 506 samples with 13 input features describing each suburb and a target value for the median home price. Before using the data to train a neural network, it is normalized to center and scale the input features since they have different value ranges.

Uploaded by

Akor Anthony

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

35 views

Machine Learning Prediction

Uploaded by

Akor Anthony

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

3/3/2021 3.

7-predicting-house-prices (2)

In [2]: import keras

keras.__version__

Using TensorFlow backend.

Out[2]: '2.3.1'

Predicting house prices: a regression example

This notebook contains the code samples found in Chapter 3, Section 6 of Deep Learning with Python
(https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the
original text features far more content, in particular further explanations and figures: in this notebook, you will
only find source code and related comments.

In our two previous examples, we were considering classification problems, where the goal was to predict a
single discrete label of an input data point. Another common type of machine learning problem is "regression",
which consists of predicting a continuous value instead of a discrete label. For instance, predicting the
temperature tomorrow, given meteorological data, or predicting the time that a software project will take to
complete, given its specifications.

Do not mix up "regression" with the algorithm "logistic regression": confusingly, "logistic regression" is not a
regression algorithm, it is a classification algorithm.

The Boston Housing Price dataset

We will be attempting to predict the median price of homes in a given Boston suburb in the mid-1970s, given a
few data points about the suburb at the time, such as the crime rate, the local property tax rate, etc.

The dataset we will be using has another interesting difference from our two previous examples: it has very few
data points, only 506 in total, split between 404 training samples and 102 test samples, and each "feature" in the
input data (e.g. the crime rate is a feature) has a different scale. For instance some values are proportions, which
take a values between 0 and 1, others take values between 1 and 12, others between 0 and 100...

Let's take a look at the data:

In [3]: from keras.datasets import boston_housing

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_

data()

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 1/9

3/3/2021 3.7-predicting-house-prices (2)

In [4]: train_data.shape
print(type(train_data))
train_data.shape

Out[4]: (404, 13)

In [5]: test_data.shape

Out[5]: (102, 13)

As you can see, we have 404 training samples and 102 test samples. The data comprises 13 features. The 13
features in the input data are as follow:

1. Per capita crime rate.

2. Proportion of residential land zoned for lots over 25,000 square feet.
3. Proportion of non-retail business acres per town.
4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. Nitric oxides concentration (parts per 10 million).
6. Average number of rooms per dwelling.
7. Proportion of owner-occupied units built prior to 1940.
8. Weighted distances to five Boston employment centres.
9. Index of accessibility to radial highways.
10. Full-value property-tax rate per $10,000.
11. Pupil-teacher ratio by town.
12. 1000 * (Bk - 0.63) ** 2 where Bk is the proportion of Black people by town.
13. % lower status of the population.

The targets are the median values of owner-occupied homes, in thousands of dollars:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 2/9

3/3/2021 3.7-predicting-house-prices (2)

In [6]: train_targets

Out[6]: array([15.2, 42.3, 50. , 21.1, 17.7, 18.5, 11.3, 15.6, 15.6, 14.4, 12.1,
17.9, 23.1, 19.9, 15.7, 8.8, 50. , 22.5, 24.1, 27.5, 10.9, 30.8,
32.9, 24. , 18.5, 13.3, 22.9, 34.7, 16.6, 17.5, 22.3, 16.1, 14.9,
23.1, 34.9, 25. , 13.9, 13.1, 20.4, 20. , 15.2, 24.7, 22.2, 16.7,
12.7, 15.6, 18.4, 21. , 30.1, 15.1, 18.7, 9.6, 31.5, 24.8, 19.1,
22. , 14.5, 11. , 32. , 29.4, 20.3, 24.4, 14.6, 19.5, 14.1, 14.3,
15.6, 10.5, 6.3, 19.3, 19.3, 13.4, 36.4, 17.8, 13.5, 16.5, 8.3,
14.3, 16. , 13.4, 28.6, 43.5, 20.2, 22. , 23. , 20.7, 12.5, 48.5,
14.6, 13.4, 23.7, 50. , 21.7, 39.8, 38.7, 22.2, 34.9, 22.5, 31.1,
28.7, 46. , 41.7, 21. , 26.6, 15. , 24.4, 13.3, 21.2, 11.7, 21.7,
19.4, 50. , 22.8, 19.7, 24.7, 36.2, 14.2, 18.9, 18.3, 20.6, 24.6,
18.2, 8.7, 44. , 10.4, 13.2, 21.2, 37. , 30.7, 22.9, 20. , 19.3,
31.7, 32. , 23.1, 18.8, 10.9, 50. , 19.6, 5. , 14.4, 19.8, 13.8,
19.6, 23.9, 24.5, 25. , 19.9, 17.2, 24.6, 13.5, 26.6, 21.4, 11.9,
22.6, 19.6, 8.5, 23.7, 23.1, 22.4, 20.5, 23.6, 18.4, 35.2, 23.1,
27.9, 20.6, 23.7, 28. , 13.6, 27.1, 23.6, 20.6, 18.2, 21.7, 17.1,
8.4, 25.3, 13.8, 22.2, 18.4, 20.7, 31.6, 30.5, 20.3, 8.8, 19.2,
19.4, 23.1, 23. , 14.8, 48.8, 22.6, 33.4, 21.1, 13.6, 32.2, 13.1,
23.4, 18.9, 23.9, 11.8, 23.3, 22.8, 19.6, 16.7, 13.4, 22.2, 20.4,
21.8, 26.4, 14.9, 24.1, 23.8, 12.3, 29.1, 21. , 19.5, 23.3, 23.8,
17.8, 11.5, 21.7, 19.9, 25. , 33.4, 28.5, 21.4, 24.3, 27.5, 33.1,
16.2, 23.3, 48.3, 22.9, 22.8, 13.1, 12.7, 22.6, 15. , 15.3, 10.5,
24. , 18.5, 21.7, 19.5, 33.2, 23.2, 5. , 19.1, 12.7, 22.3, 10.2,
13.9, 16.3, 17. , 20.1, 29.9, 17.2, 37.3, 45.4, 17.8, 23.2, 29. ,
22. , 18. , 17.4, 34.6, 20.1, 25. , 15.6, 24.8, 28.2, 21.2, 21.4,
23.8, 31. , 26.2, 17.4, 37.9, 17.5, 20. , 8.3, 23.9, 8.4, 13.8,
7.2, 11.7, 17.1, 21.6, 50. , 16.1, 20.4, 20.6, 21.4, 20.6, 36.5,
8.5, 24.8, 10.8, 21.9, 17.3, 18.9, 36.2, 14.9, 18.2, 33.3, 21.8,
19.7, 31.6, 24.8, 19.4, 22.8, 7.5, 44.8, 16.8, 18.7, 50. , 50. ,
19.5, 20.1, 50. , 17.2, 20.8, 19.3, 41.3, 20.4, 20.5, 13.8, 16.5,
23.9, 20.6, 31.5, 23.3, 16.8, 14. , 33.8, 36.1, 12.8, 18.3, 18.7,
19.1, 29. , 30.1, 50. , 50. , 22. , 11.9, 37.6, 50. , 22.7, 20.8,
23.5, 27.9, 50. , 19.3, 23.9, 22.6, 15.2, 21.7, 19.2, 43.8, 20.3,
33.2, 19.9, 22.5, 32.7, 22. , 17.1, 19. , 15. , 16.1, 25.1, 23.7,
28.7, 37.2, 22.6, 16.4, 25. , 29.8, 22.1, 17.4, 18.1, 30.3, 17.5,
24.7, 12.6, 26.5, 28.7, 13.3, 10.4, 24.4, 23. , 20. , 17.8, 7. ,
11.8, 24.4, 13.8, 19.4, 25.2, 19.4, 19.4, 29.1])

The prices are typically between 10,000 and 50,000. If that sounds cheap, remember this was the mid-1970s,
and these prices are not inflation-adjusted.

Preparing the data

It would be problematic to feed into a neural network values that all take wildly different ranges. The network
might be able to automatically adapt to such heterogeneous data, but it would definitely make learning more
difficult. A widespread best practice to deal with such data is to do feature-wise normalization: for each feature in
the input data (a column in the input data matrix), we will subtract the mean of the feature and divide by the
standard deviation, so that the feature will be centered around 0 and will have a unit standard deviation. This is
easily done in Numpy:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 3/9

3/3/2021 3.7-predicting-house-prices (2)

In [8]: mean = train_data.mean(axis=0)

train_data -= mean
std = train_data.std(axis=0)
train_data /= std

test_data -= mean
test_data /= std

Note that the quantities that we use for normalizing the test data have been computed using the training data.
We should never use in our workflow any quantity computed on the test data, even for something as simple as
data normalization.

Building our network

Because so few samples are available, we will be using a very small network with two hidden layers, each with
64 units. In general, the less training data you have, the worse overfitting will be, and using a small network is
one way to mitigate overfitting.

In [9]: from keras import models

from keras import layers

def build_model():
# Because we will need to instantiate
# the same model multiple times,
# we use a function to construct it.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu',input_shape=(train_data.shape
[1],)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
return model

Our network ends with a single unit, and no activation (i.e. it will be linear layer). This is a typical setup for scalar
regression (i.e. regression where we are trying to predict a single continuous value). Applying an activation
function would constrain the range that the output can take; for instance if we applied a sigmoid activation
function to our last layer, the network could only learn to predict values between 0 and 1. Here, because the last
layer is purely linear, the network is free to learn to predict values in any range.

Note that we are compiling the network with the mse loss function -- Mean Squared Error, the square of the
difference between the predictions and the targets, a widely used loss function for regression problems.

We are also monitoring a new metric during training: mae . This stands for Mean Absolute Error. It is simply the
absolute value of the difference between the predictions and the targets. For instance, a MAE of 0.5 on this
problem would mean that our predictions are off by $500 on average.

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 4/9

3/3/2021 3.7-predicting-house-prices (2)

Validating our approach using K-fold validation

To evaluate our network while we keep adjusting its parameters (such as the number of epochs used for
training), we could simply split the data into a training set and a validation set, as we were doing in our previous
examples. However, because we have so few data points, the validation set would end up being very small (e.g.
about 100 examples). A consequence is that our validation scores may change a lot depending on which data
points we choose to use for validation and which we choose for training, i.e. the validation scores may have a
high variance with regard to the validation split. This would prevent us from reliably evaluating our model.

The best practice in such situations is to use K-fold cross-validation. It consists of splitting the available data into
K partitions (typically K=4 or 5), then instantiating K identical models, and training each one on K-1 partitions
while evaluating on the remaining partition. The validation score for the model used would then be the average of
the K validation scores obtained

In terms of code, this is straightforward:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 5/9

3/3/2021 3.7-predicting-house-prices (2)

In [ ]: import numpy as np

k = 4
num_val_samples = len(train_data) // k
num_epochs = 100
all_scores = []
for i in range(k):
print('processing fold #', i)
# Prepare the validation data: data from partition # k
val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples
]

# Prepare the training data: data from all other partitions

partial_train_data = np.concatenate(
[train_data[:i * num_val_samples],
train_data[(i + 1) * num_val_samples:]],
axis=0)
partial_train_targets = np.concatenate(
[train_targets[:i * num_val_samples],
train_targets[(i + 1) * num_val_samples:]],
axis=0)

# Build the Keras model (already compiled)

model = build_model()
# Train the model (in silent mode, verbose=0)
model.fit(partial_train_data, partial_train_targets,
epochs=num_epochs, batch_size=1, verbose=0)
# Evaluate the model on the validation data
val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
all_scores.append(val_mae)

processing fold # 0
WARNING:tensorflow:From /data/user/0/ru.iiec.pydroid3/files/arm-linux-android
eabi/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.
py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated an
d will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /data/user/0/ru.iiec.pydroid3/files/arm-linux-android
eabi/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_i
nt32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed
in a future version.
Instructions for updating:
Use tf.cast instead.
processing fold # 1
processing fold # 2

In [ ]: all_scores

In [ ]: np.mean(all_scores)

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 6/9

3/3/2021 3.7-predicting-house-prices (2)

As you can notice, the different runs do indeed show rather different validation scores, from 2.1 to 2.9. Their
average (2.4) is a much more reliable metric than any single of these scores -- that's the entire point of K-fold
cross-validation. In this case, we are off by 2,400 on average, which is still significant considering that the prices
range from 10,000 to 50,000.

Let's try training the network for a bit longer: 500 epochs. To keep a record of how well the model did at each
epoch, we will modify our training loop to save the per-epoch validation score log:

In [ ]: from keras import backend as K

# Some memory clean-up

K.clear_session()

In [ ]: num_epochs = 500
all_mae_histories = []
for i in range(k):
print('processing fold #', i)
# Prepare the validation data: data from partition # k
val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples
]

# Prepare the training data: data from all other partitions

# Build the Keras model (already compiled)

model = build_model()
# Train the model (in silent mode, verbose=0)
history = model.fit(partial_train_data, partial_train_targets,
validation_data=(val_data, val_targets),
epochs=num_epochs, batch_size=1, verbose=0)
mae_history = history.history['val_mean_absolute_error']
all_mae_histories.append(mae_history)

We can then compute the average of the per-epoch MAE scores for all folds:

In [ ]: average_mae_history = [
np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]

Let's plot this:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 7/9

3/3/2021 3.7-predicting-house-prices (2)

In [1]: import matplotlib

matplotlib.use('nbAgg')
import matplotlib.pyplot as plt

plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)

plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-f668bb6a0ce3> in <module>
1 import matplotlib.pyplot as plt
2
----> 3 plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)
4 plt.xlabel('Epochs')
5 plt.ylabel('Validation MAE')

NameError: name 'average_mae_history' is not defined

It may be a bit hard to see the plot due to scaling issues and relatively high variance. Let's:

Omit the first 10 data points, which are on a different scale from the rest of the curve.
Replace each point with an exponential moving average of the previous points, to obtain a smooth curve.

In [ ]: def smooth_curve(points, factor=0.9):

smoothed_points = []
for point in points:
if smoothed_points:
previous = smoothed_points[-1]
smoothed_points.append(previous * factor + point * (1 - factor))
else:
smoothed_points.append(point)
return smoothed_points

smooth_mae_history = smooth_curve(average_mae_history[10:])

plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)

plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()

According to this plot, it seems that validation MAE stops improving significantly after 80 epochs. Past that point,
we start overfitting.

Once we are done tuning other parameters of our model (besides the number of epochs, we could also adjust
the size of the hidden layers), we can train a final "production" model on all of the training data, with the best
parameters, then look at its performance on the test data:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 8/9

3/3/2021 3.7-predicting-house-prices (2)

In [ ]: # Get a fresh, compiled model.

model = build_model()
# Train it on the entirety of the data.
model.fit(train_data, train_targets,
epochs=80, batch_size=16, verbose=0)
test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)

In [ ]: test_mae_score

We are still off by about $2,550.

Wrapping up
Here's what you should take away from this example:

Regression is done using different loss functions from classification; Mean Squared Error (MSE) is a
commonly used loss function for regression.
Similarly, evaluation metrics to be used for regression differ from those used for classification; naturally the
concept of "accuracy" does not apply for regression. A common regression metric is Mean Absolute Error
(MAE).
When features in the input data have values in different ranges, each feature should be scaled
independently as a preprocessing step.
When there is little data available, using K-Fold validation is a great way to reliably evaluate a model.
When little training data is available, it is preferable to use a small network with very few hidden layers
(typically only one or two), in order to avoid severe overfitting.

This example concludes our series of three introductory practical examples. You are now able to handle common
types of problems with vector data input:

Binary (2-class) classification.

Multi-class, single-label classification.
Scalar regression.

In the next chapter, you will acquire a more formal understanding of some of the concepts you have encountered
in these first examples, such as data preprocessing, model evaluation, and overfitting.

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 9/9

House Price Prediction: Project Description
No ratings yet
House Price Prediction: Project Description
11 pages
Data Fitting and Uncertainty (A Practical Introduction To Weighted Least Squares and Beyond)
No ratings yet
Data Fitting and Uncertainty (A Practical Introduction To Weighted Least Squares and Beyond)
6 pages
Sklearn Tutorial: DNN On Boston Data
No ratings yet
Sklearn Tutorial: DNN On Boston Data
9 pages
Copy of Project 4 _ House Price Prediction.ipynb - Colab
No ratings yet
Copy of Project 4 _ House Price Prediction.ipynb - Colab
5 pages
Linear Reg
No ratings yet
Linear Reg
25 pages
Wine Classification
No ratings yet
Wine Classification
10 pages
Python - Vectorized - Tute - Jupyter Notebook
No ratings yet
Python - Vectorized - Tute - Jupyter Notebook
16 pages
Lec3 4 ML Project
No ratings yet
Lec3 4 ML Project
26 pages
Machine Learning (ML)
No ratings yet
Machine Learning (ML)
35 pages
03 Multiple Linear Regression
No ratings yet
03 Multiple Linear Regression
7 pages
Pattern - Recognition - 3 - Code With Output
No ratings yet
Pattern - Recognition - 3 - Code With Output
7 pages
T2_summary_VHA
No ratings yet
T2_summary_VHA
14 pages
PRJ Housuing Price
No ratings yet
PRJ Housuing Price
14 pages
Lab 1. Boston House
No ratings yet
Lab 1. Boston House
7 pages
ML Book Notes
No ratings yet
ML Book Notes
9 pages
MLLabManual
No ratings yet
MLLabManual
24 pages
Pandas
No ratings yet
Pandas
4 pages
Vertopal.com C1 W2 Lab03 Feature Scaling and Learning Rate Soln
No ratings yet
Vertopal.com C1 W2 Lab03 Feature Scaling and Learning Rate Soln
10 pages
Document From Jahnavi
No ratings yet
Document From Jahnavi
20 pages
lab ML
No ratings yet
lab ML
26 pages
Pytorch (Tabular) - Regression
No ratings yet
Pytorch (Tabular) - Regression
13 pages
Data Mining Portfolio
No ratings yet
Data Mining Portfolio
19 pages
Building A Brain in 10 Minutes: Perceptron Research From The 50's & 6 Perceptron Research From The 50's & 6
No ratings yet
Building A Brain in 10 Minutes: Perceptron Research From The 50's & 6 Perceptron Research From The 50's & 6
14 pages
Module 2
No ratings yet
Module 2
20 pages
Assignment 1
100% (1)
Assignment 1
3 pages
Stanford ML
No ratings yet
Stanford ML
168 pages
SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
100% (1)
SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
28 pages
Assignment No 8
No ratings yet
Assignment No 8
17 pages
Module 2notes
No ratings yet
Module 2notes
44 pages
Machine Learning Laboratory Manual
No ratings yet
Machine Learning Laboratory Manual
11 pages
Import As Import As From Import: "Mean Squared Errors: "
No ratings yet
Import As Import As From Import: "Mean Squared Errors: "
1 page
Dal Programs With Output
No ratings yet
Dal Programs With Output
11 pages
HW8 La
No ratings yet
HW8 La
18 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
47 pages
ML File
No ratings yet
ML File
37 pages
Docu 4
No ratings yet
Docu 4
3 pages
ml record
No ratings yet
ml record
21 pages
Introduction To Machine Learning (ML) With Sklearn
No ratings yet
Introduction To Machine Learning (ML) With Sklearn
10 pages
Experiment Number: 3: Aim:-Study of The Linear Regression in The Machine Learning Using The Boston Housing Dataset. 1)
No ratings yet
Experiment Number: 3: Aim:-Study of The Linear Regression in The Machine Learning Using The Boston Housing Dataset. 1)
14 pages
4 - Học Máy Cơ Bản - Hồi Quy Tuyến Tính
No ratings yet
4 - Học Máy Cơ Bản - Hồi Quy Tuyến Tính
113 pages
One Hot Encoding
No ratings yet
One Hot Encoding
12 pages
ML Project
No ratings yet
ML Project
10 pages
Advance AI and ML LAB
No ratings yet
Advance AI and ML LAB
16 pages
C1 W1 Lab03 Model Representation Soln-Copy1
No ratings yet
C1 W1 Lab03 Model Representation Soln-Copy1
7 pages
G 203008076 - 4 - Christhian Quiñonez - Ex1 - 2 A PDF
No ratings yet
G 203008076 - 4 - Christhian Quiñonez - Ex1 - 2 A PDF
20 pages
vertopal.com_C1_W2_Lab04_FeatEng_PolyReg_Soln
No ratings yet
vertopal.com_C1_W2_Lab04_FeatEng_PolyReg_Soln
5 pages
Bilal Ahmad Ai & DSS Assign # 03
No ratings yet
Bilal Ahmad Ai & DSS Assign # 03
7 pages
ML Record Print
No ratings yet
ML Record Print
20 pages
Ml-Exp-3 - Jupyter Notebook
No ratings yet
Ml-Exp-3 - Jupyter Notebook
6 pages
Presentation 1
No ratings yet
Presentation 1
2 pages
Report
No ratings yet
Report
40 pages
CS229 Andrew NG Lecture Notes
No ratings yet
CS229 Andrew NG Lecture Notes
216 pages
C1 W1 Lab02 Model Representation Soln
No ratings yet
C1 W1 Lab02 Model Representation Soln
5 pages
Argha's ML LAB_240927_121838
No ratings yet
Argha's ML LAB_240927_121838
13 pages
Injecttive Blockchain
No ratings yet
Injecttive Blockchain
14 pages
DL Lab 3
No ratings yet
DL Lab 3
5 pages
Six Lectures On NN - Montanari
No ratings yet
Six Lectures On NN - Montanari
77 pages
Notes5_Regression
No ratings yet
Notes5_Regression
14 pages
Recipes For Data Processing
No ratings yet
Recipes For Data Processing
51 pages

Machine Learning Prediction

Uploaded by

Machine Learning Prediction

Uploaded by

3/3/2021 3.

In [2]: import keras

Using TensorFlow backend.

Predicting house prices: a regression example

The Boston Housing Price dataset

Let's take a look at the data:

In [3]: from keras.datasets import boston_housing

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 1/9

Out[4]: (404, 13)

Out[5]: (102, 13)

1. Per capita crime rate.

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 2/9

Preparing the data

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 3/9

In [8]: mean = train_data.mean(axis=0)

Building our network

In [9]: from keras import models

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 4/9

Validating our approach using K-fold validation

In terms of code, this is straightforward:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 5/9

# Prepare the training data: data from all other partitions

# Build the Keras model (already compiled)

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 6/9

In [ ]: from keras import backend as K

# Some memory clean-up

# Prepare the training data: data from all other partitions

# Build the Keras model (already compiled)

Let's plot this:

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 7/9

In [1]: import matplotlib

plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)

NameError: name 'average_mae_history' is not defined

In [ ]: def smooth_curve(points, factor=0.9):

plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 8/9

In [ ]: # Get a fresh, compiled model.

We are still off by about $2,550.

Binary (2-class) classification.

file:///C:/Users/teedaniels/Downloads/3.7-predicting-house-prices (2).html 9/9

You might also like