PRJ Housing Price
Learning
Samatrix Consulting Pvt Ltd
Linear Regression - Predict Price of the House
Objective
• We have developed an understanding of feature columns and data
pipelines.
• Now let’s focus on building a regression model using a real dataset,
the Boston Housing Price dataset.
• There are 506 sample cases in the dataset.
• Each house is described by 14 attributes: 13 input features and the
target, MEDV.
• We will use a TensorFlow Estimator to build a linear regression model.
Features
Variables in order:
CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX      nitric oxides concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centres
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per $10,000
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
LSTAT    % lower status of the population
MEDV     median value of owner-occupied homes in $1000's
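The variable list can be written down as plain Python names, which is handy later when labelling columns. This is a minimal sketch; the names `COLUMNS`, `LABEL` and `FEATURES` are illustrative and do not come from the original notebook, and treating MEDV as the prediction target is an assumption based on the project title.

```python
# Column names in the order listed in the variable table.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
           "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Assumption: MEDV (median home value) is the quantity to predict,
# and every other column is an input feature.
LABEL = "MEDV"
FEATURES = [name for name in COLUMNS if name != LABEL]

print(len(COLUMNS), "attributes,", len(FEATURES), "features")
```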
Import Module and Download Dataset
In [1]: import tensorflow as tf
In [10]: df_train_x.head()
Out[10]:
CRIM ZN INDUS CHAS NOX ... RAD TAX PTRATIO B LSTAT
0 1.23247 0.0 8.14 0.0 0.538 ... 4.0 307.0 21.0 396.90 18.72
1 0.02177 82.5 2.03 0.0 0.415 ... 2.0 348.0 14.7 395.38 3.11
2 4.89822 0.0 18.10 0.0 0.631 ... 24.0 666.0 20.2 375.52 3.26
3 0.03961 0.0 5.19 0.0 0.515 ... 5.0 224.0 20.2 396.90 8.01
4 3.69311 0.0 18.10 0.0 0.713 ... 24.0 666.0 20.2 391.43 14.65
[5 rows x 13 columns]
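The cells that create `df_train_x` are not shown above. One possible way to get a labelled DataFrame, sketched here under assumptions, is to wrap raw NumPy arrays in pandas objects. The helper `to_frames` is hypothetical and not part of the original notebook, and the random arrays are placeholders only. The real arrays could come, for example, from `tf.keras.datasets.boston_housing.load_data()`, which returns `(x_train, y_train), (x_test, y_test)`; whether the original notebook used that call is not confirmed by the slides.

```python
import numpy as np
import pandas as pd

FEATURES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
            "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def to_frames(x, y, feature_names=FEATURES, label_name="MEDV"):
    """Wrap a 2-D feature array and a 1-D label array in pandas objects.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape[1] != len(feature_names):
        raise ValueError("x must have one column per feature name")
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    features = pd.DataFrame(x, columns=list(feature_names))
    labels = pd.Series(y, name=label_name)
    return features, labels


# Placeholder data (random numbers, NOT the housing data), used only to
# show the shapes involved.
rng = np.random.default_rng(0)
demo_x = rng.random((5, len(FEATURES)))
demo_y = rng.random(5)

df_demo_x, s_demo_y = to_frames(demo_x, demo_y)
print(df_demo_x.shape)
```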
Feature Normalization
Normalize the data using $\frac{x - \mu}{\sigma}$
First calculate the mean and standard deviation (std) of each feature from the training
data. Then normalize both the training and test data with those values.
Note that the test data must always be normalized with the training-data mean and std,
so that no information from the test set leaks into preprocessing.
In [11]: mean = df_train_x.mean(axis=0)
In [12]: std = df_train_x.std(axis=0)
In [13]: df_train_x -= mean
In [14]: df_train_x /= std
In [15]: df_test_x -= mean
In [16]: df_test_x /= std
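A quick sanity check of the steps above, sketched on a tiny made-up table rather than the housing data: after normalization the training columns should have mean 0 and std 1, while the test columns generally will not, because they are scaled with the training statistics. The frames `train` and `test` and their values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df_train_x / df_test_x (made-up numbers, two features).
train = pd.DataFrame({"RM": [5.0, 6.0, 7.0, 8.0],
                      "TAX": [200.0, 300.0, 400.0, 500.0]})
test = pd.DataFrame({"RM": [6.5, 9.0],
                     "TAX": [350.0, 600.0]})

# Statistics come from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # pandas computes the sample std (ddof=1) by default

# Apply the same training statistics to both splits.
train_norm = (train - mean) / std
test_norm = (test - mean) / std

print("train mean:", train_norm.mean(axis=0).round(6).tolist())
print("train std: ", train_norm.std(axis=0).round(6).tolist())
print("test mean: ", test_norm.mean(axis=0).round(6).tolist())
```

This version builds new frames instead of modifying in place with `-=` and `/=`, so the raw values stay available for inspection; both approaches give the same normalized numbers.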
Feature Column and Data Pipeline
In [17]: feature_columns_numeric = []
Out[23]: <tensorflow_estimator.python.estimator.canned.linear.LinearRegressorV2 at 0x7f811457a280>
In [25]: print(result)
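The cells that build, train and evaluate the estimator are not shown in full. As a swapped-in illustration of what a linear regression model fits (this is plain NumPy least squares, not the TensorFlow Estimator API), the sketch below recovers weights and an intercept from synthetic data and reports the mean squared error. All names and numbers here are assumptions made for the example.

```python
import numpy as np

# Synthetic data (NOT the housing data): y = x . w + b + small noise.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
y = x @ true_w + true_b + rng.normal(scale=0.01, size=200)

# Append a column of ones so the intercept is learned as the last coefficient.
a = np.hstack([x, np.ones((x.shape[0], 1))])

# Ordinary least squares: minimizes the squared error ||a @ coef - y||^2.
coef, *_ = np.linalg.lstsq(a, y, rcond=None)
w, b = coef[:-1], coef[-1]

# Mean squared error of the fitted model on the same data.
pred = a @ coef
mse = float(np.mean((pred - y) ** 2))

print("weights:", w.round(3), "intercept:", round(float(b), 3), "mse:", mse)
```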