
PRJ Housing Price

The document discusses building a linear regression model to predict housing prices using the Boston housing dataset. It covers importing and exploring the data, normalizing features, creating feature columns, building and evaluating a linear regression model, and making predictions on test data. It also assigns building additional models using different input variable combinations as an exercise.

Uploaded by

shubhammeena5532
Copyright
© All Rights Reserved
Neural Network and Deep Learning
Samatrix Consulting Pvt Ltd
Linear Regression - Predict Price of the House
Objective
• We have developed an understanding of feature columns and data
pipelines.
• Now let’s focus on building a regression model using a real dataset,
the Boston Housing Price dataset.
• There are 506 sample cases in the dataset.
• Each house is described by 14 attributes: 13 input features and the target, MEDV.
• We will use a TensorFlow estimator to build a linear regression model.
Features
Variables in order:
CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX      nitric oxides concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centres
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per $10,000
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT    % lower status of the population
MEDV     median value of owner-occupied homes in $1000's
Import Module and Download Dataset
In [1]: import tensorflow as tf

In [2]: import pandas as pd

In [3]: from tensorflow.keras.datasets import boston_housing

In [4]: (train_x, train_y), (test_x, test_y) = boston_housing.load_data()

In [5]: features = ['CRIM', 'ZN',
   ...:             'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
   ...:             'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
Convert to Pandas DataFrame
In [6]: df_train_x = pd.DataFrame(train_x, columns= features)

In [7]: df_test_x = pd.DataFrame(test_x, columns= features)

In [8]: df_train_y = pd.DataFrame(train_y, columns=['MEDV'])

In [9]: df_test_y = pd.DataFrame(test_y, columns=['MEDV'])

In [10]: df_train_x.head()
Out[10]:
CRIM ZN INDUS CHAS NOX ... RAD TAX PTRATIO B LSTAT
0 1.23247 0.0 8.14 0.0 0.538 ... 4.0 307.0 21.0 396.90 18.72
1 0.02177 82.5 2.03 0.0 0.415 ... 2.0 348.0 14.7 395.38 3.11
2 4.89822 0.0 18.10 0.0 0.631 ... 24.0 666.0 20.2 375.52 3.26
3 0.03961 0.0 5.19 0.0 0.515 ... 5.0 224.0 20.2 396.90 8.01
4 3.69311 0.0 18.10 0.0 0.713 ... 24.0 666.0 20.2 391.43 14.65

[5 rows x 13 columns]
Feature Normalization
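Normalize the data using
$$x' = \frac{x - \mu}{\sigma}$$

First calculate the mean ($\mu$) and standard deviation ($\sigma$) of each feature from the training data. Then normalize both the training and the test data with these values.
Note that the test data must always be normalized using the training-data mean and standard deviation, so that no information from the test set leaks into the model.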
In [11]: mean = df_train_x.mean(axis=0)
In [12]: std = df_train_x.std(axis=0)
In [13]: df_train_x -= mean
In [14]: df_train_x /= std
In [15]: df_test_x -= mean
In [16]: df_test_x /= std
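As a minimal sketch of the normalization rule, the standard-library snippet below scales one toy feature column. The values and the names `train_col`, `test_col`, and `normalize` are made up for illustration and are not part of the project. `statistics.stdev` is used because it computes the sample standard deviation, which matches the default of pandas `std()`.

```python
import statistics

# Hypothetical toy values for a single feature (not taken from the dataset).
train_col = [2.0, 4.0, 6.0, 8.0]
test_col = [5.0, 10.0]

# The statistics come from the training data only.
mu = statistics.mean(train_col)
sigma = statistics.stdev(train_col)  # sample std (n - 1), like pandas' default


def normalize(values, mu, sigma):
    """Apply (x - mu) / sigma to every value."""
    return [(x - mu) / sigma for x in values]


train_norm = normalize(train_col, mu, sigma)
test_norm = normalize(test_col, mu, sigma)  # reuses the training mu and sigma

print(train_norm)
print(test_norm)
```

The normalized training column has mean 0 and sample standard deviation 1. The test column generally does not, because it is scaled with the training statistics rather than its own.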
Feature Column and Data Pipeline
In [17]: feature_columns_numeric = []

In [18]: feature_columns_numeric = [tf.feature_column.numeric_column(fname, dtype=tf.float32)
    ...:                            for fname in features]

In [19]: def estimator_input_fn(df_data, df_label, epochs=10, shuffle=True, batch_size=32):
    ...:     def input_funct():
    ...:         ds = tf.data.Dataset.from_tensor_slices((dict(df_data), df_label))
    ...:         if shuffle:
    ...:             ds = ds.shuffle(100)
    ...:         ds = ds.batch(batch_size).repeat(epochs)
    ...:         return ds
    ...:     return input_funct

In [20]: train_input_fn = estimator_input_fn(df_train_x, df_train_y)

In [21]: val_input_fn = estimator_input_fn(df_test_x, df_test_y, epochs=1, shuffle=False)
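To see how much data these pipelines can supply, the rough sketch below counts the batches produced by `batch(batch_size).repeat(epochs)`. The helper name `batches_available` is hypothetical, and the sample counts (404 training, 102 test) are an assumption based on the default Keras split of the 506 samples.

```python
import math


def batches_available(n_samples, batch_size=32, epochs=10):
    """Number of batches produced by ds.batch(batch_size).repeat(epochs)."""
    # The last batch of each epoch may be smaller than batch_size,
    # so the per-epoch count is rounded up.
    per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch * epochs


n_train = 404  # assumption: default Keras train split of the 506 samples
n_test = 102   # assumption: the remaining samples

train_batches = batches_available(n_train)         # defaults: 10 epochs
val_batches = batches_available(n_test, epochs=1)  # a single pass

print(train_batches, val_batches)
```

Under these assumptions the training pipeline can supply 130 batches, so a training call limited to 100 steps stops before the data runs out.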


Linear Regressor Estimator
In [22]: model = tf.estimator.LinearRegressor(feature_columns=feature_columns_numeric,
    ...:                                      optimizer="RMSProp")

In [23]: model.train(train_input_fn, steps=100)

Out[23]: <tensorflow_estimator.python.estimator.canned.linear.LinearRegressorV2 at 0x7f811457a280>

In [24]: result = model.evaluate(val_input_fn)

In [25]: print(result)

{'average_loss': 30.506386, 'label/mean': 23.078432, 'loss': 35.6515,
 'prediction/mean': 19.73616, 'global_step': 100}
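One way to read this result is to convert the loss into the units of the target. The sketch below assumes that `average_loss` is the mean squared error per example, which is the usual default for a regression estimator but is not stated in the slides.

```python
import math

# Metric value copied from the evaluation output above.
average_loss = 30.506386

# Assumption: average_loss is the mean squared error per example,
# so its square root is the root mean squared error (RMSE).
rmse = math.sqrt(average_loss)

# MEDV is expressed in $1000's, so scale the error to dollars.
rmse_dollars = rmse * 1000

print(f"RMSE: {rmse:.2f} (about ${rmse_dollars:,.0f})")
```

If the assumption holds, the typical prediction error is roughly 5.5 in MEDV units, against a mean label of about 23.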
Make Predictions
In [26]: result = model.predict(val_input_fn)
In [27]: for pred, exp in zip(result, test_y[:32]):
    ...:     print("Predicted Value: ", pred['predictions'][0], "Expected: ", exp)
Predicted Value: 4.5952473 Expected: 7.2
Predicted Value: 18.098837 Expected: 18.8
Predicted Value: 18.036978 Expected: 19.0
Predicted Value: 28.551752 Expected: 27.0
Predicted Value: 22.708357 Expected: 22.2
Predicted Value: 17.807247 Expected: 24.5
Predicted Value: 26.886124 Expected: 31.2
Predicted Value: 22.386887 Expected: 22.9
Predicted Value: 14.808427 Expected: 20.5
Predicted Value: 17.94912 Expected: 23.2
Predicted Value: 16.694212 Expected: 18.6
Predicted Value: 15.2323 Expected: 14.5
Predicted Value: 12.917276 Expected: 17.8
Predicted Value: 31.8312 Expected: 50.0
Predicted Value: 13.911081 Expected: 20.8
Predicted Value: 17.826385 Expected: 24.3
Predicted Value: 21.831257 Expected: 24.2
Predicted Value: 20.43619 Expected: 19.8
Predicted Value: 15.380453 Expected: 19.1
Predicted Value: 18.216017 Expected: 22.7
Predicted Value: 7.0832386 Expected: 12.0
Predicted Value: 11.415534 Expected: 10.2
Predicted Value: 19.121294 Expected: 20.0
Predicted Value: 11.366928 Expected: 18.5
Predicted Value: 21.06049 Expected: 20.9
Predicted Value: 18.94875 Expected: 23.0
Predicted Value: 28.08468 Expected: 27.5
Predicted Value: 23.687477 Expected: 30.1
Predicted Value: 8.315009 Expected: 9.5
Predicted Value: 19.124638 Expected: 22.0
Predicted Value: 20.701284 Expected: 21.2
Predicted Value: 13.510205 Expected: 14.1
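The printed pairs can be summarized with simple error metrics. The following sketch uses only the first five pairs from the output above to keep it short; the helper names are hypothetical, and the same functions would work on the full list.

```python
import math

# First five (predicted, expected) pairs copied from the output above.
pairs = [
    (4.5952473, 7.2),
    (18.098837, 18.8),
    (18.036978, 19.0),
    (28.551752, 27.0),
    (22.708357, 22.2),
]


def mean_absolute_error(pairs):
    """Average of |predicted - expected|."""
    return sum(abs(pred - exp) for pred, exp in pairs) / len(pairs)


def root_mean_squared_error(pairs):
    """Square root of the average squared difference."""
    return math.sqrt(sum((pred - exp) ** 2 for pred, exp in pairs) / len(pairs))


mae = mean_absolute_error(pairs)
rmse = root_mean_squared_error(pairs)

print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```

Because MEDV is given in $1000's, both metrics are in thousands of dollars. Five rows are too few to judge the model; they only illustrate the calculation.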
Assignment
Problem
• Complete the Exploratory Data Analysis of the dataset and present
your findings.
• In the project, we performed linear regression on all the
input variables available in the dataset. Build 6 models by selecting
different input variables as given below:
  • Any 3 single input variables
  • Any 2 combinations of two input variables
  • Any 1 combination of three input variables
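As a starting point for choosing the six models, the candidate feature subsets can be enumerated with `itertools.combinations`. This is only a sketch: the helper `feature_subsets` is hypothetical, and taking the first entries of each list is an arbitrary placeholder, since the assignment leaves the choice open and the exploratory analysis should guide it.

```python
from itertools import combinations

# The 13 input variables, in the order used in the project.
features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
            'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']


def feature_subsets(names, size):
    """All subsets of the given size, each as a list of column names."""
    return [list(combo) for combo in combinations(names, size)]


singles = feature_subsets(features, 1)
pairs = feature_subsets(features, 2)
triples = feature_subsets(features, 3)

# Placeholder selection (assumption): 3 singles, 2 pairs, 1 triple.
chosen = singles[:3] + pairs[:2] + triples[:1]

for subset in chosen:
    print(subset)
```

Each chosen list can then be used to select columns from the training and test DataFrames and to build the matching feature columns before training a model.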
Thanks
Samatrix Consulting Pvt Ltd
