Boston Housing Kaggle Challenge With Linear Regression
Boston Housing Data: This dataset was taken from the StatLib library, which is maintained by
Carnegie Mellon University. It concerns housing prices in the city of Boston.
The dataset has 506 instances with 13 features, plus a target: the median value of owner-occupied homes, in $1000s.
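# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
# Loading the data (load_boston was removed in scikit-learn 1.2, so this step
# needs an older scikit-learn release)
from sklearn.datasets import load_boston
boston = load_boston()
boston.data.shape
boston.feature_names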
# Converting data from nd-array to dataframe and adding feature names to the data
data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
data.head(10)
data['Price'] = boston.target
data.head()
data.describe()
data.info()
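Since `load_boston` is not available in recent scikit-learn releases, the conversion above can also be tried on a small stand-in. The sketch below assumes a made-up 3×2 array, two feature names borrowed from the Boston columns (`CRIM`, `RM`) and invented target values; it only illustrates how the ndarray, the feature names and the target combine into one DataFrame. The name `demo_data` is hypothetical and chosen so it does not overwrite `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for boston.data, boston.feature_names and boston.target
# (the values are invented for illustration only)
features = np.array([[0.1, 6.0],
                     [0.2, 6.5],
                     [0.3, 7.0]])
feature_names = ["CRIM", "RM"]
target = np.array([20.0, 25.0, 30.0])

# Same steps as above: wrap the ndarray, name the columns, append the target
demo_data = pd.DataFrame(features)
demo_data.columns = feature_names
demo_data["Price"] = target

print(demo_data.shape)          # (3, 3)
print(list(demo_data.columns))  # ['CRIM', 'RM', 'Price']
```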
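# Getting input and output data, then splitting them into training and testing datasets
# Input Data
x = boston.data
# Output Data
y = boston.target
# Splitting into training and testing data
# (sklearn.model_selection replaces the old sklearn.cross_validation module;
# test_size = 0.2 is an assumption, since the original value was lost)
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.2,
                                                random_state = 0)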
print("xtrain shape : ", xtrain.shape)
print("xtest shape : ", xtest.shape)
print("ytrain shape : ", ytrain.shape)
print("ytest shape : ", ytest.shape)
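`train_test_split` shuffles the row indices and holds out a fraction of them for testing. The sketch below re-implements that idea with NumPy; the helper `simple_train_test_split` is hypothetical and simplified, not scikit-learn's exact algorithm, so it will not pick the same rows for a given `random_state`. The data are random numbers with the Boston shape (506 × 13), and `test_size = 0.2` is assumed: rounding 0.2 × 506 = 101.2 up gives 102 test rows and 404 training rows.

```python
import math
import numpy as np

def simple_train_test_split(x, y, test_size=0.2, random_state=None):
    """Hypothetical helper: shuffle row indices, then hold out ceil(test_size * n) rows."""
    n_samples = len(x)
    n_test = math.ceil(test_size * n_samples)
    rng = np.random.default_rng(random_state)
    indices = rng.permutation(n_samples)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Random data with the same shape as the Boston features (values are meaningless)
x_demo = np.random.default_rng(0).normal(size=(506, 13))
y_demo = np.arange(506, dtype=float)

xtr, xte, ytr, yte = simple_train_test_split(x_demo, y_demo, test_size=0.2, random_state=0)
print("xtrain shape : ", xtr.shape)  # (404, 13)
print("xtest shape : ", xte.shape)   # (102, 13)
```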
# Applying Linear Regression Model to the dataset and predicting the prices.
# Fitting a multiple linear regression model to the training data
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(xtrain, ytrain)
# predicting the test set results
y_pred = regressor.predict(xtest)
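`LinearRegression` fits an ordinary least-squares model: it chooses an intercept b and coefficients w that minimize the sum of squared errors Σ(yᵢ − b − xᵢ·w)². The sketch below solves that least-squares problem directly with `np.linalg.lstsq`, after prepending a column of ones for the intercept, and checks it against scikit-learn. The data are synthetic and noiseless (y = 3 + 2·x1 − x2, coefficients chosen only for illustration), and `demo_regressor` is a hypothetical name that leaves `regressor` untouched.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noiseless data: y = 3 + 2*x1 - 1*x2
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 + X_demo @ np.array([2.0, -1.0])

# Ordinary least squares by hand: the column of ones makes the first
# solution entry act as the intercept
X_design = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)
intercept, coef = beta[0], beta[1:]

# scikit-learn's estimator on the same data
demo_regressor = LinearRegression().fit(X_demo, y_demo)

print(np.round(intercept, 6), np.round(coef, 6))
print(np.round(demo_regressor.intercept_, 6), np.round(demo_regressor.coef_, 6))
```

Both lines should report an intercept of about 3 and coefficients of about [2, -1], since the data contain no noise.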
# Plotting a scatter graph of the true prices ('ytest') vs the predicted prices ('y_pred')
plt.scatter(ytest, y_pred, c = 'green')
plt.xlabel("True price (in $1000s)")
plt.ylabel("Predicted price (in $1000s)")
plt.title("True value vs predicted value : Linear Regression")
plt.show()
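In a true-vs-predicted scatter plot, a perfect model would put every point on the diagonal y = x, so drawing that line makes the spread easier to judge. The sketch below adds a dashed diagonal to the same kind of plot; the five price pairs (`true_demo`, `pred_demo`) are invented for illustration, and on the real split you would pass `ytest` and `y_pred` instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented true and predicted prices (in $1000s), for illustration only
true_demo = np.array([22.0, 15.5, 31.0, 18.2, 40.1])
pred_demo = np.array([24.1, 17.0, 27.5, 19.0, 33.8])

fig, ax = plt.subplots()
ax.scatter(true_demo, pred_demo, c="green")

# Dashed diagonal y = x over the range of both axes: points on it are exact predictions
low = min(true_demo.min(), pred_demo.min())
high = max(true_demo.max(), pred_demo.max())
ax.plot([low, high], [low, high], linestyle="--", color="gray")

ax.set_xlabel("True price (in $1000s)")
ax.set_ylabel("Predicted price (in $1000s)")
ax.set_title("True value vs predicted value : Linear Regression")
plt.show()
```

Points above the line are over-predictions; points below it are under-predictions.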
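As per the result, the model's reported score is only 66.55%, so the prepared model is not very good
at predicting housing prices. (Strictly speaking, "accuracy" is a classification metric; a regression model is usually judged by an error measure such as the mean squared error, or by the coefficient of determination R².)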
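Because accuracy is not defined for a continuous target, the two usual regression scores are the mean squared error, MSE = (1/n)Σ(yᵢ − ŷᵢ)², and the coefficient of determination, R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)². The sketch below computes both by hand on four made-up values and compares them with `sklearn.metrics.mean_squared_error` and `r2_score`; on the Boston split, the same calls would be `mean_squared_error(ytest, y_pred)` and `r2_score(ytest, y_pred)`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted values, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Mean squared error: the average squared residual
mse = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, mean_squared_error(y_true, y_hat))            # 0.375 0.375
print(round(r2, 4), round(r2_score(y_true, y_hat), 4))   # 0.9486 0.9486
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean price.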