Project 4 - House Price Prediction - Ipynb - Colab
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn.datasets
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn import metrics
house_price_dataset = sklearn.datasets.fetch_california_housing()
print(house_price_dataset)
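The cells that follow use a `house_price_dataframe` with a `price` column, but the cell that builds it is not shown here. One plausible way to get from the loaded dataset object to such a DataFrame is sketched below. The helper name `bunch_to_dataframe` and the tiny `toy_dataset` (made-up numbers, used so the sketch runs without downloading anything) are assumptions for illustration, not the notebook's own code.

```python
import numpy as np
import pandas as pd
from sklearn.utils import Bunch


def bunch_to_dataframe(bunch, target_name="price"):
    """Return a DataFrame of the features plus one column holding the target."""
    frame = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    frame[target_name] = bunch.target
    return frame


# Hand-made stand-in with invented values, shaped like a scikit-learn dataset object
toy_dataset = Bunch(
    data=np.array([[8.3, 41.0], [8.3, 21.0], [7.3, 52.0]]),
    target=np.array([4.5, 3.6, 3.5]),
    feature_names=["MedInc", "HouseAge"],
)

toy_dataframe = bunch_to_dataframe(toy_dataset)
print(toy_dataframe.head())
```

With the real data, the same helper would presumably be called as `house_price_dataframe = bunch_to_dataframe(house_price_dataset)`.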
house_price_dataframe.head()
# number of rows and columns in the DataFrame
house_price_dataframe.shape
(20640, 9)
https://colab.research.google.com/drive/1m-p2Nj9HPWN-fEk57uIZ6J2wWjW4pn2R#scrollTo=mv3Vgwq2SHp-&printMode=true 1/5
01/10/2024, 17:26 Copy of Project 4 : House Price Prediction.ipynb - Colab
# check for missing values
house_price_dataframe.isnull().sum()
MedInc 0
HouseAge 0
AveRooms 0
AveBedrms 0
Population 0
AveOccup 0
Latitude 0
Longitude 0
price 0
# statistical measures of the dataset
house_price_dataframe.describe()
             MedInc      HouseAge      AveRooms     AveBedrms    Population      AveOccup      Latitude     Longitude         price
count  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000
mean       3.870671     28.639486      5.429000      1.096675   1425.476744      3.070655     35.631861   -119.569704      2.068558
std        1.899822     12.585558      2.474173      0.473911   1132.462122     10.386050      2.135952      2.003532      1.153956
min        0.499900      1.000000      0.846154      0.333333      3.000000      0.692308     32.540000   -124.350000      0.149990
25%        2.563400     18.000000      4.440716      1.006079    787.000000      2.429741     33.930000   -121.800000      1.196000
50%        3.534800     29.000000      5.229129      1.048780   1166.000000      2.818116     34.260000   -118.490000      1.797000
75%        4.743250     37.000000      6.052381      1.099526   1725.000000      3.282261     37.710000   -118.010000      2.647250
max       15.000100     52.000000    141.909091     34.066667  35682.000000   1243.333333     41.950000   -114.310000      5.000010
1. Positive Correlation
2. Negative Correlation
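The two kinds of correlation listed above can be seen on a small made-up table: a coefficient near +1 means two columns rise together, and a coefficient near -1 means one falls as the other rises. The column names and numbers below are invented for illustration only.

```python
import pandas as pd

# Invented values: price rises with rooms, distance_to_city falls as rooms rises
toy = pd.DataFrame({
    "rooms": [2, 3, 4, 5, 6],
    "price": [1.0, 1.5, 2.1, 2.4, 3.0],
    "distance_to_city": [50, 42, 30, 21, 10],
})

# Pearson correlation between every pair of columns
toy_correlation = toy.corr()
print(toy_correlation.round(2))
```

In the printed matrix, the `rooms`/`price` entry is positive and the `rooms`/`distance_to_city` entry is negative, while the diagonal is always 1.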
correlation = house_price_dataframe.corr()
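The plotting cell for the correlation matrix is not shown, only its `<Axes: >` output. A common way to visualise such a matrix is a seaborn heatmap; the sketch below assumes that approach and uses random toy data so it runs on its own. The styling arguments (`cmap`, `fmt`, `annot_kws`) are illustrative choices, not values taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; this line can be dropped inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Toy data: column "d" is built to correlate strongly with column "a"
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
toy["d"] = 2 * toy["a"] + rng.normal(scale=0.1, size=50)
toy_correlation = toy.corr()

# Annotated heatmap: darker cells mark stronger positive correlation
plt.figure(figsize=(6, 6))
ax = sns.heatmap(
    toy_correlation,
    cbar=True,
    square=True,
    annot=True,
    fmt=".1f",
    annot_kws={"size": 8},
    cmap="Blues",
)
```

For the housing data, `toy_correlation` would be replaced by the `correlation` matrix computed above.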
<Axes: >
X = house_price_dataframe.drop(['price'], axis=1)
Y = house_price_dataframe['price']
print(X)
print(Y)
Longitude
0 -122.23
1 -122.22
2 -122.24
3 -122.25
4 -122.25
... ...
20635 -121.09
20636 -121.21
20637 -121.22
20638 -121.32
20639 -121.24
3 3.413
4 3.422
...
20635 0.781
20636 0.771
20637 0.923
20638 0.847
20639 0.894
Name: price, Length: 20640, dtype: float64
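The later cells refer to `Y_train` and `Y_test`, so the features and target are evidently split into training and test sets, although the splitting cell itself is not shown. A minimal sketch with the imported `train_test_split` follows. The `test_size=0.2` and `random_state=2` values are assumptions, and the toy arrays stand in for `X` and `Y`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for the feature table X and the target Y
rng = np.random.default_rng(0)
toy_X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
toy_Y = pd.Series(2.0 * toy_X["f1"] + rng.normal(scale=0.1, size=100), name="price")

# Hold out 20% of the rows for testing; a fixed seed makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(
    toy_X, toy_Y, test_size=0.2, random_state=2
)
print(toy_X.shape, X_train.shape, X_test.shape)
```

Rows stay paired across the split: each row of `X_train` keeps the same index as its target value in `Y_train`.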
Model Training
XGBoost Regressor
Evaluation
print(training_data_prediction)
# R squared score (coefficient of determination) on the training data
score_1 = metrics.r2_score(Y_train, training_data_prediction)
plt.scatter(Y_train, training_data_prediction)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual Price vs Predicted Price")
plt.show()
# R squared score (coefficient of determination) on the test data
score_1 = metrics.r2_score(Y_test, test_data_prediction)
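To make the R squared value concrete, the sketch below computes it by hand on four made-up numbers and compares it with `metrics.r2_score`. The mean absolute error is included as a complementary metric that is expressed in the units of the target; it is an addition for illustration, not something this section of the notebook computes.

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# R squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = metrics.r2_score(y_true, y_pred)
mae = metrics.mean_absolute_error(y_true, y_pred)

print("R squared (manual): ", r2_manual)
print("R squared (sklearn):", r2_sklearn)
print("Mean absolute error:", mae)
```

An R squared of 1 means perfect predictions, 0 means no better than always predicting the mean, and it is a score to maximise rather than an error to minimise.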