House Price Prediction Model
[2042]: import pandas as pd
train_dataset=pd.read_csv('train.csv')
test_dataset=pd.read_csv('test.csv')
print('Training dataset count\n',train_dataset.count())
print('Test dataset count\n',test_dataset.count())
LotFrontage 1201
LotArea 1460
…
MoSold 1460
YrSold 1460
SaleType 1460
SaleCondition 1460
SalePrice 1460
Length: 81, dtype: int64
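The counts above already hint at missing data: LotFrontage has 1201 non-null values while most of the listed columns have 1460. A minimal sketch of listing such columns explicitly follows. It uses a small made-up frame in place of train.csv, and the helper name `missing_report` is hypothetical, not part of the notebook.

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, largest first,
    keeping only columns that have at least one missing value."""
    missing = frame.isna().sum()
    return missing[missing > 0].sort_values(ascending=False)

# Toy data standing in for train.csv (values are made up)
toy = pd.DataFrame({
    'LotFrontage': [65.0, None, 68.0, None],
    'LotArea': [8450, 9600, 11250, 9550],
    'SalePrice': [208500, 181500, 223500, 140000],
})
report = missing_report(toy)
print(report)
```

On the real data the call would be `missing_report(train_dataset)`.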
0.0.3 Exploring the relationship between house price, square footage,
and the number of bedrooms and bathrooms
[2120]: #Extracting the independent and dependent variable columns from the training set
train_dataset['Bathroom']=train_dataset['BsmtFullBath']+train_dataset['BsmtHalfBath']+train_da
# Sample data (assuming you have your data loaded into a DataFrame)
data = {
'BedroomNb': train_dataset['BedroomAbvGr'],
'BathroomNb': train_dataset['Bathroom'],
'SquareFg':train_dataset['LotArea'],
'Saleprice':train_dataset['SalePrice']
}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
[2084]: #Distribution of the target variable
# Note: the 'Saleprice_in_hundreds' column is not created in any cell shown here.
# histplot replaces the deprecated distplot; kde=True overlays the density curve.
sns.histplot(df['Saleprice_in_hundreds'], kde=True, stat='density');
C:\Users\nermi\anaconda\Lib\site-packages\seaborn\_oldcore.py:1119:
FutureWarning: use_inf_as_na option is deprecated and will be removed in a
future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
[2126]: #Relationship of Sales Price with other variables
sns.pairplot(df, x_vars=['BedroomNb', 'BathroomNb','SquareFg'],
             y_vars='Saleprice', height=4, aspect=1, kind='scatter')
plt.show()
[2128]: print(df.describe())
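As a numeric complement to the scatter plots, the correlation of each feature with the sale price can be tabulated. This is a sketch on made-up values; `correlation_with_target` is a hypothetical helper and not part of the notebook.

```python
import pandas as pd

def correlation_with_target(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other numeric column with `target`,
    sorted from the largest to the smallest coefficient."""
    corr = frame.select_dtypes('number').corr()[target].drop(target)
    return corr.sort_values(ascending=False)

# Toy frame with the same column names as df above (values are made up)
toy = pd.DataFrame({
    'BedroomNb': [2, 3, 3, 4, 4],
    'BathroomNb': [1, 2, 2, 3, 3],
    'SquareFg': [5000, 8000, 7500, 9000, 12000],
    'Saleprice': [100000, 150000, 140000, 200000, 230000],
})
result = correlation_with_target(toy, 'Saleprice')
print(result)
```

With the notebook's frame the call would be `correlation_with_target(df, 'Saleprice')`.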
[2130]: sns.pairplot(df)
plt.show()
[2134]: #Extracting the independent and dependent variable columns from the training set
train_dataset['Bathroom']=train_dataset['BsmtFullBath']+train_dataset['BsmtHalfBath']+train_da
train_data = {
'BedroomNb': train_dataset['BedroomAbvGr'],
'BathroomNb': train_dataset['Bathroom'],
'SquareFg':train_dataset['LotArea']
}
# Create a training DataFrame from the dictionary
x_train= pd.DataFrame(train_data)
y_train=train_dataset['SalePrice']
[2146]: #Extracting the independent and dependent variables columns from the test set
test_dataset['Bathroom']=test_dataset['BsmtFullBath']+test_dataset['BsmtHalfBath']+test_datase
test_data = {
'BedroomNb': test_dataset['BedroomAbvGr'],
'BathroomNb': test_dataset['Bathroom'],
'SquareFg':test_dataset['LotArea']
}
# Create a DataFrame from the dictionary
x_test = pd.DataFrame(test_data)
y_test=sample_submission['SalePrice']
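The training and test cells above repeat the same three steps. One way to avoid the duplication is a small helper; this is a sketch, not the notebook's code. The name `build_features` and its `bathroom_cols` parameter are hypothetical. Because the original `Bathroom` expression is cut off in this export, the full list of columns it sums is unknown and has to be supplied by the caller.

```python
import pandas as pd

def build_features(dataset: pd.DataFrame, bathroom_cols: list) -> pd.DataFrame:
    """Build the three-column feature frame used above.

    bathroom_cols: columns summed into the bathroom count. This list is an
    assumption; the original expression is truncated in the source.
    """
    return pd.DataFrame({
        'BedroomNb': dataset['BedroomAbvGr'],
        # skipna=False mirrors `+`, which propagates missing values
        'BathroomNb': dataset[bathroom_cols].sum(axis=1, skipna=False),
        'SquareFg': dataset['LotArea'],
    })

# Toy data standing in for the real datasets (values are made up)
toy = pd.DataFrame({
    'BedroomAbvGr': [3, 2],
    'BsmtFullBath': [1, 0],
    'BsmtHalfBath': [0, 1],
    'LotArea': [8450, 9600],
})
x_toy = build_features(toy, ['BsmtFullBath', 'BsmtHalfBath'])
print(x_toy)
```

The same call could then serve both `train_dataset` and `test_dataset`, which keeps the two feature frames in step.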
Intercept: 47216.372461355786
reg_model_diff
3 179317.477511 208876.533988
4 150730.079977 152936.293630
… … …
1454 167081.220949 148383.535511
1455 164788.778231 148331.168885
1456 219222.423400 170179.918231
1457 184924.279659 158987.777353
1458 187741.866657 208438.898610
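The cell that fitted the regression and printed the intercept is not preserved in this export. As a rough illustration of what a least-squares fit with an intercept computes, here is a sketch with NumPy on made-up numbers. `fit_ols` and `predict` are hypothetical names, and the notebook may well have used a library estimator instead.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = intercept + X @ coef + error.
    Returns (intercept, coef)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

def predict(X, intercept, coef):
    """Apply the fitted coefficients to new rows."""
    return intercept + np.asarray(X, dtype=float) @ coef

# Made-up data generated from y = 10 + 2*x1 + 3*x2, so the fit is exact
X = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]])
y = 10 + 2 * X[:, 0] + 3 * X[:, 1]
intercept, coef = fit_ols(X, y)
y_hat = predict(X, intercept, coef)
print('Intercept:', intercept)
print('Coefficients:', coef)
```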
[2164]: model=smf.ols(formula='y_train~BedroomNb+BathroomNb+SquareFg',data=data).fit()
print(model.summary())
Kurtosis: 9.636 Cond. No. 6.09e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
[2] The condition number is large, 6.09e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
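Note [2] can have more than one cause. Besides correlated regressors, a large condition number also arises when the regressors sit on very different scales, and a lot area is typically orders of magnitude larger than a bedroom count. The sketch below compares the condition number of a design matrix before and after standardizing its columns. The numbers are made up and `design_condition_number` is a hypothetical helper.

```python
import numpy as np

def design_condition_number(X, standardize=False):
    """2-norm condition number of the design matrix [1, X].
    With standardize=True each column of X is first centred and
    scaled to unit variance."""
    X = np.asarray(X, dtype=float)
    if standardize:
        X = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(X)), X])
    return np.linalg.cond(design)

# Columns: bedrooms, bathrooms, lot area (made-up values)
X = np.array([
    [3, 2, 8450],
    [3, 2, 9600],
    [2, 1, 11250],
    [4, 3, 9550],
    [4, 2, 14260],
    [1, 1, 14115],
])
raw = design_condition_number(X)
scaled = design_condition_number(X, standardize=True)
print(f"raw: {raw:.3g}, standardized: {scaled:.3g}")
```

If the standardized value is still large, correlation between the features themselves is the more likely explanation.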
[2166]: df=pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
[2168]: df1=df.head(60)
df1.plot(kind='bar',figsize=(16,7))
plt.grid(which='major',linestyle='-',linewidth=0.5,color='green')
plt.grid(which='minor',linestyle=':',linewidth=0.5,color='black')
plt.show()
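The bar chart compares the two columns visually for the first 60 rows only. A numeric summary over all rows can complement it. Below is a sketch of the root-mean-squared error and the mean absolute error on made-up numbers; `regression_errors` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return RMSE and MAE between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = actual - predicted
    return {
        'rmse': float(np.sqrt(np.mean(residual ** 2))),
        'mae': float(np.mean(np.abs(residual))),
    }

# Made-up prices; the residuals are 10000, -10000 and 0
errors = regression_errors([200000, 150000, 180000],
                           [190000, 160000, 180000])
print(errors)
```

With the notebook's frame this would be called as `regression_errors(df['Actual'], df['Predicted'])`.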