Houses Prices Prediction Model

This notebook walks through a data analysis in Python on a housing dataset, predicting sale price from lot area and the number of bedrooms and bathrooms. It covers data loading, missing-value handling, exploratory data analysis, fitting a linear regression, and evaluating it with mean absolute error, mean squared error, and root mean squared error. The fitted model explains about 40% of the variance in sale price (R-squared 0.399), and the model coefficients and predictions are listed.

Uploaded by

nermine.limem.tbs

task1

July 11, 2024

0.0.1 Import Libraries

[2170]: import pandas as pd

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn import metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split, cross_val_score

0.0.2 Loading the dataset

[2042]: train_dataset=pd.read_csv('train.csv')
test_dataset=pd.read_csv('test.csv')
print('Training dataset count\n',train_dataset.count())
print('Test dataset count\n',test_dataset.count())

Training dataset count

Id 1460
MSSubClass 1460
MSZoning 1460
LotFrontage 1201
LotArea 1460
…
MoSold 1460
YrSold 1460
SaleType 1460
SaleCondition 1460
SalePrice 1460
Length: 81, dtype: int64
Test dataset count
Id 1460
MSSubClass 1460
MSZoning 1460
LotFrontage 1201
LotArea 1460
…
MoSold 1460
YrSold 1460
SaleType 1460
SaleCondition 1460
SalePrice 1460
Length: 81, dtype: int64

[2044]: # Checking Null Values for the train dataset

(train_dataset.isnull().sum() /train_dataset.shape[0] *
100).sort_values(ascending=False).round(2).astype(str) + ' %'

[2044]: PoolQC 99.52 %

MiscFeature 96.3 %
Alley 93.77 %
Fence 80.75 %
MasVnrType 59.73 %
…
ExterQual 0.0 %
Exterior2nd 0.0 %
Exterior1st 0.0 %
RoofMatl 0.0 %
SalePrice 0.0 %
Length: 81, dtype: object

[2046]: # Checking Null Values for the test dataset

(test_dataset.isnull().sum() /test_dataset.shape[0] *
100).sort_values(ascending=False).round(2).astype(str) + ' %'

[2046]: PoolQC 99.79 %

MiscFeature 96.5 %
Alley 92.67 %
Fence 80.12 %
MasVnrType 61.27 %
…
Electrical 0.0 %
1stFlrSF 0.0 %
2ndFlrSF 0.0 %
LowQualFinSF 0.0 %
SaleCondition 0.0 %
Length: 80, dtype: object
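
The two null-value checks above repeat the same expression for the train and test sets. A minimal sketch of how that could be wrapped in a helper follows; the function name `missing_percent` and the toy data are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_percent(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    # isnull().mean() is the fraction of missing rows in each column.
    return (frame.isnull().mean() * 100).sort_values(ascending=False).round(2)


# Toy data standing in for train.csv (values are made up for illustration).
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})

missing = missing_percent(toy)
print(missing)
```

Keeping the result numeric (instead of appending ' %' as a string) makes it possible to filter on it later, for example to list columns with more than half of their values missing.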

[2048]: #replace NaN values with 0 for the test set

test_dataset['BsmtFullBath']=test_dataset['BsmtFullBath'].fillna(0)
test_dataset['BsmtHalfBath']=test_dataset['BsmtHalfBath'].fillna(0)

0.0.3 Exploring the relationship between house price, square footage, and the number of bedrooms and bathrooms

[2120]: #Extracting the independent and dependent variables columns from the training set

train_dataset['Bathroom']=train_dataset['BsmtFullBath']+train_dataset['BsmtHalfBath']+train_da
# Sample data (assuming you have your data loaded into a DataFrame)
data = {
'BedroomNb': train_dataset['BedroomAbvGr'],
'BathroomNb': train_dataset['Bathroom'],
'SquareFg':train_dataset['LotArea'],
'Saleprice':train_dataset['SalePrice']
}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

[2124]: #checking for outliers

fig, axs = plt.subplots(3, figsize = (5,15))
plt1 = sns.boxplot(df['BedroomNb'], ax = axs[0])
plt2 = sns.boxplot(df['BathroomNb'], ax = axs[1])
plt3 = sns.boxplot(df['SquareFg'], ax = axs[2])
plt.tight_layout()
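
The boxplots flag outliers visually. As a rough numeric counterpart, the sketch below applies the common 1.5 × IQR rule, which is also the default whisker rule in seaborn boxplots. The helper name `iqr_outliers` and the sample values are assumptions; only 215245 is taken from the notebook (it is the maximum lot area in the summary statistics).

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Hypothetical lot areas; the last value mimics the extreme lot in the data.
lot_area = pd.Series([7500, 8000, 9000, 9500, 10000, 11000, 12000, 215245])
outliers = iqr_outliers(lot_area)
print(outliers)
```

Whether such points should be dropped, capped, or kept is a modelling decision that the notebook does not make; this only shows how to list them.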

[2084]: #Distribution of the target variable
sns.histplot(df['Saleprice'], kde=True);
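
The summary statistics later in the notebook show a mean sale price above the median and a long upper tail, which suggests a right-skewed target. One way to quantify this, and to see what a log transform would do, is sketched below on synthetic data; the log-normal sample is an assumption standing in for `SalePrice`, and the notebook itself does not transform the target.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "prices" (assumption: log-normal, not the real data).
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=12.0, sigma=0.4, size=1460), name="Saleprice")

# log1p compresses the long right tail.
log_prices = np.log1p(prices)

skew_raw = prices.skew()
skew_log = log_prices.skew()
print("skew before:", round(skew_raw, 3))
print("skew after log1p:", round(skew_log, 3))
```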

[2126]: #Relationship of Sales Price with other variables
sns.pairplot(df, x_vars=['BedroomNb', 'BathroomNb','SquareFg'], y_vars='Saleprice', height=4, aspect=1, kind='scatter')

plt.show()

[2128]: print(df.describe())

         BedroomNb   BathroomNb       SquareFg      Saleprice
count  1460.000000  1460.000000    1460.000000    1460.000000
mean      2.866438     2.430822   10516.828082  180921.195890
std       0.815778     0.922647    9981.264932   79442.502883
min       0.000000     1.000000    1300.000000   34900.000000
25%       2.000000     2.000000    7553.500000  129975.000000
50%       3.000000     2.000000    9478.500000  163000.000000
75%       3.000000     3.000000   11601.500000  214000.000000
max       8.000000     6.000000  215245.000000  755000.000000

[2130]: sns.pairplot(df)
plt.show()


[2131]: # Create the correlation matrix and represent it as a heatmap.

sns.heatmap(df.corr(), annot = True, cmap = 'coolwarm')
plt.show()
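
The heatmap shows the full correlation matrix. When only the relationship with the target matters, the same matrix can be reduced to one ranked column. The sketch below does this on synthetic data; the column names mirror the notebook's `df`, but the values and the data-generating formula are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df (assumption: not the real housing data).
rng = np.random.default_rng(42)
n = 200
bath = rng.integers(1, 5, size=n)
bed = rng.integers(1, 6, size=n)
area = rng.normal(10000, 3000, size=n)
price = 50000 * bath + 1.2 * area + rng.normal(0, 20000, size=n)

toy = pd.DataFrame({
    "BedroomNb": bed,
    "BathroomNb": bath,
    "SquareFg": area,
    "Saleprice": price,
})

# Pearson correlation of every feature with the target, strongest first.
corr_with_price = (
    toy.corr()["Saleprice"].drop("Saleprice").sort_values(ascending=False)
)
print(corr_with_price)
```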

[2134]: #Extracting the independent and dependent variables columns from the training set

train_dataset['Bathroom']=train_dataset['BsmtFullBath']+train_dataset['BsmtHalfBath']+train_da
train_data = {
'BedroomNb': train_dataset['BedroomAbvGr'],
'BathroomNb': train_dataset['Bathroom'],
'SquareFg':train_dataset['LotArea']
}
# Create a training DataFrame from the dictionary
x_train= pd.DataFrame(train_data)
y_train=train_dataset['SalePrice']

[2146]: #Extracting the independent and dependent variables columns from the test set
test_dataset['Bathroom']=test_dataset['BsmtFullBath']+test_dataset['BsmtHalfBath']+test_datase
test_data = {
'BedroomNb': test_dataset['BedroomAbvGr'],
'BathroomNb': test_dataset['Bathroom'],
'SquareFg':test_dataset['LotArea']
}
# Create a DataFrame from the dictionary
x_test = pd.DataFrame(test_data)

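# Assumption: sample_submission comes from Kaggle's sample_submission.csv (its load is not shown in the cells above)
sample_submission=pd.read_csv('sample_submission.csv')
y_test=sample_submission['SalePrice']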
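
The train and test cells build the same three features with duplicated code. A possible refactoring into one function is sketched here. The name `build_features` and the `bath_cols` parameter are hypothetical. The Bathroom sum in the original cells is cut off in this export, so the default below uses only the two basement columns that are visible; the full list of columns the author summed is unknown.

```python
import pandas as pd


def build_features(raw: pd.DataFrame,
                   bath_cols=("BsmtFullBath", "BsmtHalfBath")) -> pd.DataFrame:
    """Build the three model features from a raw housing frame.

    bath_cols is an assumption: pass the full tuple of bathroom columns
    to reproduce the notebook's (truncated) Bathroom sum.
    """
    # Missing bathroom counts are treated as 0, as in the test-set cell.
    baths = raw[list(bath_cols)].fillna(0).sum(axis=1)
    return pd.DataFrame({
        "BedroomNb": raw["BedroomAbvGr"],
        "BathroomNb": baths,
        "SquareFg": raw["LotArea"],
    })


# Two made-up rows to show the call.
toy_raw = pd.DataFrame({
    "BedroomAbvGr": [3, 2],
    "BsmtFullBath": [1, None],
    "BsmtHalfBath": [0, 1],
    "LotArea": [8450, 9600],
})
features = build_features(toy_raw)
print(features)
```

Using one function for both sets guarantees that `x_train` and `x_test` have identical columns in the same order, which `LinearRegression.predict` relies on.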

[2148]: reg_model = linear_model.LinearRegression()

[2150]: reg_model = LinearRegression().fit(x_train, y_train)
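
The notebook imports `train_test_split` and `cross_val_score`, but the cells shown never call them. One possible use is to hold out part of the labelled training data for validation, since `train.csv` is the only file here with known sale prices. The sketch below uses synthetic data; the coefficients, noise level, and split size are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic regression problem standing in for x_train / y_train.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=300)

# Hold out 20% of the labelled rows for validation.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)
holdout_model = LinearRegression().fit(X_fit, y_fit)
val_mae = mean_absolute_error(y_val, holdout_model.predict(X_val))

# 5-fold cross-validated R-squared on the full labelled set.
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

print("validation MAE:", val_mae)
print("cross-validated R2:", cv_r2.mean())
```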

[2152]: #Printing the model coefficients

print('Intercept: ',reg_model.intercept_)
# pair the feature names with the coefficients
list(zip(x_train, reg_model.coef_))

Intercept: 47216.372461355786

[2152]: [('BedroomNb', -726.2539203998313),

('BathroomNb', 50466.0363499113),
('SquareFg', 1.2468244375637385)]
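
To make the coefficients concrete, a prediction can be reproduced by hand as the intercept plus each coefficient times its feature value. The sketch below uses the intercept and coefficients printed above; the example house (3 bedrooms, 2 bathrooms, a 9000 sq ft lot) is hypothetical.

```python
# Intercept and coefficients copied from the model output above.
intercept = 47216.372461355786
coefs = {
    "BedroomNb": -726.2539203998313,
    "BathroomNb": 50466.0363499113,
    "SquareFg": 1.2468244375637385,
}

# Hypothetical house, chosen only for illustration.
house = {"BedroomNb": 3, "BathroomNb": 2, "SquareFg": 9000}

# Linear model: y_hat = intercept + sum(coef_i * x_i)
predicted = intercept + sum(coefs[name] * house[name] for name in coefs)
print(round(predicted, 2))
```

This comes out at about 157,191. Each additional bathroom adds roughly 50,466 to the predicted price in this model, while the bedroom coefficient is small and negative.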

[2154]: #Predicting the Test and Train set result

y_pred=reg_model.predict(x_test)
x_pred=reg_model.predict(x_train)

[2156]: print("Prediction for test set: {}".format(y_pred))

Prediction for test set: [110720.49458383 163758.1276507 213679.3017214 …
 170179.91823085 158987.77735258 208438.89861032]

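[2158]: #the predictions confidence interval
# prediction_train is not defined in the cells shown; it is presumably a statsmodels prediction result
prediction_train.conf_int()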

[2158]: array([[250918.4053824 , 263956.43981199],

[204628.49815082, 212181.97055014],
[254572.34929148, 267284.71275324],
…,
[150411.99664392, 162622.43544407],
[154149.4562222 , 163473.20453818],
[205066.30335169, 212584.52502019]])

[2160]: #Actual value and the predicted value

reg_model_diff = pd.DataFrame({'Actual value': y_test, 'Predicted value': y_pred})

reg_model_diff

[2160]:
       Actual value  Predicted value
0     169277.052498    110720.494584
1     187758.393989    163758.127651
2     183583.683570    213679.301721
3     179317.477511    208876.533988
4     150730.079977    152936.293630
…                 …                …
1454  167081.220949    148383.535511
1455  164788.778231    148331.168885
1456  219222.423400    170179.918231
1457  184924.279659    158987.777353
1458  187741.866657    208438.898610

[1459 rows x 2 columns]

[2162]: mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # root mean squared error

print('Mean Absolute Error:', mae)
print('Mean Square Error:', mse)
print('Root Mean Square Error:', rmse)

Mean Absolute Error: 38059.81770203212

Mean Square Error: 2187213754.2428126
Root Mean Square Error: 46767.65713869803
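
For reference, the three metrics can be computed directly from the residuals, which also shows that the reported RMSE is the square root of the reported MSE. The helper `regression_errors` is a hypothetical name, and the three sample rows are the first rows of the actual/predicted table above, rounded to whole units.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) computed from the residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    return mae, mse, rmse


# First three rows of the actual/predicted table, rounded.
actual = [169277, 187758, 183584]
predicted = [110720, 163758, 213679]
mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)

# The reported RMSE follows from the reported MSE.
reported_rmse = np.sqrt(2187213754.2428126)
print(reported_rmse)
```

Because MSE squares the residuals, it is in squared currency units; RMSE and MAE are in the same units as the sale price and are easier to read.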

[2164]: model=smf.ols(formula='y_train~BedroomNb+BathroomNb+SquareFg',data=data).fit()
print(model.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                y_train   R-squared:                       0.399
Model:                            OLS   Adj. R-squared:                  0.398
Method:                 Least Squares   F-statistic:                     322.7
Date:                Thu, 11 Jul 2024   Prob (F-statistic):          1.29e-160
Time:                        22:47:28   Log-Likelihood:                -18172.
No. Observations:                1460   AIC:                         3.635e+04
Df Residuals:                    1456   BIC:                         3.637e+04
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   4.722e+04   6567.626      7.189      0.000    3.43e+04    6.01e+04
BedroomNb   -726.2539   2058.527     -0.353      0.724   -4764.249    3311.741
BathroomNb  5.047e+04   1838.538     27.449      0.000    4.69e+04    5.41e+04
SquareFg       1.2468      0.165      7.560      0.000       0.923       1.570
==============================================================================
Omnibus:                      542.033   Durbin-Watson:                   1.993
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             3304.854
Skew:                           1.604   Prob(JB):                         0.00
Kurtosis:                       9.636   Cond. No.                     6.09e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
[2] The condition number is large, 6.09e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
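
The adjusted R-squared in the summary can be checked from the reported R-squared, the number of observations (1460), and the number of predictors (3). The function name `adjusted_r2` is an assumption; the formula is the standard one, and the inputs are the rounded values printed in the summary.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values taken from the OLS summary above.
adj = adjusted_r2(0.399, 1460, 3)
print(round(adj, 3))
```

With 1460 observations and only three predictors the penalty is tiny, which is why the adjusted value (0.398) is almost identical to the plain R-squared (0.399).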

[2166]: df=pd.DataFrame({'Actual':y_test,'Predicted':y_pred})

[2168]: df1=df.head(60)
df1.plot(kind='bar',figsize=(16,7))
plt.grid(which='major',linestyle='-',linewidth='0.5',color='green')
plt.grid(which='minor',linestyle=':',linewidth='0.5',color='black')
plt.show()
