
House-price Prediction Based on OLS Linear Regression and

Random Forest
Yige Wang∗
Beijing Jiaotong University, Beijing 100044, China
[email protected]
ABSTRACT
Based on a dataset with 20 house features, this paper builds two forecasting models to predict house prices. It first applies a series of data-processing steps, including missing-value handling, outlier removal, new-variable creation, and correlation analysis. It then builds a first theoretical model using the OLS method. Next, it uses secondary cross-validation to optimize the parameters of a random forest and obtains a comparatively realistic model. Finally, based on the results, the paper analyzes and compares the advantages and disadvantages of the two models.

CCS CONCEPTS
• Applied computing; • Operations research; • Consumer products;

KEYWORDS
OLS linear regression, random forest, house-price forecasting

ACM Reference Format:
Yige Wang. 2021. House-price Prediction Based on OLS Linear Regression and Random Forest. In 2021 2nd Asia Service Sciences and Software Engineering Conference (ASSE '21), February 24–26, 2021, Macau, Macao. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3456126.3456139

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ASSE '21, February 24–26, 2021, Macau, Macao
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8908-2/21/02. . . $15.00
https://doi.org/10.1145/3456126.3456139

1 INTRODUCTION
In the house-price market, 'forecasting' is the fastest-growing research topic, given its importance and popularity among the public and among companies of all scales, its financial benefits, and its low risk. The real estate industry accounts for a large share of GDP and directly influences citizens' consumption level as well as the overall domestic economy. Thus, accurate house-price forecasting not only benefits investors and consumers but also underpins steady economic development. Because of the weaknesses of traditional methods, such as subjectivity and empiricism, scholars proposed the hedonic method to improve accuracy. The hedonic method, combined with OLS regression, and random forest have both been widely applied to house-price prediction [1].

Based on the hedonic model, Thibodeau [2] and Selim [3] studied the impact of environmental characteristics on house prices and built multiple linear regression models to predict them. They found that house area, age, and proximity to transportation, schools, and scenic views are the most important environmental factors in house prices. Bourassa, Cantoni, and Hoesli [4] compared OLS linear regression with three other models to explore how OLS captures the relationship between spatial patterns and house prices; they showed that OLS emphasizes neighborhood factors but ignores spatial structure. Anderson and West [5], Crompton [6], and Dehring and Dunse [7] concluded, using hedonic methods, that house sale prices have positive relationships with a variety of open spaces, such as urban parks, land in conservation easements, agricultural farmland, golf courses, and other greenbelts.

Antipov and Pokryshevskaya [8] used decision trees to evaluate house characteristics and predict house prices. They concluded that random forest is stable in controlling outliers and effective with missing values as well as categorical variables. Hong et al. [9] maintained that the traditional hedonic pricing method is unstable and inaccurate, whereas random forest is competent because it can capture complexity and non-linear relationships in practice. Yoo et al. [10] applied machine-learning regression methods to house-price prediction to select feature variables and compared the results with the traditional OLS method. They found random forest useful for ranking variable importance in the hedonic price equation because the method embeds an in-depth hypothesis about which variables express customers' preferences. Čeh et al. [11] used a GIS explanatory dataset, including structural and age features of houses and neighborhood information, to build a random forest model. They also measured performance with R², MAPE, and COD (coefficient of dispersion), revealing that machine-learning methods for house-price prediction are promising.

OLS linear regression and random forest are among the most popular forecasting models. The former is easier to understand and focuses on explaining the correlations among variables. The latter is a bagging ensemble that requires many arithmetic operations and can handle non-linear relationships and overfitting. Therefore, this research applies both methods to house-price forecasting and compares the differences between them.

2 DATA PREPROCESSING
2.1 Data Access and Description
The house-price dataset, covering the USA and obtained from Kaggle, contains 21613 samples with 20 features, which suits this study in both data size and feature coverage. It includes architectural features of houses (e.g. waterfront, yr_built, number of bedrooms), locations of residential houses (e.g. latitude and longitude), and other features (e.g. price and grade). As there is no null value in the dataset, we skip missing-value processing.


Table 1: Category of 20 Feature Variables

Variable Type         Architectural Features              Geographic Features   Others
Categorical Variable  waterfront, yr_built, yr_renovated  zipcode, lat, long    id, date
Numerical Variable    bedrooms, bathrooms, sqft_living,   (none)                price, view,
                      sqft_lot, floors, sqft_living15,                          condition, grade
                      sqft_lot15

Table 2: The Description of Variables

Variable       Description
sqft_living    Square footage of the apartment interior living space
grade          An index from 1 to 13, where 1-3 falls short in building construction and design,
               7 is an average level of construction and design, and 11-13 is a high level
sqft_above     Square footage of the apartment interior housing space above ground level
sqft_living15  Square footage of the apartment interior living space for the nearest 15 neighbors
date           The date when the house was sold
yr_built       The year when the house was built

Figure 1: Boxplot of Price Before and After Outliers Removal
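Figure 1's before-and-after comparison follows the boxplot rule described in Section 2.2; a minimal pandas sketch on invented prices:

```python
import pandas as pd

# Toy price column (values invented); the last sale is an extreme outlier.
price = pd.Series([200_000, 250_000, 300_000, 350_000, 400_000, 5_000_000])

# Boxplot (Tukey) rule: values outside [QL - 1.5*IQR, QU + 1.5*IQR] are
# treated as outliers, where QL/QU are the lower/upper quartiles.
ql, qu = price.quantile(0.25), price.quantile(0.75)
iqr = qu - ql
inliers = price[(price >= ql - 1.5 * iqr) & (price <= qu + 1.5 * iqr)]

print(len(inliers))  # 5 -> only the 5,000,000 sale is removed
```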

2.2 Outlier Removal
The boxplot is an effective method for detecting outliers, defined as values smaller than QL − 1.5·IQR or larger than QU + 1.5·IQR (QL is the lower quartile, QU the upper quartile, and IQR = QU − QL). We draw a boxplot of price with the outliers highlighted in red, and then remove them.

2.3 New Variable Creation and Variable Transformation
Because no variable indicates the house age or whether the house was renovated, we create a new variable "built_year", expressing the house age as the year sold minus the year built, and a new variable "if_renovated", coded 0 or 1 (0 represents no renovation).

Checking the variables for normality shows that some are significantly skewed to the left or right, which would lead to heteroscedasticity in the regression analysis. For example, "sqft_above" needs a logarithmic transformation because its skewness is 1.068; after the transformation, the skewness declines to 0.1736, approximately meeting the normality requirement. We apply the logarithmic transformation to every variable whose skewness is greater than 1.

In this housing-price dataset, the variables are measured in different units. To eliminate the adverse effect of these differing scales, all variables are standardized.

2.4 Correlation Analysis
Because of the large number of variables, a correlation chart was made for the variables whose correlation with price is greater than 0.3. Price has the strongest correlations with grade and house area, and is also correlated with latitude and the number of bathrooms.

3 METHODS
3.1 OLS Linear Regression
3.1.1 Theory of OLS Linear Regression. The linear regression model assumes the regression function E(Y|X) is linear in the input variables X_1, X_2, ..., X_p; that is, for an input vector X^T = (X_1, X_2, ..., X_p) and the output variable y to be predicted, it takes the form

f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j    (1)

where the \beta_j are unknown parameters to be estimated. Each feature variable X_j can come from:


Figure 2: Distribution of Sqft_basement

Figure 3: Correlation of Variables Greater than 0.3

(1) Quantitative input variables.
(2) Transformations of quantitative input variables, such as logarithmic or square-root transformations.
(3) Polynomial expansions of basis functions, such as X_2 = X_1^2, X_3 = X_1^3.
(4) Sorting or dummy encoding of categorical variables. For example, if X_j is a qualitative input variable with five levels, the levels can be coded 1, 2, ..., 5, or one-hot encoded as vectors of 0s with a single 1.
(5) Cross effects of variables, such as X_3 = X_1 · X_2.

This research sets the house price as the dependent variable Y and the variables after data processing as the independent variables X_j. The scikit-learn package in Python is used to divide the data into an 80% training set and a 20% test set. The least-squares method then gives an optimal solution for the parameter vector \beta = (\beta_0, \beta_1, \ldots, \beta_p), minimizing the loss

L(Y, f(X)) = \sum_{i=1}^{n} (y_i - f(x_i))^2
           = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Bigr)^2
           = (y - X\beta)^T (y - X\beta)    (2)

Because X has full column rank, taking the partial derivatives of L(Y, f(X)) and setting the first-order differential to 0 gives the closed-form solution

\hat{\beta} = (X^T X)^{-1} X^T y    (3)

For an input feature vector x_0, the predicted value is then \hat{f}(x_0) = (1 : x_0)^T \hat{\beta}.

When the number of feature variables is large, the linear regression model often has small bias but large variance, i.e., it overfits. To improve the accuracy of the model, some fit is sacrificed by shrinking the coefficients of weakly explanatory features to 0. This study chooses lasso regression,


Table 3: Outcome of OLS Linear Regression

Dep. Variable:     price          R-squared:           0.687
Model:             OLS            Adj. R-squared:      0.687
Method:            Least Squares  F-statistic:         1770
No. Observations:  16118          Prob (F-statistic):  0
Df Residuals:      16097          Log-Likelihood:      -13460
Df Model:          20             AIC:                 2.70E+04
Covariance Type:   nonrobust      BIC:                 2.71E+04

Table 4: Sort of Feature Importance

coef std err t P>|t| [0.025 0.975]


if_renovated -4.163 0.542 -7.680 0.000 -5.225 -3.101
built_year 3.195 0.442 7.230 0.000 2.329 4.061
grade 0.373 0.007 51.592 0.000 0.359 0.387
lat 0.372 0.005 79.101 0.000 0.363 0.381
ln_sqft_living15 0.168 0.007 23.668 0.000 0.154 0.181
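The OLS workflow behind Tables 3 and 4 (an 80/20 split via scikit-learn, a least-squares fit, and R² checked on the held-out set) can be sketched as below; the data are synthetic, so the scores will not match the paper's 0.687/0.682:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 1000 samples, 5 standardized features, linear signal.
X = rng.normal(size=(1000, 5))
beta = np.array([0.4, 0.3, 0.2, 0.1, 0.05])
y = X @ beta + rng.normal(scale=0.3, size=1000)

# 80% training set / 20% test set, as in Section 3.1.1.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ols = LinearRegression().fit(X_tr, y_tr)
print(round(ols.score(X_te, y_te), 3))  # out-of-sample R^2
```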

Figure 4: Bagging Structure

which adds a penalty term to the loss function to avoid overfitting:

L(Y, f(X)) = \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \sum_{j=1}^{p} |\beta_j|    (4)

3.1.2 Results of OLS Linear Regression. The training set was modeled and the results are shown in Table 3. The significance level is passed, but R² is only 0.687, so the fit is unsatisfactory, and the accuracy verified on the testing set is only 0.682.

The important features are recognized in Table 4. The result suggests that, under OLS, consumers place more emphasis on house condition and grade than on spatial characteristics.

3.2 Random Forest
3.2.1 Theory of Random Forest Regression. Random forest is an ensemble-learning algorithm belonging to the bagging family. It combines several weak learners and votes on or averages their outputs, so that the overall model attains high accuracy and generalization performance. Its good performance is mainly attributed to "randomness" and "forest": the former makes it resistant to overfitting, the latter makes it more accurate.

Besides the bagging algorithm, random forest applies a unique approach to node splitting. In traditional classification and regression trees, node splitting is conducted using all the predictors, whereas in random forest a random subset of the predictors is considered at each node. When a tree is constructed, roughly one third of the observations are never used by that individual tree; these held-out cases are called out-of-bag (OOB). In the OOB approach, the errors on the data excluded from each regression tree give the random forest an estimate of the relative strength of, and correlation among, the trees [12].

3.2.2 Secondary Cross Validation. In machine learning, generalization error is used to measure the accuracy of a prediction model. When the random forest model is too complex, overfitting may appear, leading to large generalization error. Conversely, when the model is too simple, underfitting occurs, again with large generalization error.
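The out-of-bag mechanism described in Section 3.2.1 is exposed directly by scikit-learn: with oob_score=True, each tree's held-out samples yield a generalization estimate without a separate validation split. A sketch on synthetic nonlinear data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic nonlinear target, the kind of relationship Section 3.2.1
# credits random forest with capturing.
X = rng.uniform(-2.0, 2.0, size=(800, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=800)

# Each tree trains on a bootstrap sample; the ~1/3 of observations it
# never sees (out-of-bag) score it without a hold-out split.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(round(rf.oob_score_, 3))  # OOB R^2, an estimate of generalization
```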


Table 5: Value Range of Secondary Cross Validation Parameters

Parameter          First Time           Second Time  Final
n_estimators       [1, 200]             [167, 176]   170
min_samples_split  [2, 5, 10]           [3, 9]       5
min_samples_leaf   [1, 2, 4]            [2, 3]       2
max_features       ['auto', 'sqrt']     ['auto']     'auto'
max_depth          [10, 100] or [None]  [76, 84]     84
bootstrap          [True, False]        [True]       True
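The secondary (two-stage) search in Table 5 can be sketched with scikit-learn's GridSearchCV: a coarse pass over wide ranges, then a refined pass around the first winner. The grids and data here are small and illustrative, not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)

# Stage 1: coarse grid over wide ranges (cf. the "First Time" column).
coarse = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [50, 100, 200], "min_samples_split": [2, 5, 10]},
    cv=3,
)
coarse.fit(X, y)
best_n = coarse.best_params_["n_estimators"]

# Stage 2: refine around the stage-1 winner (cf. the "Second Time" column).
fine = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [max(10, best_n - 25), best_n, best_n + 25],
     "min_samples_split": [2, 3, 5]},
    cv=3,
)
fine.fit(X, y)
print(fine.best_params_)  # best combination found by the second pass
```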

Therefore, parameters such as max_depth and min_samples_leaf are adjusted to reduce the depth, branching, and complexity of the trees.

An optimal combination of parameters significantly improves the performance of a random forest model. This study uses 17290 of the 21613 samples as the training set and the remaining 4323 samples to test the model. Training and testing sets are selected randomly according to the municipality to which each residential property belongs. During cross validation, 80% of the original training data are used to train the model and 20% to validate it. Because there are many parameters with large value ranges to test, secondary cross-validation is used to select the best combination of parameters.

3.2.3 Results of Random Forest. Using the optimal parameters to build the random forest model, the final accuracy is 0.8589, which exceeds the OLS model.

Table 6: Sort of Feature Importance

Variable          Importance
lat               0.44
ln_sqft_living    0.32
grade             0.06
long              0.05
ln_sqft_living15  0.03
ln_sqft_lot       0.02
ln_sqft_lot15     0.02
bathrooms         0.01
view              0.01

Figure 5: Sort of Variables Importance using Random Forest

Figure 5 orders the feature variables by importance according to the random forest algorithm. The dotted red line marks a cumulative importance of 0.95. Four of the top five variables are related to geographical characteristics (latitude, sqft_living, longitude, and sqft_living15).

4 CONCLUSION
In terms of important-feature ranking, although the two models share similar features, OLS ranks house age and renovation first, while the random forest focuses on spatial and geographical features. In terms of price prediction, the random forest's fit is clearly more practical than OLS's when facing complex and nonlinear data. Therefore, when the dataset has a large number of observations, or when the samples are complex and noisy, we recommend choosing random forest.

REFERENCES
[1] Shekarian, E., & Fallahpour, A. Predicting house price via gene expression programming. International Journal of Housing Markets and Analysis, 6(3) (2013), 250-268.
[2] Thibodeau, T. G. Marking Single-Family Property Values to Market. Real Estate Economics, 31(1) (2003), 1-22.
[3] Selim, H. Determinants of house prices in Turkey: hedonic regression versus artificial neural network. Expert Systems with Applications, 36(2) (2009), 2843-2852.
[4] Bourassa, S. C., Cantoni, E., & Hoesli, M. Predicting house prices with spatial dependence: A comparison of alternative methods. The Journal of Real Estate Research, 32(2) (2010), 139-160.
[5] Anderson, S. T., & West, S. E. Open space, residential property values, and spatial context. Regional Science and Urban Economics, 36 (2006), 773-789.
[6] Crompton, J. L. The impact of parks on property values: A review of the empirical evidence. Journal of Leisure Research, 33(1) (2001), 1-31.
[7] Dehring, C., & Dunse, N. Housing density and the effect of proximity to public open space in Aberdeen, Scotland. Real Estate Economics, 34(4) (2006), 553-566.
[8] Antipov, E. A., & Pokryshevskaya, E. B. Mass appraisal of residential apartments: An application of Random forest for valuation and a CART-based approach for model diagnostics. Expert Systems with Applications, 39(2) (2012), 1772-1778.
[9] Hong, J., Choi, H., & Kim, W. S. A house price valuation based on the random forest approach: the mass appraisal of residential property in South Korea. International Journal of Strategic Property Management, 24(3) (2020), 140-152.
[10] Yoo, S., Im, J., & Wagner, J. E. Variable selection for hedonic model using machine learning approaches: A case study in Onondaga County, NY. Landscape and Urban Planning, 107(3) (2012), 293-306.
[11] Čeh, M., Kilibarda, M., Lisec, A., & Bajat, B. Estimating the performance of random forest versus multiple regression for predicting prices of the apartments. ISPRS International Journal of Geo-Information, 7(5) (2018), 168.
[12] Breiman, L. Random forests. Machine Learning, 45(1) (2001), 5-32.

