CS Assignment (Raam Kumar)
ON
(2023 - 2024)
Grade: XII B
VEDIC VIDYASHRAM SENIOR SECONDARY SCHOOL
CERTIFICATE
He has taken proper care and shown utmost sincerity in the completion of this
project as per the guidelines issued by CBSE.
ACKNOWLEDGEMENT
● I sincerely thank Mr. C P ENOSH, M.A., M.Phil., B.Ed., for all his substantial and valuable guidance and moral support, which helped me complete this project with undoubted success.
● At last, I extend thanks with all my heart to the Teaching and Non-Teaching staff.
DECLARATION
duly acknowledged.
CONTENTS
S.NO  TITLE            Pg. no
01.   ACKNOWLEDGEMENT  1
02.   DECLARATION      2
05.   OBJECTIVE        6
09.   CODING           11
11.   CONCLUSION       30
12.   BIBLIOGRAPHY     31
PROBLEM DEFINITION
People looking to buy a new home tend to be conservative with their budgets and market strategies. The existing system involves calculating house prices without the necessary prediction of future market trends and price increases. The goal of this project is to predict efficient house pricing for real estate customers with respect to their budgets and priorities.
PROJECT STAGES
● IMPORTING LIBRARIES AND DATASET
● EXPLORING AND PREPROCESSING THE DATASET
● MODEL IMPLEMENTATION
● MODEL TESTING
OBJECTIVES
High-Level Approach:
EXISTING AND PROPOSED SYSTEM
EXISTING SYSTEM:
There are several approaches that can be used to determine the price of a house; one of them is prediction analysis. The first approach is quantitative prediction.
Mathematical relationships help us understand many aspects of everyday life. When such relationships are expressed with exact numbers, we gain additional clarity. Regression is concerned with specifying the relationship between a single numeric dependent variable and one or more numeric independent variables. House prices increase every year, so there is a need for a system to predict house prices in the future. House price prediction can help the developer determine the selling price of a house and can help the customer choose the right time to purchase a house.
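As a small illustration of such a numeric relationship (the years and prices below are made-up toy values, not taken from this project's dataset), a linear regression can be fitted on past prices and then extrapolated to a future year:

```python
# Sketch: fit a straight line through (year, price) pairs and predict a
# future year.  All numbers here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[2019], [2020], [2021], [2022], [2023]])
prices = np.array([50.0, 53.0, 55.5, 58.0, 61.0])  # price in lakhs (toy data)

model = LinearRegression()
model.fit(years, prices)

# extrapolate the fitted line one year ahead
predicted_2024 = model.predict(np.array([[2024]]))[0]
print(round(predicted_2024, 2))
```

A positive slope (`model.coef_[0]`) here mirrors the statement that house prices increase every year.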
PROPOSED SYSTEM:
Nowadays, everything is shifting from manual to automated systems. The objective of this project is to predict house prices so as to minimize the problems faced by the customer. In the present method, the customer approaches a real estate agent to manage his/her investments and suggest suitable estates. This method is risky, as the agent might suggest the wrong estates, leading to a loss of the customer's investment.
The manual method currently used in the market is outdated and carries high risk. To overcome this fault, there is a need for an updated and automated system. Data mining algorithms can be used to help investors invest in an appropriate estate according to their stated requirements. The new system will also be cost- and time-efficient, with simple operations. The proposed system works on the Linear Regression algorithm.
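Before any model is involved, the "stated requirements" step can be pictured as a simple filter over an estate table. The table, column names and values below are hypothetical, only to show the idea:

```python
import pandas as pd

# Hypothetical estate listings (toy data, not the system's real database).
estates = pd.DataFrame({
    'city':  ['Chennai', 'Chennai', 'Madurai', 'Chennai'],
    'type':  ['flat', 'villa', 'flat', 'flat'],
    'price': [45.0, 120.0, 30.0, 65.0],   # price in lakhs
})

# Keep only estates matching the customer's city, house type and budget.
budget = 70.0
matches = estates[(estates['city'] == 'Chennai')
                  & (estates['type'] == 'flat')
                  & (estates['price'] <= budget)]
print(matches)
```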
REQUIREMENTS
LIBRARIES:
● NumPy
● Pandas
● scikit-learn (sklearn)
● Matplotlib
● Seaborn
SOFTWARE:
● PYTHON 3.7
● MYSQL 5.0
HARDWARE:
● Operating System: Windows or Linux
WORKING DESCRIPTION
The sequence diagram above explains the working of the system. The proposed system is supposed to be a website with three objects, namely: the Customer, the Web Interface and the Database Server. The database server also includes the computational mechanism described in the algorithm.
When the customer first enters the website, they are shown a GUI where they can enter inputs such as the type of house, the area in which it is located, etc. A data index search then provides outputs consisting of matching properties. Now, if the customer wants to check the house price in the future, they can enter a future date. The system will identify the date and categorize it into quarters. The algorithm will then compute the value of the rate and return the results to the customer.
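The date-to-quarter step described above can be sketched with pandas. The rate table and the multiplier logic below are illustrative assumptions, not the project's actual algorithm:

```python
import pandas as pd

# Map a future date entered by the customer to its year-quarter bucket.
future_date = pd.Timestamp('2025-08-15')
quarter = str(future_date.to_period('Q'))   # e.g. '2025Q3'

# Hypothetical per-quarter rate multipliers applied to today's price.
rates = {'2025Q3': 1.04, '2025Q4': 1.05}
current_price = 60.0                        # price in lakhs (toy value)
future_price = current_price * rates[quarter]
print(quarter, future_price)
```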
CODING
# -*- coding: utf-8 -*-
"""house-price-prediction-top-14-xgboost.ipynb

https://colab.research.google.com/drive/16p1a388cb30t6r0sgf6w0tahiwqewtw-

* Applying exploratory data analysis and trying to get some insights about our dataset
* Building and tuning a couple of models to get some stable results on predicting housing prices
"""

import os

# List the dataset files (the Kaggle-style walk over the input directory is
# assumed here; the loop header was lost in extraction)
for dirname, _, filenames in os.walk('../input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
"""We're going to start by loading the data and taking a first look at it, as usual. For the column names we have a great dictionary file in our dataset location, so we can get familiar with them in no time. I highly recommend looking at that before you start working on the dataset."""
import pandas as pd
import numpy as np

df_train = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
df_test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

df_train.head()
df_test.head()
df_train.shape
df_test.shape

"""As we can see, the train set has 1460 rows with 81 columns and the test set has 1459 rows with 80 columns. Our dependent variable is **'SalePrice'**."""

df_train.describe()
df_test.describe()
df_train.columns, df_test.columns
import seaborn as sns
import matplotlib.pyplot as plt

# correlation matrix (the 'cols' selection and the heatmap call were lost
# in extraction and are reconstructed here)
corrmat = df_train.corr()
cols = corrmat.nlargest(10, 'SalePrice')['SalePrice'].index
cm = np.corrcoef(df_train[cols].values.T)
sns.set(font_scale=1.25)
plt.figure(figsize=(10, 10))
sns.heatmap(cm, annot=True, xticklabels=cols.values, yticklabels=cols.values)
plt.show()

# distribution of the target, before and after a log transform
fig_saleprice = plt.figure(figsize=(12, 5))
sns.distplot(df_train['SalePrice'])

df_train['SalePrice'] = np.log(df_train['SalePrice'])
fig_saleprice2 = plt.figure(figsize=(12, 5))
sns.distplot(df_train['SalePrice'])
"""The code below lists the columns most highly correlated with SalePrice, of which OverallQual, GrLivArea, GarageCars, GarageArea, TotalBsmtSF and 1stFlrSF are the strongest."""

# sort every column's correlation with the dependent variable, highest first
corr = df_train.corr()['SalePrice']
corr[np.argsort(corr, axis=0)[::-1]]
"""# **Outliers**

We are going to plot the first 10 highly correlated columns to see how many outliers we have in our dataset.
"""

# Scatter plots of the top correlated features against SalePrice (the
# individual scatter calls were lost in extraction, so they are rebuilt
# here as one loop over the same columns)
for col in ['GrLivArea', 'OverallQual', 'GarageCars', 'GarageArea',
            'TotalBsmtSF', '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt']:
    plt.figure()
    plt.scatter(df_train[col], df_train['SalePrice'])
    plt.xlabel(col, fontsize=13)
    plt.ylabel('SalePrice', fontsize=13)
    plt.show()
# pairwise scatter plots of the selected columns
sns.set()
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars',
        'TotalBsmtSF', 'FullBath', 'YearBuilt']
sns.pairplot(df_train[cols], height=3)
plt.show()

"""Here I have merged some columns just to reduce complexity. I also tried with all the columns, but I did not get as much accuracy as I am getting right now."""
# feature engineering: merge related columns, then drop the originals
df_train['totalsf'] = df_train['TotalBsmtSF'] + df_train['1stFlrSF'] + df_train['2ndFlrSF']
df_train = df_train.drop(columns=['1stFlrSF', '2ndFlrSF', 'TotalBsmtSF'])
df_train['wholeexterior'] = df_train['Exterior1st'] + df_train['Exterior2nd']
df_train = df_train.drop(columns=['Exterior1st', 'Exterior2nd'])
# the merged 'bsmt' column (filled later) is assumed to be the sum of the
# two basement-finish areas; the original line was lost in extraction
df_train['bsmt'] = df_train['BsmtFinSF1'] + df_train['BsmtFinSF2']
df_train = df_train.drop(columns=['BsmtFinSF1', 'BsmtFinSF2'])
df_train = df_train.drop(columns=['FullBath', 'HalfBath'])

df_test['totalsf'] = df_test['TotalBsmtSF'] + df_test['1stFlrSF'] + df_test['2ndFlrSF']
df_test = df_test.drop(columns=['1stFlrSF', '2ndFlrSF', 'TotalBsmtSF'])
df_test['wholeexterior'] = df_test['Exterior1st'] + df_test['Exterior2nd']
df_test = df_test.drop(columns=['Exterior1st', 'Exterior2nd'])
df_test['bsmt'] = df_test['BsmtFinSF1'] + df_test['BsmtFinSF2']
df_test = df_test.drop(columns=['BsmtFinSF1', 'BsmtFinSF2'])
df_test = df_test.drop(columns=['FullBath', 'HalfBath'])
"""**We're going to merge the datasets here before we start editing them, so we don't have to do these operations twice. Let's call the result `df`.**"""

frames = [df_train, df_test]
df = pd.concat(frames, keys=['train', 'test'])

"""There are 2919 observations, including the target variable SalePrice and the Id column. The train set has 1460 observations while the test set has 1459 observations; the target variable SalePrice is absent in test. The aim of this study is to train a model on the train set and use it to predict the target SalePrice of the test set."""

df
df_missing = df.isnull().sum().sort_values(ascending=False)
df_missing

"""Now we separate the categorical columns and the numerical columns for filling missing values."""

cat_col = df.select_dtypes(include=['object'])
cat_col.isnull().sum()
cat_col.columns

num_col = df.select_dtypes(exclude=['object'])  # numeric counterpart (definition was missing)
num_col.columns
"""The cell below handles the numerical columns, where I replace NaN with 0. I also tried the mode, median and mean, but I got the best result with 0. If you want to try those, just fork my notebook and apply those functions.

# Numerical columns
"""
df['LotFrontage'] = df['LotFrontage'].fillna(value=0)
df['GarageYrBlt'] = df['GarageYrBlt'].fillna(value=0)
df['MasVnrArea'] = df['MasVnrArea'].fillna(value=0)
df['BsmtFullBath'] = df['BsmtFullBath'].fillna(value=0)
df['BsmtHalfBath'] = df['BsmtHalfBath'].fillna(value=0)
df['GarageArea'] = df['GarageArea'].fillna(value=0)
df['GarageCars'] = df['GarageCars'].fillna(value=0)
df['BsmtUnfSF'] = df['BsmtUnfSF'].fillna(value=0)
df['bsmt'] = df['bsmt'].fillna(value=0)
df['totalsf'] = df['totalsf'].fillna(value=0)
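The repeated `fillna` calls above all follow one pattern; on a toy DataFrame (hypothetical columns `a` and `b`, not the project's data), the same idea can be written as one loop:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the project's data.
toy = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 2.0, 2.0]})

# Replace NaN with 0 in every listed numerical column.
for col in ['a', 'b']:
    toy[col] = toy[col].fillna(0)

print(toy['a'].tolist())  # [1.0, 0.0, 3.0]
```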
"""I have applied the same technique as for the numerical columns, where I put 0; here I have replaced all the NaN values with 'none'. If the original dataset has a NaN value, it means the particular house does not have that feature. For example, if the house with Id 220 does not have a garage, why should we put values implying that it has one? So I replaced them with 'none'.

# Categorical columns
"""
df['MSZoning'] = df['MSZoning'].fillna(value='none')
df['GarageQual'] = df['GarageQual'].fillna(value='none')
df['GarageCond'] = df['GarageCond'].fillna(value='none')
df['GarageFinish'] = df['GarageFinish'].fillna(value='none')
df['GarageType'] = df['GarageType'].fillna(value='none')
df['BsmtExposure'] = df['BsmtExposure'].fillna(value='none')
df['BsmtCond'] = df['BsmtCond'].fillna(value='none')
df['BsmtQual'] = df['BsmtQual'].fillna(value='none')
df['BsmtFinType2'] = df['BsmtFinType2'].fillna(value='none')
df['BsmtFinType1'] = df['BsmtFinType1'].fillna(value='none')
df['MasVnrType'] = df['MasVnrType'].fillna(value='none')
df['Utilities'] = df['Utilities'].fillna(value='none')
df['Functional'] = df['Functional'].fillna(value='none')
df['Electrical'] = df['Electrical'].fillna(value='none')
df['KitchenQual'] = df['KitchenQual'].fillna(value='none')
df['SaleType'] = df['SaleType'].fillna(value='none')
df['wholeexterior'] = df['wholeexterior'].fillna(value='none')
# one-hot encode the categorical columns, then split back into train/test
# using the concat keys (the df_main definition and the x_train split were
# lost in extraction and are reconstructed here)
df_main = pd.get_dummies(df)

df_test = df_main.loc['test']
df_train = df_main.loc['train']
eid = df_test['Id']

y_train = df_train['SalePrice']
x_train = df_train.drop(columns=['SalePrice'])
df_test = df_test.drop(columns=['SalePrice'])

import xgboost

xgb_model = xgboost.XGBRegressor(learning_rate=0.05,
                                 colsample_bytree=0.5,
                                 subsample=0.8,
                                 n_estimators=1000,
                                 max_depth=5,
                                 gamma=5)
xgb_model.fit(x_train, y_train)
y_pred = xgb_model.predict(df_test)
y_pred

# undo the earlier log transform of SalePrice before writing the submission
main_submission = pd.DataFrame({'Id': eid, 'SalePrice': np.exp(y_pred)})
main_submission.to_csv("submission.csv", index=False)
main_submission.head()
OUTPUT SCREEN
***
cm = np.corrcoef(df_train[cols].values.T)
sns.set(font_scale=1.25)
plt.figure(figsize=(10, 10))
sns.heatmap(cm, annot=True, xticklabels=cols.values, yticklabels=cols.values)
plt.show()
OUTPUT:-
[Correlation heatmap of SalePrice, OverallQual, GrLivArea, GarageCars, GarageArea, TotalBsmtSF, 1stFlrSF, FullBath, TotRmsAbvGrd and YearBuilt]
***
#values of correlation
abs(df_train.corr()['SalePrice']).nlargest(10)
***
OUTPUT:-
SalePrice 1.000000
OverallQual 0.790982
GrLivArea 0.708624
GarageCars 0.640409
GarageArea 0.623431
TotalBsmtSF 0.613581
1stFlrSF 0.605852
FullBath 0.560664
TotRmsAbvGrd 0.533723
YearBuilt 0.522897
Name: SalePrice, dtype: float64
***
#sum of missing data
df.isnull().sum().sort_values(ascending=False)
***
OUTPUT:-
SalePrice :1459
MSZoning: 4
LotFrontage: 486
Alley: 2721
Utilities: 2
Exterior1st: 1
Exterior2nd: 1
MasVnrType: 24
MasVnrArea: 23
BsmtQual: 81
BsmtCond: 82
BsmtExposure: 82
BsmtFinType1: 79
BsmtFinSF1: 79
BsmtFinType2: 80
BsmtFinSF2: 1
BsmtUnfSF: 1
TotalBsmtSF: 1
Electrical: 1
BsmtFullBath: 2
BsmtHalfBath: 2
KitchenQual: 1
Functional: 2
FireplaceQu: 1420
GarageType: 157
GarageYrBlt: 159
GarageFinish: 159
GarageCars: 1
GarageArea: 1
GarageQual: 159
GarageCond: 159
PoolQC: 2909
Fence: 2348
MiscFeature:2814
SaleType: 1
Length: 36, dtype: int64
#encoded
df_main = pd.get_dummies(df)
df_main.shape
***
OUTPUT:-
(2919, 339)
***
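The jump to 339 columns comes from one-hot encoding: every distinct category becomes its own 0/1 column. A tiny example (toy columns, not the real dataset) of what `pd.get_dummies` does:

```python
import pandas as pd

# One text column with two categories expands into two indicator columns.
toy = pd.DataFrame({'zone': ['RL', 'RM', 'RL'], 'area': [8450, 9600, 11250]})
encoded = pd.get_dummies(toy)
print(list(encoded.columns))  # ['area', 'zone_RL', 'zone_RM']
```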
#rmse
***
OUTPUT:-
xgb rmse: 0.1223501568206363
gbr rmse: 0.5585375883105338
rf rmse: 0.43600854434323927
lightgbm rmse: 0.5596622356678556
SVR rmse: 0.5246953605047906
stacked rmse: 0.5026308085477498
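For reference, every RMSE value above is just the square root of the mean squared error between true and predicted values; a minimal sketch with toy numbers:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: sqrt of the average squared difference.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```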
CONCLUSION
In today’s real estate world, it has become tough to store such huge data and extract it for one’s own requirements; the extracted data should also be useful. The system makes optimal use of the Linear Regression algorithm and uses such data in the most efficient way. The linear regression algorithm helps satisfy customers by increasing the accuracy of estate choice and reducing the risk of investing in an estate.
A lot of features could be added to make the system more widely acceptable. One of the major future scopes is adding estate databases of more cities, which will allow the user to explore more estates and reach an accurate decision. More factors that affect house prices, like recession, shall be added. In-depth details of every property will be added to provide ample information about a desired estate. This will help the system run on a larger level.
REFERENCES
● Wikipedia
● https://www.crio.do/
● https://www.geeksforgeeks.org/
● https://www.kaggle.com/
● https://www.github.com/
*********