Report Capstone Project House Price Prediction
House Price Prediction
Problem statement:
A house's value is more than just its location and square footage. Like the features that make up a person, an educated party wants to know all the aspects that give a house its value. For example, suppose you want to sell a house and you do not know what price to expect: it cannot be too low or too high. To estimate the price, you would usually look for similar properties in your neighbourhood and, based on the gathered data, assess the value of your own house.
Objective:
Take advantage of all of the feature variables described below and use them to analyse and predict house prices.
1. cid: a notation for a house
2. dayhours: Date house was sold
3. price: Price is prediction target
4. room_bed: Number of Bedrooms/House
5. room_bath: Number of bathrooms/bedrooms
6. living_measure: square footage of the home
7. lot_measure: square footage of the lot
8. ceil: Total floors (levels) in house
9. coast: House which has a view to a waterfront
10. sight: Has been viewed
11. condition: How good the condition is (Overall)
12. quality: grade given to the housing unit, based on grading system
13. ceil_measure: square footage of house apart from basement
14. basement_measure: square footage of the basement
15. yr_built: Built Year
16. yr_renovated: Year when house was renovated
17. zipcode: zip
18. lat: Latitude coordinate
19. long: Longitude coordinate
20. living_measure15: Living room area in 2015 (implies some renovations); this might or might not have affected the lot size area
21. lot_measure15: Lot size area in 2015 (implies some renovations)
22. furnished: Based on the quality of room
23. total_area: Measure of both living and lot
TABLE OF CONTENTS
List of Graphs 4
List of Graphs
Graph No. Graph Title Page No.
2.2.4 Bar graph for Univariate 17
2.2.4 Scatter plot for Bivariate 18
2.2.4 Heat map for Multi-variate 18
2.2.2.1 Histogram plot 19
2.2.2.2 Box plot 19
2.2.2.3 Correlation between variables 20
3.1 Scatter plot for Linear regression model 21
3.1 Distplot for Linear regression model 22
3.2 Scatter plot for Ridge regression model 23
3.2.1 Distplot for Ridge regression model 23
3.3 Scatter plot for Lasso regression 24
3.4 Scatter plot for Support Vector Regression 25
3.4 Distplot for Support Vector Regression 25
3.5 Scatter plot for Random forest regressor 26
3.5 Distplot for Random forest regressor 27
List of Tables
Table No. Table Title Page No.
1 Model Evaluation Comparison between all models 27
CHAPTER 1
Exploratory Data Analysis (EDA) is an important step in any data analysis or data science project. EDA involves generating summary statistics for the numerical data in the dataset and creating various graphical representations to understand the data better. The goal of EDA is to identify patterns, anomalies, and relationships in the data that can inform subsequent steps in the data science process, such as building models or identifying insights. EDA helps us look at data before making any assumptions. It can help identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables. It also helps answer questions about standard deviations, categorical variables, and confidence intervals. Once EDA is complete and insights are drawn, its findings can be used for more sophisticated data analysis or modelling, including machine learning.
Data scientists can use exploratory analysis to ensure the results they produce are valid and applicable to the desired business outcomes and goals. EDA also helps stakeholders by confirming they are asking the right questions.
In this report, we carry out EDA on the house price dataset described above, using the Python language. We use the Pandas, NumPy, matplotlib, seaborn, and opendatasets libraries. The dataset is loaded into a data frame and read with pandas; we view the columns and rows of the data, perform descriptive statistics to better understand the features in the dataset, record observations, and find the missing values and duplicate rows. We then discover anomalies in the data and remove them. Univariate visualization summarizes each field in the raw dataset with summary statistics. Bivariate visualizations and summary statistics allow us to assess the relationship between each variable in the dataset and the target variable we are looking at. Predictive models, such as linear regression, then use statistics and data to predict outcomes.
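A minimal sketch of these loading and inspection steps, assuming the dataset is stored in a CSV file named house_data.csv (the file name is an assumption, not the project's actual path):

import pandas as pd

# "house_data.csv" is a hypothetical file name for the house price dataset
df = pd.read_csv("house_data.csv")

print(df.shape)               # number of rows and columns
print(df.head())              # first few rows
df.info()                     # column types and non-null counts
print(df.describe())          # descriptive statistics for the numeric columns

print(df.isnull().sum())      # missing values per column
print(df.duplicated().sum())  # number of duplicate rows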
We plot graphs of different attributes of the dataset and analyse the given data. We then apply several regression algorithms to understand which one best fits the dataset for house price prediction, using model metrics such as Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, and R-Squared. These metrics are analysed for all algorithms in the form of a table to identify the best fit.
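As an illustration only, such a comparison table could be produced along the following lines; the model settings and column handling here are assumptions rather than the project's actual code:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv("house_data.csv")                        # hypothetical file name
X = df.drop(columns=["price"]).select_dtypes("number")    # numeric features only
y = df["price"]
# assumes missing values have already been handled in preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(),
    "Lasso regression": Lasso(),
    "Support Vector Regression": SVR(),
    "Random forest regressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    rows.append({"Model": name,
                 "MAE": mean_absolute_error(y_test, pred),
                 "MSE": mse,
                 "RMSE": np.sqrt(mse),
                 "R2": r2_score(y_test, pred)})

print(pd.DataFrame(rows))     # the comparison table of model metrics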
Some of the most common data science tools used for EDA are Python and Jupyter Notebook. The common packages used are pandas, NumPy, matplotlib, seaborn, etc.
One important benefit of conducting exploratory data analysis is that it can help you organize a
dataset before you model it. This can help you start to make assumptions and predictions about
your dataset. Another benefit of EDA is that it can help you understand the variables in your
dataset. This can help you organize your dataset and begin to pinpoint relationships between
variables, which is an integral part of data analysis.
Conducting EDA can also help you identify the relationships between the variables in your
dataset. Identifying the relationships between variables is a critical part of drawing conclusions
from a dataset.
Another important benefit of EDA is helping you choose the right model for your dataset. You can use all of the information gained from conducting an EDA to help you choose a data model. Choosing the right data model is important because it makes it easier for everyone in your organization to understand your data, and there are several commonly used data models to choose from.
You can also use EDA to help you find patterns in a dataset. Finding patterns in a dataset is
important because it can help you make predictions and estimations. This can help your
organization plan for the future and anticipate problems and solutions.
The real estate sector is an important industry with many stakeholders, ranging from regulatory bodies to private companies and investors. Among these stakeholders there is a high demand for a better understanding of the industry's operational mechanisms and driving factors. Today a large amount of data is available on the relevant statistics as well as on additional contextual factors, and it is natural to try to make use of it in order to improve our understanding of the industry.
Suppose we want to build a data science project for house price prediction. Before we build a model on this data, we have to analyse all the information present across the dataset, such as the price of the house, the price actually obtained, the area of the house, and the living measures. All these steps of analysing and modifying the data come under EDA.
Exploratory Data Analysis (EDA) is an approach that is used to analyze the data and discover
trends, patterns, or check assumptions in data with the help of statistical summaries and
graphical representations.
The main goal of the project is to produce accurate price predictions for houses and properties in the upcoming years. The step-by-step process involved is as follows:
1. Requirement gathering: gather the data and extract the key information from it.
2. Normalizing the data (see the sketch after this list).
3. Detecting outliers in the data.
4. Analysis and visualization of the data.
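A small illustrative sketch of the normalization step, using some of the numeric columns named in the objective; the file name and column selection are assumptions:

import pandas as pd
from sklearn.preprocessing import StandardScaler   # MinMaxScaler is an alternative

df = pd.read_csv("house_data.csv")                  # hypothetical file name
num_cols = ["living_measure", "lot_measure", "total_area"]

# scale the selected columns to zero mean and unit variance
scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])
print(df[num_cols].describe())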
Types of EDA
Depending on the number of columns we are analyzing, we can divide EDA into three types.
1. Univariate Analysis – In univariate analysis, we analyze or deal with only one variable at a
time. The analysis of univariate data is thus the simplest form of analysis since the
information deals with only one quantity that changes. It does not deal with causes or
relationships and the main purpose of the analysis is to describe the data and find patterns
that exist within it.
2. Bi-Variate analysis – This type of data involves two different variables. The analysis of
this type of data deals with causes and relationships and the analysis is done to find out the
relationship between the two variables.
3. Multivariate Analysis – When the data involves three or more variables, it is categorized
under multivariate.
Depending on the type of analysis, we can also subcategorize EDA into two parts: non-graphical analysis, based on summary statistics, and graphical analysis, based on visualizations.
Data Encoding
Some models, such as Linear Regression, do not work with categorical data. In that case we should encode the categorical columns into numerical columns. We can use different methods for encoding, such as Label Encoding or One-Hot Encoding. Both pandas and sklearn provide functions for encoding; in our case we use the Label Encoding function from sklearn.
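A short sketch of Label Encoding with sklearn as described above; the choice of zipcode as the example column and the file name are assumptions:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("house_data.csv")                   # hypothetical file name

# Label encoding: map each category to an integer
le = LabelEncoder()
df["zipcode_encoded"] = le.fit_transform(df["zipcode"])

# One-hot encoding with pandas, shown for comparison
df = pd.get_dummies(df, columns=["zipcode"], prefix="zip")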
The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors. Forecasting property prices is therefore an important module in decision making for both buyers and investors, supporting budget allocation, property-finding strategies, and the determination of suitable policies. Hence it has become one of the prime fields in which to apply machine learning concepts to optimize and predict prices with high accuracy. The literature review gives a clear idea of the field and serves as support for future projects. Most authors have concluded that artificial neural networks have the greatest influence in price prediction, but in the real world there are other algorithms that should also be taken into consideration. Investors' decisions are based on market trends so as to reap maximum returns. Developers are interested in knowing future trends for their decision making; this helps them understand the pros and cons and also helps them plan projects. To accurately estimate property prices and future trends, a large amount of data that influences land prices is required for analysis, modelling, and forecasting. The factors that affect the land price have to be studied, and their impact on price also has to be modelled. It has been inferred that establishing a simple linear regression relationship for such time-series data is not viable for prediction. Hence it becomes imperative to establish a non-linear model that fits the data characteristics well enough to analyse and predict future trends. As real estate is a fast-developing sector, the analysis and prediction of land prices using mathematical modelling and other techniques is an urgent need for decision making by all those concerned.
2.1.1 Implementation
An important part of data preprocessing, and one of its main problems, is handling missing values in the dataset. Data scientists must manage missing values because they can adversely affect the operation of machine learning models. Data imputation is one such procedure: missing values are filled in based on the other observations.
Missing values can be inferred by averaging techniques such as the mean, median, or mode. Missing values are usually represented as 'nan', 'NA', or 'null' (refer to Image 5). Below is the list of variables with missing values in the train dataset.
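That list is obtained directly from the data (for example with df.isnull().sum()). The averaging techniques mentioned above could be applied roughly as follows; the column choices are illustrative assumptions:

import pandas as pd

df = pd.read_csv("house_data.csv")                   # hypothetical file name

# numeric columns: impute with the mean or the median
df["living_measure"] = df["living_measure"].fillna(df["living_measure"].mean())
df["lot_measure"] = df["lot_measure"].fillna(df["lot_measure"].median())

# categorical-like columns: impute with the mode (most frequent value)
df["zipcode"] = df["zipcode"].fillna(df["zipcode"].mode()[0])

print(df.isnull().sum())      # verify these columns no longer have missing values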
Uni-Variate: For univariate analysis in house price prediction we chose an attribute such as price, because price can be examined on its own, independently of the other variables.
Bi-Variate: For bivariate analysis we chose attributes such as price and living_measure, because the price is largely determined by living_measure, so these two variables are dependent on each other.
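An illustrative sketch of these univariate, bivariate, and multivariate plots with seaborn; the styling is an assumption, and the newer histplot is used in place of the deprecated distplot:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("house_data.csv")                   # hypothetical file name

# univariate: distribution of the target variable price
sns.histplot(df["price"], kde=True)
plt.show()

# bivariate: price against living_measure
sns.scatterplot(x="living_measure", y="price", data=df)
plt.show()

# multivariate: correlation heat map of the numeric columns
sns.heatmap(df.select_dtypes("number").corr(), cmap="coolwarm")
plt.show()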
A linear regression model shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
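A minimal sketch of fitting such a model with scikit-learn; using living_measure as the single independent variable is an assumption made only for illustration:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv("house_data.csv")                   # hypothetical file name
X = df[["living_measure"]]                           # independent variable
y = df["price"]                                      # dependent variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)
print(lr.coef_, lr.intercept_)                       # slope and intercept of the fitted line
y_pred = lr.predict(X_test)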
Mean Squared Error (MSE): MSE is like the MAE, but the difference is that it squares the difference between the actual and predicted output values before summing them, instead of using the absolute value.
Root Mean Squared Error (RMSE): RMSE provides information about the short-term performance of a model by allowing a term-by-term comparison of the actual difference between the estimated and the measured value.
R-Squared (R2): The R-Squared metric is generally used for explanatory purposes and provides an indication of the goodness of fit of a set of predicted output values to the actual output values.
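For reference, these metrics can be written out explicitly with NumPy; y_true and y_pred are assumed to be arrays of actual and predicted prices:

import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average absolute difference
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of the MSE
    return np.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    # R-Squared: 1 minus the ratio of residual to total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot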
CHAPTER 4
The experiment is done to pre-process the data and evaluate the prediction accuracy of the
models. The experiment has multiple stages that are required to get the prediction results.
These stages can be defined as:
Pre-processing: The dataset is checked and pre-processed using the chosen methods. These methods handle the data in various ways, so preprocessing is done over multiple iterations, and each time the accuracy is evaluated with the combination used.
Data splitting: Dividing the dataset into two parts is essential so that the model can be trained on one part and evaluated on the other. The dataset is split into 80% for training and 20% for testing.
Evaluation: The accuracy is evaluated by measuring R2 and RMSE when training the model, alongside a comparison of the actual prices in the test dataset with the prices predicted by the model.
Performance: Alongside the evaluation metrics, the time required to train each model is measured to show how the algorithms vary in terms of time.
Correlation: The correlation between the available features and the house price is evaluated using the Pearson correlation coefficient to identify whether each feature has a negative, positive, or zero correlation with the house price.
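A sketch of the splitting, evaluation, and correlation stages described above; the 80/20 split and the price target come from the text, while the file name and the model used here are assumptions:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("house_data.csv")                   # hypothetical file name
X = df.drop(columns=["price"]).select_dtypes("number")
y = df["price"]

# 80% of the data for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("R2  :", r2_score(y_test, pred))

# Pearson correlation of every numeric feature with the house price
print(X.corrwith(y).sort_values(ascending=False))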
Pre-processing methods played a significant role in the final prediction accuracy, as shown in the experiment sequence on both the public and the local data. Removing outliers with the suggested method gave a worse outcome than Isolation Forest, which improved the prediction accuracy.
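A hedged sketch of outlier removal with Isolation Forest as mentioned above; the contamination rate is an assumption, not a tuned value:

import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("house_data.csv")                   # hypothetical file name
numeric = df.select_dtypes("number")

iso = IsolationForest(contamination=0.01, random_state=42)
labels = iso.fit_predict(numeric)                    # -1 marks an outlier, 1 an inlier

df_clean = df[labels == 1]
print(len(df), "->", len(df_clean), "rows after outlier removal")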
The performance of the trained models has been measured by evaluating the RMSE, R2, MAE, and MSE metrics. The accuracy has been evaluated by plotting the actual prices against the predicted values, as shown below.
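A minimal sketch of such a plot; y_test and pred are assumed to come from a model that has already been trained, as in the sketches above:

import matplotlib.pyplot as plt

def plot_actual_vs_predicted(y_test, pred):
    # scatter the predicted prices against the actual prices
    plt.scatter(y_test, pred, alpha=0.3)
    lo, hi = min(y_test), max(y_test)
    plt.plot([lo, hi], [lo, hi], color="red")        # ideal 45-degree line
    plt.xlabel("Actual price")
    plt.ylabel("Predicted price")
    plt.title("Actual vs predicted house prices")
    plt.show()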
4.3 Recommendation based on findings
Future work on this study could be divided into several main areas to improve the results even further. This can be done by:
Experimenting with different combinations of pre-processing methods, since the methods used do help the prediction accuracy and other combinations may achieve better results.
Making better use of the available features, and combining them where possible, since binning features has been shown to improve the data.
Training the datasets with different regression methods, such as Elastic Net regression, which combines both the L1 and L2 norms, in order to expand the comparison and check the performance (see the sketch after this list).
Enhancing the local data, since the correlation analysis has shown the associations in the local data; it needs to be made rich with features that vary and can provide strong correlation relationships.
Adding more factors that affect the house price, such as GDP, average income, and population, to the local dataset, since the factors studied here have a weak correlation with the sale price. Increasing the number of factors that have an impact on house prices could also lead to better findings for questions 1 and 2.
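A sketch of the Elastic Net suggestion from the list above; the alpha and l1_ratio values are placeholders rather than tuned results, and the file name is an assumption:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("house_data.csv")                   # hypothetical file name
X = df.drop(columns=["price"]).select_dtypes("number")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# l1_ratio blends the L1 (Lasso) and L2 (Ridge) penalties
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_train, y_train)
pred = enet.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("R2  :", r2_score(y_test, pred))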