A

Project Report (Stage-1)


On

“PREDICTING HOUSE PRICES USING DECISION TREE”

Submitted in Partial Fulfillment of the Requirements for the
Final Year of the Degree
of

Bachelor of Technology
in

COMPUTER SCIENCE AND ENGINEERING


to

Sandip University, Sijoul, Madhubani, Bihar


Submitted by
210205131071-MD HIFZULLAH
210205131072-KUNDAN KUMAR
210205131091-MANISHA KUMARI
210205131077-SANDHYA KUMARI

Under the Guidance of


Dr. Shambhu Kumar Singh

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

SANDIP UNIVERSITY
NeelamVidyaVihar, Sijoul, Mailam, Madhubani, BIHAR – 847235
Website: http://www.sandipuniversity.edu.in Email: info@sandipuniversity.edu.in

Session: 2021-2025

==================================================

CERTIFICATE

This is to certify that the Project entitled “Predicting House Prices Using Decision Tree”,
submitted by Md Hifzullah (PRN-210205131071), Kundan Kumar (PRN-210205131072),
Sandhya Kumari (PRN-210205131077), and Manisha Kumari (PRN-210205131091) in partial
fulfillment of the degree of Bachelor of Technology in Computer Science and Engineering,
has been satisfactorily carried out under my guidance as per the requirements of the
University.

Date:
Place: Sijoul

Guide HOD Associate Dean


Acknowledgements
We are grateful to present this project report on “Predicting House Prices Using
Decision Tree”. Working on this project was a valuable experience.
First of all, we would like to thank Dr. Shambhu Kumar Singh, Dean of SOCSE and our
project coordinator, for his invaluable guidance, support, and feedback throughout this project.
His expertise was instrumental in improving the quality of our research and report.

We would also like to extend our gratitude to the respected Head of the Department,
Computer Science and Engineering, Prof. Aishwarya Shekhar; the Dean Academics, SOCSE,
Dr. Shambhu Kumar Singh; and the Hon’ble Vice Chancellor, Dr. Samir Kumar Varma, for
providing us with all the facilities that were required.

Lastly, we would like to thank all the faculty members of the CSE department, our friends,
and everyone who helped us throughout this project.

Date:
Place:

Md Hifzullah (PRN-210205131071),

Kundan Kumar (PRN-210205131072),

Sandhya Kumari (PRN-210205131077),

Manisha Kumari (PRN -210205131091)


INDEX

Sr. No. Chapter Name

1 List of Figures

2 List of Tables

3 List of Acronyms

4 Abstract

5 Chapter-1:Introduction

6 Chapter-2:Project Management

7 Chapter-3:Project Planning

8 Chapter-4:Requirement Analysis Specification

9 Chapter-5:Design

10 Chapter-6:Conclusion and Future Work

11 References
LIST OF FIGURES

Sr. No. Figure Name

1 Distribution of House Prices by University

2 Decision Tree Structure for Sale Price Prediction

3 Correlation Heatmap of Predictive Features


LIST OF TABLES

Sr. No. Table Name

1 Summary Statistics of Real Estate Dataset

2 Feature Importance Rankings

3 Model Performance Metrics


LIST OF ACRONYMS

Sr. No. Acronym Details

1 Rs.: Indian Rupees

2 Sqft: Square Feet

3 NIT: National Institute of Technology

4 IIT: Indian Institute of Technology


Abstract

The Indian real estate market is shaped by a complex interplay of factors such as
location, property size, age, and proximity to educational institutions. This report
uses a Decision Tree regression model to predict house prices, taking into account
local context such as proximity to IIT and NIT campuses, typical Indian
infrastructure, and real estate costs. We analyze a dataset of 10,659 housing sales
across Indian regions. The report identifies the most significant predictors of sale
price and offers actionable insights for buyers, sellers, and investors in the
Indian real estate sector.
Chapter-1:Introduction

The Indian real estate market plays a pivotal role in the economy and societal
development. With urbanization and infrastructure growth, the need for accurate
real estate valuation has never been more critical. Predicting house prices with high
precision enables stakeholders, including buyers, sellers, developers, and
policymakers, to make well-informed decisions. This study focuses on predicting
house prices using Decision Tree models, emphasizing regional differences and the
influence of proximity to premier institutions like IITs and NITs. Key factors such
as property size, age, and location are analyzed to determine their effect on
pricing trends.
Chapter-2:Project Management

Efficient project management was crucial in ensuring the success of this research.
The study involved systematic data collection and analysis. Key steps included:

1. “Identifying Reliable Data Sources”: Data was sourced from Indian real estate
platforms and government housing records to ensure reliability.
2. “Data Cleaning and Preprocessing”: Missing and redundant information was
handled carefully to maintain data integrity.
3. “Algorithm Implementation”: The Decision Tree algorithm was chosen for its
interpretability and robustness in capturing non-linear relationships.
4. “Model Optimization”: The model parameters were fine-tuned using grid search
to achieve balanced performance.
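
The grid search mentioned in step 4 could look roughly like the sketch below. It is only an illustration: the synthetic data, the two feature columns, and the candidate parameter values are assumptions and are not the grid or dataset used in this project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): price depends on size and age plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(low=[500, 0], high=[4000, 50], size=(300, 2))  # sqft, age
y = 3000 * X[:, 0] - 20000 * X[:, 1] + rng.normal(0, 200000, size=300)

# Candidate hyperparameter values; this grid is an illustrative assumption.
param_grid = {
    "max_depth": [3, 5, 8],
    "min_samples_split": [2, 10, 20],
}

# Every combination in the grid is scored with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated R2:", round(search.best_score_, 3))
```

Cross-validated scoring selects parameters without touching the held-out test set, which keeps the final test metrics an honest estimate.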
Chapter-3:Project Planning

The project followed a structured timeline to ensure a seamless execution:

- “Week 1-2”: Data sourcing and preparation.
- “Week 3-4”: Exploratory data analysis and feature engineering to uncover
patterns.
- “Week 5-6”: Model development and evaluation using training and testing
datasets.
- “Week 7”: Insights generation, report preparation, and visualization of results.

Step-by-step process to predict house prices using decision tree models:

Step 1: Collect and Understand the Data


Goal: Gather a dataset containing features like house size, location, number of
rooms, etc., along with their respective prices.
Example: Use datasets like the Boston Housing Dataset or Kaggle datasets.
Tip: Ensure the data is in a structured format (e.g., CSV file).

Step 2: Install Necessary Libraries


Install Python libraries like pandas, numpy, scikit-learn, and matplotlib using:
pip install pandas numpy scikit-learn matplotlib

Step 3: Load the Data


Use pandas to load the dataset into a DataFrame:
import pandas as pd
data = pd.read_csv('house_prices.csv')
print(data.head())

Step 4: Data Preprocessing


Check for Missing Values:
print(data.isnull().sum())
data = data.dropna() # Remove rows with missing values
Handle Categorical Data: Convert non-numeric columns (e.g., location) into
numeric values using encoding:
data = pd.get_dummies(data, drop_first=True)
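
To make the effect of the encoding step concrete, here is a small sketch on a toy table. The column names and city values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
toy = pd.DataFrame({
    "sqft": [1200, 1800, 950],
    "location": ["Kanpur", "Mumbai", "Patna"],
})

# Each category becomes its own indicator column. drop_first=True drops the
# first category (alphabetically "Kanpur"), which then acts as the baseline.
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
```

Numeric columns such as sqft pass through unchanged; only the text column is expanded.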
Step 5: Feature Selection
Separate features (X) and target variable (y):
X = data.drop('price', axis=1) # Features
y = data['price'] # Target

Step 6: Split the Data


Divide data into training and testing sets (e.g., 80% training, 20% testing):
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 7: Train a Decision Tree Model


Use scikit-learn to create and train a decision tree regressor:
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor(max_depth=5, random_state=42)
model.fit(X_train, y_train)

Step 8: Evaluate the Model


Predict house prices on the test data:
y_pred = model.predict(X_test)
Measure performance using metrics like Mean Absolute Error (MAE) or R-squared:
from sklearn.metrics import mean_absolute_error, r2_score
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae}, R2 Score: {r2}")
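
As a quick check on what the two metrics measure, the sketch below computes them by hand on three made-up prices and compares the results with scikit-learn. The numbers are hypothetical and unrelated to the project dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical prices in lakh rupees, for illustration only.
y_true = np.array([50.0, 80.0, 120.0])
y_hat = np.array([55.0, 70.0, 120.0])

# MAE: the mean of the absolute prediction errors.
mae_manual = np.mean(np.abs(y_true - y_hat))

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, mean_absolute_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

MAE is expressed in the same unit as the price, while R-squared is unitless and compares the model against always predicting the mean price.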

Step 9: Optimize the Model


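Adjust hyperparameters such as max_depth and min_samples_split, retrain, and compare
the test metrics from Step 8 to see whether the change helps:
# Example: allow a deeper tree, then re-run the Step 8 evaluation
model = DecisionTreeRegressor(max_depth=10, random_state=42)
model.fit(X_train, y_train)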

Step 10: Make Predictions


Use the trained model to predict prices for new houses:
# Example values: square footage, bedrooms, bathrooms, encoded location.
# Assumes X has exactly these four columns, in this order.
new_data = pd.DataFrame([[2500, 3, 2, 1]], columns=X.columns)
price_prediction = model.predict(new_data)
print(f"Predicted Price: {price_prediction[0]}")

Optional: Visualize the Decision Tree


Visualize the tree to understand the decision-making process:
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 10))
# A plain list of feature names works across scikit-learn versions.
plot_tree(model, feature_names=list(X.columns), filled=True)
plt.show()
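
When a plot is hard to read, the same tree can be printed as text rules. The sketch below uses scikit-learn's export_text on a small synthetic dataset; the feature names and values are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in data (hypothetical feature names and values).
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[500, 1], high=[4000, 6], size=(200, 2))
y_demo = 3000 * X_demo[:, 0] + 100000 * X_demo[:, 1]

# A shallow tree keeps the printed rule set short.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

# Each line is a split condition or a leaf with its predicted value.
rules = export_text(tree, feature_names=["sqft", "bedrooms"])
print(rules)
```

The text form is convenient for including a small tree in a report or for comparing trees between runs.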
Chapter-4:Requirement Analysis Specification

Dataset Overview

- “Number of Records”: 10,659
- “Key Features”:
- Sale Price (Rs.)
- Number of Bedrooms and Bathrooms
- Home Square Footage
- Lot Size (Sqft)
- Age of Property
- Proximity to IITs and NITs

Data Preprocessing

1. “Normalization”: Numerical variables were normalized using Z-score
normalization to ensure uniform scaling.
2. “Feature Engineering”: Categorical variables such as property type and location
were transformed into binary dummy variables.
3. “Removal of Redundancy”: Non-essential columns like "sale year" were
replaced with computed metrics like "property age."
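
Two of the preprocessing steps above can be sketched on a toy table as follows. The column names, including year_built, and all values are hypothetical; the report does not specify how property age was computed, so the subtraction shown is an assumption.

```python
import pandas as pd

# Hypothetical records; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "sqft": [900.0, 1500.0, 2100.0],
    "sale_year": [2022, 2023, 2024],
    "year_built": [2002, 2013, 2019],
})

# Replace the raw sale year with a derived property age at the time of sale.
df["property_age"] = df["sale_year"] - df["year_built"]
df = df.drop(columns=["sale_year", "year_built"])

# Z-score normalization: subtract the column mean, divide by its standard deviation.
# ddof=0 uses the population standard deviation.
df["sqft_z"] = (df["sqft"] - df["sqft"].mean()) / df["sqft"].std(ddof=0)

print(df)
```

Tree splits depend only on the ordering of feature values, so scaling is not expected to change a Decision Tree's predictions; it matters mainly if scale-sensitive models are compared later.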
Chapter-5:Design

Exploratory Data Analysis

The dataset revealed interesting trends in the Indian housing market:

- Most properties were priced between Rs. 50 lakh and Rs. 2 crore, with higher
prices observed near IIT Kanpur and IIT Bombay.
- Square footage emerged as a critical determinant, with larger homes commanding
significantly higher prices.
- Proximity to renowned educational institutions like IITs and NITs influenced
property prices, reflecting the housing demand around these campuses.

Model Development
The Decision Tree model was developed and refined through iterative testing:

- “Initial Model Performance”:
- Training R²: 99% (indicating overfitting)
- Testing R²: 53% (poor generalization)
- “Optimized Model Performance”:
- Training R²: 78%
- Testing R²: 70% (indicating better generalization)
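
The gap between training and testing R² can be reproduced on synthetic data. The sketch below is not the project's model or dataset; the noise level and the constraint values (max_depth=3, min_samples_leaf=20) are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): a noisy price signal driven by size.
rng = np.random.default_rng(1)
X = rng.uniform(500, 4000, size=(1000, 1))
y = 3000 * X[:, 0] + rng.normal(0, 2_000_000, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# An unconstrained tree can memorize the training set, noise included.
deep = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
# Limiting depth and leaf size trades training fit for generalization.
shallow = DecisionTreeRegressor(
    max_depth=3, min_samples_leaf=20, random_state=42
).fit(X_train, y_train)

# score() returns R2 for regressors.
for name, m in [("unconstrained", deep), ("constrained", shallow)]:
    print(name, round(m.score(X_train, y_train), 2), round(m.score(X_test, y_test), 2))
```

A large drop from training to testing R² is the usual sign of overfitting; constraining the tree narrows that gap at the cost of a lower training score.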

Feature Importance
The following features emerged as the most critical predictors:
1. “Home Square Footage”: Larger homes were strongly associated with higher
prices.
2. “Proximity to IIT/NIT Campuses”: Locations near prestigious institutions
consistently fetched premium prices.
3. “Lot Size”: Properties with larger outdoor spaces attracted higher valuations.
4. “Age of Property”: While older properties had mixed effects, their condition and
location often compensated for their age.
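
A ranking like the one above can be read from a fitted tree's feature_importances_ attribute. The following sketch uses synthetic data with hypothetical column names and coefficients, so its ranking says nothing about the project's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical columns and coefficients).
rng = np.random.default_rng(7)
X = pd.DataFrame({
    "sqft": rng.uniform(500, 4000, 400),
    "lot_size": rng.uniform(1000, 8000, 400),
    "property_age": rng.uniform(0, 50, 400),
})
y = 3000 * X["sqft"] + 200 * X["lot_size"] - 10000 * X["property_age"]

model = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X, y)

# feature_importances_ sums to 1; larger values mean the feature contributed
# more to reducing prediction error across the tree's splits.
ranking = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(ranking)
```

Importances describe how the tree used each feature, not the direction of its effect, so they are best read alongside the exploratory analysis.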
Chapter-6:Conclusion and Future Work

This research highlights the factors influencing house prices in the Indian real
estate market. Proximity to educational hubs like IITs and NITs, combined with
property size and amenities, significantly affects pricing dynamics. The optimized
Decision Tree model, with a testing R² of 70%, captured these relationships, and its
insights can support real-world decision-making.

Recommendations and Future Directions:

1. “Expand the Dataset”: Incorporate data from more cities and rural areas to cover
diverse market segments.
2. “Dynamic Market Factors”: Account for variables like demand-supply
dynamics, interest rates, and regional economic policies.
3. “Enhanced Algorithms”: Experiment with ensemble models like Random Forest
and Gradient Boosting for improved prediction accuracy.
4. “Integrate Real-Time Data”: Leverage APIs for real-time housing market trends
to make the model adaptable to market fluctuations.
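
As a starting point for the ensemble comparison suggested in item 3, a sketch such as the following could be used. The data is synthetic and the model settings are assumptions; no claim is made here about which model would perform best on the project dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): size and age drive a noisy price.
rng = np.random.default_rng(3)
X = rng.uniform(low=[500, 0], high=[4000, 50], size=(400, 2))
y = 3000 * X[:, 0] - 20000 * X[:, 1] + rng.normal(0, 1_000_000, size=400)

models = {
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

# Score every model with the same 5-fold cross-validation for a fair comparison.
scores = {}
for name, estimator in models.items():
    scores[name] = cross_val_score(estimator, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R2 = {scores[name]:.3f}")
```

Using identical folds and the same metric for every model keeps the comparison consistent with the evaluation used for the Decision Tree.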

The findings of this study provide a robust foundation for stakeholders to navigate
the Indian real estate sector strategically.
References

1. Indian Real Estate Insights, 2024 Edition.


2. Data Science for Real Estate Prediction by
3. Machine Learning Models for Market Analysis, 2023.
