Predicting House Prices Using Decision Tree

Project Report (Stage-1)


On

“PREDICTING HOUSE PRICES USING DECISION TREE”

Submitted in Partial Fulfillment of the Requirements for the

Final Year of the Degree
of

Bachelor of Technology
in

COMPUTER SCIENCE AND ENGINEERING


to

Sandip University, Sijoul, Madhubani, Bihar


Submitted by
210205131071-MD HIFZULLAH
210205131072-KUNDAN KUMAR
210205131091-MANISHA KUMARI
210205131077-SANDHYA KUMARI

Under the Guidance of


Dr. Shambhu Kumar Singh

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

SANDIP UNIVERSITY
NeelamVidyaVihar, Sijoul, Mailam, Madhubani, BIHAR – 847235
Website: http://www.sandipuniversity.edu.in Email: info@sandipuniversity.edu.in

Session: 2021-2025

School Of Computer Science And Engineering, Sijoul, Madhubani


SANDIP UNIVERSITY
NeelamVidyaVihar, Sijoul, Mailam, Madhubani, BIHAR – 847235
Website: http://www.sandipuniversity.edu.in Email: info@sandipuniversity.edu.in

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

==================================================

CERTIFICATE

This is to certify that the Project entitled “Predicting House Prices Using Decision Tree”
submitted by

Md Hifzullah (PRN-210205131071), Kundan Kumar (PRN-210205131072), Sandhya Kumari
(PRN-210205131077), and Manisha Kumari (PRN-210205131091), in partial fulfillment of the
degree of Bachelor of Technology in Computer Science and Engineering, has been satisfactorily
carried out under my guidance as per the requirements of the University.

Date:
Place: Sijoul

Guide HOD Associate Dean


Acknowledgements

We are grateful to present this project report on “Predicting House Prices Using
Decision Tree”. Working on this project was a rewarding experience.

First of all, we would like to thank Dr. Shambhu Kumar Singh, Dean of SOCSE and our project
coordinator, for his invaluable guidance, support, and feedback throughout this project. His
expertise was instrumental in improving the quality of our research and report.

We would also like to extend our gratitude to the respected Head of the Department, Computer
Science and Engineering, Prof. Aishwarya Shekhar; the Dean Academics, SOCSE,
Dr. Shambhu Kumar Singh; and the Hon’ble Vice Chancellor, Dr. Samir Kumar Varma, for
providing us with all the facilities that were required.

Lastly, we would like to thank all the faculty members of the CSE department, our friends, and all
those who helped us throughout this project.

Date:
Place:

Md Hifzullah (PRN-210205131071),

Kundan Kumar (PRN-210205131072),

Sandhya Kumari (PRN-210205131077),

Manisha Kumari (PRN-210205131091)


INDEX

Sr. No. Chapter Name

1 List of Figures
2 List of Tables
3 List of Acronyms
4 Abstract
5 Chapter-1: Introduction
6 Chapter-2: Project Management
7 Chapter-3: Project Planning
8 Chapter-4: Requirement Analysis Specification
9 Chapter-5: Design
10 Chapter-6: Conclusion and Future Work
11 References


LIST OF FIGURES

Sr. No. Figure Name

1 Distribution of House Prices by University

2 Decision Tree Structure for Sale Price Prediction

3 Correlation Heatmap of Predictive Features


LIST OF TABLES

Sr. No. Table Name

1 Summary Statistics of Real Estate Dataset

2 Feature Importance Rankings

3 Model Performance Metrics


LIST OF ACRONYMS

Sr. No. Acronym Details

1 Rs.: Indian Rupees

2 Sqft: Square Footage

3 NIT: National Institute of Technology

4 IIT: Indian Institute of Technology


Abstract

The Indian real estate market is characterized by its dynamic nature, influenced by factors such as
geographic location, infrastructure development, population growth, and proximity to prestigious
educational institutions like IITs and NITs. Real estate investments in India often fluctuate due to
variations in property features such as square footage, age, lot size, and market demand. In recent
years, technological advancements and data-driven techniques have played a vital role in
predicting property prices, thereby empowering stakeholders to make informed decisions.

This study focuses on predicting house prices using a Decision Tree model, a widely adopted
machine learning technique that excels in handling large datasets and identifying non-linear
relationships among variables. By analyzing historical housing data, including sale price, square
footage, property age, and proximity to major institutions, this research provides actionable
insights into the Indian real estate market. The integration of localized data, such as properties
near IITs and NITs, brings relevance to the Indian context, allowing for tailored strategies and
market-specific analysis.

The objective of this study is threefold:

1. To understand the key factors influencing property prices in various regions across
India.
2. To develop a robust predictive model that estimates property prices with high accuracy.
3. To provide valuable insights for buyers, sellers, investors, and developers aiming to
optimize decision-making in real estate transactions.

The data used in this study comprises property records across multiple regions, with key features
such as number of bedrooms, number of bathrooms, square footage of homes and lots, property
age, and location attributes. A special focus is placed on properties located near educational hubs
like IIT Bombay, IIT Kanpur, NIT Trichy, and others, as these institutions often serve as drivers
for regional real estate growth. This localized approach ensures that the predictive model reflects
the real-world dynamics of Indian housing markets.


A comprehensive Exploratory Data Analysis (EDA) was conducted to evaluate the distribution
and relationships among key variables. Univariate analysis revealed the concentration of
properties with specific features, such as houses with 3-4 bedrooms and 2-3 bathrooms, which are
most prevalent in the dataset. The median house prices varied significantly across regions, with
properties near prestigious institutions commanding higher valuations due to increased demand
and perceived value. Multivariate analysis demonstrated strong correlations between house prices
and features like home square footage, lot size, and the number of amenities.

To ensure accuracy and efficiency, the Decision Tree model was chosen for its interpretability
and ability to identify hierarchical relationships within the data. The model was developed
iteratively using training and testing datasets, with appropriate preprocessing techniques like
feature normalization and categorical encoding applied. The performance of the model was
evaluated using the coefficient of determination (R²) and feature importance rankings. The final
model achieved an R² of about 0.70 on the testing dataset, meaning it explains roughly 70% of
the variance in held-out sale prices.

The findings of this research underscore the following key insights:

1. Home Size Matters: Larger homes (higher square footage) consistently drive higher sale
prices, reflecting buyer preferences for spacious living areas.
2. Location Drives Value: Proximity to IITs, NITs, and other key infrastructure
significantly increases property prices, highlighting the importance of location-based
investments.
3. Property Age: While newer properties are preferred, well-maintained older properties in
prime locations still command competitive prices.
4. Lot Size and Amenities: Buyers value outdoor spaces and additional amenities,
influencing overall property valuations.

The Decision Tree model provides a transparent and actionable framework for understanding
real estate trends in India. For buyers, this research helps identify regions and property features
that offer the best value. For sellers, it highlights the importance of optimizing property features
to maximize sale prices. Developers and investors can leverage these insights to focus on high-
demand regions, particularly near educational and infrastructural hubs, to achieve higher returns
on investment.


In conclusion, this study bridges the gap between data science and the Indian real estate market by
developing a robust predictive model that aligns with the local context. The integration of
machine learning techniques with region-specific data ensures accurate price predictions, enabling
stakeholders to navigate the complex real estate landscape effectively. Future research can
enhance this model by incorporating additional dynamic factors, such as economic conditions,
interest rates, and supply-demand trends, to further improve prediction accuracy and market
insights.


Chapter-1: Introduction

The Indian real estate market plays a pivotal role in the economy and societal development. With
urbanization and infrastructure growth, the need for accurate real estate valuation has never been
more critical. Predicting house prices with high precision enables stakeholders, including buyers,
sellers, developers, and policymakers, to make well-informed decisions. This study focuses on
predicting house prices using Decision Tree models, emphasizing regional differences and the
influence of proximity to premier institutions like IITs and NITs. Key factors such as property
size, age, and location are meticulously analyzed to decode their impact on pricing trends.

This project explores the application of decision tree models for predicting house prices in Indian
cities. With a dataset of 10,659 property records containing detailed attributes like price, size, and
property characteristics, the goal is to understand and analyze the relationship between these
variables and the resulting property prices. Decision tree models were employed due to their
interpretability and ability to handle non-linear relationships effectively.

1.1 Problem Definition

The primary objective of this project is to build a predictive model that estimates house prices
from critical property attributes such as size, age, and location. The system is intended to support:

- Manage Information: Centralized management of property details, including size, age,
and location.
- Data Reports: Generation of property value reports for analysis and decision-making.
- Insights: Analysis of the relationship between key property features and house prices.

The decision tree model provides an accurate and interpretable prediction tool, enabling buyers,
sellers, and developers to make informed decisions. This project addresses the limitations of
manual and simplistic approaches currently used in real estate analysis.
Furthermore, this project assists various stakeholders, such as:

- Buyers: To identify affordable and suitable properties based on their preferences.
- Sellers: To determine competitive pricing strategies.
- Developers: To analyze market trends and plan housing projects effectively.

1.2 Study of Existing System


1.2.1 Presently Available System

In the existing systems for real estate pricing and analysis, the following shortcomings are
evident:

- Lack of Automation: Many real estate reports are generated manually or using
spreadsheets.
- Limited Features: Existing methods do not incorporate multiple variables such as lot
size, property age, and location simultaneously.
- Inconsistent Accuracy: Simple linear models fail to capture non-linear relationships
between variables and house prices.

Real estate professionals currently rely on manual data entry and interpretation, making the
process time-consuming and prone to human errors.

1.2.2 Need for the New System

The need for a predictive system arises from the limitations of existing methods, which include:

- Data Management Challenges: Current systems use spreadsheets that lack centralized
storage and robust data management capabilities.
- Manual Intervention: Data manipulation in spreadsheets can lead to errors and
inconsistencies.
- No Cross-Verification: Reports generated manually cannot be validated easily, leading to
unreliable insights.

To overcome these challenges, a predictive system based on decision tree models is proposed.
The new system offers the following advantages:

- Centralized Database: Ensures data integrity and avoids duplication or manipulation.
- Accurate Predictions: Captures complex relationships between variables.
- User-Friendly Insights: Generates interpretable and actionable reports for stakeholders.
- Real-Time Analysis: Provides dynamic and real-time predictions for improved
decision-making.

This project integrates a robust and scalable solution to address the needs of India’s growing real
estate market.

1.3 Software Engineering Process Modeling

In software development, selecting an appropriate process model is crucial for project success.
For this project, the Spiral Model has been chosen due to its iterative nature and flexibility.

1.3.1 Spiral Model

The spiral model is an evolutionary software process model that combines the iterative approach
of prototyping with the systematic methodology of the sequential linear model. It enables
incremental releases of the software, ensuring continuous improvement and evaluation.

Reasons for Choosing the Spiral Model:

- Incremental Development: Software is developed in phases, ensuring regular feedback
and validation.
- Risk Analysis: Identifies and mitigates risks at each phase of the project.
- Flexibility: Accommodates changes in requirements and improvements during
development.

The Spiral Model consists of six key phases:

1. Customer Communication: Establish effective communication to understand the
requirements of stakeholders.
2. Planning: Define project scope, timelines, and resource allocation.
3. Risk Analysis: Identify technical and management risks to minimize project failure.
4. Engineering: Develop the predictive model using decision tree techniques.
5. Construction and Release: Build, test, and release the predictive system.
6. Customer Evaluation: Obtain feedback to refine the model and address user needs.


The iterative nature of the spiral model ensures that the predictive model is developed, tested, and
validated at each step, leading to improved performance and accuracy.

Figure 1.1: Spiral Model [1]

The Spiral Model’s structured yet flexible approach makes it ideal for developing the predictive
system for house prices, ensuring robustness, accuracy, and user satisfaction.

Chapter-2: Project Management

Efficient project management was crucial in ensuring the success of this research. The study
involved systematic data collection and analysis. Key steps included:

1. Identifying Reliable Data Sources: Data was sourced from Indian real estate
platforms and government housing records to ensure reliability.
2. Data Cleaning and Preprocessing: Missing and redundant information was handled
carefully to maintain data integrity.
3. Algorithm Implementation: The Decision Tree algorithm was chosen for its
interpretability and robustness in capturing non-linear relationships.
4. Model Optimization: The model parameters were fine-tuned using grid search to
achieve balanced performance.
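
The grid search step mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the feature names and the parameter grid are assumptions (the report does not list the values that were tried), and only standard Scikit-learn calls (`GridSearchCV`, `DecisionTreeRegressor`) are used.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 600
sqft = rng.uniform(500, 4000, n)         # home square footage
age = rng.uniform(0, 50, n)              # property age in years
near_institute = rng.integers(0, 2, n)   # hypothetical flag: 1 if near an IIT/NIT
price = (3000 * sqft * (1 + 0.3 * near_institute)
         - 20000 * age + rng.normal(0, 5e5, n))

X = np.column_stack([sqft, age, near_institute])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Hypothetical search space; every combination is scored with 5-fold CV.
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)  # R² of the refitted best tree
print(best_params, round(test_r2, 3))
```

Limiting `max_depth` and raising `min_samples_leaf` are the usual levers for trading training fit against generalization in a decision tree, which is why they are used for the illustrative grid here.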


Effective project management is crucial to ensure that the development and implementation of the
software achieve its objectives within the given constraints of time, cost, and resources. This
chapter presents the project process, resource planning, and the approach adopted to manage the
project efficiently. Project management primarily focuses on the three P's—People, Problem,
and Process—and also incorporates cost estimation for the overall development.

2.1 People

The success of any project depends on the people involved. Proper management of human
resources ensures task delegation, motivation, and collaboration among team members to achieve
project goals. In this project, a team of four developers collaborated to develop the software with
well-defined roles:

- Md Hifzullah: Conducted the analysis phase and handled the coding aspects of the project.
- Kundan Kumar: Took charge of the design phase, focusing on the architecture and interface, and
also contributed to coding.
- Sandhya Kumari and Manisha Kumari: Managed the testing phase to ensure the software was
bug-free and functionally sound.

Additionally, the people factor extends to other key stakeholders, such as users (buyers, sellers,
and developers) who interact with the system to upload property data and view price predictions
and reports.

Teamwork, effective communication, and task division enabled smooth progress, ensuring all
project deliverables were met on schedule.

2.2 Problem

The primary focus of this project is to address inefficiencies in the existing manual approaches to
real estate price analysis and to develop a data-driven predictive solution. The major objectives and
scope of the project include:

- Objective: To automate house price estimation from property attributes such as size, age, and
location, generate property value reports, and reduce the manual workload of real estate analysis.
- Scope: The software aims to simplify the storage, retrieval, and reporting of property data. It
also improves efficiency by centralizing all operations into a single, user-friendly platform.

Major Problems Addressed:

1. Data Integrity: Property records are held in a centralized database to avoid duplication and
manual manipulation.
2. User Adaptability: While initial user training may be required, the system's intuitive design
ensures ease of use over time.
3. Report Management: The software generates accurate and timely reports needed for
decision-making and analysis.

The technical foundation of the system uses Python with Scikit-learn for model development and
Oracle for the database, enabling robust and efficient handling of data.

2.3 Process

A well-defined process framework ensures that software development progresses methodically,
with measurable outcomes at each stage. For this project, we adopted the Spiral Model due to its
iterative nature and emphasis on risk management.

Spiral Model Framework:

1. Customer Communication: Frequent interactions with stakeholders to gather, review, and
validate requirements.
2. Planning: Creation of detailed plans, including timelines, milestones, resource allocation, and
project deliverables.
3. Risk Analysis: Identification and mitigation of technical, operational, and resource-related risks.
4. Engineering: Development of software modules based on the approved design using Python and
Oracle.
5. Construction and Release: System integration, functional testing, and deployment of the
web-based application.
6. Customer Evaluation: Feedback from stakeholders ensures iterative improvements, making the
system more aligned with user expectations.

The Spiral Model was particularly useful for this project due to its flexibility in accommodating
changing requirements and its emphasis on regular evaluation and feedback. The iterative nature
of the model allowed incremental development, ensuring the project remained on track.

2.4 Cost Estimation

Cost estimation is an integral part of project management as it helps allocate resources efficiently
and plan project expenditures. For this project, the COCOMO (COnstructive COst MOdel) was
used to calculate the development cost.

COCOMO Calculation (basic model, organic mode, with the project size taken as 1.36 KLOC):

1. Effort Applied (E): 2.4 * (1.36)^1.05 ≈ 3.33 person-months
2. Development Time (D): 2.5 * (3.33)^0.38 ≈ 3.95 months
3. People Required (P): 3.33 / 3.95 ≈ 0.844 (approx. 1 person)

Cost Breakdown:

- Salary of Developer: Rs. 10,000 per month
- Documentation Charges: Rs. 1,000
- Total Salary for Development Time: (10,000 * 0.844) * 3.95 = Rs. 33,338

Final Cost:

- Total Cost = (Effort * Total Salary) + Documentation Charges
- Total Cost = (3.33 * 33,338) + 1,000
- Total Cost = Rs. 1,12,015.54

Thus, the total estimated cost for developing this web-based application is Rs. 1,12,015.54.
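
The arithmetic above can be reproduced with a short script. This is a sketch under stated assumptions: the value 1.36 is read as the project size in KLOC, the coefficients are the standard basic COCOMO values for organic-mode projects, and the cost function simply mirrors the formula used in this report. Intermediate values are not rounded, so the results may differ slightly from the rounded figures quoted in the text.

```python
# Basic COCOMO, organic mode: E = a * KLOC**b, D = c * E**d.
A, B = 2.4, 1.05       # effort coefficients (organic mode)
C, D_EXP = 2.5, 0.38   # schedule coefficients (organic mode)


def cocomo_organic(kloc):
    """Return (effort in person-months, duration in months, average people)."""
    effort = A * kloc ** B
    duration = C * effort ** D_EXP
    people = effort / duration
    return effort, duration, people


def total_cost(effort, duration, people, monthly_salary=10_000, documentation=1_000):
    """Mirror the report's formula: (Effort * Total Salary) + Documentation Charges."""
    total_salary = monthly_salary * people * duration
    return effort * total_salary + documentation


# 1.36 KLOC is an assumed reading of the size figure used in the report.
effort, duration, people = cocomo_organic(1.36)
cost = total_cost(effort, duration, people)
print(f"E={effort:.2f} PM, D={duration:.2f} months, P={people:.2f}, cost=Rs. {cost:,.2f}")
```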


Conclusion

The project management process ensures effective resource allocation, problem identification,
and systematic development of the software. By adopting the Spiral Model and estimating costs
using the COCOMO approach, this project addresses stakeholder needs and provides a scalable
solution for predicting house prices efficiently.


Chapter-3: Project Planning

The project followed a structured timeline to ensure a seamless execution:

- Week 1-2: Data sourcing and preparation.
- Week 3-4: Exploratory data analysis and feature engineering to uncover
patterns.
- Week 5-6: Model development and evaluation using training and testing
datasets.
- Week 7: Insights generation, report preparation, and visualization of
results.

3.1 SOFTWARE SCOPE

The scope of the House Price Prediction System revolves around streamlining the process of
property price estimation by leveraging a machine learning model. The software provides a
dynamic platform for analyzing housing attributes and predicting prices based on significant
influencing factors.

Key functional areas of the project include:

- Housing Attributes Database: Collects and stores details such as home size, lot size, property
age, number of bedrooms, bathrooms, and amenities.
- Regional Analysis: City-wise analysis of property prices with emphasis on regions near IITs and
NITs.
- Interactive Reports: Provides consolidated price prediction reports for sellers, buyers, and
investors.
- Visualization Tools: Generates graphs and charts for easier understanding of market trends and
property evaluations.
- User-Friendly Interface: Displays location-based property summaries with details such as average
price, size, and features.

The system ensures accurate price predictions, reduces manual estimation errors, and allows
stakeholders to make well-informed decisions based on concrete data-driven insights.

3.1.1 FEASIBILITY ANALYSIS

Before initiating the development process, a feasibility analysis was performed to evaluate the
project’s practicality. This analysis determines if the proposed system can meet the objectives
within defined constraints. The feasibility study includes:


1. Operational Feasibility: The system eliminates the manual effort of real estate analysis,
providing fast and precise house price predictions. Data collection, preprocessing, and
prediction tasks are automated, enhancing accuracy and efficiency.
2. Technical Feasibility: The system uses the proven combination of Python (for model
development), libraries such as Scikit-learn, and a database such as Oracle for storage.
Scikit-learn provides a ready-made Decision Tree implementation, so the model can be
trained and evaluated on the project's 10,659 records with standard hardware.
3. Economic Feasibility: The system reduces reliance on expensive manual appraisal
processes. Automated price predictions save time, reduce human error, and improve the
quality of decision-making, resulting in a cost-effective solution for stakeholders.
4. Financial Feasibility: Development costs include personnel, software tools, and
infrastructure. The benefit-to-cost ratio demonstrates that the investment in the project will
yield significant returns by improving decision-making efficiency.
5. Resource Feasibility: Existing computational tools and hardware resources are sufficient
to implement the model. No additional resources are required for the system’s
deployment.

3.2 RISK ANALYSIS AND MANAGEMENT

Risk analysis is crucial for anticipating challenges and formulating mitigation strategies.
Identified risks include:

1. Data Overload: Larger datasets may increase prediction processing times. Optimized algorithms
and hardware upgrades mitigate this issue.
2. Inconsistent Data Quality: Missing or incorrect data can impact model accuracy. Data cleaning
and preprocessing techniques ensure high-quality inputs.
3. Model Overfitting: Decision Tree models risk overfitting. Cross-validation and hyperparameter
tuning reduce this risk.
4. Technological Challenges: Compatibility issues between tools are managed by using widely
compatible platforms like Python and Oracle.

3.3 PROJECT SCHEDULING

Project scheduling involves breaking down tasks into manageable components and assigning
timelines. The project follows a phase-wise schedule with the following key phases:

1. Phase 1: Data Collection and Cleaning (2 weeks).
2. Phase 2: Exploratory Data Analysis and Feature Engineering (2 weeks).
3. Phase 3: Model Development and Training (3 weeks).
4. Phase 4: Model Testing and Evaluation (2 weeks).
5. Phase 5: Report Compilation and Visualization (2 weeks).
6. Phase 6: Final Deployment and User Feedback (1 week).

3.4 PROJECT QUALITY ASSURANCE

Quality assurance ensures the accuracy and reliability of the developed system:

- Validation: Regular testing of the Decision Tree model using training and testing datasets.
- Verification: Code reviews and performance validation to ensure model accuracy.
- User Satisfaction: The system is user-friendly, accurate, and efficient in predicting house prices.


Chapter-4: Requirement Analysis Specification

Dataset Overview

- Number of Records: 10,659
- Key Features:
  - Sale Price (Rs.)
  - Number of Bedrooms and Bathrooms
  - Home Square Footage
  - Lot Size (Sqft)
  - Age of Property
  - Proximity to IITs and NITs

Data Preprocessing

1. Normalization: Numerical variables were normalized using Z-score
normalization to ensure uniform scaling.
2. Feature Engineering: Categorical variables such as property type and location
were transformed into binary dummy variables.
3. Removal of Redundancy: Non-essential columns like "sale year" were
replaced with computed metrics like "property age."
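
The three preprocessing steps above can be sketched with Pandas as follows. This is an illustrative example only: the rows are made up, and all column names (including `year_built`, which is assumed here as the input needed to compute property age) are hypothetical rather than the project's actual schema.

```python
import pandas as pd

# Tiny made-up sample; column names are hypothetical, not the project's schema.
df = pd.DataFrame({
    "sale_price": [7_500_000, 12_000_000, 5_200_000, 9_800_000],
    "home_sqft": [1200, 2100, 900, 1600],
    "lot_sqft": [2000, 3500, 1500, 2400],
    "year_built": [2005, 2015, 1995, 2010],
    "sale_year": [2020, 2021, 2019, 2021],
    "property_type": ["apartment", "villa", "apartment", "house"],
})

# Step 3: replace "sale year" with a computed "property age".
df["property_age"] = df["sale_year"] - df["year_built"]
df = df.drop(columns=["sale_year", "year_built"])

# Step 1: Z-score normalization of numerical predictors, (x - mean) / std.
numeric_cols = ["home_sqft", "lot_sqft", "property_age"]
df[numeric_cols] = (
    (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)
)

# Step 2: binary dummy variables for categorical columns.
df = pd.get_dummies(df, columns=["property_type"], dtype=int)
print(df.round(2))
```

Tree-based models are insensitive to monotonic rescaling of individual features, so the normalization step mainly matters if the same prepared data is later reused with scale-sensitive models.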


Requirement Analysis Specification involves understanding and defining the project's scope,
functionality, and requirements to meet its objectives effectively. A comprehensive analysis
ensures that all critical components of the House Price Prediction System are identified,
evaluated, and modeled systematically.

4.1 REQUIREMENT ANALYSIS

Requirement Analysis bridges the gap between real estate requirements and technical
implementation. It enables the system engineer to specify the software’s functionality,
performance, and constraints to meet project goals.

Key operational principles include:

- Understanding the problem domain (Indian real estate market).
- Defining core functionalities (house price predictions, user interactions, and reporting).
- Representing system behavior under different scenarios.
- Developing models that progressively uncover system details.
- Moving from high-level specifications to implementation-ready details.

The House Price Prediction System must include:

- Accurate price estimations.
- Data analysis reports based on housing attributes.
- User dashboards for stakeholders like buyers, sellers, and developers.

4.1.1 Problem Recognition

The primary challenge lies in the lack of automated and accurate house price prediction tools
tailored to the Indian market. Existing manual processes are time-consuming and error-prone.
Problems include:

- Inefficient analysis of property attributes.
- Inaccurate evaluations caused by subjective assessments.
- Limited visibility of price trends based on location.

By implementing this predictive system, stakeholders will gain access to precise pricing data and
trend reports, eliminating manual inefficiencies and guesswork.

4.1.2 Evaluation and Synthesis

The evaluation process involves analyzing data flow, identifying functional components, and
synthesizing the system's solution. Key tasks include:

- Identifying housing attributes (e.g., square footage, lot size, and age).
- Defining system behavior under changing market conditions.
- Establishing user-friendly interfaces for stakeholders.
- Creating reports for trend analysis, predictions, and insights.

These efforts collectively ensure a clear understanding of the problem and an effective solution
design.

4.2 REQUIREMENT MODELING

During requirement modeling, system functionalities and behaviors are represented using models
to understand data flow, functional processes, and operational behavior.

4.2.1 Functional Models

The system processes housing data to generate predictions. Core functions include:

- Input: Users input housing attributes (e.g., square footage, location, and amenities).
- Processing: The Decision Tree model processes the data and analyzes relevant patterns.
- Output: The system outputs price predictions, trend analyses, and graphical reports.


Over iterations, detailed models for each function will refine the system's functionality.

4.2.2 Behavioral Models

Behavioral models define the system’s responses to external triggers. In this project:

- Input events (e.g., data upload) trigger analysis processes.
- State changes occur when predictions are generated.
- Users interact with the results via reports and dashboards.

4.3 REQUIREMENT SPECIFICATION

The Requirement Specification outlines the system's hardware, software, and functional needs:

4.3.1 Hardware Requirement

 Processor: Intel Core i3 or higher


 RAM: Minimum 4GB
 Hard Disk: 10GB minimum storage space

4.3.2 Software Requirement

- Operating System: Windows 10 or Linux
- Frontend: Python, Dash/Flask (for interface development)
- Backend: Oracle/MySQL (for database management)
- Libraries: Scikit-learn, Pandas, NumPy, Matplotlib

4.3.3 Functional Needs

- House price trend analysis.
- Location-specific predictive models.
- User dashboards for buyers, sellers, and investors.

4.3.4 Advantages

- Automated, data-driven analysis.
- User-friendly interface.
- Quick and accurate price predictions.
- Interactive visualizations and reports.

4.3.5 Disadvantages

- Non-technical users need more time initially to learn the system.
- Processing delays with extremely large datasets.

Chapter-5:Design

Exploratory Data Analysis

The dataset revealed interesting trends in the Indian housing market:

- Most properties were priced between Rs. 50 lakh and Rs. 2 crore, with higher
prices observed near IIT Kanpur and IIT Bombay.
- Square footage emerged as a critical determinant, with larger homes commanding
significantly higher prices.
- Proximity to renowned educational institutions like IITs and NITs influenced
property prices, reflecting their demand-driven markets.

Model Development
The Decision Tree model was developed and refined through iterative testing:

- “Initial Model Performance”:
  - Training R²: 99% (indicating overfitting)
  - Testing R²: 53% (poor generalization)
- “Optimized Model Performance”:
  - Training R²: 78%
  - Testing R²: 70% (indicating better generalization)
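
For reference, R² compares the squared prediction error with the variance of the actual prices, and the gap between training and testing R² is a simple overfitting signal. The sketch below is a plain-Python version written for illustration. The report does not show how its scores were computed, although scikit-learn (listed in the software requirements) provides an equivalent `r2_score`. The `generalization_gap` helper is a hypothetical name introduced here.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def generalization_gap(train_r2: float, test_r2: float) -> float:
    """Difference between training and testing R^2; a large gap suggests overfitting."""
    return train_r2 - test_r2


# Scores reported above, written as fractions instead of percentages.
initial_gap = generalization_gap(0.99, 0.53)
optimized_gap = generalization_gap(0.78, 0.70)
```

With the reported scores, the gap shrinks from roughly 0.46 for the initial model to roughly 0.08 for the optimized one.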

Feature Importance
The following features emerged as the most critical predictors:
1. “Home Square Footage”: Larger homes were strongly associated with higher
prices.
2. “Proximity to IIT/NIT Campuses”: Locations near prestigious institutions
consistently fetched premium prices.
3. “Lot Size”: Properties with larger outdoor spaces attracted higher valuations.
4. “Age of Property”: Age had a mixed effect on price, because the condition and
location of older properties often offset their age.
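
A regression tree typically chooses each split to reduce the variance of prices in the resulting groups, and impurity-based feature importance accumulates those reductions per feature. The sketch below computes the variance reduction for a single threshold split. It is an illustration under assumptions: the rows are made-up numbers, not values from the project dataset, and the report does not state which importance measure it used.

```python
from typing import List, Tuple


def variance(values: List[float]) -> float:
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(rows: List[Tuple[float, float]], threshold: float) -> float:
    """Drop in price variance when rows are split at feature <= threshold.

    Each row is a (feature_value, price) pair.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    prices = [price for _, price in rows]
    left = [price for value, price in rows if value <= threshold]
    right = [price for value, price in rows if value > threshold]
    n = len(prices)
    weighted_child_variance = (
        len(left) / n * variance(left) + len(right) / n * variance(right)
    )
    return variance(prices) - weighted_child_variance


# Made-up rows for illustration only: (square footage, price in lakh rupees).
rows = [(600, 40.0), (800, 50.0), (1500, 110.0), (1800, 120.0)]
good_split = variance_reduction(rows, threshold=1000)
poor_split = variance_reduction(rows, threshold=700)
```

In this toy data the split at 1000 square feet separates cheap from expensive homes cleanly, so it removes far more variance than the split at 700.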

5.1 DATA DESIGN

Data design is a foundational software engineering activity that defines the structure of the
data used in the system. In the House Price Prediction System, proper organization
of housing-related data is crucial for ensuring high-quality predictions and system performance.
The data design emphasizes efficient data storage, retrieval, and integration with the Decision
Tree model.

5.1.1 Table Description

- Property Details Table: Stores information such as property ID, square footage, lot size,
number of bedrooms, bathrooms, property age, and location.

Figure 5.1: Table Design for Property Details

- Regional Data Table: Contains details of cities, proximity to IITs/NITs, average
infrastructure scores, and related real estate demand metrics.

Figure 5.2: Table Design for Regional Data

- User Table: Maintains user credentials, including user ID, login credentials, and roles
(buyer, seller, investor).

Figure 5.3: Table Design for User Information

- Prediction Results Table: Stores historical and real-time house price predictions
generated by the Decision Tree model.

Figure 5.4: Table Design for Prediction Results
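
The four tables described above can be sketched as a relational schema. The requirement specification names Oracle/MySQL as the backend. This sketch swaps in Python's built-in `sqlite3` so that it runs self-contained. All column names, types, and sample values are assumptions based on the table descriptions, not a copy of the figures.

```python
import sqlite3

SCHEMA = """
CREATE TABLE regional_data (
    region_id            INTEGER PRIMARY KEY,
    city                 TEXT NOT NULL,
    near_iit_nit         INTEGER NOT NULL,  -- 1 if close to an IIT/NIT campus
    infrastructure_score REAL,
    demand_index         REAL
);
CREATE TABLE property_details (
    property_id    INTEGER PRIMARY KEY,
    region_id      INTEGER NOT NULL REFERENCES regional_data(region_id),
    square_footage REAL NOT NULL,
    lot_size       REAL,
    bedrooms       INTEGER,
    bathrooms      INTEGER,
    property_age   INTEGER
);
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    role          TEXT NOT NULL CHECK (role IN ('buyer', 'seller', 'investor'))
);
CREATE TABLE prediction_results (
    prediction_id   INTEGER PRIMARY KEY,
    property_id     INTEGER NOT NULL REFERENCES property_details(property_id),
    user_id         INTEGER REFERENCES users(user_id),
    predicted_price REAL NOT NULL,
    created_at      TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative sample values only.
conn.execute("INSERT INTO regional_data VALUES (1, 'Kanpur', 1, 7.5, 0.8)")
conn.execute("INSERT INTO property_details VALUES (1, 1, 1200, 2000, 3, 2, 8)")
conn.execute(
    "INSERT INTO prediction_results (property_id, predicted_price) VALUES (1, 7500000)"
)

# Join a stored prediction back to its property and region.
row = conn.execute(
    "SELECT r.city, p.square_footage, pr.predicted_price "
    "FROM prediction_results pr "
    "JOIN property_details p ON p.property_id = pr.property_id "
    "JOIN regional_data r ON r.region_id = p.region_id"
).fetchone()
```

Storing a password hash rather than the raw login credential, and restricting `role` with a `CHECK` constraint, are design choices of this sketch, not details taken from the report.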

5.1.2 Class Diagram

The Class Diagram represents the logical structure of the system by detailing its classes,
attributes, and methods. Key classes include:

- Property Class: Handles property attributes such as square footage, lot size, and age.
- User Class: Manages user interactions with the system.
- Prediction Class: Implements the Decision Tree model and returns price predictions.

Figure 5.5: Class Diagram of House Price Prediction
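
The three classes named in this section can be sketched as follows. This is a minimal illustration, not the project's implementation: the attribute names, the `Node` helper, and the hand-written toy tree are assumptions, and a real `Prediction` class would wrap a tree learned from data (for example with scikit-learn) instead of one typed in by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Property:
    square_footage: float
    lot_size: float
    age_years: int


@dataclass
class User:
    user_id: int
    role: str  # "buyer", "seller", or "investor"


@dataclass
class Node:
    """One node of a regression tree: a leaf holds `value`, otherwise it splits."""
    value: Optional[float] = None
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # taken when feature <= threshold
    right: Optional["Node"] = None  # taken when feature > threshold


class Prediction:
    def __init__(self, root: Node) -> None:
        self.root = root

    def predict(self, prop: Property) -> float:
        """Walk from the root to a leaf, comparing one attribute per node."""
        node = self.root
        while node.value is None:
            if getattr(prop, node.feature) <= node.threshold:
                node = node.left
            else:
                node = node.right
        return node.value


# Hand-written toy tree; thresholds and leaf prices (in lakh) are made up, not learned.
toy_tree = Node(
    feature="square_footage",
    threshold=1000,
    left=Node(value=45.0),
    right=Node(
        feature="age_years",
        threshold=20,
        left=Node(value=120.0),
        right=Node(value=95.0),
    ),
)
model = Prediction(toy_tree)
price_lakh = model.predict(Property(square_footage=1500, lot_size=2400, age_years=5))
```

The `predict` loop shows why a decision tree is easy to explain: each prediction is just a short chain of threshold comparisons ending at a leaf value.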

5.2 ARCHITECTURAL DESIGN

The Architectural Design defines the system’s overall modular structure, representing control
flow and data interactions.

5.2.1 Entity Relationship Diagram (ERD)

The ERD showcases relationships among tables in the database, ensuring effective data
management.

Figure 5.6: ER Diagram of House Price Prediction

5.2.2 Data Flow Diagram (DFD)

The Data Flow Diagram represents the flow of information within the system.

- DFD Level 0: Depicts the entire system as a single process that takes user input and
produces price predictions.

Figure 5.7: DFD Level 0

- DFD Level 1: Breaks down processes into sub-processes, such as data input, prediction
analysis, and report generation.

Figure 5.8: DFD Level 1

5.3 INTERFACE DESIGN

The Interface Design ensures seamless interaction between the system and its users. Key
interfaces include:

- User Dashboard: Displays property search, prediction results, and reports.
- Admin Interface: Provides controls for database updates and model retraining.

Figure 5.9: User Dashboard for House Price Prediction

5.4 COMPONENT LEVEL DESIGN

Component-level design focuses on translating system architecture into implementable software
components.

5.4.1 UML Diagrams

- Use Case Diagram: Illustrates user interactions with the system (e.g., input property
details, view predictions).

Figure 5.10: Use Case Diagram

- Sequence Diagram: Demonstrates the sequence of operations, such as data input,
processing, and report generation.

Figure 5.11: Sequence Diagram

- Component Diagram: Defines the relationships among major software components.

Figure 5.12: Component Diagram

- State Diagram: Represents system behavior based on events such as data input,
prediction processing, and result display.

Figure 5.13: State Diagram

Chapter-6:Conclusion and Future Work

This research highlights the factors influencing house prices in the Indian real
estate market. Proximity to educational hubs like IITs and NITs, combined with
property size and amenities, significantly affects pricing dynamics. After
optimization, the Decision Tree model reached a testing R² of 70% and identified
these relationships, so its insights can support real-world decision-making.

Recommendations and Future Directions:

1. “Expand the Dataset”: Incorporate data from more cities and rural areas to cover
diverse market segments.
2. “Dynamic Market Factors”: Account for variables like demand-supply
dynamics, interest rates, and regional economic policies.
3. “Enhanced Algorithms”: Experiment with ensemble models like Random Forest
and Gradient Boosting for improved prediction accuracy.
4. “Integrate Real-Time Data”: Leverage APIs for real-time housing market trends
to make the model adaptable to market fluctuations.

The findings of this study provide a robust foundation for stakeholders to navigate
the Indian real estate sector strategically.

