Predicting True Value of Cars Using Ml-1
On
Bachelor of Technology
In
CERTIFICATE
This is to certify that the project entitled “Predicting True Value of Used Car Using Machine
Learning Technique” being submitted by K Naga Sandhya bearing ID Number O180807 and
P.A.D.Prasanna bearing ID Number O180792 and P Sai Nageswari bearing ID Number O180812
and C.V.V.Vineeth bearing ID Number O180817 in partial fulfillment of the requirements for the
award of the degree of the Bachelor of Technology in Computer Science and Engineering in Dr. APJ
Abdul Kalam, RGUKT-AP, IIIT Ongole is a record of bonafide work carried out by them under my
guidance and supervision from February 2023 to June 2023.
The results presented in this project have been verified and found to be satisfactory.
The results embodied in this project report have not been submitted to any other University for the
award of any other degree or diploma.
RGUKT,Ongole. RGUKT,Ongole
APPROVAL SHEET
This report entitled “Predicting True Value of Used Car Using Machine Learning Technique”
being submitted by K Naga Sandhya bearing ID Number O180807 and P.A.D.Prasanna bearing ID
Number O180792 and P Sai Nageswari bearing ID Number O180812 and C.V.V.Vineeth bearing ID
Number O180817 guided by Mr.Krishna M approved for the degree of Bachelor of Technology in
Computer Science and Engineering.
Examiners
Supervisors(s)
Date:
Place:
ACKNOWLEDGEMENT
It is our privilege to express a profound sense of respect, gratitude and indebtedness to our guide
Mr. Krishna M, Assistant Professor, Dept. of Computer Science and Engineering, Dr APJ Abdul
Kalam, RGUKT-AP, IIIT Ongole, for his indefatigable inspiration, guidance, cogent discussion,
constructive criticism and encouragement throughout the dissertation work.
We express our sincere gratitude to B. Sampath Babu, Asst. Professor & Head, Department of
Computer Science and Engineering, Dr APJ Abdul Kalam, RGUKT-AP, IIIT Ongole, for his suggestions,
motivation and co-operation towards the successful completion of the work.
We extend our sincere thanks to Dr Rupas Kumar, Dean, Research and Development, Dr APJ Abdul
Kalam, RGUKT-AP, IIIT Ongole, for his encouragement and constant help.
We extend our sincere thanks to Dr B Jaya Rami Reddy, Director, Dr APJ Abdul Kalam, RGUKT-AP,
IIIT Ongole, for his encouragement.
P.A.D.Prasanna - o180792
C.V.V.Vineeth – o180817
DECLARATION
We hereby declare that the project work entitled “Predicting True Value of Used Car Using
Machine Learning Technique” submitted to Dr APJ Abdul Kalam, RGUKT-AP, IIIT Ongole
in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology (B
Tech) in Computer Science and Engineering is a record of original work done by us under the
guidance of Mr. Krishna M, Assistant Professor, and that this project work has not been submitted to any
university for the award of any other degree or diploma.
P.A.D.Prasanna - o180792
C.V.V.Vineeth – o180817
Date:
ABSTRACT
Car price prediction is of high interest because such a system is handy for many people.
To build a model for predicting the price of used cars, we applied machine learning techniques to the
attributes that must be examined for a reliable and accurate prediction. The system provides the
approximate selling price of a car based on the car company, model of the car, fuel type,
years of service, kilometres driven, etc.
The model is trained on a dataset of used cars so that it can then predict the price of a given
car. It is built using machine learning algorithms.
Here we have chosen the Linear Regression model and trained it on the dataset. A manual system
needs a lot of time to analyse the complete dataset. Here almost all of the work is computerized, which
also helps to maintain accuracy.
CONTENTS
1. Introduction
1.1 Motivation
1.2 Problem Definition
1.3 Objective of the project
2. Literature Survey
3. Analysis
3.1 Existing System
3.2 Proposed System
3.3 Software requirement specification
3.3.1 Functional requirements
3.3.2 Non-Functional requirements
4. Diagrams
4.1 UML Diagrams
5. Implementation
5.1 Software Environment
5.2 Module Description
5.3 DataSet
5.4 Sample code
6. Test cases
7. Screenshots
8. Conclusion
9. Future Enhancement
10. Bibliography
1.INTRODUCTION
Because of the increased prices of new cars and the financial inability of many customers to buy them,
used car sales are increasing globally. Therefore, there is an urgent need for a system for Predicting
True Value of Used Cars Using Machine Learning Technique that effectively determines the worth of a
car using a variety of features. Determining whether the listed price of a used car is fair is a
challenging task, due to the many factors that drive a used vehicle’s price on the market. The focus of
this project is developing a model that can accurately predict the price of a used car based on its
features. In Artificial Intelligence in general, we build intelligent systems to perform any task like a
human, whereas the goal of Machine Learning is to allow machines to learn from data so that they can
give accurate output. Therefore, we use Machine Learning: we teach the machine with data to perform a
particular task and give an accurate result. In this project we implement and evaluate machine learning
methods on a dataset consisting of the car prices of different makes and models. We used Python as the
base language to implement the model, because Python code is concise and readable even to new
developers, which is beneficial to machine learning projects. We also used some Python libraries that
provide base-level building blocks. Machine learning requires continuous data processing, and Python
libraries allow you to access, process, and transform your data. These are some of the libraries we used
in this project:
Scikit-learn: for handling basic ML algorithms like clustering, linear and logistic
regression, classification, and others.
Pandas: for high-level data structures and analysis. It allows merging and filtering of data.
Matplotlib: for creating 2D plots, histograms, charts, and other forms of visualization.
We evaluate the performance of the Linear Regression algorithm and use it to make the predictions.
Depending on various parameters we will determine the price of the car. Regression algorithms are used
because they provide a continuous value as output rather than a category, which makes it possible to
predict the actual price of a car rather than the price range of a car. As a result, we offer a
Machine Learning-based methodology for predicting the prices of second-hand cars based on their
characteristics.
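To illustrate why a regression model returns a continuous price rather than a price band, the following is a minimal sketch of ordinary least squares with a single feature. The ages and prices are made-up numbers chosen for illustration, not values from our dataset, and the function names are our own.

```python
# Illustrative sketch only: the data below is invented, not taken from the project dataset.

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x; the result is a continuous value."""
    return slope * x + intercept


ages = [1, 2, 3, 4, 5]                # age of the car in years (hypothetical)
prices = [900, 800, 700, 600, 500]    # price in thousands of rupees (hypothetical)

slope, intercept = fit_line(ages, prices)
estimate = predict(slope, intercept, 2.5)   # an age that is not in the training data
print(slope, intercept, estimate)
```

Because the fitted line can be evaluated at any age, the model can return a specific price even for inputs it has never seen, which a classifier over price ranges cannot do.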
1.1 MOTIVATION:
Almost everyone wants their own car these days, but because of factors like affordability or economic
conditions, many prefer to opt for pre-owned cars. Accurately predicting used car prices requires expert
knowledge because prices depend on a variety of factors and features. Used car prices
are not constant in the market, so both buyers and sellers need an intelligent system that will allow them to
predict the correct price efficiently. In this intelligent system, the most difficult problem is the
collection of a dataset which contains all important elements like the manufacturing year of the car,
its fuel type, its condition, miles driven, horsepower, doors, number of times a car has been painted,
customer reviews, the weight of the car, etc. The price of the product is affected by many factors, but
unfortunately, information about these features is not always readily available. Since this project
primarily focuses on a specific dataset, a benchmark dataset containing all key features is scraped.
It is necessary to pre-process and transform the collected data into the proper format prior to feeding it
to the machine learning model. As a first step, the dataset was statistically analysed and plotted.
Missing, duplicated, and null values were identified and dealt with. Features were chosen and extracted
using two correlation matrices. To build an efficient model, the most correlated features were retained,
and others were discarded. This prediction problem belongs to the supervised learning domain and,
because the target price is a continuous value, it can be treated as a regression problem. Here we used
the Linear Regression algorithm for the prediction.
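The pre-processing steps described above (removing duplicated and missing rows, then inspecting correlations with the price) can be sketched with pandas as follows. This is only an illustrative outline: the rows are invented, and the column name `Price` and the helper `clean` are assumptions made for this example rather than the exact names used in our notebook.

```python
import pandas as pd

# Hypothetical sample rows; column names other than "Price" mirror the form fields of the web app.
raw = pd.DataFrame({
    "name": ["Maruti Swift", "Maruti Swift", "Hyundai i20", "Honda City", "Ford Figo"],
    "year": [2015, 2015, 2017, None, 2012],
    "kms_driven": [40000, 40000, 25000, 60000, 90000],
    "Price": [350000, 350000, 520000, 600000, 150000],
})


def clean(df):
    """Drop exact duplicate rows and rows with missing values."""
    df = df.drop_duplicates()
    df = df.dropna()
    return df.reset_index(drop=True)


cleaned = clean(raw)

# Correlation matrix of the numeric columns, then rank features by |correlation with Price|.
corr = cleaned[["year", "kms_driven", "Price"]].corr()
price_corr = corr["Price"].drop("Price").abs().sort_values(ascending=False)
print(price_corr)
```

Features with a low absolute correlation to the price are candidates for removal, while the strongly correlated ones are kept as model inputs.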
The purpose of this study is to understand and evaluate used car prices, and to develop a model or a
strategy that utilizes Machine learning techniques to predict used car prices.
Deciding whether a used car is worth the posted price when you see listings online can be difficult.
Several factors, including mileage, make, model, year, etc. can influence the actual worth of a car.
From the perspective of a seller, it is also a dilemma to price a used car appropriately. Based on
existing data, the aim is to use machine learning algorithms to develop models for predicting used car
prices.
2.LITERATURE REVIEW
We have reviewed several papers and articles related to our project named “Predicting True Value of Used
Cars Using Machine Learning Technique”.
The first paper is “Predicting the price of Used Car Using Machine Learning Techniques”. In this
paper, the authors investigate the application of supervised machine learning techniques to predict the price of
used cars in Mauritius. The predictions are based on historical data collected from daily newspapers.
Different techniques like multiple linear regression analysis, k-nearest neighbours, naïve Bayes and
decision trees have been used to make the predictions.
The second paper is “Car Price Prediction Using Machine Learning Techniques”. A considerable
number of distinct attributes are examined for a reliable and accurate prediction. To build a model for
predicting the price of used cars in Bosnia and Herzegovina, the authors applied three machine learning
techniques (Artificial Neural Network, Support Vector Machine and Random Forest).
The third paper is “Price Evaluation model in second hand car system”. In this paper, a price
evaluation model based on big data analysis is proposed, which takes advantage of widely circulated
vehicle data and a large amount of vehicle transaction data to analyze the price data for each type of
vehicle by using an optimized BP neural network algorithm. It aims to establish a second-hand car
price evaluation model to get the price that best matches the car.
3.REVIEW
In the existing system, many Machine Learning algorithms have been widely used to predict the price
of four-wheelers. The major drawback of the existing system is that it needs more attributes in order to
predict the vehicle price. It is highly complicated to get sufficient datasets, because the data is spread
widely all over the world. The datasets can be collected only online, not offline. The datasets will not
have information about vehicles that have not been used for a long time, and older traditional models
may or may not be included. Another major drawback is that the existing system is very slow, because
most existing work on keyword queries analyses only individual data points, which is inappropriate for
the many applications that call for analysis of groups of different cars.
Based on the varying features and factors, and with the help of expert knowledge, vehicle price
prediction can be done accurately. The most essential inputs for prediction are the brand and model,
the period of usage of the vehicle, and the mileage of the vehicle. The fuel type used in the
vehicle as well as the fuel consumption per mile strongly affect the price of a vehicle because fuel prices
change frequently. Other features like exterior color, number of doors, type of transmission,
dimensions, safety, air conditioning, interior, and whether it has navigation or not will also influence the
vehicle price. In this project, we applied different methods and techniques in order to achieve higher
precision in used vehicle price prediction.
Advantages
The system is more effective since it measures the vehicle combinations by their prices.
The system predicts prices accurately using Linear Regression.
Hardware requirements:
Software requirements:
4.DIAGRAMS
CLASS DIAGRAM
Class diagrams are one of the most widely used diagrams. It is the backbone of all the object-
oriented software systems. It depicts the static structure of the system. It displays the system's class,
attributes, and methods. It is helpful in recognizing the relation between different objects as well as
classes.
SEQUENCE DIAGRAM
The sequence diagram represents the flow of messages in the system and is also termed an event
diagram. It helps in envisioning several dynamic scenarios. It portrays the communication between any
two lifelines as a time-ordered sequence of events in which these lifelines take part at run time. In
UML, a lifeline is drawn as a vertical dashed line extending down from each participant, whereas a
message is drawn as a horizontal arrow between lifelines. The diagram can also show iterations as well
as branching.
DEPLOYMENT DIAGRAM
It presents the system's software and its hardware by telling what the existing physical
components are and what software components are running on them. It produces information about
system software. It is incorporated whenever software is used, distributed, or deployed across multiple
machines with dissimilar configurations.
STATE MACHINE DIAGRAM
The state machine diagram is also called the State chart or State Transition diagram, which
shows the order of states undergone by an object within the system. It captures the software system's
behavior. It models the behavior of a class, a subsystem, a package, and a complete system. It turns out
to be an efficient way of modeling the interactions and collaborations between the external entities and the
system. It models event-based systems to handle the state of an object. It also defines several distinct
states of a component within the system. Each object/component has a specific state.
USECASE DIAGRAM
A Use Case Diagram is a visual representation in UML (Unified Modeling Language) that depicts the
interactions between actors (users or external systems) and a system to showcase the system's
functionality from a user's perspective. It provides a high-level view of the system's behavior, focusing
on what the system does rather than how it is implemented.
5.IMPLEMENTATION
A software development environment (SDE) is the collection of hardware and software tools a
system developer uses to build software systems. When you are developing software, you probably
don't want your users to see every messy part of your application creation process. The software
technology used in this project is Python. Python is one of the fastest-growing programming languages. It
supports multiple programming paradigms, including structured, object-oriented and functional
programming. It is dynamically typed and garbage-collected. It consistently ranks as one of the
most popular programming languages. It can also be used on a server to create web applications. It has a
huge number of libraries and frameworks. Python frameworks are collections of modules and packages
that automate common processes and implementation details, so developers can focus on application
logic rather than dealing with routine processes.
The python libraries used are:
numpy
pandas
matplotlib
sklearn
seaborn
Numpy-
The name “NumPy” stands for “Numerical Python”. It is a commonly used numerical computing library,
popular in machine learning, that supports large matrices and multi-dimensional arrays. It provides
built-in mathematical functions for easy computation. Even libraries like TensorFlow use NumPy internally
to perform several operations on tensors. The array interface is one of the key features of this library.
Pandas-
Pandas is a software library written for the Python programming language for data manipulation and
analysis. When we have to work on tabular data, we prefer the pandas module. The main data structures
of pandas are the DataFrame and the Series. Pandas is generally reported to perform well when the
number of rows is 500K or more.
Matplotlib-
Matplotlib is a library for plotting numerical data, which is why it is
used in data analysis. It is an open-source library and produces high-quality figures like pie charts,
histograms, scatterplots, graphs, etc.
Seaborn-
Seaborn is a library that uses Matplotlib underneath to plot graphs. It will be used to visualize random
distributions. It is used for data visualization and exploratory data analysis. Seaborn works easily with
data frames and the Pandas library. The graphs created can also be customized easily.
Python: Python is a widely-used programming language in the field of machine learning. It offers
various libraries and frameworks, such as scikit-learn, TensorFlow, and Keras, which provide tools for
data preprocessing, model development, and evaluation.
Jupyter Notebook: Jupyter Notebook is an open-source web application that allows you to create and
share documents containing code, visualizations, and explanatory text. It's commonly used for data
exploration, model development, and collaboration in machine learning projects.
Pandas: Pandas is a powerful data manipulation library in Python. It provides data structures and
functions for efficiently handling and analyzing structured data, such as CSV files or databases, which
are commonly used in used car price prediction projects.
Scikit-learn: Scikit-learn is a popular machine learning library in Python. It provides a wide range of
algorithms and tools for regression, classification, and other machine learning tasks. You can use it to
implement regression models for predicting used car prices.
XGBoost/LightGBM: XGBoost and LightGBM are gradient boosting frameworks that are highly
effective for regression problems. They provide optimized implementations of gradient boosting
algorithms, which can be used to build accurate used car price prediction models.
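As a rough sketch of how these libraries fit together, the example below one-hot encodes the categorical columns and fits a Linear Regression model inside a single scikit-learn pipeline. The rows are invented for illustration and the target column name `Price` is an assumption; the feature column names follow those used by the web application. It is one possible arrangement, not necessarily the exact training code of this project.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training rows (not the real dataset).
cars = pd.DataFrame({
    "name": ["Maruti Swift", "Maruti Alto", "Hyundai i20", "Hyundai Verna", "Honda City", "Honda Amaze"],
    "company": ["Maruti", "Maruti", "Hyundai", "Hyundai", "Honda", "Honda"],
    "year": [2015, 2012, 2017, 2014, 2016, 2018],
    "kms_driven": [40000, 80000, 25000, 60000, 45000, 20000],
    "fuel_type": ["Petrol", "Petrol", "Diesel", "Diesel", "Petrol", "Diesel"],
    "Price": [350000, 150000, 520000, 450000, 600000, 650000],
})

feature_columns = ["name", "company", "year", "kms_driven", "fuel_type"]
X = cars[feature_columns]
y = cars["Price"]

# Categorical columns are one-hot encoded; numeric columns pass through unchanged.
encoder = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["name", "company", "fuel_type"]),
    remainder="passthrough",
)
pipe = make_pipeline(encoder, LinearRegression())
pipe.fit(X, y)

# Predict the price of a single (hypothetical) car.
query = pd.DataFrame([["Maruti Swift", "Maruti", 2016, 35000, "Petrol"]], columns=feature_columns)
predicted = pipe.predict(query)
print(predicted[0])
```

Keeping the encoder and the regressor in one pipeline means the same transformation is applied at training time and at prediction time, so the web application only needs to pass a DataFrame with the same five columns.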
5.3 DATASET:
https://www.kaggle.com/datasets/sidharth178/car-prices-dataset
5.4 SAMPLE CODE
Code for website:
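# Setup assumed here because it is not shown in the original listing.
# The two file names are placeholders, not the project's actual paths.
import pickle

import numpy as np
import pandas as pd
from flask import Flask, render_template, request
from flask_cors import cross_origin

app = Flask(__name__)
with open('model.pkl', 'rb') as fh:        # placeholder: trained model file
    model = pickle.load(fh)
car = pd.read_csv('cleaned_car_data.csv')  # placeholder: cleaned dataset

@app.route('/', methods=['GET', 'POST'])
def index():
    # Values used to fill the drop-down lists on the form.
    companies = sorted(car['company'].unique())
    car_models = sorted(car['name'].unique())
    year = sorted(car['year'].unique(), reverse=True)
    fuel_type = car['fuel_type'].unique()
    companies.insert(0, 'Select Company')
    return render_template('index.html', companies=companies, car_models=car_models,
                           years=year, fuel_types=fuel_type)

@app.route('/predict', methods=['POST'])
@cross_origin()
def predict():
    # Read the submitted form fields.
    company = request.form.get('company')
    car_model = request.form.get('car_models')
    year = request.form.get('year')
    fuel_type = request.form.get('fuel_type')
    km_driven = request.form.get('kilo_driven')
    # Build a one-row DataFrame with the column names the model was trained on.
    dat = np.array([car_model, company, year, km_driven, fuel_type])
    pred = model.predict(pd.DataFrame(columns=['name', 'company', 'year', 'kms_driven', 'fuel_type'],
                                      data=dat.reshape(1, 5)))
    return str(np.round(pred, 0)[0])

if __name__ == '__main__':
    app.run(debug=True)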
<!DOCTYPE html>
<html lang="en">
<head xmlns="http://www.w3.org/1999/xhtml">
<meta charset="UTF-8">
<title>Car Price Predictor</title>
<link rel="stylesheet" href="static/css/style.css">
<link rel="stylesheet" type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.11.2/css/all.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/umd/popper.min.js"
integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo"
crossorigin="anonymous"></script>
</head>
<body class="bg-dark">
<div class="container">
<div class="row">
<div class="card mt-50" style="width: 100%; height: 100%">
<div class="card-header" style="text-align: center">
<h1>Welcome to Car Price Predictor</h1>
</div>
<div class="card-body">
<div class="col-12" style="text-align: center">
<h5>This app predicts the price of a car you want to sell. Try filling the details below:
</h5>
</div>
<br>
<form method="post" accept-charset="utf-8" name="Modelform">
<div class="col-md-10 form-group" style="text-align: center">
<label><b>Select the company:</b> </label><br>
<select class="selectpicker form-control" id="company" name="company" required="1"
onchange="load_car_models(this.id,'car_models')">
{% for company in companies %}
<option value="{{ company }}">{{ company }}</option>
{% endfor %}
</select>
</div>
<div class="col-md-10 form-group" style="text-align: center">
<label><b>Select the model:</b> </label><br>
<select class="selectpicker form-control" id="car_models" name="car_models"
required="1">
</select>
</div>
<div class="col-md-10 form-group" style="text-align: center">
<label><b>Select Year of Purchase:</b> </label><br>
<select class="selectpicker form-control" id="year" name="year" required="1">
{% for year in years %}
<option value="{{ year }}">{{ year }}</option>
{% endfor %}
</select>
</div>
<div class="col-md-10 form-group" style="text-align: center">
<label><b>Select the Fuel Type:</b> </label><br>
<select class="selectpicker form-control" id="fuel_type" name="fuel_type"
required="1">
{% for fuel in fuel_types %}
<option value="{{ fuel }}">{{ fuel }}</option>
{% endfor %}
</select>
</div>
<div class="col-md-10 form-group" style="text-align: center">
<label><b>Enter the Number of Kilometres that the car has travelled:</b> </label><br>
<input type="text" class="form-control" id="kilo_driven" name="kilo_driven"
placeholder="Enter the kilometres driven ">
</div>
<div class="col-md-10 form-group" style="text-align: center">
<button class="btn btn-primary form-control" onclick="send_data()">Predict
Price</button>
</div>
</form>
<br>
<div class="row">
<div class="col-12" style="text-align: center">
<h4><span id="prediction"></span></h4>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
    function load_car_models(company_id, car_model_id)
    {
        var company = document.getElementById(company_id);
        var car_model = document.getElementById(car_model_id);
        console.log(company.value);
        car_model.value = "";
        car_model.innerHTML = "";
        {% for company in companies %}
        if (company.value == "{{ company }}")
        {
            {% for model in car_models %}
            {% if company in model %}
            // Add every model whose name contains the selected company.
            var newOption = document.createElement("option");
            newOption.value = "{{ model }}";
            newOption.innerHTML = "{{ model }}";
            car_model.options.add(newOption);
            {% endif %}
            {% endfor %}
        }
        {% endfor %}
    }

    function form_handler(event) {
        event.preventDefault(); // Don't submit the form normally
    }

    function send_data()
    {
        document.querySelector('form').addEventListener("submit", form_handler);
        // Collect the form fields and post them to the /predict route.
        var fd = new FormData(document.querySelector('form'));
        var xhr = new XMLHttpRequest();
        xhr.open('POST', '/predict', true);
        document.getElementById('prediction').innerHTML = "Wait! Predicting Price.....";
        xhr.onreadystatechange = function(){
            if (xhr.readyState == XMLHttpRequest.DONE){
                document.getElementById('prediction').innerHTML = "Prediction: ₹" + xhr.responseText;
            }
        };
        xhr.onload = function(){};
        xhr.send(fd);
    }
</script>
</body>
</html>
6.TEST CASES
Model Performance:
Test case: Provide a set of test samples with known prices and compare the predicted prices from your
model against the actual prices. Calculate and report the accuracy of the model predictions.
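One way to carry out this comparison is sketched below using the mean absolute error and the R² score, written in plain Python so that the arithmetic is visible. The actual and predicted prices are invented numbers. scikit-learn offers equivalent functions with the same names in `sklearn.metrics`.

```python
# Illustrative sketch: the prices below are made-up values, not project results.

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual_prices = [300, 500, 700, 900]       # known prices (thousands, hypothetical)
predicted_prices = [320, 480, 710, 890]    # model output for the same cars (hypothetical)

mae = mean_absolute_error(actual_prices, predicted_prices)
r2 = r2_score(actual_prices, predicted_prices)
print(mae, r2)
```

For a regression model these two numbers are more informative than a single "accuracy" figure: the mean absolute error is in the same units as the price, and an R² close to 1 indicates that the predictions explain most of the variation in the actual prices.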
Model Persistence and Loading:
Test case: Save the trained model to disk and verify that it can be successfully loaded and used for
making predictions.
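A minimal sketch of this test is given below using Python's `pickle` module. For brevity a small dictionary of coefficients stands in for the trained model; a fitted scikit-learn pipeline can be saved and loaded in the same way. The file name `model.pkl` and the helper names are assumptions made for this example.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (hypothetical coefficients).
model = {"intercept": 1000.0, "coef_per_year": -100.0}


def save_model(obj, path):
    """Write the model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_model(path):
    """Read the model object back from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_price(m, age_years):
    """Apply the stored coefficients to one input."""
    return m["intercept"] + m["coef_per_year"] * age_years


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")   # hypothetical file name
    save_model(model, path)
    restored = load_model(path)

# The restored model must give the same prediction as the original one.
same_prediction = predict_price(model, 3) == predict_price(restored, 3)
print(same_prediction)
```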
Real-Time Predictions:
Test case: Implement a mechanism to accept real-time input data (e.g., car features) and ensure that the
model can make accurate predictions in real-time scenarios.
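Before real-time input reaches the model, it is useful to check it. The sketch below validates the raw form values and converts them into the types the model expects. The field names match the web form in this report; the function name `parse_car_form` and the lower bound of 1980 for the year are assumptions made for illustration.

```python
from datetime import date

REQUIRED_FIELDS = ["company", "car_models", "year", "fuel_type", "kilo_driven"]


def parse_car_form(form, current_year=None):
    """Validate raw form strings; return a clean dict or raise ValueError."""
    if current_year is None:
        current_year = date.today().year
    missing = [f for f in REQUIRED_FIELDS if not str(form.get(f) or "").strip()]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    try:
        year = int(form["year"])
        kms = int(str(form["kilo_driven"]).replace(",", ""))
    except ValueError:
        raise ValueError("year and kilometres must be whole numbers") from None
    if not 1980 <= year <= current_year:   # 1980 is an arbitrary lower bound
        raise ValueError("year is out of range")
    if kms < 0:
        raise ValueError("kilometres cannot be negative")
    return {
        "name": form["car_models"].strip(),
        "company": form["company"].strip(),
        "year": year,
        "kms_driven": kms,
        "fuel_type": form["fuel_type"].strip(),
    }


good = parse_car_form(
    {"company": "Maruti", "car_models": "Maruti Swift", "year": "2016",
     "fuel_type": "Petrol", "kilo_driven": "35,000"},
    current_year=2023,
)
print(good)
```

A test case can then assert that valid input is converted correctly and that incomplete or non-numeric input is rejected before any prediction is attempted.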
7.SCREENSHOTS
8.CONCLUSION
In conclusion, the used car price prediction project involves building a machine learning model to
estimate the prices of used cars. By leveraging various software technologies and techniques, such as
Python, Jupyter Notebook, Pandas, scikit-learn, XGBoost/LightGBM, and Flask/Django, you can
develop a robust and accurate solution.
Throughout the project, it is essential to validate and preprocess the input data, split it into training and
testing sets, and train the machine learning model. Evaluating the model's performance using
appropriate metrics and test cases will help assess its accuracy and effectiveness.
Additionally, feature selection and handling outliers/anomalies are crucial steps to improve the model's
predictive power. It's important to consider the practicality and usefulness of the selected features and
ensure the model can handle unexpected data points.
Once the model is trained and evaluated, it can be persisted and loaded for real-time predictions.
Implementing a user-friendly interface, such as a web application using Flask or Django, allows users
to interact with the model and obtain price estimates based on input features.
Overall, the project aims to provide a valuable tool for estimating used car prices, facilitating decision-
making for buyers, sellers, and car enthusiasts. Continuous improvement and refinement of the model,
along with user feedback, can enhance its performance and make it more reliable over time.
9.FUTURE SCOPE
The future scope of a used car price prediction project can involve several potential areas of
improvement and expansion. Here are some possibilities to consider:
Advanced Machine Learning Techniques: Experiment with advanced machine learning algorithms and
techniques like ensemble methods, deep learning, or time-series analysis to potentially improve the
model's performance and ability to capture complex patterns in the data.
Expand to Other Vehicle Types: Consider extending the project to include price prediction for other
types of vehicles, such as motorcycles, trucks, or recreational vehicles. This expansion can broaden the
scope of the application and cater to a wider range of users.
10. BIBLIOGRAPHY
PAPERS REFERRED:
"Predicting Used Car Prices with Machine Learning Techniques" by A. Sharma
"Car Price Prediction Based on Machine Learning Techniques" by R. Wang et al.
WEBSITES:
1. https://www.ijcaonline.org/archives/volume167/number9/noor-2017-ijca-914373.pdf
2. http://cs229.stanford.edu/proj2019aut/data/assignment_308832_raw/26612934.pdf
3. https://en.wikipedia.org/wiki/Linear_regression
4. https://www.ibm.com/in-en/topics/linear-regression#:~:text=Resources-,What%20is%20linear%20regression%3F,is%20called%20the%20independent%20variable.