Used Car Price Prediction Using Machine Learning

A mini project report submitted by

VELURU RANJITH (URK18CS020)

in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
under the supervision of

Dr. Narmadha, Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


KARUNYA INSTITUTE OF TECHNOLOGY AND SCIENCES
(Declared as Deemed-to-be University under Sec. 3 of the UGC Act, 1956)
Karunya Nagar, Coimbatore - 641 114. INDIA

March 2021

Page 1 of 26
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the project report entitled, “Used Car Price Prediction Using Machine Learning”
is a bonafide record of Mini Project work done during the even semester of the academic year 2020-2021 by

VELURU RANJITH (Reg. No: URK18CS020)

in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer
Science and Engineering of Karunya Institute of Technology and Sciences.

Submitted for the Viva Voce held on _________________

Project Coordinator Signature of the Guide

ACKNOWLEDGEMENT

First and foremost, I praise and thank ALMIGHTY GOD whose blessings have bestowed in me the
willpower and confidence to carry out my project.

I am grateful to our beloved founders Late. Dr. D.G.S. Dhinakaran, C.A.I.I.B, Ph.D. and Dr. Paul
Dhinakaran, M.B.A, Ph.D., for their love and always remembering us in their prayers.

I extend my thanks to our Vice-Chancellor Dr. P. Mannar Jawahar, Ph.D. and our Registrar Dr. Elijah
Blessing, M.E., Ph.D., for giving me this opportunity to do the project.

I would like to thank Dr. Prince Arulraj, M.E., Ph.D., Dean, School of Engineering and Technology
for his direction and invaluable support to complete the same.

I would like to place my heartfelt thanks and gratitude to Dr. J. Immanuel John Raja, M.E., Ph.D.,
Head of the Department, Computer Science and Engineering for his encouragement and guidance.

I feel it is a pleasure to be indebted to Mr. J. Andrew, M.E, (Ph.D.), Assistant Professor, Department of
Computer Science and Engineering, and Dr. Narmadha for their invaluable support, advice, and
encouragement.

I also thank all the staff members of the Department for extending their helping hands to make this
project a successful one.

I would also like to thank all my friends and my parents who have prayed and helped me during the
project work.

ABSTRACT

Due to the ongoing pandemic, many people are unwilling to travel long distances, especially on
public transport. They prefer to travel in their own vehicles, and those who cannot afford a new car
plan to buy a used one. Others are ready to sell their cars and buy a new car. As a result, used car
sales are rising rapidly. Many car sellers are taking advantage of this situation by pricing their
vehicles unrealistically on the basis of current demand.

So there arises a need for a model that can quote a price for a vehicle from its features, so
that buyers can make informed purchases. Predicting the price of a used car normally requires a lot of
expertise in that field, because several attributes must be considered to make the prediction accurate
and reliable. In this project, I used a supervised learning method, namely Random Forest, to predict
the price of used cars. The main reason for choosing this model is that it showed better accuracy than
the other regression models considered.

A Random Forest with more than 200 decision trees was created for training on the data. The
training accuracy was nearly 95%, and the testing accuracy was found to be 84%. So, this model can
predict the price of cars reasonably accurately with the help of correlated factors.

CONTENTS

Acknowledgment………………………………………………………………………………3
Abstract…………………………………………………………………………………….…..4

I. Introduction………………………………………………………………………………...8

1. Problem statement…………………………………………………………………..8
1.1. Motivation…………………………………………………………………...9
2. Project overview…………………………………………………………………….9
3. Objectives……………………………………………………………………………9
4. Scope of the work………………………………………………………………….10
4.1. The current situation………………………………………………………..10

II. Background Research……………………………………………………………………..11

5. Proposed Method……….……………………………………………………………12

III. Design And Analysis……………………………………………………………………...13

6. Functional Requirements…………………………………………………………….13
7. Data Requirements………….………………………………………………………..13
8.Tools used…………………………………………………………………………….14
9. Usability Requirements………………………………………………………………15
10. Use-Case Diagrams………………………………………………………………..…15

IV. Implementation…………………………………………………………………………….17
11. Dataset and Preprocessing……………………………………………………...……..17
12. Exploratory Data Analysis…………………………………………………….….…..17
13. Removing Outliers……………………………………………………………………20
14. Finding best Regression Model…………………………………………….…..……..21
15. User Interface…………………………………………………………………..……..22
V. Test Results/Verification……………………………………………………………………23
VI. Conclusions and Further Scope……………………………………………………...……25

LIST OF TABLES
Table no. Page no.
Table.1. objectives………………………………………………………………… 9
Table 2. Preprocessing steps……………………………………………………… 17

LIST OF FIGURES

Figure number Page. No.


Fig.1. Format of Data present in the dataset cars.csv………………….….. 14
Fig.2(a). User use case diagram…………………………………………… 16
Fig.2(b). Web API use case diagram……………………………………... 16
Fig.3. Comparison of manual & automatic cars…………………………… 18
Fig.4. Manual Type Car…………………………………………..……….. 18
Fig.5. Automatic Type car………………………………………………… 18
Fig.6. Owner Details….……..……………………………………………. 19
Fig.7. Year and Kilometer driven ………………………………………… 19
Fig.8. Year and Selling Price……………………………………………… 19
Fig.9. Grouped the data…..………………………………………………… 20
Fig.10. Selling Price ………………………………………………………. 20
Fig.11. 3D Image .…………………………………………………………. 21
Fig.12. Model prediction .…………………………………………………. 21
Fig.13. User Interface………………………………………………………. 22
Fig.14. Testing process….,…………………………………………………. 23
Fig.15. Result…. .…………………………………………………………. . 24

I. INTRODUCTION

Predicting the price of a used vehicle is an important and difficult task because the vehicle does not come
directly from the factory. According to one study, the use of used cars has grown by about 8% every year since
2013. One of the most important features of a vehicle is its transmission type, that is, whether it is a manual
or an automatic geared vehicle. Another is the fuel type: petrol, diesel, or CNG.

Some customers even buy used cars to avoid the taxes they would have to pay on a brand new
car. As the prices of new cars rise and some customers cannot afford them, used car sales are increasing
globally. In the present situation, with the coronavirus pandemic, few people are interested in traveling by
public transport, so the used car market is at an all-time high. There is therefore a need for a prediction
system that estimates used car prices efficiently.

In developed countries, a lease system already exists in which a buyer takes a vehicle on lease
for some years and then returns it to the seller, who resells it after the agreement ends. Resale has thus
become an essential part of today’s market. It is not easy to predict the price with minimal data. Many
features of the vehicle have to be checked, such as the age of the car, the car model, the braking system, the
number of kilometers it has covered, and many more.

So, in this project, I propose a methodology that uses the machine learning model Random Forest to predict
the price of a car from the features given in a dataset. The complete implementation of this project is
described in chapter IV.

1. PROBLEM STATEMENT
Most people waste time asking friends and associates about the expected price of a car. Some websites can
predict the price, but they are not very accurate because feature data is unavailable and is either recorded
as NULL in the dataset or dropped as a whole feature column. With this project, people can open the website,
enter their requirements, and get a predicted price for the car in very little time. Even people with very
little technical knowledge will be able to use the site.

1.1. MOTIVATION

In these difficult times, many people are interested in purchasing a used car, which keeps the market active.
When we see listings on websites such as cardekho.com, we often cannot say whether the price is
justifiable. Several factors, including transmission type, kilometers covered, mileage, age of the vehicle,
and year of the model, can influence the price. Even for the seller, it is difficult to quote an appropriate
price. The main aim is to develop models that predict used car prices from the given data.

2. PROJECT OVERVIEW

I use Jupyter to code the machine learning method. For the web application I use Flask, a web framework
that makes it easier to build a reliable, scalable, and maintainable web application, and JSON is used on the
server side. For the front end of the web application, I used HTML for the structure and CSS for the styling
of the web page.

3. OBJECTIVES
The main objective of this project is to predict used car prices with regression techniques from machine
learning, such as lasso, linear, multiple linear, decision tree, and random forest regressors.

Table 1: Objectives
S.NO. Objectives
1 Justifying and cleaning the data
2 Performing exploratory data analysis
3 Training the machine learning models
4 Importing the prepared model into Flask using the pickle library
5 Reporting the solution using the web API

4. SCOPE OF THE WORK

The predicted price will be helpful to the used car market and can benefit both the buyer and the
seller. Used car dealers stand to benefit most from this project. If they understand the importance of
each feature, they can take that knowledge into account when pricing.
Online pricing websites that offer an estimated price for a car may want a better model than their existing
one, or may use this as a second model. Individuals also want to carry out their own research on the web.
In the coming days, the market is expected to grow further, so this project can be very useful and
could be integrated into larger businesses.

4.1. The current situation

Currently, many automated services are available to predict the price of used cars, but the
efficiency and accuracy of their models may not be the best. In some areas, people still depend heavily on
used car dealers to find the best car within their budget and have to pay them for it. This model helps
them cross-check the price that the dealer quotes, so the individual has a good idea of the best price to
pay for the car.
Key factors driving the growth of the used car market include high disposable income,
rising demand for luxury cars, the short period for which previous owners use their cars, and the
increasing preference of two-wheeler owners to upgrade to a compact car.

II. BACKGROUND RESEARCH

Researchers have taken different approaches to predicting product prices from historical data.
Pudaruth [1] predicted the prices of second-hand cars in Mauritius using multiple linear regression,
k-nearest neighbors, naïve Bayes, and decision tree algorithms. The comparison of prediction results showed
that the prices from these methods are closely comparable. However, it was found that the decision tree
algorithm and naïve Bayes method were unable to classify and predict numeric values. Pudaruth’s research also
concluded that a limited number of instances in the data set does not offer high prediction accuracies [1].
Similar research by Listiani [2] uses SVM (support vector machine) to predict the prices of
leased cars. It shows that SVM is more accurate than multiple linear regression when a large database is
available, and that it also handles big data better than multiple linear regression. However, the work does
not show, in terms of variance and mean standard deviation, why SVM is better than simple multiple
regression.
Zhang et al. [3] used a Kaggle data set to predict the price of used cars. They evaluated
several classification methods; among them, the RFC (random forest classifier) showed the best accuracy and
predicted the price better than the other methods.
Wu et al. [4] used a neuro-fuzzy knowledge-based system to demonstrate vehicle price
prediction. Using attributes such as brand, year of production, and type of engine, they built a model that
gives results similar to the simple regression model. Moreover, an expert system named ODAV (Optimal
Distribution of Auction Vehicles) was developed [5], as there is high demand from vehicle dealers for selling
vehicles at the end of the leasing year. This system gives insight into the best prices for vehicles, as
well as the location where the best price can be obtained. To predict the price of vehicles, it uses the
k-nearest neighbor machine learning algorithm with regression models. A large number of vehicles have been
exchanged through this system, which indicates that it has been managed successfully.
The report by Awad et al. [6] is more of an educational paper than a research paper. The authors
review six popular classification methods (Bayesian classification, ANNs, SVMs, k-NN, rough sets, and
artificial immune systems) on a spam email classification task. This paper was chosen to understand these
popular classification models in detail, since it gives much insight into each method. The main difference
between classifying a price range and classifying spam mail is that spam classification is a binary task,
whereas ours is mainly one-vs-the-rest. The authors use Naive Bayes for classification, which does not give
accurate results because of its assumption of feature independence, as they point out. For this reason, we
did not evaluate Naive Bayes on our data set, since its features are heavily dependent on each other. To
predict results with good accuracy, the authors suggest a hybrid system, an idea that applies to our work
through Random Forest. As an ensemble of decorrelated decision trees, the Random Forest gives good accuracy
in comparison to prior work.
Durgesh et al. [7] give a good introduction to the Support Vector Machine. The authors compare
SVM with several other classification techniques (k-NN, rule-based classifiers, etc.) using several data
sets taken from the UCI Machine Learning Repository. The assessment finds that SVM gives much better
classification accuracy than the others. This gives us a baseline: a simple linear model is tried first, and
a more complex system, random forest, is then used, which ultimately provides good results for the
prediction of used car prices.

5. THE PROPOSED WORK

The first step is to collect the data from Kaggle, which has a large selection of datasets on
used cars. Features such as fuel and transmission are categorical, while others such as year and
selling_price_inr are numerical.

The categorical features are converted to a one-hot encoding with the help of get_dummies.
Some features, such as the car model, are dropped: there are many models, which makes the encoding
difficult, and the model name was not otherwise used in this project. Pearson correlation is used to find
how each feature is related to the others. The data is then split into training and testing sets with
train_test_split. With the help of GridSearchCV, I found that the best model for car price prediction is
random forest.
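
The encoding, correlation, and splitting steps can be sketched as follows. This is a minimal illustration and not the project's actual notebook: the sample rows, the 33% test size, and the use of drop_first are assumptions made for the example, while the column names follow the dataset described in this report.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny hypothetical sample; the real data comes from the Kaggle CSV file.
cars = pd.DataFrame({
    "year": [2012, 2015, 2017, 2018, 2014, 2016],
    "kms_driven": [70000, 45000, 30000, 15000, 60000, 40000],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel", "CNG", "Petrol"],
    "transmission": ["Manual", "Manual", "Automatic", "Automatic", "Manual", "Manual"],
    "selling_price_inr": [250000, 450000, 600000, 900000, 300000, 480000],
})

# One-hot encode the categorical columns. drop_first removes one redundant
# indicator column per feature (an assumption, not stated in the report).
encoded = pd.get_dummies(cars, columns=["fuel", "transmission"], drop_first=True, dtype=int)

# Pearson correlation of every column with the target price.
corr_with_price = encoded.corr(method="pearson")["selling_price_inr"]

# Separate features from the target, then split into training and testing sets.
X = encoded.drop(columns="selling_price_inr")
y = encoded["selling_price_inr"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

print(corr_with_price)
print(X_train.shape, X_test.shape)
```

In this sample the correlation between kms_driven and the price is negative, which matches the intuition that heavily used cars sell for less.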

A benefit of Random Forest is that the features do not have to be scaled, because decision
trees split on thresholds and are insensitive to the scale of the inputs. Tuning the hyperparameters helps
limit overfitting and underfitting. Based on the results, RandomForestRegressor gives the best result, so
that model is used with a subset of the features. The trained model is saved to a pickle file so that it can
be deployed, and I deployed it with the help of the Flask library.

In the next chapter, we will be seeing the analysis and design section of this project which will
help make the project a successful one.

III. DESIGN AND ANALYSIS

The analysis of requirements is a critical process for the success of any project. This
project has functional requirements, data requirements, usability requirements, and look and feel
requirements. When all the requirements are met, end users will feel more comfortable using the web API.
The design phase is an early stage of any project, in which the project’s key features, structure, criteria
for success, and major deliverables are planned out. The aim is to develop one or more designs that can be
used to achieve the desired project goals.

6. FUNCTIONAL REQUIREMENTS

The web API lets the user enter data in fields such as the price of the car in lakhs, select the
transmission type and fuel type, and enter the year of manufacture of the car and the other details. The
response should be quick for the user. The web API takes all the required information from the user end.

The web API allows the user to run the prediction analysis as many times as they want
without any interference. For instance, a user may check the price with different values one after the
other, and the web application will predict the price each time. The user has to fill in all the fields in
order to get the price. I used the required attribute on all the text boxes in the HTML file, so if the user
misses a field, an alert says that it is a required field. After entering all the required input and
pressing the button, the user gets the predicted price at the bottom of the web page.

7. DATA REQUIREMENTS

The data is collected from the Kaggle data repository, which hosts thousands of data sets.
The dataset that I used is the cars.csv file. It consists of different features of the cars and has
nearly 280 rows, that is, the details of about 280 cars. Fig. 1 shows how the data is presented in the
dataset that I used in this project.

Fig.1. Format of Data present in the dataset car-dataset.csv

This is the initial data present in the data set; I modified the dataset at different stages
according to the project requirements. There are 8 columns, each holding a different feature of the car:
name, year, selling_price, kms_driven, fuel, seller, transmission, and owner. The whole project is based
on this dataset and its features.

8. TOOLS USED

Anaconda is the main software that I used for this project. It is a distribution of the Python and R
programming languages for scientific computing that aims to simplify package management and deployment.
The Spyder IDE, which ships with Anaconda, was used to create the back end (Flask) of the web
application. NetBeans was used to build the structure of the web page, and CSS was used for its look. In
addition, JSON was used for storing and transporting data between the server and the web page.
Jupyter is the other environment that I used for coding in Python. Project Jupyter develops
open-source software and services for interactive computing across dozens of programming languages.

Anaconda Prompt is like the command prompt, but it ensures that Anaconda and conda commands
can be used from the prompt without having to change directories or the path.

9. USABILITY REQUIREMENTS

The web application starts with a default front screen that displays “predictive analysis”.
The information displayed helps the user fill in the text boxes with appropriate details.
The user shall, with no training, be able to navigate the web page, fill in the data, and
get the predicted price. The user should not feel that the web page is only for experts. The
information in the web application should be easily understandable by all users.
The web application should meet the website requirements and be easily accessible to
all users.

10. USE CASE DIAGRAM

Use case diagrams are used to identify the functionality of a system during the analysis phase of the
project.

User use case:


The user enters all the data and waits for the response from the web API.

Fig. 2(a). use-case diagram

Web API Use case Diagram:
The web API functions this way to give the prediction of the car cost to the user.

Fig. 2(b). use-case diagram for web api

In the next chapter we will look at the implementation phase of the project.

IV. IMPLEMENTATION

11. Dataset and Preprocessing


To accurately predict the prices of used cars, I used an open dataset to train the model:
‘cars-dataset.csv’ from Kaggle. Information about the dataset has already been given in section 7.
Additionally, I performed the following preprocessing steps, which helped me narrow down the
features.

Table2: Preprocessing Steps


# Steps performed to preprocess the data
1 Dropped the name and seller columns, which have little impact on the analysis
2 Rearranged the indexes of the feature columns
3 Checked for null values
4 Checked for data consistency
5 Added a new feature called selling_price_inr, which gives the price
6 Dropped the column ‘selling price’
7 Performed EDA

After the pre-processing, the final dataset contains 6 features for used cars -
Selling_Price_inr, Kms_Driven, Owner, year, Fuel, Transmission.
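
The first six steps of Table 2 might look like the following in pandas. This is only a sketch under stated assumptions: the sample rows are invented, "rearranged the indexes" is read here as resetting the row index, and the conversion to selling_price_inr assumes the original selling_price is expressed in lakhs (1 lakh = 100,000 rupees), which the report does not state explicitly.

```python
import pandas as pd

# Hypothetical rows in the 8-column layout listed in section 7.
raw = pd.DataFrame({
    "name": ["swift", "city", "i20"],
    "year": [2014, 2016, 2018],
    "selling_price": [3.5, 6.0, 5.25],  # assumed to be in lakhs
    "kms_driven": [50000, 30000, 20000],
    "fuel": ["Petrol", "Diesel", "Petrol"],
    "seller": ["Dealer", "Individual", "Dealer"],
    "transmission": ["Manual", "Automatic", "Manual"],
    "owner": ["First Owner", "Second Owner", "First Owner"],
})

LAKH = 100_000  # assumed unit for the conversion below

df = raw.drop(columns=["name", "seller"])   # step 1: drop low-impact columns
df = df.reset_index(drop=True)              # step 2: one reading of "rearranged the indexes"
null_counts = df.isnull().sum()             # step 3: count missing values per column
print(df.dtypes)                            # step 4: a basic consistency check on column types
df["selling_price_inr"] = (df["selling_price"] * LAKH).round().astype(int)  # step 5 (assumed conversion)
df = df.drop(columns="selling_price")       # step 6: drop the original price column

print(null_counts)
print(df)
```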

12. EXPLORATORY DATA ANALYSIS

After preprocessing, the data is analyzed through visual exploration using seaborn, to understand the
diversity of the data and the range of every field. Comparing manual and automatic cars with a pair plot, I
observed that automatic cars have a higher price range than manual cars, though the distribution of
automatic cars is skewed to the right. We can also see an increase in automatic cars between 2015 and 2020.
We can clearly see some outliers in selling price and kilometers driven. Since most of the cars run on
petrol or diesel, I labelled the other fuel types as others.

Fig. 3. Comparison of manual and automatic cars

Among the manual cars, I observed that diesel cars are sold at higher prices than petrol cars, and that
petrol cars have covered more kilometers. Among the automatic cars, the scenario is quite different. The
selling prices of diesel cars, both manual and automatic, are more spread out than those of petrol and other
fuel types, giving a higher average and range of selling price.

Fig. 4. Manual type car Fig. 5. Automatic type cars


Cars sold by the first owner have higher prices than the rest. Test drive cars tend to have a higher
price as well, though there is a difference between manual and automatic test drive cars.

Fig. 6. Owner Details


We can see that the average kilometers driven rises from 1995 until 2005 and then goes down linearly until
2020. We can also see some outliers in the distribution plot.

Fig. 7. Year and kilometer driven

We can see that the selling price of manual cars grows linearly each year, whereas automatic cars have wavy
averages from year to year, but their selling prices clearly rise roughly linearly as well.

Fig. 8. year and selling price

13. REMOVING OUTLIERS

Fig. 9. Grouped the data


Clearly, there are some outliers in kilometers driven and selling price. We remove these
outliers using the IQR method. For selling price, outliers are removed separately for each transmission
type and each year. We can also safely remove the data points before 2005, since they produce
inconsistency and the years before 2000 have low value counts.
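
A grouped IQR filter of this kind can be sketched as below. The helper name remove_iqr_outliers, the sample prices, and the conventional multiplier of 1.5 are assumptions for illustration; the report does not state which multiplier was used.

```python
import pandas as pd

def remove_iqr_outliers(df, column, group_cols, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR] of its own group."""
    grouped = df.groupby(group_cols)[column]
    q1 = grouped.transform(lambda s: s.quantile(0.25))  # per-group first quartile
    q3 = grouped.transform(lambda s: s.quantile(0.75))  # per-group third quartile
    iqr = q3 - q1
    inside = (df[column] >= q1 - k * iqr) & (df[column] <= q3 + k * iqr)
    return df[inside]

# Hypothetical listings: one manual car from 2015 is priced far above the rest.
cars = pd.DataFrame({
    "transmission": ["Manual"] * 5 + ["Automatic"] * 4,
    "year": [2015] * 9,
    "selling_price_inr": [300000, 320000, 310000, 330000, 2000000,
                          700000, 720000, 710000, 690000],
})

cleaned = remove_iqr_outliers(cars, "selling_price_inr", ["transmission", "year"])
print(len(cars), "rows before,", len(cleaned), "rows after")
```

Grouping by transmission type and year means each car is compared only with similar cars, so an automatic car is not flagged merely for being more expensive than the manual ones.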

Fig. 10. Selling prices
Next we remove some suspicious data points. The average selling price of cars increases each
year, whereas the average kilometers driven should be lower than the average for the previous year. We
therefore remove manual and automatic cars whose kilometers driven is greater than the previous year’s
average but whose selling price is lower than the previous year’s average price.
We might also consider removing some inconsistencies in selling price based on the number of
previous owners. If a second owner offers a lower selling price than the average selling price of third
owners, we remove that data point. We do the same for first and second owners, based on year.

We also remove data points with low kilometers driven and low selling price, that is, points
whose kilometers driven is more than one standard deviation below the mean and whose selling price is also
more than one standard deviation below the mean price. We remove data points with high kilometers driven
and high selling price as well. This completes the removal of outliers from the dataset. Let’s see the
data in 3D.
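
The one-standard-deviation rule can be sketched as follows. The function name drop_suspicious and the sample values are hypothetical, and the sketch assumes the thresholds are computed over the whole dataset rather than per group, which the report does not specify.

```python
import pandas as pd

def drop_suspicious(df, kms="kms_driven", price="selling_price_inr"):
    """Drop rows that are low on both kilometers and price, or high on both."""
    kms_mean, kms_std = df[kms].mean(), df[kms].std()
    price_mean, price_std = df[price].mean(), df[price].std()
    # Low usage and low price at the same time.
    low_low = (df[kms] < kms_mean - kms_std) & (df[price] < price_mean - price_std)
    # High usage and high price at the same time.
    high_high = (df[kms] > kms_mean + kms_std) & (df[price] > price_mean + price_std)
    return df[~(low_low | high_high)]

# Hypothetical listings: the last two rows are the suspicious combinations.
cars = pd.DataFrame({
    "kms_driven": [50000, 52000, 48000, 51000, 49000, 5000, 120000],
    "selling_price_inr": [500000, 510000, 490000, 505000, 495000, 100000, 900000],
})

filtered = drop_suspicious(cars)
print(filtered)
```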

Fig. 11. 3D image

14. FINDING BEST REGRESSION MODEL

I used K-fold cross-validation to measure the accuracy of the linear regression model, and it gave a
low score. To see whether other regression models give better results, I compared them using
GridSearchCV, which gave the best accuracy for the RandomForest regressor.
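
A comparison of this kind can be sketched with scikit-learn as below. The synthetic data, the parameter grids, and the choice of five folds are assumptions for illustration and are not the grids used in the project, so the ranking it prints says nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the car features and prices.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 3))
y = 5 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=120)

# Candidate models with small, hypothetical parameter grids.
candidates = {
    "linear": (LinearRegression(), {"fit_intercept": [True, False]}),
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
    "decision_tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [3, 5, None]}),
    "random_forest": (RandomForestRegressor(random_state=0), {"n_estimators": [10, 21, 50]}),
}

# The same K-fold splitter is used for every model so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="r2")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Print the models from best to worst mean cross-validated R^2.
for name, (score, params) in sorted(results.items(), key=lambda item: -item[1][0]):
    print(f"{name}: {score:.3f} with {params}")
```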

Fig. 12. Model Prediction

Based on the results, RandomForestRegressor gives the best results, so we use that model.
After testing the model on some features, we export it to a pickle file and deploy it using Flask, a web
framework written in Python. Using JSON, the data from the server is made accessible to the web API.
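
The export step and the JSON exchange can be sketched as follows. The toy training rows, the feature order in FEATURES, and the helper predict_from_json are hypothetical; in the deployed application a Flask route would wrap a function like this one, and the pickled bytes would be written to a file instead of kept in memory.

```python
import json
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature order after one-hot encoding.
FEATURES = ["year", "kms_driven", "fuel_Diesel", "fuel_Petrol", "transmission_Manual"]

# Toy training data standing in for the cleaned dataset.
X = np.array([
    [2012, 70000, 0, 1, 1],
    [2015, 45000, 1, 0, 1],
    [2017, 30000, 0, 1, 0],
    [2018, 15000, 1, 0, 0],
    [2014, 60000, 0, 0, 1],
    [2016, 40000, 0, 1, 1],
])
y = np.array([250000, 450000, 600000, 900000, 300000, 480000])
model = RandomForestRegressor(n_estimators=21, random_state=0).fit(X, y)

# Serialize the fitted model and load it back, as the web server would at start-up.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

def predict_from_json(payload: str) -> str:
    """Turn a JSON request body into a JSON response holding the predicted price."""
    data = json.loads(payload)
    row = [[data[name] for name in FEATURES]]
    price = float(restored.predict(row)[0])
    return json.dumps({"predicted_price_inr": round(price, 2)})

request_body = json.dumps({
    "year": 2016, "kms_driven": 35000,
    "fuel_Diesel": 0, "fuel_Petrol": 1, "transmission_Manual": 1,
})
print(predict_from_json(request_body))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.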

15. USER INTERFACE DESIGN

Fig. 13. User Interface

User interface (UI) design is the process designers use to build interfaces in software or
computerized devices, focusing on looks or style. Designers aim to create interfaces which users find easy
to use and pleasurable. Fig. 13 shows the user interface for this project, in which the user can easily
select the options from the drop-down menus. I used HTML, CSS, and JavaScript to design the user interface.

V. TEST RESULTS/VERIFICATION

The testing phase plays an important role before the deployment of the project. During the testing
phase, the components of the code are tested individually. API testing is critical for test automation
because APIs now serve as the primary interface to application logic, and because GUI tests are difficult to
maintain with the short release cycles and frequent changes common in Agile software development and
DevOps.
The web API testing flow is quite simple, with three main steps:
Send the request with the necessary input data.
Get the response containing the output data.
Verify that the response is returned as expected in the requirement.

Fig.14. Testing process
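
The three steps can be sketched with Flask's built-in test client. The /predict route, its field names, and the stub pricing rule are hypothetical stand-ins for the deployed endpoint and the trained model; they are not taken from the project code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    # Stub standing in for the trained model: a made-up linear rule.
    price = 100000 + 50000 * (data["year"] - 2010) - 1.0 * data["kms_driven"]
    return jsonify({"predicted_price_inr": price})

def check_prediction_endpoint():
    client = app.test_client()
    # Step 1: send the request with the necessary input data.
    response = client.post("/predict", json={"year": 2015, "kms_driven": 40000})
    # Step 2: get the response containing the output data.
    body = response.get_json()
    # Step 3: verify that the response matches the requirement.
    assert response.status_code == 200
    assert "predicted_price_inr" in body
    return body

print(check_prediction_endpoint())
```

The test client calls the route in-process, so the check runs without starting a server.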


API testing is a type of software testing that involves testing application programming interfaces (APIs)
directly and as part of integration testing to determine if they meet expectations for functionality,
reliability, performance, and security. Since APIs lack a GUI, API testing is performed at the message
layer. I tested the API with all the parameters given in the input data. With the rise of cloud applications
and interconnected platforms, API testing is a necessity. Many of the services that we use every day rely on
hundreds of interconnected APIs; if any one of them fails, the service will not work.

Fig.15. Result

The results are fairly accurate when compared to the pre-existing models, as this project uses a
dataset whose features help the model predict more accurately. Among the supervised machine learning
models, random forest performed well when compared with other models such as lasso, linear, and decision
tree regression. The parameters used in random forest are ‘criterion’: ‘mse’ and ‘n_estimators’: 21, and the
score obtained is 0.861908. The second-best model, decision tree, scored 0.837173. With a larger dataset,
the scores of the models would be expected to increase as well.
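
For scikit-learn regressors, the default score is the coefficient of determination R^2, so the values above are most likely R^2 scores; that reading is an assumption, since the report does not name the metric. The sketch below fits a forest of 21 trees on synthetic data and checks the library score against the formula R^2 = 1 - SS_res / SS_tot. The criterion is left at its default, squared error, which older scikit-learn releases called ‘mse’.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; the real features come from the cleaned dataset.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = 4 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# 21 trees, as reported above; the default split criterion is squared error.
model = RandomForestRegressor(n_estimators=21, random_state=0)
model.fit(X_train, y_train)

# R^2 computed by hand from the residual and total sums of squares.
pred = model.predict(X_test)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The same quantity as reported by scikit-learn.
r2_sklearn = model.score(X_test, y_test)
print(round(r2_manual, 6), round(r2_sklearn, 6))
```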

VI. CONCLUSIONS & FURTHER SCOPE

This project evaluates used car price prediction on a Kaggle dataset and obtains an accuracy of 86% with
Random Forest regression. The features used in this project are kms_driven, fuel_type, transmission,
price, year, and dealer_type. This is not a simple task, as many features have to be taken into
consideration for accurate prediction. The major step in this project is collecting and preprocessing the
data. The prediction accuracy can be improved by using a larger dataset and further data cleaning. The
proposed system evaluated the variables, selected the most relevant ones from the dataset, and thereby
reduced the complexity of the model.
Keeping this model as a base, I intend to use more advanced techniques such as fuzzy logic and
genetic algorithms to predict car prices in future work. I also intend to develop a fully responsive web
application that stores the data given by the user.
A repository of used cars with their prices would help customers find the price of a similar car
through a recommendation engine, which I would like to work on in the future.

REFERENCES

1. Pudaruth, S. 2014.“Predicting the Price of Used Cars Using Machine Learning Techniques”, International
Journal of Information & Computation Technology,4(7), p.753-764

2. Listiani M. 2009. Support Vector Regression Analysis for Price Prediction in a Car Leasing Application.
Master Thesis. Hamburg University of Technology

3. Xinyuan Zhang , Zhiye Zhang and Changtong Qiu, “Model of Predicting the Price Range of Used Car”, 2017

4. Wu et al. (2009). An expert system of price forecasting for used vehicles using adaptive neuro-fuzzy
inference.

5. Du et al, (2009). Practice Prize Paper—PIN Optimal Distribution of Auction Vehicles System: Applying
Price Forecasting, Elasticity Estimation, and Genetic Algorithms to Used-Vehicle Distribution

6. W.A. Awad and S.M. ELseuofi, “Machine Learning Method for SpamEmail Classification”, 2011

7. Durgesh K. Srivastava, Lekha Bhambhu, “Data Classification Method using Support Vector Machine”, 2009