Used Car Price Prediction Using Different Machine Learning Algorithms
Used Car Price Prediction Using Different Machine Learning Algorithms
https://doi.org/10.22214/ijraset.2022.41300
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
Abstract: A car price prediction has been a high-interest research area, as it needed recognizable effort and knowledge of the
field expert. This paper mainly focuses on working of three different kind regression algorithms which are used to predict price
of a used car. In this project, We have Considered number of distinct attributes which are examined for the reliable and accurate
prediction. To build a model for predicting the price of used cars we have used three different kinds of machine learning
techniques which comes under supervised machine learning type of algorithm which are linear regression, lasso regression and
ridge regression respectively. we have used Python libraries to design GUI for our project and some other machine learning
related libraries like Numpy, Pandas, Sklearn etc. we have calculated and compared the accuracies of three machine learning
algorithms. The accuracies for linear regression, lasso and Ridge regression were 83.65%, 87.09% and 84.00% respectively. The
final main price is predicted according to lasso regression as it gives highest accuracy amongst three different algorithms.
Keywords: car price prediction, machine learning, Regression techniques, linear regression, lasso regression, ridge regression
I. INTRODUCTION
Car price prediction is anyhow interesting and popular problem. Accurate car price prediction involves expert knowledge, because
price usually depends on many unique features and factors. Generally, most important ones are brand name and model, years, KMs
driven and mileage. The fuel type used in the car as well as fuel consumption per mile highly affected price of a car due to often
changes in the price of a fuel. Distinct features like exterior color, door number, type of transmission, dimensions, safety, air
condition, interior, whether it has navigation or not will also results in the car price. In this paper, we applied dissimilar and unique
methodologies in order to achieve higher precision of car price like Car Model Name, Year, Selling, Price, Present Price, KMs
Driven, Fuel Type, Seller Type, Transmission Type, Owner Type. In this paper, we applied different methodologies and ML
techniques and models in order to accomplish higher precision of the used car price prediction.
This paper is organized in the following manner: Section II having related work in the field of price prediction of used cars. In
section III, the Existing system model is explained. Section IV explains the requirements and model of proposed system along with
various ML algorithms to verify their respective performances to predict the price of the used cars. Finally, in section V, a
conclusion of our work is given.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 773
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
Sameerchand Pudaruth[3] The Author has researched the application of supervised machine learning techniques to predict the price
of used cars in Mauritius. The predictions are based on historical data collected from daily newspapers. Different techniques like
multiple linear regression analysis, k-nearest neighbours, naïve bayes and decision trees are accustomed to make the predictions.
The predictions are then evaluated and compared so as tob search out those which offer the most effective performances. A
seemingly easy problem clothed to be indeed very difficult to resolve with high accuracy. All the four methods which are used
provided comparable performance.
K.Samruddhi, Dr. R.Ashok Kumar[4] During ths paper Authors proposed a supervised machine learning model using KNN (K
Nearest Neighbor) regression algorithm to research the price value of used cars. Authors have trained the model with data of used
cars which is collected from the Kaggle website. Through this experiment, the information was examined with different trained and
test ratios. As a result, the accuracy of the proposed model is around 85% and is fitted because the optimized model.
Nabarun Pal, Dhanasekar Sundararaman, Priya Arora, Puneet Kohli, Sai Sumanth Palakurthy [5] During this paper, Authors have
used supervised learning method namely Random Forest to predict the costs of used cars. The model has been chosen after careful
exploratory data analysis to work out the impact of every feature on price. A Random Forest with 500 Decision Trees were created
to train the data. From experimental results, the training accuracy was discovered to be 95.82%, and therefore the testing accuracy
was 83.63%. The model can predict the price value of cars accurately by choosing the fore most correlated features.
The Requirement analysis and model implementation of proposed system is briefly explained below.
A. Requirement Analysis
The Data required is collected from a kaggle.com. The following attributes were captured for each car: Car Name, Year of car
purchase, Present Price, Kilometers Drive, Fuel Type (0:Petrol, 1:Diesel, 2:CNG), Seller Type(0:Dealer, 1:Individual),Transmission
Type(0:Manual, 1:Automatic), Owner previous cars (yes/no) etc.
We have used python Tkinter Library for developing GUI (Graphical user Interface) for our system and libraries like pandas,
numpy, sklearn for data extraction and machine learning purposes.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 774
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
Since manual data collection is time consuming task, especially when there are numerous records to process, we have taken the car
dataset from kaggle.com. Data is Manually collected from kaggle.com.
In the training phase 75% of the total data is used and for testing purpose of machine learning model 25% of the total data is used.
The collected raw data set contains 302 samples. Since data is collected using kaggle.com, there are many samples that have only
sufficient number of attributes. The data is present in the CSV file. The Pandas is the python machine learning library which is used
to use the data in csv file to fit it in the machine learning model for training and testing purpose.
The Total Eight attributes related to used car were used for predicting selling price of the car in this project. The attributes like
Car_name, Year of car Purchase, Present_price (Present price of car), Kms_Driven (Total distance driven by the car in kilometers).
Fuel Type(Fuel Type of car), Seller type (whether the car is sell by dealer or by the individual owner),Transmission Type (whether
the car is Manual or automatic), Owner previous number of cars etc. were used fitting them into the three machine learning models
for training and testing purpose of the car. Then selling price of the car was predicted using the algorithm which achieves higher
accuracy out of three algorithms used (i.e Linear regression, Lasso regression, Ridge regression) based on the data which is given as
input by the user .
.
B. Model Implementation and Evaluation
Approach for car price prediction proposed in this paper is composed of several steps, shown in Fig. 3
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 775
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
The Above Diagram Shows Flowchart or processing steps in our project. So at very first step data is collected for project. For our
project we have taken the data related to car attributes in the csv form from kaggle.com which a data science learning website. We
have considered 302 samples of car data in our research. Then in next step data processing is done. We have python pandas module
to extract the data in csv from stored in an Excel sheet and used it further in our project for building the required machine learning
models.
Then by using the data in excel sheet training and testing of models is done. We Have used regression machine learning algorithms
for prediction purpose in our project. Three algorithms used in our project are Linear regression, lasso regression and ridge
regression respectively. Using SVM (Support Vector Machine) classifier the data is divided into two parts i.e. 75% of data is used
for training purpose and then 25% of data is used for testing purpose.
Then when user enters the car details on interface and inputs it to the system. Further system process on that input using the machine
learning models and in the final step output is produced and displayed based on the algorithm which achieves higher accuracy.
In our project, lasso regression algorithm achieved the highest out of three algorithms. Hence we have displayed the final output
based on lasso regression algorithm.
We have used three algorithms in our project which are Linear Regression, Lasso Regression, Ridge Regression respectively. The
detail explanation of Algorithms which are used in this project is given below:-
1) Linear Regression: Linear regression is one in all simplest and preferred Machine Learning algorithms. It’s a statistical method
that’s used for predictive analysis. Simple Linear regression algorithm shows a linear relationship between a dependent (y) and
one or more independent (x) variables, therefore called as linear regression. Since linear regression shows the linear
relationship, which suggests it finds how the value of the dependent variable is changing with respect to the value of the
independent variable.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 776
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
2) Lasso Regression: Lasso regression is another regularization technique to minimize the complexity of the model. It stands for
Least Absolute and Selection Operator. It’s almost like the Ridge Regression except that the penalty term contains only the
absolute weights rather than square of weights. Since it takes absolute values, hence, it can shrink the slope to 0. whereas Ridge
Regression can only shrink it just about to 0.It is called as L1 Regularization.
The equation for the cost function of Lasso regression will be:
Where,
Xij = Features of Y or Independent Variable
Yi = Dependent Variable
βi = Weights or Magnitude shows importance of a feature
λ = minimize the cross-validation prediction error rate.
The weight or Magnitude of features is selected based on importance of feature. selection of more features creates complexity and
reduces prediction accuracy. Hence some of the features in this technique are completely neglected for model evaluation.
In our project present_price(present of the car) and Kms_Driven (Distance driven in kilometers) are two most important features
hence there magnitude or importance factor is more than other attributes. Lasso regression achieves highest prediction accuracy
except other regression in our project
Hence, the Lasso regression can help us to reduce the overfitting in the model as well as the feature selection.
3) Ridge Regression: Ridge regression is one of the types of linear regression in which a small amount of bias is introduced in
order to get better long-term predictions. Ridge regression is a regularization technique, which is used to reduce the complexity
of the model. It is also called as L2 regularization. In this technique, the cost function is altered by adding the penalty term to it.
The amount of bias added to the model is called Ridge Regression penalty. We can calculate it by multiplying with the lambda
to the squared weight of each individual label.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 777
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
The equation for the cost function in ridge regression will be:
V. CONCLUSION
Car price prediction will be a challenging task because of the high number of attributes that should be considered for the accurate
prediction. The most important step within the prediction process is collection and preprocessing of the information. During the
research, Car data collected from kaggle.com is converted into CSV form and used for building the machine learning algorithms.
Three algorithms which are Linear, Lasso and Ridge Regression were utilized in this project. The data was divided into two parts
training and testing purpose by SVM classifier (Support Vector Machine).i.e. 75% of data was used for training purpose and 25% of
data was used for testing purpose of the machine learning. The accuracies of the three machine learning models were checked and
compared with one another. The Final result was predicted consistent with the algorithm which achieves higher accuracy. The main
drawback of this project was less number of records that have been utilized. As future work, we expect to collect more information
and to utilize further advanced developed methods like Random Forest, ANN (Artificial Neural Network), CNN (Convolutional
Neural Network) with a better user computer user interface experience.
REFERENCES
[1] Enis Gegic, Becir Isakovic, Dino Keco, Zerina Masetic, Jasmin Kevric. “Car Price Prediction using Machine Learning Techniques” TEM Journal Volume 8,
Issue 1, Pages 113-118, ISSN 2217-8309, DOI: 10.18421/TEM81-16, February 2019
[2] Pattabiraman Venkatasubbu, Mukkesh Ganesh “Used Cars Price Prediction using Supervised Learning Techniques” IJEAT ISSN: 2249 – 8958, Volume-9
Issue-1S3, December 2019
[3] Sameerchand Pudaruth “Predicting the Price of Used Cars using Machine Learning Techniques” January 2014
[4] K.Samruddhi, Dr. R.Ashok Kumar “Used Car Price Prediction using K-Nearest Neighbor Based Model” IJIRASE Volume 4, Issue 3, DOI:
10.29027/IJIRASE.v4.i3.2020.686-689, September 2020 September 2020
[5] Nabarun Pal, Dhanasekar Sundararaman, Priya Arora, Puneet Kohli, Sai Sumanth Palakurthy “How much is my car worth?” A methodology for predicting used
cars prices using Random Forest” FICC 2018
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 778