Car Price Prediction Using Machine Learning Algorithms
https://doi.org/10.22214/ijraset.2022.46354
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VIII Aug 2022- Available at www.ijraset.com
Abstract: Machine learning (ML) is an area of AI that has become a key component of digitization solutions and has attracted much recognition in the digital arena. ML is used everywhere, from automating heavy tasks to offering intelligent insights, and every industry benefits from it. The world already uses devices suited to these problems, for example wearable fitness trackers such as smart bands, or smart home assistants such as Alexa and Google Home, and there are many more examples of machine learning in use.
In this project the task is to predict the price of a used car. The car dataset is taken from Kaggle and contains used-car details (variables). Our task is to find out which variables are significant in predicting the price of a used car and how well these variables predict it. For this task we use the following machine learning algorithms: linear regression, ridge regression, lasso regression, K-Nearest Neighbors (KNN) regressor, random forest regressor, bagging regressor, AdaBoost regressor, and XGBoost.
The goal of this project is to build models with the above-mentioned machine learning algorithms on the car dataset. We implement everything from the basic linear regression algorithm to stronger algorithms such as the Random Forest Regressor and the XGBoost Regressor. This project intends to show that the Random Forest and XGBoost Regressor models perform very well on regression problems.
Keywords: XGBoost Regression, Random Forest Regression (RFR), Linear Regression (LR).
I. INTRODUCTION
ML is a part of AI that uses data and algorithms to design models, analyse them, and make decisions without the need for human intervention. It describes how computers learn on their own from previous experience.
The main difference between regular system software and ML is that a human designer does not write code instructing the computer how to act in every situation; instead, the model has to be trained on a large amount of data.
Depending on the nature of the problem, ML approaches are divided into Reinforcement Learning, Unsupervised Learning, and Supervised Learning. Within Supervised Learning there are two types: Regression and Classification.
A. Architecture
(Figure: architecture diagram showing the ML regression algorithms and a comparison of their scores.)
B. Sample Dataset
The dataset is taken from Kaggle. A look at the sample dataset is given below.
The sample dataset has variables such as id, name, year, model, condition, cylinders, fuel type, odometer, seats, car type, colour, and selling price.
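As a rough illustration, the snippet below sketches how such a dataset could be loaded and prepared with pandas. The file name, column names, and preprocessing steps are assumptions for illustration, not the exact pipeline used in this project.

```python
import pandas as pd

# Assumed file name for the Kaggle export; the real file may be named differently.
df = pd.read_csv("used_cars.csv")

# Inspect the variables listed above (id, name, year, model, condition,
# cylinders, fuel type, odometer, seats, car type, colour, selling price).
print(df.head())

# Illustrative preprocessing: drop rows with missing values, remove the id column,
# one-hot encode categorical variables, and separate the target (selling price).
df = df.dropna()
X = pd.get_dummies(df.drop(columns=["id", "selling_price"]), drop_first=True)
y = df["selling_price"]
```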
IV. IMPLEMENTATION
A. Linear Regression
LR is used to predict the value of one variable based on the value of another. The variable you want to predict is called the dependent variable, and the variable used to predict it is called the independent variable. The LR equation is of the form A = m + nB, where B is the independent variable, A is the dependent variable, m is the y-intercept, and n is the slope of the line.
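A minimal scikit-learn sketch of fitting this model is shown below, assuming the feature matrix X and target y prepared in the earlier dataset sketch; the split ratio and random seed are arbitrary choices.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing (arbitrary choice for illustration).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)  # learns the intercept m and the slopes n in A = m + nB
print("intercept:", lr.intercept_)
print("R^2 on test set:", lr.score(X_test, y_test))
```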
B. Ridge Regression
The Ridge regression model is used for analyse any data which faces the multicollinearity problem. This model performs a
regularization, particularly L2 regularization. When the problem of multicollinearity comes, the least squares are unbiased and the
variances are large, resulting in the predicted outcomes being different from the original outcomes. The cost function for ridge
regression:
Min (||Y – X(theta)||^2 + λ||theta||^2)
Here the penalty term is Lambda. Lambda is denoted by the symbol λ. So, we control the penalty by changing the alpha values. The
more the alpha values, the greater the error and thus the magnitude of the coefficients decreases. It reduces the parameters.
Therefore, ridge regression is used to stop multicollinearity and decreases the model complexity by reducing the coefficients.
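The sketch below illustrates this shrinkage with scikit-learn's Ridge, where the constructor argument alpha corresponds to the penalty weight λ; the alpha values are arbitrary examples and assume the earlier train/test split.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Larger alpha (lambda) => stronger L2 penalty => smaller coefficient magnitudes.
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha}: mean |coef|={np.abs(ridge.coef_).mean():.4f}, "
          f"R^2={ridge.score(X_test, y_test):.4f}")
```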
Results of Ridge Regression are:
C. Lasso Regression
"LASSO" means least absolute shrinkage and selection, operator. It is a type of linear regression that uses decline. The decline
means the data values will decrease towards a central point, such as the mean. It supports simple, sparse models. This kind of
regression is useful for algorithms with a more degree of multicollinearity.
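A small sketch with scikit-learn's Lasso, showing the sparsity the L1 penalty produces; the alpha value is an arbitrary example, not a tuned setting.

```python
import numpy as np
from sklearn.linear_model import Lasso

# The L1 penalty drives some coefficients exactly to zero, giving a sparse model.
lasso = Lasso(alpha=0.1, max_iter=10000).fit(X_train, y_train)
print("non-zero coefficients:", int(np.sum(lasso.coef_ != 0)), "of", len(lasso.coef_))
print("R^2 on test set:", lasso.score(X_test, y_test))
```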
Results of Lasso Regression are:
D. KNN Regressor
KNN regression is a nonparametric technique that approximates the relationship between an independent variable and a continuous
outcome by averaging observations in the same neighbourhood. The size of k should be specified by the analyst. Alternatively, we
can choose by cross-validation to choose the size that will minimize the mean squared error.
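One possible way to pick k by cross-validation is shown with scikit-learn's GridSearchCV; the candidate k values are illustrative.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Search over a few candidate neighbourhood sizes, minimizing mean squared error.
grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [3, 5, 7, 9, 11]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)
print("best k:", grid.best_params_["n_neighbors"])
print("R^2 on test set:", grid.best_estimator_.score(X_test, y_test))
```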
Results of KNN Regressor are:
F. Bagging Regressor
Bagging is short for Bootstrap Aggregating. It uses bootstrap resampling to train multiple models on random variations of the
training set. At prediction time, each item's predictions are aggregated to give the final predictions. Bagged decision trees are
efficient because each decision tree is suitable for a slightly different training data set, this allows each tree to
have subtle differences and make slightly different skill predictions.
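A minimal sketch with scikit-learn's BaggingRegressor, whose default base learner is a decision tree; the number of estimators is an arbitrary choice.

```python
from sklearn.ensemble import BaggingRegressor

# 100 trees, each fit on a bootstrap resample of the training set; their
# predictions are averaged at prediction time.
bagging = BaggingRegressor(n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)
print("R^2 on test set:", bagging.score(X_test, y_test))
```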
Results of Bagging Regressor are:
G. Adaboost Regressor
AdaBoost model is a very short single-level decision tree. At first, it models a weak learner and gradually adds it to an ensemble.
Each next model will try to modify the predictions or outcomes of the previous models in a series. It is achieved by weighting the
train data to focus more on train examples in which old models made prediction errors.
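A short sketch with scikit-learn's AdaBoostRegressor; the hyperparameter values are illustrative, and the base learner is the library's default shallow decision tree rather than a tuned stump.

```python
from sklearn.ensemble import AdaBoostRegressor

# Each new learner focuses more on the training examples the previous learners
# predicted poorly, via re-weighting of the training data.
ada = AdaBoostRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
ada.fit(X_train, y_train)
print("R^2 on test set:", ada.score(X_test, y_test))
```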
Results of Adaboost regressor are:
H. XGBoost
XGBoost is a short name for Extreme Gradient Boosting, designed by researchers at Washington University. This library was
written in C++. It optimizes gradient boosting training. XGBoost is a family of gradient-boosted decision trees. With this
model, the decision tree is built in sequential form. In XGBoost weights play a crucial role. Weights are allocated to all independent
variables after that they are fed into a decision tree that predicts outcomes. The weight of the variables that the model predicted
wrongly is increased and these variables are then input to the second decision tree model. These individual predictors/trees are then
grouped to provide a stronger and more accurate model.
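A minimal sketch with the xgboost library's scikit-learn wrapper; the hyperparameter values are illustrative, not the settings used to produce the results below.

```python
from xgboost import XGBRegressor

# Trees are added sequentially; each new tree corrects the errors of the
# current ensemble.
xgb = XGBRegressor(n_estimators=300, learning_rate=0.1, max_depth=6, random_state=42)
xgb.fit(X_train, y_train)
print("R^2 on test set:", xgb.score(X_test, y_test))
```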
Results of XGBoost Regressor are:
V. RESULTS
From the results above, the XGBoost Regressor gives the highest accuracy, followed by the Random Forest Regressor. We can therefore conclude that, for this regression problem, the XGBoost and Random Forest algorithms give more accurate results than linear regression, KNN, and the other regressors.
Name                      MSLE        R2 Score
Linear Regression         0.00241616  0.625564
Ridge Regression          0.00241616  0.625565
Lasso Regression          0.00241611  0.625575
KNN Regressor             0.00122466  0.817600
Random Forest Regressor   0.00061121  0.911812
Bagging Regressor         0.00117784  0.826754
Adaboost Regressor        0.00063733  0.906683
XGBoost Regressor         0.00051308  0.925595
Table: Results
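For reference, the two metrics in the table can be computed for any fitted regressor roughly as follows; the helper function is a hypothetical convenience for illustration, not code from the project.

```python
from sklearn.metrics import mean_squared_log_error, r2_score

def evaluate(model, X_test, y_test):
    """Return (MSLE, R^2) for a fitted regressor on the test set.
    MSLE assumes predictions and targets are non-negative, as prices are."""
    preds = model.predict(X_test)
    return mean_squared_log_error(y_test, preds), r2_score(y_test, preds)

print("XGBoost:", evaluate(xgb, X_test, y_test))
```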
VI. CONCLUSION
In this project the XGBoost Regressor produced the highest accuracy, followed by the Random Forest Regressor. This is because XGBoost takes advantage of weak learners and learns gradually, whereas Random Forest builds its trees independently, without communication between learners.