

https://doi.org/10.22214/ijraset.2022.46354
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VIII Aug 2022- Available at www.ijraset.com

Car Price Prediction Using Machine Learning Algorithms
B V Raghurami Reddy1, Dr. K. Shanthi Sree2
1 Data Science, School of Information Technology, JNTUH

Abstract: Machine learning (ML) is an area of AI that has become a key component of digitization solutions and has attracted much recognition in the digital arena. ML is used everywhere, from automating heavy tasks to offering intelligent insights, and every industry benefits from it. The world already uses devices suited to these problems, for example, wearable fitness trackers such as smart bands, or smart home assistants such as Alexa and Google Home. There are many more examples of machine learning in use.
In this project the task is to predict the price of a used car. The car dataset is taken from Kaggle and contains used car details (variables). Our task is to find out which variables are significant in predicting the price of a used car and how well those variables predict the price. For this task we use the machine learning algorithms linear regression, ridge regression, lasso regression, K-Nearest Neighbors (KNN) regressor, random forest regressor, bagging regressor, AdaBoost regressor, and XGBoost.
The goal of this project is to build models with the above-mentioned machine learning algorithms on the car dataset. We implement everything from the basic linear regression algorithm to stronger algorithms such as the Random Forest Regressor and XGBoost Regressor. This project intends to show that the Random Forest and XGBoost Regressor models perform very well on regression problems.
Keywords: XGBoost Regression, Random Forest Regression (RFR), Linear Regression (LR).

I. INTRODUCTION
ML is a part of AI that uses data and algorithms to design models, analyze them, and make decisions without the need for human intervention. It describes how computers can act on their own with the help of previous experience.
The main difference between regular system software and ML is that a human designer does not write code instructing the computer how to act in every situation; instead, the system has to be trained on a large amount of data.
ML approaches are divided into Reinforcement Learning, Unsupervised Learning, and Supervised Learning, depending on the nature of the problem. Supervised Learning has two types: Regression and Classification.

II. PROBLEM STATEMENT


The used car market is a huge and important market for car manufacturers, and second-hand sales are closely linked to new car sales. Selling used cars through new car retail and handling lease returns and fleet returns from car rental companies require car manufacturers to be involved in the used car market.
Automakers face several problems in this market. Global economic uncertainty, demographic change, increased competition from other manufacturers, and the trend toward electric cars are just some of the factors that make it difficult to sell vehicles on the used car market, reducing sales margins. Automakers therefore require good decision support systems to maintain the profitability of the used car business. A core component of such a system is a predictive model that estimates the selling price based on vehicle attributes and other factors. Although previous studies have explored statistical modelling of resale prices, few have attempted to predict resale prices with maximum accuracy to support decision making. As a result, the answers to the following questions remain unclear: i) how predictable are resale prices; ii) what is the relative accuracy of various forecasting methods, and are some methods particularly effective; iii) given that market research agencies specialize in estimating residual values, does it make sense for automakers to invest in their own resale price prediction models? The purpose of this work is to provide more accurate answers to these questions.
The present project falls under the regression category and is about predicting used car prices. In daily life everyone wants a car, but budget is often the problem, so this project builds a model that takes certain parameters as input and predicts the price of the car based on those parameters. The goals of this project are to build a machine learning model that takes car features as input and predicts the cost of the used car, and to compare the most commonly used machine learning regression models to find which gives the least error and the most accurate prediction of the car's price.

©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1093

III. PROPOSED WORK METHODOLOGY


There are two phases in building a model:
Training: The model is trained using the data in the dataset, fitting a model according to the algorithm chosen.
Testing: The model is given inputs and tested for its accuracy. The data used to train and test the model must be appropriate. The model is built to detect and predict the cost of a used car, and good models must be selected.

A. Architecture

[Figure omitted. Pipeline: Acquire Dataset → Pre-processing & Data Cleaning → Explorative Data Analysis → ML Regression Algorithms → Comparison of ML Algorithm Scores → Deploy Best Model]

Fig: 1 Architecture of the Proposed System

B. Sample Dataset
The dataset is taken from Kaggle. A sample of it is shown below.

Fig: Sample dataset

The sample dataset has variables such as id, name, year, model, condition, cylinders, fuel type, odometer, seats, car type, colour, and selling price.
IV. IMPLEMENTATION
A. Linear Regression
LR is used to predict the value of one variable based on the value of another feature. The feature to be predicted is called the dependent variable; the feature used to predict it is called the independent variable. The LR equation is of the form A = m + nB, where B is the independent variable, A is the dependent variable, m is the y-intercept, and n is the slope of the line.
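As a minimal sketch (hypothetical toy values, not the paper's Kaggle data), the equation above can be fitted with scikit-learn's LinearRegression; here the target is generated exactly from a linear rule, so the fit recovers it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: price generated exactly as 1000*(year - 2000) - 0.1*odometer,
# so the fitted model should recover the coefficients.
X = np.array([[2015, 60000],
              [2018, 30000],
              [2012, 90000],
              [2020, 15000]], dtype=float)
y = 1000 * (X[:, 0] - 2000) - 0.1 * X[:, 1]

model = LinearRegression().fit(X, y)
pred = model.predict([[2016, 50000]])  # expected 16000 - 5000 = 11000
```

With more than two features the same equation generalizes to one slope per independent variable.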


Linear regression's important features are:

Graph: 1 Linear regression important features


Results of linear regression are:

B. Ridge Regression
The ridge regression model is used to analyse data that suffers from multicollinearity. It performs regularization, specifically L2 regularization. When multicollinearity is present, the least squares estimates are unbiased but their variances are large, so the predicted outcomes can lie far from the true outcomes. The cost function for ridge regression is:
Min(||Y − Xθ||^2 + λ||θ||^2)
Here the penalty term is λ (lambda), exposed as the alpha parameter in common implementations such as scikit-learn. We control the penalty by changing this value: the larger it is, the stronger the penalty and the more the magnitude of the coefficients decreases. Ridge regression therefore mitigates multicollinearity and decreases model complexity by shrinking the coefficients.
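A short sketch of this shrinkage behaviour, assuming scikit-learn's Ridge (whose alpha parameter plays the role of λ) and synthetic, nearly collinear data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

small = Ridge(alpha=0.001).fit(X, y)
large = Ridge(alpha=100.0).fit(X, y)
# A larger penalty shrinks the total coefficient magnitude.
```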
Results of Ridge Regression are:


C. Lasso Regression
"LASSO" stands for Least Absolute Shrinkage and Selection Operator. It is a type of linear regression that uses shrinkage: the coefficient values shrink towards a central point, such as the mean. It favours simple, sparse models. This kind of regression is useful for datasets with a high degree of multicollinearity.
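A minimal sketch of the sparsity effect, on synthetic data where only the first feature matters (assuming scikit-learn's Lasso):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                       # two independent features
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=500)   # only the first matters

lasso = Lasso(alpha=1.0).fit(X, y)
# The L1 penalty drives the irrelevant coefficient exactly to zero,
# which is how lasso performs feature selection.
```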
Results of Lasso Regression are:

D. KNN Regressor
KNN regression is a nonparametric technique that approximates the relationship between the independent variables and a continuous outcome by averaging the observations in the same neighbourhood. The size of k must be specified by the analyst, or it can be chosen by cross-validation so as to minimize the mean squared error.
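The neighbourhood averaging can be sketched as follows (toy one-dimensional data, assumed purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([10.0, 12.0, 14.0, 40.0, 42.0, 44.0])

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
# Prediction is the mean of the 3 nearest targets: (10 + 12 + 14) / 3 = 12
pred = knn.predict([[2.0]])
```

To choose k by cross-validation, the same estimator can be wrapped in scikit-learn's GridSearchCV over a range of n_neighbors values.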
Results of KNN Regressor are:

E. Random Forest Regressor


Random Forest Regression is a supervised machine learning algorithm that uses an ensemble model for regression. Ensemble learning is a technique that combines the outcomes of different machine learning models to produce better predictions than any single model.
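A minimal sketch with scikit-learn's RandomForestRegressor on synthetic data; the feature_importances_ attribute is the kind of quantity behind variable-importance plots such as Graph 2:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 2))
y = 2 * X[:, 0] + np.sin(X[:, 1])       # the first feature dominates

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_   # higher value = more important
```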
Results of Random Forest Regressor are:

Random forest important variables are:

Graph: 2 Random forest important variables


F. Bagging Regressor
Bagging is short for Bootstrap Aggregating. It uses bootstrap resampling to train multiple models on random variations of the training set. At prediction time, the individual models' predictions are aggregated to give the final prediction. Bagged decision trees are effective because each tree is fitted on a slightly different training set, which gives the trees subtle differences and slightly different predictions.
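A short sketch with scikit-learn's BaggingRegressor, whose default base estimator is a decision tree; each tree is trained on its own bootstrap sample of the synthetic data:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=300)

# 50 trees, each fitted on a bootstrap resample; predictions are averaged.
bag = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)
```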
Results of Bagging Regressor are:

G. Adaboost Regressor
The AdaBoost base model is a very shallow, single-level decision tree (a decision stump). It first fits a weak learner and then gradually adds more learners to the ensemble. Each subsequent model tries to correct the predictions of the previous models in the sequence. This is achieved by weighting the training data so that later models focus more on the examples on which earlier models made prediction errors.
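A minimal sketch with scikit-learn's AdaBoostRegressor (synthetic data assumed for illustration); each boosting stage reweights the examples the previous stages got wrong:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(300, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=300)

# 50 sequential weak learners; later stages emphasize poorly predicted points.
ada = AdaBoostRegressor(n_estimators=50, random_state=0).fit(X, y)
```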
Results of Adaboost regressor are:

Adaboost important features are:

Graph: 3 Adaboost important features


H. XGBoost
XGBoost is short for Extreme Gradient Boosting, designed by researchers at the University of Washington. The library is written in C++ and optimizes gradient boosting training. XGBoost builds an ensemble of gradient-boosted decision trees, constructed sequentially. Weights play a crucial role: weights are assigned to the training examples, which are then fed into a decision tree that predicts outcomes. The weights of the examples the model predicted wrongly are increased, and those examples are then input to the next decision tree. These individual predictors/trees are finally combined to provide a stronger, more accurate model.
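The xgboost library may not be installed in every environment, so as a sketch the same sequential boosting scheme is shown here with scikit-learn's GradientBoostingRegressor; xgboost.XGBRegressor is a near drop-in replacement when the package is available:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(300, 2))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=300)

# Trees are added one at a time, each fitted to the errors of the ensemble so far.
gbr = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = gbr.feature_importances_   # basis for plots like Graph 4
```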
Results of XGBoost Regressor are:

XGBoost important features are:

Graph: 4 XGBoost important features

V. RESULTS
From the results above, the XGBoost Regressor gives the highest accuracy, followed by the Random Forest Regressor. For regression problems, therefore, the XGBoost and Random Forest algorithms give more accurate results than linear regression, KNN, and the other regressors.
Name MSLE R2 Score
Linear Regression 0.00241616 0.625564
Ridge Regression 0.00241616 0.625565
Lasso Regression 0.00241611 0.625575
KNN Regressor 0.00122466 0.817600
Random Forest Regressor 0.00061121 0.911812
Bagging Regressor 0.00117784 0.826754
Adaboost Regressor 0.00063733 0.906683
XGBoost Regressor 0.00051308 0.925595
Table: Results
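The MSLE and R2 Score columns in the table can be computed with sklearn.metrics; a minimal sketch on hypothetical true and predicted prices (not values from the paper's experiments):

```python
from sklearn.metrics import mean_squared_log_error, r2_score

y_true = [10000.0, 15000.0, 20000.0]   # hypothetical actual prices
y_pred = [11000.0, 14000.0, 21000.0]   # hypothetical model outputs

# MSLE penalizes relative (percentage-like) errors, which suits prices;
# R2 measures the fraction of variance explained.
msle = mean_squared_log_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```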


Graph: 5 Comparison of results

VI. CONCLUSION
In this project the XGBoost Regressor produces the highest accuracy, followed by the Random Forest Regressor. This is because XGBoost takes advantage of weak learners and learns gradually, whereas Random Forest builds its trees independently, without communication between the learners.


