
This study used machine learning techniques to predict used car prices more accurately. The researchers collected a dataset from Kaggle containing factors about used cars. They performed data preparation including feature engineering, label encoding, and dealing with non-numeric values. Exploratory data analysis helped understand variable distributions and relationships. Regression models like random forest, linear regression, and bagging regression were trained and tested on preprocessed data to identify the most predictive features and achieve the best price estimation.

Car Price Prediction using Regression Model

It is difficult to forecast used car prices with any degree of accuracy because there are so many
variables that affect a car's cost, including features, mileage, and condition. Regression models
and machine learning techniques are used in this study to forecast used car prices, addressing
this challenge. With a focus on openness and well-informed decision-making, the study
includes dataset selection, model comparison, data pre-processing, model development,
training, testing, and outcome analysis. The main goal is to create a used car valuation model
that is beneficial to both buyers and sellers and is dependable and efficient. This study uses
cutting edge techniques to improve the accuracy of price predictions and expand knowledge
about the used car market.

In the car business, price forecasting is a complicated and hotly contested topic. Accurately
estimating the cost of a used automobile is difficult due to several factors, such as features,
mileage, and general condition. Interestingly, new data from Mark Lines Automotive shows that
sales of passenger and commercial vehicles in India have significantly increased year over year,
suggesting that people are becoming more and more accustomed to owning personal vehicles.

Because they have a greater residual value, used automobiles have become more and more popular
and are a good choice for individuals looking for their first or personal vehicle. After leaving
the showroom, new automobiles lose around 10% of their original value due to rapid depreciation.
But it might be difficult to find a trustworthy second-hand car at a fair price. Accurate used
automobile pricing is frequently difficult for both buyers and sellers to find. False mileage
estimates are one kind of deceptive behaviour that contributes to the uncertainty surrounding
second-hand car pricing. Furthermore, pricing differences for the same model are caused by
differing tastes and assessments of particular automobile qualities. Because of this, estimating
the cost of a second-hand automobile is still a difficult task.

Several recent studies have focused on predicting used car prices using machine learning
techniques and different regression models. This section provides a summary of the main research
findings in this subject, arranged according to the year of publication.

The authors of the study from 2022 created models for used car pricing using SVM, random forest,
and linear regression. Their data was gathered from the "Njuškalo" online store and subjected to
statistical methods, such as Matlab Regression Learner and SAS software, for analysis. The study
emphasised how crucial it is to take into account a variety of factors and data characteristics
when assessing the outcomes of regression analysis.

In 2021, Abdulla AlShared compared three regressors (random forest regressor, linear regressor,
and bagging regressor) on a benchmark dataset to estimate second-hand car prices. The outcomes
demonstrated that the Random Forest Regressor, Bagging Regression, and Linear Regression
outperformed others in this context.

Ashutosh Datt Sharma and Vibhor Sharma conducted a study in 2020, investigating the relationship
between selling price and various factors such as current pricing, the number of drivers, age,
and prior ownership. They found a positive correlation with current pricing but a negative
correlation with the number of drivers and age. Additionally, the study highlighted that car
dealers tend to sell vehicles at higher prices compared to individuals.

In 2019, the authors used regression trees and K-nearest neighbors (KNN) to predict used
automobile prices. Their dataset included variables such as miles, vehicle financing, year, and
fuel type. The findings included mean square errors for the regression tree and KNN models,
providing insights into their performance.

Noor and Jan (Bukvić, Pašagić Škrinjar, Fratrović, & Abramović, 2014) employed multiple
regression models to model car pricing in their 2017 study. Their
dataset encompassed attributes like dimensions, volume, exterior color, release date, ad display,
engine power, mileage, gearbox type, engine type, region, registration, layout, version, make,
and model year. They achieved a remarkable 98% estimation accuracy rate.

A 2014 study used a variety of machine learning techniques, such as Naive Bayes, K-nearest
neighbours, various regression models, and decision trees, to forecast automobile prices in
Mauritius. For their examination, the researchers collected data from newspapers.

This earlier research, which looked at different regression models, machine learning strategies,
and other approaches, has made a substantial contribution to our understanding of used vehicle
price prediction. The objective of this research is to expand on the current understanding by
identifying suitable models, analysing pertinent factors, and improving forecast precision.

This section gives a summary of the approaches we utilised in our study to predict used car
prices using machine learning techniques.

➢ Dataset:
We started our investigation by using the "Used Cars Price Prediction" dataset from Kaggle [11],
which has a thorough collection of factors related to used cars. To guarantee data integrity and
appropriateness for analysis, however, several data preparation procedures were carried out
before we could construct our prediction models.

➢ Data Loading:
Using the pandas package, which offers a handy platform for data manipulation and analysis, we
first loaded the dataset.

➢ Feature Engineering:
Feature engineering was a crucial stage in the data preparation process. We included a new
feature, 'CarAge,' which indicates the age of the automobile, to improve our prediction
algorithms. It is the difference between the present year and the year of manufacture. We also
eliminated the "Year" column from the dataset at the same time, as it was no longer required for
our research.

➢ Label Encoding:
The LabelEncoder from scikit-learn was used to encode categorical variables like "Owner_Type,"
"Transmission," "Fuel_Type," and "Location" into numerical representations. This step guaranteed
that our regression models would use the categorical data effectively.
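
The feature engineering and label encoding steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the rows are hypothetical stand-ins for the Kaggle file, and `REFERENCE_YEAR` is an assumed value because the paper does not say which year was treated as the present.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows standing in for the Kaggle CSV; with the real file this
# frame would come from pd.read_csv on the downloaded dataset instead.
df = pd.DataFrame(
    {
        "Year": [2012, 2017, 2015],
        "Fuel_Type": ["Diesel", "Petrol", "Diesel"],
        "Transmission": ["Manual", "Automatic", "Manual"],
        "Owner_Type": ["First", "Second", "First"],
        "Location": ["Mumbai", "Pune", "Delhi"],
        "Price": [4.5, 9.8, 6.1],
    }
)

# Assumption: the paper does not state the reference year it used.
REFERENCE_YEAR = 2023

# CarAge is the reference year minus the year of manufacture;
# the "Year" column is dropped once CarAge exists.
df["CarAge"] = REFERENCE_YEAR - df["Year"]
df = df.drop(columns=["Year"])

# Encode each categorical column as integers, keeping one encoder per
# column so the mapping can be inspected or inverted later.
encoders = {}
for column in ["Owner_Type", "Transmission", "Fuel_Type", "Location"]:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(df)
```

LabelEncoder assigns integers in sorted order of the class labels, so the codes carry no meaningful ranking; whether that matters depends on the regression model being trained.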
➢ Handling Non-Numeric Values:
The presence of non-numeric characters in numerical columns such as "Mileage," "Engine," and
"Power" required additional preprocessing. Regular expressions were used to extract numerical
values from these columns, and the data type was subsequently changed to float.

➢ Exploratory Data Analysis:
It is essential to comprehend the distribution of the variables and their relationships. To do
this, we performed exploratory data analysis.
- To visualize the distribution of the target variable 'Price,' we created a histogram plot.
- To see how the variables related to one another, a correlation matrix was displayed.
- The link between numerical features and the target variable, "Price," was visualised using
  scatter plots.

➢ Feature Selection:
Recursive Feature Elimination was utilised to identify the most pertinent features that have a
substantial impact on car price forecasts.

➢ Model Training:
The heart of our research involved training and evaluating various regression models for used
automobile price prediction. We selected some models known for their regression capabilities.
To facilitate model comparison and evaluation, we used the scikit-learn package for model
training and assessment. First, we divided the dataset into training and testing sets (70% and
30%, respectively) to enable accurate model evaluation. Additionally, to ensure that numerical
features were on a comparable scale and no single feature dominated the learning process, we
applied MinMaxScaler separately to both the training and testing sets.
For each model, we performed training, prediction, and evaluation using metrics including the
R-squared (R²) score and Root Mean Squared Error (RMSE).

➢ Libraries:
In our research, we loaded, preprocessed, visualised, and modelled our data using a number of
libraries, including seaborn, numpy, matplotlib, pandas, and scikit-learn.

➢ Evaluation Metrics:
The performance of each model was assessed using the following metrics once the regression
models had been trained using our training data and had made predictions using our testing data.

Root Mean Squared Error (RMSE): The average difference between predicted and actual car prices
is measured by RMSE. Improved model accuracy is indicated by lower RMSE values.
R-squared (R2) score: This quantifies the percentage of the variance of the target variable that
can be explained by the independent variables. A better model fit is indicated by a higher R2
score.

To evaluate each model's prediction performance, we computed and compared the RMSE and R2 score.
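
As a worked illustration of the two metrics, the sketch below computes them from their standard definitions, RMSE = sqrt(mean((y - ŷ)²)) and R² = 1 - SS_res / SS_tot, using only the Python standard library. The four price pairs are made-up numbers chosen so the arithmetic is easy to follow; they are not results from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(actual, predicted):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up prices for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

# Residuals are 1, 0, -1, 0, so the mean squared error is 0.5.
print(f"RMSE: {rmse(actual, predicted):.4f}")
# SS_res = 2 and SS_tot = 20, so R2 = 1 - 2/20 = 0.9.
print(f"R2:   {r2_score(actual, predicted):.4f}")
```

RMSE is expressed in the same units as the price, while R2 is unitless, which is why the two are usually reported together.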
The results and insights gained from our
regression model evaluation are discussed
in the subsequent sections of this research
paper.
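
The preprocessing, splitting, scaling, and evaluation steps described in this section can be sketched end to end as below. This is a minimal sketch, not the study's actual code: the data is synthetic with an invented price relationship, `to_float` is a hypothetical helper, and only a random forest is trained. One deliberate choice is labeled in the code: the scaler is fitted on the training split and then reused on the test split, a common way to avoid information leaking from the test set, which may differ from the paper's procedure of scaling each set separately.

```python
import re

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def to_float(value):
    """Hypothetical helper: pull the first number out of e.g. '19.67 kmpl'."""
    match = re.search(r"\d+(?:\.\d+)?", str(value))
    return float(match.group()) if match else np.nan


# Synthetic stand-in data (an assumption, not the Kaggle dataset).
rng = np.random.default_rng(0)
n = 200
mileage = rng.uniform(10, 25, n)
engine = rng.uniform(800, 3000, n)
power = rng.uniform(50, 250, n)
df = pd.DataFrame(
    {
        "Mileage": [f"{m:.2f} kmpl" for m in mileage],
        "Engine": [f"{e:.0f} CC" for e in engine],
        "Power": [f"{p:.1f} bhp" for p in power],
        "CarAge": rng.integers(1, 15, n),
    }
)
# Invented price relationship, used only so the model has something to learn.
noise = rng.normal(0, 0.5, n)
df["Price"] = 0.05 * power + 0.002 * engine - 0.4 * df["CarAge"] + noise

# Strip the unit suffixes and convert the columns to float.
for column in ["Mileage", "Engine", "Power"]:
    df[column] = df[column].map(to_float)

# 70% / 30% train-test split.
X = df.drop(columns=["Price"])
y = df["Price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Assumption: fit the scaler on the training split only, then reuse it.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)

rmse = np.sqrt(mean_squared_error(y_test, predictions))
r2 = r2_score(y_test, predictions)
print(f"RMSE: {rmse:.3f}, R2: {r2:.3f}")
```

Swapping in another scikit-learn regressor at the `model = ...` line would allow the same split and metrics to be reused for a like-for-like comparison.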

Here, we provide the results of our investigation and examine how effectively the different
regression models predict used automobile prices. We investigated four different regression
models: XGBoost, Random Forest, SVR, and Linear Regression. The performance of the regression
models was assessed using two evaluation metrics: the root mean squared error (RMSE) and the
R-squared (R2) score. RMSE assesses the average difference between the predicted and actual
prices, whilst the R2 score indicates the amount of variance in the target variable that is
explained by the model.
Model                        RMSE       R-squared score
Linear Regression            7.1294     0.70
Random Forest                4.4058     0.89
Support Vector Regression    12.4275    0.90
XGBoost                      3.6752     0.92

In terms of R2 score and RMSE, the Random Forest and XGBoost models scored better than the
competition. These models showed improved precision and a greater capacity to represent the
intricate connections between the input data and the target variable. With an RMSE of 7.1294 and
an R2 value of 0.70, the linear regression model performed reasonably well.

With a comparatively high R2 value of 0.90, the SVR model was able to account for 90% of the
variance in the target variable. However, compared to the other models, it had a higher RMSE of
12.4275, indicating a greater discrepancy between the predicted and actual prices.

With the lowest RMSE of 3.6752 and the greatest R2 score of 0.92, the XGBoost model performed the
best of all the models. This demonstrates its capacity to make more precise predictions and
account for a greater share of the variation in used car pricing.

In this study, we employed machine learning regression models to forecast used car prices. We
performed a thorough analysis on a dataset that included information about the age, mileage, and
features of pre-owned vehicles. We used the regression modelling techniques of SVR, XGBoost,
random forest, and linear regression.

Our findings show that the XGBoost model delivered the best performance, as measured by its low
RMSE and high R2 score. It performed better than the other models at forecasting used car prices.
The Random Forest model likewise performed admirably, with a high R2 score and a reasonably low
RMSE. Despite having a higher RMSE than the other models, the SVR model earned a strong R2 score.
Although it performed reasonably, the linear regression model had trouble capturing complicated
interactions.

The study's conclusions have implications for both buyers and dealers of used cars. Buyers may
negotiate fair pricing and make informed judgements with the aid of accurate price forecasts.
Greater accuracy in market value estimation might help
sellers develop more effective pricing strategies.

Future research may explore various avenues to further enhance the accuracy and effectiveness of
used car price forecasting. These avenues include:

There is an opportunity to delve deeper into feature engineering techniques, thereby refining the
models. This exploration may involve incorporating domain-specific knowledge and considering
additional factors, such as brand reputation, vehicle condition, and specific engine
combinations. The goal is to capture more nuanced aspects influencing used car prices.

The quest for precision in price forecasting may lead to the adoption of advanced regression
methods. Models such as ensemble methods and deep learning approaches could offer a significant
improvement in prediction accuracy. Their capability to unveil intricate relationships and detect
irregular data patterns makes them promising candidates for future research.

The dataset's quantity and diversity can be augmented through data augmentation techniques. This
augmentation holds the potential to strengthen model generalization and robustness. It may
encompass methods for increasing dataset size, including synthetic data generation, to ensure
models can effectively capture the complexities of the used car market.

The creation of models with real-time adaptability is a promising path in a market that is
constantly changing. These models have the ability to continuously assimilate new data, which
allows them to quickly adapt to changing market conditions and fluctuations in the value of used
cars. Price forecasts must be flexible enough to change in real time while still being accurate.

Developing user-friendly mobile applications or interfaces is necessary to increase the practical
impact of research findings. These interfaces can give sellers and buyers easily accessible tools
for quickly and easily estimating the price of used cars. These kinds of programmes encourage
open markets and provide
consumers the ability to make wise choices.

Future research can help create an ecosystem for used car pricing that is more precise and
responsive by tackling these research directions. Incorporating sophisticated techniques, feature
engineering, data augmentation, and intuitive user interfaces can greatly assist dealers and
customers in formulating their pricing plans and making decisions.

1) Listiani, M. (2009). Support Vector Regression Analysis for Price Prediction in a Car Leasing
   Application. Thesis (MSc). Hamburg University of Technology.
2) Richardson, M. (2009). Determinants of Used Car Resale Value. Thesis (BSc). The Colorado
   College.
3) Wu, J.D., Hsu, C.C., Chen, H.C. (2009). An expert system of price forecasting for used cars
   using adaptive neuro-fuzzy inference. Expert Systems with Applications, 36(4), 7809-817.
4) Gongqi, S., Yansong, W., Qiang, Z. (2011). New Model for Residual Value Prediction of the Used
   Car Based on BP Neural Network and Nonlinear Curve Fit. In Measuring Technology and
   Mechatronics Automation (ICMTMA), 2011 Third International Conference, Vol. 2, pp. 682-685,
   IEEE.
5) Pudaruth, S. (2014). Predicting the Price of Used Cars using Machine Learning Techniques.
   ISSN 0974-2239, Volume 4, Number 7 (2014), pp. 753-764.
6) Noor, K., Jan, S. (2017). Vehicle Price Prediction System using Machine Learning Techniques.
   International Journal of Computer Applications, 167(9), 27-31.
7) Tiwari, S., Chandak, A., Ganorkar, P., Sharma, S., Bagmar, A. (2019). Car Price Prediction
   Using Machine Learning. Research paper, Vol. 7, Issue 5, May 2019, E-ISSN: 2347-2693.
8) Sharma, A., Sharma, V. (2020). Used Car Price Prediction using Linear Regression Model.
   IRJMETS, Vol. 02, Issue 11, November 2020.
9) Alshared, A. (2021). Used Cars Price Prediction and Valuation using Data Mining Techniques.
   Theses.
10) Mammadov, H. (2021). Car price prediction in the USA by using Linear Regression.
    International Journal of Economic Behavior, Vol. 11, 2021.
11) https://www.kaggle.com/datasets/avikasliwal/used-cars-price-prediction
12) Bukvić, L.; Pašagić Škrinjar, J.; Fratrović, T.; Abramović, B. Price Prediction and
    Classification of Used-Vehicles Using Supervised Machine Learning. Sustainability 2022, 14,
    17034. https://doi.org/10.3390/su142417034
