CarPricePrediction Python FR
TECHNIQUES
ABSTRACT
This project proposes a forecasting approach that is based solely on the data retrieved
from sales records and allows for straightforward human interpretation.
It proposes two generalized models for predicting future sales. For an
extensive evaluation, data sets consisting of car price data are used.
The main motivation of this project is to present a sales prediction model for
predicting car prices. Further, this work aims to identify the best
classification algorithm for sales analysis.
In this work, the data mining classification algorithm called Naïve Bayes is
used to develop a prediction system that analyzes and predicts sales volume. In addition,
various grouping and chart preparation operations are included in the proposed system for better
classification results. The project is developed using Python 3.7.
EXISTING SYSTEM
In the existing system, a car dataset containing attributes (engine type, cylinder number,
price, etc.) is taken and two algorithms are carried out for classification/prediction purposes. The
algorithm called Naïve Bayes is used. 75% of the whole data set is taken as training data and
the model is built from it. The remaining 25% of the data is then taken as test data and checked
against the trained model.
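The 75%/25% split and the check against the held-out data can be sketched as follows. This is a minimal illustration using only the standard library; the function names, the fixed seed, and the sample records are assumptions made for the example and are not taken from the project code.

```python
import random

def train_test_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(actual, predicted):
    """Fraction of test records whose predicted class equals the actual class."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

# Hypothetical records: (engine_type, cylinder_number, price)
cars = [("ohc", 4, 13495.0 + 100 * i) for i in range(20)]
train, test = train_test_split(cars)
print(len(train), len(test))  # 15 5

# Hypothetical actual and predicted classes for four test records
print(accuracy(["low", "high", "low", "low"], ["low", "high", "high", "low"]))  # 0.75
```

Shuffling before the cut avoids a split that simply follows the order of the file; the seed only makes the example repeatable.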
DRAWBACKS
Naïve Bayes classification yields conditional probability values only for the existing
given dataset; new test data has to be added for classification.
Naïve Bayes classification is not preferred when the data contains many outliers.
Chart preparation is not carried out.
PROPOSED SYSTEM
All the approaches of the existing system are carried out in the proposed system. In addition,
along with Naïve Bayes based classification, various grouping operations are used to build the
model, as they help in several ways. The approach is found to be especially suitable when the data
set has a large number of records or contains outlier data. A wide variety of sales records can be
taken for engine type and cylinder count classification and for predicting a new model, while at
the same time increasing efficiency. SVM and KNN classification are also carried out.
ADVANTAGES
Chart preparation is carried out.
Grouping of records for various columns is prepared and displayed.
Engine type wise sales are found out.
Cylinder count wise sales are found out.
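The grouping listed above (engine type wise and cylinder count wise) can be sketched as follows. The column names `enginetype`, `cylindernumber` and `price`, the sample records, and the choice to report a record count and a mean price per group are assumptions made for illustration.

```python
from collections import defaultdict

def group_sales(records, key):
    """Return {group value: (record count, mean price)} for the given column."""
    prices = defaultdict(list)
    for record in records:
        prices[record[key]].append(record["price"])
    return {group: (len(p), sum(p) / len(p)) for group, p in prices.items()}

# Hypothetical records; real column names depend on the dataset used.
cars = [
    {"enginetype": "ohc", "cylindernumber": "four", "price": 10000},
    {"enginetype": "ohc", "cylindernumber": "four", "price": 12000},
    {"enginetype": "dohc", "cylindernumber": "six", "price": 20000},
    {"enginetype": "ohcv", "cylindernumber": "six", "price": 30000},
]

by_engine = group_sales(cars, "enginetype")
by_cylinder = group_sales(cars, "cylindernumber")
print(by_engine["ohc"])    # (2, 11000.0)
print(by_cylinder["six"])  # (2, 25000.0)
```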
SYSTEM SPECIFICATION
HARDWARE REQUIREMENTS
This section gives the details and specification of the hardware on which the system is
expected to work.
Processor : Dual Core 2.1 GHz
RAM : 8 GB SDRAM
Monitor : 17” Color
Hard disk : 1 TB
Keyboard : Standard 102 keys
Mouse : Optical mouse
SOFTWARE REQUIREMENTS
This section gives the details of the software used for development.
Operating System : Windows 10 Pro
Environment : Python IDLE
Language : Python 3.7
MODULE DESCRIPTION
The project contains the following modules.
1. DATASET COLLECTION
2. NAÏVE BAYES CLASSIFICATION
3. SVM CLASSIFICATION
4. KNN CLASSIFICATION
5. CHART PREPARATION
1. DATASET COLLECTION
In this module, the sales dataset from Kaggle, which contains attributes (engine type,
cylinder number, price, etc.), is taken. Records with null values are eliminated during
preprocessing.
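Eliminating null-value records can be sketched with the standard `csv` module. The column names, the inline sample data, and the set of strings treated as missing are assumptions; the real file from Kaggle may mark missing values differently.

```python
import csv
import io

def drop_null_records(rows):
    """Keep only rows in which every field has a non-empty value."""
    missing = {"", "na", "nan", "null", "?"}
    return [
        row for row in rows
        if all(value is not None and value.strip().lower() not in missing
               for value in row.values())
    ]

# Hypothetical sample standing in for the downloaded CSV file.
sample = io.StringIO(
    "enginetype,cylindernumber,price\n"
    "ohc,four,13495\n"
    "dohc,,16500\n"
    "ohcv,six,?\n"
    "ohc,four,13950\n"
)
rows = list(csv.DictReader(sample))
clean = drop_null_records(rows)
print(len(rows), len(clean))  # 4 2
```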
2. NAÏVE BAYES CLASSIFICATION
In this module, the branch-wise and gender-wise rating similarity (conditional probability) is
found for ratings both below and above 5.0. Moreover, the number of items sold is found for each
Outlet_Size. An item-wise comparison chart is also prepared for the sold quantity.
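The conditional probabilities that a Naïve Bayes classifier relies on can be sketched as follows. This is a small categorical version with Laplace smoothing, written for illustration only; the feature values (engine type, cylinder number) and the class labels "low" and "high" are hypothetical and do not come from the project data.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_proba(model, row):
    """Return P(class | row) from Laplace-smoothed conditional probabilities."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # smoothed P(feature value | class)
            score *= (value_counts[(i, label)][value] + 1) / (n + len(vocab[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training records: (engine type, cylinder number) -> price class
rows = [("ohc", "four"), ("ohc", "four"), ("dohc", "six"),
        ("ohcv", "six"), ("ohc", "four"), ("dohc", "six")]
labels = ["low", "low", "high", "high", "low", "high"]

model = fit_naive_bayes(rows, labels)
proba = predict_proba(model, ("ohc", "four"))
print(max(proba, key=proba.get))  # low
```

The smoothing term keeps a feature value that was never seen with a class from forcing that class's probability to zero.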
3. SVM CLASSIFICATION
SVM stands for Support Vector Machine. It is a supervised machine learning approach used for
classification and regression analysis. An SVM is trained on labelled data and analyzes
large amounts of data to identify patterns in them. For two categories of data in a
high-dimensional space, an SVM constructs two parallel boundaries, one alongside each category,
and uses almost all attributes to do so. It separates the space in a single pass to
generate flat, linear partitions. The 2 categories are divided by a clear gap that should be as
wide as possible. This partitioning is done by a plane called a hyperplane. An SVM creates the
hyperplane that has the largest margin in a high-dimensional space to separate the given data into
classes. The margin between the 2 classes is the largest achievable distance between the closest
data points of those classes. The larger the margin, the lower the generalization error of the
classifier tends to be. In this module, the records are classified into class 1 or 2 using SVM
classification.
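The maximum-margin idea can be sketched with a small linear SVM trained by subgradient descent on the hinge loss. This is an illustrative stand-in, not the project's implementation; the two classes are encoded as -1 and +1 here (rather than 1 and 2), and the toy points, learning rate, and regularization value are assumptions.

```python
import math

def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimize hinge loss plus an L2 penalty by subgradient descent.

    labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # point is inside the margin: move the hyperplane away from it
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # point is safely classified: only apply the L2 shrinkage
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable two-feature records
points = [(-2, -2), (-2, -1), (-1, -2), (2, 2), (2, 1), (1, 2)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
print(predictions == labels)

# Width of the gap between the two parallel boundaries w.x + b = +1 and -1
print(2 / math.sqrt(sum(wi * wi for wi in w)))
```

A smaller weight vector corresponds to a wider gap, which is why the L2 penalty on `w` pushes the solution toward a larger margin.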
4. KNN CLASSIFICATION
In this module, KNN classification is done with a K value of 6, using the type
column (generated from the rating value, below or above 5.0) as the binary
classification target. 75% of the data is used as training data and 25% as testing data. The
record number and predicted type of each testing record are found and displayed as the result.
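K-nearest-neighbour classification with K = 6 can be sketched as follows. The two numeric features, the sample points, and the labels "below" and "above" are hypothetical; Euclidean distance is an assumption, since the text does not name a distance measure.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=6):
    """Return the majority label among the k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training records with a binary type label
train_points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3), (0.8, 0.9), (1.3, 1.0), (1.0, 0.7),
    (3.0, 3.0), (3.2, 2.8), (2.9, 3.1), (3.1, 3.3), (2.8, 2.9), (3.3, 3.0), (3.0, 2.7),
]
train_labels = ["below"] * 7 + ["above"] * 7

# Display each test record number with its predicted type
test_points = [(1.0, 1.0), (3.1, 3.0)]
results = [knn_predict(train_points, train_labels, q) for q in test_points]
for number, label in enumerate(results, start=1):
    print(number, label)
```

Because 6 is even, a 3-to-3 vote is possible with two classes; in this sketch a tie goes to whichever tied label appears first among the nearest neighbours.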
5. CHART PREPARATION
Using barplot, the car price record values are grouped with their count values and plotted.
Using scatter.smooth(), the data set’s column values are plotted with ‘range’ as the X column
and ‘count’ as the Y column.
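The grouping step behind such a chart can be sketched as follows: prices are placed into fixed-width ranges and the records in each range are counted, giving the 'range' and 'count' columns. The range width of 5000, the sample prices, and the text-only bar output are assumptions; a plotting library would normally draw the actual chart.

```python
from collections import Counter

def count_by_price_range(prices, width=5000):
    """Group prices into fixed-width ranges and count the records per range."""
    counts = Counter((int(p) // width) * width for p in prices)
    return [(f"{lo}-{lo + width - 1}", counts[lo]) for lo in sorted(counts)]

# Hypothetical car prices
prices = [5118, 7295, 7898, 9980, 13495, 16500, 16845, 41315]
table = count_by_price_range(prices)

# Text rendering of the bar chart: one row per range
for label, count in table:
    print(f"{label:>12} | {'#' * count} {count}")
```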
SYSTEM FLOW DIAGRAM
[Flow diagram not reproduced. Recoverable labels: Classification; 2. Accuracy Prediction; 4. Chart Preparation]