Performance Evaluation of Different Supervised Learning Algorithms For Mobile Price Classification
http://doi.org/10.22214/ijraset.2020.6302
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 8 Issue VI June 2020- Available at www.ijraset.com
Abstract: This paper compares the accuracy of different classification algorithms used in supervised machine
learning. A classification problem consists of determining which class each example in a given dataset belongs to; it is used to
assign data instances to groups according to their characteristics. We used several well-known supervised learning
algorithms - Logistic Regression, K-Nearest Neighbours (KNN), Decision Tree, Support Vector Machine (SVM), and Gradient
Boosting - to classify the price range of mobile phones. We built multiple classifiers for mobile price classification and compared
their accuracy on data taken from Kaggle. Results are compared in terms of the accuracy score achieved in the
experiment, and a conclusion is drawn about the best classifier for the mobile price classification problem.
Keywords: Machine Learning, Supervised Learning, Classification, Logistic Regression, Decision Tree, KNN, SVM, Gradient
Boost Algorithm.
I. INTRODUCTION
Scientific computing relies on executing computer algorithms. Given a particular environment and hardware, an
algorithm's accuracy is a deciding factor. Many supervised learning algorithms are available for classification problems, and
many languages can be used to run them, such as Python, R and MATLAB. Python in particular offers mature libraries and tools for
machine learning; its scikit-learn library provides simple and efficient tools for running supervised
learning algorithms.
Millions of mobile phones are bought and sold globally. The Kaggle mobile price classification dataset serves here as an example of
this type of problem, i.e. finding the right class for each item. The same approach can be used to classify the real price range of other
products such as cars, bikes, electronic items, medicines and houses.
Here, the result is reported as a single score, accuracy, which is well suited to comparing algorithms. The predicted price class
indicates whether a mobile is very economical, economical, expensive or very expensive. In this paper, we compare popular
supervised learning approaches (Logistic Regression, Decision Tree, K-Nearest Neighbour (KNN), Support Vector
Machine (SVM), and Gradient Boosting) in the context of classification. Each model gives an opinion on the mobile price depending
on the features used.
III. METHODOLOGY
B. Data Analysis
Data Analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and
recap, and evaluate data. [3]
Python's scikit-learn machine learning toolbox was used for exploratory data analysis, data processing and model
development. Python plotting libraries such as Matplotlib and Seaborn were used for the data visualizations.
The dataset has 21 features and 2000 entries. The meanings of the features are given below.
b) K-Nearest Neighbour (KNN): KNN is a classification algorithm, as given in [11], in which an object is classified by a vote among
the labelled training examples that lie at the smallest distance from it. The method performs well even on
classification tasks with multiple categories. Its disadvantage is that classifying an object takes more time
when a large number of training examples are given, because KNN must select the neighbours by computing the distance from each
test object to all of the training examples. KNN is a simple algorithm that stores all available cases and classifies new
cases based on a similarity measure. KNN is also known by several synonymous names: (a) Memory-Based Reasoning [12], (b) Example-Based
Reasoning, (c) Instance-Based Learning, (d) Case-Based Reasoning and (e) Lazy Learning. KNN can be used for both regression and
classification in predictive problems [13]. However, it is more widely used for classification problems in industry.
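The voting procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the scikit-learn implementation used in the experiments; the function name `knn_predict`, the two toy features and the labels are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every stored training example (the "lazy" step).
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest examples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two numeric features per phone, two price classes.
train_X = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.2], [6.0, 4.5], [8.0, 5.0], [7.0, 4.8]]
train_y = ["low", "low", "low", "high", "high", "high"]

cheap_guess = knn_predict(train_X, train_y, [1.8, 2.4], k=3)
pricey_guess = knn_predict(train_X, train_y, [7.5, 4.9], k=3)
print(cheap_guess, pricey_guess)
```

Every prediction scans the whole training set, which is the time cost noted above for large numbers of training examples.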
c) Decision Tree: A decision tree algorithm can be used for solving both regression and
classification problems. The general motive for using a decision tree is to create a training model that can be used to
predict the class or value of the target variable by learning decision rules inferred from prior data (training data). Fig. 4
shows a sample picture of a decision tree. [15]
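As an illustration of how a single decision rule can be inferred from training data, the sketch below searches for the threshold on one feature that best separates the classes. It assumes Gini impurity as the split criterion, which the paper does not specify; the helper names and the toy RAM values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' rule."""
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: RAM in MB and a binary price class.
ram_mb = [512, 768, 1024, 3072, 3584, 4096]
price_class = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(ram_mb, price_class)
print(threshold, impurity)
```

A full tree would repeat this search over all features and recurse into each child until the leaves are pure or a depth limit is reached.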
d) Support Vector Machine (SVM): In machine learning, SVMs are supervised learning models with associated learning algorithms that
analyse data used for classification and regression analysis [16]. Given a set of training examples, each marked as belonging to one
or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other,
making it a non-probabilistic binary linear classifier [17]. When data are not labelled, supervised learning is not possible
and an unsupervised learning approach is required [18], which attempts to find a natural clustering of the data into groups
and then maps new data to these groups. The clustering algorithm that extends the SVM in this way is called
support vector clustering (SVC); it is often used in industrial applications, either when data are not labelled or when only some
data are labelled, as a pre-processing step for a classification pass [19]. An SVM classifies data into different
classes by finding a line (hyperplane) that separates the training data into classes. Because there are many such linear hyperplanes,
the SVM algorithm tries to maximize the distance between the classes involved; this is referred to as margin maximization [20].
If the line that maximizes the distance between the classes is identified, the probability of generalizing well to unseen data is
increased.
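The idea of margin maximization can be illustrated by measuring the margin of two candidate separating lines on a toy dataset. The sketch below does not train an SVM; it only compares two hand-picked, hypothetical hyperplanes, and the data and names are assumptions made for illustration.

```python
import math

def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over the data set."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in zip(X, y)
    )

# Toy, linearly separable data with labels in {-1, +1}.
X = [[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]]
y = [-1, -1, 1, 1]

# Two hypothetical separating lines of the form w . x + b = 0.
candidates = {
    "vertical line x1 = 3": ([1.0, 0.0], -3.0),
    "diagonal x1 + x2 = 5.5": ([1.0, 1.0], -5.5),
}
margins = {name: geometric_margin(w, b, X, y) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines classify every point correctly (positive margin), but the diagonal line leaves more room between the classes; an SVM solver searches over all hyperplanes for the one with the largest such margin.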
Fig. 8: Confusion Matrix and Classification Report of Support Vector Machine Algorithm.
e) Gradient Boosting
Gradient Boosting = Gradient Descent + Boosting
Fit an additive model (ensemble) $\sum_t \rho_t h_t(x)$ in a forward stage-wise manner.
In each stage, introduce a weak learner to compensate for the shortcomings of existing weak learners.
In Gradient Boosting, “shortcomings” are identified by gradients.
Recall that, in Adaboost, “shortcomings” are identified by high-weight data points.
Both high-weight data points and gradients tell us how to improve our model. [21]
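A minimal sketch of the forward stage-wise procedure is given below. To keep it short it makes several simplifying assumptions that are not taken from the paper: squared-error loss on a regression target (so the negative gradient is simply the residual), one-feature regression stumps as weak learners, and a single fixed step size `rho` in place of a per-stage $\rho_t$. All names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares stump on one feature: returns (threshold, left_mean, right_mean)."""
    best = None
    distinct = sorted(set(x))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def stump_predict(stump, xi):
    t, lm, rm = stump
    return lm if xi <= t else rm

def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def gradient_boost(x, y, n_stages=20, rho=0.5):
    """Forward stage-wise fit of F(x) = F0 + sum_t rho * h_t(x) for squared loss."""
    f0 = sum(y) / len(y)          # initial constant model
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_stages):
        # For squared loss the negative gradient is the residual y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)   # weak learner fitted to the gradients
        stumps.append(stump)
        preds = [pi + rho * stump_predict(stump, xi) for xi, pi in zip(x, preds)]
    return f0, stumps, preds

# Hypothetical toy regression data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
f0, stumps, preds = gradient_boost(x, y)
initial_error = mse(y, [f0] * len(y))
final_error = mse(y, preds)
print(initial_error, final_error)
```

Each new stump is fitted to what the current ensemble still gets wrong, which is the sense in which "shortcomings" are identified by gradients.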
E. Model Evaluation
A binary classification problem has only two classes, conventionally called the positive and the negative class. The metrics
derived from the confusion matrix are described below.
1) True Positive (TP): It refers to the number of predictions where the classifier correctly predicts the positive class as positive.
2) True Negative (TN): It refers to the number of predictions where the classifier correctly predicts the negative class as negative.
3) False Positive (FP): It refers to the number of predictions where the classifier incorrectly predicts the negative class as positive.
4) False Negative (FN): It refers to the number of predictions where the classifier incorrectly predicts the positive class as
negative.
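The four counts defined above can be obtained directly from the true and predicted labels. The following is a small illustrative sketch; the function name and the example labels are hypothetical, with class 1 taken as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn

# Hypothetical labels for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```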
a) Accuracy: It gives you the overall accuracy of the model, meaning the fraction of the total samples that were correctly classified
by the classifier. To calculate accuracy, use the following formula:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
b) Misclassification Rate: It tells you what fraction of predictions were incorrect. It is also known as Classification Error. You can
calculate it using
$$\text{Misclassification Rate} = \frac{FP + FN}{TP + TN + FP + FN}$$
c) Precision: It tells you what fraction of the samples predicted as positive were actually positive. To calculate precision, use the
following formula:
$$\text{Precision} = \frac{TP}{TP + FP}$$
d) Recall: It tells you what fraction of all positive samples were correctly predicted as positive by the classifier. It is also known as
True Positive Rate (TPR), Sensitivity, and Probability of Detection. To calculate recall, use the following formula:
$$\text{Recall} = \frac{TP}{TP + FN}$$
e) Specificity: It tells you what fraction of all negative samples are correctly predicted as negative by the classifier. It is also
known as True Negative Rate (TNR). To calculate specificity, use the following formula:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
f) F1-score: It combines precision and recall into a single measure. Mathematically, it is the harmonic mean of precision and recall.
It can be calculated as follows [22]:
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
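The six metrics defined in this section can be computed from the four confusion-matrix counts as sketched below. The function name and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts.

    Assumes all denominators are non-zero (no zero-division handling).
    """
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        # Harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 3 TP, 4 TN, 2 FP, 1 FN.
metrics = binary_metrics(tp=3, tn=4, fp=2, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification rate always sum to one, which gives a quick sanity check on the counts.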
F. Classification Report
A classification report is used to measure the quality of predictions from a classification algorithm: how many predictions are
true and how many are false. More specifically, true positives, false positives, true negatives and false negatives are used to
compute the metrics of a classification report. [23]
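For a multi-class problem such as the four price ranges, a classification report lists these metrics per class, treating each class in turn as the positive one. The sketch below is a plain-Python approximation of that idea, not the scikit-learn routine; the function name, the class labels 0 to 3 and the example predictions are hypothetical.

```python
def classification_report_rows(y_true, y_pred):
    """Per-class precision, recall, F1 and support, computed one-vs-rest."""
    rows = {}
    pairs = list(zip(y_true, y_pred))
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        # Report 0.0 when a denominator is zero (a convention chosen for this sketch).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[cls] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,   # number of true samples of this class
        }
    return rows

# Hypothetical labels for four price classes (0 = cheapest, 3 = most expensive).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 2]
report = classification_report_rows(y_true, y_pred)
for cls, row in report.items():
    print(cls, row)
```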
IV. CONCLUSIONS
The principal aim of this work is to compare five different supervised machine learning classifiers and find the most accurate
algorithm. We examined the performance of all classifiers on the Mobile Price classification data: the Gradient Boost classifier gives the
highest classification accuracy, 90% based on F1 score, and K-Nearest Neighbours (KNN) gives the lowest accuracy, 55%. We
can therefore conclude that, even with limited training data, the Gradient Boosting and SVM algorithms classify very well, and that
accuracy can be increased by using bigger datasets. The main reason for the low accuracy of some algorithms is the low number of instances in the dataset.
There are a few directions for future work in this field. We explored only some popular supervised machine learning
algorithms; more algorithms can be chosen to build a more accurate model.
V. ACKNOWLEDGMENT
The authors are grateful to all the researchers who contributed to this research study.
REFERENCES
[1] Data is available on Kaggle, uploaded by Abhishek Sharma: https://www.kaggle.com/iabhishekofficial/mobile-price-classification
[2] https://en.wikipedia.org/wiki/Data_collection
[3] https://www.techopedia.com/definition/14650/data-preprocessing
[4] https://ori.hhs.gov/education/products/n_illinois_u/datamanagement/datopic.html
[5] https://www.displayr.com/what-is-a-correlation-matrix/
[6] Sharma, L., Gupta, G. and Jaiswal, V., 2016, December. Classification and development of tool for heart diseases (MRI images) using machine learning. In
Parallel, Distributed and Grid Computing (PDGC), 2016 Fourth International Conference on (pp. 219-224). IEEE.
[7] Chauhan, D. and Jaiswal, V., 2016, October. An efficient data mining classification approach for detecting lung cancer disease. In Communication and
Electronics Systems (ICCES), International Conference on (pp. 1-8). IEEE.
[8] Negi, A. and Jaiswal, V., 2016, December. A first attempt to develop a diabetes prediction method based on different global datasets. In Parallel, Distributed
and Grid Computing (PDGC), 2016 Fourth International Conference on (pp. 237-241). IEEE.
[9] https://hal.inria.fr/hal-00860051/document
[10] Aaron Defazio, Francis Bach, Simon Lacoste-Julien. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite
Objectives. Advances In Neural Information Processing Systems, Nov 2014, Montreal, Canada
[11] http://scikit-learn.org/stable/documentation.html
[12] Tam, Santoso A and Setiono R., “A comparative study of centroid-based, neighborhood-based and statistical approaches for effective document
categorization”, ICPR '02 Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02) ,vol.4 , no. 4 , 2002, pp.235–238.
[13] Domingos, P., 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10), pp.78-87.
[14] Mitchell, T.M., 2006. The discipline of machine learning (Vol. 3). Carnegie Mellon University, School of Computer Science, Machine Learning Department.
[15] Russell Greiner and Jonathan Schaffer, “Exploratorium – Decision Trees”, Canada. 2001. URL: http://www.cs.ualberta.ca/~aixplore/learning/ Decision Trees
[16] Decision Trees, Retrieved from: https://dataaspirant.com/2017/01/30/how-decision-treealgorithm-works/, Last Accessed: 5 October 2019
[17] Wagstaff, K., 2012. Machine learning that matters. arXiv preprint arXiv:1206.4656.
[18] Bennett, K.P. and Parrado-Hernández, E., 2006. The interplay of optimization and machine learning research. Journal of Machine Learning Research, 7(Jul),
pp.1265-1281.
[19] Caruana, R. and Niculescu-Mizil, A., 2006, June. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd international
conference on Machine learning (pp. 161-168). ACM.
[20] Javidi, B., 2002. Image recognition and classification: algorithms, systems, and applications. CRC Press.
[21] Cheng Li, A Gentle Introduction to Gradient Boosting. College of Computer and Information Science, Northeastern University.
[22] https://towardsdatascience.com/confusion-matrix-for-your-multi-class-machine-learning-model-ff9aa3bf7826
[23] https://muthu.co/understanding-the-classification-report-in-sklearn/