
Report of Comparing 5 Classification Algorithms of Machine Learning

Author: Anjali Patel
AITS Machine Learning Engineer Intern, https://ai-techsystems.com/
Prayagraj, Uttar Pradesh, India. 9161082490. [email protected]

Abstract - This paper compares the five most popular classification algorithms of supervised machine learning (ML). The algorithms are:

*Decision Trees
*Boosted Trees
*Random Forest
*Support Vector Machine
*Neural Networks

After the theoretical comparison, I have implemented each of the algorithms on the Drug_dt dataset, using the features sex, blood pressure, cholesterol and the Na-to-K ratio, with the drug to be given as the target.

Keywords - machine learning algorithms, comparison of five popular ML algorithms, Decision Trees, Boosted Trees, Random Forest, Support Vector Machine, Neural Networks.

(I) Introduction:

Machine Learning is a subfield of Artificial Intelligence; the idea is to make a machine learn from examples and then apply what it has learnt. Machine learning can be categorised into three parts: 1. Supervised Learning, 2. Unsupervised Learning, 3. Reinforcement Learning. Supervised learning can be further classified into two parts: 1. Classification, 2. Regression. Unsupervised learning can also be classified into two parts: 1. Association, 2. Clustering. Here we discuss the algorithms of Classification, which is a type of supervised learning.

(II) Classification in Supervised Learning:

A supervised machine learning algorithm searches for patterns within data points that have been assigned value labels. Classification is applied when the output variable is a category, such as 'High' or 'Low', 'Normal' or 'Abnormal', 'Red' or 'Black'. Classification is the process of dividing a dataset into different categories or groups by adding labels.

(A). Decision Trees:

The Decision Tree algorithm is used to solve classification problems as well as regression problems, and is therefore also known as CART (Classification And Regression Tree). A decision tree uses a tree representation to solve the problem; with it we can represent any Boolean function on discrete attributes.

Some assumptions that are made while using a decision tree (a brief code sketch follows Fig 1 below):

* At the beginning, we consider the whole training set as the root.
* Feature values are preferred to be categorical.
* Records are distributed recursively on the basis of attribute values.
* Statistical methods are used to decide which attribute to place at the root or at an internal node.

Fig 1. Structure of a decision tree: a root node splits into daughter nodes, which end in leaf nodes.

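As a minimal sketch of this idea, the snippet below fits scikit-learn's CART-based DecisionTreeClassifier; the tiny feature matrix (rows of encoded sex, BP, cholesterol and Na-to-K values) and the drug labels are placeholder values, not the actual Drug_dt records.

```python
from sklearn.tree import DecisionTreeClassifier

# Placeholder rows: [sex, BP, cholesterol, Na-to-K] after label encoding.
X = [[0, 2, 1, 25.3], [1, 0, 0, 7.8], [0, 1, 1, 13.1], [1, 2, 0, 30.5]]
y = ["drugY", "drugX", "drugA", "drugY"]        # target: drug to be given

# CART grows the tree by recursively splitting the records on attribute values.
tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
tree.fit(X, y)

print(tree.predict([[0, 2, 1, 20.0]]))          # predicted drug for a new record
```
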
(B). Boosted Trees:

Boosted trees are a sequential ensemble of tree models. Boosting is a machine learning technique that combines weak learners into a strong learner in order to improve accuracy.

Fig 2. Concept of the Boosted Trees.

How does boosting work? The basic principle behind a boosting algorithm is to generate multiple weak learners and combine their predictions to form one strong rule. Many iterations are used to create decision stumps and to combine these weak learners into a strong learner.

STUMPS: These are trees having a single node and two leaves.

ADABOOST: AdaBoost is used to build the collection of trees that makes up the boosted tree; it combines the stumps to form the boosted tree.

The boosted trees have:

1. Strong predictive power, but even less interpretability than a random forest. Each successive tree uses the residual of the previous tree.
2. Even more hyperparameters to control model building.

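A minimal AdaBoost sketch with scikit-learn follows; by default AdaBoostClassifier uses one-node decision stumps as its weak learners, and the data arrays are the same placeholder values as in the decision tree example, not the real dataset.

```python
from sklearn.ensemble import AdaBoostClassifier

# Placeholder rows: [sex, BP, cholesterol, Na-to-K] after label encoding.
X = [[0, 2, 1, 25.3], [1, 0, 0, 7.8], [0, 1, 1, 13.1], [1, 2, 0, 30.5]]
y = ["drugY", "drugX", "drugA", "drugY"]

# Each iteration fits a stump that focuses on the records the previous stumps
# misclassified; the weighted votes of all stumps form the strong learner.
boosted = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
boosted.fit(X, y)

print(boosted.predict([[0, 2, 1, 20.0]]))
```
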
(C). Random Forest:

The Random Forest algorithm is a parallel, bagged ensemble of a number of trees. This model has strong predictive power but lower interpretability compared to a single decision tree.

Beyond those of the decision tree, the hyperparameters that control model growth are:

* Number of trees
* Sampling rate
* Number of variables to try at each split

Fig 3. Random forest: several trees (tree 1, tree 2, tree 3, ...) grown in parallel.

The concept of Random Forest can be simplified as follows: a random forest is a combination of decision trees. Let a number of decision trees be used to resolve a dataset, each producing its own outcome. The outcome that is repeated most often, i.e. the one with the highest number of votes, is the final result of the Random Forest algorithm. A brief sketch is given below.

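In a minimal scikit-learn sketch, the three hyperparameters listed above correspond roughly to n_estimators, max_samples and max_features; the data arrays are again placeholder values.

```python
from sklearn.ensemble import RandomForestClassifier

# Placeholder rows: [sex, BP, cholesterol, Na-to-K] after label encoding.
X = [[0, 2, 1, 25.3], [1, 0, 0, 7.8], [0, 1, 1, 13.1], [1, 2, 0, 30.5]]
y = ["drugY", "drugX", "drugA", "drugY"]

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees grown in parallel
    max_samples=0.8,      # sampling rate for each bootstrap sample
    max_features="sqrt",  # number of variables tried at each split
    bootstrap=True,
    random_state=0,
)
forest.fit(X, y)

# The predicted class is the one that collects the most votes across the trees.
print(forest.predict([[0, 2, 1, 20.0]]))
```
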
(D). Support Vector Machine (SVM):

A support vector machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data, the algorithm outputs an optimal hyperplane which categorises new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.

Fig 4. Concept of SVM.

KERNELS: If we have data that no line can separate into two classes in the X-Y plane, we apply a transformation and add one more dimension, the Z-axis. Now a line can be drawn that separates the data into two classes. When we return to the original plane, this line maps to a circular boundary; such a transformation is called a kernel.

Fig 5. Kernel.

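The sketch below illustrates both ideas with scikit-learn: LinearSVC (the variant used later in the experiment) searches for a separating hyperplane directly, while SVC with an RBF kernel applies the kernel trick described above; the data arrays are placeholder values.

```python
from sklearn.svm import LinearSVC, SVC

# Placeholder rows: [sex, BP, cholesterol, Na-to-K] after label encoding.
X = [[0, 2, 1, 25.3], [1, 0, 0, 7.8], [0, 1, 1, 13.1], [1, 2, 0, 30.5]]
y = ["drugY", "drugX", "drugA", "drugY"]

# Linear SVM: finds the maximum-margin separating hyperplane.
linear_svm = LinearSVC(C=1.0, max_iter=10000)
linear_svm.fit(X, y)

# Kernel SVM: the RBF kernel implicitly adds extra dimensions, so a boundary
# that looks circular in the original plane is still a hyperplane there.
kernel_svm = SVC(kernel="rbf", C=1.0, gamma="scale")
kernel_svm.fit(X, y)

print(linear_svm.predict([[0, 2, 1, 20.0]]), kernel_svm.predict([[0, 2, 1, 20.0]]))
```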

(E). Neural Networks:

A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. A neural network is a collection of layers that transform the input in some way to produce an output.

Fig 6. Neural Network.

The perceptron is the basic unit of the neural network. A perceptron consists of two types of nodes, input nodes and output nodes, and each input node is connected to the output node via a weighted link. The weights are adjusted according to

∆W = η · d · x

where

d = predicted output - desired output
x = input data
η = learning rate

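A small, self-contained sketch of this update rule in plain Python is shown below; the toy AND-gate data, learning rate and number of epochs are arbitrary choices for illustration. Since d is defined here as predicted minus desired output, the rule is applied with a minus sign (W becomes W - η·d·x).

```python
# Minimal perceptron sketch on an illustrative AND-gate task (not Drug_dt).

def step(z):
    return 1 if z >= 0 else 0          # threshold activation of the output node

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 0, 1]                       # desired outputs (logical AND)
w = [0.0, 0.0]                         # weights of the two input links
bias = 0.0
learning_rate = 0.1                    # eta in the update rule

for _ in range(20):                    # a few passes over the training data
    for x, desired in zip(X, y):
        predicted = step(w[0] * x[0] + w[1] * x[1] + bias)
        d = predicted - desired        # d as defined in the text
        w = [w[0] - learning_rate * d * x[0], w[1] - learning_rate * d * x[1]]
        bias -= learning_rate * d

print([step(w[0] * a + w[1] * b + bias) for a, b in X])   # expected: [0, 0, 0, 1]
```
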
(III) Experiment Exploration:

In the Colab coding section I have implemented the algorithms described above on the Drug_dt dataset. I imported the dataset using pandas, then preprocessed the data and segregated it as follows.

Fig 7. Graph of count vs. sex.

The dataset has the features sex, BP, cholesterol and Na-to-K concentration, and the target is the drug to be given, which is categorical data. I split the data into a train/test format with a test size of 0.30, trained the various models, and then tested them. The classifiers were imported as follows (a combined pipeline sketch is given after the list):

* from sklearn.tree import DecisionTreeClassifier
* from sklearn.ensemble import AdaBoostClassifier
* from sklearn.ensemble import RandomForestClassifier
* from sklearn.svm import LinearSVC
* from sklearn.neural_network import MLPClassifier

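The following is a condensed sketch of how such an experiment could look end to end. The file name, column names and preprocessing choices are assumptions made for illustration and are not taken from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier

# Load the drug dataset (hypothetical file name and column names).
df = pd.read_csv("Drug_dt.csv")

# Encode every text column (sex, BP, cholesterol, drug) as integer labels.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns="Drug")
y = df["Drug"]

# 70/30 train/test split, as described in the paper.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

models = {
    "Decision Trees": DecisionTreeClassifier(),
    "Boosted Trees": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVM": LinearSVC(max_iter=10000),
    "Neural Networks": MLPClassifier(max_iter=1000),
}

# Train each model and report its accuracy on the held-out 30%.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```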

(IV) Results:

The different algorithm models give different results in terms of accuracy:

Serial no.   Algorithm         Accuracy (%)
1            Decision Trees    98
2            Boosted Trees     73
3            Random Forest     95
4            SVM               61
5            Neural Networks   48

Table 1. Accuracy of the five models on the test set.

Fig 8. Graph representation of the results (accuracy of each algorithm).
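A small matplotlib sketch that reproduces a bar chart like Fig 8 from the accuracies in Table 1 might look like this (purely illustrative).

```python
import matplotlib.pyplot as plt

algorithms = ["Decision Trees", "Boosted Trees", "Random Forest", "SVM", "Neural Networks"]
accuracy = [98, 73, 95, 61, 48]        # accuracies (%) from Table 1

plt.figure(figsize=(8, 4))
plt.bar(algorithms, accuracy)
plt.ylabel("Accuracy (%)")
plt.ylim(0, 120)
plt.title("Comparison of the five classification models")
plt.tight_layout()
plt.show()
```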

(V) Conclusion:

The conclusion of the above discussion and exploration is that all five compared algorithm models can be used for classification problems, and some can also be used for regression problems. For the dataset taken here, the Decision Trees model performs best, followed by the Random Forest model.

