

International Journal of Computer Applications (0975 – 8887)

Volume 169 – No.8, July 2017

A Literature Review on Supervised Machine Learning Algorithms and Boosting Process

M. Praveena, MCA, MPhil
Assistant Professor, Department of Computer Science, Dr. SNS Rajalakshmi College of Arts and Science, Coimbatore

V. Jaiganesh, PhD
Professor, Department of PG & Research, Dr. NGP Arts and Science College, Coimbatore

ABSTRACT
Data mining is one of the core research areas in computer science. The knowledge discovery process helps data mining extract hidden information from a dataset, and this leaves wide scope for machine learning algorithms. Supervised machine learning algorithms in particular have gained considerable importance in data mining research. Boosting regularly helps supervised machine learning algorithms raise their predictive / classification accuracy. This survey article selects two well-known supervised machine learning algorithms, decision trees and the support vector machine, and presents the recent research carried out on each. Recent improvements to AdaBoost algorithms (the boosting process) are also presented. The survey finds that combining a supervised machine learning algorithm with a boosting process increases prediction efficiency, and that there is wide scope for further research in this area.

Keywords
Data mining, machine learning, research, AdaBoost, support vector machine, decision trees.

1. INTRODUCTION
Machine learning (ML) is a kind of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. ML focuses on the development of computer programs that can change when exposed to new data. ML algorithms are broadly classified into three divisions, namely supervised learning, unsupervised learning and reinforcement learning, as shown in Fig.1. The process of machine learning is comparable to that of data mining: both search through data to look for patterns. However, instead of extracting data for human comprehension, as is the case in data mining applications, machine learning uses the data to identify patterns and adjust program actions accordingly.

Supervised machine learning is the task of inferring a function from labelled training data, which consists of a set of training examples. In supervised learning, every example is a pair containing an input object (usually a vector) and a desired output value (also referred to as the supervisory signal).

A supervised learning algorithm first analyses the training data and constructs an inferred function, which is then used to map new examples. In the optimal setting the algorithm correctly determines the class labels for unseen instances, which requires it to generalize from the training data to unseen situations in a "reasonable" manner. Supervised methods are used in various application areas, including marketing, finance, manufacturing, testing, stock market prediction, and so on.

1.1 Steps performed in the Supervised Machine Learning Algorithms
Step – 1: Determine the type of training examples. The user needs to decide the type(s) of data that will be used as a training set.

Step – 2: Gather a training set. The training set needs to be representative of the real-world use of the function. To that end, a set of input objects is collected together with the corresponding outputs.

Step – 3: Determine the input feature representation of the learned function / learned attribute. The accuracy of the learned function depends strongly on how the input object is represented.

Step – 4: Determine the structure of the learned function and the corresponding machine learning algorithm.

Step – 5: Complete the design and execute the learning algorithm on the collected training set.

Step – 6: Evaluate the accuracy / correctness of the learned function. Parameter adjustment and learning may then be performed, and the resulting function needs to be measured on a test data set that is separate from the training set.
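The six steps above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than a method taken from the surveyed papers: the data set is synthetic, the learner is a one-level decision tree (a decision stump) chosen only for brevity, and the names make_dataset, train_stump and accuracy are hypothetical.

```python
import random

def make_dataset(n, seed=0):
    """Steps 1-3: gather labelled examples as (feature vector, label) pairs.

    Synthetic, assumed data: two numeric features, label decided by the first.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)]
        y = 1 if x[0] > 5.0 else -1
        data.append((x, y))
    return data

def train_stump(train):
    """Steps 4-5: fit a decision stump by exhaustive search over
    (feature, threshold, polarity), keeping the fewest training errors."""
    best = None  # (errors, feature index, threshold, polarity)
    n_features = len(train[0][0])
    for j in range(n_features):
        for x, _ in train:
            thr = x[j]
            for polarity in (1, -1):
                errors = sum(
                    1 for xi, yi in train
                    if (polarity if xi[j] > thr else -polarity) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, polarity)
    _, j, thr, polarity = best
    return lambda x: polarity if x[j] > thr else -polarity

def accuracy(model, examples):
    """Step 6: fraction of examples whose predicted label is correct."""
    return sum(1 for x, y in examples if model(x) == y) / len(examples)

data = make_dataset(200)
train, test = data[:150], data[150:]   # the test set is kept separate
model = train_stump(train)
test_accuracy = accuracy(model, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Measuring accuracy only on the 50 held-out examples follows Step 6: the figure reported is for data the learner never saw during training.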

Fig.1. Machine Learning and its Types

1.2 Factors to be considered

1.2.1 Data Heterogeneity:
When the feature vectors contain features of several kinds (discrete, discrete ordered, counts, continuous values), certain algorithms are simpler to apply than others. Many algorithms, namely Support Vector Machines, linear regression, logistic regression, neural networks, and nearest neighbour methods, require that the input features be numerical and scaled to similar ranges.
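The scaling requirement noted under Data Heterogeneity can be sketched as follows. This is a minimal example of min-max scaling, one common way (assumed here, not prescribed by the surveyed works) of bringing numeric features to similar ranges; the function name min_max_scale and the sample records are hypothetical.

```python
def min_max_scale(rows):
    """Rescale every numeric column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# Hypothetical records: (age in years, income in currency units).
records = [[25, 20000], [35, 60000], [45, 100000]]
scaled_records = min_max_scale(records)
print(scaled_records)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Without such scaling, a distance-based method would be dominated by the income column simply because its values are numerically larger. In practice the minimum and maximum would be computed on the training set only and then reused for the test set.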

1.2.2 Data Redundancy:
When the input features contain redundant information, some learning algorithms may perform poorly because of numerical instability. Such issues may be solved by applying pre-processing techniques.

1.2.3 Presence of interactions and non-linearities:
When each feature makes an independent contribution to the output, algorithms based on linear functions and distance functions usually perform well. On the other hand, when there are complex interactions among features, certain algorithms perform much better, as they are specifically designed to discover these interactions.

2. RELATED WORKS

2.1 Recent Works on Decision Trees
Lertworaprachaya et al., 2014 [1] proposed a new model for constructing decision trees using interval-valued fuzzy membership values. Most existing fuzzy decision trees do not consider the uncertainty associated with their membership values; however, precise fuzzy membership values are not always available. The authors therefore represented fuzzy membership values as intervals to model uncertainty and employed the look-ahead based fuzzy decision tree induction method to construct decision trees. They also investigated the significance of different neighbourhood values and defined a new parameter, insensitive to specific data sets, using fuzzy sets. Some examples are provided to establish the effectiveness of their approach.

Bahnsen et al., 2015 [2] proposed an example-dependent cost-sensitive decision tree algorithm that incorporates the different example-dependent costs into a new cost-based impurity measure and new cost-based pruning criteria. The authors evaluated their method on three databases from three real-world applications, namely credit card fraud detection, credit scoring and direct marketing. Their results showed that the proposed algorithm was the best-performing method on all three databases. Additionally, compared with a standard decision tree, their method builds significantly smaller trees in only a fifth of the time while achieving superior performance as measured by cost savings. This leads to a method that not only gives more business-oriented results but also creates simpler models that are easier to analyze.

Online decision trees learned from data streams are usually unable to handle concept drift. Blanco et al., 2016 [3] proposed the Incremental Algorithm Driven by Error Margins (IADEM-3), which mainly carries out two actions in response to a concept drift. First, IADEM-3 resets the variables affected by the change and keeps the structure of the tree unbroken, which allows for changes in which consecutive target functions are very similar. After that, IADEM-3 creates alternative models that replace parts of the main tree when they significantly improve the accuracy of the model, thereby rebuilding the main tree if needed. An online change detector and a non-parametric statistical test based on Hoeffding's bounds are used to guarantee that significance. A new pruning method is also incorporated in IADEM-3, making sure that all split tests previously installed in decision nodes are useful. Their learning model is also viewed as an ensemble of classifiers, and the predictions of the main and alternative models are combined to classify unlabeled examples. IADEM-3 is empirically compared with various well-known decision tree induction algorithms for concept drift detection. The authors reported that their new algorithm generally reaches higher levels of accuracy with smaller decision tree models, while keeping the processing time bounded irrespective of the number of instances processed.

Crockett et al., 2017 [4] proposed predicting learning styles in conversational intelligent tutoring systems using fuzzy decision trees. In the existing approach, prediction of learning style is carried out by capturing independent behaviour variables during the tutoring conversation, with the highest-valued variable deciding the style. A weakness of this approach is that it does not take into consideration the interactions between behaviour variables and, due to the uncertainty inherently present in modelling learning styles, small differences in behaviour can lead to incorrect predictions. Consequently, the learner is presented with tutoring material not suited to their learning style. To address these challenges, the authors proposed a new method that uses fuzzy decision trees to build a series of fuzzy predictive models combining these variables for all dimensions of the Felder Silverman Learning Styles model. Results on live data showed that the fuzzy models increased the predictive accuracy across four learning style dimensions and facilitated the discovery of some interesting relationships amongst behaviour variables.

2.2 Recent Works on Support Vector Machine (SVM)
Motivated by the KNN trick presented in the weighted twin support vector machines with local information (WLTSVM), Pan et al., 2015 [5] proposed a novel K-nearest neighbour based structural twin support vector machine (KNN-STSVM). By applying the intra-class KNN method, different weights are given to the samples in one class to enhance the structural information. For the other class, superfluous constraints are deleted by the inter-class KNN method to speed up the training process. For large-scale problems, a fast clipping algorithm is further introduced for acceleration. Comprehensive experimental results on twenty-two datasets demonstrate the efficiency of the proposed KNN-STSVM.

Existing structural classifiers do not balance the intra-class and inter-class relationships of structural information. Combining structural information with the nonparallel support vector machine (NPSVM), D. Chen et al., 2016 [6] designed a new structural nonparallel support vector machine (called SNPSVM). Each model of SNPSVM considers not only the compactness within both classes, through the structural information, but also the separability between classes, so it can fully exploit prior knowledge to directly improve the algorithm's generalization capacity. Moreover, the authors applied the improved alternating direction method of multipliers (ADMM) to SNPSVM. Both the model itself and the solving algorithm ensure that it can deal with large-scale classification problems with a huge number of instances as well as features. Experimental results show that SNPSVM is superior to the other current algorithms based on structural information of data in both computation time and classification accuracy.

Peng et al., 2016 [7] formulated a linear kernel support vector machine (SVM) as a regularized least-squares (RLS) problem. By defining a set of indicator variables of the errors, the solution to the RLS problem is represented as an equation that relates the error vector to the indicator variables. Through partitioning the training set, the SVM weights and bias are expressed analytically using the support vectors. The authors also showed how their approach naturally extends to SVMs with nonlinear kernels whilst avoiding the need to make use of Lagrange multipliers and duality theory. A fast iterative solution algorithm based on Cholesky decomposition with
permutation of the support vectors is recommended as a solution method. The properties of their SVM formulation were analyzed and compared with standard SVMs using a simple example that can be illustrated graphically. The correctness and behaviour of their proposed work has been demonstrated using a set of public benchmarking problems for both linear and nonlinear SVMs.

Utkin and Zhuk, 2017 [8] proposed a modification of the well-known one-class classification support vector machine (OCC SVM) for dealing with interval-valued or set-valued training data. Their key idea is to represent every interval of training data by a finite set of precise data with imprecise weights. The representation is based on replacing the interval-valued expected risk produced by interval-valued data with the interval-valued expected risk produced by imprecise weights or sets of weights. In other words, the interval uncertainty is replaced with imprecise-weight, or probabilistic, uncertainty. The authors showed how constraints for the imprecise weights are incorporated into dual quadratic programming problems, which can be viewed as extensions of the well-known OCC SVM models. With the help of numerical examples using synthetic and real interval-valued training data, the authors illustrated their proposed approach and investigated its properties.

2.3 Recent Works on AdaBoost
Universum data, which does not belong to any class of the training data, has been applied to train better classifiers. Xu et al., 2014 [9] presented a novel boosting algorithm called UAdaBoost that can improve the classification performance of AdaBoost with Universum data. UAdaBoost chooses a function by minimizing the loss for labelled data and Universum data. The cost function is minimized by a greedy, stagewise, functional gradient procedure. Each training stage of UAdaBoost is fast and efficient. The standard AdaBoost weights labelled samples over training iterations, while UAdaBoost gives an explicit weighting scheme for Universum samples as well. The authors also described the practical conditions for the effectiveness of Universum learning. These conditions are based on the analysis of the distribution of ensemble predictions over training samples. Based on their experimental results, the authors state that their method can obtain superior performance over the standard AdaBoost when proper Universum data is selected.

Sun et al., 2016 [10] cited noise-detection based AdaBoost (ND_AdaBoost) as a representative approach for improving the robustness of AdaBoost in the two-class classification scenario. To carry this robustness over to multi-class problems, the authors proposed a robust multi-class AdaBoost algorithm (Rob_MulAda) whose key ingredients are a noise-detection based multi-class loss function and a new weight updating scheme. The authors claim that their experimental study indicates that the newly proposed weight updating scheme is indeed more robust to mislabelled noises than that of ND_AdaBoost in both two-class and multi-class scenarios. Through comparison experiments, the authors also verified the effectiveness of Rob_MulAda and provided a suggestion for choosing the most appropriate noise-alleviating approach according to the concrete noise level in practical applications.

Baig et al., 2017 [11] presented a boosting-based method of learning a feed-forward artificial neural network (ANN) with a single layer of hidden neurons and a single output neuron. First, an algorithm called Boostron is described, which learns a single-layer perceptron using AdaBoost and decision stumps. It is then extended to learn the weights of a neural network with a single hidden layer of linear neurons. Finally, the authors introduced a novel method to incorporate non-linear activation functions in artificial neural network learning. Their proposed method uses a series representation to approximate the non-linearity of activation functions, learns the coefficients of the nonlinear terms by AdaBoost, and adapts the network parameters by a layer-wise iterative traversal of neurons and an appropriate reduction of the problem. The authors provide a comparison of various neural network models learned using the proposed methods and those learned using least mean squared learning (LMS) and resilient back-propagation (RPROP).

Miller and Soh, 2015 [12] proposed a novel cluster-based boosting (CBB) approach to address limitations in boosting on supervised learning (SL) algorithms. Their CBB approach partitions the training data into clusters containing highly similar member data and integrates these clusters directly into the boosting process. CBB attempts to address two specific limitations of current boosting, both resulting from boosting's focus on incorrect training data. The first is filtering for subsequent functions when the training data contains troublesome areas and/or label noise; the second is overfitting in subsequent functions that are forced to learn on all the incorrect instances. The authors demonstrated the effectiveness of CBB through extensive empirical results on 20 UCI benchmark datasets and reported that CBB achieves superior predictive accuracy to approaches that use selective boosting without clusters.

3. FINDINGS AND CONCLUSIONS
Every learning algorithm tends to suit some problem types better than others, and typically has many different parameters and configurations to be adjusted before achieving optimal performance on a dataset. AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree growing algorithm such that later trees tend to focus on harder-to-classify examples.

Supervised machine learning algorithms such as decision trees and the support vector machine are capable of dealing with big data mining tasks. Even though the efficiency of these algorithms is improving considerably, an adaptive boosting process is still needed to increase the predictive accuracy further. The following are the findings from this survey.

(i) Fuzzy logic, a soft computing technique, is incorporated into the decision tree machine learning algorithm in order to rule out the ambiguity in the datasets.
(ii) Example-dependent, cost-sensitive factors help decision trees achieve more independence in the machine learning process.
(iii) Error-margin based methods reduce the false negative values when decision trees are used.
(iv) Accounting for interactions between behaviour variables tends to improve the performance of decision trees.
(v) Weight-based structural information helps the support vector machine train quickly.
(vi) Modelling both inter-class and intra-class relationships increases the effectiveness of the support vector machine.
(vii) Decomposition of the attributes also significantly improves the effectiveness of the classifier.
(viii) Noise detection helps increase the accuracy of the machine learning algorithm.
(ix) Cluster-based boosting still has further scope for research by making use of optimization techniques.

From the above findings it is interesting to note that clustering or classification accuracy depends directly on the use of a boosting process. In addition, the overall computational complexity would be reduced. This survey article chose two machine learning algorithms and one boosting technique and presented the recent research works carried out on them from 2014 to 2017.

4. FUTURE SCOPE OF RESEARCH
Dealing with several datasets and performing data mining is a tedious task. The following are future directions for further research work.
• Optimization techniques such as genetic algorithms, particle swarm optimization, ant colony optimization and artificial bee colony algorithms can be used to improve the performance of the AdaBoost algorithm.
• Other machine learning algorithms such as the relevance vector machine, extreme learning machine and neural networks can be used for classifying / clustering the data.

5. REFERENCES
[1] Y. Lertworaprachaya, Y. Yang, R. John, “Interval-valued fuzzy decision trees with optimal neighbourhood perimeter,” Applied Soft Computing, vol. 24, pp. 851-866, 2014.
[2] A. C. Bahnsen, D. Aouada, B. Ottersten, “Example-dependent cost-sensitive decision trees,” Expert Systems with Applications, vol. 42, pp. 6609-6619, 2015.
[3] F. Blanco, J. C. Ávila, G. R. Jiménez, A. Carvalho, A. O. Díaz, R. M. Bueno, “Online adaptive decision trees based on concentration inequalities,” Knowledge-Based Systems, vol. 104, pp. 179-194, 2016.
[4] Crockett, A. Latham, N. Whitton, “On predicting learning styles in conversational intelligent tutoring systems using fuzzy decision trees,” International Journal of Human-Computer Studies, vol. 97, pp. 98-115, 2017.
[5] X. Pan, Y. Luo, Y. Xu, “K-nearest neighbour based structural twin support vector machine,” Knowledge-Based Systems, vol. 88, pp. 34-44, 2015.
[6] D. Chen, Y. Tian, X. Liu, “Structural nonparallel support vector machine for pattern recognition,” Pattern Recognition, vol. 60, pp. 296-305, 2016.
[7] X. Peng, K. Rafferty, S. Ferguson, “Building support vector machines in the context of regularized least squares,” Neurocomputing, vol. 211, pp. 129-142, 2016.
[8] V. Utkin, Y. A. Zhuk, “An one-class classification support vector machine model by interval-valued training data,” Knowledge-Based Systems, vol. 120, pp. 43-56, 2017.
[9] J. Xu, Q. Wu, J. Zhang, Z. Tang, “Exploiting Universum data in AdaBoost using gradient descent,” Image and Vision Computing, vol. 32, pp. 550-557, 2014.
[10] B. Sun, S. Chen, J. Wang, H. Chen, “A robust multi-class AdaBoost algorithm for mislabelled noisy data,” Knowledge-Based Systems, vol. 102, pp. 87-102, 2016.
[11] M. Baig, M. M. Awais, E. M. El-Alfy, “AdaBoost-based artificial neural network learning,” Neurocomputing, vol. 16, pp. 22-41, 2017.
[12] L. D. Miller and L. K. Soh, “Cluster-Based Boosting,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, pp. 1491-1504, 2015.

IJCATM : www.ijcaonline.org
