An Overview of The Supervised Machine Learning Methods: December 2017
1 author:
Vladimir Nasteski
University "St. Kliment Ohridski" - Bitola
All content following this page was uploaded by Vladimir Nasteski on 11 December 2018.
Vladimir Nasteski
Faculty of Information and Communication Technologies, Partizanska bb,
7000 Bitola, Macedonia
Abstract
In the last decade, a large number of supervised learning methods have been
introduced in the field of machine learning, and supervised learning has become an
area of intense research activity. Many supervised learning techniques have found
application in processing and analyzing a variety of data. One of the main
characteristics of supervised learning is its use of annotated (labeled) training
data; in classification, these labels are class labels. A variety of algorithms are
used in supervised learning methods. This paper summarizes the fundamental aspects
of a couple of supervised methods. The main goal and contribution of this review
paper is to present an overview of machine learning and to describe several machine
learning techniques.
Introduction
Machine learning algorithms [1] are designed to represent the human approach to
learning a task. These algorithms can also provide insight into the relative
difficulty of learning in different environments.
With the development of new computing technologies in the area of Big Data, machine
learning today is not like machine learning in the past. Many machine learning
algorithms have been developed [2], updated and improved, and a key recent
development is the ability to automatically apply complex mathematical calculations
to big data and obtain the results much faster.
Adaptive programming is very popular. It is used in machine learning, where
applications are able to recognize patterns, learn from experience, abstract new
information from data, or optimize the accuracy and efficiency of their processing
and output. Machine learning techniques [7] are also used to work with
multidimensional data, which are present in a diverse range of application areas.
Based on the desired outcome of the algorithm, machine learning algorithms are
organized into the following groups:
Supervised learning - the various algorithms generate a function that maps
inputs to desired outputs. One standard formulation of the supervised learning
task is the classification problem: the learner is required to learn (to
approximate the behavior of) a function which maps a vector into one of
several classes by looking at several input-output examples of the function.
Unsupervised learning - models a set of inputs: labeled examples are not
available.
Semi-supervised learning - combines both labeled and unlabeled examples to
generate an appropriate function or classifier.
Reinforcement learning - the algorithm learns a policy of how to act given an
observation of the world. Every action has some impact on the environment,
and the environment provides feedback that guides the learning algorithm.
Transduction - similar to supervised learning, but does not explicitly construct
a function; instead, it tries to predict new outputs based on training inputs,
training outputs, and new inputs.
Learning to learn - where the algorithm learns its own inductive bias based
on previous experience.
Beyond this finer classification, machine learning algorithms are broadly divided
into two general groups: supervised and unsupervised learning.
In supervised algorithms, the classes are predetermined. These classes form a
finite set defined by a human, which in practice means that a certain segment of
the data is labeled with these classes. The task of the machine learning algorithm
is to find patterns and construct mathematical models. These models are then
evaluated on their predictive capacity in relation to measures of variance in the
data itself.
It is also useful to distinguish between two main types of supervised models:
classification models (classifiers) and regression models. Regression models map the
input space into a real-valued domain, while classifiers map the input space into
predefined classes. There are many alternatives for representing classifiers, for
instance support vector machines, decision trees, probabilistic summaries and
algebraic functions. Along with regression and probability estimation, classification
is one of the most studied models, possibly the one with the greatest practical
relevance. The potential benefits of progress in classification are immense, since
the technique has great impact on other areas, both within Data Mining and in its
applications.
Unsupervised learning algorithms, on the other hand, are not provided with
classifications. The main task of unsupervised learning is to develop classification
labels automatically. These algorithms search for similarities between pieces of data
in order to determine whether they can be categorized into groups. These groups are
called clusters, and they give their name to a whole family of clustering machine
learning techniques. In this unsupervised classification (cluster analysis), the
machine does not know in advance how the data should be grouped into clusters.
Cluster analysis therefore has a greater potential to surprise us, which makes it a
very promising tool for exploring the relationships between many papers.
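The paper describes clustering only in general terms and does not commit to a particular algorithm. As an illustration, the following minimal sketch shows k-means, one widely used clustering method chosen here as an assumption. Each point is assigned to its nearest cluster centre, and each centre is then moved to the mean of its points, until the centres stop changing. The function name `kmeans`, the use of the first k points as starting centres, and the toy data are all hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal k-means sketch: returns (labels, centroids).

    Assumption: the first k points serve as initial centroids, which keeps the
    example deterministic; practical implementations use smarter seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabeled data: two obvious groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied, so the grouping emerges from distances alone. Real implementations usually add better initialization and several restarts, because k-means can converge to a poor local solution.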
This paper presents different types of supervised machine learning algorithms and
how they can be used most efficiently to make decisions and to complete tasks in a
more optimized way. It shows how different algorithms give the machine different
learning experiences and let it take in information from the environment, after
which the machine makes a decision and performs specialized tasks.
The paper is organized as follows: Section II considers the main related work used
in preparing this paper. Section III provides an overview of the supervised machine
learning process. Section IV discusses the various algorithms used to perform the
learning process.
Related work
Many research papers and articles give a good overview of the methods and
algorithms used in the area of machine learning.
Rich Caruana and Alexandru Niculescu-Mizil [2] present a large-scale empirical
comparison between ten supervised learning methods: SVMs, neural nets, logistic
regression, naive Bayes, memory-based learning, random forests, decision trees,
bagged trees, boosted trees, and boosted stumps.
Leonidas Akritidis and Panayiotis Bozanis [5] address the interesting problem of
documents that remain unclassified by introducing a machine learning algorithm that
combines several parameters and metadata of a research article.
Aurangzeb Khan et al. [6] highlight the important techniques and methodologies
employed in text document classification. The paper provides a review of the theory
and methods of document classification and text mining.
Pádraig Cunningham, Matthieu Cord and Sarah Jane Delany, in their chapter
“Supervised learning”, provide an overview of support vector machines and nearest
neighbour classifiers, probably the two most popular supervised learning techniques
employed in multimedia research.
S. B. Kotsiantis [16] describes various supervised machine learning classification
techniques. He also points out that the goal of supervised learning is to build a
concise model of the distribution of class labels in terms of predictor features.
Amanpreet Singh et al. [17] discuss the efficacy of supervised machine learning
algorithms in terms of accuracy, speed of learning, complexity and risk of
overfitting. The main objective of their paper is to provide a general comparison
of state-of-the-art machine learning algorithms.
The learning process in a simple machine learning model is divided into two steps:
training and testing. In the training process, samples from the training data are
taken as input, and the learning algorithm (learner) learns their features and builds
the learning model [4]. In the testing process, the learning model uses the execution
engine to make predictions for the test or production data. Tagged data is the output
of the learning model, which gives the final prediction or classified data.
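The two phases described above can be sketched as follows. The sketch assumes a deliberately simple learner, a nearest-centroid classifier, which the paper does not name. Training builds a model from labeled samples, and testing uses that model to tag held-out samples. The helper names `split_data`, `train` and `predict`, the 25% test fraction, and the data are hypothetical.

```python
import math
import random


def split_data(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the labeled data and hold out a fraction of it for testing."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    n_test = max(1, int(len(order) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return ([samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx])


def train(samples, labels):
    """Training step: the learner builds a model, here one mean vector per class."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(dim) / len(xs) for dim in zip(*xs)] for y, xs in grouped.items()}


def predict(model, x):
    """Testing step: tag a sample with the class whose mean vector is nearest."""
    return min(model, key=lambda y: math.dist(model[y], x))


# Hypothetical labeled data: two well-separated classes of 2-D feature vectors.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0),
     (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.0)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

X_train, y_train, X_test, y_test = split_data(X, y)
model = train(X_train, y_train)
tagged = [predict(model, x) for x in X_test]  # the "tagged data" output
accuracy = sum(p == t for p, t in zip(tagged, y_test)) / len(y_test)
print(accuracy)  # 1.0
```

The learner never sees the held-out samples during training. Its accuracy on them therefore estimates how the model would behave on new production data.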
available, but if some of the input values are missing, it is not possible to infer
anything about the outputs.
Supervised learning is the most common technique for training neural networks and
decision trees. Both of these depend on the information given by the predetermined
classification.
This kind of learning is also used in applications where historical data predicts
likely future events. There are many practical examples, for instance an application
that predicts the species of an iris from a set of measurements of its flower.
As previously mentioned, supervised learning tasks are divided into two categories:
classification and regression. In classification the label is discrete, while in
regression the label is continuous.
As shown in Figure 2, the algorithm works with the observed data 𝑋, i.e. the training
data, which in most cases is structured data given to the model during the training
process. In this process, the supervised learning algorithm builds the predictive
model. After training, the fitted model tries to predict the most likely labels for a
new set of samples 𝑋 in the testing set. Depending on the nature of the target 𝑦,
supervised learning can be classified as follows:
If 𝑦 has values in a fixed set of categorical outcomes (integers), the task of
predicting 𝑦 is called classification.
If 𝑦 has floating point values, the task of predicting 𝑦 is called regression.
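The rule above can be expressed as a small helper. This is a rough heuristic sketch, not a standard API, and the name `task_type` is hypothetical. In practice the decision depends on what the target means; for example, integer counts are often modeled with regression.

```python
def task_type(targets):
    """Rough rule of thumb for naming a supervised task from its targets y.

    Assumption: labels stored as strings, booleans or integers are treated as
    categorical outcomes; any floating-point value makes the task regression.
    """
    if all(isinstance(t, (str, int)) for t in targets):  # bool is a subclass of int
        return "classification"
    if all(isinstance(t, (int, float)) for t in targets):
        return "regression"
    raise ValueError("targets mix categorical and numeric values")


print(task_type(["setosa", "versicolor", "virginica"]))  # classification
print(task_type([0, 1, 1, 0]))                           # classification
print(task_type([2.5, 3.1, 4.0]))                        # regression
```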
Decision trees
A decision tree is a rooted tree: the root node has no incoming edges, and all of
the other nodes have exactly one incoming edge. A node with outgoing edges is called
an internal node or a test node; the remaining nodes are called leaves. In a decision
tree, each test node splits the instance space into two or more sub-spaces according
to a certain discrete function of the input values. In the simplest case, each test
considers a single attribute, so that the instance space is partitioned according to
the attribute’s value. For numeric attributes, the condition refers to a range.
Each leaf is assigned to one class that represents the most appropriate target
value. Alternatively, the leaf may hold a probability vector that indicates the
probability of the target attribute having a certain value. Instances are classified
by navigating them from the root of the tree down to a leaf, according to the
outcomes of the tests along the path. Figure 3 describes a simple use of a decision
tree. Each node is labeled with the attribute it tests, and its branches are labeled
with the corresponding attribute values.
Given this classifier, the analyst can predict the response of a potential customer
and understand the behavioral characteristics of the entire population of potential
customers [9].
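The navigation procedure described in this section can be sketched with a tree stored as nested dictionaries. Figure 3 is not reproduced here, so the attributes (`age`, `income`), the threshold, the class names and the leaf probabilities are all hypothetical. The sketch illustrates three things: a numeric test on a range, a categorical test with one branch per value, and leaves that hold both a class and a probability vector.

```python
# Hypothetical customer-response tree; attribute names, threshold, classes and
# probabilities are invented for illustration.
TREE = {
    "attribute": "age",
    "threshold": 30,  # numeric attribute: the test condition refers to a range
    "below": {"leaf": "responds", "probs": {"responds": 0.8, "ignores": 0.2}},
    "at_or_above": {
        "attribute": "income",  # categorical attribute: one branch per value
        "branches": {
            "high": {"leaf": "responds", "probs": {"responds": 0.6, "ignores": 0.4}},
            "low": {"leaf": "ignores", "probs": {"responds": 0.1, "ignores": 0.9}},
        },
    },
}


def classify(node, instance):
    """Navigate an instance from the root down to a leaf, following the outcome
    of each test node; return the leaf's class and its probability vector."""
    while "leaf" not in node:
        value = instance[node["attribute"]]
        if "threshold" in node:
            node = node["below"] if value < node["threshold"] else node["at_or_above"]
        else:
            node = node["branches"][value]
    return node["leaf"], node["probs"]


print(classify(TREE, {"age": 25, "income": "low"}))  # ('responds', {'responds': 0.8, 'ignores': 0.2})
print(classify(TREE, {"age": 45, "income": "low"}))  # ('ignores', {'responds': 0.1, 'ignores': 0.9})
```

This sketch does not cover learning such a tree from data, which means choosing the attribute to test at each node. That is a separate step.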
Linear regression
The goal of linear regression¹, as part of the family of regression algorithms, is
to find relationships and dependencies between variables. It models the relationship
between a continuous scalar dependent variable y (also called the label or target in
machine learning) and one or more explanatory variables.
¹ http://www.ess.uci.edu/~yu/class/ess210b/lecture.3.regression.all.pdf
As shown in Figure 4, the model (red line) is computed from the training data (blue
points), where each point has a known label (𝑦 axis), so that it fits the points as
accurately as possible by minimizing the value of a chosen loss function. The model
can then be used to predict unknown labels (we only know the 𝑥 value and want to
predict the 𝑦 value).
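This is a minimal sketch of fitting the red line. It assumes that the chosen loss is squared error (ordinary least squares) and that there is a single input feature; in that case the slope and intercept have a closed-form solution. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: choose slope and intercept that
    minimize the squared-error loss sum((y - (slope * x + intercept)) ** 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Use the fitted line to predict the label y for a new x."""
    return slope * x + intercept


# Hypothetical training points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)                # 2.0 1.0
print(predict(slope, intercept, 5.0))  # 11.0
```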
Naive Bayes
Logistic Regression
Like naive Bayes, logistic regression [13] works by extracting a set of weighted
features from the input, taking logs and combining them linearly, which means that
each feature is multiplied by a weight and the results are added up.
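The most important difference between naive Bayes and logistic regression is that
logistic regression is a discriminative classifier, while naive Bayes is a generative
classifier: logistic regression models the conditional probability 𝑃(𝑦|𝑥) directly,
whereas naive Bayes models how the data are generated through 𝑃(𝑥|𝑦) and 𝑃(𝑦).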
Logistic regression [14] is a type of regression that predicts the probability of
occurrence of an event by fitting data to a logistic function. Like many forms of
regression analysis, logistic regression makes use of several predictor variables
that may be either numerical or categorical.
The logistic regression hypothesis is defined as:
$$h_\theta(x) = g(\theta^{T} x)$$
where the function 𝑔 is the sigmoid function, defined as:
$$g(z) = \frac{1}{1 + e^{-z}}$$
The sigmoid function has the special property that its output always lies in the
range (0, 1), as visualized in Figure 5, so the hypothesis can be interpreted as the
estimated probability that 𝑦 = 1.
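The hypothesis and the sigmoid function can be written directly in code. This minimal sketch assumes that each feature vector starts with a constant 1, so that `theta[0]` acts as the intercept. The function names, the 0.5 threshold in `predict_class`, and the example weights are hypothetical.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written in two branches so that math.exp never
    overflows; in floating point, extreme z round to exactly 0.0 or 1.0."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): each feature is multiplied by its weight,
    the products are summed, and the sum is squashed by the sigmoid.
    Assumption: x[0] == 1 so that theta[0] acts as the intercept."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def predict_class(theta, x, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    return 1 if hypothesis(theta, x) >= threshold else 0


theta = [-3.0, 1.0]                  # hypothetical weights: intercept and one feature
print(hypothesis(theta, [1, 3]))     # 0.5, the point lies on the decision boundary
print(predict_class(theta, [1, 5]))  # 1
```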
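The cost function of logistic regression, averaged over the 𝑚 training examples, is:
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$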
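The cost function can be evaluated with a few lines of plain Python. In this sketch the function names and data are hypothetical. The predicted probability is clipped slightly away from 0 and 1; this is an added assumption that keeps the logarithm defined. With all weights at zero, every prediction is 0.5 and the cost equals log 2.

```python
import math


def sigmoid(z):
    # Two-branch form of 1 / (1 + e^(-z)) so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost(theta, X, y, eps=1e-15):
    """J(theta) = (1/m) * sum_i [-y_i*log(h_i) - (1 - y_i)*log(1 - h_i)],
    where h_i = g(theta^T x_i). Each row of X starts with 1 (intercept term)."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        h = min(max(h, eps), 1.0 - eps)  # assumption: keep log() away from log(0)
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1.0 - h)
    return total / m


# Hypothetical training set: one feature plus the constant 1, with 0/1 labels.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [0, 0, 1, 1]
print(cost([0.0, 0.0], X, y))   # 0.693..., i.e. log(2): every h is 0.5
print(cost([-5.0, 2.0], X, y))  # about 0.18: these weights fit the labels better
```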
To find the minimum of this cost function, we use a built-in function called
fmin_bfgs² (from SciPy's scipy.optimize module), which finds the best parameters 𝜃
for the logistic regression cost function given a fixed dataset (of 𝑥 and 𝑦 values).
Its inputs are the initial values of the parameters to be optimized and functions
that, given the training set and a particular 𝜃, compute the logistic regression
cost and its gradient with respect to 𝜃 for the dataset of 𝑥 and 𝑦 values. The final
𝜃 value is then used to plot the decision boundary of the training data.
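The optimization step can be sketched with SciPy's `scipy.optimize.fmin_bfgs`. It takes the cost function and an initial 𝜃, receives the gradient through `fprime`, and receives the fixed dataset through `args`. The toy data, the helper names and the small clipping constant are assumptions, and this is not the code from the referenced gist. With one feature plus an intercept, the decision boundary is the point where theta_0 + theta_1 * x = 0.

```python
import numpy as np
from scipy.optimize import fmin_bfgs
from scipy.special import expit  # numerically stable sigmoid g(z) = 1 / (1 + e^(-z))


def cost(theta, X, y, eps=1e-12):
    """Logistic regression cost J(theta), averaged over the m examples."""
    h = np.clip(expit(X @ theta), eps, 1.0 - eps)  # assumption: avoid log(0)
    return float(np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h)))


def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    return X.T @ (expit(X @ theta) - y) / len(y)


# Hypothetical training set: one feature x plus a column of ones for theta_0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

initial_theta = np.zeros(X.shape[1])
theta = fmin_bfgs(cost, initial_theta, fprime=gradient, args=(X, y), disp=False)

# Decision boundary for one feature: theta_0 + theta_1 * x = 0.
boundary = -theta[0] / theta[1]
print(theta, boundary)  # boundary is approximately 3.5 for this symmetric toy data
```

SciPy also exposes the same algorithm through `scipy.optimize.minimize(..., method="BFGS")`, which is the more general interface.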
Conclusion
As discussed in this paper, supervised learning can be considered one of the
dominant methodologies in machine learning. Its techniques are often more successful
than unsupervised techniques, because labelled training data provide clearer
criteria for model optimization. Supervised learning comprises a large set of
algorithms that data scientists are continually improving.
This paper provides an overview of a couple of supervised learning algorithms and a
brief explanation of the machine learning process. It also describes the basic
structure of several machine learning algorithms.
This area has attracted the attention of many developers and has made substantial
progress in the last decade. Learning methods have achieved excellent performance
that would have been difficult to obtain in previous decades. Because of this rapid
progress, there is plenty of room for developers to work on and improve supervised
learning methods and their algorithms.
References
² https://gist.github.com/dormantroot/4223554
[22] http://gerardnico.com/wiki/data_mining/simple_regression
[23] http://aimotion.blogspot.mk/2011/11/machine-learning-with-python-logistic.html