Machine Learning
The field of machine learning, which can be briefly defined as enabling computers
to make successful predictions using past experience, has developed impressively
in recent years with the help of the rapid increase in the storage capacity and processing
power of computers. Together with many other disciplines, machine learning methods have
been widely employed in bioinformatics, where the difficulty and cost of biological analyses
have driven the development of sophisticated machine learning approaches. In this
chapter, we first review fundamental concepts of machine learning such as feature
assessment, unsupervised versus supervised learning, and types of classification. We then point
out the main issues in designing machine learning experiments and evaluating their performance.
Finally, we introduce some supervised learning methods.
The area of machine learning deals with the design of programs that can learn rules from data,
adapt to changes, and improve their performance with experience. In addition to being one of the
initial dreams of computer science, machine learning has become crucial as computers are
expected to solve increasingly complex problems and become integrated into our daily lives.
This is a hard problem, since making a machine learn from its computational tasks requires work
at several levels, and complexities and ambiguities arise at each of those levels.
CHAPTER 1
INTRODUCTION TO MACHINE LEARNING
1.1 Background
1.1.1 The Goals of Machine Learning
The goal of ML, in simple words, is to understand the nature of (human and other forms
of) learning, and to build learning capability into computers. To be more specific, there are
three aspects to the goals of ML.
(1) To make computers smarter and more intelligent. The more direct objective in this
aspect is to develop systems (programs) for specific practical learning tasks in
application domains.
(2) To develop computational models of human learning process and perform
computer simulations. The study in this aspect is also called cognitive modeling.
(3) To explore new learning methods and develop general learning algorithms
independent of applications.
These aspects show that machine learning is both an opportunity and a challenge,
in the sense that we can hope to discover ways for machines to learn which are better than
the ways humans learn (the opportunity), and that there is an ample number of difficulties to be
overcome in order to make machines learn (the challenge).
Over the years, research in machine learning has been pursued with varying degrees
of intensity, using different approaches and placing emphasis on different aspects and goals.
Within the relatively short history of this discipline, one may distinguish three major periods,
each centered on a different concept:
1 neural modelling and decision-theoretic techniques
2 symbolic concept-oriented learning
3 knowledge-intensive approaches combining various learning strategies
In the neural modelling period, learning elements were modeled after neurons that fire when
they receive a signal. Owing to the limits of early computer technology, most of the research
under this neural network model was either theoretical or involved the construction of
special-purpose experimental hardware systems. Related research involved the simulation of
evolutionary processes that, through random mutation and "natural" selection, might create a
system capable of some intelligent behaviour. Experience in the above areas spawned the new
discipline of pattern recognition and led to the development of a decision-theoretic approach
to machine learning. In this approach, learning is equated with the acquisition of linear,
polynomial, or related discriminant functions from a given set of training examples. One of
the best-known successful learning systems utilizing such techniques, as well as some
original new ideas involving non-linear transformations, was Samuel's checkers program.
Through repeated training, this program acquired master-level performance. Somewhat
different, but closely related, techniques utilized methods of statistical decision theory
for learning pattern recognition rules.
To acquire new knowledge, a system must already possess a great deal of initial knowledge.
CHAPTER 2
OVERVIEW OF MACHINE LEARNING
Machine learning can be defined as learning a theory automatically from data, through a
process of inference, model fitting, or learning from examples:
1 Automated extraction of useful information from a body of data by building good
probabilistic models.
2 Ideally suited for areas with lots of data in the absence of a general theory.
2.1 The Aim of Machine Learning
The field of machine learning can be organized around three primary research areas:
1 Task-Oriented Studies: The development and analysis of learning systems oriented
toward solving a predetermined set of tasks (also known as the "engineering
approach").
2 Cognitive Simulation: The investigation and computer simulation of human learning
processes (also known as the "cognitive modelling approach").
3 Theoretical Analysis: The theoretical exploration of the space of possible learning
methods and algorithms, independent of application domain.
Although many research efforts strive primarily towards one of these objectives,
progress in one objective often leads to progress in another. For example, in order to
investigate the space of possible learning methods, a reasonable starting point may be to
consider the only known example of robust learning behaviour, namely humans (and perhaps
other biological systems). Similarly, psychological investigations of human learning may be
helped by theoretical analyses that suggest various possible learning models. The need to
acquire a particular form of knowledge in some task-oriented study may itself spawn new
theoretical analysis or pose the question: "How do humans acquire this specific skill (or
knowledge)?" The existence of these mutually supportive objectives reflects the entire field
of artificial intelligence, where expert system research, cognitive simulation, and theoretical
studies provide cross-fertilization of problems and ideas.
2.2 Machine Learning as a Science
The clearest contender for a cognitive invariant in humans is the learning mechanism:
the ability to acquire facts, skills, and more abstract concepts. Therefore, understanding
human learning well enough to reproduce aspects of that learning behaviour in a computer
system is, in itself, a worthy scientific goal. Moreover, the computer can render substantial
assistance to cognitive psychology, in that it may be used to test the consistency and
completeness of learning theories and enforce a commitment to the fine-structure, process-level
detail that precludes meaningless, tautological, or untestable theories (Bishop, 2006).
The study of human learning processes is also of considerable practical significance. Gaining
insights into the principles underlying human learning abilities is likely to lead to more
effective educational techniques. Machine learning research aims at developing
intelligent computer assistants and computer tutoring systems, and many of these goals are
shared within the machine learning field. According to Jaime et al., computer
tutoring systems are starting to incorporate abilities to infer models of student competence from
observed performance. Inferring the scope of a student's knowledge and skills in a particular
area allows much more effective and individualized tutoring of the student.
CHAPTER 3
CLASSIFICATION OF MACHINE LEARNING
There are several areas of machine learning that could be exploited to solve the
problems of email management, and our approach implemented an unsupervised machine
learning method.
In the unsupervised learning problem, we observe only the features and have no
measurements of the outcome; our task is rather to describe how the data are organized or
clustered. Trevor Hastie explained that in unsupervised learning or clustering there is no
explicit teacher, and the system forms clusters or "natural groupings" of the input patterns.
"Natural" is always defined explicitly or implicitly in the clustering system itself, and, given a
particular set of patterns or cost function, different clustering algorithms lead to different
clusters. Often the user will set the hypothesized number of clusters ahead of time,
but how should this be done? According to Richard O. Duda, "How do we avoid
inappropriate representations?"
There are various categories in the field of artificial intelligence. The classifications
of machine learning systems are:
1 Supervised Machine Learning: The task of the supervised learner is to predict the value
of the function for any valid input object after having seen a number of training examples
(i.e. pairs of inputs and target outputs). To achieve this, the learner has to generalize from
the presented data to unseen situations in a "reasonable" way. Supervised learning is a
machine learning technique whereby the algorithm is first presented with training data
consisting of examples that include both the inputs and the desired outputs, thus enabling it
to learn a function. "The learner should then be able to generalize from the presented data
to unseen examples" (Mitchell). Supervised learning also implies we are given a training set
Ξ of (X, Y) pairs by a "teacher". We know (sometimes only approximately) the values of f for
the m samples in the training set Ξ, and we assume that if we can find a hypothesis h that
closely agrees with f for the members of Ξ, then this hypothesis will be a good guess for f,
especially if Ξ is large. Curve fitting is a simple example of supervised learning of a function.
2 Unsupervised Machine Learning: Unsupervised learning is a type of machine learning
where manual labels of inputs are not used. It is distinguished from supervised learning
approaches, which learn how to perform a task, such as classification or regression, using a
set of human-prepared examples. Unsupervised learning means we are given only the Xs and,
at most, some (ultimate) feedback function on our performance. We simply have a training set
of vectors without function values for them. The problem in this case, typically, is to
partition the training set into subsets Ξ1, …, ΞR in some appropriate way.
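The partitioning just described can be sketched with k-means, a standard clustering algorithm. The 2-D toy points, the choice of two clusters, and the starting centers below are illustrative assumptions, not taken from the text.

```python
# A minimal k-means sketch: alternate between assigning each point to its
# nearest center and moving each center to the mean of its assigned points.

def kmeans(points, centers, iterations=20):
    """Partition `points` into len(centers) subsets by nearest center."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return clusters, centers

points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),   # one natural grouping
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]   # another
clusters, centers = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
```

Note how no labels are involved: the "natural groupings" emerge from the cost (squared distance) alone, and a different number of starting centers would yield different clusters, exactly the issue raised above.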
CHAPTER 4
TYPES OF MACHINE LEARNING
Machine learning algorithms are organized into a taxonomy based on the desired
outcome of the algorithm. Common algorithm types include:
1. Supervised learning - where the algorithm generates a function that maps inputs to
desired outputs. One standard formulation of the supervised learning task is the
classification problem: the learner is required to learn (to approximate the behaviour
of) a function which maps a vector into one of several classes by looking at several
input-output examples of the function.
2. Unsupervised learning - which models a set of inputs for which labelled examples
are not available.
3. Semi-supervised learning - which combines both labelled and unlabelled examples
to generate an appropriate function or classifier.
4. Reinforcement learning - where the algorithm learns a policy of how to act given an
observation of the world. Every action has some impact in the environment, and the
environment provides feedback that guides the learning algorithm.
5. Transduction - similar to supervised learning, but does not explicitly construct a
function: instead, tries to predict new outputs based on training inputs, training
outputs, and new inputs.
6. Learning to learn - where the algorithm learns its own inductive bias based on
previous experience.
The area of supervised learning deals mostly with classification. These are the
algorithm types:
a. Linear Classifiers
1. Fisher’s linear discriminant
2. Naïve Bayes Classifier
3. Perceptron
4. Support Vector Machine
b. Quadratic Classifiers
c. Boosting
d. Decision Tree
e. Neural networks
f. Bayesian Networks
4.1.1 Linear Classifiers
In machine learning, the goal of classification is to group items that have similar
feature values into groups. Timothy et al. (Timothy Jason Shepard, 1998) stated that a linear
classifier achieves this by making a classification decision based on the value of a linear
combination of the features. If the input feature vector to the classifier is a real vector x,
then the output score is y = f(w · x), where w is a real vector of weights and f is a function
that converts the dot product of the two vectors into the desired output.
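A minimal sketch of this decision rule follows. The hand-picked weights and the simple threshold used for f are illustrative assumptions, not taken from the text; in practice the weights would be learned from training data.

```python
# Linear classifier: score is the dot product w . x, and f turns the score
# into a class label by comparing it against a threshold.

def score(w, x):
    """Dot product of weight vector w and feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, x, threshold=0.0):
    """f converts the score into one of two class labels."""
    return 1 if score(w, x) > threshold else 0

w = [0.5, -0.3, 0.2]                  # hand-picked weights (illustrative)
print(classify(w, [1.0, 0.0, 1.0]))   # score = 0.7  -> 1
print(classify(w, [0.0, 2.0, 0.0]))   # score = -0.6 -> 0
```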
4.1.1.1 Fisher’s linear discriminant
Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are
methods used in machine learning to find a linear combination of features which
characterizes or separates two or more classes of objects or events. The resulting combination
may be used as a linear classifier or, more commonly, for dimensionality reduction before
later classification.
4.1.1.2 Naïve Bayes Classifier
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes'
theorem with strong (naive) independence assumptions. A more descriptive term for the
underlying probability model would be "independent feature model".
In simple terms, a naive Bayes classifier assumes that the presence or absence of a
particular feature is unrelated to the presence or absence of any other feature, given the class
variable. For example, a fruit may be considered to be an apple if it is red, round, and about
3" in diameter. A naive Bayes classifier considers each of these features to contribute
independently to the probability that this fruit is an apple, regardless of the presence or
absence of the other features.
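The fruit example can be worked through with a tiny categorical naive Bayes classifier. The toy training table and the add-one smoothing below are illustrative assumptions, not taken from the text.

```python
# Naive Bayes: score each class by P(class) times the product of the
# per-feature likelihoods P(feature_i | class), treating features as
# independent given the class.

# Toy training data: (colour, shape, size) -> fruit  (invented for illustration)
data = [
    ("red",    "round", "medium", "apple"),
    ("red",    "round", "medium", "apple"),
    ("green",  "round", "medium", "apple"),
    ("yellow", "long",  "medium", "banana"),
    ("yellow", "long",  "large",  "banana"),
    ("green",  "long",  "medium", "banana"),
]

def naive_bayes(features, data):
    """Return the class with the highest naive Bayes score."""
    classes = {row[-1] for row in data}
    scores = {}
    for c in classes:
        rows = [r for r in data if r[-1] == c]
        p = len(rows) / len(data)                     # prior P(class)
        for i, f in enumerate(features):
            # Likelihood P(feature_i = f | class), with add-one smoothing
            # so an unseen value does not zero out the whole product.
            p *= (sum(1 for r in rows if r[i] == f) + 1) / (len(rows) + 2)
        scores[c] = p
    return max(scores, key=scores.get)

print(naive_bayes(("red", "round", "medium"), data))  # -> apple
```

Each feature contributes its factor independently, which is exactly the "independent feature model" assumption described above.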
4.1.1.3 Perceptron
The perceptron is an algorithm for the supervised classification of an input into one of
several possible outputs. The learning algorithm for the perceptron is an online
algorithm, in that it processes elements of the training set one at a time.
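A minimal sketch of this online behaviour for a binary perceptron follows; the toy dataset (logical OR) and the fixed number of passes are illustrative choices.

```python
# Online perceptron: examples are processed one at a time, and the weights
# are nudged only when a prediction is wrong.

def perceptron_train(examples, epochs=10, lr=1.0):
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:                # online: one example at a time
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:                    # update only on a mistake
                w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
                b += lr * (y - pred)
    return w, b

examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR
w, b = perceptron_train(examples)
```

Because OR is linearly separable, the mistake-driven updates stop once every training example is classified correctly.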
4.1.1.4 Support vector machines
In machine learning, support vector machines (SVMs) are supervised learning
models with associated learning algorithms that analyse data and recognize patterns, used for
classification and regression analysis. The basic SVM takes a set of input data and predicts,
for each given input, which of two possible classes forms the output, making it a non-
probabilistic binary linear classifier. Given a set of training examples, each marked as
belonging to one of two categories, an SVM training algorithm builds a model that assigns
new examples into one category or the other. An SVM model is a representation of the
examples as points in space, mapped so that the examples of the separate categories are
divided by a clear gap that is as wide as possible. New examples are then mapped into that
same space and predicted to belong to a category based on which side of the gap they fall on.
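One simple way to sketch the idea is a linear SVM trained by subgradient descent on the hinge loss. This is a simplification (production SVM solvers use more sophisticated optimizers), and the 1-D toy data, labels in {-1, +1}, and hyperparameters are illustrative assumptions.

```python
# Linear SVM sketch: points with margin < 1 pull the separating boundary
# toward them; all steps also shrink w slightly, which keeps the gap wide.

def svm_train(examples, epochs=200, lr=0.05, lam=0.01):
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                   # inside the margin: push out
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:                            # outside: only regularize w
                w = [wi - lr * lam * wi for wi in w]
    return w, b

# Two well-separated categories on a line.
examples = [([-2.0], -1), ([-1.5], -1), ([1.5], 1), ([2.0], 1)]
w, b = svm_train(examples)
```

New points are then assigned to a category by the side of the boundary they fall on, i.e. the sign of w · x + b.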
4.1.2 Quadratic classifier
A quadratic classifier is used in machine learning and statistical classification to
separate measurements of two or more classes of objects or events by a quadric surface. It is
a more general version of the linear classifier.
4.1.3 Boosting
Boosting is a machine learning meta-algorithm for reducing bias in supervised
learning. Boosting is based on the question: "Can a set of weak learners create a
single strong learner?" A weak learner is defined to be a classifier which is only slightly
correlated with the true classification. In contrast, a strong learner is a classifier that is
arbitrarily well-correlated with the true classification.
4.1.4 Neural Networks
Neural networks are usually presented as systems of interconnected "neurons" that can compute
values from inputs by feeding information through the network. Neural networking is the
science of creating computational solutions modeled after the brain. Like the human brain,
neural networks are trainable: once they are taught to solve one complex problem, they can
apply their skills to a new set of problems without having to start the learning process from
scratch. They are capable of machine learning and pattern recognition.
4.1.5 Bayesian network
A Bayesian network, Bayes network, belief network, Bayes(ian) model, or
probabilistic directed acyclic graphical model is a probabilistic graphical model (a type
of statistical model) that represents a set of random variables and their conditional
dependencies via a directed acyclic graph (DAG). For example, suppose that there are two
events which could cause grass to be wet: either the sprinkler is on or it's raining. Also,
suppose that the rain has a direct effect on the use of the sprinkler (namely that when it rains,
the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian
network (shown). All three variables have two possible values, T (for true) and F (for false).
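The sprinkler example can be worked numerically by summing the joint distribution, which the network factors as P(Rain) P(Sprinkler | Rain) P(GrassWet | Sprinkler, Rain). The probability values below are illustrative assumptions (the text gives no numbers); only the network structure follows the example.

```python
# Inference by enumeration in the rain/sprinkler/grass-wet network.
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(sprinkler | rain=True)
               False: {True: 0.4, False: 0.6}}    # P(sprinkler | rain=False)
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(wet | sprinkler, rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability from the DAG factorization."""
    p = P_rain[rain] * P_sprinkler[rain][sprinkler]
    p_wet = P_wet[(sprinkler, rain)]
    return p * (p_wet if wet else 1 - p_wet)

# P(rain = T | grass wet = T), summing out the sprinkler variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(round(num / den, 3))   # -> 0.358
```

So under these assumed numbers, seeing wet grass raises the probability of rain from the prior 0.2 to about 0.36; the answer depends entirely on the conditional tables attached to the DAG.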
4.1.6 Decision Trees
A decision tree is a hierarchical data structure implementing the divide-and-conquer
strategy. It is an efficient nonparametric method, which can be used for both classification
and regression. A decision tree is a hierarchical model for supervised learning whereby the
local region is identified in a sequence of recursive splits in a smaller number of steps. A
decision tree is composed of internal decision nodes and terminal leaves (see figure). Each
decision node m implements a test function fm(x) with discrete outcomes labeling the
branches. Given an input, at each node, a test is applied and one of the branches is taken
depending on the outcome. This process starts at the root and is repeated recursively until a
leaf node is hit, at which point the value written in the leaf constitutes the output.
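The traversal just described can be sketched directly; the particular tree below (two threshold tests on hand-picked 2-D features) is an illustrative assumption.

```python
# Decision tree traversal: each internal node applies its test function fm(x)
# and follows the branch labelled with the outcome; a leaf holds the output.

class Node:
    def __init__(self, test=None, branches=None, value=None):
        self.test = test            # fm(x): returns a branch label
        self.branches = branches    # outcome label -> child Node
        self.value = value          # output value, if this is a leaf

def predict(node, x):
    """Start at the root; apply tests recursively until a leaf is hit."""
    while node.value is None:
        node = node.branches[node.test(x)]
    return node.value

# A two-level tree classifying 2-D points by recursive threshold splits.
tree = Node(
    test=lambda x: x[0] > 0.5,
    branches={
        False: Node(value="class A"),
        True: Node(
            test=lambda x: x[1] > 0.5,
            branches={False: Node(value="class B"),
                      True: Node(value="class C")},
        ),
    },
)
print(predict(tree, (0.9, 0.8)))   # -> class C
```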
CHAPTER 5
WELLSPRINGS OF MACHINE LEARNING
Work in machine learning is now converging from several sources. These different
traditions each bring different methods and different vocabularies, which are now being
assimilated into a more unified discipline. Here is a brief listing of some of the separate
disciplines that have contributed to machine learning:
5.1 Statistics
A long-standing problem in statistics is how best to use samples drawn from unknown
probability distributions to help decide from which distribution some new sample is drawn. A
related problem is how to estimate the value of an unknown function at a new point given the
values of this function at a set of sample points. Statistical methods for dealing with these
problems can be considered instances of machine learning because the decision and
estimation rules depend on a corpus of samples drawn from the problem environment.
5.2 Brain Models
Non-linear elements with weighted inputs have been suggested as simple models of
biological neurons. Brain modelers are interested in how closely these networks approximate
the learning phenomena of living brains. Several important machine learning techniques are
based on networks of nonlinear elements often called neural networks. Work inspired by this
school is sometimes called connectionism, brain-style computation, or sub-symbolic
processing.
Early brain-model work focused on neural network methods; more recent work of this sort has been influenced by activities in artificial intelligence.
CHAPTER 6
CONCLUSION
Better understanding how auxiliary information, such as unlabeled data, hints from a
user, or previously-learned tasks, can best be used by a machine learning algorithm to
improve its ability to learn new things. Traditionally, Machine Learning Theory has focused
on problems of learning a task (say, identifying spam) from labeled examples (email labeled
as spam or not). However, often there is additional information available. One might have
access to large quantities of unlabeled data (email messages not labeled by their type, or
discussion-group transcripts on the web) that could potentially provide useful information.
One might have other hints from the user besides just labels, e.g. highlighting relevant
portions of the email message. Or, one might have previously learned similar tasks and want
to transfer some of that experience to the job at hand. These are all issues for which a solid
theory is only beginning to be developed.
Further developing connections to economic theory. As software
agents based on machine learning are used in competitive settings, “strategic” issues become
increasingly important. Most algorithms and models to date have focused on the case of a
single learning algorithm operating in an environment that, while it may be changing, does
not have its own motivations and strategies. However, if learning algorithms are to operate in
settings dominated by other adaptive algorithms acting in their own users’ interests, such as
bidding on items or performing various kinds of negotiations, then we have a true merging of
computer science and economic models. In this combination, many of the fundamental issues
are still wide open.
Development of learning algorithms with an eye towards the use of
learning as part of a larger system. Most machine learning models view learning as a
standalone process, focusing on prediction accuracy as the measure of performance.
However, when a learning algorithm is placed in a larger system, other issues may come into
play. For example, one would like algorithms that have more powerful models of their own
confidence or that can optimize multiple objectives. One would like models that capture the
process of deciding what to learn, in addition to how to learn it. There has been some
theoretical work on these issues, but there is certainly much more to be done.
Machine Learning Theory is both a fundamental theory with many basic and
compelling foundational questions, and a topic of practical importance that helps to advance
the state of the art in software by providing mathematical frameworks for designing new
machine learning algorithms. It is an exciting time for the field, as connections to many other
areas are being discovered and explored, and as new machine learning applications bring
new questions to be modeled and studied. It is safe to say that the potential of Machine
Learning and its theory lie beyond the frontiers of our imagination.
REFERENCES