Introduction to ML: Module 1
Module 1 - Outline
▪Chapter 1: Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
Ex1: A checkers learning problem
▪Task T: playing checkers
▪Performance measure P: percent of games won against opponents
▪Training experience E: playing practice games against itself
(Figure: a checkers board)
Ex2: A handwriting recognition learning problem
▪Task T: recognizing and classifying handwritten words within images
▪Performance measure P: percent of words correctly classified
▪Training experience E: a database of handwritten words with given classifications
Module 1 - Outline
▪Chapter 1: Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
Problem Description: Design a program that learns to play checkers, with performance to be measured by its ability to win in the world checkers tournament.
Designing a learning system
▪Remaining choices
•The exact type of knowledge to be learned
•A representation for this target knowledge
•A learning mechanism
Designing a learning system
Choosing the Target Function
▪Choose a target function V : Board → R that assigns a numerical score to any given board state b.
▪The ideal target function V can be defined recursively:
• if b is a final winning board state, then V(b) = 100
• if b is a final losing board state, then V(b) = -100
• if b is a final board state that is a draw, then V(b) = 0
• if b is not a final state, then V(b) = V(b'), where b' is the best final board state reachable from b, assuming both players play optimally
▪This recursive definition is not efficiently computable (it is nonoperational), so the learning task is to discover an operational description V̂ that approximates V.
Choosing a Representation for the Target Function
▪The choice of representation involves trade-offs
• Pick a very expressive representation to allow close approximation to the ideal target function V
• The more expressive the representation, the more training data is required to choose among the alternative hypotheses
▪Use a linear combination of the following board features:
• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red (i.e., pieces that can be captured on red's next turn)
• x6: the number of red pieces threatened by black
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
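To make this representation concrete, here is a minimal Python sketch (our own, not part of the original slides) of V̂ as a weighted linear combination; the function name and the sample weight values are illustrative assumptions.

```python
# A minimal sketch of the linear evaluation function
# V̂(b) = w0 + w1*x1 + ... + w6*x6. Function names and the sample
# weights below are illustrative assumptions, not from the slides.

def evaluate(board_features, weights):
    """Compute V̂(b) from the feature vector [x1, ..., x6]."""
    value = weights[0]  # w0: the constant term
    for w, x in zip(weights[1:], board_features):
        value += w * x  # add wi * xi for each board feature
    return value

# Example: 3 black pieces, no red pieces, 1 black king, nothing threatened.
features = [3, 0, 1, 0, 0, 0]
weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]  # arbitrary demo values
print(evaluate(features, weights))  # 5.0
```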
Designing a learning system
Choosing a Function Approximation Algorithm
▪To learn V̂ we require a set of training examples, each describing a board state b together with a training value Vtrain(b)
▪Each training example is an ordered pair ⟨b, Vtrain(b)⟩, e.g.
⟨⟨x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0⟩, +100⟩
Rule for estimating training values: Vtrain(b) ← V̂(Successor(b)), where Successor(b) is the next board state following b for which it is again the program's turn to move.
Choosing a Function Approximation Algorithm
2. Adjusting the weights
One common approach is to define the best hypothesis, or set of weights, as the one that minimizes the squared error E between the training values and the values predicted by the hypothesis:
E ≡ Σ (Vtrain(b) − V̂(b))², summed over all training examples ⟨b, Vtrain(b)⟩
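As a sketch of this weight-adjustment step, the following implements the LMS update rule wi ← wi + η·(Vtrain(b) − V̂(b))·xi, reusing evaluate() from the earlier sketch; the learning rate eta and the example format are our own assumptions.

```python
# A minimal sketch of the LMS weight-update rule:
#   wi <- wi + eta * (Vtrain(b) - V̂(b)) * xi
# Reuses evaluate() from the previous sketch; eta and the example
# format (feature-vector, training-value pairs) are assumptions.

def lms_update(weights, examples, eta=0.01):
    """One pass of LMS tuning over (features, v_train) training pairs."""
    for features, v_train in examples:
        error = v_train - evaluate(features, weights)  # Vtrain(b) - V̂(b)
        weights[0] += eta * error                      # constant term (x0 = 1)
        for i, x in enumerate(features, start=1):
            weights[i] += eta * error * x              # per-feature update
    return weights

# Example: nudge all-zero weights toward the training value +100.
weights = lms_update([0.0] * 7, [([3, 0, 1, 0, 0, 0], 100)])
```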
Designing a learning system
Summary of choices in designing the checkers learning program
Module 1 - Outline
▪Chapter 1: Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
Perspective in ML
▪One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine one that best fits the observed data and any prior knowledge held by the learner.
▪For example, consider the space of hypotheses that could in principle be output by the above checkers learner.
▪This hypothesis space consists of all evaluation functions that can be represented by some choice of values for the weights w0 through w6.
Perspective in ML
▪The learner's task is thus to search through this vast
space to locate the hypothesis that is most consistent
with the available training examples.
▪The LMS algorithm for fitting weights achieves this
goal by iteratively tuning the weights, adding a
correction to each weight each time the
hypothesized evaluation function predicts a value
that differs from the training value.
▪This algorithm works well when the hypothesis
representation considered by the learner defines a
continuously parameterized space of potential
hypotheses.
Issues in ML
Our checkers example raises a number of generic questions about ML, such as:
▪What algorithms exist for learning general target
functions from specific training examples? In what
settings will particular algorithms converge to the
desired function, given sufficient training data? Which
algorithms perform best for which types of problems
and representations?
▪How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
Issues in ML
▪When and how can prior knowledge held by the learner
guide the process of generalizing from examples?
Can prior knowledge be helpful even when it is
only approximately correct?
▪What is the best strategy for choosing a useful next
training experience, and how does the choice of this
strategy alter the complexity of the learning problem?
▪What is the best way to reduce the learning task to one
or more function approximation problems?
Put another way, what specific functions should the
system attempt to learn? Can this process itself be
automated?
▪How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
Module 1 - Outline
▪Chapter 1: Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
Summary
▪ML has proven to be of great value in a variety of applications:
1. Data mining problems involving large databases
a. To analyze outcomes of medical treatments from
patient databases
b. To learn general rules for credit worthiness from
financial databases
2. Poorly understood domains where humans might not
have the knowledge needed to develop effective
algorithms
e.g., human face recognition from images
3. Domains where the program must dynamically adapt to changing conditions
a. controlling manufacturing processes under changing supply stocks
Summary
▪Machine learning draws on ideas from a diverse set
of disciplines, including AI, probability and statistics,
computational complexity, information theory,
psychology and neurobiology, control theory, and
philosophy.
▪A well-defined learning problem requires a
well-specified task, performance metric, and source of
training experience.
▪Designing a machine learning approach involves a number of design choices, including
•choosing the type of training experience,
•the target function to be learned,
•a representation for this target function, and
•an algorithm for learning the target function from training examples.
Summary
▪Learning involves search:
searching through a space of possible hypotheses
to find the hypothesis that best fits the available
training examples and other prior constraints or
knowledge.
Module 1 - Outline
▪Chapter 1: Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
Concept Learning
▪Inducing general functions from specific training
examples is a main issue of machine learning.
▪A task of acquiring a potential hypothesis (solution) that
best fits the training examples
▪It is the process of acquiring the definition of a
general category from given sample positive and
negative training examples of the category.
▪Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
▪Concept learning: Inferring a boolean-valued function
from training examples of its input and output.
Module 1 - Outline
▪Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Hypothesis
Classification
Concept Learning Task
▪What hypothesis representation shall we provide to
the learner in this case?
▪For each attribute, the hypothesis will either
• indicate by a “ ? ” that any value is acceptable for this attribute,
• specify a single required value (e.g., Warm) for the attribute, or
• indicate by a "Φ" that no value is acceptable.
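A small sketch of this representation (our own encoding: Python tuples, with "?" for "any value" and None standing in for Φ), together with the test of whether an instance satisfies a hypothesis:

```python
# A minimal sketch of the conjunctive hypothesis representation for
# EnjoySport. "?" = any value acceptable, a literal value = that exact
# value required, and None stands in for "Φ" (no value acceptable).
# The encoding and helper name are our own assumptions.

def matches(hypothesis, instance):
    """True if the instance satisfies every attribute constraint."""
    for h, x in zip(hypothesis, instance):
        if h is None:               # "Φ": nothing is acceptable
            return False
        if h != "?" and h != x:     # a specific value must match exactly
            return False
    return True

h = ("Sunny", "Warm", "?", "Strong", "?", "?")
x = ("Sunny", "Warm", "High", "Strong", "Warm", "Same")
print(matches(h, x))  # True
```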
Inductive learning hypothesis
▪The fundamental assumption of inductive learning:
"The best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data."
Module 1 - Outline
▪Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Concept learning as search
▪Concept learning can be viewed as the task of searching
through a large space of hypotheses
▪The goal is to find the hypothesis that best fits training
examples.
▪General-to-Specific Ordering of Hypotheses
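The ordering is defined by: h1 ≥g h2 iff every instance satisfying h2 also satisfies h1. Below is a sketch of this check for the attribute-constraint representation above; the helper name is our own.

```python
# A minimal sketch of the "more general than or equal to" relation
# (h1 >=g h2) for conjunctive hypotheses, done constraint by constraint.

def more_general_or_equal(h1, h2):
    """True if every instance matched by h2 is also matched by h1."""
    if any(c is None for c in h2):
        return True                 # h2 matches nothing, so h1 covers it
    for a, b in zip(h1, h2):
        if a == "?":
            continue                # "?" in h1 covers any value b allows
        if a is None or a != b:
            return False            # h1 is strictly tighter here than h2
    return True

print(more_general_or_equal(("Sunny", "?"), ("Sunny", "Warm")))  # True
print(more_general_or_equal(("Sunny", "Warm"), ("Sunny", "?")))  # False
```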
Module 1 - Outline
▪Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Find-S Algorithm
To find the maximally specific hypothesis:
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
For each attribute constraint ai in h:
if the constraint ai is satisfied by x, do nothing;
otherwise replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
(Figure: illustration on the EnjoySport training examples)
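A minimal runnable sketch of Find-S for this representation (the example data follows Mitchell's EnjoySport table; the function name and data format are our own):

```python
# A minimal sketch of Find-S for conjunctive hypotheses. Starts from
# the most specific hypothesis (all Φ, encoded as None) and minimally
# generalizes on each positive example; negatives are ignored.

def find_s(examples, n_attributes):
    h = [None] * n_attributes            # step 1: most specific hypothesis
    for instance, is_positive in examples:
        if not is_positive:
            continue                     # Find-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] is None:
                h[i] = value             # first positive: adopt its values
            elif h[i] not in ("?", value):
                h[i] = "?"               # generalize a violated constraint
    return tuple(h)

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(examples, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```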
Key Property
▪The key property of the Find-S algorithm is that, for hypothesis spaces described by conjunctions of attribute constraints, Find-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
Find-S: Drawbacks
Questions still left unanswered
▪Has the learner converged to the correct target concept?
▪Why prefer the most specific hypothesis?
If multiple hypotheses are consistent with the training examples, FIND-S will find the most specific. It is unclear whether we should prefer this hypothesis.
▪Are the training examples consistent?
Training examples may contain at least some errors or
noise. Such inconsistent sets of training examples can
severely mislead FIND-S, since it ignores negative
examples.
▪What if there are several maximally specific consistent hypotheses?
There can be several maximally specific hypotheses consistent with the data, but FIND-S finds and reports only one.
Module 1 - Outline
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Definition: Consistent
A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example ⟨x, c(x)⟩ in D.
Definition: Version Space
The version space, denoted VS(H,D), with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
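These two definitions translate directly into code; here is a brute-force sketch (reusing matches() from the earlier sketch; enumerating all of H this way is feasible only for tiny hypothesis spaces):

```python
# A minimal sketch of the two definitions above, plus the brute-force
# List-Then-Eliminate idea: keep every hypothesis in H consistent with
# the data. Reuses matches(); practical only for tiny hypothesis spaces.

def consistent(h, examples):
    """h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D."""
    return all(matches(h, x) == label for x, label in examples)

def version_space(hypothesis_space, examples):
    """VS(H, D): the subset of H consistent with the training examples."""
    return [h for h in hypothesis_space if consistent(h, examples)]
```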
A few definitions
▪The general boundary G, with respect to H and D, is the set of maximally general members of H consistent with D.
▪The specific boundary S, with respect to H and D, is the set of maximally specific members of H consistent with D.
Module 1 - Outline
▪Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Candidate Elimination Algorithm
▪Initialize G to the set of maximally general hypotheses in H, and S to the set of maximally specific hypotheses in H.
▪For each positive example: remove from G any hypothesis inconsistent with the example, and replace each inconsistent member of S with its minimal generalizations that are consistent with the example.
▪For each negative example: remove from S any hypothesis inconsistent with the example, and replace each inconsistent member of G with its minimal specializations that are consistent with the example.
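A compact sketch of the algorithm for the conjunctive representation, reusing matches() and more_general_or_equal() from the earlier sketches; the attribute domains are passed in explicitly, and the boundary bookkeeping is simplified to what this representation needs.

```python
# A minimal sketch of the Candidate Elimination Algorithm for the
# conjunctive representation. Reuses matches() and
# more_general_or_equal(); domains[i] lists the possible values of
# attribute i. Boundary bookkeeping is simplified for this sketch.

def min_generalize(h, x):
    """Minimal generalization of h that covers instance x."""
    return tuple(v if c is None else (c if c == v else "?")
                 for c, v in zip(h, x))

def min_specializations(h, x, domains):
    """Minimal specializations of h that exclude instance x."""
    return [h[:i] + (v,) + h[i+1:]
            for i, c in enumerate(h) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}                     # most specific boundary
    G = {("?",) * n}                      # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = {min_generalize(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)          # already excludes the negative
                    continue
                for g2 in min_specializations(g, x, domains):
                    if any(more_general_or_equal(g2, s) for s in S):
                        new_G.add(g2)
            G = {g for g in new_G         # keep only maximally general members
                 if not any(h != g and more_general_or_equal(h, g)
                            for h in new_G)}
    return S, G

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
S, G = candidate_elimination(examples, domains)  # examples from the Find-S sketch
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')}
```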
(Figures: step-by-step trace of the S and G boundaries on the four EnjoySport training examples)
▪After processing these four examples, the boundary
sets S4 and G4 delimit the version space of all
hypotheses consistent with the set of incrementally
observed training examples.
▪The entire version space consists of S4, G4, and every hypothesis that lies between them in the general-to-specific ordering.
▪This learned version space is independent of the
sequence in which the training examples are presented
▪As further training data is encountered, the S and G
boundaries will move monotonically closer to each
other
Module 1 - Outline
▪Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Inductive Bias
▪The CEA will converge toward the true target concept provided it is given accurate training examples and provided the hypothesis space contains the target concept.
▪The fundamental questions for inductive inference in
general.
•What if the target concept is not contained in
the hypothesis space?
•How does the size of this hypothesis space
influence the ability of the algorithm to generalize
to unobserved instances?
•How does the size of the hypothesis space
influence the number of training examples that
must be observed?
▪Here we examine these questions in the context of the CEA.
Inductive Bias
1. A Biased Hypothesis Space
▪Suppose we wish to assure that the hypothesis space
contains the unknown target concept.
▪The obvious solution is to enrich the hypothesis
space to include every possible hypothesis.
Inductive Bias
1. A Biased Hypothesis Space (continued)
▪For example, the conjunctive hypothesis space for EnjoySport cannot represent a disjunctive target concept such as "Sky = Sunny or Sky = Cloudy", so the learner may find no hypothesis consistent with the training examples.
Inductive Bias
2. An unbiased learner (Continued)
▪Let us reformulate the EnjoySport learning task
▪Let H' represent every subset of instances; that is, let H' correspond to the power set of X.
▪One way to define such an H' is to allow arbitrary disjunctions, conjunctions, and negations of our earlier hypotheses.
▪For instance, the target concept "Sky = Sunny or Sky = Cloudy" could then be described as
⟨Sunny, ?, ?, ?, ?, ?⟩ ∨ ⟨Cloudy, ?, ?, ?, ?, ?⟩
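To see the scale of this unbiased space, here is a back-of-the-envelope computation (our own, using the standard EnjoySport attribute counts: 3 values for Sky and 2 for each of the other five attributes):

```python
# A small arithmetic check of the size argument. X has
# 3*2*2*2*2*2 = 96 distinct instances, so the power set of X contains
# 2**96 possible target concepts. The conjunctive space, by contrast,
# has only 1 + 4*3*3*3*3*3 = 973 semantically distinct hypotheses.

n_instances = 3 * 2 * 2 * 2 * 2 * 2        # |X| = 96
unbiased_hypotheses = 2 ** n_instances     # |H'| = 2^96, about 7.9e28
conjunctive_hypotheses = 1 + 4 * 3**5      # value-or-"?" choices, plus one Φ case
print(n_instances, unbiased_hypotheses, conjunctive_hypotheses)
```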
Inductive Bias
2. An unbiased learner (Continued)
▪But now there is a new problem: we are completely unable to generalize beyond the observed examples!
▪To see why, suppose we present three positive
examples (xl, x2, x3) and two negative examples (x4,
x5) to the learner.
▪At this point, the S and G boundaries of the version space will be
S: { (x1 ∨ x2 ∨ x3) }   G: { ¬(x4 ∨ x5) }
▪Every instance outside the training set is classified positive by exactly half of the hypotheses in this version space and negative by the other half, so no unseen instance can be unambiguously classified.
Inductive Bias
3. The Futility of Bias-Free Learning
▪The fundamental property of inductive inference:
A learner that makes no a priori assumptions
regarding the identity of the target concept has no
rational basis for classifying any unseen instances.
▪CEA generalizes beyond the observed training examples because it was biased by the implicit assumption that the target concept can be represented by a conjunction of attribute values.
• If this assumption is correct (and the training examples are error-free), its classification of new samples will also be correct.
• If this assumption is incorrect, however, it is certain that the CEA will misclassify at least some instances from X.
Inductive Bias
▪Consider a concept learning algorithm L for the
set of instances X.
▪Let c be an arbitrary concept defined over X, and let Dc = {⟨x, c(x)⟩} be an arbitrary set of training examples of c.
▪Let L(xi, Dc) denote the classification assigned to the
instance xi by L after training on the data Dc.
▪The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc:
(∀ xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
where y ⊢ z means that z follows deductively from y.
Algorithms listed from weakest to strongest bias
▪Rote-Learner: Learning corresponds simply to storing each observed training example in memory; instances not found in memory are left unclassified. It has no inductive bias.
▪CEA: New instances are classified only in the case where all members of the current version space agree on the classification; otherwise the system refuses to classify the new instance. Its bias: the target concept is contained in the hypothesis space H.
▪FIND-S: This algorithm, described earlier, finds the most specific hypothesis consistent with the training examples and uses it to classify all subsequent instances. Its bias: the target concept is contained in H, plus the assumption that all instances are negative unless entailed to be positive.
Module 1 - Outline
▪Introduction
•Well posed learning problems
•Designing a Learning system
•Perspective and Issues in Machine Learning
•Summary
▪Concept Learning
•Concept learning task
•Concept learning as search
•Find-S algorithm
•Version space
•Candidate Elimination algorithm
•Inductive Bias
•Summary
Summary …(1)
▪Concept learning can be cast as a problem of
searching through a large predefined space of
potential hypotheses.
▪The general-to-specific partial ordering of hypotheses,
provides a useful structure for organizing the search
through the hypothesis space.
▪The Find-S algorithm utilizes the general-to-specific ordering, performing a specific-to-general search through the hypothesis space along one branch of the partial ordering, to find the most specific hypothesis consistent with the training examples.
Summary …(2)
▪The Candidate Elimination Algorithm utilizes this general-to-specific ordering to compute the version space (the set of all hypotheses consistent with the training data) by incrementally computing the sets of maximally specific (S) and maximally general (G) hypotheses.