Module 1 (3) - Pages
What exactly should be the value of the target function V for any given board
state?
– Of course any evaluation function that assigns higher scores to better board
states will do.
When learning the target concept, the learner is presented a set of training
examples, each consisting of an instance x from X, along with its target concept
value c(x) (e.g., the training examples in Table 2.1).
Instances for which c(x) = 1 are called positive examples, or members of the target concept.
Instances for which c(x) = 0 are called negative examples, or nonmembers of the target concept.
We will often write the ordered pair (x, c(x)) to describe the training example consisting of the instance x and its target concept value c(x).
We use the symbol D to denote the set of available training examples.
Given a set of training examples of the target concept c, the problem faced by the
learner is
to hypothesize, or estimate, c.
We use the symbol H to denote the set of all possible hypotheses that the learner may consider regarding the identity of the target concept.
Usually H is determined by the human designer's choice of hypothesis representation.
In general, each hypothesis h in H represents a boolean-valued function defined over X; that is, h : X → {0, 1}.
The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x
in X.
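The definitions above can be made concrete with a small sketch. Assuming the EnjoySport-style representation used later in this chapter, each hypothesis is a conjunction of attribute constraints written as a tuple, where '?' accepts any value and '0' (the empty constraint ∅) accepts none; the function name `classify` is our own:

```python
def classify(h, x):
    """Return 1 if hypothesis h classifies instance x as positive, else 0."""
    for constraint, value in zip(h, x):
        if constraint == '0' or (constraint != '?' and constraint != value):
            return 0
    return 1

# An instance x from X and a hypothesis h from H
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
h = ('Sunny', '?', '?', 'Strong', '?', '?')
print(classify(h, x))  # 1: x satisfies every constraint of h
```

Here h plays the role of the boolean-valued function h : X → {0, 1}.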
The Inductive Learning Hypothesis
• Any hypothesis found to approximate the
target function well over a sufficiently large
set of training examples will also approximate
the target function well over other
unobserved examples.
CONCEPT LEARNING AS SEARCH
• General-to-Specific Ordering of Hypotheses
•Now consider the sets of instances that are classified positive by h1 and by h2.
•Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.
•In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
Instances, hypotheses, and the more-general-than relation.
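For conjunctions of attribute constraints, the more-general-than-or-equal-to relation reduces to an attribute-by-attribute check: each constraint of the more general hypothesis must be '?', equal to the other's constraint, or compared against the empty constraint '0' (which covers nothing). A minimal sketch, with names of our own choosing:

```python
def more_general_or_equal(h2, h1):
    """True iff every instance satisfying h1 also satisfies h2."""
    return all(c2 == '?' or c2 == c1 or c1 == '0'
               for c2, c1 in zip(h2, h1))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 drops the Wind constraint
print(more_general_or_equal(h1, h2))  # False
```

Since the relation holds in one direction but not the other, h2 is strictly more general than h1.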
FIND-S Algorithm
FIND-S
• To illustrate this algorithm, assume the learner
is given the sequence of training examples
from Table 2.1 for the EnjoySport task.
• The first step of FIND-S is to initialize h to the most specific hypothesis in H: h ← ⟨∅, ∅, ∅, ∅, ∅, ∅⟩.
• Upon observing the first positive training example, the algorithm generalizes h to that example itself: h ← ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩.
This h is still very specific; it asserts that all instances are negative except for the single positive training example we have observed.
FIND-S
• Next, the second training example (also
positive in this case) forces the algorithm to
further generalize h,
• this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example.
• The refined hypothesis in this case is h ← ⟨Sunny, Warm, ?, Strong, Warm, Same⟩.
FIND-S
• Upon encountering the third training example (in this case a negative example), the algorithm makes no change to h.
• In fact, the FIND-S algorithm simply ignores every negative example!
• While this may at first seem strange, notice that in the current case our hypothesis h is already consistent with the new negative example
• (i.e., h correctly classifies this example as negative), and hence no revision is needed.
FIND-S
• In the general case, as long as we assume that the hypothesis space
H contains a hypothesis that describes the true target concept c
and that the training data contains no errors, then the current
hypothesis h can never require a revision in response to a negative
example.
• To see why, recall that the current hypothesis h is the most specific
hypothesis in H consistent with the observed positive examples.
• Because the target concept c is also assumed to be in H and to be consistent with the positive training examples, c must be more_general_than_or_equal_to h.
• But the target concept c will never cover a negative example, thus neither will h (by the definition of more-general-than).
• Therefore, no revision to h will be required in response to any negative example.
FIND-S
• To complete our trace of FIND-S, the fourth (positive) example leads to a further generalization of h: h ← ⟨Sunny, Warm, ?, Strong, ?, ?⟩.
FIND-S
• The FIND-S algorithm illustrates one way in which the more-general-than partial ordering can be used to organize the search for an acceptable hypothesis.
• The search moves from hypothesis to hypothesis, searching
from the most specific to progressively more general
hypotheses along one chain of the partial ordering.
• Figure 2.2 illustrates this search in terms of the instance
and hypothesis spaces.
• At each step, the hypothesis is generalized only as far as
necessary to cover the new positive example.
• Therefore, at each stage the hypothesis is the most specific
hypothesis consistent with the training examples observed
up to this point (hence the name FIND-S)
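The search just described can be sketched in a few lines. This is a minimal implementation of FIND-S, assuming the tuple-of-constraints representation and the EnjoySport training examples of Table 2.1:

```python
POSITIVE, NEGATIVE = 'Yes', 'No'

# The four EnjoySport training examples (Table 2.1)
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   POSITIVE),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   POSITIVE),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), NEGATIVE),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), POSITIVE),
]

def find_s(examples, n_attributes=6):
    h = ['0'] * n_attributes           # most specific hypothesis in H
    for x, label in examples:
        if label != POSITIVE:
            continue                   # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == '0':
                h[i] = value           # first positive example: copy it
            elif h[i] != value:
                h[i] = '?'             # generalize the violated constraint
    return tuple(h)

print(find_s(examples))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

At each step h is generalized only as far as the new positive example requires, so the result is the most specific consistent hypothesis.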
FIND-S
• The key property of the FIND-S algorithm is that, for hypothesis spaces described by conjunctions of attribute constraints (such as H for the EnjoySport task), FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
• Its final hypothesis will also be consistent with
the negative examples provided the correct
target concept is contained in H, and provided
the training examples are correct.
Problems in FIND-S
• However, there are several questions still left unanswered
by this learning algorithm, such as:
• Has the learner converged to the correct target concept?
• Although FIND-S will find a hypothesis consistent with the
training data, it has no way to determine whether it has
found the only hypothesis in H consistent with the data
(i.e., the correct target concept), or
• whether there are many other consistent hypotheses as
well.
• We would prefer a learning algorithm that could determine
whether it had converged and, if not, at least characterize
its uncertainty regarding the true identity of the target
concept.
Problems in FIND-S
• Why prefer the most specific hypothesis?
• In case there are multiple hypotheses
consistent with the training examples, FIND-S
will find the most specific.
• It is unclear whether we should prefer this
hypothesis over, say, the most general, or
some other hypothesis of intermediate
generality.
Problems in FIND-S
• Are the training examples consistent?
• In most practical learning problems there is some
chance that the training examples will contain at
least some errors or noise.
• Such inconsistent sets of training examples can
severely mislead FIND-S, given the fact that it
ignores negative examples.
• We would prefer an algorithm that could at least
detect when the training data is inconsistent and,
preferably, accommodate such errors.
Problems in FIND-S
• What if there are several maximally specific consistent hypotheses?
• In the hypothesis language H for the EnjoySport task, there is
always a unique, most specific hypothesis consistent with any set
of positive examples.
• However, for other hypothesis spaces (discussed later) there can be
several maximally specific hypotheses consistent with the data.
• In this case, FIND-S must be extended to allow it to backtrack on its
choices of how to generalize the hypothesis,
– to accommodate the possibility that the target concept lies along a
different branch of the partial ordering than the branch it has
selected.
• Furthermore,
• we can define hypothesis spaces for which there is no maximally
specific consistent hypothesis, although this is more of a theoretical
issue than a practical one (see Exercise 2.7).
VERSION SPACES AND THE CANDIDATE-ELIMINATION
ALGORITHM
As long as the sets G and S are well defined (see Exercise 2.7), they
completely specify the version space.
In particular, we can show that the version space is precisely the set of
hypotheses contained in G, plus those contained in S, plus
those that lie between G and S in the partially ordered hypothesis space
CANDIDATE-ELIMINATION Algorithm
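The loop of CANDIDATE-ELIMINATION can be sketched for conjunctive hypotheses over discrete attributes. This is a simplified version, assuming (as holds for EnjoySport) that the specific boundary S stays a single hypothesis; the full algorithm also prunes non-minimal and non-maximal boundary members, which this trace does not need. All names and helpers here are our own:

```python
def covers(h, x):
    return '0' not in h and all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h2, h1):
    return all(c2 == '?' or c2 == c1 or c1 == '0' for c2, c1 in zip(h2, h1))

def generalize(s, x):
    """Minimal generalization of s that covers the positive example x."""
    return tuple(v if c == '0' else (c if c == v else '?')
                 for c, v in zip(s, x))

def specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative example x."""
    for i, c in enumerate(g):
        if c == '?':
            for value in domains[i]:
                if value != x[i]:
                    yield g[:i] + (value,) + g[i + 1:]

def candidate_elimination(examples, domains):
    n = len(domains)
    s = ('0',) * n        # S0: the most specific hypothesis
    g = {('?',) * n}      # G0: the most general hypothesis
    for x, positive in examples:
        if positive:
            g = {h for h in g if covers(h, x)}   # drop inconsistent g's
            s = generalize(s, x)
        else:
            new_g = set()
            for h in g:
                if not covers(h, x):
                    new_g.add(h)                 # already rejects x
                else:
                    for spec in specializations(h, x, domains):
                        if more_general_or_equal(spec, s):
                            new_g.add(spec)      # keep only specs above S
            g = new_g
    return s, g

domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong',), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
s4, g4 = candidate_elimination(examples, domains)
print(s4)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(sorted(g4))
```

Running the sketch on the four EnjoySport examples reproduces the S4 and G4 boundaries of the trace that follows.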
CANDIDATE-ELIMINATION Algorithm An Illustrative Example
•CANDIDATE-ELIMINATION Trace 1: S0 and G0 are the initial boundary sets corresponding to the most specific and most general hypotheses.
•Training examples 1 and 2 force the S boundary to become more general, as in the FIND-S algorithm. They have no effect on the G boundary.
CANDIDATE-ELIMINATION Algorithm An Illustrative Example
• As illustrated by these first two steps, positive training examples
may force the S boundary of the version space to become
increasingly general.
• Negative training examples play the complementary role of forcing the G boundary to become increasingly specific.
• Consider the third training example, shown in Figure 2.5.
• This negative example reveals that the G boundary of the version
space is overly general; that is, the hypothesis in G incorrectly
predicts that this new example is a positive example.
• The hypothesis in the G boundary must therefore be specialized
until it correctly classifies this new negative example.
• As shown in Figure 2.5, there are several alternative minimally more
specific hypotheses.
• All of these become members of the new G3 boundary set.
CANDIDATE-ELIMINATION Algorithm An Illustrative Example
• Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?
• For example, the hypothesis h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialization of G2 that correctly labels the new example as a negative example, but it is not included in G3.
• The reason this hypothesis is excluded is that it is
inconsistent with the previously encountered
positive examples.
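The exclusion can be checked directly: the candidate ⟨?, ?, Normal, ?, ?, ?⟩ fails to cover the second positive example already seen (whose Humidity is High), so it cannot belong to the version space. A quick check, with a hypothetical `covers` helper:

```python
def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

# The two positive examples observed before the negative one
positives = [
    ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),
    ('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),
]
h = ('?', '?', 'Normal', '?', '?', '?')
print([covers(h, x) for x in positives])  # [True, False]
```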
CANDIDATE-ELIMINATION Algorithm An Illustrative Example
• The fourth training example, as shown in Figure 2.6, further generalizes the S boundary of the version space.
• It also results in removing one member of the
G boundary, because this member fails to
cover the new positive example.
CANDIDATE-ELIMINATION Algorithm An Illustrative Example
• After processing these four examples, the
boundary sets S4 and G4 delimit the version
space of all hypotheses consistent with the set
of incrementally observed training examples.
The entire version space, including those
hypotheses bounded by S4 and G4, is shown
in Figure 2.7.
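Because the hypothesis space here is small, the version space delimited by S4 and G4 can also be recovered by brute force: enumerate every conjunctive hypothesis and keep those that agree with all four training examples. A sketch (hypotheses containing the empty constraint are omitted, since any consistent hypothesis must cover the positives):

```python
from itertools import product

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]

# Candidate constraints per attribute: every observed value plus '?'
domains = [sorted({x[i] for x, _ in examples}) + ['?'] for i in range(6)]

version_space = [h for h in product(*domains)
                 if all(covers(h, x) == label for x, label in examples)]
print(len(version_space))  # 6
```

The six surviving hypotheses match the version space of Figure 2.7, confirming that S and G completely specify it.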
CANDIDATE-ELIMINATION Algorithm An Illustrative Example
A Biased Hypothesis Space
Permutation Problem – Syntactically Distinct Hypotheses
The Futility of Bias-Free Learning