Candidate-Elimination Algorithm
Lecture Outline:
• Version Spaces
• Inductive Bias
Reading:
Chapter 2 of Mitchell
• One limitation of the FIND-S algorithm is that it outputs just one hypothesis consistent with
the training data – there might be many.
To overcome this, we introduce the notion of a version space and algorithms to compute it.
• A hypothesis h is consistent with a set of training examples D of target concept c if and only if
h(x) = c(x) for each training example <x, c(x)> in D.
• The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the
subset of hypotheses from H consistent with all training examples in D.
<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
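As an illustration of these definitions, here is a minimal Python sketch of the brute-force
"list-then-eliminate" approach: enumerate the conjunctive hypotheses and keep those consistent with
every training example. (The attribute encoding and names such as matches, consistent and
version_space are ours, not from the lecture.)

from itertools import product

# Two observed values per EnjoySport attribute; "?" means "any value".
DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    # h covers instance x iff every constrained attribute of h agrees with x
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def consistent(h, examples):
    # h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D
    return all(matches(h, x) == label for x, label in examples)

def version_space(examples):
    # Enumerate every conjunctive hypothesis (the all-empty hypothesis is omitted:
    # it cannot be consistent with any positive example) and keep those consistent
    # with all training examples.
    H = product(*[values + ("?",) for values in DOMAINS])
    return [h for h in H if consistent(h, examples)]

For the four EnjoySport training examples used below, this enumeration returns exactly the six
hypotheses bounded by S4 and G4, including the three intermediate hypotheses listed above.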
• The Candidate-Elimination algorithm represents the version space by recording only its
most general members (G) and its most specific members (S)
– other intermediate members in the general-to-specific ordering can be generated as needed
The Candidate-Elimination Algorithm (cont)
• The General boundary, G, of version space VS_{H,D} is the set of its maximally general
members
• The Specific boundary, S, of version space VS_{H,D} is the set of its maximally specific
members
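A sketch of how the boundary sets could be extracted from an explicit version space, using the
more-general-than-or-equal-to ordering (function names are illustrative; the attribute-wise test
assumes conjunctive hypotheses without the all-empty hypothesis):

def more_general_or_equal(h1, h2):
    # h1 >=_g h2: every instance covered by h2 is also covered by h1.
    # For conjunctive hypotheses this reduces to an attribute-wise comparison.
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def general_boundary(vs):
    # G: hypotheses in the version space with no strictly more general member
    return [h for h in vs
            if not any(g != h and more_general_or_equal(g, h) for g in vs)]

def specific_boundary(vs):
    # S: hypotheses in the version space with no strictly more specific member
    return [h for h in vs
            if not any(s != h and more_general_or_equal(h, s) for s in vs)]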
Training Examples:
T1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
T2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
T3: <Rainy, Cold, High, Strong, Warm, Change>, No
T4: <Sunny, Warm, High, Strong, Cool, Change>, Yes

Boundary trace:
S0:         {<φ, φ, φ, φ, φ, φ>}
S1:         {<Sunny, Warm, Normal, Strong, Warm, Same>}       (after T1)
S2, S3:     {<Sunny, Warm, ?, Strong, Warm, Same>}            (after T2; T3 leaves S unchanged)
S4:         {<Sunny, Warm, ?, Strong, ?, ?>}                  (after T4)
G0, G1, G2: {<?, ?, ?, ?, ?, ?>}                              (T1, T2 leave G unchanged)
G4:         {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}   (after T3 specialises and T4 prunes G)
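The trace above can be reproduced with a compact sketch of the Candidate-Elimination updates
(reusing DOMAINS, matches and more_general_or_equal from the earlier sketches; this simplified
version assumes conjunctive hypotheses and omits some pruning steps of the full algorithm):

def candidate_elimination(examples):
    n = len(DOMAINS)
    S = [("0",) * n]                    # "0" stands for the empty constraint φ
    G = [("?",) * n]
    for x, positive in examples:
        if positive:
            # drop members of G that fail to cover the positive example
            G = [g for g in G if matches(g, x)]
            # minimally generalise each member of S so that it covers x ...
            S = [tuple(xv if sv == "0" else (sv if sv == xv else "?")
                       for sv, xv in zip(s, x)) for s in S]
            # ... keeping only those still bounded above by some member of G
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            # drop members of S that cover the negative example
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # replace g by its minimal specialisations that exclude x
                # and are still more general than some member of S
                for i, gv in enumerate(g):
                    if gv != "?":
                        continue
                    for v in DOMAINS[i]:
                        if v != x[i]:
                            spec = g[:i] + (v,) + g[i + 1:]
                            if any(more_general_or_equal(spec, s) for s in S):
                                new_G.append(spec)
            # drop any member of G that is less general than another member
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

# Running this on T1–T4 above gives
#   S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
#   G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
# matching S4 and G4 in the trace.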
• If the algorithm can request the next training example (e.g. from a teacher), it can increase the
speed of convergence by requesting examples that split the version space
– E.g. T5: <Sunny, Warm, Normal, Light, Warm, Same> satisfies 3 of the hypotheses in the previous
example
∗ If T5 positive, S generalised, 3 hypotheses eliminated
∗ If T5 negative, G specialised, 3 hypotheses eliminated
– The optimal query strategy is to request examples that exactly split the version space – this
converges in ⌈log2 |VS|⌉ steps. However, this is not always possible.
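A sketch of such a query strategy (illustrative only; reuses matches from the earlier sketch):
score each candidate instance by how evenly the current version space splits on it, and request the
one closest to an even split.

def best_query(vs, candidate_instances):
    # the best query is the instance on which roughly half of the version space
    # votes positive, since either answer then eliminates about half of VS
    def imbalance(x):
        positive_votes = sum(matches(h, x) for h in vs)
        return abs(positive_votes - len(vs) / 2)
    return min(candidate_instances, key=imbalance)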
• When using (i.e. not training) a classifier that has not completely converged, new examples
may be
1. classed as positive by all h ∈ V S
2. classed as negative by all h ∈ V S
3. classed as positive by some, and negative by other, h ∈ V S
Cases 1 and 2 are unproblematic. In case 3 we may want to consider the proportion of positive vs.
negative classifications (but then the a priori probabilities of the hypotheses are relevant)
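A voting sketch covering the three cases (illustrative; reuses matches from the earlier sketch):
if every hypothesis agrees, return the class; otherwise report the proportion voting positive
(which, as noted, only approximates a probability if all hypotheses are a priori equally likely).

def classify(vs, x):
    positive_votes = sum(matches(h, x) for h in vs)
    if positive_votes == len(vs):
        return "positive"               # case 1: unanimous positive
    if positive_votes == 0:
        return "negative"               # case 2: unanimous negative
    return positive_votes / len(vs)     # case 3: proportion voting positive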
• An unbiased hypothesis space can be obtained by allowing hypotheses that are arbitrary
conjunctions, disjunctions and negations of our earlier hypotheses
– New problem: concept learning algorithm cannot generalise beyond observed examples!
∗ S boundary = disjunction of positive examples – exactly covers observed positive
examples
∗ G boundary = negation of disjunction of negative examples – exactly rules out observed
negative examples
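A tiny set-based illustration of this point (our own encoding, not from the lecture: each
hypothesis is the set of instances it labels positive, so arbitrary disjunctions are expressible).
S collapses to the observed positives, G to everything except the observed negatives, and every
unseen instance is labelled positive by exactly half of the version space, so voting never commits
beyond the training data.

from itertools import product

# a toy 2-attribute instance space under the unbiased (power-set) hypothesis space
X = set(product(("Sunny", "Rainy"), ("Warm", "Cold")))
positives = {("Sunny", "Warm")}
negatives = {("Rainy", "Cold")}

S = positives        # most specific consistent hypothesis: exactly the observed positives
G = X - negatives    # most general consistent hypothesis: all but the observed negatives
# the version space is every set h with S <= h <= G; an unseen instance lies in
# exactly half of these sets, so the vote over the version space is always tied
unseen = ("Sunny", "Cold")
assert S <= G and unseen in G - S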
• Since all inductive learning involves bias, it is useful to characterise learning approaches by
the type of bias they employ
• Consider
– concept learning algorithm L
– instances X, target concept c
– training examples D_c = {<x, c(x)>}
– let L(x_i, D_c) denote the classification, positive or negative, assigned to the instance x_i by L
after training on data D_c.
Definition:
The inductive bias of L is any minimal set of assertions B such that for any target
concept c and corresponding training examples D_c:
(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
i.e. the classification L assigns to any instance x_i follows deductively from B together with
D_c and the description of x_i.
[Figure: Inductive system – the Candidate-Elimination Algorithm, using hypothesis space H, takes
training examples and a new instance as input and outputs a classification of the new instance, or
"don't know"; inductive bias made explicit.]
• The version space with respect to a hypothesis space H and a set of training examples D is the
subset of all hypotheses in H consistent with all the examples in D.
• The version space may be compactly represented by recording its general boundary G and
specific boundary S.
Every hypothesis in the version space is guaranteed to lie between G and S by the version
space representation theorem.
• The Candidate-Elimination algorithm exploits this theorem by searching H for the version space,
using the examples in the training data D to progressively generalise the specific boundary and
specialise the general boundary.
• There are certain concepts the Candidate-Elimination algorithm cannot learn because of the
bias of the hypothesis space – every concept must be representable as a conjunction of
attribute values.
• In fact, all inductive learning supposes some a priori assumptions about the nature of the target
concept, or else there is no basis for generalisation beyond observed examples: bias-free
learning is futile.