Candidate-Elimination Algorithm

This document describes the candidate elimination algorithm for concept learning. It begins with a review of version spaces and the list-then-eliminate algorithm. The candidate elimination algorithm uses a more compact representation of the version space, tracking just its most general and most specific boundaries. It processes training examples by using positive examples to generalise the specific boundary and negative examples to specialise the general boundary.


Version Spaces + Candidate Elimination

Lecture Outline:

• Quick Review of Concept Learning and General-to-Specific Ordering

• Version Spaces

• The Candidate Elimination Algorithm

• Inductive Bias

Reading:
Chapter 2 of Mitchell

Version Spaces

• One limitation of the FIND-S algorithm is that it outputs just one hypothesis consistent with
the training data – there might be many.

To overcome this, introduce notion of version space and algorithms to compute it.

• A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example ⟨x, c(x)⟩ in D:

Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)

• The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D:

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

• Note the difference between the definitions of consistent and satisfies:
– an example x satisfies hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept
– an example x is consistent with hypothesis h iff h(x) = c(x)



The List-Then-Eliminate Algorithm

• Can represent version space by listing all members.

• Leads to List-Then-Eliminate concept learning algorithm:

1. VersionSpace ← a list containing every hypothesis in H

2. For each training example ⟨x, c(x)⟩:
   remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)

3. Output the list of hypotheses in VersionSpace

• List-Then-Eliminate works in principle, so long as the hypothesis space H is finite.

• However, since it requires exhaustively enumerating all hypotheses in H, it is not feasible in practice.

• Is there a more compact way to represent version spaces?

The Candidate-Elimination Algorithm

• The Candidate-Elimination algorithm is similar to the List-Then-Eliminate algorithm but uses a more compact representation of the version space:
– it represents the version space by its most general and most specific members

• For the EnjoySport example, Find-S outputs the hypothesis h = ⟨Sunny, Warm, ?, Strong, ?, ?⟩, which was one of 6 hypotheses consistent with the data.

S: { <Sunny, Warm, ?, Strong, ?, ?> }

<Sunny, ?, ?, Strong, ?, ?> <Sunny, Warm, ?, ?, ?, ?> <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

• The Candidate-Elimination algorithm represents the version space by recording only its most general members (G) and its most specific members (S)
– other, intermediate members in the general-to-specific ordering can be generated as needed (see the enumeration sketch below)
The Candidate-Elimination Algorithm (cont)

• The General boundary, G, of version space VS_{H,D} is the set of its maximally general members.

• The Specific boundary, S, of version space VS_{H,D} is the set of its maximally specific members.

• Version Space Representation Theorem:
Every member of the version space lies between these boundaries:

VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}

where x ≥_g y means x is more general than or equal to y.
(See Mitchell, p. 32, for a proof.)

• Intuitively, the Candidate-Elimination algorithm proceeds by
– initialising G and S to the maximally general and maximally specific hypotheses in H
– considering each training example in turn and
∗ using positive examples to drive the maximally specific boundary up
∗ using negative examples to drive the maximally general boundary down

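As a concrete illustration of the representation theorem, here is a minimal Python sketch (ours, not from the lecture) that enumerates a version space from its boundaries for the conjunctive representation used in these slides. Since any h with h ≥_g s must, at each attribute, either keep s's value or relax it to '?', it suffices to enumerate those relaxations and keep the ones that lie below some member of G:

```python
# Minimal sketch: enumerating the version space between the S and G
# boundaries for conjunctive hypotheses ('?' = any value).
from itertools import combinations

def more_general_or_equal(h1, h2):
    """h1 >=_g h2: every constraint of h1 is '?' or agrees with h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def version_space(S, G):
    """All h with g >=_g h >=_g s for some s in S and g in G."""
    vs = set()
    for s in S:
        constrained = [i for i, v in enumerate(s) if v != '?']
        for k in range(len(constrained) + 1):
            for relax in combinations(constrained, k):
                h = tuple('?' if i in relax else v for i, v in enumerate(s))
                if any(more_general_or_equal(g, h) for g in G):
                    vs.add(h)
    return vs

# Final boundaries from the EnjoySport example:
S = {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
G = {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
for h in sorted(version_space(S, G)):
    print(h)   # prints all 6 hypotheses of the version space
```

Run on the EnjoySport boundaries, this prints exactly the 6 hypotheses shown in the earlier diagram.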


The Candidate-Elimination Algorithm (cont)

G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H

For each training example d, do
• If d is a positive example
– Remove from G any hypothesis inconsistent with d
– For each hypothesis s in S that is not consistent with d
∗ Remove s from S
∗ Add to S all minimal generalizations h of s such that
1. h is consistent with d, and
2. some member of G is more general than h
∗ Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
– Remove from S any hypothesis inconsistent with d
– For each hypothesis g in G that is not consistent with d
∗ Remove g from G
∗ Add to G all minimal specializations h of g such that
1. h is consistent with d, and
2. some member of S is more specific than h
∗ Remove from G any hypothesis that is less general than another hypothesis in G

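The pseudocode above translates fairly directly into executable code. Below is a minimal Python sketch for the conjunctive-hypothesis representation used in these slides; it is our own illustration, not the lecture's code, and the helper names (candidate_elimination, min_generalizations, min_specializations, and the EMPTY marker for the match-nothing hypothesis) are assumptions of the sketch:

```python
# Sketch of CANDIDATE-ELIMINATION for conjunctive hypotheses over discrete
# attributes. '?' matches any value; EMPTY marks the maximally specific
# hypothesis, which matches no instance.
EMPTY = 'EMPTY'

def matches(h, x):
    """Does hypothesis h classify instance x as positive?"""
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general(h1, h2):
    """Strictly more general: h1 >_g h2 (EMPTY lies below everything)."""
    ge = all(a == '?' or b == EMPTY or a == b for a, b in zip(h1, h2))
    return ge and h1 != h2

def min_generalizations(s, x):
    """Minimal generalizations of s that cover positive instance x."""
    if EMPTY in s:                                   # first positive example
        return [tuple(x)]
    return [tuple(a if a == v else '?' for a, v in zip(s, x))]

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude negative instance x."""
    out = []
    for i, a in enumerate(g):
        if a == '?':
            for v in domains[i]:
                if v != x[i]:                        # constrain attribute i away from x
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {tuple(EMPTY for _ in range(n))}
    G = {tuple('?' for _ in range(n))}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}      # drop inconsistent g
            for s in [s for s in S if not matches(s, x)]:
                S.remove(s)
                for h in min_generalizations(s, x):
                    if any(more_general(g, h) or g == h for g in G):
                        S.add(h)
            S = {s for s in S if not any(more_general(s, t) for t in S)}
        else:
            S = {s for s in S if not matches(s, x)}  # drop inconsistent s
            for g in [g for g in G if matches(g, x)]:
                G.remove(g)
                for h in min_specializations(g, x, domains):
                    if not matches(h, x) and any(more_general(h, s) or h == s for s in S):
                        G.add(h)
            G = {g for g in G if not any(more_general(t, g) for t in G)}
    return S, G

# The EnjoySport data from the worked example on the next slide:
domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Light'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
            (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
            (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
            (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(examples, domains)
print('S =', S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print('G =', G)   # {('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')}
```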


The Candidate-Elimination Algorithm: Example

Training Examples:
T1: ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, Yes
T2: ⟨Sunny, Warm, High, Strong, Warm, Same⟩, Yes
T3: ⟨Rainy, Cold, High, Strong, Warm, Change⟩, No
T4: ⟨Sunny, Warm, High, Strong, Cool, Change⟩, Yes

Evolution of the specific boundary (driven up by the positive examples):

S0: {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
S1 (after T1): {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
S2, S3 (after T2; T3 leaves S unchanged): {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
S4 (after T4): {⟨Sunny, Warm, ?, Strong, ?, ?⟩}

Evolution of the general boundary (driven down by the negative example):

G0, G1, G2: {⟨?, ?, ?, ?, ?, ?⟩}
G3 (after T3): {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
G4 (after T4): {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

The final version space also contains the intermediate hypotheses
⟨Sunny, ?, ?, Strong, ?, ?⟩, ⟨Sunny, Warm, ?, ?, ?, ?⟩ and ⟨?, Warm, ?, Strong, ?, ?⟩.

The Candidate-Elimination Algorithm: Remarks

• The version space learned by the Candidate-Elimination algorithm will converge towards the correct hypothesis provided:
– there are no errors in the training examples
– there is a hypothesis in H that describes the target concept
If either condition fails, the algorithm may converge to an empty version space.

• If the algorithm can request the next training example (e.g. from a teacher), it can increase the speed of convergence by requesting examples that split the version space
– E.g. T5: ⟨Sunny, Warm, Normal, Light, Warm, Same⟩ satisfies 3 of the 6 hypotheses in the previous example
∗ If T5 is positive, S is generalised and 3 hypotheses are eliminated
∗ If T5 is negative, G is specialised and 3 hypotheses are eliminated
– The optimal query strategy is to request examples that exactly split the version space, converging in ⌈log₂ |VS|⌉ queries. However, this is not always possible.

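Given a version space, the effect of a candidate query is easy to check: count how many hypotheses classify it as positive. A small sketch (matches and VS are our own names), using the 6-hypothesis EnjoySport version space and the query T5 above:

```python
# Sketch: how does a candidate query split the version space?
def matches(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?', '?', '?'),
      ('?', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', '?', '?', '?'),
      ('?', 'Warm', '?', '?', '?', '?')]

T5 = ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same')
pos = sum(matches(h, T5) for h in VS)
print(f'{pos} of {len(VS)} classify T5 as positive')   # 3 of 6: an exact split
```

Whichever label T5 turns out to have, half the hypotheses are eliminated, which is why such queries are optimal.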


The Candidate-Elimination Algorithm: Remarks (cont)

• When using (i.e. not training) a classifier that has not completely converged, new examples may be
1. classed as positive by all h ∈ VS
2. classed as negative by all h ∈ VS
3. classed as positive by some h ∈ VS, and negative by others

Cases 1 and 2 are unproblematic. In case 3, we may want to consider the proportion of positive vs. negative classifications (but then the a priori probabilities of the hypotheses are relevant).

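In case 3, one simple scheme is to let the version-space hypotheses vote. A minimal sketch (classify_by_vote is our own name, and simple voting is one choice among several):

```python
# Sketch: classifying with a partially converged version space by voting.
def matches(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

def classify_by_vote(VS, x):
    pos = sum(matches(h, x) for h in VS)
    if pos == len(VS):
        return 'positive'                                     # case 1: unanimous
    if pos == 0:
        return 'negative'                                     # case 2: unanimous
    return f'uncertain ({pos}/{len(VS)} vote positive)'       # case 3

VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', '?', '?', '?')]
print(classify_by_vote(VS, ('Sunny', 'Cold', 'High', 'Strong', 'Cool', 'Same')))
# -> 'uncertain (1/2 vote positive)'
```

Note that interpreting the vote proportion as a probability implicitly assumes all hypotheses in VS are equally likely a priori.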


Inductive Bias

• As noted, the version space learned by the Candidate-Elimination algorithm will converge towards the correct hypothesis provided:
– there are no errors in the training examples
– there is a hypothesis in H that describes the target concept
What if there is no hypothesis in H that describes the target concept?

• Consider the training data:

Example  Sky     Temp  Humid   Wind    Water  Forecast  EnjoySport
1        Sunny   Warm  Normal  Strong  Warm   Same      Yes
2        Cloudy  Warm  Normal  Strong  Warm   Same      Yes
3        Rainy   Warm  Normal  Strong  Warm   Same      No

• No hypothesis in H is consistent with all 3 examples.
The most specific hypothesis consistent with examples 1 and 2 and representable in H is

⟨?, Warm, Normal, Strong, Warm, Same⟩

But this is inconsistent with example 3: it covers the Rainy instance, which is a negative example.

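This failure can be checked mechanically. A small sketch (our own helper names) computes the least general conjunctive generalization of examples 1 and 2 and tests it against example 3:

```python
# Sketch: no conjunctive hypothesis is consistent with all three examples.
def matches(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

ex1 = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')    # Yes
ex2 = ('Cloudy', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')   # Yes
ex3 = ('Rainy', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')    # No

# Least general generalization of the two positive examples:
h = tuple(a if a == b else '?' for a, b in zip(ex1, ex2))
print(h)                 # ('?', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(matches(h, ex3))   # True -- h covers the negative example 3
```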


Inductive Bias (cont)

• We need a more expressive hypothesis representation language,
e.g. one allowing disjunctive or negated attribute values:

Sky = Sunny ∨ Cloudy
Sky ≠ Rainy

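A sketch of one such richer representation (our own, not the lecture's): let each attribute constraint be a set of allowed values, so that disjunctions like Sky = Sunny ∨ Cloudy (equivalently Sky ≠ Rainy) become representable and the three examples above can be separated:

```python
# Sketch: set-valued attribute constraints represent disjunctions/negations.
def matches(h, x):
    return all(v in allowed for allowed, v in zip(h, x))

# Sky = Sunny ∨ Cloudy; every other attribute unconstrained:
h = ({'Sunny', 'Cloudy'},
     {'Warm', 'Cold'}, {'Normal', 'High'}, {'Strong', 'Light'},
     {'Warm', 'Cool'}, {'Same', 'Change'})

print(matches(h, ('Sunny',  'Warm', 'Normal', 'Strong', 'Warm', 'Same')))  # True  (Ex 1)
print(matches(h, ('Cloudy', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')))  # True  (Ex 2)
print(matches(h, ('Rainy',  'Warm', 'Normal', 'Strong', 'Warm', 'Same')))  # False (Ex 3)
```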


An Unbiased Learner

• What about ensuring every concept can be represented in H?
– Since concepts are subsets of the instance space X, we want H to be able to represent any set in the power set of X
∗ for EnjoySport there were 96 possible instances (3 × 2 × 2 × 2 × 2 × 2 attribute-value combinations)
∗ so the power set contains 2^96 ≈ 10^28 possible target concepts
∗ recall the biased conjunctive hypothesis space can represent only 973 of these

• We can do this by allowing hypotheses that are arbitrary conjunctions, disjunctions and negations of our earlier hypotheses
– New problem: the concept learning algorithm cannot generalise beyond the observed examples!
∗ the S boundary is the disjunction of the positive examples – it exactly covers the observed positive examples
∗ the G boundary is the negation of the disjunction of the negative examples – it exactly rules out the observed negative examples



An Unbiased Learner

• The capacity of Candidate-Elimination to generalise lies in its implicit bias assumption: that the target concept can be represented as a conjunction of attribute values.

• Fundamental property of inductive inference:
a learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
I.e. bias-free learning is futile.



Inductive Bias, More Formally

• Since all inductive learning involves bias, it is useful to characterise learning approaches by the type of bias they employ.

• Consider
– a concept learning algorithm L
– instances X, target concept c
– training examples D_c = {⟨x, c(x)⟩}
– let L(x_i, D_c) denote the classification, positive or negative, assigned to instance x_i by L after training on data D_c

Definition:
The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c

(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

where A ⊢ B means A logically entails B.



Modelling Inductive Systems by Deductive Systems

[Figure: modelling an inductive system by an equivalent deductive system]

Inductive system: the training examples and a new instance are fed to the Candidate-Elimination algorithm, which uses hypothesis space H to output a classification of the new instance, or "don't know".

Equivalent deductive system: the same training examples and new instance, together with the assertion "H contains the target concept", are fed to a theorem prover, which outputs the classification of the new instance, or "don't know".

The added assertion makes the inductive bias explicit: for Candidate-Elimination, the bias is the assumption that the target concept is contained in H.
Summary

• The version space with respect to a hypothesis space H and a set of training examples D is the
subset of all hypotheses in H consistent with all the examples in D.

• The version space may be compactly represented by recording its general boundary G and
specific boundary S.
Every hypothesis in the version space is guaranteed to lie between G and S by the version
space representation theorem.

• The Candidate-Elimination algorithm exploits this theorem: it searches H for the version space by using the examples in the training data D to progressively generalise the specific boundary and specialise the general boundary.

• There are certain concepts the Candidate-Elimination algorithm cannot learn because of the
bias of the hypothesis space – every concept must be representable as a conjunction of
attribute values.

• In fact, all inductive learning supposes some a priori assumptions about the nature of the target
concept, or else there is no basis for generalisation beyond observed examples: bias-free
learning is futile.
