2 Concept Learning
Learning
Waiting outside the house to get an autograph.
Which days does he come out to enjoy sports?
• Sky condition
• Humidity
• Temperature
• Wind
• Water
• Forecast
Learning Task
• We want to make a hypothesis about the days on which SRK comes out
– in the form of a Boolean function over the attributes of the day.
Training Examples for EnjoySport
Example  Sky    Temperature  Humidity  Wind    Water  Forecast  EnjoySport c(x)
x1       Sunny  Warm         Normal    Strong  Warm   Same      1
x2       Sunny  Warm         High      Strong  Warm   Same      1
x3       Rainy  Cold         High      Strong  Warm   Change    0
x4       Sunny  Warm         High      Strong  Cool   Change    1
Approaches to learning algorithms
• Brute-force search
– Enumerate all possible hypotheses and evaluate each against the data
– Highly inefficient even for the small EnjoySport example (counts checked in the sketch below):
|X| = 3·2·2·2·2·2 = 96 distinct instances
Large number of syntactically distinct hypotheses (with 0's and ?'s): |H| = 5·4·4·4·4·4 = 5120
Fewer when we consider h's with 0's: every h containing a 0 covers the empty set of instances (it classifies every instance as negative), hence the number of semantically distinct h's is 1 + (4·3·3·3·3·3) = 973
– The choice of the hypothesis space reduces the number of hypotheses
– EnjoySport is a VERY small problem compared to many
• Hence use other search procedures
– Approach 1: search based on an ordering of hypotheses (Find-S)
– Approach 2: search that finds all hypotheses that fit the data, using a good representation of the hypothesis space (candidate elimination)
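These counts can be verified directly. Below is a minimal sketch, assuming the standard EnjoySport attribute domains (3 values for Sky, 2 for each of the other five attributes):

```python
# Sanity-check of the instance- and hypothesis-space counts for EnjoySport.
# Assumed domain sizes: Sky has 3 values, the remaining five attributes have 2 each.
domain_sizes = [3, 2, 2, 2, 2, 2]

num_instances = 1
for d in domain_sizes:
    num_instances *= d          # |X| = 3*2*2*2*2*2 = 96

num_syntactic = 1
for d in domain_sizes:
    num_syntactic *= d + 2      # each attribute may also be '?' or '0': 5*4^5 = 5120

num_semantic = 1
for d in domain_sizes:
    num_semantic *= d + 1       # no 0's: each attribute is a value or '?'
num_semantic += 1               # plus the single empty concept: 4*3^5 + 1 = 973

print(num_instances, num_syntactic, num_semantic)   # 96 5120 973
```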
Ordering on Hypotheses
[Figure: instances X (left) and hypotheses H (right), ordered from specific to general]
Instances:
x1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
x2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩
x3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩
x4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩
Hypotheses (specific to general):
h1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
h2 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩
h3 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩
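Find-S produces this trace by repeatedly generalizing its hypothesis just enough to cover each positive example. A minimal sketch, assuming the conjunctive representation above with '?' as a wildcard and None standing in for the maximally specific hypothesis:

```python
# Minimal Find-S sketch for conjunctive hypotheses over discrete attributes.
# '?' matches any value; None plays the role of the maximally specific hypothesis.

def find_s(examples):
    """examples: list of (instance, label) pairs; label is True for positive TEs."""
    h = None                                  # start with the most specific hypothesis
    for x, label in examples:
        if not label:
            continue                          # Find-S ignores negative examples
        if h is None:
            h = list(x)                       # first positive example: copy it verbatim
        else:
            # minimally generalize h so that it still covers x
            h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
    return tuple(h) if h is not None else None

# Usage on the EnjoySport training data from the table above:
D = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```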
Problems with Find-S
• Problems:
– It throws away information: the negative examples are ignored
– It can't tell whether it has learned the concept
(depending on H, there might be several h's that fit the TEs; Find-S picks a maximally specific h. Why?)
– It can't tell when the training data is inconsistent, since it ignores negative TEs
• But
– It is simple
– Outcome is independent of order of examples
Why?
• What alternative overcomes these problems?
– Keep all consistent hypotheses!
Candidate elimination algorithm
Consistent Hypotheses and Version Space
• A hypothesis h is consistent with a set of training examples D of target concept c
if h(x) = c(x) for each training example ⟨x, c(x)⟩ in D
– Note that consistency is with respect to the specific D.
• Notation:
Consistent(h, D) ≡ ∀⟨x, c(x)⟩ ∈ D : h(x) = c(x)
• The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with D
• Notation:
VS_{H,D} = {h ∈ H | Consistent(h, D)}
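The definition translates directly into code. A small sketch, reusing the '?'/None conventions of the Find-S sketch above:

```python
# Sketch of Consistent(h, D) for the conjunctive hypothesis representation.
# '?' is a wildcard; None denotes the maximally specific hypothesis (covers nothing).

def covers(h, x):
    """True iff hypothesis h classifies instance x as positive."""
    if h is None:
        return False
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def consistent(h, D):
    """True iff h(x) = c(x) for every training example (x, c(x)) in D."""
    return all(covers(h, x) == label for x, label in D)
```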
List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩:
remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
• This is essentially a brute-force procedure (a sketch follows)
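A brute-force sketch of the procedure for EnjoySport, reusing covers() and the example list D from the sketches above. The third Sky value ('Cloudy') and the second Wind value ('Weak') are assumptions, since they never appear in the training data:

```python
from itertools import product

# List-Then-Eliminate: enumerate all of H explicitly, then prune per example.
# A single None stands in for all hypotheses containing a 0 (the empty concept).

domains = [
    ['Sunny', 'Cloudy', 'Rainy'],             # Sky ('Cloudy' assumed)
    ['Warm', 'Cold'], ['Normal', 'High'],
    ['Strong', 'Weak'],                       # Wind ('Weak' assumed)
    ['Warm', 'Cool'], ['Same', 'Change'],
]

def all_hypotheses():
    yield None                                            # the empty concept
    for h in product(*[d + ['?'] for d in domains]):      # every 0-free hypothesis
        yield h

def list_then_eliminate(D):
    version_space = list(all_hypotheses())                # step 1: all of H
    for x, label in D:                                    # step 2: prune per TE
        version_space = [h for h in version_space if covers(h, x) == label]
    return version_space                                  # step 3

vs = list_then_eliminate(D)
print(len(vs))   # 6: the S and G boundary hypotheses plus the intermediates
```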
Example of Find-S, Revisited
[Figure: Find-S trace over instances X (left) and hypotheses H (right), from specific to general]
Instances:
x1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩  (positive)
x2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩  (positive)
x3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩  (negative)
x4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩  (positive)
Hypotheses after each example:
h1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
h2 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩
h3 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩  (x3 is negative, so h is unchanged)
h4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩
Version Space for this Example
S = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
Representing Version Spaces
• Want a more compact representation of the VS
– Store the most general and most specific boundaries of the space
– Generate the intermediate h's in the VS only when needed
– Idea: any h in the VS must be consistent with all TEs
Generalize from the most specific boundary
Specialize from the most general boundary

Recall: if d is a positive example
• Remove from G every hypothesis inconsistent with d
• For each hypothesis s in S that is inconsistent with d:
– Remove s from S
– Add to S all minimal generalizations h of s that are consistent with d and are specializations of some hypothesis in G
– Remove from S every hypothesis that is more general than another hypothesis in S

Trace on the first example d1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩ (positive):
S0 = {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}    G0 = {⟨?, ?, ?, ?, ?, ?⟩}
S1 = {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}    G1 = {⟨?, ?, ?, ?, ?, ?⟩}
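The positive-example rule above can be sketched as follows, reusing covers() from the earlier sketch; '?' is the wildcard and None is the all-∅ hypothesis:

```python
# Sketch of the S-boundary (and G-boundary) update for a positive example x.

def min_generalization(s, x):
    """Minimal generalization of hypothesis s that covers instance x."""
    if s is None:
        return tuple(x)                       # from the all-empty hypothesis to x itself
    return tuple(si if si == xi else '?' for si, xi in zip(s, x))

def more_general_or_equal(g, h):
    """True iff g covers every instance that h covers."""
    if h is None:
        return True
    if g is None:
        return False
    return all(gi == '?' or gi == hi for gi, hi in zip(g, h))

def update_on_positive(S, G, x):
    # Remove from G every hypothesis inconsistent with the positive example.
    G = [g for g in G if covers(g, x)]
    new_S = []
    for s in S:
        if covers(s, x):
            new_S.append(s)                   # s is already consistent with x
        else:
            h = min_generalization(s, x)      # minimal generalization of s
            # keep h only if some member of G is more general than (or equal to) h
            if any(more_general_or_equal(g, h) for g in G):
                new_S.append(h)
    new_S = list(dict.fromkeys(new_S))        # drop duplicates
    # remove from S every hypothesis more general than another hypothesis in S
    S = [s for s in new_S
         if not any(t is not s and more_general_or_equal(s, t) for t in new_S)]
    return S, G
```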
Example (contd)
G1 = {⟨?, ?, ?, ?, ?, ?⟩}
G2 = {⟨?, ?, ?, ?, ?, ?⟩}
Example (contd)
G3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
Example (contd)
Example (contd)
S3 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
Example (contd)
d4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩ (positive)
Why does this example remove a hypothesis from G?
– The hypothesis ⟨?, ?, ?, ?, ?, Same⟩
– cannot be specialized, since it would then fail to cover the new (positive) TE
– cannot be generalized, because anything more general would cover the earlier negative TE
– hence the hypothesis must be dropped.
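The slides spell out only the positive-example rule; for completeness, here is a hedged sketch of the standard counterpart for negative examples (minimal specializations), reusing covers() and more_general_or_equal() from the sketches above and the assumed attribute domains:

```python
# Sketch of the G-boundary (and S-boundary) update for a negative example x.
# A minimal specialization replaces one '?' in g with a value that differs from x
# at that position, drawn from the (assumed) attribute domain.

def min_specializations(g, x, domains):
    """All minimal specializations of g that exclude instance x."""
    results = []
    for i, (gi, xi) in enumerate(zip(g, x)):
        if gi == '?':
            for value in domains[i]:
                if value != xi:
                    results.append(g[:i] + (value,) + g[i + 1:])
    return results

def update_on_negative(S, G, x, domains):
    # Remove from S every hypothesis that (wrongly) covers the negative example.
    S = [s for s in S if not covers(s, x)]
    new_G = []
    for g in G:
        if not covers(g, x):
            new_G.append(g)                   # g already excludes the negative example
        else:
            for h in min_specializations(g, x, domains):
                # keep h only if it is still more general than some member of S
                if any(more_general_or_equal(h, s) for s in S):
                    new_G.append(h)
    new_G = list(dict.fromkeys(new_G))        # drop duplicates
    # remove from G every hypothesis less general than another hypothesis in G
    G = [g for g in new_G
         if not any(t is not g and more_general_or_equal(t, g) for t in new_G)]
    return S, G
```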
Version Space of the Example
S = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
The version space lies between the S and G boundaries.
Convergence of algorithm
• Convergence is guaranteed if:
– there are no errors in the training data
– there is an h in H describing c.
• Ambiguity is removed from the VS when S = G
– containing a single h
– once enough TEs have been seen
• If a TE is a false negative, the algorithm will remove every h that classifies that instance as positive, and hence will remove the correct target concept from the VS
– If we observe enough such TEs, the S and G boundaries converge to an empty VS
Let us try this
And this
Which Next Training Example?
S = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
Unbiased Learners and Inductive Bias
• Approach:
– Place constraints on representation of
hypotheses
Example: limiting connectives to conjunctions
Allows learning of generalized hypotheses
Introduces a bias that depends on the hypothesis representation
• Need formal definition of inductive bias of learning
algorithm
Inductive Systems and Equivalent Deductive Systems
• Inductive bias made explicit in equivalent deductive
system
– A logically represented system that produces the same outputs (classifications) from the same inputs (TEs D, instance x, bias B) as the candidate elimination (CE) procedure
• Inductive bias (IB) of learning algorithm L is any
minimal set of assertions B such that for any target
concept c and training examples D, we can logically infer
value c(x) of any instance x from B, D, and x
– E.g., for rote learner, B = {}, and there is no IB
• Difficult to apply in many cases, but a useful guide
Inductive Bias and Specific Learning Algorithms
• Rote learners: no IB
• Version space candidate elimination algorithm: c can be represented in H
• Find-S: c can be represented in H; all instances that are not positive are negative
Computational Complexity of VS
Exponential size of G
• n Boolean attributes
• 1 positive example: (T, T, ..., T)
• n/2 negative examples, each with F in two adjacent positions and T elsewhere:
– (F, F, T, ..., T)
– (T, T, F, F, T, ..., T)
– (T, T, T, T, F, F, T, ..., T)
– ...
– (T, ..., T, F, F)
• Every hypothesis in G must choose, for each of the n/2 negative examples, which of its two F positions to constrain to T (see the check below)
– Number of hypotheses in G = 2^(n/2)
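This claim can be checked by brute force for small n. A sketch that enumerates all hypotheses over {T, F, ?} and keeps the maximally general ones consistent with the data:

```python
from itertools import product

# Brute-force check of |G| = 2^(n/2) for n Boolean attributes, one all-T positive
# example, and n/2 negative examples with F in two adjacent positions.

def covers_bool(h, x):
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def more_general_eq(g, h):
    return all(gi == '?' or gi == hi for gi, hi in zip(g, h))

def size_of_G(n):
    positive = ('T',) * n
    negatives = [tuple('F' if j in (2 * i, 2 * i + 1) else 'T' for j in range(n))
                 for i in range(n // 2)]
    consistent = [h for h in product('TF?', repeat=n)
                  if covers_bool(h, positive)
                  and not any(covers_bool(h, x) for x in negatives)]
    # G = the maximally general hypotheses among the consistent ones
    G = [h for h in consistent
         if not any(g != h and more_general_eq(g, h) for g in consistent)]
    return len(G)

for n in (2, 4, 6):
    print(n, size_of_G(n), 2 ** (n // 2))   # the last two numbers agree: 2, 4, 8
```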
Summary
• Concept learning as search through H
• General-to-specific ordering over H
• Version space candidate elimination algorithm
• S and G boundaries characterize learner’s uncertainty
• Learner can generate useful queries
• Inductive leaps possible only if learner is biased!
• Inductive learners can be modeled as equivalent deductive systems
• Biggest problem is inability to handle data with errors
– Overcome with procedures for learning decision trees