Concept Learning
INTRODUCTION
• Inducing general functions from specific training examples is a central problem of machine learning.
• Concept learning is a learning task in which a human or machine learner is trained to classify objects by being shown a set of example objects along with their class labels. The learner simplifies what has been observed by condensing it into a more general form (a concept).
• Concept learning is also known as category learning, concept attainment, and concept formation.
INTRODUCTION ….
• Concept Learning: acquiring the definition of a general category from a given sample of positive and negative training examples of the category.
What is a Concept? - Examples
■ An example of concept learning is learning the concept “bird” from given examples of birds (positive examples) and non-birds (negative examples).
■ The concept of a bird is the subset of all objects (i.e., of the set of all things or all animals) that belong to the category “bird”.
■ Each concept is a boolean-valued function defined over this larger set. [Example: a function defined over all animals whose value is true for birds and false for every other animal.]
A Concept Learning Task – EnjoySport Training Examples

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
Hypothesis Representation…..
■ Each attribute constraint can be:
■ ? – indicating that any value is acceptable for the attribute (don’t care)
■ a single value – specifying a single required value, e.g. Warm (specific)
■ 0 – indicating that no value is acceptable for the attribute (no value)
■ A hypothesis:
   Sky AirTemp Humidity Wind Water Forecast
   < Sunny, ?, ?, Strong, ?, Same >
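As an illustration (not from the original slides), a hypothesis can be held as a tuple of attribute constraints, and an instance is classified positive exactly when every constraint is satisfied. A minimal Python sketch:

def satisfies(h, x):
    # '?' accepts any value, a specific value must match exactly,
    # and '0' accepts no value (the hypothesis classifies everything as negative)
    return all(c != '0' and (c == '?' or c == v) for c, v in zip(h, x))

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(satisfies(h, x))  # True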
Hypothesis Representation…..
■ Most General Hypothesis: every day is a good day for water sports
   <?, ?, ?, ?, ?, ?> (every day is a positive example)
■ Most Specific Hypothesis: no day is a good day for water sports
   <0, 0, 0, 0, 0, 0> (no day is a positive example)
■ The EnjoySport concept learning task requires learning the set of days for which EnjoySport = Yes, describing this set by a conjunction of constraints over the instance attributes.
■ E.g., “Aldo enjoys his sport on cold days with high humidity” is represented by <?, Cold, High, ?, ?, ?>.
EnjoySport Concept Learning Task
■ Instances X: the set of all possible days, each described by the attributes
• Sky (Sunny, Cloudy, and Rainy)
• AirTemp (Warm and Cold)
• Humidity (Normal and High)
• Wind (Strong and Weak)
• Water (Warm and Cool)
• Forecast (Same and Change)
■ Target Concept (Function) c: EnjoySport : X → {0, 1}
■ Hypotheses H: each hypothesis is described by a conjunction of constraints on the attributes.
■ Training Examples D: positive and negative examples of the target function, ⟨x1, c(x1)⟩, ⟨x2, c(x2)⟩, …, ⟨xn, c(xn)⟩
EnjoySport Concept Learning Task….
■ Determine: a hypothesis h in H such that h(x) = c(x) for all x in D.
■ Members of the concept (instances for which c(x) = 1) are called positive examples.
■ Nonmembers of the concept (instances for which c(x) = 0) are called negative examples.
■ H represents the set of all possible hypotheses. H is determined by the human designer’s choice of a hypothesis representation.
■ The goal of concept learning is to find a hypothesis h : X → {0, 1} such that h(x) = c(x) for all x in D.
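For concreteness, the training set D can be held as a list of ⟨instance, label⟩ pairs. A minimal Python sketch using the four EnjoySport training examples from the table above (the representation is illustrative, not prescribed by the slides):

# Training examples D as <instance, c(instance)> pairs (True = positive)
D = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]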
Inductive Learning Hypothesis
■ Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Concept Learning As Search
■ Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
■ The goal of this search is to find the hypothesis that best fits the training examples.
■ The hypothesis space has a general-to-specific ordering of hypotheses.
■ By choosing a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent (and therefore can ever learn).
EnjoySport - Hypothesis Space
■ Sky has 3 possible values, and the other 5 attributes have 2 possible values each.
■ There are 96 (= 3·2·2·2·2·2) distinct instances in X.
■ There are 5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in H, since each attribute admits two additional constraints: ? and 0.
■ Every hypothesis containing one or more 0 symbols represents the empty set of instances, that is, it classifies every instance as negative.
■ Hence there are 973 (= 1 + 4·3·3·3·3·3) semantically distinct hypotheses in H: those built only from specific values and ?, plus one hypothesis representing the empty set of instances.
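A quick Python check of the counts above (illustrative only):

# Number of values per attribute: Sky has 3, the remaining five have 2
values = [3, 2, 2, 2, 2, 2]

instances = 1
for v in values:
    instances *= v          # 3*2*2*2*2*2 = 96 distinct instances

syntactic = 1
for v in values:
    syntactic *= v + 2      # each attribute also allows '?' and '0': 5*4^5 = 5120

semantic = 1
for v in values:
    semantic *= v + 1       # ignore '0': 4*3^5 = 972 non-empty hypotheses
semantic += 1               # plus the single hypothesis denoting the empty set

print(instances, syntactic, semantic)  # 96 5120 973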
Concept Learning As Search: General-to-Specific Ordering of Hypotheses
■ The hypothesis space has a general-to-specific ordering of hypotheses, which allows the search to be organized efficiently.
■ Definition: Let hj and hk be boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥g hk) if and only if
   for all x in X, [(hk(x) = 1) → (hj(x) = 1)]
■ Example: h1 = <Sunny, ?, ?, Strong, ?, ?>
           h2 = <Sunny, ?, ?, ?, ?, ?>
   Every instance classified as positive by h1 is also classified as positive by h2. Therefore h2 is more general than h1.
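For conjunctive hypotheses in this representation, the ≥g relation can be checked attribute by attribute. A minimal Python sketch (illustrative, not from the slides):

def more_general_or_equal(hj, hk):
    # Constraint cj covers constraint ck if cj is '?', if ck is '0',
    # or if both require the same specific value.
    return all(cj == '?' or ck == '0' or cj == ck for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 >=g h1
print(more_general_or_equal(h1, h2))  # False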
More-General-Than Relation
Find-S Algorithm: Finding a Maximally Specific Hypothesis
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
      For each attribute constraint ai in h
         If the constraint ai is satisfied by x, then do nothing
         Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

Hypothesis Search Using Find-S
Find-S Algorithm Example
■ h0 = <0, 0, 0, 0, 0, 0>
■ After x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, + : h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
■ After x2 = <Sunny, Warm, High, Strong, Warm, Same>, + : h2 = <Sunny, Warm, ?, Strong, Warm, Same>
■ x3 = <Rainy, Cold, High, Strong, Warm, Change>, − : negative examples are ignored, so h3 = h2
■ After x4 = <Sunny, Warm, High, Strong, Cool, Change>, + : h4 = <Sunny, Warm, ?, Strong, ?, ?>
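A minimal Python sketch of Find-S for this attribute-constraint representation (illustrative, not from the slides); run on the EnjoySport training data it returns the maximally specific consistent hypothesis:

def find_s(examples):
    """Find-S: start from the most specific hypothesis and minimally
    generalize it to cover each positive example; negatives are ignored."""
    n = len(examples[0][0])
    h = ['0'] * n                     # most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue                  # Find-S ignores negative examples
        for i, v in enumerate(x):
            if h[i] == '0':
                h[i] = v              # adopt the value of the first positive example
            elif h[i] != v:
                h[i] = '?'            # conflicting values: generalize to "don't care"
    return tuple(h)

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')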
Drawbacks of the Find-S Algorithm
■ Although Find-S finds a hypothesis consistent with the training data, it cannot tell whether it has found the only hypothesis consistent with the data.
■ Is it a good strategy to prefer the most specific hypothesis?
■ If the training examples contain noise or errors, they mislead the algorithm.
■ Find-S cannot backtrack; if there are several maximally specific consistent hypotheses, it reports only one of them.
Version Spaces
■ Definition: A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example ⟨x, c(x)⟩ in D.
■ The Candidate-Elimination algorithm represents the set of all hypotheses consistent with the observed training examples.
■ This subset of all hypotheses is called the version space with respect to the hypothesis space H and the training examples D, because it contains all plausible versions of the target concept.
■ Definition: The version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
   VS_H,D ≡ { h ∈ H | Consistent(h, D) }
List-Then-Eliminate Algorithm
(The version space is represented as an explicit list of hypotheses.)
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩
      Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
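A direct Python sketch of List-Then-Eliminate (illustrative only; enumerating H exhaustively is feasible only for very small hypothesis spaces). It reuses the satisfies() helper defined in the earlier sketch:

def satisfies(h, x):  # does hypothesis h classify instance x as positive?
    return all(c != '0' and (c == '?' or c == v) for c, v in zip(h, x))

def list_then_eliminate(hypotheses, examples):
    version_space = list(hypotheses)       # 1. every hypothesis in H
    for x, label in examples:              # 2. eliminate hypotheses that disagree
        version_space = [h for h in version_space if satisfies(h, x) == label]
    return version_space                   # 3. the surviving hypotheses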
A Compact Representation for the Version Space
■ The version space can be represented by its most specific and most general boundaries.
■ Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
■ Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
Candidate Elimination Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
   If d is a positive example
      • Remove from G any hypothesis inconsistent with d
      • For each hypothesis s in S that is not consistent with d
         – Remove s from S
         – Add to S all minimal generalizations h of s such that
            o h is consistent with d, and some member of G is more general than h
         – Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example
      • Remove from S any hypothesis inconsistent with d
      • For each hypothesis g in G that is not consistent with d
         – Remove g from G
         – Add to G all minimal specializations h of g such that
            o h is consistent with d, and some member of S is more specific than h
         – Remove from G any hypothesis that is less general than another hypothesis in G
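A compact Python sketch of the algorithm above for conjunctive hypotheses over discrete attributes (illustrative only: the helper names, the explicit domains argument, and the tuple representation are assumptions, not from the slides). It is reused for the worked examples that follow.

def satisfies(h, x):
    """Hypothesis h classifies instance x as positive."""
    return all(c != '0' and (c == '?' or c == v) for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 is more general than or equal to h2."""
    return all(a == '?' or b == '0' or a == b for a, b in zip(h1, h2))

def min_generalizations(s, x):
    """Minimal generalizations of s that cover the positive instance x."""
    return [tuple(v if c == '0' else (c if c == v else '?') for c, v in zip(s, x))]

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    specs = []
    for i, c in enumerate(g):
        if c == '?':
            specs += [g[:i] + (v,) + g[i + 1:] for v in domains[i] if v != x[i]]
    return specs

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {('0',) * n}, {('?',) * n}          # specific and general boundary sets
    for x, positive in examples:
        if positive:
            G = {g for g in G if satisfies(g, x)}
            for s in [s for s in S if not satisfies(s, x)]:
                S.remove(s)
                S |= {h for h in min_generalizations(s, x)
                      if any(more_general_or_equal(g, h) for g in G)}
            # drop members of S that are more general than another member of S
            S = {s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not satisfies(s, x)}
            for g in [g for g in G if satisfies(g, x)]:
                G.remove(g)
                G |= {h for h in min_specializations(g, x, domains)
                      if any(more_general_or_equal(h, s) for s in S)}
            # drop members of G that are less general than another member of G
            G = {g for g in G
                 if not any(g != t and more_general_or_equal(t, g) for t in G)}
    return S, G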
Initial Values
■ S0 = <0, 0, 0, 0, 0, 0> (the maximally specific hypothesis in H)
■ G0 = <?, ?, ?, ?, ?, ?> (the maximally general hypothesis in H)
Example: after seeing <Sunny, Warm, Normal, Strong, Warm, Same>, +
■ S1 = <Sunny, Warm, Normal, Strong, Warm, Same>
■ G1 = G0 = <?, ?, ?, ?, ?, ?>
Example: after seeing <Sunny, Warm, High, Strong, Warm, Same>, +
■ S2 = <Sunny, Warm, ?, Strong, Warm, Same>
■ G2 = G1 = <?, ?, ?, ?, ?, ?>
Example: after seeing <Rainy, Cold, High, Strong, Warm, Change>, −
■ S3 = S2 = <Sunny, Warm, ?, Strong, Warm, Same>
■ G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
Example: after seeing <Sunny, Warm, High, Strong, Cool, Change>, +
■ S4 = <Sunny, Warm, ?, Strong, ?, ?>
■ G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
The S boundary of the version space forms a summary of the
previously encountered positive examples that can be used to
determine whether any given hypothesis is consistent with these
examples.
Learned Version Space
■ S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
■ G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
■ Between these boundaries the version space also contains <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, and <?, Warm, ?, Strong, ?, ?> (six hypotheses in total).
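Running the candidate_elimination sketch given after the algorithm on the four EnjoySport examples reproduces these boundary sets (attribute domains taken from the task definition):

domains = [['Sunny', 'Cloudy', 'Rainy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(examples, domains)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}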
Remarks on the Candidate-Elimination Algorithm
■ The learned version space correctly describes the target concept, provided:
   – there are no errors in the training examples, and
   – the hypothesis space H contains a hypothesis that correctly describes the target concept.
Learning the Concept of “Japanese Economy Car” – Candidate Elimination
■ G0 = {<?, ?, ?, ?, ?>}
■ S0 = {<0, 0, 0, 0, 0>}
■ 1. Positive Example: <Japan, Honda, Blue, 1980, Economy>
■ G1 = {<?, ?, ?, ?, ?>}
■ S1 = {<Japan, Honda, Blue, 1980, Economy>}
■ 2. Negative Example: <Japan, Toyota, Green, 1970, Sports>
■ G2 = {<?, Honda, ?, ?, ?>, <?, ?, Blue, ?, ?>, <?, ?, ?, 1980, ?>, <?, ?, ?, ?, Economy>}
■ S2 = {<Japan, Honda, Blue, 1980, Economy>}
■ 3. Positive Example: <Japan, Toyota, Blue, 1990, Economy>
■ G3 = {<?, ?, Blue, ?, ?>, <?, ?, ?, ?, Economy>}
■ S3 = {<Japan, ?, Blue, ?, Economy>}
■ 4. Negative Example: <USA, Chrysler, Red, 1980, Economy>
■ G4 = {<?, ?, Blue, ?, ?>, <Japan, ?, ?, ?, Economy>}
■ S4 = {<Japan, ?, Blue, ?, Economy>}
■ 5. Positive Example: <Japan, Honda, White, 1980, Economy>
■ G5 = {<Japan, ?, ?, ?, Economy>}
■ S5 = {<Japan, ?, ?, ?, Economy>}
■ 6. Positive Example: <Japan, Toyota, Green, 1980, Economy>
■ G6 = {<Japan, ?, ?, ?, Economy>}
■ S6 = {<Japan, ?, ?, ?, Economy>}
■ 7. Negative Example: <Japan, Honda, Red, 1990, Economy>
■ This example is inconsistent with the version space (S6 classifies it as positive), so the version space collapses: no conjunctive hypothesis is consistent with all seven examples.
Learning the Concept – Candidate Elimination – Example 2
■ G0 = {<?, ?, ?>}
■ S0 = {<0, 0, 0>}
■ 1. Negative Example: <Big, Red, Circle>
■ G1 = {<Small, ?, ?>, <?, Blue, ?>, <?, ?, Triangle>}
■ S1 = {<0, 0, 0>}
■ 2. Negative Example: <Small, Red, Triangle>
■ G2 = {<Small, ?, Circle>, <?, Blue, ?>, <Big, ?, Triangle>}
   (The candidate specializations <Small, Blue, ?> and <?, Blue, Triangle> are removed because they are less general than <?, Blue, ?>, which is already in G.)
■ S2 = {<0, 0, 0>}
■ 3. Positive Example: <Small, Red, Circle>
■ G3 = {<Small, ?, Circle>}
■ S3 = {<Small, Red, Circle>}
■ 4. Negative Example: <Big, Blue, Circle>
■ G4 = {<Small, ?, Circle>}
■ S4 = {<Small, Red, Circle>}
■ 5. Positive Example: <Small, Blue, Circle>
■ G5 = {<Small, ?, Circle>}
■ S5 = {<Small, ?, Circle>}
■ S and G have converged to the single hypothesis <Small, ?, Circle>: the target concept has been learned exactly.
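Running the earlier candidate_elimination sketch on this example reproduces the converged result (attribute domains inferred from the specializations shown above: {Big, Small}, {Red, Blue}, {Circle, Triangle}):

domains = [['Big', 'Small'], ['Red', 'Blue'], ['Circle', 'Triangle']]
examples = [
    (('Big', 'Red', 'Circle'), False),
    (('Small', 'Red', 'Triangle'), False),
    (('Small', 'Red', 'Circle'), True),
    (('Big', 'Blue', 'Circle'), False),
    (('Small', 'Blue', 'Circle'), True),
]
S, G = candidate_elimination(examples, domains)
print(S == G == {('Small', '?', 'Circle')})  # True: the version space has converged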
Learning the Concept – Find-S
■ h0 = <0, 0, 0, 0>
■ 1. Positive Example: <Sam’s, Breakfast, Friday, Cheap>
■ h1 = <Sam’s, Breakfast, Friday, Cheap>
■ 2. Positive Example
■ h2 = <Sam’s, ?, ?, Cheap>
Ordering of Training Examples
■ The learned version space does not change with different orderings of the training examples; the efficiency of the search does.
■ Optimal strategy (if the learner is allowed to choose the next example): generate instances that satisfy half of the hypotheses in the current version space.
   For example, <Sunny, Warm, Normal, Light, Warm, Same> satisfies 3 of the 6 hypotheses in the learned version space.
■ Ideally the version space can be reduced by half with each experiment, so the correct target concept is found in about log2 |VS| experiments.
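A quick check of the "3 of the 6" claim, using the satisfies() helper from the earlier sketch and the six hypotheses of the learned EnjoySport version space:

def satisfies(h, x):  # as defined earlier
    return all(c != '0' and (c == '?' or c == v) for c, v in zip(h, x))

version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?',    '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?',      '?', '?'),
    ('?',     'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?',    '?', '?',      '?', '?'),
    ('?',     'Warm', '?', '?',      '?', '?'),
]
query = ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same')
print(sum(satisfies(h, query) for h in version_space))  # 3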
Use of Partially Learned Concepts
■ Even before S and G converge to a single hypothesis, the version space can classify some new instances: an instance that satisfies every member of S is classified positive, an instance that satisfies no member of G is classified negative, and otherwise the hypotheses in the version space disagree (a majority vote can be taken).
A Biased Hypothesis Space
■ Suppose three further examples are presented: <Sunny, Warm, Normal, Strong, Cool, Change> +, <Cloudy, Warm, Normal, Strong, Cool, Change> +, <Rainy, Warm, Normal, Strong, Cool, Change> −.
■ No hypothesis is consistent with these three examples under the assumption that the target is a conjunction of constraints: <?, Warm, Normal, Strong, Cool, Change> is too general, since it also covers the negative (Rainy) example.
■ The target concept exists in a different space H', which includes disjunctions, and in particular the hypothesis Sky=Sunny or Sky=Cloudy.
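A minimal sketch of classification with a partially learned version space, using only the S and G boundary sets (reusing the satisfies() helper; None signals that the version space is split):

def satisfies(h, x):  # as defined earlier
    return all(c != '0' and (c == '?' or c == v) for c, v in zip(h, x))

def classify(x, S, G):
    if all(satisfies(s, x) for s in S):
        return True    # every hypothesis in the version space says positive
    if not any(satisfies(g, x) for g in G):
        return False   # every hypothesis in the version space says negative
    return None        # the version space is split: classification is ambiguous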
An Unbiased Learner
■ Every possible subset of X is a possible target concept.
■ |H'| = 2^|X| = 2^96 (vs |H| = 973, a strong bias).
■ This amounts to allowing conjunction, disjunction, and negation, e.g.
   <Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>, i.e. Sky=Sunny ∨ Sky=Cloudy
■ We are then guaranteed that the target concept exists in H'.
■ However, no generalization is possible! Let’s see why …
No Generalization Without Bias!
■ Consider the version space after presenting three positive instances x1, x2, x3 and two negative instances x4, x5:
   S = {(x1 ∨ x2 ∨ x3)}
   G = {¬(x4 ∨ x5)}
   The version space consists of all subsets of X that include x1, x2, x3 and exclude x4, x5.
■ We can classify precisely only the examples already seen!
■ Take a majority vote? Unseen instances, e.g. x, are classified positive (and negative) by exactly half of the hypotheses: for any hypothesis h that classifies x as positive, there is a complementary hypothesis that classifies x as negative.
No Inductive Inference Without a Bias
■ A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying unseen instances.
■ The inductive bias of a learner is the set of assumptions that justify its inductive conclusions, or the policy it adopts for generalization.
■ Different learners can be characterized by their bias.
■ See next for a more formal definition of inductive bias …
Inductive Bias: Definition
Given:
   ■ a concept learning algorithm L for a set of instances X
   ■ a concept c defined over X
   ■ a set of training examples for c: Dc = {⟨x, c(x)⟩}
   ■ L(xi, Dc): the classification of xi produced by L after learning from Dc
Inductive inference (≻):
   (Dc ∧ xi) ≻ L(xi, Dc)
The inductive bias is defined as a minimal set of assumptions B such that (⊢ denotes deduction):
   (∀ xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
Inductive bias of Candidate-Elimination