
CONCEPT LEARNING

1
INTRODUCTION
• Inducing general functions from specific training examples is a central problem
of machine learning.
• Concept learning - a learning task in which a human or machine learner is
trained to classify objects by being shown a set of example objects along
with their class labels. The learner simplifies what has been observed by
condensing it in the form of an example.
• Concept learning - also known as category learning, concept attainment,
and concept formation.

2
INTRODUCTION ….
• Concept Learning: Acquiring the definition of a general category
from a given sample of positive and negative training examples of
the category.

■ A Formal Definition for Concept Learning:


• Inferring a boolean-valued function from training examples of its
input and output.
• Let us try to learn the definition of a concept from
examples.

3
What is a Concept? - Examples
■ An example for concept-learning is the learning of bird-concept
from the given examples of birds (positive examples) and non-
birds (negative examples).

■ The concept of a bird is the subset of all objects (i.e., the set of all
things or all animals) that belong to the category of bird.
■ Each concept is a boolean-valued function defined over this
larger set. [Example: a function defined over all animals whose
value is true for birds and false for every other animal]

4
5
A Concept Learning Task –
EnjoySport Training Examples

■ A set of example days, each described by six attributes.

■ The task is to learn to predict the value of EnjoySport for an arbitrary day, based on
the values of its other attributes. This is the target concept.
6
Hypothesis Representation
■ Goal: To infer the “best” concept-description from the set of all
possible hypotheses.
■ Each hypothesis consists of a conjunction of constraints on the
instance attributes.
■ Each hypothesis will be a vector of six constraints, specifying the
values of the six attributes
(Sky, AirTemp, Humidity, Wind, Water, and Forecast)

7
Hypothesis Representation…..
■ Each attribute constraint will be one of:
■ ? – indicating any value is acceptable for the attribute (don't care)
■ a single value – specifying a single required value, e.g. Warm (specific)
■ 0 – indicating no value is acceptable for the attribute (no value)
■ A hypothesis:
■ Sky AirTemp Humidity Wind Water Forecast
■ < Sunny, ?, ?, Strong, ?, Same >

8
Hypothesis Representation…..
■ Most general hypothesis: every day is a good day for water sports
<?, ?, ?, ?, ?, ?> (every day is a positive example)
■ Most specific hypothesis: no day is a good day for water sports
<0, 0, 0, 0, 0, 0> (no day is a positive example)
■ The EnjoySport concept learning task requires learning the set of
days for which EnjoySport = yes, describing this set by a
conjunction of constraints over the instance attributes.
■ E.g., "Aldo enjoys his sport only on cold days with high humidity" is
represented by <?, Cold, High, ?, ?, ?>

9
EnjoySport Concept Learning Task
■ Instances X: the set of all possible days, each described by the attributes
• Sky (Sunny, Cloudy, and Rainy)
• AirTemp (Warm and Cold)
• Humidity (Normal and High)
• Wind (Strong and Weak)
• Water (Warm and Cool)
• Forecast (Same and Change)
■ Target Concept (Function) c: EnjoySport : X → {0, 1}
■ Hypotheses H: Each hypothesis is described by a conjunction of
constraints on the attributes.
■ Training Examples D: positive and negative examples of the target
function <x1, c(x1)>, <x2, c(x2)>, …, <xn, c(xn)>

10
EnjoySport Concept Learning Task….
■ Determine: A hypothesis h in H such that h(x) = c(x) for all x in D.
■ Members of the concept (instances for which c(x)=1) are called
positive examples.
■ Nonmembers of the concept (instances for which c(x)=0) are called
negative examples.
■ H represents the set of all possible hypotheses. H is determined
by the human designer’s choice of a hypothesis representation.
■ The goal of concept-learning is to find a hypothesis
h: X → {0, 1} such that h(x)=c(x) for all x in D.

11
Inductive Learning Hypothesis

■ Any hypothesis found to approximate the target function well over
a sufficiently large set of training examples will also approximate
the target function well over other unobserved examples

12
Concept Learning As Search
■ Concept Learning can be viewed as the task of searching through
a large space of hypotheses implicitly defined by the hypothesis
representation.
■ The goal of this search is to find the hypothesis that best fits the
training examples.
■ The hypothesis space has a general-to-specific ordering of
hypotheses.
■ By choosing a hypothesis representation, the designer of the learning
algorithm implicitly defines the space of all hypotheses that the
program can ever represent

13
Enjoy Sport - Hypothesis Space
■ Sky has 3 possible values, and the other 5 attributes have 2 possible
values each.
■ There are 96 (= 3·2·2·2·2·2) distinct instances in X.
■ There are 5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in
H.
■ – Two more values for each attribute: ? and 0
■ Every hypothesis containing one or more 0 symbols represents the
empty set of instances, that is, it classifies every instance as
negative.
■ Hence, there are 973 (= 1 + 4·3·3·3·3·3) semantically distinct
hypotheses in H: each attribute takes one of its values or ?, plus
one hypothesis representing the empty set of instances.
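
These counts follow directly from the attribute domain sizes. A quick arithmetic check (a small Python sketch; the variable names are illustrative):

```python
# Domain sizes for Sky, AirTemp, Humidity, Wind, Water, Forecast (from the slide above)
domain_sizes = [3, 2, 2, 2, 2, 2]

instances = 1
for d in domain_sizes:
    instances *= d          # 3*2*2*2*2*2 = 96 distinct instances

syntactic = 1
for d in domain_sizes:
    syntactic *= d + 2      # each attribute also allows '?' and '0'  ->  5*4^5 = 5120

semantic = 1
for d in domain_sizes:
    semantic *= d + 1       # only the attribute values plus '?'
semantic += 1               # plus the single empty hypothesis       ->  1 + 4*3^5 = 973

print(instances, syntactic, semantic)   # 96 5120 973
```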

14
Concept Learning As Search: General-
to- Specific Ordering of Hypotheses
■ The hypothesis space has a general-to-specific ordering of
hypotheses, and the search can be efficiently organized.

15
Concept Learning As Search: General-
to- Specific Ordering of Hypotheses
■ Definition: Let hj and hk be boolean-valued functions defined over X.
Then hj is more-general-than-or-equal-to hk if and only if
for all x in X, [(hk(x) = 1) → (hj(x) = 1)]
Example: h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Every instance that is classified as positive by h1 will also be
classified as positive by h2. Therefore h2 is
more general than h1.
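
For the conjunctive hypotheses used here, this relation can be checked attribute by attribute. A minimal sketch (the '?'/'0' encoding and the function names are illustrative, not from the slides):

```python
def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """True if every instance satisfying hk also satisfies hj."""
    if '0' in hk:                       # hk covers no instance at all
        return True
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))    # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))    # False
```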

16
More General than Relation

17
Find-S Algorithm:
Finding a Maximally Specific Hypothesis
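
The algorithm steps and the worked example appear on the following slides as figures. The sketch below captures the usual Find-S procedure: start from the most specific hypothesis and minimally generalize it on each positive example, ignoring negatives. The function name and the '?'/'0' encoding are assumptions of this sketch, and the data are the four EnjoySport examples used in the later candidate-elimination trace.

```python
def find_s(examples, n_attrs):
    """examples: list of (instance_tuple, label) pairs; label True = positive."""
    h = ['0'] * n_attrs                      # most specific hypothesis <0, 0, ..., 0>
    for x, positive in examples:
        if not positive:
            continue                         # Find-S simply ignores negative examples
        for i, v in enumerate(x):
            if h[i] == '0':
                h[i] = v                     # first positive example: copy its values
            elif h[i] != v:
                h[i] = '?'                   # generalize any attribute that disagrees
    return h

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(data, 6))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```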

18
Hypothesis search using Find-S

19
Find-S algorithm example

20
21
22
Drawbacks of Find S algorithm
■ Although Find-S finds a hypothesis consistent with the training data, it does
not indicate whether that is the only consistent hypothesis.
■ Is it a good strategy to prefer the most specific hypothesis?
■ If the training examples contain noise or errors, they mislead the
algorithm.
■ Find-S cannot backtrack if there are several maximally specific
consistent hypotheses.

23
Version Spaces
■ Definition: A hypothesis h is consistent with a set of training
examples D iff h(x) = c(x) for each example <x, c(x)> in D.
■ The Candidate-Elimination algorithm represents the set of all
hypotheses consistent with the observed training examples.
■ This subset of all hypotheses is called the version space with
respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept.
■ Definition: The version space, denoted VSH,D with respect to
hypothesis space H and training examples D, is the subset of
hypotheses from H consistent with the training examples in D.
VSH,D {h  H | Consistent(h, D)}

24
List-Then-Eliminate algorithm
Version space as list of hypotheses
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>
Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
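
A direct, if inefficient, rendering of the algorithm: enumerate every hypothesis and keep those consistent with all training examples. A sketch under the conjunctive '?'/'0' representation used in these slides (helper names are illustrative):

```python
from itertools import product

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """h is consistent with D iff h(x) = c(x) for every training example."""
    return all(matches(h, x) == label for x, label in examples)

def list_then_eliminate(domains, examples):
    # Enumerate the semantically distinct hypotheses: each attribute is '?' or a value,
    # plus the single all-'0' hypothesis that classifies every instance as negative.
    space = list(product(*[['?'] + list(d) for d in domains]))
    space.append(('0',) * len(domains))
    return [h for h in space if consistent(h, examples)]
```

On the EnjoySport task this scans all 973 semantically distinct hypotheses, which is exactly why the compact boundary-set representation of the following slides is preferred.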

25
A Compact Representation for Version Space
■ Version space can be represented by its most specific and most
general boundaries.
■ Definition: The general boundary G, with respect to hypothesis
space H and training data D, is the set of maximally general
members of H consistent with D.
■ Definition: The specific boundary S, with respect to hypothesis
space H and training data D, is the set of minimally general (i.e.,
maximally specific) members of H consistent with D.

26
27
Candidate Elimination Algorithm
■ For each training example d, do
■ If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
– Remove s from S
– Add to S all minimal generalizations h of s such that
o h is consistent with d, and some member of G is
more general than h
– Remove from S any hypothesis that is more general than
another hypothesis in S

28
■ If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
– Remove g from G
– Add to G all minimal specializations h of g such that
o h is consistent with d, and some member of S is more
specific than h
– Remove from G any hypothesis that is less general than another
hypothesis in G
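
The boundary-set updates above translate fairly directly into code. Below is a Python sketch for the conjunctive representation used in these slides; the helper names and the '?'/'0' encoding are my own, and minimal generalization/specialization are implemented only for this attribute-value case.

```python
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    if '0' in hk:                                     # hk covers nothing
        return True
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

def min_generalization(s, x):
    """The minimal generalization of s that covers the positive instance x."""
    s = list(s)
    for i, v in enumerate(x):
        if s[i] == '0':
            s[i] = v
        elif s[i] != v:
            s[i] = '?'
    return tuple(s)

def min_specializations(g, domains, x):
    """Minimal specializations of g that exclude the negative instance x."""
    out = []
    for i, v in enumerate(x):
        if g[i] == '?':
            for value in domains[i]:
                if value != v:
                    h = list(g)
                    h[i] = value
                    out.append(tuple(h))
    return out

def candidate_elimination(domains, examples):
    n = len(domains)
    S = {('0',) * n}                                  # most specific boundary
    G = {('?',) * n}                                  # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}       # drop inconsistent general hypotheses
            new_S = set()
            for s in S:
                if matches(s, x):
                    new_S.add(s)
                else:
                    h = min_generalization(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.add(h)
            S = {s for s in new_S                     # keep only the maximally specific members
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not matches(s, x)}   # drop inconsistent specific hypotheses
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                else:
                    for h in min_specializations(g, domains, x):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.add(h)
            G = {g for g in new_G                     # keep only the maximally general members
                 if not any(g != t and more_general_or_equal(t, g) for t in new_G)}
    return S, G
```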

29
■ Initialize S to the set of maximally specific (minimally general) hypotheses in H
■ Initialize G to the set of maximally general hypotheses in H
■ S0 = <0, 0, 0, 0, 0, 0>
■ G0 = <?, ?, ?, ?, ?, ?>
30
Initial Values

■ S0: <0, 0, 0, 0, 0, 0>

■ G0: <?, ?, ?, ?, ?, ?>

31

32
Example:
after seeing Sunny, Warm, Normal, Strong, Warm, Same  +

■ S0: , , , , . 

■ S1: Sunny, Warm, Normal, Strong, Warm, Same

■ G0, G1 : ?, ?, ?, ?, ?, ?

33
Example:
after seeing Sunny, Warm, High, Strong, Warm, Same  +

■ S2: Sunny, Warm, ?, Strong, Warm, Same

■ G0, G1 G2:  ?, ?, ?, ?, ?, ?

34
Example:
after seeing Rainy, Cold, High, Strong, Warm, Change  

■ S2, S3: Sunny, Warm, ?, Strong, Warm, Same

■ G3: Sunny, ?, ?, ?, ?, ? ?, Warm, ?, ?, ?, ? ?, ?, ?, ?, ?, Same

■ G2:  ?, ?, ?, ?, ?, ?

35
Example:
after seeing Sunny, Warm, High, Strong, Cool Change  +
■ S3 : Sunny, Warm, ?, Strong, Warm, Same

■ S4 : Sunny, Warm, ?, Strong, ?, ?

■ G4: Sunny, ?, ?, ?, ?, ? ?, Warm, ?, ?, ?, ?

■ G3: Sunny, ?, ?, ?, ?, ? ?, Warm, ?, ?, ?, ? ?, ?, ?, ?, ?, Same

36
■ The S boundary of the version space forms a summary of the
previously encountered positive examples that can be used to
determine whether any given hypothesis is consistent with these
examples.

■ The G boundary summarizes the information from previously
encountered negative examples. Any hypothesis more specific
than G is assured to be consistent with past negative examples.

37
Learned Version Space

38
Remarks on C-E algorithm
■ The learned version space correctly describes the target
concept, provided:
– There are no errors in the training examples
– There is some hypothesis in H that correctly describes the target
concept
■ If S and G converge to a single hypothesis, the concept is
exactly learned
■ In case of errors in the training data, useful hypotheses are discarded
and no recovery is possible
■ An empty version space means no hypothesis in H is consistent
with the training examples

39
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination

40
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ G0 = <?, ?, ?, ?, ?>
■ S0 = <0, 0, 0, 0, 0>

■ 1. Positive Example
■ <Japan, Honda, Blue, 1980, Economy>
■ G1 = <?, ?, ?, ?, ?>
■ S1 = <Japan, Honda, Blue, 1980, Economy>

41
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ G1 = <?, ?, ?, ?, ?>
■ S1 = <Japan, Honda, Blue, 1980, Economy>

■ 2. Negative Example
■ <Japan, Toyota, Green, 1970, Sports>

■ G2 = {<?, Honda, ?, ?, ?>, <?, ?, Blue, ?, ?>, <?, ?, ?, 1980, ?>,
<?, ?, ?, ?, Economy>}
■ S2 = <Japan, Honda, Blue, 1980, Economy>

42
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ G2 = {<?, Honda, ?, ?, ?>, <?, ?, Blue, ?, ?>, <?, ?, ?, 1980, ?>,
<?, ?, ?, ?, Economy>}
■ S2 = <Japan, Honda, Blue, 1980, Economy>

■ 3. Positive Example
■ <Japan, Toyota, Blue, 1990, Economy>

■ G3 = {<?, ?, Blue, ?, ?>, <?, ?, ?, ?, Economy>}
■ S3 = <Japan, ?, Blue, ?, Economy>

43
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ G3 = {<?, ?, Blue, ?, ?>, <?, ?, ?, ?, Economy>}
■ S3 = <Japan, ?, Blue, ?, Economy>

■ 4. Negative Example
■ <USA, Chrysler, Red, 1980, Economy>

■ G4 = {<?, ?, Blue, ?, ?>, <Japan, ?, ?, ?, Economy>}
■ S4 = <Japan, ?, Blue, ?, Economy>

44
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ G4 = {<?, ?, Blue, ?, ?>, <Japan, ?, ?, ?, Economy>}
■ S4 = <Japan, ?, Blue, ?, Economy>

■ 5. Positive Example
■ <Japan, Honda, White, 1980, Economy>

■ G5 = <Japan, ?, ?, ?, Economy>
■ S5 = <Japan, ?, ?, ?, Economy>

45
Learning the Concept of “Japanese
Economy Car” – Candidate Elimination
■ 6. Positive Example
■ <Japan, Toyota, Green, 1980, Economy>
■ G6 = <Japan, ?, ?, ?, Economy>
■ S6 = <Japan, ?, ?, ?, Economy>

■ 7. Negative Example
■ <Japan, Honda, Red, 1990, Economy>
■ This example is inconsistent with the version space: the version
space collapses, and no conjunctive hypothesis is consistent with all the data.
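
In code the collapse shows up as both boundary sets becoming empty. A usage sketch, again assuming the candidate_elimination function from the earlier sketch is in scope; the attribute domains listed here are inferred from the values that appear in this example:

```python
domains = [('Japan', 'USA'), ('Honda', 'Toyota', 'Chrysler'),
           ('Blue', 'Green', 'Red', 'White'), ('1970', '1980', '1990'),
           ('Economy', 'Sports')]
data = [
    (('Japan', 'Honda',    'Blue',  '1980', 'Economy'), True),
    (('Japan', 'Toyota',   'Green', '1970', 'Sports'),  False),
    (('Japan', 'Toyota',   'Blue',  '1990', 'Economy'), True),
    (('USA',   'Chrysler', 'Red',   '1980', 'Economy'), False),
    (('Japan', 'Honda',    'White', '1980', 'Economy'), True),
    (('Japan', 'Toyota',   'Green', '1980', 'Economy'), True),
    (('Japan', 'Honda',    'Red',   '1990', 'Economy'), False),
]
S, G = candidate_elimination(domains, data)
print(S, G)   # set() set() -- both boundaries are empty: the version space has collapsed
```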

46
Learning the Concept – Candidate
Elimination – Example 2

SIZE   COLOR  SHAPE     CLASS
Big    Red    Circle    No
Small  Red    Triangle  No
Small  Red    Circle    Yes
Big    Blue   Circle    No
Small  Blue   Circle    Yes

47
Learning the Concept– Candidate
Elimination
■ G0 = <?, ?, ?>
■ S0 = <0, 0, 0>

■ 1. Negative Example
■ <Big, Red, Circle>
■ G1 = {<Small, ?, ?>, <?, Blue, ?>, <?, ?, Triangle>}
■ S1 = <0, 0, 0>

48
Learning the Concept– Candidate
Elimination
■ G1 = {<Small, ?, ?>, <?, Blue, ?>, <?, ?, Triangle>}
■ S1 = <0, 0, 0>

■ 2. Negative Example
■ <Small, Red, Triangle>
■ G2 = {<Small, Blue, ?>, <Small, ?, Circle>, <?, Blue, ?>,
<Big, ?, Triangle>, <?, Blue, Triangle>}
■ S2 = <0, 0, 0>

49
Learning the Concept– Candidate
Elimination
■ G2 = {<Small, Blue, ?>, <Small, ?, Circle>, <?, Blue, ?>,
<Big, ?, Triangle>, <?, Blue, Triangle>}
■ S2 = <0, 0, 0>

■ 3. Positive Example
■ <Small, Red, Circle>
■ G3 = <Small, ?, Circle>
■ S3 = <Small, Red, Circle>

50
Learning the Concept– Candidate
Elimination
■ G3 = <Small, ?, Circle>
■ S3 = <Small, Red, Circle>

■ 4. Negative Example
■ <Big, Blue, Circle>
■ G4 = <Small, ?, Circle>
■ S4 = <Small, Red, Circle>

51
Learning the Concept– Candidate
Elimination
■ G4 = <Small, ?, Circle>
■ S4 = <Small, Red, Circle>

■ 5. Positive Example
■ <Small, Blue, Circle>
■ G5 = <Small, ?, Circle>
■ S5 = <Small, ?, Circle>

■ Test = <Big, Red, Circle> = ?

52
Learning the Concept– Find-S

Restaurant  Meal       Day       Cost       Target Function
Sam's       Breakfast  Friday    Cheap      Yes
Hilton      Lunch      Friday    Expensive  No
Sam's       Lunch      Saturday  Cheap      Yes
Dennis      Breakfast  Sunday    Cheap      No
Sam's       Breakfast  Sunday    Expensive  No

53
Learning the Concept– Find-S

■ h0 = <0, 0, 0, 0>

■ 1. Positive Example
■ <Sam’s, Breakfast, Friday, Cheap>
■ h1 = <Sam’s, Breakfast, Friday, Cheap>

54
Learning the Concept– Find-S

■ h1 = <Sam’s, Breakfast, Friday, Cheap>

■ 2. Negative Example – Ignore


■ 3. Positive Example
■ <Sam’s, Lunch, Saturday, Cheap>
■ h2 = <Sam’s, ? , ? , Cheap>

55
Learning the Concept– Find-S

■ h2 = <Sam’s, ? , ? , Cheap>

■ 4. Negative Example – Ignore


■ 5. Negative Example – Ignore

■ Hence, Maximally Specific Hypothesis:


■ h2 = <Sam’s, ? , ? , Cheap>

■ Test = <Sam’s, Lunch, Thursday, Cheap> = ?
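
As a check, the find_s sketch given with the Find-S algorithm slide reproduces this hypothesis on the restaurant data (assuming that function is in scope; the tuple encoding is the same illustrative one used there):

```python
data = [
    (("Sam's",  'Breakfast', 'Friday',   'Cheap'),     True),
    (('Hilton', 'Lunch',     'Friday',   'Expensive'), False),
    (("Sam's",  'Lunch',     'Saturday', 'Cheap'),     True),
    (('Dennis', 'Breakfast', 'Sunday',   'Cheap'),     False),
    (("Sam's",  'Breakfast', 'Sunday',   'Expensive'), False),
]
h = find_s(data, 4)
print(h)   # ["Sam's", '?', '?', 'Cheap']

test = ("Sam's", 'Lunch', 'Thursday', 'Cheap')
print(all(c == '?' or c == v for c, v in zip(h, test)))   # True: h classifies the test day as positive
```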

56
Ordering on training examples
■ The learned version space does not change with different orderings of
the training examples
■ Efficiency does
■ Optimal strategy (if the learner is allowed to choose the next example):
– Generate instances that satisfy exactly half the hypotheses in the current version space.
For example, <Sunny, Warm, Normal, Light, Warm, Same> satisfies 3 of the 6 hypotheses.
– Ideally the version space can be reduced by half with each experiment
– The correct target is then found in log2 |VS| experiments
Use of partially learned concepts

Classified as positive by all hypotheses, since it satisfies every
hypothesis in S
Classifying new examples

Classified as negative by all hypotheses, since it does not satisfy any
hypothesis in G
Classifying new examples

Uncertain classification: half of the hypotheses are consistent with it,
half are not
Classifying new examples

<Sunny, Cold, Normal, Strong, Warm, Same>

is satisfied by 2 hypotheses and rejected by the other 4
Probably a negative instance. Majority vote?
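
A majority vote over the version space is easy to compute once its members are listed. The six hypotheses below are reconstructed from the S4 and G4 boundaries of the EnjoySport trace (the slide shows them only in a figure, so this listing is an assumption of the sketch):

```python
# Version-space members lying between S4 = <Sunny, Warm, ?, Strong, ?, ?>
# and G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),   # S boundary
    ('Sunny', '?',    '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?',      '?', '?'),
    ('?',     'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?',    '?', '?',      '?', '?'),   # G boundary
    ('?',     'Warm', '?', '?',      '?', '?'),   # G boundary
]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

x = ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')
positive_votes = sum(matches(h, x) for h in version_space)
print(positive_votes, len(version_space) - positive_votes)   # 2 positive, 4 negative -> probably negative
```
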
Hypothesis space and bias
■ What if H does not contain the target concept?
■ Can we improve the situation by extending the hypothesis space?
■ Will this influence the ability to generalize?
■ These are general questions for inductive inference, addressed here in the
context of Candidate-Elimination
■ Suppose we include in H every possible hypothesis, including the
ability to represent disjunctive concepts
Extending the hypothesis space
   Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1  Sunny   Warm     Normal    Strong  Cool   Change    YES
2  Cloudy  Warm     Normal    Strong  Cool   Change    YES
3  Rainy   Warm     Normal    Strong  Cool   Change    NO

■ No hypothesis is consistent with these three examples under the assumption that the
target is a conjunction of constraints:
<?, Warm, Normal, Strong, Cool, Change> is too general
■ The target concept exists in a different space H', which includes disjunction, and in
particular the hypothesis
Sky=Sunny or Sky=Cloudy
An unbiased learner
■ Every possible subset of X is a possible target:
|H'| = 2^|X| = 2^96 (vs. |H| = 973, a strong bias)
■ This amounts to allowing conjunction, disjunction, and negation:
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
i.e., Sunny(Sky) ∨ Cloudy(Sky)
■ We are now guaranteed that the target concept exists in H'
■ However, no generalization is possible!
Let's see why …
No generalization without bias!
■ Version space after presenting three positive instances x1, x2, x3 and two negative
instances x4, x5:
S = {(x1 ∨ x2 ∨ x3)}
G = {¬(x4 ∨ x5)}
… i.e., all subsets of X that include x1, x2, x3 and exclude x4, x5
■ We can classify precisely only the examples already seen!
■ Take a majority vote?
– Any unseen instance x is classified positive by exactly half of the
hypotheses and negative by the other half:
– for any hypothesis h that classifies x as positive, there is a complementary
hypothesis ¬h that classifies x as negative
No inductive inference without a bias
■ A learner that makes no a priori assumptions regarding the identity of the
target concept has no rational basis for classifying unseen instances
■ The inductive bias of a learner is the set of assumptions that justify its inductive
conclusions, i.e., the policy adopted for generalization
■ Different learners can be characterized by their bias
■ See next for a more formal definition of inductive bias …
Inductive bias: definition
■ Given:
– a concept learning algorithm L for a set of instances X
– a concept c defined over X
– a set of training examples for c: Dc = {<x, c(x)>}
– L(xi, Dc): the classification of xi produced by L after training on Dc
■ Inductive inference ( ≻ ):
(Dc ∧ xi) ≻ L(xi, Dc)
■ The inductive bias is defined as a minimal set of assumptions B such that
(⊢ denotes deduction)
(∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
Inductive bias of Candidate-Elimination

■ Assume L is defined as follows:
– compute VSH,D
– classify a new instance only by complete agreement of all the hypotheses in VSH,D
■ Then the inductive bias of Candidate-Elimination is simply
B = {c ∈ H}
■ In fact, by assuming c ∈ H:
1. c ∈ VSH,D, since VSH,D includes all hypotheses in H consistent with D
2. L(xi, Dc) outputs a classification only "by complete agreement", hence every hypothesis,
including c, agrees with L(xi, Dc)
Inductive system
Equivalent deductive system
Each learner has an inductive bias
■ Three learners with three different inductive biases:
1. Rote learner: no inductive bias; it just stores examples and can
classify only previously observed instances
2. Candidate-Elimination: the target concept can be represented as a
conjunction of constraints (c ∈ H)
3. Find-S: the concept is in H (a conjunction of constraints) plus "all
instances are negative unless seen as positive examples" (stronger
bias)
■ The stronger the bias, the greater the ability to generalize and classify
new instances (the greater the inductive leaps).
