
CONCEPT LEARNING

It is a process of abstraction and generalization from the data.

Concept learning requires three things:

1. Input - Training dataset, which is a set of training instances.

2. Output - Target concept or target function f. It is a mapping function f(x) from input x to output y.

3. Test - New instances to test the learned model.

Concept learning is defined as: "Given a set of hypotheses, the learner searches through the hypothesis space to identify the best hypothesis that matches the target concept."

Here, in this set of training instances:
The independent attributes considered are 'Horns', 'Tail', 'Tusks', 'Paws', 'Fur', 'Color', 'Hooves' and 'Size'.
The dependent attribute is 'Elephant'. The target concept is to identify whether the animal is an Elephant.
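To make the worked examples below concrete, the five training instances of Table 3.1 can be written as attribute tuples. This is a minimal sketch: the attribute order follows the list above, the values are transcribed from the instances used later in this section, and the variable names are illustrative, not from the text.

```python
# Training instances from Table 3.1 (attribute order:
# Horns, Tail, Tusks, Paws, Fur, Color, Hooves, Size).
# Each entry is (instance, label); label True means 'Elephant = Yes'.
ATTRIBUTES = ["Horns", "Tail", "Tusks", "Paws", "Fur", "Color", "Hooves", "Size"]

TRAINING_DATA = [
    (("No",  "Short", "Yes", "No",  "No",  "Black", "No",  "Big"),    True),   # I1
    (("Yes", "Short", "No",  "No",  "No",  "Brown", "Yes", "Medium"), False),  # I2
    (("No",  "Short", "Yes", "No",  "No",  "Black", "No",  "Medium"), True),   # I3
    (("No",  "Long",  "No",  "Yes", "Yes", "White", "No",  "Medium"), False),  # I4
    (("No",  "Short", "Yes", "Yes", "Yes", "Black", "No",  "Big"),    True),   # I5
]

for instance, is_elephant in TRAINING_DATA:
    print(dict(zip(ATTRIBUTES, instance)), "->", is_elephant)
```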
Representation of a Hypothesis

• A hypothesis 'h' approximates a target function 'f' to represent the relationship between the independent attributes and the dependent attribute of the training instances.

• Each hypothesis is represented as a conjunction of attribute conditions in the antecedent part.

• For example, (Tail = Short) ∧ (Color = Black) ...

• The set of hypotheses in the search space is called hypotheses ('hypotheses' is simply the plural of 'hypothesis').

• 'H' is used to represent the set of hypotheses (the hypothesis space).

• 'h' is used to represent a candidate hypothesis.

• Each attribute can take the value '?', the value 'φ', or a single specific value.

• '?' denotes that the attribute can take any value [e.g., Color = ?].
• 'φ' denotes that the attribute cannot take any value, i.e., it represents a null value [e.g., Horns = φ].
• A single value denotes a specific value from the acceptable values of the attribute; e.g., the attribute 'Tail' can take the value 'Short' [Tail = Short].

• The different hypotheses that can be predicted for the target concept are:

The most general hypothesis allows any value for each of the attributes.
It is represented as <?, ?, ?, ?, ?, ?, ?, ?>.
This hypothesis indicates that any animal can be an elephant.

The most specific hypothesis does not allow any value for any of the attributes: <φ, φ, φ, φ, φ, φ, φ, φ>.
This hypothesis indicates that no animal can be an elephant.
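These three cases can be captured in a small predicate that tests whether a hypothesis classifies an instance as positive. A minimal sketch; the names ANY, PHI, and matches are illustrative choices (PHI standing in for φ), not notation from the text.

```python
ANY = "?"    # '?': the attribute may take any value
PHI = "phi"  # stand-in for 'φ': the attribute may take no value at all

def matches(hypothesis, instance):
    """Return True iff the hypothesis classifies the instance as positive.

    Every attribute condition in the conjunction must hold:
    '?' always holds, 'phi' never holds, and a single value must
    equal the instance's value for that attribute.
    """
    for h_val, x_val in zip(hypothesis, instance):
        if h_val == PHI:                     # phi rejects every value
            return False
        if h_val != ANY and h_val != x_val:  # a fixed value must match exactly
            return False
    return True

# The most general hypothesis accepts any instance;
# the most specific hypothesis (all phi) accepts none.
most_general  = (ANY,) * 8
most_specific = (PHI,) * 8
elephant = ("No", "Short", "Yes", "No", "No", "Black", "No", "Big")
print(matches(most_general, elephant))   # True
print(matches(most_specific, elephant))  # False
```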
Hypothesis space
• Hypothesis space is the set of all possible hypotheses that approximate the target function f.

• The subset of the hypothesis space that is consistent with all observed training instances is called the Version Space.

• The version space contains the only hypotheses that are used for the classification.
• For example, each of the attributes given in Table 3.1 has the following possible set of values.
• Considering these values for each of the attributes, there are (2 x 2 x 2 x 2 x 2 x 3 x 2 x 2) = 384 distinct instances, which cover all the 5 instances in the training dataset.

So, we can generate (4 x 4 x 4 x 4 x 4 x 5 x 4 x 4) = 81,920 distinct hypotheses when including the two extra values [?, φ] for each of the attributes.
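Both counts are simple products over the number of values each attribute can take; a quick check (the per-attribute counts below are read off the products above, with 'Color' as the three-valued attribute):

```python
from math import prod

# Distinct values per attribute in Table 3.1:
# Horns, Tail, Tusks, Paws, Fur, Color, Hooves, Size.
value_counts = [2, 2, 2, 2, 2, 3, 2, 2]

# Distinct instances: one concrete value per attribute.
instances = prod(value_counts)                   # 2*2*2*2*2*3*2*2 = 384

# Distinct syntactic hypotheses: each attribute may also be '?' or phi,
# i.e., two extra "values" per attribute.
hypotheses = prod(n + 2 for n in value_counts)   # 4*4*4*4*4*5*4*4 = 81,920

print(instances, hypotheses)  # 384 81920
```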
Heuristic search space
• Heuristic search is a search strategy that finds an optimized hypothesis/solution to a problem by iteratively improving the hypothesis/solution based on a given heuristic function or cost measure.

• Several commonly used heuristic search methods are hill climbing, constraint satisfaction, best-first search, simulated annealing, the A* algorithm, and genetic algorithms (a minimal hill-climbing sketch follows below).
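The sketch below illustrates only the generic "iteratively improve a candidate using a cost measure" idea behind hill climbing; the function names and the toy problem are illustrative assumptions, not from the text.

```python
import random

def hill_climb(initial, neighbors, cost, max_steps=1000):
    """Generic hill climbing: repeatedly move to a lower-cost neighbor.

    `neighbors(s)` yields candidate solutions near s, and `cost(s)` is the
    heuristic measure being minimized. Stops at a local optimum.
    """
    current = initial
    for _ in range(max_steps):
        candidates = list(neighbors(current))
        if not candidates:
            break
        best = min(candidates, key=cost)
        if cost(best) >= cost(current):   # no improving neighbor: local optimum
            break
        current = best
    return current

# Toy usage: minimize (x - 7)^2 over the integers by stepping +/- 1.
result = hill_climb(
    initial=random.randint(-100, 100),
    neighbors=lambda x: [x - 1, x + 1],
    cost=lambda x: (x - 7) ** 2,
)
print(result)  # 7
```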
Generalization and Specialization

• The hypothesis space can be searched by generalization of the most specific hypothesis and by specialization of the most general hypothesis, for an approximate hypothesis that matches all positive instances but does not match any negative instance.

Searching the Hypothesis Space
• There are two ways of learning a hypothesis consistent with all training instances from the large hypothesis space:

1. Specialization - General to Specific learning

2. Generalization - Specific to General learning

• Generalization - Specific to General Learning:
This learning methodology searches through the hypothesis space for an approximate hypothesis by generalizing the most specific hypothesis.

Example: Consider the training instances shown in Table 3.1 and illustrate Specific to General learning.

Solution: We will start from all false, i.e., the most specific hypothesis, to determine the most restrictive specialization. Consider only the positive instances and generalize the most specific hypothesis; ignore the negative instances.
• The most specific hypothesis is taken first; it will not classify any instance as true.
• h = <φ, φ, φ, φ, φ, φ, φ, φ>
• Read the first instance I1 and generalize the hypothesis h so that this positive instance is classified as true by the resulting hypothesis h1.

I1: <No, Short, Yes, No, No, Black, No, Big> → Yes (Positive instance)

h1 = <No, Short, Yes, No, No, Black, No, Big>

• The second instance I2 is a negative instance, so ignore it: h2 = h1.

I2: <Yes, Short, No, No, No, Brown, Yes, Medium> → No (Negative instance)

h2 = <No, Short, Yes, No, No, Black, No, Big>

• The third instance I3 is a positive instance, so generalize h2 to h3 to accommodate it. The resulting h3 is more general (the mismatching 'Size' value becomes '?').

I3: <No, Short, Yes, No, No, Black, No, Medium> → Yes (Positive instance)

h3 = <No, Short, Yes, No, No, Black, No, ?>

• Ignore I4 since it is a negative instance, so h4 = h3.

I4: <No, Long, No, Yes, Yes, White, No, Medium> → No (Negative instance)

h4 = <No, Short, Yes, No, No, Black, No, ?>

• When reading the fifth instance I5, h4 is further generalized to h5 (the mismatching 'Paws' and 'Fur' values become '?').

I5: <No, Short, Yes, Yes, Yes, Black, No, Big> → Yes (Positive instance)

h5 = <No, Short, Yes, ?, ?, Black, No, ?>

• After observing all the positive instances, an approximate hypothesis h5 is generated, which can now classify any subsequent positive instance as true.
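The generalization steps above are mechanical, so they can be sketched in a few lines of Python: start from the all-φ hypothesis and, for each positive instance, adopt its value where the hypothesis is still φ and relax to '?' on any mismatch. The tuple encoding and names are illustrative assumptions.

```python
ANY, PHI = "?", "phi"

def generalize(h, positive):
    """Minimally generalize h so that it covers the positive instance."""
    return tuple(
        x if hv == PHI else          # phi -> adopt the instance's value
        hv if hv == x else ANY       # mismatch -> relax to '?'
        for hv, x in zip(h, positive)
    )

def specific_to_general(data, n_attrs):
    h = (PHI,) * n_attrs             # most specific hypothesis
    for instance, is_positive in data:
        if is_positive:              # negative instances are ignored
            h = generalize(h, instance)
    return h

TRAINING_DATA = [
    (("No",  "Short", "Yes", "No",  "No",  "Black", "No",  "Big"),    True),
    (("Yes", "Short", "No",  "No",  "No",  "Brown", "Yes", "Medium"), False),
    (("No",  "Short", "Yes", "No",  "No",  "Black", "No",  "Medium"), True),
    (("No",  "Long",  "No",  "Yes", "Yes", "White", "No",  "Medium"), False),
    (("No",  "Short", "Yes", "Yes", "Yes", "Black", "No",  "Big"),    True),
]
print(specific_to_general(TRAINING_DATA, 8))
# ('No', 'Short', 'Yes', '?', '?', 'Black', 'No', '?')  -> h5 above
```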
Example 2:
Consider the sample training instances shown in Table 1, which describe the symptoms of persons and their Covid-19 test results. Apply Specific to General learning to search for an approximate hypothesis in the hypothesis space.
Specialization - General to Specific Learning
• This learning methodology searches through the hypothesis space for an approximate hypothesis by specializing the most general hypothesis.
• Example: Illustrate learning by Specialization - General to Specific learning for the data instances shown in Table 3.1.
• Start from the most general hypothesis, which will classify all positive and negative instances as true.

• h = <?, ?, ?, ?, ?, ?, ?, ?>

I1: <No, Short, Yes, No, No, Black, No, Big> → Yes (Positive instance)

h1 = <?, ?, ?, ?, ?, ?, ?, ?>

I2: <Yes, Short, No, No, No, Brown, Yes, Medium> → No (Negative instance)

h2 = <No, ?, ?, ?, ?, ?, ?, ?>
     <?, Long, ?, ?, ?, ?, ?, ?>
     <?, ?, Yes, ?, ?, ?, ?, ?>
     <?, ?, ?, Yes, ?, ?, ?, ?>
     <?, ?, ?, ?, Yes, ?, ?, ?>
     <?, ?, ?, ?, ?, Black, ?, ?>
     <?, ?, ?, ?, ?, ?, No, ?>
     <?, ?, ?, ?, ?, ?, ?, Big>

h2 imposes constraints so that it will not classify a negative instance as true: each hypothesis in h2 fixes one attribute to a value that contradicts the negative instance I2.


I3: <No, Short, Yes, No, No, Black, No, Medium> → Yes (Positive instance)

• h3 = h2

h3 = <No, ?, ?, ?, ?, ?, ?, ?>
     <?, Long, ?, ?, ?, ?, ?, ?>
     <?, ?, Yes, ?, ?, ?, ?, ?>
     <?, ?, ?, Yes, ?, ?, ?, ?>
     <?, ?, ?, ?, Yes, ?, ?, ?>
     <?, ?, ?, ?, ?, Black, ?, ?>
     <?, ?, ?, ?, ?, ?, No, ?>
     <?, ?, ?, ?, ?, ?, ?, Big>
• I4 is a negative instance:

I4: <No, Long, No, Yes, Yes, White, No, Medium> → No (Negative instance)

Remove any hypothesis inconsistent with this negative instance, i.e., any hypothesis that classifies I4 as true. Only three hypotheses survive:

h4 = <?, ?, Yes, ?, ?, ?, ?, ?>
     <?, ?, ?, ?, ?, Black, ?, ?>
     <?, ?, ?, ?, ?, ?, ?, Big>


I5: <No, Short, Yes, Yes, Yes, Black, No, Big> → Yes (Positive instance)

• h5 = h4

h5 = <?, ?, Yes, ?, ?, ?, ?, ?>
     <?, ?, ?, ?, ?, Black, ?, ?>
     <?, ?, ?, ?, ?, ?, ?, Big>

Thus, h5 is the set of hypotheses generated, which will classify the positive instances as true and the negative instances as false.
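The walkthrough uses two operations on negative instances: specializing hypotheses that cover the negative (by fixing one '?' attribute to a contradicting value), and pruning covering hypotheses later on. Below is a minimal sketch of those two operations; it is a simplification, not the full candidate elimination algorithm, and one detail differs from the slides: specializing on I2 also produces <?, ?, ?, ?, ?, White, ?, ?>, which I4 then prunes, so the final set still matches h5. All names are illustrative.

```python
ANY = "?"

# Possible values per attribute (Horns, Tail, Tusks, Paws, Fur, Color, Hooves, Size).
DOMAINS = [("Yes", "No"), ("Short", "Long"), ("Yes", "No"), ("Yes", "No"),
           ("Yes", "No"), ("Black", "Brown", "White"), ("Yes", "No"),
           ("Big", "Medium")]

def matches(h, x):
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def specializations(h, negative):
    """One-attribute specializations of h that exclude the negative instance."""
    out = []
    for i, nv in enumerate(negative):
        if h[i] == ANY:                    # only '?' positions can be tightened
            out.extend(h[:i] + (v,) + h[i + 1:]
                       for v in DOMAINS[i] if v != nv)
    return out

def general_to_specific(data, n_attrs):
    G = [(ANY,) * n_attrs]                 # start with the most general hypothesis
    for x, positive in data:
        if positive:
            continue                       # as in the walkthrough, positives leave G unchanged
        survivors = [h for h in G if not matches(h, x)]
        if survivors:
            G = survivors                  # later negatives: prune covering hypotheses
        else:                              # every hypothesis covers the negative: specialize
            G = [s for h in G for s in specializations(h, x)]
    return G

TRAINING_DATA = [
    (("No",  "Short", "Yes", "No",  "No",  "Black", "No",  "Big"),    True),
    (("Yes", "Short", "No",  "No",  "No",  "Brown", "Yes", "Medium"), False),
    (("No",  "Short", "Yes", "No",  "No",  "Black", "No",  "Medium"), True),
    (("No",  "Long",  "No",  "Yes", "Yes", "White", "No",  "Medium"), False),
    (("No",  "Short", "Yes", "Yes", "Yes", "Black", "No",  "Big"),    True),
]
for h in general_to_specific(TRAINING_DATA, 8):
    print(h)   # prints the three hypotheses of h5
```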
Example 2:
Consider the sample training instances shown in Table 1, which describe the symptoms of persons and their Covid-19 test results. Apply General to Specific learning to search for an approximate hypothesis in the hypothesis space.
Hypothesis Space Search by Find-S Algorithm

• The Find-S algorithm is guaranteed to converge to the most specific hypothesis in H that is consistent with the positive instances in the training dataset.

• This algorithm considers only the positive instances and eliminates negative instances while generating the hypothesis.

• Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer or not in their final semester. Apply the Find-S algorithm.
• Step 1: Initialize 'h' to the most specific hypothesis. There are 6 attributes, so for each attribute we initially fill 'φ' in the initial hypothesis 'h'.
• h = <φ, φ, φ, φ, φ, φ>
• Step 2: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance, so generalize the most specific hypothesis 'h' to include this positive instance. Hence:

I1: < >=9, Yes, Excellent, Good, Fast, Yes > → Positive instance

h1 = < >=9, Yes, Excellent, Good, Fast, Yes >

• Step 3: Scan the next instance I2. Since I2 is a positive instance, generalize 'h' to include it. For each non-matching attribute value in 'h', put a '?'. The third attribute value of 'h' mismatches with I2, so put a '?' there.

I2: < >=9, Yes, Good, Good, Fast, Yes > → Positive instance

h2 = < >=9, Yes, ?, Good, Fast, Yes >


• Now scan I3. Since it is a negative instance, ignore it. Hence, the hypothesis remains the same without any change after scanning I3.
• h3 = h2

I3: < >=8, No, Good, Good, Fast, No > → Negative instance (ignore it)

h3 = < >=9, Yes, ?, Good, Fast, Yes >
• Now scan I4. Since it is a positive instance, check for mismatches between the hypothesis 'h' and I4.
• The 5th and 6th attribute values mismatch, so put a '?' in those attributes of 'h'.

h3 = < >=9, Yes, ?, Good, Fast, Yes >

I4: < >=9, Yes, Good, Good, Slow, No > → Positive instance

h4 = < >=9, Yes, ?, Good, ?, ? >

• The final hypothesis generated with the Find-S algorithm is:

h = < >=9, Yes, ?, Good, ?, ? >

• It includes all positive instances and ignores every negative instance.
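The whole procedure fits in a few lines. Below is a minimal Python sketch run on the Table 3.2 instances transcribed from the steps above; the tuple encoding and names are assumptions for illustration.

```python
PHI = "phi"

def find_s(data, n_attrs):
    """Find-S: maximally specific hypothesis consistent with the positives."""
    h = [PHI] * n_attrs                    # step 1: most specific hypothesis
    for instance, positive in data:
        if not positive:
            continue                       # negative instances are ignored
        for i, value in enumerate(instance):
            if h[i] == PHI:
                h[i] = value               # first positive: copy its values
            elif h[i] != value:
                h[i] = "?"                 # mismatch: generalize to '?'
    return tuple(h)

# Table 3.2 instances as used in the walkthrough; label True = job offer.
TABLE_3_2 = [
    ((">=9", "Yes", "Excellent", "Good", "Fast", "Yes"), True),   # I1
    ((">=9", "Yes", "Good",      "Good", "Fast", "Yes"), True),   # I2
    ((">=8", "No",  "Good",      "Good", "Fast", "No"),  False),  # I3
    ((">=9", "Yes", "Good",      "Good", "Slow", "No"),  True),   # I4
]

print(find_s(TABLE_3_2, 6))
# ('>=9', 'Yes', '?', 'Good', '?', '?')  -> matches h4 above
```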


Limitations of Find-S Algorithm
• The Find-S algorithm tries to find a hypothesis that is consistent with the positive instances, ignoring all negative instances.
• As long as the training dataset is consistent, the hypothesis found by this algorithm is also consistent.
• The algorithm finds only one unique hypothesis, whereas there may be many other hypotheses that are consistent with the training dataset.
• Often the training dataset may contain some errors; such inconsistent data instances can mislead this algorithm in determining the consistent hypothesis, since it ignores negative instances.
Example: Applying Find-S to the classic EnjoySport training instances.

• Step 1:
• I1 = <Sunny, Warm, Normal, Strong, Warm, Same> → Yes (+ve)
• h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

• Step 2:
• h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
• I2 = <Sunny, Warm, High, Strong, Warm, Same> → Yes (+ve)
• h2 = <Sunny, Warm, ?, Strong, Warm, Same>

• Step 3:
• h2 = <Sunny, Warm, ?, Strong, Warm, Same>
• I3 = <Rainy, Cold, High, Strong, Warm, Change> → No (-ve)
• I3 is a negative example, hence it is ignored.
• h3 = h2
• h3 = <Sunny, Warm, ?, Strong, Warm, Same>

• Step 4:
• h3 = <Sunny, Warm, ?, Strong, Warm, Same>
• I4 = <Sunny, Warm, High, Strong, Cool, Change> → Yes (+ve)
• h4 = <Sunny, Warm, ?, Strong, ?, ?>

• The final maximally specific hypothesis is:
• <Sunny, Warm, ?, Strong, ?, ?>
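Running the same Find-S sketch on these instances reproduces the trace above; the function is repeated here so the snippet runs on its own.

```python
PHI = "phi"

def find_s(data, n_attrs):
    h = [PHI] * n_attrs
    for instance, positive in data:
        if not positive:
            continue                       # Find-S ignores negatives
        for i, value in enumerate(instance):
            if h[i] == PHI:
                h[i] = value
            elif h[i] != value:
                h[i] = "?"
    return tuple(h)

ENJOY_SPORT = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),   # I1
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),   # I2
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),  # I3
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),   # I4
]

print(find_s(ENJOY_SPORT, 6))
# ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```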
