Find-S Algorithm
► The Find-S algorithm is a basic concept learning algorithm in machine learning. It
identifies the most specific hypothesis that matches all of the positive training examples.
► Find-S starts with the most specific hypothesis and generalizes it only when the
hypothesis fails to cover an observed positive training example.
► Note that the algorithm considers only the positive training examples; negative
examples are ignored.
HOW DOES IT WORK?
Consider the following data set, which records whether particular seeds are poisonous.
► First, we initialize the hypothesis to the most specific hypothesis, with one ϕ for each of
the four attributes. Hence, our hypothesis is:
h = {ϕ, ϕ, ϕ, ϕ}
► Consider example 1 :
► The data in example 1 is { GREEN, HARD, NO, WRINKLED }, a positive example. Our initial
hypothesis is too specific to cover it, so we generalize it to match this example. Hence, the
hypothesis becomes:
h = { GREEN, HARD, NO, WRINKLED }
► Consider example 2 :
This example has a negative outcome, so we ignore it and our hypothesis remains the
same.
h = { GREEN, HARD, NO, WRINKLED }
► Consider example 3 :
This example also has a negative outcome, so we ignore it and our hypothesis remains the
same.
h = { GREEN, HARD, NO, WRINKLED }
► Consider example 4 :
The data in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare each attribute with the
corresponding value in the current hypothesis, and wherever there is a mismatch we replace that attribute
with the general case ('?'). After this step, the hypothesis becomes:
h = { ?, HARD, NO, WRINKLED }
► Consider example 5 :
The data in example 5 is { GREEN, SOFT, YES, SMOOTH }. Comparing each attribute with the current
hypothesis in the same way, every remaining specific attribute mismatches, so each is replaced with the
general case ('?'). The hypothesis becomes:
h = { ?, ?, ?, ? }
► Since every attribute in our hypothesis now holds the general value '?', examples 6 and 7
cannot change it, and the hypothesis stays fully general:
h = { ?, ?, ?, ? }
► Hence, for the given data the final hypothesis would be :
Final Hypothesis: h = { ?, ?, ?, ? }
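The walkthrough above can be replayed as a short trace. This is a minimal Python sketch, not the slides' own code: the helper name `generalize` is hypothetical, and because the original table is not reproduced here, the attribute values for negative examples 2 and 3 are placeholders (Find-S never reads them, so they do not affect the result).

```python
PHI = "ϕ"  # most specific value: no attribute value is acceptable
ANY = "?"  # most general value: any attribute value is acceptable

def generalize(h, x):
    """Minimally generalize hypothesis h so that it covers example x."""
    return tuple(
        xi if hi == PHI else (hi if hi == xi else ANY)
        for hi, xi in zip(h, x)
    )

# Examples 1-5 from the walkthrough as (attributes, is_positive) pairs.
# Values for the negative examples 2 and 3 are hypothetical placeholders.
examples = [
    (("GREEN", "HARD", "NO", "WRINKLED"), True),    # example 1
    (("GREEN", "HARD", "YES", "SMOOTH"), False),    # example 2 (placeholder)
    (("BROWN", "SOFT", "NO", "WRINKLED"), False),   # example 3 (placeholder)
    (("ORANGE", "HARD", "NO", "WRINKLED"), True),   # example 4
    (("GREEN", "SOFT", "YES", "SMOOTH"), True),     # example 5
]

h = (PHI,) * 4          # start from the most specific hypothesis
trace = []
for i, (x, positive) in enumerate(examples, start=1):
    if positive:        # negative examples leave h unchanged
        h = generalize(h, x)
    trace.append(h)
    print(f"after example {i}: h = {{{', '.join(h)}}}")
```

The printed hypotheses follow the slides: { GREEN, HARD, NO, WRINKLED } after example 1, unchanged through examples 2 and 3, { ?, HARD, NO, WRINKLED } after example 4, and { ?, ?, ?, ? } after example 5.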
► To understand the Find-S algorithm, you also need a basic idea of the
following concepts:
1. Concept Learning
2. General Hypothesis
3. Specific Hypothesis
► The Find-S algorithm follows the steps written below:
1. Initialize ‘h’ to the most specific hypothesis.
2. The Find-S algorithm considers only the positive examples and ignores
negative examples. For each positive example, the algorithm checks each
attribute in turn. If the hypothesis value is still ϕ, it is replaced by the
example's value. If the attribute value is the same as the hypothesis value,
the algorithm moves on without any change. If the attribute value differs
from the hypothesis value, the algorithm changes the hypothesis value to ‘?’.
3. After the last positive example has been processed, ‘h’ is the output hypothesis.
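These steps can be sketched in Python as follows. This is a minimal illustration, assuming examples arrive as (attribute values, is_positive) pairs and using the strings "ϕ" and "?" for the specific and general values; the function name `find_s` and the two-attribute toy rows in the usage example are hypothetical, not data from the slides.

```python
from typing import Iterable, List, Sequence, Tuple

PHI = "ϕ"  # most specific value: no attribute value is acceptable
ANY = "?"  # most general value: any attribute value is acceptable

def find_s(examples: Iterable[Tuple[Sequence[str], bool]],
           n_attributes: int) -> List[str]:
    """Return the most specific hypothesis covering every positive example."""
    h = [PHI] * n_attributes              # step 1: most specific hypothesis
    for x, positive in examples:
        if not positive:                  # step 2: negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == PHI:               # still empty: take the example's value
                h[i] = value
            elif h[i] != value:           # mismatch: generalize to '?'
                h[i] = ANY
    return h                              # step 3: output hypothesis

# Hypothetical toy data with two attributes.
toy = [
    (("Sunny", "Warm"), True),
    (("Rainy", "Warm"), True),
    (("Sunny", "Cold"), False),
]
print(find_s(toy, 2))  # ['?', 'Warm']
```

The first positive row sets h to ['Sunny', 'Warm'], the second generalizes the first attribute to '?', and the negative row is skipped without being examined.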
Concept Learning
► The specific hypothesis fills in all the important details about the variables
given in the general hypothesis. A more specific version of the example given
above would be: I want a cheeseburger with a chicken pepperoni filling and a
lot of lettuce.
► S = {‘Φ’, ‘Φ’, ‘Φ’, …, ‘Φ’}
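To make the role of S concrete, here is a small Python sketch of how a conjunctive hypothesis classifies an example: an attribute constraint is satisfied when it is '?' or equals the example's value, and 'Φ' is never satisfied. The `covers` helper and the three-attribute example row are hypothetical illustrations.

```python
PHI = "Φ"  # no value is acceptable
ANY = "?"  # any value is acceptable

def covers(h, x):
    """Return True if hypothesis h classifies example x as positive."""
    return all(c == ANY or (c != PHI and c == v) for c, v in zip(h, x))

x = ("Sunny", "Warm", "Strong")   # hypothetical example with three attributes
S = (PHI, PHI, PHI)               # most specific hypothesis
G = (ANY, ANY, ANY)               # most general hypothesis

print(covers(S, x))                    # False: S covers no example
print(covers(G, x))                    # True: G covers every example
print(covers(("Sunny", ANY, ANY), x))  # True
```

Find-S starts at S and moves toward the all-'?' hypothesis only as far as the positive examples force it to.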
Flow Chart
1. The process starts by initializing ‘h’ to the most specific hypothesis; in
practice, this is generally the first positive example in the data set.
2. We take each example in turn. If the example is negative, we move on to the
next example; if it is positive, we consider it in the next step.
3. We check whether each attribute value in the example equals the corresponding
hypothesis value.
4. If the value matches, then no changes are made.
5. If the value does not match, the value is changed to ‘?’.
6. We do this until we reach the last positive example in the data set.
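The flow chart seeds ‘h’ with the first positive example instead of an all-ϕ hypothesis. A minimal Python sketch of that variant follows; the function name `find_s_seeded` and the choice to return None when there are no positive examples are assumptions. Seeding this way removes the special case for ϕ, because after the first positive example an all-ϕ hypothesis would equal that example anyway.

```python
def find_s_seeded(examples):
    """Find-S seeded with the first positive example.

    `examples` is a list of (attribute_values, is_positive) pairs.
    Returns None when there is no positive example to seed from.
    """
    positives = [list(x) for x, positive in examples if positive]  # step 2: drop negatives
    if not positives:
        return None
    h = positives[0]                  # step 1: first positive example
    for x in positives[1:]:           # steps 3-6: compare with each remaining positive
        h = [hv if hv == xv else "?" for hv, xv in zip(h, x)]
    return h

# Positive examples 1, 4 and 5 from the seed walkthrough.
seeds = [
    (("GREEN", "HARD", "NO", "WRINKLED"), True),
    (("ORANGE", "HARD", "NO", "WRINKLED"), True),
    (("GREEN", "SOFT", "YES", "SMOOTH"), True),
]
print(find_s_seeded(seeds))  # ['?', '?', '?', '?']
```

On these positive seed examples the result matches the final hypothesis { ?, ?, ?, ? } reached in the walkthrough.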
► Looking at the data set, we have six attributes and a final attribute that
labels each example as positive or negative. Here, ‘Yes’ marks a positive
example, which means the person will go for a walk.
► Starting from the first positive example, the initial hypothesis is:
► h0 = {‘Morning’, ‘Sunny’, ‘Warm’, ‘Yes’, ‘Mild’, ‘Strong’}
► This is our initial (most specific) hypothesis. We now consider the remaining
examples one by one, but only the positive ones.
► h1= {‘Morning’, ‘Sunny’, ‘?’, ‘Yes’, ‘?’, ‘?’}
► h2 = {‘?’, ‘Sunny’, ‘?’, ‘Yes’, ‘?’, ‘?’}
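Because Find-S only ever generalizes, each hypothesis in the sequence should be at least as general as the one before it. The Python sketch below checks this for h0, h1 and h2 as listed above, using a hypothetical helper `more_general_or_equal`; it assumes conjunctive hypotheses built only from specific values and ‘?’ (no ϕ), where the check reduces to comparing constraints position by position.

```python
def more_general_or_equal(h_a, h_b):
    """True if each constraint in h_a is '?' or equals the one in h_b."""
    return all(a == "?" or a == b for a, b in zip(h_a, h_b))

h0 = ("Morning", "Sunny", "Warm", "Yes", "Mild", "Strong")
h1 = ("Morning", "Sunny", "?", "Yes", "?", "?")
h2 = ("?", "Sunny", "?", "Yes", "?", "?")

print(more_general_or_equal(h1, h0))  # True
print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h0, h1))  # False: h0 is strictly more specific
```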
Limitations of Find-S Algorithm
► There are a few limitations of the Find-S algorithm, listed below:
1. There is no way to determine whether the final hypothesis is consistent with
all of the training data, including the negative examples.
2. Inconsistent (noisy) training sets can mislead the Find-S algorithm, since it
ignores the negative examples.
3. Find-S algorithm does not provide a backtracking technique to determine the
best possible changes that could be done to improve the resulting
hypothesis.
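Limitations 1 and 2 can be seen on the seed example: its final hypothesis { ?, ?, ?, ? } covers every possible seed, so it also labels negative examples 2 and 3 as poisonous, and Find-S itself never notices. A separate check is needed. This minimal Python sketch uses hypothetical helper names, and the attribute values of the two negative rows are placeholders, since the original table is not reproduced here.

```python
def covers(h, x):
    """True if every constraint in h is '?' or equals the example's value."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def misclassified_negatives(h, examples):
    """Return the negative examples that hypothesis h wrongly covers."""
    return [x for x, positive in examples if not positive and covers(h, x)]

final_h = ("?", "?", "?", "?")  # Find-S result for the seed data
negatives = [
    (("GREEN", "HARD", "YES", "SMOOTH"), False),   # placeholder values
    (("BROWN", "SOFT", "NO", "WRINKLED"), False),  # placeholder values
]
bad = misclassified_negatives(final_h, negatives)
print(len(bad), "negative example(s) misclassified")  # 2 negative example(s) misclassified
```

Since Find-S returns the most specific conjunctive hypothesis covering all positives, a non-empty result means no conjunctive hypothesis is consistent with all the data, either because the target concept is not conjunctive or because the data are noisy.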