Ex - No.2 - Find S Algorithm
Objective:
Find-S is a basic concept learning algorithm in machine learning. It finds the most specific
hypothesis that is consistent with all the positive training examples and ignores the negative
ones. The algorithm starts with the most specific hypothesis and generalizes it each time it
fails to cover an observed positive training example. Hence, the hypothesis moves from the
most specific toward more general ones, but only as far as the positive examples require.
Algorithm:
1. Initialization:
o Set the initial hypothesis h to the most specific one, in which every attribute is ∅
(no value accepted). In practice, h then takes the attribute values of the first
positive example (e.g., h = {Sunny, Warm, Strong, Yes}).
2. Iterate through positive examples:
o For each positive example:
   If the example is already covered by h, move on to the next example.
   If the example is not covered by h:
      For each attribute in the example:
         If the attribute value in the example differs from the
         corresponding value in h, replace the value in h with a
         wildcard ? to generalize the hypothesis.
3. Final Hypothesis:
o The final hypothesis h represents the most specific generalization that covers
all positive examples in the training set.
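
The steps above can be sketched in Python as follows. This is a minimal illustration under
stated assumptions, not a reference implementation: the function name find_s, the use of
None to stand for "no positive example seen yet", and the toy rows are all choices made
for this sketch.

```python
def find_s(examples, positive_label):
    """Return the most specific hypothesis covering all positive examples.

    `examples` is a list of (attributes, label) pairs; this layout is an
    assumption made for the sketch.
    """
    h = None  # stands in for the most specific hypothesis (covers nothing yet)
    for attributes, label in examples:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if h is None:
            h = list(attributes)  # the first positive example is copied as-is
            continue
        for i, value in enumerate(attributes):
            if h[i] != value:
                h[i] = "?"  # generalize the attribute that does not match
    return h


# Toy data, invented for illustration only.
toy = [
    (("Sunny", "Warm", "Strong"), "Yes"),
    (("Rainy", "Cold", "Strong"), "No"),
    (("Sunny", "Cold", "Strong"), "Yes"),
]
print(find_s(toy, "Yes"))  # ['Sunny', '?', 'Strong']
```

Only the second attribute differs between the two positive rows, so only that position is
replaced by the wildcard.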
Symbols used:
∅ (Empty Set) − This symbol means that no value is acceptable for an attribute. It is used to
initialize the hypothesis as the most specific concept, which covers no instance.
? (Don't Care) − The question mark means that any value is acceptable for an attribute. It is
used when the hypothesis must generalize over different attribute values that appear in
positive examples.
Positive Examples (+) − The plus symbol represents positive examples, which are instances
labeled as the target class or concept being learned.
Negative Examples (-) − The minus symbol represents negative examples, which are
instances labeled as non-target classes or concepts that should not be covered by the
hypothesis.
Hypothesis (h) − The variable h represents the hypothesis, which is the learned concept or
generalization based on the training data. It is refined iteratively throughout the algorithm.
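
One possible way to represent these symbols in Python is shown below. The constant names
EMPTY and ANY and the helper names covers and generalize are hypothetical, chosen only for
this sketch; the instance values are made up for illustration.

```python
EMPTY = "∅"  # no value is acceptable for this attribute
ANY = "?"    # any value is acceptable for this attribute


def covers(h, x):
    """Return True if hypothesis h classifies instance x as positive."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))


def generalize(h, x):
    """Minimally generalize h so that it covers the positive instance x."""
    return [
        xv if hv == EMPTY else (hv if hv == xv else ANY)
        for hv, xv in zip(h, x)
    ]


h = [EMPTY, EMPTY, EMPTY]                # most specific hypothesis
print(covers(h, ("Yes", "Yes", "No")))   # False: ∅ accepts no value

h = generalize(h, ("Yes", "Yes", "No"))  # first positive (+) example
h = generalize(h, ("Yes", "Yes", "Yes")) # second positive (+) example
print(h)                                 # ['Yes', 'Yes', '?']
print(covers(h, ("No", "Yes", "No")))    # False: a negative (-) example stays uncovered
```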
Example:
For Training instance No:0 the hypothesis is ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
For Training instance No:1 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training instance No:2 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training instance No:3 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', '?', '?']
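
The trace above can be reproduced with a short script. The training rows are not listed in
this handout, so the data below is an assumption: it uses the well-known EnjoySport rows,
which are consistent with the hypotheses shown (instance No:2 is taken to be a negative
example, which is why the hypothesis does not change there). The function name
trace_find_s is likewise hypothetical.

```python
# Assumed training data (EnjoySport-style rows); the handout shows only the trace.
data = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"], "Yes"),
    (["Sunny", "Warm", "High", "Strong", "Warm", "Same"], "Yes"),
    (["Rainy", "Cold", "High", "Strong", "Warm", "Change"], "No"),
    (["Sunny", "Warm", "High", "Strong", "Cool", "Change"], "Yes"),
]


def trace_find_s(rows, positive="Yes"):
    """Run Find-S and record the hypothesis after every training instance."""
    h = None
    history = []
    for attrs, label in rows:
        if label == positive:
            if h is None:
                h = list(attrs)  # first positive example
            else:
                # keep matching values, replace mismatches with '?'
                h = [hv if hv == xv else "?" for hv, xv in zip(h, attrs)]
        history.append(list(h) if h is not None else None)
    return history


for i, h in enumerate(trace_find_s(data)):
    print(f"For Training instance No:{i} the hypothesis is {h}")
```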
Problem Statement
Imagine a dataset containing patient attributes and their diagnosed diseases. The Find-S
algorithm could be used to learn symptom-based rules for identifying specific diseases. For
example, the algorithm might discover the rule: "If a patient has fever, cough, and difficulty
breathing, then they are likely to have pneumonia." Apply the Find-S algorithm to the medical
diagnosis data and check, using Python code, how well it reproduces the above rule.
Medical Diagnosis Dataset
Samples  Fever  Cough  Difficulty Breathing  Diagnosed Disease (Label)
1        Yes    Yes    No                    Pneumonia
2        No     Yes    No                    Common Cold
3        Yes    Yes    Yes                   Pneumonia
4        No     No     No                    Healthy
5        Yes    Yes    Yes                   Pneumonia
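
A minimal sketch of one way to run Find-S on this table is given below, treating Pneumonia
as the positive class. The names ATTRIBUTES, dataset, find_s and predicts_target are
assumptions made for this sketch; the rows are transcribed from the table.

```python
ATTRIBUTES = ["Fever", "Cough", "Difficulty Breathing"]

# Rows transcribed from the Medical Diagnosis Dataset table.
dataset = [
    (("Yes", "Yes", "No"), "Pneumonia"),
    (("No", "Yes", "No"), "Common Cold"),
    (("Yes", "Yes", "Yes"), "Pneumonia"),
    (("No", "No", "No"), "Healthy"),
    (("Yes", "Yes", "Yes"), "Pneumonia"),
]


def find_s(rows, target):
    """Most specific hypothesis covering every row labeled `target`."""
    h = None
    for attrs, label in rows:
        if label != target:
            continue  # rows with other labels are ignored
        if h is None:
            h = list(attrs)
        else:
            h = [hv if hv == xv else "?" for hv, xv in zip(h, attrs)]
    return h


def predicts_target(h, attrs):
    """True if hypothesis h covers the given attribute values."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, attrs))


h = find_s(dataset, "Pneumonia")
print("Final hypothesis:", h)

rule = " and ".join(
    f"{name} = {value}" for name, value in zip(ATTRIBUTES, h) if value != "?"
)
print(f"Learned rule: if {rule}, then Pneumonia")

# Check the learned hypothesis against every row in the table.
for attrs, label in dataset:
    print(attrs, label, "->", predicts_target(h, attrs))
```

On these five rows the final hypothesis is ['Yes', 'Yes', '?']: sample 1 is a pneumonia case
without difficulty breathing, so that attribute is generalized to "?". The learned rule
therefore requires only fever and cough, which is more general than the rule quoted in the
problem statement, and it covers the three pneumonia rows while leaving the other two
uncovered.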