2-Candidate Elimination Algorithm
Initialization:
The algorithm starts with an initial hypothesis space bounded by the most
general hypothesis (represented by '?' for each attribute) and the most
specific hypothesis (where every attribute is fixed to a single value).
Initially, the Specific boundary (S) contains the most specific hypothesis
and the General boundary (G) contains the most general hypothesis.
Iterative Refinement:
For each training example, the algorithm updates the two boundaries.
If the example is classified as positive (belongs to the target class):
Remove from G any hypothesis inconsistent with the example.
Minimally generalize each hypothesis in S that does not cover the example,
just enough for it to cover the example while remaining more specific than
some member of G.
Remove from S any hypothesis that is more general than another hypothesis
in S. This ensures that the boundary does not contain redundant
hypotheses.
If the example is classified as negative (does not belong to the target class):
Remove from S any hypothesis inconsistent with the example, i.e., any
hypothesis in S that covers the negative example.
Replace each hypothesis in G that covers the negative example with its
minimal specializations. A minimal specialization makes the hypothesis just
specific enough to exclude the example, by replacing a '?' attribute with a
value that differs from the example's value for that attribute; only
specializations that are still more general than some member of S are kept.
Remove from G any hypothesis that is more specific than another hypothesis
in G. This ensures that the boundary does not contain redundant
hypotheses.
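The update steps above can be sketched in Python. This is a minimal illustration under the usual conjunctive representation (a hypothesis is a tuple of attribute constraints, with '?' matching anything); the function and variable names are chosen here, not taken from the notes.

```python
# Minimal sketch of the candidate elimination boundary updates.
# A hypothesis is a tuple of constraints: '?' matches any value,
# any other value must match the instance exactly.

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True if h1 is more general than or equal to h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, n_attrs):
    S = [tuple(['0'] * n_attrs)]   # most specific: matches nothing yet
    G = [tuple(['?'] * n_attrs)]   # most general: matches everything
    for x, label in examples:
        if label:                  # positive example
            G = [g for g in G if covers(g, x)]
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                else:
                    # minimal generalization: open up disagreeing attributes
                    gen = tuple(xv if sv == '0' else (sv if sv == xv else '?')
                                for sv, xv in zip(s, x))
                    if any(more_general(g, gen) for g in G):
                        new_S.append(gen)
            # drop members more general than another member of S
            S = [s for s in new_S
                 if not any(s != t and more_general(s, t) for t in new_S)]
        else:                      # negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # minimal specializations: fix one '?' to a value taken from
                # S that excludes the negative example
                for i, gv in enumerate(g):
                    if gv != '?':
                        continue
                    for s in S:
                        if s[i] not in ('?', '0', x[i]):
                            cand = g[:i] + (s[i],) + g[i + 1:]
                            if any(more_general(cand, s2) for s2 in S):
                                new_G.append(cand)
            new_G = list(dict.fromkeys(new_G))  # remove duplicates
            # drop members more specific than another member of G
            G = [g for g in new_G
                 if not any(g != h and more_general(h, g) for h in new_G)]
    return S, G
```

`candidate_elimination` takes a list of `(instance, is_positive)` pairs and returns the final S and G boundaries.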
Output:
The final version space consists of all hypotheses that lie between the
boundaries S and G (inclusive) after processing all training examples.
Usage:
The final version space can be used to classify new instances: an instance
covered by every hypothesis in the space is positive, and one covered by
none is negative. If the hypotheses disagree, typically the most specific
one is chosen.
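The classification rule just described can be sketched as follows. Because every hypothesis in the version space is more general than some member of S and more specific than some member of G, it is enough to test the two boundaries; the function names are chosen for this sketch.

```python
def covers(h, x):
    """'?' matches any value; otherwise the attribute must match exactly."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def classify(S, G, x):
    """Classify x using the version space delimited by boundaries S and G.

    Returns True/False when all hypotheses in the space agree, and None
    when they disagree (the caller may then fall back to S, the most
    specific boundary).
    """
    if all(covers(s, x) for s in S):      # even the most specific covers x
        return True
    if not any(covers(g, x) for g in G):  # not even the most general covers x
        return False
    return None                           # ambiguous instance
```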
Consider the third training example. This negative example reveals that the G
boundary of the version space is overly general, that is, the hypothesis in G
incorrectly predicts that this new example is a positive example.
The hypothesis in the G boundary must therefore be specialized until it
correctly classifies this new negative example.
Given that there are six attributes that could be specified to specialize G2,
why are there only three new hypotheses in G3? The remaining minimal
specializations are excluded because they are not more general than S2,
i.e., they fail to cover the positive examples observed so far.
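The count can be checked mechanically. Assuming the standard EnjoySport training data (an assumption, since the CSV contents are not listed in these notes), specializing G2 against the third (negative) example and keeping only specializations that remain more general than S2 leaves exactly three hypotheses:

```python
# Assumed boundaries and negative example from the standard EnjoySport data
# (not listed in the notes above).
S2 = ('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')  # specific boundary
G2 = ('?', '?', '?', '?', '?', '?')                    # general boundary
negative = ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change')

# A minimal specialization fixes one '?' of G2 to a value that excludes the
# negative example; taking that value from S2 guarantees the result is still
# more general than S2.
G3 = []
for i, gv in enumerate(G2):
    if gv == '?' and S2[i] != '?' and S2[i] != negative[i]:
        G3.append(G2[:i] + (S2[i],) + G2[i + 1:])

print(len(G3), G3)
# Only three attributes differ between S2 and the negative example; two
# others match it (so fixing them cannot exclude the example) and one is
# already '?' in S2 (so fixing it would not stay more general than S2).
```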
This positive example further generalizes the S boundary of the version space.
It also results in removing one member of the G boundary, because this
member fails to cover the new positive example.
After processing these four examples, the boundary sets S4 and G4 delimit
the version space of all hypotheses consistent with the set of incrementally
observed training examples.
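Assuming the four examples are Mitchell's standard EnjoySport training set (an assumption, since the CSV is not reproduced here), the resulting boundaries S4 and G4 can be checked for consistency against all four examples:

```python
# Hypothetical data: the standard EnjoySport examples, with attributes
# Sky, AirTemp, Humidity, Wind, Water, Forecast (an assumption; the notes
# do not list the CSV contents).
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]

# Final boundaries commonly reported for this data set.
S4 = [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
G4 = [('Sunny', '?', '?', '?', '?', '?'),
      ('?', 'Warm', '?', '?', '?', '?')]

def covers(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

# Every boundary hypothesis must agree with every training label.
for h in S4 + G4:
    for x, label in examples:
        assert covers(h, x) == label
print("S4 and G4 are consistent with all four examples")
```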
Dataset (csv file):
Program:
Output: