
Problem 2: Candidate-Elimination Learning Algorithm

Aim: Demonstrate the working model and principle of the Candidate-Elimination algorithm.

Problem: For a given set of training examples stored in a CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
Candidate-Elimination algorithm: explanation
The Candidate-Elimination algorithm is a machine learning algorithm used
for concept learning in the context of supervised learning. It operates on a set
of instances, each described by a set of attributes and associated with a target
value or class label.

Here's a step-by-step explanation of the Candidate-Elimination algorithm:

Initialization:

The algorithm starts with an initial hypothesis space bounded by the most general hypothesis (a '?' for every attribute, which matches any value) and the most specific hypothesis (conventionally a 'ϕ' for every attribute, which matches nothing). Initially, the specific boundary (S) contains only the most specific hypothesis and the general boundary (G) contains only the most general hypothesis.
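
As a concrete illustration, the two boundaries for a dataset with six attributes might be initialized like this in Python (a minimal sketch; the 'ϕ' marker is a convention, not a requirement):

n = 6                # number of attributes (six in the example traced later)
S = [['ϕ'] * n]      # specific boundary: the single most specific hypothesis
G = [['?'] * n]      # general boundary: the single most general hypothesis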

Iterative Refinement:

For each training example:


If the example is classified as positive (belongs to the target class):
Remove from G any hypothesis inconsistent with the example, i.e., any hypothesis in G that fails to cover the positive example.
For each hypothesis in S that does not cover the example, replace it with its minimal generalization that does. This means relaxing each conflicting attribute value to '?', and replacing a 'ϕ' placeholder with the example's attribute value.
Remove from S any hypothesis that is more general than another hypothesis in S, so that the boundary contains no redundant hypotheses.

If the example is classified as negative (does not belong to the target class):
Remove from S any hypothesis inconsistent with the example, i.e., any hypothesis in S that wrongly covers the negative example.
For each hypothesis in G that covers the example, replace it with its minimal specializations that exclude it. This means replacing a '?' attribute with one of the other values observed for that attribute, keeping only specializations that remain consistent with the positive examples seen so far.
Remove from G any hypothesis that is more specific than another hypothesis in G, so that the boundary contains no redundant hypotheses. (A sketch of this per-example update follows these steps.)
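
A minimal Python sketch of the two boundary moves, assuming hypotheses are lists of attribute values with '?' as the wildcard and 'ϕ' as the empty marker (the helper names are illustrative, not prescribed by the algorithm):

def covers(h, x):
    # h matches instance x when every attribute is '?' or equals x's value
    return all(a == '?' or a == b for a, b in zip(h, x))

def min_generalize(s, x):
    # relax each conflicting attribute to '?'; 'ϕ' takes the example's value
    return [b if a in ('ϕ', b) else '?' for a, b in zip(s, x)]

def min_specialize(g, x, domains):
    # for every '?' in g, substitute each alternative value so x is excluded
    specs = []
    for i, b in enumerate(x):
        if g[i] == '?':
            for v in domains[i] - {b}:
                specs.append(g[:i] + [v] + g[i + 1:])
    return specs

Here domains[i] is the set of values attribute i can take; a full implementation would also filter the specializations against the positive examples seen so far.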
Output:

After all training examples have been processed, the boundary sets S and G delimit the version space: every hypothesis consistent with the data, namely S, G, and all hypotheses lying between them in generality.
Usage:

The version space can be used to classify new instances. If every hypothesis in S covers an instance, every consistent hypothesis labels it positive; if no hypothesis in G covers it, every consistent hypothesis labels it negative; otherwise the label is ambiguous. When a single answer is required, the most specific hypothesis is typically chosen.
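
Using the covers() helper from the sketch above, classification with the boundary sets might look like this (a hedged sketch; the unanimity test follows from the partial ordering of the version space):

def classify(S, G, x):
    if all(covers(s, x) for s in S):
        return 'yes'        # every hypothesis in the version space covers x
    if not any(covers(g, x) for g in G):
        return 'no'         # no hypothesis in the version space covers x
    return 'ambiguous'      # the boundaries disagree; more data is needed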

The Candidate-Elimination algorithm is useful because it maintains the set of all hypotheses consistent with the data instead of committing to a single hypothesis prematurely. It does, however, assume the training data is free of errors: a single mislabeled example can eliminate every correct hypothesis and collapse the version space. Given noise-free data, the algorithm gradually refines the hypothesis space as examples are observed, converging toward the target concept as S and G close in on each other.

The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H.

When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds it overly specific: it fails to cover the positive example. The boundary is therefore revised by moving it to the minimally more general hypothesis that covers this new example. No update of the G boundary is needed for this training example, because G0 already covers it correctly.

When the second training example is observed, it has a similar effect, generalizing S further to S2 and again leaving G unchanged, i.e., G2 = G1 = G0.

Consider the third training example. This negative example reveals that the G boundary of the version space is overly general: the hypothesis in G incorrectly predicts that this new example is positive. The hypothesis in the G boundary must therefore be specialized until it correctly classifies the new negative example. Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?

For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3. This hypothesis is excluded because it is inconsistent with the previously encountered positive examples.

Consider the fourth training example. This positive example further generalizes the S boundary of the version space. It also removes one member of the G boundary, because that member fails to cover the new positive example. After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the incrementally observed training examples.
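
For reference, the walkthrough above follows the classic EnjoySport training set from Mitchell's Machine Learning (an assumption, since the dataset itself is not reproduced here). On that data the boundary sets evolve as follows:

S0 = (ϕ, ϕ, ϕ, ϕ, ϕ, ϕ)
S1 = (Sunny, Warm, Normal, Strong, Warm, Same)
S2 = S3 = (Sunny, Warm, ?, Strong, Warm, Same)
S4 = (Sunny, Warm, ?, Strong, ?, ?)

G0 = G1 = G2 = (?, ?, ?, ?, ?, ?)
G3 = { (Sunny, ?, ?, ?, ?, ?), (?, Warm, ?, ?, ?, ?), (?, ?, ?, ?, ?, Same) }
G4 = { (Sunny, ?, ?, ?, ?, ?), (?, Warm, ?, ?, ?, ?) }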
Dataset (CSV file):
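The CSV contents are not reproduced in this copy. The walkthrough above matches the classic EnjoySport training set, so a file like the following is assumed (the header row and file name are illustrative):

sky,airtemp,humidity,wind,water,forecast,enjoysport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes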

Program:
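The original program listing is not reproduced in this copy. The following is a minimal, self-contained Python sketch of the Candidate-Elimination algorithm under the assumptions above (file name enjoysport.csv, last column as the Yes/No label); it is an illustration, not the original lab program:

import csv

def covers(h, x):
    # h matches x when every attribute is '?' or equals x's value
    return all(a == '?' or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    # True when h1 is more general than or equal to h2
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    # value domains per attribute, collected from the data
    domains = [{x[i] for x, _ in examples} for i in range(n)]
    S = [['ϕ'] * n]   # most specific boundary
    G = [['?'] * n]   # most general boundary
    for x, positive in examples:
        if positive:
            # drop members of G that fail to cover the positive example
            G = [g for g in G if covers(g, x)]
            # minimally generalize members of S: 'ϕ' takes the example's
            # value, a conflicting value is relaxed to '?'
            # (pruning S members more general than others is omitted;
            # S stays a singleton for conjunctive hypotheses like these)
            S = [[b if a in ('ϕ', b) else '?' for a, b in zip(s, x)]
                 for s in S]
        else:
            # drop members of S that wrongly cover the negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)   # already excludes the example
                    continue
                # minimally specialize g: replace a '?' with each other
                # value, keeping only specializations that still cover S
                for i in range(n):
                    if g[i] == '?':
                        for v in domains[i] - {x[i]}:
                            h = g[:i] + [v] + g[i + 1:]
                            if any(more_general(h, s) for s in S):
                                new_G.append(h)
            # prune members of G more specific than another member
            G = [g for g in new_G
                 if not any(more_general(g2, g) and g2 != g for g2 in new_G)]
    return S, G

if __name__ == '__main__':
    with open('enjoysport.csv') as f:      # assumed file name
        rows = list(csv.reader(f))
    data = rows[1:]                        # skip the header row
    examples = [(r[:-1], r[-1].strip().lower() == 'yes') for r in data]
    S, G = candidate_elimination(examples)
    print('Final S:', S)
    print('Final G:', G)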
Output:
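The original output is not reproduced in this copy. With the assumed EnjoySport data, the sketch above prints boundaries equivalent to:

Final S: [['Sunny', 'Warm', '?', 'Strong', '?', '?']]
Final G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]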
