
DI-ML-concept learning-CEA

The Candidate Elimination Algorithm (CEA) incrementally builds a version space based on a hypothesis space and examples, refining hypotheses by removing those inconsistent with the examples. It aims to find all consistent hypotheses while managing general and specific boundaries, making it more accurate and flexible than the Find-S algorithm. However, CEA is more complex, requires more memory, and may be slower with large datasets, potentially leading to overfitting.


Concept Learning

Candidate Elimination Algorithm


Candidate elimination algorithm
• The candidate elimination algorithm
incrementally builds the version space given a
hypothesis space H and a set E of examples.
• The examples are added one at a time; each example may shrink the version space by removing the hypotheses that are inconsistent with it.
• The candidate elimination algorithm does this by updating the general and specific boundaries for each new example.
Candidate Elimination Algorithm (CEA)
Concept

• The concept to learn is: when does a person named Aldo enjoy sports?
• The answer is Boolean:
• Yes – enjoys sport, or
• No – doesn't enjoy sport.
Hypothesis Space
• The space of all possible concepts is huge, so we restrict the search to a hypothesis representation that uses only the attributes in the training dataset.
• The simplest representation is a conjunction of constraints over all the attributes.
• <sunny, warm, normal, strong, warm, same, yes> is a training example <x, c(x)>, where c(x) is 'yes'; the corresponding hypothesis is the conjunction of the six attribute values.
• It reads: if the day is sunny and warm and normal and strong and warm and same, then Aldo enjoys sport.
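To make the representation concrete, here is a small illustrative sketch in Python (our own, not from the slides), assuming the six EnjoySport attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. Hypotheses and instances are tuples, with '?' meaning "any value is acceptable" and '0' meaning "no value is acceptable":

# One training instance x with its label c(x) = 'yes'
x = ('sunny', 'warm', 'normal', 'strong', 'warm', 'same')
c_x = 'yes'

# The maximally specific hypothesis consistent with x: it predicts 'yes'
# only for instances matching every attribute value exactly.
h_specific = ('sunny', 'warm', 'normal', 'strong', 'warm', 'same')

# A more general hypothesis: only Sky and AirTemp are constrained.
h_general = ('sunny', 'warm', '?', '?', '?', '?')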
Consistent
• A hypothesis h is consistent with a training example (x, c(x)) if h(x) = c(x).
• A hypothesis h is consistent with a dataset D if h(x) = c(x) for all x in D.
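A minimal consistency check, assuming the tuple representation sketched above (illustrative only; the function names are our own):

def matches(h, x):
    # True if hypothesis h classifies instance x as positive
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def consistent(h, x, label):
    # h is consistent with a training example (x, c(x)) when h(x) = c(x)
    return matches(h, x) == (label == 'yes')

def consistent_on_dataset(h, D):
    # h is consistent with dataset D of (instance, label) pairs
    return all(consistent(h, x, label) for x, label in D)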
Version Space and minimal
generalization
• The goal is to find all the hypotheses consistent with D, that is, all hypotheses in the hypothesis space H that are consistent with the dataset D.
• This set is termed the Version Space.
Minimal Generalization:
• a 0 is replaced with the specific attribute value from the example,
or
• a specific attribute value that conflicts with the example is replaced with ?
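One possible sketch of minimal generalization under the tuple representation above (again illustrative; the function name is our own):

def minimal_generalization(s, x):
    # Least generalization of a specific hypothesis s that covers instance x:
    # '0' entries take the attribute value from x, and specific values that
    # conflict with x are replaced by '?'.
    new_s = []
    for sv, xv in zip(s, x):
        if sv == '0':
            new_s.append(xv)    # no value allowed yet -> adopt x's value
        elif sv == xv:
            new_s.append(sv)    # already covers x on this attribute
        else:
            new_s.append('?')   # conflicting values -> generalize to '?'
    return tuple(new_s)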
Algorithm Overview
• Goal
Create two sets G and S.
G = Set of all general hypotheses consistent with D
S = Set of all specific hypotheses consistent with D
• Step 1 : Initialise
G_0 = Most General Hypothesis = <?, ?, ?, ?, ?, ?>
S_0 = Most Specific Hypothesis = <0, 0, 0, 0, 0, 0>
In G_0 and S_0, the subscript 0 denotes the number of training instances processed so far.
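In the tuple representation used in the sketches above, the initialization step could be written as follows (illustrative; six attributes assumed):

N_ATTRIBUTES = 6                    # Sky, AirTemp, Humidity, Wind, Water, Forecast
G = {('?',) * N_ATTRIBUTES}         # G_0: the single most general hypothesis
S = {('0',) * N_ATTRIBUTES}         # S_0: the single most specific hypothesis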
Algorithm Overview(Contd..)
• Step 2: Perform Step 3 for all the instances in
the training dataset
• Step 3: Check if it is a positive label or
negative label
– That is, EnjoySport = Yes is positive
– Perform Step 3.1 if the example is positive and Step 3.2 if it is negative
Step 3.1. The instance(x) is positive.
Step 3.1.1. Check G
Take the hypothesis(g) in G one by one and if it is
inconsistent with x, remove the g from G
Step 3.1.2. Check S
Take the hypothesis(s) in S one by one and check with x.
If s is inconsistent with x,
- Remove s from S
- Find all the minimal generalizations of s such that:
– they are consistent with x (the generalization must be minimal), and
– they are less general than some hypothesis in G;
then insert them into S.
- Check the hypotheses in S: if any hypothesis is more general than another hypothesis in S, remove the more general one.
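A possible sketch of Step 3.1, reusing the hypothetical helpers matches() and minimal_generalization() from the earlier sketches; the attribute-wise "at least as general" test is our own simplification, not taken from the slides:

def more_general(h1, h2):
    # True if h1 is at least as general as h2 (simplified attribute-wise test)
    return all(b == '0' or a == '?' or a == b for a, b in zip(h1, h2))

def update_positive(G, S, x):
    # Step 3.1: process a positive instance x
    # 3.1.1: remove from G every hypothesis inconsistent with x
    G = {g for g in G if matches(g, x)}
    # 3.1.2: replace inconsistent members of S by their minimal generalizations,
    # keeping only those that are less general than some member of G
    new_S = set()
    for s in S:
        if matches(s, x):
            new_S.add(s)
        else:
            gen = minimal_generalization(s, x)
            if any(more_general(g, gen) for g in G):
                new_S.add(gen)
    # remove any member of S that is more general than another member of S
    S = {s for s in new_S
         if not any(s != t and more_general(s, t) for t in new_S)}
    return G, S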
Step 3.2. The instance(x) is negative.
(Note: We swap G and S w.r.t step 3.1)
Step 3.2.1. Check S
Take the hypothesis(s) in S one by one and if it is inconsistent
with x, remove the s from S
Step 3.2.2. Check G
Take the hypothesis(g) in G one by one and check with x.
If g is inconsistent with x,
- Remove g from G
- Find all the minimal specializations of g such that:
• they are consistent with x (the specialization must be minimal), and
• they are more general than some hypothesis in S;
then insert them into G.
- Check the hypotheses in G: if any hypothesis is less general than another hypothesis in G, remove the less general one.
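A matching sketch of Step 3.2, again using the hypothetical helpers from the earlier sketches; the attribute domains (the set of values each attribute can take) are assumed to be supplied by the caller, which the slides leave implicit:

def minimal_specializations(g, x, domains):
    # All least specializations of g that exclude the negative instance x:
    # each '?' is replaced, one at a time, by every domain value except x's.
    results = []
    for i, (gv, xv) in enumerate(zip(g, x)):
        if gv == '?':
            for value in domains[i]:
                if value != xv:
                    spec = list(g)
                    spec[i] = value
                    results.append(tuple(spec))
    return results

def update_negative(G, S, x, domains):
    # Step 3.2: process a negative instance x (roles of G and S are swapped)
    # 3.2.1: remove from S every hypothesis that covers x
    S = {s for s in S if not matches(s, x)}
    # 3.2.2: replace inconsistent members of G by minimal specializations that
    # exclude x and are still more general than some member of S
    new_G = set()
    for g in G:
        if not matches(g, x):
            new_G.add(g)
        else:
            for spec in minimal_specializations(g, x, domains):
                if any(more_general(spec, s) for s in S):
                    new_G.add(spec)
    # remove any member of G that is less general than another member of G
    G = {g for g in new_G
         if not any(g != h and more_general(h, g) for h in new_G)}
    return G, S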
The Candidate Elimination Algorithm (CEA) is an
improvement over the Find-S algorithm for classification
tasks.
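Putting the pieces together, a minimal driver loop for the sketches above might look like this (illustrative only; all helper names are the hypothetical ones introduced earlier):

def candidate_elimination(D, domains, n_attributes=6):
    # D is a list of (instance, label) pairs; domains maps each attribute
    # index to the set of values that attribute can take.
    G = {('?',) * n_attributes}
    S = {('0',) * n_attributes}
    for x, label in D:
        if label == 'yes':
            G, S = update_positive(G, S, x)
        else:
            G, S = update_negative(G, S, x, domains)
    return G, S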
Will the CEA converge to the correct hypothesis?
Partially learned concept-example
How can partially learned concepts be
used?
• Partially learned concepts can be used to
evaluate examples and update the hypothesis
space accordingly.
• When presented with new examples, compare their attribute values against the current S and G boundaries to determine whether they support or contradict the current hypotheses.
• This helps in eliminating hypotheses that are
inconsistent with the examples and narrowing
down the hypothesis space.
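One way to act on a partially learned concept, sketched under the same tuple representation as above: an instance covered by every member of S is covered by every hypothesis in the version space, an instance covered by no member of G is covered by none, and only the remaining cases stay undecided.

def classify_with_boundaries(G, S, x):
    # Classify x using only the boundary sets of a partially learned concept
    if all(matches(s, x) for s in S):
        return 'yes'       # every consistent hypothesis classifies x positive
    if not any(matches(g, x) for g in G):
        return 'no'        # no consistent hypothesis classifies x positive
    return 'unknown'       # the version space disagrees on x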
Advantages of CEA over Find-S
• Improved accuracy: CEA considers both positive and negative
examples to generate the hypothesis, which can result in
higher accuracy when dealing with noisy or incomplete data.
• Flexibility: CEA can handle more complex classification tasks,
such as those with multiple classes or non-linear decision
boundaries.
• More efficient: CEA keeps only the boundary sets of general and specific hypotheses and eliminates inconsistent candidates as examples arrive, rather than enumerating all hypotheses. This can result in faster processing and improved efficiency.
• Better handling of continuous attributes: CEA can handle
continuous attributes by creating boundaries for each
attribute, which makes it more suitable for a wider range of
datasets.
Disadvantages of CEA in comparison
with Find-S
• More complex: CEA is a more complex algorithm than Find-S,
which may make it more difficult for beginners or those
without a strong background in machine learning to use and
understand.
• Higher memory requirements: CEA requires more memory to
store the set of hypotheses and boundaries, which may make
it less suitable for memory-constrained environments.
• Slower processing for large datasets: CEA may become slower for larger datasets due to the increased number of hypotheses generated.
• Higher potential for overfitting: The increased complexity of
CEA may make it more prone to overfitting on the training
data, especially if the dataset is small or has a high degree of
noise.
Exercise-CEA
Exercise-2
• For the dataset given below, find the specific and general boundaries using the Candidate Elimination Algorithm.
Inductive Bias
• Inductive bias can be defined as the set of assumptions
or biases that a learning algorithm employs to make
predictions on unseen data based on its training data.
• These assumptions are inherent in the algorithm's
design and serve as a foundation for learning and
generalization.
• The inductive bias of an algorithm influences how it
selects a hypothesis (a possible explanation or model)
from the hypothesis space (the set of all possible
hypotheses) that best fits the training data.
• It helps the algorithm navigate the trade-off between fitting the training data too closely (overfitting) and modelling it too loosely (underfitting), so that it can generalize well to unseen data.
An Unbiased Learner
An Unbiased Learner(Contd..)

It is impossible to achieve learning without any biases: all learning is inherently influenced by some form of bias.
Inductive Bias-a formal representation

A |= B means: B evaluates to true in every interpretation in which all elements of A evaluate to true (A entails B).
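Using this entailment notation, the inductive bias B of a learner L can be stated (we follow the standard textbook formulation here) as a minimal set of assertions from which the learner's classification of any new instance follows from the training data D_c:

\forall x_i \in X:\quad (B \wedge D_c \wedge x_i) \models L(x_i, D_c)

where L(x_i, D_c) is the label the learner assigns to a new instance x_i after seeing D_c. For the candidate elimination algorithm, B is the assumption that the target concept c is contained in the hypothesis space H.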
Modelling Inductive Systems
