The Candidate Elimination Algorithm (CEA) incrementally builds a version space based on a hypothesis space and examples, refining hypotheses by removing those inconsistent with the examples. It aims to find all consistent hypotheses while managing general and specific boundaries, making it more accurate and flexible than the Find-S algorithm. However, CEA is more complex, requires more memory, and may be slower with large datasets, potentially leading to overfitting.
Concept Learning
Candidate Elimination Algorithm
• The candidate elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples.
• The examples are added one by one; each example possibly shrinks the version space by removing the hypotheses that are inconsistent with that example.
• The algorithm does this by updating the general boundary G and the specific boundary S for each new example.
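Before the details, a minimal Python sketch of how a single hypothesis can be tested for consistency with a training example, assuming the attribute-vector representation used in these notes ('?' matches any value, '0' matches no value). The function names and the sample instance are illustrative only, not part of the original notes.

# A hypothesis is a tuple of attribute constraints, e.g.
# ('Sunny', 'Warm', '?', 'Strong', '?', '?');  '?' = any value, '0' = no value.

def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def consistent(h, x, label):
    """h is consistent with the example (x, c(x)) if h(x) == c(x)."""
    return matches(h, x) == label

# Illustrative check with a hypothetical EnjoySport instance:
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
h = ('Sunny', 'Warm', '?', '?', '?', '?')
print(consistent(h, x, True))   # True: h predicts positive and the label is positive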
Candidate Elimination Algorithm (CEA): Concept
• The concept to learn is: when does a person (Aldo) enjoy sport?
• The answer is Boolean:
  • Yes – enjoys sport, or
  • No – does not enjoy sport.

Hypothesis Space
• The actual space in which we search is huge, so we restrict the search to a hypothesis representation that uses only the attributes in the training dataset.
• The easiest representation is a conjunction of constraints on all the attributes.
• <Sunny, Warm, Normal, Strong, Warm, Same, Yes> is a training example written as <x, c(x)>; here c(x) is 'Yes'.
• Read as a conjunctive rule, it says: if the day is Sunny and Warm and Normal and Strong and Warm and Same, then Aldo enjoys sport.

Consistent
• A hypothesis h is consistent with a training example x if h(x) = c(x).
• A hypothesis h is consistent with a dataset D if h(x) = c(x) for all x in D.

Version Space and Minimal Generalization
• The goal is to find all consistent hypotheses on D, i.e. every hypothesis in the hypothesis space H that is consistent with the dataset D.
• This set is termed the Version Space.
• Minimal generalization (of a single attribute constraint):
  • 0 is replaced with a specific attribute value, or
  • a specific attribute value is replaced with ?.

Algorithm Overview
• Goal: maintain two sets G and S.
  G = set of maximally general hypotheses consistent with D
  S = set of maximally specific hypotheses consistent with D
• Step 1: Initialise
  G_0 = most general hypothesis = <?, ?, ?, ?, ?, ?>
  S_0 = most specific hypothesis = <0, 0, 0, 0, 0, 0>
  In G_0 and S_0, the subscript 0 gives the number of training instances processed so far.

Algorithm Overview (Contd.)
• Step 2: Perform Step 3 for every instance in the training dataset.
• Step 3: Check whether the instance has a positive or a negative label (EnjoySport = Yes is positive). Perform Step 3.1 if the example is positive and Step 3.2 if it is negative. (A Python sketch of Steps 1–3 is given after this section.)

Step 3.1. The instance x is positive.
  Step 3.1.1. Check G: take each hypothesis g in G; if g is inconsistent with x, remove g from G.
  Step 3.1.2. Check S: take each hypothesis s in S and check it against x. If s is inconsistent with x:
    - Remove s from S.
    - Find all minimal generalizations h of s such that:
      – h is consistent with x (the generalization must be minimal), and
      – h is less general than (or equal to) some hypothesis in G.
    - Insert these h into S.
    - Then check the hypotheses in S: if any hypothesis is more general than another hypothesis in S, remove the more general one.

Step 3.2. The instance x is negative. (Note: the roles of G and S are swapped with respect to Step 3.1.)
  Step 3.2.1. Check S: take each hypothesis s in S; if s is inconsistent with x, remove s from S.
  Step 3.2.2. Check G: take each hypothesis g in G and check it against x. If g is inconsistent with x:
    - Remove g from G.
    - Find all minimal specializations h of g such that:
      • h is consistent with x (the specialization must be minimal), and
      • h is more general than (or equal to) some hypothesis in S.
    - Insert these h into G.
    - Then check the hypotheses in G: if any hypothesis is less general than another hypothesis in G, remove the less general one.

• The Candidate Elimination Algorithm (CEA) is an improvement over the Find-S algorithm for classification tasks.

Will the CEA converge to the correct hypothesis?
• Yes, provided the training examples contain no errors and the target concept is contained in the hypothesis space H.

Partially Learned Concept – Example

How can partially learned concepts be used?
• Partially learned concepts can be used to evaluate new examples and update the hypothesis space accordingly.
• When presented with an example, compare its attribute values with the current hypotheses to determine whether it supports or contradicts them.
• This helps in eliminating hypotheses that are inconsistent with the examples and narrowing down the version space.
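The following is a minimal Python sketch of Steps 1–3 above for the conjunctive attribute-vector representation ('?' = any value, '0' = no value). It is an illustration, not part of the original notes: the helper names (matches, more_general_or_equal, min_generalize, min_specialize) and the explicit list of attribute domains are assumptions made for the sketch.

WILDCARD, EMPTY = '?', '0'

def matches(h, x):
    """h classifies x as positive iff every constraint is '?' or equals x's value."""
    return all(hv == WILDCARD or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """Per attribute: '?' covers everything and '0' is covered by everything."""
    return all(a == WILDCARD or b == EMPTY or a == b for a, b in zip(h1, h2))

def min_generalize(s, x):
    """The unique minimal generalization of s that covers the positive instance x."""
    return tuple(xv if sv == EMPTY else (sv if sv == xv else WILDCARD)
                 for sv, xv in zip(s, x))

def min_specialize(g, x, domains):
    """All minimal specializations of g that exclude the negative instance x."""
    out = []
    for i, gv in enumerate(g):
        if gv == WILDCARD:
            for v in domains[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [(EMPTY,) * n]       # Step 1: most specific boundary S_0
    G = [(WILDCARD,) * n]    # Step 1: most general boundary G_0
    for x, positive in examples:                      # Steps 2 and 3
        if positive:                                  # Step 3.1: positive example
            G = [g for g in G if matches(g, x)]       # 3.1.1: drop inconsistent g
            new_S = []
            for s in S:                               # 3.1.2: generalize S
                if matches(s, x):
                    new_S.append(s)
                else:
                    h = min_generalize(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            # drop any member of S that is more general than another member of S
            S = [s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)]
        else:                                         # Step 3.2: negative example
            S = [s for s in S if not matches(s, x)]   # 3.2.1: drop inconsistent s
            new_G = []
            for g in G:                               # 3.2.2: specialize G
                if not matches(g, x):
                    new_G.append(g)
                else:
                    for h in min_specialize(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            # drop any member of G that is less general than another member of G
            G = [g for g in new_G
                 if not any(g != t and more_general_or_equal(t, g) for t in new_G)]
    return S, G

Run on the standard EnjoySport training set from Mitchell's textbook (assumed here to be the dataset behind the Aldo example; the four rows below are not reproduced from these notes), the boundaries converge to the usual result:

domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
data = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
        (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
        (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
        (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(data, domains)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]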
Advantages of CEA over Find-S
• Improved accuracy: CEA considers both positive and negative examples when generating hypotheses, which can result in higher accuracy when dealing with noisy or incomplete data.
• Flexibility: CEA can handle more complex classification tasks, such as those with multiple classes or non-linear decision boundaries.
• More efficient: CEA reduces the number of candidate hypotheses by maintaining a set of general hypotheses and eliminating inconsistent ones one by one, which can result in faster processing and improved efficiency.
• Better handling of continuous attributes: CEA can handle continuous attributes by creating boundaries for each attribute, making it suitable for a wider range of datasets.

Disadvantages of CEA in comparison with Find-S
• More complex: CEA is a more complex algorithm than Find-S, which may make it harder to use and understand for beginners or those without a strong background in machine learning.
• Higher memory requirements: CEA must store the sets of boundary hypotheses, which may make it less suitable for memory-constrained environments.
• Slower processing for large datasets: CEA may become slower on larger datasets because of the increased number of hypotheses generated.
• Higher potential for overfitting: the increased complexity of CEA may make it more prone to overfitting the training data, especially if the dataset is small or very noisy.

Exercise – CEA

Exercise 2
• For the dataset given below, find the specific and general boundaries using the candidate elimination algorithm.

Inductive Bias
• Inductive bias can be defined as the set of assumptions or biases that a learning algorithm employs to make predictions on unseen data based on its training data.
• These assumptions are inherent in the algorithm's design and serve as a foundation for learning and generalization.
• The inductive bias of an algorithm influences how it selects a hypothesis (a possible explanation or model) from the hypothesis space (the set of all possible hypotheses) that best fits the training data.
• It helps the algorithm navigate the trade-off between fitting the training data too closely (overfitting) and failing to capture the underlying pattern (underfitting).
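To connect the "How can partially learned concepts be used?" question above with the need for inductive bias, here is a hedged sketch of classifying a new instance using only the boundary sets S and G: if every hypothesis in the version space agrees, the prediction is definite; otherwise the instance remains ambiguous. The boundary values below are the ones produced by the earlier EnjoySport sketch, and the helper name classify is illustrative, not from the original notes.

def matches(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def classify(x, S, G):
    """Classify x using a partially learned version space given by its boundaries."""
    if all(matches(s, x) for s in S):
        return True      # every hypothesis in the version space predicts positive
    if not any(matches(g, x) for g in G):
        return False     # every hypothesis in the version space predicts negative
    return None          # the hypotheses disagree: more training data is needed

# Boundaries from the EnjoySport sketch above:
S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
G = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
print(classify(('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), S, G))  # True
print(classify(('Rainy', 'Cold', 'Normal', 'Weak', 'Warm', 'Same'), S, G))      # False
print(classify(('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same'), S, G))    # None (ambiguous)

A learner with no bias at all would return the ambiguous answer for every unseen instance, which is the point of the next section.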
An Unbiased Learner
• An unbiased learner, whose hypothesis space can represent every possible concept, can never generalize: after any set of training examples, its version space still contains hypotheses that disagree on every unseen instance.
• It is therefore impossible to achieve learning without any biases; all learning is inherently influenced by some form of bias.

Inductive Bias – a formal representation
• A |= B ("A entails B") means: for every evaluation in which all elements of A evaluate to true, B also evaluates to true.
• With this notation, the inductive bias of a learner L can be written as a set of assertions B such that, for every new instance x and training data D, the learner's classification follows by entailment: (B ∧ D ∧ x) |= L(x, D).

Modelling Inductive Systems