ML Notes
Example : In a driverless car, training data is fed to the algorithm describing how
to drive the car on highways and on busy or narrow streets, with factors such as speed
limits, parking, and stopping at signals. A logical and mathematical model is then built
from this data, and the car drives according to that model.
Also, the more data that is fed in, the more accurate the output becomes.
Designing a Learning System in Machine Learning :
According to Tom Mitchell, “A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.”
Example: In Spam E-Mail detection,
Task, T: To classify mails into Spam or Not Spam.
Performance measure, P: Total percent of mails correctly classified as
“Spam” or “Not Spam”.
Experience, E: A set of mails labelled “Spam” or “Not Spam”.
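The performance measure P above can be sketched in a few lines of Python. This is a minimal illustration, not part of the notes; the function name and the sample labels are my own made-up choices.

```python
# Hedged sketch: computing the performance measure P for the spam task.
# The labels below are made-up illustrative data, not a real dataset.
def accuracy(true_labels, predicted_labels):
    """Percent of mails correctly classified as Spam / Not Spam."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

true_labels      = ["Spam", "Not Spam", "Spam", "Not Spam"]
predicted_labels = ["Spam", "Not Spam", "Not Spam", "Not Spam"]
print(accuracy(true_labels, predicted_labels))  # 75.0
```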
Steps for Designing Learning System are:
These examples are determined according to the preferences and needs of the
bank. The bank may have determined that the best customers have a combination of
certain features.
The positive examples are the examples of applicants that the bank has
deemed acceptable for awarding a loan. These applicants have a combination of
features that have been determined as desirable to the bank in terms of the applicant
being able to repay the loan without much trouble.
The negative examples are the examples of applicants that the bank deems
unacceptable for awarding a loan. These applicants have a combination of features
that the bank sees as undesirable. These are applicants that the bank deems will have
difficulty repaying the loan.
∅ (Empty Set) − This symbol represents the absence of any specific value or
attribute. It is often used to initialize the hypothesis as the most specific concept.
? (Don't Care) − The question mark symbol represents a "don't care" or "unknown"
value for an attribute. It is used when the hypothesis needs to generalize over
different attribute values that are present in positive examples.
Positive Examples (+) − The plus symbol represents positive examples, which are
instances labeled as the target class or concept being learned.
Negative Examples (-) − The minus symbol represents negative examples, which
are instances labeled as non-target classes or concepts that should not be covered by
the hypothesis.
Hypothesis (h) − The variable h represents the hypothesis, which is the learned concept or
generalization based on the training data. It is refined iteratively throughout the algorithm.
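The symbols above can be represented directly in code. The sketch below is an assumption about one convenient encoding ("phi" for ϕ, "?" for don't-care); the names PHI, ANY, and covers are my own, not from the notes.

```python
# Illustrative encoding of the hypothesis symbols described above.
# "phi" stands for ϕ (no value accepted); "?" accepts any value.
PHI = "phi"
ANY = "?"

most_specific = [PHI] * 4          # h = {ϕ, ϕ, ϕ, ϕ}
most_general  = [ANY] * 4          # h = {?, ?, ?, ?}

def covers(h, example):
    """True if hypothesis h classifies the example as positive."""
    return all(a == ANY or a == v for a, v in zip(h, example))

print(covers(most_general,  ["GREEN", "HARD", "NO", "WRINKLED"]))  # True
print(covers(most_specific, ["GREEN", "HARD", "NO", "WRINKLED"]))  # False
```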
Introduction :
The Find-S algorithm is a basic concept learning algorithm in machine learning. It
finds the most specific hypothesis that fits all the positive examples; note that the
algorithm considers only the positive training examples. Find-S starts with the most
specific hypothesis and generalizes it each time it fails to cover an observed positive
training example. Hence, Find-S moves from the most specific hypothesis toward the
most general hypothesis.
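The procedure just described can be sketched as a short function. This is a minimal implementation under my own assumptions about the input format (a list of (attributes, label) pairs with label "Yes" for positive instances); the names are mine, not from the notes.

```python
# A minimal sketch of Find-S. Assumes each example is an
# (attribute_list, label) pair, with label "Yes" for positives.
def find_s(examples):
    positives = [attrs for attrs, label in examples if label == "Yes"]
    # Start from the most specific hypothesis: one ϕ ("phi") per attribute.
    h = ["phi"] * len(positives[0])
    for attrs in positives:                # negatives are simply ignored
        for i, value in enumerate(attrs):
            if h[i] == "phi":              # first positive: copy its values
                h[i] = value
            elif h[i] != value:            # mismatch: generalize to '?'
                h[i] = "?"
    return h
```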
Important Representation :
Example :
Consider the following data set having the data about which particular seeds are
poisonous.
First, we take the hypothesis to be the most specific hypothesis. Hence, our
hypothesis is :
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1 :
The data in example 1 is { GREEN, HARD, NO, WRINKLED }. Our initial hypothesis
is more specific than this example, so we generalize it. Hence, the
hypothesis becomes :
h = { GREEN, HARD, NO, WRINKLED }
Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 4 :
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare
each attribute with the current hypothesis, and wherever a mismatch is found we
replace that attribute with the general case ( ” ? ” ). The hypothesis then
becomes :
h = { ?, HARD, NO, WRINKLED }
Consider example 5 :
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We compare
each attribute with the current hypothesis, and wherever a mismatch is found we
replace that attribute with the general case ( ” ? ” ). The hypothesis then
becomes :
h = { ?, ?, ?, ? }
Since all the attributes in our hypothesis are now general, examples 6 and 7
leave the hypothesis unchanged, with all attributes general.
h = { ?, ?, ?, ? }
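The trace above can be reproduced in a few lines. Only the positive rows quoted in the text (examples 1, 4, and 5) are used, since Find-S skips the negatives; the "phi" encoding for ϕ is my own choice.

```python
# Reproducing the worked Find-S trace, using "phi" for ϕ and '?' for the
# general value. Only the positive examples quoted above appear here;
# the negative examples (2 and 3) are ignored by Find-S.
positives = [
    ["GREEN",  "HARD", "NO",  "WRINKLED"],  # example 1
    ["ORANGE", "HARD", "NO",  "WRINKLED"],  # example 4
    ["GREEN",  "SOFT", "YES", "SMOOTH"],    # example 5
]

h = ["phi"] * 4
for attrs in positives:
    for i, value in enumerate(attrs):
        if h[i] == "phi":
            h[i] = value
        elif h[i] != value:
            h[i] = "?"
    print(h)
# Prints the same sequence as the trace:
# ['GREEN', 'HARD', 'NO', 'WRINKLED']
# ['?', 'HARD', 'NO', 'WRINKLED']
# ['?', '?', '?', '?']
```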
Candidate Elimination Algorithm :
The candidate elimination algorithm incrementally builds the version space given a
hypothesis space H and a set E of examples. The examples are added one by one;
each example possibly shrinks the version space by removing the hypotheses that
are inconsistent with the example. The candidate elimination algorithm does this by
updating the general and specific boundary for each new example.
You can consider this as an extended form of the Find-S algorithm.
It considers both positive and negative examples.
Positive examples are used as in the Find-S algorithm: they generalize the
specific boundary.
Negative examples are used to specialize the general boundary.
Terms Used:
Concept learning: The learning task of the machine (learning from training data).
General Hypothesis: Not specifying features to be learned by the machine.
G = {‘?’, ‘?’, ‘?’, ‘?’ …}: one ‘?’ per attribute.
Specific Hypothesis: Specifying features to be learned by the machine (specific
features).
S = {‘ϕ’, ‘ϕ’, ‘ϕ’ …}: the number of ϕ depends on the number of attributes.
Version Space: An intermediate between the general and the specific
hypothesis. It contains not just one hypothesis but the set of all hypotheses
consistent with the training data-set.
Algorithm:
Step 1: Load the data set.
Step 2: Initialize the General Hypothesis and the Specific Hypothesis.
Step 3: For each training example:
Step 4: If the example is positive:
if attribute_value == hypothesis_value:
do nothing
else:
replace the attribute value with '?' (generalizing it)
Step 5: If the example is negative:
make the general hypothesis more specific.
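The steps above can be sketched as a simplified Candidate Elimination loop for conjunctive hypotheses. This is a minimal sketch under my own assumptions: the function and variable names are mine, and since the dataset table did not survive in these notes, the classic EnjoySport data from Mitchell is used below purely for illustration.

```python
# A simplified sketch of Candidate Elimination for conjunctive hypotheses.
# "phi" stands for ϕ (no value yet); "?" is the don't-care value.
def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ["phi"] * n                  # most specific boundary
    G = [["?"] * n]                  # most general boundary
    for attrs, label in examples:
        if label == "Yes":
            # Generalize S just enough to cover the positive example.
            for i, value in enumerate(attrs):
                if S[i] == "phi":
                    S[i] = value
                elif S[i] != value:
                    S[i] = "?"
            # Drop members of G that fail to cover the positive example.
            G = [g for g in G
                 if all(g[i] in ("?", attrs[i]) for i in range(n))]
        else:
            # Specialize G minimally to exclude the negative example,
            # using values from S to stay consistent with past positives.
            new_G = []
            for g in G:
                for i in range(n):
                    if g[i] == "?" and S[i] not in ("?", "phi", attrs[i]):
                        specialized = g.copy()
                        specialized[i] = S[i]
                        new_G.append(specialized)
            G = new_G or G
    return S, G

# Illustration on Mitchell's EnjoySport data (not the notes' own table).
examples = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"], "Yes"),
    (["Sunny", "Warm", "High", "Strong", "Warm", "Same"], "Yes"),
    (["Rainy", "Cold", "High", "Strong", "Warm", "Change"], "No"),
    (["Sunny", "Warm", "High", "Strong", "Cool", "Change"], "Yes"),
]
S, G = candidate_elimination(examples)
print("S =", S)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print("G =", G)
```

Each positive example pushes the specific boundary S upward (as in Find-S) and prunes general hypotheses that no longer cover it; each negative example pushes the general boundary G downward.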
Example:
Consider the dataset given below:
Algorithmic steps:
Initially : G = [?, ?, ?, ?, ?, ?]  (the single maximally general hypothesis)
S = [Null, Null, Null, Null, Null, Null]