
Heuristic Space Search

➢ Heuristic search is a search strategy that finds an optimized hypothesis/solution to a problem by iteratively improving the hypothesis/solution based on a given heuristic function or a cost measure.
➢ Heuristic search methods will generate a possible hypothesis that
can be a solution in the hypothesis space or a path from the
initial state.
➢ This hypothesis will be tested with the target function or the goal
state to see if it is a real solution.
➢ If the tested hypothesis is a real solution, then it will be selected.
Heuristic Space Search

➢ This method generally increases efficiency because it is guaranteed to find a better hypothesis, but it may not find the best hypothesis.
➢ It is useful for solving tough problems which could not be solved by any other method.
➢ The typical example problem solved by heuristic search is the
travelling salesman problem.
➢ Several commonly used heuristic search methods are hill climbing, constraint satisfaction, best-first search, simulated annealing, the A* algorithm, and genetic algorithms.
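As a rough illustration, the following Python sketch shows the core loop of one such method, hill climbing, over a hypothesis space. The `neighbours` and `score` functions are hypothetical placeholders that a specific problem would have to supply; this is a sketch, not the algorithm used later in the chapter.

```python
def hill_climbing(initial, neighbours, score, max_steps=1000):
    """Greedy hill climbing: repeatedly move to the best-scoring
    neighbouring hypothesis until no neighbour improves the score."""
    current = initial
    for _ in range(max_steps):
        # generate candidate hypotheses near the current one and keep the best
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):
            return current          # local optimum reached
        current = best
    return current

# Tiny usage example: find the integer x that maximizes -(x - 3)**2
print(hill_climbing(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2))   # 3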
Generalization and Specialization

➢ To understand how this concept hierarchy is constructed, let us apply the general principle of the generalization/specialization relation.
➢ By generalizing the most specific hypothesis and by specializing the most general hypothesis, the hypothesis space can be searched for an approximate hypothesis that matches all positive instances but does not match any negative instance.
Searching the Hypothesis Space

➢ There are two ways of learning a hypothesis that is consistent with all training instances from the large hypothesis space:
1. Specialization - General to Specific learning
2. Generalization - Specific to General learning
Searching the Hypothesis Space

Generalization - Specific to General Learning


This learning methodology will search through the
hypothesis space for an approximate hypothesis by
generalizing the most specific hypothesis.
Searching the Hypothesis Space

➢ Example 3.2: Consider the training instances shown in Table 3.1 and illustrate Specific to General Learning.
➢ Solution: We start from the all-false, most specific hypothesis to determine the most restrictive specialization. Consider only the positive instances and generalize the most specific hypothesis; ignore the negative instances.
➢ This learning is illustrated as follows: the most specific hypothesis is taken first, which will not classify any instance as true.
h = <φ, φ, φ, φ, φ, φ, φ, φ>
Searching the Hypothesis Space

➢ Read the first instance I1 and generalize the hypothesis h so that this positive instance can be classified by the resulting hypothesis h1.
Searching the Hypothesis Space

Specialization - General to Specific Learning


This learning methodology will search through
the hypothesis space for an approximate
hypothesis by specializing the most general
hypothesis.
Searching the Hypothesis Space

➢ Example 3.3: Illustrate learning by Specialization - General to Specific Learning for the data instances shown in Table 3.1.
➢ Solution: Start from the most general hypothesis, which classifies all instances, positive and negative, as true.
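A minimal Python sketch of a single specialization step is given below. The tuple encoding of hypotheses, the '?' wildcard, and the attribute domains are illustrative assumptions, since Table 3.1 is not reproduced here.

```python
def specialize(h, negative, domains):
    """Return the minimal specializations of hypothesis h that exclude
    the given negative instance.  h is a tuple of attribute constraints,
    where '?' accepts any value; `domains` lists the possible values of
    each attribute (both encodings are illustrative assumptions)."""
    specializations = []
    for i, value in enumerate(negative):
        if h[i] == '?':
            # constrain attribute i to every value other than the one
            # observed in the negative instance
            for v in domains[i]:
                if v != value:
                    s = list(h)
                    s[i] = v
                    specializations.append(tuple(s))
    return specializations

# Most general hypothesis over three illustrative attributes
h_general = ('?', '?', '?')
domains = [('Low', 'High'), ('Yes', 'No'), ('Good', 'Poor')]
print(specialize(h_general, ('Low', 'No', 'Poor'), domains))
# [('High', '?', '?'), ('?', 'Yes', '?'), ('?', '?', 'Good')]
```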
Hypothesis Space Search by Find-S Algorithm

➢ The Find-S algorithm is guaranteed to converge to the most specific hypothesis in H that is consistent with the positive instances in the training dataset. Obviously, it will also be consistent with the negative instances. Thus, this algorithm considers only the positive instances and eliminates negative instances while generating the hypothesis. It initially starts with the most specific hypothesis.
Hypothesis Space Search by Find-S Algorithm

➢ Example 3.4:
Consider the training dataset of 4 instances shown in
Table 3.2. It contains the details of the performance of
students and their likelihood of getting a job offer or not
in their final semester. Apply the Find-S algorithm.
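Since Table 3.2 is not reproduced here, the following Python sketch of Find-S uses illustrative attribute values; the encoding of the most specific hypothesis as a list of 'phi' placeholders is likewise an assumption.

```python
def find_s(training_data, n_attributes):
    """Find-S: start from the most specific hypothesis and generalize it
    just enough to cover every positive instance; negatives are ignored."""
    h = ['phi'] * n_attributes           # most specific hypothesis <φ, φ, ..., φ>
    for instance, label in training_data:
        if label != 'Positive':
            continue                     # Find-S skips negative instances
        for i, value in enumerate(instance):
            if h[i] == 'phi':
                h[i] = value             # first positive instance: copy its values
            elif h[i] != value:
                h[i] = '?'               # conflicting values generalize to '?'
    return h

# Illustrative data only (Table 3.2 is not reproduced here)
data = [(('>=9', 'Yes', 'Excellent'), 'Positive'),
        (('>=8', 'Yes', 'Good'),      'Positive'),
        (('>=8', 'No',  'Good'),      'Negative')]
print(find_s(data, 3))   # ['?', 'Yes', '?']
```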
Model Performance

 The focus of this section is the evaluation of classifier models. Classifiers are unstable, as a small change in the input can change the output.
 A solid framework is needed for proper evaluation. There are several
metrics that can be used to describe the quality and usefulness of a
classifier.
 One way to compute the metrics is to form a table called a contingency table.
 For example, consider a test for detecting a disease, say cancer.
Table 3.4 shows a contingency table for this scenario.
Model Performance

 In this table, True Positive (TP) = the number of cancer patients who are correctly classified by the test, and True Negative (TN) = the number of normal patients who do not have cancer and are correctly detected as negative.
 The two errors involved in this process are False Positive (FP), a false alarm in which the test shows positive although the patient has no disease, and False Negative (FN), in which the test says negative or normal although the patient actually has cancer.
 FP and FN are costly errors in this classification process.
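A minimal sketch of how the four contingency-table counts could be tallied from true and predicted labels, assuming the labels are encoded as 1 (cancer present) and 0 (normal); the example labels are hypothetical.

```python
def contingency_counts(y_true, y_pred):
    """Tally TP, TN, FP, FN, assuming 1 = disease present, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Illustrative labels only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(contingency_counts(y_true, y_pred))   # (3, 3, 1, 1)
```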
Model Performance

 The metrics that can be derived from this contingency table are listed below:
1. Sensitivity - The sensitivity of a test is the probability that it will produce a true positive result when used on a test dataset. It is also known as the true positive rate. The sensitivity of a test is calculated as:
Sensitivity = TP / (TP + FN)
Model Performance

2. Specificity - The specificity of a test is the probability that it will produce a true negative result when used on a test dataset.
Specificity = TN / (TN + FP)
3. Positive Predictive Value - The positive predictive value of a test is the probability that an object is classified correctly when a positive test result is observed.
Positive Predictive Value = TP / (TP + FP)
Model Performance

 4. Negative Predictive Value - The negative predictive value of a test is the probability that an object is correctly classified as negative when a negative test result is observed.
Negative Predictive Value = TN / (TN + FN)
 5. Accuracy - The accuracy of the classifier is the overall fraction of correct predictions, computed as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Model Performance

6. Precision - Precision is also known as positive predictive value. It is defined as the ratio of true positives to the sum of true positives and false positives.
Precision = TP / (TP + FP)
Precision indicates how good the classifier is at predicting the positive classes.
7. Recall - It is the same as sensitivity.
Recall = Sensitivity = TP / (TP + FN)
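The metrics above can be computed directly from the contingency-table counts, as in the short Python sketch below; the counts passed in the example call are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics defined above from contingency-table counts."""
    return {
        "sensitivity / recall":      tp / (tp + fn),
        "specificity":               tn / (tn + fp),
        "positive predictive value": tp / (tp + fp),   # same as precision
        "negative predictive value": tn / (tn + fn),
        "accuracy":                  (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical counts for illustration only
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```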
Chapter 4

Similarity-based
Learning
➢ Similarity-based learning is a supervised learning technique that predicts the class label of a test instance by gauging the similarity of this test instance with the training instances.
➢ Similarity-based learning refers to a family of instance-based
learning which is used to solve both classification and regression
problems.
➢ Instance-based learning makes predictions by computing distances or similarities between the test instance and a specific set of training instances local to the test instance, in an incremental process.
➢ This learning methodology improves the performance of classification since it uses only a specific set of instances in an incremental learning task.
➢ Similarity-based classification is useful in various fields such as image processing, text classification, pattern recognition, bioinformatics, data mining, information retrieval, natural language processing, etc.
➢ A practical application of this learning is predicting daily stock index price changes. This chapter provides insight into how different similarity-based models predict the class of a new instance.
Introduction to similarity or instance-based
learning
➢ Similarity-based classifiers use similarity measures to locate the nearest neighbours and classify a test instance, which contrasts with other learning mechanisms such as decision trees or neural networks.
➢ Similarity-based learning is also called instance-based learning or just-in-time learning, since it does not build an abstract model of the training instances and performs lazy learning when classifying a new instance.
Introduction to similarity or instance-based
learning
➢ Classification of instances is done based on the measure of similarity in
the form of distance functions over data instances.
➢ Several distance metrics are used to estimate the similarity or
dissimilarity between instances required for clustering, nearest
neighbour classification, anomaly detection, and so on.
➢ Popular distance metrics used are Hamming distance, Euclidean
distance, Manhattan distance, Minkowski distance, Cosine similarity,
Mahalanobis distance, Pearson's correlation or correlation similarity,
Mean squared difference, Jaccard coefficient, Tanimoto coefficient,
etc.
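A few of these measures are sketched below in plain Python for two numeric instances; the example vectors are illustrative only.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two numeric instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions at which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between the two vectors (1 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two illustrative instances
print(euclidean((6.1, 40, 5), (6.8, 45, 7)))
print(manhattan((6.1, 40, 5), (6.8, 45, 7)))
print(cosine_similarity((6.1, 40, 5), (6.8, 45, 7)))
```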
Some examples of Instance-based
learning algorithms
➢ 1. k-Nearest Neighbour (k-NN)
➢ 2. Variants of Nearest Neighbor learning
➢ 3. Locally Weighted Regression
➢ 4. Learning Vector Quantization (LVQ)
➢ 5. Self-Organizing Map (SOM)
➢ 6. Radial Basis Function (RBF) networks
NEAREST-NEIGHBOR LEARNING

➢ A natural approach to similarity-based classification is k-Nearest-Neighbours (k-NN), which is a non-parametric method used for both classification and regression problems.
➢ It is a simple and powerful non-parametric algorithm that predicts the category of the test instance according to the 'k' training samples closest to the test instance and assigns it to the category that has the largest probability.
➢ A visual representation of this learning is shown in Figure 4.1.
NEAREST-NEIGHBOR LEARNING

➢ There are two classes of objects, called C1 and C2, in the given figure. When given a test instance T, the category of this test instance is determined by looking at the classes of its k = 3 nearest neighbours.
➢ Thus, the class of this test instance T is predicted as C2.
NEAREST-NEIGHBOR LEARNING

➢ The algorithm relies on the assumption that similar objects are close to each other in the feature space.
➢ k-NN performs instance-based learning: it just stores the training data instances and learns from the instances case by case.
➢ The model is also 'memory-based', as it uses the training data at the time when predictions need to be made.
➢ It is a lazy learning algorithm, since no prediction model is built beforehand from the training instances and classification happens only after the test instance is received.
NEAREST-NEIGHBOR LEARNING

➢ The algorithm classifies a new instance by determining the 'k' most similar instances (i.e., the k nearest neighbours) and summarizing the output of those 'k' instances.
➢ If the target variable is discrete, then it is a classification problem, so the most common class value among the 'k' instances is selected by a majority vote. However, if the target variable is continuous, then it is a regression problem, and hence the mean output value of the 'k' instances is the output of the test instance.
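A minimal sketch of these two aggregation rules, assuming the target values of the k nearest neighbours have already been collected into a list; the example values are illustrative.

```python
from collections import Counter

def knn_classify(neighbour_labels):
    """Discrete target: majority vote among the k nearest neighbours."""
    return Counter(neighbour_labels).most_common(1)[0][0]

def knn_regress(neighbour_values):
    """Continuous target: mean of the k nearest neighbours' outputs."""
    return sum(neighbour_values) / len(neighbour_values)

print(knn_classify(['Pass', 'Fail', 'Pass']))   # 'Pass'
print(knn_regress([7.2, 6.8, 8.0]))             # 7.333...
```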
NEAREST-NEIGHBOR LEARNING

➢ A distance measure, most popularly the Euclidean distance, is used in k-NN to determine the 'k' instances that are similar to the test instance.
➢ The value of 'k' is best determined by tuning with different 'k' values and choosing the 'k' that classifies the test instance most accurately.
NEAREST-NEIGHBOR LEARNING

Example 4.1: Consider the student performance training dataset of 8 data instances shown in Table 4.2, which describes the performance of individual students in a course together with the CGPA obtained in the previous semesters. The independent attributes are CGPA, Assessment and Project. The target variable is 'Result', a discrete-valued variable that takes the two values 'Pass' or 'Fail'. Based on the performance of a student, classify whether the student will pass or fail the course.
NEAREST-NEIGHBOR LEARNING

Solution: Given a test instance (6.1, 40, 5) and a set of categories {Pass, Fail}, also called classes, we need to use the training set to classify the test instance using Euclidean distance. The task of classification is to assign a category or class to an arbitrary instance. Assign k = 3.
Step 1: Calculate the Euclidean distance between the test instance (6.1, 40, 5) and each of the training instances, as shown in Table 4.3.
Step 2: Sort the distances in ascending order and select the first 3 nearest training data instances to the test instance. The selected nearest neighbours are shown in Table 4.4.

Here, we take the 3 nearest neighbours as instances 4, 5 and 7, which have the smallest distances.
Step 3: Predict the class of the test instance by majority voting.
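The three steps can be sketched in Python as follows. Because Tables 4.2-4.4 are not reproduced here, the training instances below are hypothetical placeholders in a (CGPA, Assessment, Project) format, not the actual values from the table.

```python
import math
from collections import Counter

def knn_predict(train, test, k=3):
    """k-NN: Euclidean distance, sort, majority vote over the k nearest."""
    # Step 1: distance from the test instance to every training instance
    distances = [(math.dist(x, test), label) for x, label in train]
    # Step 2: sort ascending and keep the k nearest neighbours
    nearest = sorted(distances)[:k]
    # Step 3: majority vote over the k neighbours' classes
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training instances (CGPA, Assessment, Project); Table 4.2 is not shown here.
train = [((9.2, 85, 8), 'Pass'), ((8.0, 80, 7), 'Pass'), ((8.5, 81, 8), 'Pass'),
         ((6.0, 45, 5), 'Fail'), ((6.5, 50, 4), 'Fail'), ((8.2, 72, 7), 'Pass'),
         ((5.8, 38, 5), 'Fail'), ((8.9, 91, 9), 'Pass')]
print(knn_predict(train, (6.1, 40, 5), k=3))   # 'Fail' with this illustrative data
```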
WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM

Weighted k-NN is an extension of k-NN. It chooses the neighbours by using a weighted distance.
The k-Nearest Neighbour algorithm has some serious limitations, as its performance depends solely on choosing the k nearest neighbours, the distance metric used and the decision rule. The principal idea of Weighted k-NN is that the k closest neighbours to the test instance are assigned a higher weight in the decision than neighbours that are farther away from the test instance. The idea is that the weights are inversely proportional to the distances.
WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM

The selected k nearest neighbours can be assigned uniform weights, which means all the instances in each neighbourhood are weighted equally, or weights can be assigned by the inverse of their distance.
In the second case, closer neighbours of a query point will have a greater influence than neighbours which are further away.
WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM

Example 4.2: Consider the same training dataset given in Table 4.1.
Use Weighted k-NN and determine the class.
Solution:
Step 1: Given a test instance (7.6, 60, 8) and a set of classes {Pass, Fail}, use the training dataset to classify the test instance using Euclidean distance and a weighting function. Assign k = 3. The distance calculation is shown in Table 4.5.
Step 2: Sort the distances in ascending order and select the first 3 nearest training data instances to the test instance. The selected nearest neighbours are shown in Table 4.6.
Step 3: Predict the class of the test instance by the weighted voting technique from the 3 selected nearest instances.
➢ Compute the inverse of each distance of the 3 selected nearest instances, as shown in Table 4.7.
➢ Find the sum of the inverses:
Sum = 0.06502 + 0.092370 + 0.08294 = 0.24033
➢ Compute the weight by dividing each inverse distance by the sum, as shown in Table 4.8.
➢ Add the weights of the same class:
Fail = 0.270545 + 0.384347 = 0.654892
Pass = 0.345109
➢ Predict the class by choosing the class with the maximum vote. The class is predicted as 'Fail'.
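The weighted-voting arithmetic of Step 3 can be reproduced with a short sketch. The neighbour distances are reconstructed from the inverse distances given above, and the assignment of 'Fail', 'Fail', 'Pass' to the three neighbours follows the grouping implied by the worked example.

```python
def weighted_knn_vote(neighbours):
    """Weighted k-NN vote: weight each neighbour by its inverse distance,
    normalise by the sum of inverses, and add the weights per class."""
    inverses = [(1.0 / d, label) for d, label in neighbours]
    total = sum(inv for inv, _ in inverses)
    votes = {}
    for inv, label in inverses:
        votes[label] = votes.get(label, 0.0) + inv / total
    return max(votes, key=votes.get), votes

# Distances reconstructed from the inverse distances 0.06502, 0.09237, 0.08294;
# class labels follow the grouping used in the example.
neighbours = [(1 / 0.06502, 'Fail'), (1 / 0.09237, 'Fail'), (1 / 0.08294, 'Pass')]
print(weighted_knn_vote(neighbours))   # ('Fail', {'Fail': 0.6549..., 'Pass': 0.3451...})
```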
