Similarity-based Learning

"Anyone who stops learning is old, whether at twenty or eighty."
— Henry Ford
Similarity-based Learning is a supervised learning technique that predicts the class label of a test instance by gauging the similarity of this test instance with training instances. Similarity-based classification computes the similarities between the test instance and a specific set of training instances local to the test instance in an incremental process. In contrast to other learning mechanisms, it considers only the nearest instance or instances to predict the class of unseen instances. This learning methodology improves the performance of classification as an incremental learning task. Similarity-based classification is useful in various fields such as image processing, text classification, pattern recognition, bioinformatics, data mining, information retrieval, natural language processing, etc. A practical application of this learning is predicting daily stock index price changes. This chapter provides an insight into how different similarity-based models predict the class of a new instance.
Learning Objectives
* Understand the fundamentals of Instance-based learning
* Know about the concepts of Nearest-Neighbor Learning using the algorithm called k-Nearest-Neighbors (k-NN)
* Learn about the Weighted k-Nearest-Neighbor classifier that chooses the neighbors by using the weighted distance
* Gain knowledge about the Nearest Centroid Classifier, a simple alternative to k-NN classifiers
* Understand Locally Weighted Regression (LWR), which approximates the linear functions of all k neighbors to minimize the error while predicting
4.1 INTRODUCTION TO SIMILARITY OR INSTANCE-BASED LEARNING
Similarity-based classifiers use similarity measures to locate the nearest neighbors and classify a test instance, which works in contrast with other learning mechanisms such as decision trees or neural networks. Similarity-based learning is also called Instance-based learning or Just-in-time learning since it does not build an abstract model of the training instances and performs lazy learning when classifying a new instance. This learning mechanism simply stores all data and uses it only when it needs to classify an unseen instance. The advantage of using this learning is that processing occurs only when a request to classify a new instance is given. This methodology is particularly useful when the whole dataset is not available in the beginning but is collected in an incremental manner. The drawback of this learning is that it requires a large memory to store the data since a global abstract model is not constructed initially with the training data. Classification of instances is done based on the measure of similarity in the form of distance functions over data instances. Several distance metrics are used to estimate the similarity or dissimilarity between instances required for clustering, nearest-neighbor classification, anomaly detection, and so on. Popular distance metrics used are Hamming distance, Euclidean distance, Manhattan distance, Minkowski distance, Cosine similarity, Mahalanobis distance, Pearson's correlation (correlation similarity), Mean squared difference, Jaccard coefficient, Tanimoto coefficient, etc.
Generally, similarity-based classification problems formulate the features of the test instance and training instances in Euclidean space to learn the similarity or dissimilarity between instances.
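A minimal sketch of a few of these metrics computed with NumPy may make the definitions concrete; the sample vectors below are illustrative only and are not taken from the text.

```python
# Sketch: a few of the distance/similarity measures listed above, computed with
# NumPy between two illustrative feature vectors x and y (values are made up).
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))                  # straight-line distance
manhattan = np.sum(np.abs(x - y))                          # city-block distance
minkowski = np.sum(np.abs(x - y) ** 3) ** (1 / 3)          # Minkowski distance with p = 3
cosine_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # cosine similarity

# Hamming distance for categorical attributes: count of mismatching positions
a = np.array(list("ABCD"))
b = np.array(list("ABED"))
hamming = np.sum(a != b)

print(euclidean, manhattan, minkowski, cosine_sim, hamming)
```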
4.1.1 Differences Between Instance- and Model-based Learning
An instance is an entity or an example in the training dataset. It is described by a set of features or attributes, one of which describes the class label or category of the instance. Instance-based methods learn or predict the class label of a test instance only when a new instance is given for classification, and until then they delay the processing of the training dataset.
They are also referred to as lazy learning methods since they do not generalize any model from the training dataset but just keep the training dataset as a knowledge base until a new instance is given. In contrast, model-based learning, generally referred to as eager learning, tries to generalize the training data to a model before receiving test instances. Model-based machine learning describes all assumptions about the problem domain in the form of a model. These algorithms basically learn in two phases, called the training phase and the testing phase. In the training phase, a model is built from the training dataset and is used to classify a test instance during the testing phase. Some examples of models constructed are decision trees, neural networks, Support Vector Machines (SVM), etc.
The differences between Instance-based Learning and Model-based Learning are listed in Table 4.1.
Table 4.1: Differences between Instance-based Learning and Model-based Learning

Instance-based Learning (Lazy Learners) | Model-based Learning (Eager Learners)
Processing of training instances is done only during the testing phase | Processing of training instances is done during the training phase
No model is built with the training instances before it receives a test instance | Generalizes a model with the training instances before it receives a test instance
Predicts the class of the test instance directly from the training data | Predicts the class of the test instance from the model built
Slow in testing phase | Fast in testing phase
Learns by making many local approximations | Learns by creating a global approximation
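The contrast is visible in libraries such as scikit-learn, where a k-NN classifier's fit() essentially just stores the training instances while a decision tree's fit() constructs the whole model up front. The following is only a sketch under that assumption; the toy data is made up for illustration.

```python
# Sketch: lazy (instance-based) vs eager (model-based) learners in scikit-learn.
# KNeighborsClassifier defers almost all work to prediction time, whereas
# DecisionTreeClassifier builds its model entirely during fit().
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_train = [[6.8, 1], [7.5, 0], [5.1, 1], [8.9, 1]]   # toy features (values are illustrative)
y_train = ['Pass', 'Pass', 'Fail', 'Pass']

lazy = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)   # stores the instances
eager = DecisionTreeClassifier().fit(X_train, y_train)             # builds a tree now

X_test = [[6.0, 1]]
print(lazy.predict(X_test))    # distances are computed only at this point
print(eager.predict(X_test))   # traverses the already-built tree
```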
Instance-based learning also comes under the category of memory-based models, which normally compare the given test instance with the trained instances that are stored in memory. Memory-based models classify a test instance by checking its similarity with the training instances.
Some examples of Instance-based learning algorithms are:
1. k-Nearest Neighbor (k-NN)
2. Variants of Nearest Neighbor learning
3. Locally Weighted Regression
4. Learning Vector Quantization (LVQ)
5. Self-Organizing Map (SOM)
6. Radial Basis Function (RBF) networks
In this chapter, we will discuss certain instance-based learning algorithms such as k-Nearest Neighbor (k-NN), Variants of Nearest Neighbor learning, and Locally Weighted Regression.
Self-Organizing Map (SOM) and Radial Basis Function (RBF) networks are discussed along with the concepts of artificial neural networks in Chapter 10, since they can be understood only after learning neural networks.
These instance-based methods have serious limitations regarding the range of feature values they can handle. Moreover, they are sensitive to irrelevant and correlated features, which can lead to misclassification of instances.
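Because distance computations can be dominated by an attribute with a large numeric range, a common precaution (an addition here, not from the text) is to standardize the features before applying a nearest-neighbor method. A sketch with scikit-learn and illustrative data:

```python
# Sketch: rescaling features before k-NN so that no single attribute's range
# dominates the distance calculation (the data values are illustrative).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X_train = [[9.1, 85000], [6.2, 42000], [8.4, 78000], [5.9, 39000]]  # two attributes on very different scales
y_train = ['Pass', 'Fail', 'Pass', 'Fail']

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print(model.predict([[7.8, 70000]]))
```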
4.2 NEAREST-NEIGHBOR LEARNING
A natural approach to similarity-based classification is k-Nearest-Neighbors (k-NN), which is a non-parametric method used for both classification and regression problems. It is a simple and powerful non-parametric algorithm that predicts the category of the test instance according to the 'k' training samples which are closest to the test instance, and classifies it to the category which has the largest probability. A visual representation of this learning is shown in Figure 4.1. There are two classes of objects, called C1 and C2, in the given figure. When given a test instance T, the category of this test instance is determined by looking at the class of its k = 3 nearest neighbors. Thus, the class of this test instance T is predicted as C1.

Figure 4.1: Visual Representation of k-Nearest Neighbor Learning
The algorithm relies on the assumption that similar objects are close to each other in the feature space. k-NN performs instance-based learning: it just stores the training data instances and learns the instances case by case. The model is also 'memory-based' as it uses the training data at the time when predictions need to be made. It is a lazy learning algorithm since no prediction model is built earlier with the training instances, and classification happens only after getting the test instance.
The algorithm classifies a new instance by determining the 'k' most similar instances (i.e., k nearest neighbors) and summarizing the output of those 'k' instances. If the target variable is discrete, then it is a classification problem, so it selects the most common class value among the 'k' instances by a majority vote. However, if the target variable is continuous, then it is a regression problem, and hence the mean output variable of the 'k' instances is the output of the test instance.
A popular distance measure such as Euclidean distance is used in k-NN to determine the 'k' instances which are most similar to the test instance. The value of 'k' is best determined by tuning with different 'k' values and choosing the 'k' which classifies the test instance most accurately.
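One common way to perform this tuning is cross-validation over candidate values of 'k'. The following sketch uses scikit-learn with the Iris dataset purely as a stand-in for whatever training data is at hand.

```python
# Sketch: choosing 'k' by cross-validated accuracy over a range of candidate values.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset for illustration

best_k, best_score = None, -1.0
for k in range(1, 16):
    # mean accuracy of a k-NN classifier over 5 cross-validation folds
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, cross-validated accuracy = {best_score:.3f}")
```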
Inputs: Training dataset T, distance metric d, test instance t, the number of nearest neighbors k
Output: Predicted class or category
Prediction: For test instance t,
1. For each instance i in T, compute the distance between the test instance t and instance i using the distance metric (Euclidean distance).
[Continuous attributes - Euclidean distance between two points in the plane with coordinates (x1, y1) and (x2, y2) is given as dist((x1, y1), (x2, y2)) = √((x1 − x2)² + (y1 − y2)²).]
[Categorical attributes (Binary) - Hamming distance: if the values of the two instances are the same, the distance d will be equal to 0; otherwise d = 1.]
2. Sort the distances in ascending order and select the first k nearest training data instances to the test instance.
3. Predict the class of the test instance by majority voting (if the target attribute is discrete valued) or by the mean (if the target attribute is continuous valued) of the k selected nearest instances.
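A minimal from-scratch sketch of these prediction steps follows, assuming continuous attributes with Euclidean distance; the function names and sample data are ours for illustration, not from the text.

```python
# Sketch of the k-NN prediction steps above: compute distances, sort them,
# take the k nearest training instances, and vote on (or average) their labels.
import math
from collections import Counter

def euclidean(a, b):
    # Step 1: Euclidean distance between two continuous-valued instances
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, test_x, k, discrete=True):
    # Step 2: sort training instances by their distance to the test instance
    distances = sorted(
        ((euclidean(x, test_x), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [y for _, y in distances[:k]]
    if discrete:
        # Step 3a: majority vote for a discrete-valued target attribute
        return Counter(nearest_labels).most_common(1)[0][0]
    # Step 3b: mean of the neighbors for a continuous-valued target attribute
    return sum(nearest_labels) / k

# Illustrative usage with made-up [CGPA, assessment] features
train_X = [[9.2, 85], [8.0, 80], [8.5, 81], [6.0, 45], [6.5, 50]]
train_y = ['Pass', 'Pass', 'Pass', 'Fail', 'Fail']
print(knn_predict(train_X, train_y, [7.8, 78], k=3))   # expected: 'Pass'
```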
Example 4.1:
Consider the student performance training dataset of 8 data instances shown in Table 4.2, which describes the performance of individual students in a course along with the CGPA obtained in the previous semesters. The independent attributes are CGPA, Assessment and Project. The target variable is 'Result', which is a discrete valued variable that takes two values, 'Pass' or 'Fail'. Based on the performance of a student, classify whether the student will pass or fail in that course.
Table 4.2: Training Dataset T

CGPA | Assessment | Project | Result
(Continued)