ML TRW

Similarity-based learning focuses on measuring similarity and dissimilarity between data points, used in tasks like recommendation systems, classification, and anomaly detection. Instance-based learners, known as lazy learners, store training data and compute predictions only when needed, while k-NN is a memory-based method that relies on stored data for predictions without generalization. The k-NN algorithm has advantages like ease of implementation and adaptability, but also limitations such as poor scalability and susceptibility to overfitting.


1. What do you understand by similarity-based learning?

Similarity learning is an area of supervised machine learning. Unlike other supervised learning algorithms (which focus on predicting labels from input data), it focuses on recognizing and measuring similarity and dissimilarity between data points.

To make this easier to conceptualize, imagine comparing two photographs and wanting to figure out whether they show the same object. Instead of checking every pixel, a similarity learning algorithm finds key characteristics and features of the object in each photo (for example, its shape) and compares those.

Similarity learning algorithms can be used for various tasks where the main goal is to find similarities and relationships between items:
● Recommendation systems: to keep users engaged on a social media platform, similarity learning is used to find content similar to items the user has already liked and recommend it.
● Classification: to assign an item to a given class, we check whether the item is similar to the items already in that class.
● Face verification: similarity learning is also used to compare the facial features in an image against a database of faces, verifying and recognizing identities with remarkable accuracy.
● Anomaly detection: by defining what "normal" data looks like, dissimilar or deviating data points can be detected and reported to prevent possible issues.
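
As a rough illustration of the compare-by-features idea above, the sketch below scores the similarity of two items by the cosine of the angle between their feature vectors. The feature values and the 0.9 decision threshold are made-up placeholders; in a real similarity learning system both the features and the decision rule would be learned from data.

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity of two feature vectors, independent of their magnitude.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical feature vectors extracted from two photographs
# (e.g. shape descriptors); the numbers are illustrative only.
photo_a = np.array([0.9, 0.1, 0.4])
photo_b = np.array([0.8, 0.2, 0.5])

similarity = cosine_similarity(photo_a, photo_b)
print(f"similarity = {similarity:.3f}")

# An arbitrary threshold turns the similarity score into a yes/no decision.
print("same object" if similarity > 0.9 else "different objects")
```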

2. Compare instance-based learning and model-based learning.

Instance-based learning keeps the training examples themselves and defers computation to prediction time, when a new query is compared directly against the stored instances (as in k-NN). Model-based learning instead uses the training phase to fit a generalized model (for example, a set of parameters or rules) and then makes predictions from that model without consulting the raw training data. As a result, instance-based methods have essentially no training cost but slow, memory-hungry prediction, while model-based methods pay the cost up front during training and are typically faster and lighter at prediction time.

3. Why are instance-based learners called lazy learners?

Instance-based learners are called "lazy learners" because they do not perform any significant computation or model building during the training phase. Instead, they simply store the training data and compute predictions only when a new data point needs to be classified, "lazily" postponing the heavy lifting until prediction time.
Key points about lazy learners:
● No upfront generalization:
Unlike other learning algorithms that build a general model during training,
instance-based learners directly compare new data points to the stored
training instances without creating a generalized representation.
● Query-based Learning:
When making a prediction, lazy learners simply look at the stored data and
make a decision based on the closest or most relevant instances. For
example, in k-nearest neighbors (k-NN), predictions are based on the k
closest data points to the query point.
● Delayed Computation:
Lazy learners postpone most of the computation until they receive a query.
The prediction process is computationally expensive because it requires
comparing the query with all stored instances, which can be inefficient for
large datasets.
● Example algorithm: K-Nearest Neighbors (KNN):
One of the most well-known lazy learners, KNN stores all the training data
and classifies a new data point based on the class labels of its closest
neighbors (a minimal sketch of such a learner follows this list).
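
A minimal sketch of a lazy learner, assuming a from-scratch NumPy implementation of k-NN rather than any particular library's API: fit merely stores the data, and all of the real work is deferred to predict_one.

```python
import numpy as np
from collections import Counter

class LazyKNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorising the data -- no model is built.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict_one(self, query):
        # All computation happens at query time: distances to every
        # stored instance, then a majority vote among the k closest.
        distances = np.linalg.norm(self.X_train - query, axis=1)
        nearest = np.argsort(distances)[: self.k]
        votes = Counter(self.y_train[nearest])
        return votes.most_common(1)[0][0]

# Toy usage with made-up points: the label depends on nearby stored examples.
model = LazyKNN(k=3).fit([[1, 1], [1, 2], [5, 5], [6, 5]], ["A", "A", "B", "B"])
print(model.predict_one(np.array([1.5, 1.5])))  # expected "A"
```

Training here is effectively free; the cost is paid again for every single prediction.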

4. Why is the k-NN method called a memory-based method?

The k-Nearest Neighbors (k-NN) method is called a memory-based method because it directly relies on the entire dataset stored in memory to make predictions. Here's a breakdown of why this is the case:

1. Storage of Training Data: k-NN stores all the training examples and does not
abstract or build a generalized model from them. Instead, the raw data itself is used
to make predictions, which means the algorithm must keep all the training data in
memory.
2. Prediction by Instance Comparison: When a new query (or test instance) comes
in, the k-NN algorithm compares it to every instance in the stored training set to find
the closest k instances. This means that the algorithm's performance depends on
how quickly it can access and compute distances between the query and all stored
instances.
3. No Generalization: Since k-NN doesn’t build a generalized model, it "memorizes"
the training data and retrieves the relevant pieces of it when needed for predictions.
This makes the prediction process computationally expensive, as it must involve all
the data stored in memory during the query phase.
Therefore, the method is termed "memory-based" because it relies heavily on memory to store the entire dataset and uses it directly during the prediction stage, unlike model-based learners, which abstract the data into a model that does not need to reference the training set in full. A sketch of this per-query computation follows.
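
As a rough, made-up illustration of the memory-based behaviour (the sizes below are arbitrary, not a benchmark): the entire training matrix must stay resident in memory, and each query performs a distance computation against every stored row, i.e. O(n·d) work per prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 20          # illustrative sizes only
X_train = rng.normal(size=(n_samples, n_features))
y_train = rng.integers(0, 2, size=n_samples)

# The whole training matrix has to be kept for prediction.
print(f"stored training data: {X_train.nbytes / 1e6:.1f} MB")

def knn_predict(query, k=5):
    # One query requires a distance to every one of the n stored rows,
    # i.e. O(n * d) work per prediction.
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return np.bincount(y_train[nearest]).argmax()

print(knn_predict(rng.normal(size=n_features)))
```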

5. What are the benefits and limitations of the k-NN algorithm?

Advantages of the KNN Algorithm


● Easy to implement – The algorithm is conceptually simple, and its implementation complexity is low.
● Adapts easily – Because KNN keeps all the data in memory, whenever a new example or data point is added the algorithm incorporates it immediately, and it contributes to future predictions without retraining.
● Few hyperparameters – The only choices required when using KNN are the value of k and the distance metric, both usually selected with the help of an evaluation metric (see the short sketch after the disadvantages list below).

Disadvantages of the KNN Algorithm


● Does not scale – As a lazy algorithm, KNN needs a lot of computing power and data storage at prediction time, which makes it both time-consuming and resource-intensive on large datasets.
● Curse of dimensionality – Owing to the peaking phenomenon, KNN is affected by the curse of dimensionality, meaning it has a hard time classifying data points properly when the dimensionality is too high.
● Prone to overfitting – Because the algorithm suffers from the curse of dimensionality, it is also prone to overfitting; feature selection and dimensionality reduction techniques are generally applied to deal with this problem.
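
A brief sketch of the two hyperparameters mentioned above, using scikit-learn's KNeighborsClassifier; the toy data points are made up purely for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy data: two features, two classes.
X = [[1, 2], [2, 1], [8, 9], [9, 8]]
y = [0, 0, 1, 1]

# The two essential hyperparameters: k (n_neighbors) and the distance metric.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X, y)
print(knn.predict([[8.5, 8]]))   # expected [1]
```

Because distances are sensitive to feature ranges, features are usually standardized before fitting; that preprocessing step sits outside the two core hyperparameters but is a common companion to them.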
6. Consider the data from a questionnaire survey and objective testing with two
attributes (acid durability and strength) to classify whether a special tissue is
good or not.

Now, the factory produces a new paper tissue that has acid durability = 3 and
strength = 7.
Classify this new tissue as GOOD or BAD.

Step 1
Determine the parameter K, the number of nearest neighbors. From the given data, K = 3.

Step 2
Calculate the distance between the query instance and every training sample. Here the query instance is (3, 7), and the distance to a training sample (x1, x2) is computed with the Euclidean distance formula d = sqrt((x1 - 3)^2 + (x2 - 7)^2).

The table below shows the Euclidean distance of every paper from the query instance (3, 7):

Step 3
Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
The table below shows the sorted distances and the resulting rank of each paper:
Step 4
Collect the quality labels of the nearest neighbors. In the table below, the quality of Paper_2 is not included because its rank is greater than 3.
The table shows the quality of each paper among the nearest neighbors:

Step 5
Use the simple majority of the categories of the nearest neighbors as the predicted value for the query instance.
Here, the nearest neighbors contribute 2 Good and 1 Bad.

Since 2 Good > 1 Bad, the conclusion is that the new sample (Paper_5), which passes the laboratory test with acid durability = 3 and strength = 7, belongs to the GOOD quality category.
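
The training table itself is not reproduced above, so the sketch below assumes the four training samples commonly quoted with this exercise, (acid durability, strength, quality) = (7, 7, Bad), (7, 4, Bad), (3, 4, Good), (1, 4, Good), and replays the five steps for the query (3, 7).

```python
import math
from collections import Counter

# Assumed training data for the worked example (not given above):
# (acid durability, strength) -> quality
papers = {
    "Paper_1": ((7, 7), "Bad"),
    "Paper_2": ((7, 4), "Bad"),
    "Paper_3": ((3, 4), "Good"),
    "Paper_4": ((1, 4), "Good"),
}
query = (3, 7)   # the new tissue (Paper_5)
k = 3            # Step 1

# Step 2: Euclidean distance from the query to every training sample.
distances = {
    name: math.dist(features, query) for name, (features, _) in papers.items()
}

# Step 3: sort by distance and keep the k nearest papers.
nearest = sorted(distances, key=distances.get)[:k]

# Step 4: collect the quality labels of the k nearest neighbors.
labels = [papers[name][1] for name in nearest]

# Step 5: majority vote decides the class of the query.
print(Counter(labels).most_common(1)[0][0])   # expected "Good"
```

With these assumed values the distances are 4.0, 5.0, 3.0 and about 3.61, so Paper_2 falls outside the top 3 and the vote is 2 Good to 1 Bad, matching the conclusion above.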
