
Instance Based Learning

AIML/ BDA
Topics covered
• K-Nearest Neighbors (K-NN) concept
• Distance metrics
• K-NN for classification
Instance Based Learning
• An approach to machine learning classification in which training
instances are stored in memory.
• For a test (unseen) instance, compare it against the instances seen in
training and give the result.
• Also known as memory-based learning.
• IBL algorithms are local rather than global approximations.
Instance Based Learning
• IBL methods learn by storing the training data.
• When a new query instance is encountered, a set of similar related
instances is retrieved from memory and used to classify the new query
instance.
• For each distinct query, IBL constructs a different approximation to the
target function.
• These are local rather than global approximations (a minimal sketch of
the idea follows).
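To make the lazy, memory-based idea concrete, here is a minimal sketch in
plain Python (standard library only; the class and function names are
illustrative, not from the slides): "fitting" merely stores the data, and
all distance computation is deferred to query time.

    import math
    from collections import Counter

    def euclidean(a, b):
        # Straight-line distance between two feature vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    class KNNClassifier:
        def fit(self, X, y):
            # "Learning" is just memorizing the training set (lazy learning)
            self.X, self.y = X, y
            return self

        def predict(self, query, k=3):
            # All work happens at query time: rank the stored instances by
            # distance to the query, then majority-vote among the k nearest
            neighbors = sorted(zip(self.X, self.y),
                               key=lambda pair: euclidean(pair[0], query))[:k]
            return Counter(label for _, label in neighbors).most_common(1)[0][0]

    clf = KNNClassifier().fit([(0, 0), (0, 1), (5, 5)], ["a", "a", "b"])
    print(clf.predict((1, 1), k=1))  # -> "a"

Because each prediction is a vote over the neighbors of that particular
query, the model is a local approximation around the query point rather
than one global function fitted in advance.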
Advantages and Disadvantages of IBL
Advantage:
• Suitable for problems with very complex target functions.

Disadvantage:
• The cost of classifying new instances is high, since all computation is
deferred to query time.
• All attributes of the instances are considered, so performance degrades
as the number of dimensions increases.
Comparison (figure)
Example
• A company produces tissues used by biological labs.
• The company's objective is to predict how well its products are accepted
by its clients.
• It conducted a survey of its clients to find the acceptance of the
product. Quality is based on the acid durability and acid strength
parameters.
Example
• The data set pertains to a company that produces tissues for use in
biological labs.

Name     Acid Durability   Acid Strength   Acceptability
Type-1   7                 7               Low
Type-2   7                 4               Low
Type-3   3                 4               High
Type-4   1                 4               High

Test data: Type-5, Acid Durability = 3, Acid Strength = 7

• Build a classifier to predict the acceptability of the new type of
tissue.
• Apply the Euclidean distance measure to the data to find the distances
from the new data point, Type-5.

Name     Acid Durability   Acid Strength   Distance                      Neighbor Rank
Type-1   7                 7               sqrt((7-3)² + (7-7)²) = 4     3
Type-2   7                 4               sqrt((7-3)² + (4-7)²) = 5     4
Type-3   3                 4               sqrt((3-3)² + (4-7)²) = 3     1
Type-4   1                 4               sqrt((1-3)² + (4-7)²) ≈ 3.6   2

• If k = 1, the single nearest neighbor is Type-3 (High), so the new type
is classified as High.
• If k = 2, the two nearest neighbors are Type-3 and Type-4, both High, so
the new type is classified as High.
• If k = 3, the three nearest neighbors are Type-3 and Type-4 (High) and
Type-1 (Low); High has the majority, so the new type is classified as
High. (The script below reproduces these results.)
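As a sanity check, here is a short sketch in plain Python (standard
library only; the variable names are illustrative) that reproduces the
distance table and the majority vote for Type-5:

    import math
    from collections import Counter

    # Training data from the table: (acid durability, acid strength) -> class
    train = {
        "Type-1": ((7, 7), "Low"),
        "Type-2": ((7, 4), "Low"),
        "Type-3": ((3, 4), "High"),
        "Type-4": ((1, 4), "High"),
    }
    query = (3, 7)  # Type-5

    # Euclidean distances to the query, sorted to give the neighbor ranking
    ranked = sorted((math.dist(xy, query), label, name)
                    for name, (xy, label) in train.items())
    for d, label, name in ranked:
        print(f"{name}: distance {d:.2f}, class {label}")

    # Majority vote among the k nearest neighbors
    for k in (1, 2, 3):
        vote = Counter(label for _, label, _ in ranked[:k]).most_common(1)[0][0]
        print(f"k={k}: Type-5 classified as {vote}")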
Example 2
• Assume a Boolean target function and a 2-dimensional instance space
(shown in the figure).
• Determine how the k-Nearest Neighbour learning algorithm would classify
the new instance xq for k = 1, 3, 5 and 7.
• The + and – signs in the instance space refer to positive and negative
examples respectively.

Distance from query instance   Classification
1.00                           +
1.35                           -
1.40                           -
1.60                           -
1.90                           +
2.00                           +
2.20                           -
2.40                           +
2.80                           -
[Figure: positive (+) and negative (–) examples scattered around the query
point xq in the 2-dimensional instance space]

k-NN classification of xq (checked in the sketch below):
1-NN: +
3-NN: -
5-NN: -
7-NN: -
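The majority vote for each k can be checked with a few lines of plain
Python (the variable names are illustrative), using the labels already
sorted by distance from xq in the table above:

    from collections import Counter

    # Neighbor labels, sorted by distance from xq (taken from the table)
    labels = ["+", "-", "-", "-", "+", "+", "-", "+", "-"]

    for k in (1, 3, 5, 7):
        vote = Counter(labels[:k]).most_common(1)[0][0]
        print(f"{k}-NN classifies xq as {vote}")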
Selection of the K value
• Try many different values for K and see what works best for your problem
(a cross-validation sketch follows below).
• For binary classification, the K value should be an odd number (3, 5, 7,
9, etc.) so that a majority vote cannot end in a tie.
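One common way to try different K values is to score each candidate with
cross-validation and keep the best. A sketch, assuming scikit-learn is
available; the synthetic dataset is purely illustrative:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic data, used here only for illustration
    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # Score odd values of K with 5-fold cross-validation
    scores = {}
    for k in (1, 3, 5, 7, 9, 11):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(clf, X, y, cv=5).mean()
        print(f"k={k}: mean CV accuracy = {scores[k]:.3f}")

    best_k = max(scores, key=scores.get)
    print(f"Best k: {best_k}")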
How do the efficiency and accuracy of k-NN search change as k increases?
• With a sufficiently large number of training instances, accuracy should
increase.
• The computational complexity of KNN grows with the size of the training
dataset, so the time to compute a prediction also increases.
• In that sense, KNN is less efficient.
• Why is KNN a lazy learning algorithm?
• It does no learning of a model during training.
• It simply "memorizes" the training dataset.
• A decision tree (DT) algorithm, by contrast, learns its model during
training time.

• Why is KNN a non-parametric algorithm?
• It makes no assumptions about the functional form of the problem being
solved.

• Is KNN a supervised or unsupervised learning algorithm?
• KNN is a supervised learning algorithm: it uses labeled data for
classification problems.
• Note: K-means is an unsupervised learning algorithm used for clustering
problems (contrast sketched below).
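The supervised/unsupervised distinction shows up directly in how the two
algorithms are called. A minimal sketch, assuming scikit-learn is
installed (the toy data is illustrative): KNN's fit requires the labels y,
while K-means clusters X alone.

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    X = [[1, 1], [1, 2], [8, 8], [9, 8]]   # toy feature vectors
    y = ["a", "a", "b", "b"]               # labels (needed only for KNN)

    # Supervised: KNN requires labels at fit time
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(knn.predict([[2, 1]]))           # -> ['a']

    # Unsupervised: K-means discovers clusters from X alone
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)                      # cluster ids, e.g. [0 0 1 1]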
Thank you
