
BookSlides 5A Similarity Based Learning

The document summarizes key concepts from sections 5.1-5.3 of chapter 5 on similarity-based learning from a textbook on machine learning. It discusses the big idea of classifying unknown instances based on similarity to known examples. It then covers fundamentals of the approach, including feature spaces to represent data as points in n-dimensional space and distance metrics like Euclidean and Manhattan distance to measure similarity between points. Standard algorithms like k-nearest neighbors that use these concepts are also mentioned.


Big Idea Fundamentals Standard Approach Epilogue Summary

Fundamentals of Machine Learning for


Predictive Data Analytics
Chapter 5: Similarity-based Learning
Sections 5.1, 5.2, 5.3

John Kelleher and Brian Mac Namee and Aoife D’Arcy

[email protected] [email protected] [email protected]



1 Big Idea

2 Fundamentals
Feature Space
Distance Metrics

3 Standard Approach: The Nearest Neighbor Algorithm


A Worked Example

4 Epilogue

5 Summary

Big Idea

The year is 1798 and you are Lieutenant-Colonel David Collins of HMS Calcutta, exploring the region around the Hawkesbury River in New South Wales.
After an expedition up the river, one of your men tells you that he saw a strange animal near the river.
You ask him to describe the animal and he explains that he didn't see it very well, but that he did notice that it had webbed feet and a duck-bill snout, and that it growled at him.
In order to plan the expedition for the next day, you decide that you need to classify the animal so that you can figure out whether or not it is dangerous to approach.
              Webbed Feet   Duck-Bill Snout   Grrrh!   Score
[animal 1]    ✓             ✗                 ✗        1
[animal 2]    ✗             ✓                 ✗        1
[animal 3]    ✗             ✓                 ✓        2
Figure: Matching animals you remember to the features of the unknown animal described by the sailor. Note: The images used in this figure were created by Jan Gillbank for the English for the Australian Curriculum website (http://www.e4ac.edu.au) and are used under the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0). The images were sourced via Wikimedia Commons.

The process of classifying an unknown animal by matching the features of the animal against the features of animals you can remember neatly encapsulates the big idea underpinning similarity-based learning:

    if you are trying to classify something then you should search your memory to find things that are similar and label it with the same class as the most similar thing in your memory.

One of the simplest and best-known machine learning algorithms for this type of reasoning is the nearest neighbor algorithm.

Fundamentals

The fundamentals of similarity-based learning are:
  Feature space
  Similarity metrics

Feature Space

Table: The speed and agility ratings for 20 college athletes labelled
with the decisions for whether they were drafted or not.
ID Speed Agility Draft ID Speed Agility Draft
1 2.50 6.00 No 11 2.00 2.00 No
2 3.75 8.00 No 12 5.00 2.50 No
3 2.25 5.50 No 13 8.25 8.50 No
4 3.25 8.25 No 14 5.75 8.75 Yes
5 2.75 7.50 No 15 4.75 6.25 Yes
6 4.50 5.00 No 16 5.50 6.75 Yes
7 3.50 5.25 No 17 5.25 9.50 Yes
8 3.00 3.25 No 18 7.00 4.25 Yes
9 4.00 4.00 No 19 7.50 8.00 Yes
10 4.25 3.75 No 20 7.25 5.75 Yes

Figure: A feature space plot of the data in Table 2. The triangles represent 'Non-draft' instances and the crosses represent the 'Draft' instances.

Feature Space

A feature space is an abstract n-dimensional space created by taking each of the descriptive features in an ABT (analytics base table) to be the axes of a reference space; each instance in the dataset is mapped to a point in the feature space based on the values of its descriptive features.

Distance Metrics

A similarity metric measures the similarity between two instances within a feature space.
Mathematically, a metric must conform to the following four
criteria:
1 Non-negativity: metric(a, b) ≥ 0
2 Identity: metric(a, b) = 0 ⇐⇒ a = b
3 Symmetry: metric(a, b) = metric(b, a)
4 Triangular Inequality:
metric(a, b) ≤ metric(a, c) + metric(b, c)
where metric(a, b) is a function that returns the distance
between two instances a and b.
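As an illustrative aside (not from the book's slides), these four criteria can be spot-checked in a few lines of Python for the Euclidean metric, using sample (Speed, Agility) points from the feature space discussed later; spot checks illustrate, but of course do not prove, the axioms:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points in an m-dimensional feature space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Sample (Speed, Agility) points.
a, b, c = (5.00, 2.50), (2.75, 7.50), (5.25, 9.50)

assert euclidean(a, b) >= 0                                  # 1. non-negativity
assert euclidean(a, a) == 0                                  # 2. identity
assert euclidean(a, b) == euclidean(b, a)                    # 3. symmetry
assert euclidean(a, b) <= euclidean(a, c) + euclidean(b, c)  # 4. triangular inequality
```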

Distance Metrics

One of the best-known metrics is Euclidean distance, which computes the length of the straight line between two points. The Euclidean distance between two instances a and b in an m-dimensional feature space is defined as:

    Euclidean(a, b) = \sqrt{\sum_{i=1}^{m} (a[i] - b[i])^2}    (1)


Distance Metrics

Example
The Euclidean distance between instances d12 (Speed = 5.00, Agility = 2.50) and d5 (Speed = 2.75, Agility = 7.50) in Table 2 is:

    Euclidean(⟨5.00, 2.50⟩, ⟨2.75, 7.50⟩) = \sqrt{(5.00 - 2.75)^2 + (2.50 - 7.50)^2}
                                          = \sqrt{30.0625} = 5.4829
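This arithmetic can be reproduced with a short Python sketch (the names d12 and d5 follow the slide's instance labels):

```python
import math

# Instances from the college-athletes table, as (Speed, Agility) pairs.
d12 = (5.00, 2.50)
d5 = (2.75, 7.50)

# Sum of squared per-feature differences, then the square root.
squared = sum((x - y) ** 2 for x, y in zip(d12, d5))
distance = math.sqrt(squared)

print(round(squared, 4))   # 30.0625
print(round(distance, 4))  # 5.4829
```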

Distance Metrics

Another, less well-known, distance measure is the Manhattan distance, or taxi-cab distance.
The Manhattan distance between two instances a and b in a feature space with m dimensions is:¹

    Manhattan(a, b) = \sum_{i=1}^{m} abs(a[i] - b[i])    (2)

¹ The abs() function surrounding the subtraction term indicates that we use the absolute value (i.e. the non-negative value) when summing the differences; this makes sense because distances can't be negative.

Distance Metrics


Figure: The Manhattan and Euclidean distances between two points.




Distance Metrics

Example
The Manhattan distance between instances d12 (Speed = 5.00, Agility = 2.50) and d5 (Speed = 2.75, Agility = 7.50) in Table 2 is:

    Manhattan(⟨5.00, 2.50⟩, ⟨2.75, 7.50⟩) = abs(5.00 - 2.75) + abs(2.50 - 7.50)
                                          = 2.25 + 5.00 = 7.25
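The same pair of instances gives the Manhattan result directly (an illustrative sketch):

```python
# Instances d12 and d5 as (Speed, Agility) pairs.
d12 = (5.00, 2.50)
d5 = (2.75, 7.50)

# Sum of absolute per-feature differences.
distance = sum(abs(x - y) for x, y in zip(d12, d5))

print(distance)  # 7.25
```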

Distance Metrics

The Euclidean and Manhattan distances are special cases of the Minkowski distance.
The Minkowski distance between two instances a and b in a feature space with m descriptive features is:

    Minkowski(a, b) = \left( \sum_{i=1}^{m} abs(a[i] - b[i])^p \right)^{1/p}    (3)

where different values of the parameter p result in different distance metrics.
The Minkowski distance with p = 1 is the Manhattan distance and with p = 2 is the Euclidean distance.
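A small sketch (the function name minkowski is ours, not the book's) can confirm that p = 1 and p = 2 recover the two earlier worked results:

```python
def minkowski(a, b, p):
    # Minkowski distance: p = 1 gives Manhattan, p = 2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

d12, d5 = (5.00, 2.50), (2.75, 7.50)

assert minkowski(d12, d5, 1) == 7.25              # Manhattan distance
assert round(minkowski(d12, d5, 2), 4) == 5.4829  # Euclidean distance
```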

Distance Metrics

The larger the value of p, the more emphasis is placed on the features with large differences in values, because these differences are raised to the power of p.

Example

    Instance IDs   Manhattan (Minkowski p=1)   Euclidean (Minkowski p=2)
    12 and 5       7.25                        5.4829
    12 and 17      7.25                        7.0045

Figure: The Manhattan and Euclidean distances between instances d12 (Speed = 5.00, Agility = 2.50) and d5 (Speed = 2.75, Agility = 7.50), and between instances d12 and d17 (Speed = 5.25, Agility = 9.50).

Standard Approach: The Nearest Neighbor Algorithm

The Nearest Neighbor Algorithm

Require: a set of training instances
Require: a query to be classified
1: Iterate across the instances in memory and find the instance that is the shortest distance from the query position in the feature space.
2: Make a prediction for the query equal to the value of the target feature of the nearest neighbor.
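The two steps above can be sketched in Python, assuming Euclidean distance and training data stored as (features, label) pairs (a minimal illustration, not the book's reference implementation):

```python
import math

def nearest_neighbor(training, query):
    # training: list of (features, label) pairs; query: a feature tuple.
    # Step 1: find the training instance closest to the query in feature space.
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    features, label = min(training, key=lambda inst: euclidean(inst[0], query))
    # Step 2: predict the target feature value of that nearest neighbor.
    return label

# A tiny example using two athletes from the table (IDs 18 and 12).
training = [((7.00, 4.25), "yes"), ((5.00, 2.50), "no")]
print(nearest_neighbor(training, (6.75, 3.00)))  # yes
```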

A Worked Example

Table: The speed and agility ratings for 20 college athletes labelled
with the decisions for whether they were drafted or not.
ID Speed Agility Draft ID Speed Agility Draft
1 2.50 6.00 No 11 2.00 2.00 No
2 3.75 8.00 No 12 5.00 2.50 No
3 2.25 5.50 No 13 8.25 8.50 No
4 3.25 8.25 No 14 5.75 8.75 Yes
5 2.75 7.50 No 15 4.75 6.25 Yes
6 4.50 5.00 No 16 5.50 6.75 Yes
7 3.50 5.25 No 17 5.25 9.50 Yes
8 3.00 3.25 No 18 7.00 4.25 Yes
9 4.00 4.00 No 19 7.50 8.00 Yes
10 4.25 3.75 No 20 7.25 5.75 Yes

A Worked Example

Example
Should we draft an athlete with the following profile: Speed = 6.75, Agility = 3.00?



A Worked Example


Figure: A feature space plot of the data in Table 2, with the position in the feature space of the query represented by the ? marker. The triangles represent 'Non-draft' instances and the crosses represent the 'Draft' instances.

A Worked Example

Table: The distances (Dist.) between the query instance with Speed = 6.75 and Agility = 3.00 and each instance in Table 2.

ID  Speed  Agility  Draft  Dist.   ID  Speed  Agility  Draft  Dist.
18  7.00   4.25     yes    1.27    11  2.00   2.00     no     4.85
12  5.00   2.50     no     1.82    19  7.50   8.00     yes    5.06
10  4.25   3.75     no     2.61    3   2.25   5.50     no     5.15
20  7.25   5.75     yes    2.80    1   2.50   6.00     no     5.20
9   4.00   4.00     no     2.93    13  8.25   8.50     no     5.70
6   4.50   5.00     no     3.01    2   3.75   8.00     no     5.83
8   3.00   3.25     no     3.76    14  5.75   8.75     yes    5.84
15  4.75   6.25     yes    3.82    5   2.75   7.50     no     6.02
7   3.50   5.25     no     3.95    4   3.25   8.25     no     6.31
16  5.50   6.75     yes    3.95    17  5.25   9.50     yes    6.67
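The distance ranking above can be reproduced by computing and sorting the Euclidean distances in Python (a sketch; the athletes dict mirrors Table 2):

```python
import math

# College-athletes dataset: id -> (Speed, Agility, Draft).
athletes = {
    1: (2.50, 6.00, "no"),   2: (3.75, 8.00, "no"),   3: (2.25, 5.50, "no"),
    4: (3.25, 8.25, "no"),   5: (2.75, 7.50, "no"),   6: (4.50, 5.00, "no"),
    7: (3.50, 5.25, "no"),   8: (3.00, 3.25, "no"),   9: (4.00, 4.00, "no"),
    10: (4.25, 3.75, "no"),  11: (2.00, 2.00, "no"),  12: (5.00, 2.50, "no"),
    13: (8.25, 8.50, "no"),  14: (5.75, 8.75, "yes"), 15: (4.75, 6.25, "yes"),
    16: (5.50, 6.75, "yes"), 17: (5.25, 9.50, "yes"), 18: (7.00, 4.25, "yes"),
    19: (7.50, 8.00, "yes"), 20: (7.25, 5.75, "yes"),
}

query = (6.75, 3.00)

# Rank every instance by Euclidean distance to the query.
ranked = sorted(
    (math.dist((speed, agility), query), i, draft)
    for i, (speed, agility, draft) in athletes.items()
)

dist, nearest_id, draft = ranked[0]
print(nearest_id, draft, round(dist, 2))  # 18 yes 1.27
```

The nearest neighbor is instance 18, whose Draft value is yes, so the athlete is predicted to be drafted.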

A Worked Example

(a) Voronoi tessellation (b) Decision boundary (k = 1)

Figure: (a) The Voronoi tessellation of the feature space for the dataset in Table 2, with the position of the query represented by the ? marker; (b) the decision boundary created by aggregating the neighboring Voronoi regions that belong to the same target level.

A Worked Example

One of the great things about nearest neighbor algorithms is that we can add new data to update the model very easily.
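"Training" here amounts to storing labelled instances, so an update is just an append (an illustrative sketch; the query values are made up for the demonstration):

```python
import math

def nn_label(training, query):
    # 1-NN prediction: label of the closest stored instance (Euclidean distance).
    return min(training, key=lambda inst: math.dist(inst[0], query))[1]

training = [((7.00, 4.25), "yes"), ((5.00, 2.50), "no"), ((4.25, 3.75), "no")]
print(nn_label(training, (6.00, 2.75)))  # no  (instance 12 is closest)

# Updating the model is simply storing the new labelled instance; no retraining.
training.append(((6.75, 3.00), "yes"))
print(nn_label(training, (6.00, 2.75)))  # yes (the new instance is now closest)
```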

A Worked Example

Table: The extended version of the college athletes dataset.

ID  Speed  Agility  Draft        ID  Speed  Agility  Draft


1 2.50 6.00 no 12 5.00 2.50 no
2 3.75 8.00 no 13 8.25 8.50 no
3 2.25 5.50 no 14 5.75 8.75 yes
4 3.25 8.25 no 15 4.75 6.25 yes
5 2.75 7.50 no 16 5.50 6.75 yes
6 4.50 5.00 no 17 5.25 9.50 yes
7 3.50 5.25 no 18 7.00 4.25 yes
8 3.00 3.25 no 19 7.50 8.00 yes
9 4.00 4.00 no 20 7.25 5.75 yes
10 4.25 3.75 no 21 6.75 3.00 yes
11 2.00 2.00 no

A Worked Example

(a) Voronoi tessellation (b) Decision boundary (k = 1)

Figure: (a) The Voronoi tessellation of the feature space when the
dataset has been updated to include the query instance; (b) the
updated decision boundary reflecting the addition of the query
instance in the training set.

Epilogue

Returning to 1798 and HMS Calcutta: the next day you accompany your men on the expedition up the river and you encounter the strange animal the sailor had described to you.
This time, when you see the animal yourself, you realize that it definitely isn't a duck!
It turns out that you and your men are the first Europeans to encounter a platypus².

² The story recounted here of the discovery of the platypus is loosely based on real events.

Figure: A duck-billed platypus. The platypus image used here was created by Jan Gillbank for the English for the Australian Curriculum website (http://www.e4ac.edu.au) and is used under the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0). The image was sourced via Wikimedia Commons.

This epilogue illustrates two important, and related, aspects of supervised machine learning:
1 Supervised machine learning is based on the stationarity assumption, which states that the data doesn't change (remains stationary) over time.
2 In the context of classification, supervised machine learning creates models that distinguish between the classes that are present in the dataset they are induced from. So, if a classification model is trained to distinguish between lions, frogs, and ducks, the model will classify a query as being either a lion, a frog, or a duck, even if the query is actually a platypus.

Summary

Similarity-based prediction models attempt to mimic a very human way of reasoning by basing predictions for a target feature value on the most similar instances in memory; this makes them easy to interpret and understand.
This advantage should not be underestimated: being able to understand how a model works gives people more confidence in the model and, hence, in the insight that it provides.

The inductive bias underpinning similarity-based classification is that things that are similar (i.e., instances that have similar descriptive features) belong to the same class.
The nearest neighbor algorithm creates an implicit global predictive model by aggregating local models, or neighborhoods.
The definition of these neighborhoods is based on proximity within the feature space to the labelled training instances.
Queries are classified using the label of the training instance defining the neighborhood in the feature space that contains the query.
