
Lecture 03 - Supervised Learning by Computing Distances - Plain

This document provides an introduction to supervised learning and discusses computing distances between vectors. It defines key concepts like feature vectors, Euclidean distance, weighted Euclidean distance, and absolute distance. It also outlines common supervised learning problems like classification, regression, and ranking that can be formulated using distances between labeled training data vectors.


Getting Started with Supervised Learning,

Learning by Computing Distances (1)

CS771: Introduction to Machine Learning


Piyush Rai
2
Supervised Learning
[Figure: labeled training data – images labeled "dog" or "cat" – is fed to a supervised learning algorithm, which produces a cat-vs-dog prediction model; given a test image, the model outputs the predicted label (cat/dog)]

Important: In ML (not just supervised learning, but also unsupervised learning and RL), the training and test datasets should be "similar" (we don't like "out-of-syllabus" questions in exams). In the above example, it means that we can't have test data with B&W images or sketches of cats and dogs. More formally, the train and test data distributions should be the same.

Does it mean ML is useless if this assumption is violated? Of course not. Many ML techniques exist to handle such situations (a bit advanced – will touch upon those later): domain adaptation, covariate shift, transfer learning, etc. (just the names for now)
3
Some Types of Supervised Learning Problems
 Consider building an ML module for an e-mail client

 Some tasks that we may want this module to perform


 Predicting whether an email is spam or normal: Binary Classification
 Predicting which of the many folders the email should be sent to: Multi-class Classification
 Predicting all the relevant tags for an email: Tagging or Multi-label Classification
 Predicting the spam-score of an email: Regression
 Predicting which email(s) should be shown at the top: Ranking
 Predicting which emails are work/study-related: One-class Classification

 These predictive modeling tasks can be formulated as supervised learning problems

 Today: A very simple supervised learning model for binary/multi-class classification


 This model doesn’t require any fancy maths – just computing means and distances
4
Some Notation and Conventions
 In ML, inputs are usually represented by vectors, e.g., [0.5, 0.3, 0.6, 0.1, 0.2, 0.5, 0.9, 0.2, 0.1, 0.5]

 A vector consists of an array of scalar values


 Geometrically, a vector is just a point in a vector space, e.g.,
 A length 2 vector is a point in 2-dim vector space
 A length 3 vector is a point in 3-dim vector space
 Likewise for higher dimensions, even though harder to visualize
 E.g., the vector [0.5, 0.3] is the point (0.5, 0.3) in 2-dim space, and [0.5, 0.3, 0.6] is the point (0.5, 0.3, 0.6) in 3-dim space

 Unless specified otherwise


 Small letters in bold font will denote vectors, e.g., $\mathbf{x}$, $\mathbf{a}$, $\mathbf{b}$, etc.
 Small letters in normal font will denote scalars, e.g., $x$, $a$, $b$, etc.
 Capital letters in bold font will denote matrices (2-dim arrays), e.g., $\mathbf{X}$, $\mathbf{W}$, etc.
5
Some Notation and Conventions
 A single vector will be assumed to be of the form $\mathbf{x} = [x_1, x_2, \ldots, x_D]$

 Unless specified otherwise, vectors will be assumed to be column vectors

 So we will assume $\mathbf{x}$ to be a column vector of size $D \times 1$
 Assuming each element to be a real-valued scalar, $x_d \in \mathbb{R}$, or $\mathbf{x} \in \mathbb{R}^D$ ($\mathbb{R}$: space of reals)

 If $\mathbf{x}$ is a feature vector representing, say, an image, then

 $D$ denotes the dimensionality of this feature vector (number of features)
 $x_d$ (a scalar) denotes the value of feature $d$ in the image

 For denoting multiple vectors, we will use a subscript with each vector, e.g.,
 $N$ images denoted by $N$ feature vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, or compactly as $\{\mathbf{x}_n\}_{n=1}^{N}$
 The vector $\mathbf{x}_n$ denotes the $n$-th image
 $x_{nd}$ (a scalar) denotes the feature $d$ of the $n$-th image
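In code, this notation maps naturally onto a 2-dim array. A minimal NumPy sketch (the numbers and sizes below are illustrative, not from the lecture):

  import numpy as np

  # N = 4 inputs, each a D = 3 dimensional feature vector, stacked into an N x D array
  X = np.array([[0.5, 0.3, 0.6],
                [0.1, 0.2, 0.5],
                [0.9, 0.2, 0.1],
                [0.4, 0.7, 0.8]])

  N, D = X.shape    # N inputs, D features each
  x_2 = X[1]        # the feature vector of the 2nd input (0-indexed in code)
  x_23 = X[1, 2]    # the 3rd feature of the 2nd input (a scalar)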
6
Some Basic Operations on Vectors
 Addition/subtraction of two vectors gives another vector of the same size

 The mean (average or centroid) of $N$ vectors $\mathbf{x}_1, \ldots, \mathbf{x}_N$

   $\boldsymbol{\mu} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$   (of the same size as each $\mathbf{x}_n$)

 The inner/dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$

   $\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^\top \mathbf{b} = \sum_{i=1}^{D} a_i b_i$   (a real-valued number denoting how "similar" $\mathbf{a}$ and $\mathbf{b}$ are, assuming both have unit Euclidean norm)

 For a vector $\mathbf{a}$, its Euclidean norm is defined via its inner product with itself

   $\|\mathbf{a}\|_2 = \sqrt{\mathbf{a}^\top \mathbf{a}} = \sqrt{\sum_{i=1}^{D} a_i^2}$

 Also the Euclidean distance of $\mathbf{a}$ from the origin
 Note: Euclidean norm is also called the L2 norm
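A minimal NumPy sketch of these operations (the vectors below are illustrative, not from the lecture):

  import numpy as np

  # Three illustrative 3-dimensional vectors
  x1 = np.array([0.5, 0.3, 0.6])
  x2 = np.array([0.1, 0.2, 0.5])
  x3 = np.array([0.9, 0.2, 0.1])

  # Mean (centroid): same size as each vector
  mu = (x1 + x2 + x3) / 3               # or np.mean(np.stack([x1, x2, x3]), axis=0)

  # Inner/dot product of two vectors: a single real number
  dot = np.dot(x1, x2)                  # same as (x1 * x2).sum()

  # Euclidean (L2) norm: square root of a vector's inner product with itself
  norm_x1 = np.sqrt(np.dot(x1, x1))     # same as np.linalg.norm(x1)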
7
Computing Distances
 Euclidean (L2 norm) distance between two vectors $\mathbf{a}$ and $\mathbf{b}$

   $d_2(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2 = \sqrt{\sum_{i=1}^{D} (a_i - b_i)^2} = \sqrt{(\mathbf{a} - \mathbf{b})^\top (\mathbf{a} - \mathbf{b})} = \sqrt{\mathbf{a}^\top \mathbf{a} + \mathbf{b}^\top \mathbf{b} - 2\, \mathbf{a}^\top \mathbf{b}}$

 (Square root of the inner product of the difference vector with itself; the last form is another expression in terms of inner products of the individual vectors)

 Weighted Euclidean distance between two vectors $\mathbf{a}$ and $\mathbf{b}$

   $d_w(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{D} w_i (a_i - b_i)^2} = \sqrt{(\mathbf{a} - \mathbf{b})^\top \mathbf{W} (\mathbf{a} - \mathbf{b})}$

 Here $\mathbf{W}$ is a $D \times D$ diagonal matrix with the weights $w_i$ on its diagonal. The weights may be known or even learned from data (in ML problems).
 Useful tip: Can achieve the effect of feature scaling (recall last lecture) by using weighted Euclidean distances!
 Note: If $\mathbf{W}$ is a $D \times D$ symmetric matrix, then it is called the Mahalanobis distance (more on this later)

 Absolute (L1 norm) distance between two vectors $\mathbf{a}$ and $\mathbf{b}$

   $d_1(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_1 = \sum_{i=1}^{D} |a_i - b_i|$

 The L1 norm distance is also known as the Manhattan distance or the taxicab norm (it's a very natural notion of distance between two points in some vector space)
 Apart from L2 and L1, are there other ways of defining distances? Yes. Another, although less commonly used, distance is the L-infinity distance (equal to the max of the absolute values of the element-wise differences between the two vectors)
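A minimal NumPy sketch of these distance computations (the vectors and weights below are illustrative):

  import numpy as np

  a = np.array([0.5, 0.3, 0.6])
  b = np.array([0.9, 0.2, 0.1])

  # Euclidean (L2) distance
  d2 = np.sqrt(np.sum((a - b) ** 2))    # same as np.linalg.norm(a - b)

  # Weighted Euclidean distance, with W a diagonal matrix of per-feature weights
  w = np.array([1.0, 2.0, 0.5])         # illustrative weights
  W = np.diag(w)
  dw = np.sqrt((a - b) @ W @ (a - b))   # same as np.sqrt(np.sum(w * (a - b) ** 2))

  # Absolute (L1 / Manhattan) distance
  d1 = np.sum(np.abs(a - b))            # same as np.linalg.norm(a - b, ord=1)

  # L-infinity distance: max absolute element-wise difference
  dinf = np.max(np.abs(a - b))          # same as np.linalg.norm(a - b, ord=np.inf)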
8
Our First Supervised Learner
9
Prelude: A Very Primitive Classifier
 Consider a binary classification problem – cat vs dog

 Assume training data with just 2 images – one cat image and one dog image

 Given a new test image (cat/dog), how do we predict its label?

 A simple idea: Predict using its distance from each of the 2 training images

   If d(test image, cat image) < d(test image, dog image), predict "cat", else predict "dog"

 The idea also applies to multi-class classification: Use one image per class, and predict the label based on the distances of the test image from all such images

 Wait. Is it ML? Seems to be just a simple "rule". Where is the "learning" part in this?
 Excellent question! Glad you asked! Even this simple model can be learned, for example, the feature extraction/selection part and/or the distance computation part. Some possibilities: Use a feature learning/selection algorithm to extract features, and use a Mahalanobis distance where you learn the W matrix (instead of using a predefined W), using "distance metric learning" techniques
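A minimal sketch of this primitive classifier, assuming each image has already been converted to a feature vector (the feature vectors and the choice of Euclidean distance below are illustrative):

  import numpy as np

  def euclidean(a, b):
      # Euclidean (L2) distance between two feature vectors
      return np.sqrt(np.sum((a - b) ** 2))

  # One training image per class, represented by (illustrative) feature vectors
  cat_image = np.array([0.2, 0.8, 0.5])
  dog_image = np.array([0.9, 0.1, 0.4])

  def predict(test_image):
      # Predict "cat" if the test image is closer to the cat image, else "dog"
      if euclidean(test_image, cat_image) < euclidean(test_image, dog_image):
          return "cat"
      return "dog"

  print(predict(np.array([0.3, 0.7, 0.5])))   # closer to the cat image, so prints "cat"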
10
Improving Our Primitive Classifier
 Just one input per class may not sufficiently capture variations in a class

 A natural improvement can be by using more inputs per class


[Figure: multiple labeled training images per class – several images labeled "cat" and several labeled "dog"]

 We will consider two approaches to do this


 Learning with Prototypes (LwP)
 Nearest Neighbors (NN – not “neural networks”, at least not for now )

 Both LwP and NN will use multiple inputs per class but in different ways

11
Learning with Prototypes (LwP)
 Basic idea: Represent each class by a “prototype” vector

 Class Prototype: The “mean” or “average” of inputs from that class

[Figure: Averages (prototypes) of each of the handwritten digits 1-9]

 Predict label of each test input based on its distances from the class prototypes
 Predicted label will be the class that is the closest to the test input

 How we compute distances can have an effect on the accuracy of this model
(may need to try Euclidean, weighted Euclidean, Mahalanobis, or something else)
Pic from: https://www.reddit.com/r/dataisbeautiful/comments/3wgbv9/average_handwritten_digit_oc/
12
Learning with Prototypes (LwP): An Illustration
 Suppose the task is binary classification (two classes, assumed positive and negative)

 Training data: $N$ labelled examples $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with labels $y_n \in \{-1, +1\}$

 Assume $N_+$ examples from the positive class and $N_-$ examples from the negative class
 Assume green is positive and red is negative

   $\boldsymbol{\mu}_{-} = \frac{1}{N_{-}} \sum_{n: y_n = -1} \mathbf{x}_n \qquad\qquad \boldsymbol{\mu}_{+} = \frac{1}{N_{+}} \sum_{n: y_n = +1} \mathbf{x}_n$

 [Figure: the training points of both classes, their prototypes $\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$, and a test example]

 For LwP, the prototype vectors ($\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$ here) define the "model"
 LwP straightforwardly generalizes to more than 2 classes as well (multi-class classification) – K prototypes for K classes
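A minimal NumPy sketch of LwP for binary classification (the toy data below is illustrative): compute one prototype per class, then predict by the nearest prototype.

  import numpy as np

  # Illustrative training data: N x D feature matrix and labels in {-1, +1}
  X = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2],     # positive class
                [5.0, 5.5], [5.5, 5.0], [4.8, 5.2]])    # negative class
  y = np.array([+1, +1, +1, -1, -1, -1])

  # Class prototypes: mean of the inputs from each class
  mu_pos = X[y == +1].mean(axis=0)
  mu_neg = X[y == -1].mean(axis=0)

  def predict(x):
      # Predict +1 if x is closer (in Euclidean distance) to the positive prototype, else -1
      return +1 if np.linalg.norm(x - mu_pos) < np.linalg.norm(x - mu_neg) else -1

  print(predict(np.array([1.2, 2.1])))    # near the positive cluster: +1
  print(predict(np.array([5.1, 5.3])))    # near the negative cluster: -1

For K classes, the same sketch would store K prototypes and predict the class whose prototype is closest to the test input.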
13
LwP: The Prediction Rule, Mathematically
 What does the prediction rule for LwP look like mathematically?

 Assume we are using Euclidean distances here

   $\|\boldsymbol{\mu}_{-} - \mathbf{x}\|^2 = \|\boldsymbol{\mu}_{-}\|^2 + \|\mathbf{x}\|^2 - 2\,\langle \boldsymbol{\mu}_{-}, \mathbf{x} \rangle$
   $\|\boldsymbol{\mu}_{+} - \mathbf{x}\|^2 = \|\boldsymbol{\mu}_{+}\|^2 + \|\mathbf{x}\|^2 - 2\,\langle \boldsymbol{\mu}_{+}, \mathbf{x} \rangle$

 Prediction Rule: Predict the label as +1 if $\|\boldsymbol{\mu}_{-} - \mathbf{x}\|^2 > \|\boldsymbol{\mu}_{+} - \mathbf{x}\|^2$, otherwise -1

14
LwP: The Prediction Rule, Mathematically
 Let’s expand the prediction rule expression a bit more

 Thus LwP with Euclidean distance is equivalent to a linear model with

 Weight vector $\mathbf{w} = 2(\boldsymbol{\mu}_{+} - \boldsymbol{\mu}_{-})$
 Bias term $b = \|\boldsymbol{\mu}_{-}\|^2 - \|\boldsymbol{\mu}_{+}\|^2$
 (Will look at linear models more formally and in more detail later)

 Prediction rule therefore is: Predict +1 if $\mathbf{w}^\top \mathbf{x} + b > 0$, else predict -1


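Continuing the illustrative LwP sketch above, a short check that the nearest-prototype rule and the equivalent linear model give the same prediction (mu_pos and mu_neg are the prototypes computed earlier; the values below are illustrative):

  import numpy as np

  # Prototypes from the earlier LwP sketch (illustrative values)
  mu_pos = np.array([1.1, 2.0])
  mu_neg = np.array([5.1, 5.2])

  # Equivalent linear model: w = 2(mu_pos - mu_neg), b = ||mu_neg||^2 - ||mu_pos||^2
  w = 2 * (mu_pos - mu_neg)
  b = np.dot(mu_neg, mu_neg) - np.dot(mu_pos, mu_pos)

  x = np.array([1.2, 2.1])   # a test input

  # Nearest-prototype rule vs. the linear rule: both give the same label
  pred_prototype = +1 if np.linalg.norm(x - mu_pos) < np.linalg.norm(x - mu_neg) else -1
  pred_linear = +1 if w @ x + b > 0 else -1
  assert pred_prototype == pred_linear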
15
LwP: Some Failure Cases
 Here is a case where LwP with Euclidean distance may not work well

[Figure: the class prototypes $\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$ and a test example illustrating such a failure case]
Can use feature scaling or use the Mahalanobis distance to handle such cases (will discuss this in the next lecture)

 In general, if classes are not equisized and spherical, LwP with Euclidean
distance will usually not work well (but improvements possible; will discuss
later)
16
LwP: Some Key Aspects
 Very simple, interpretable, and lightweight model
 Just requires computing and storing the class prototype vectors

 Works with any number of classes (thus for multi-class classification as well)

 Can be generalized in various ways to improve it further, e.g.,


 Modeling each class by a probability distribution rather than just a prototype vector
 Using distances other than the standard Euclidean distance (e.g., Mahalanobis)

 With a learned distance function, can work very well even with very few
examples from each class (used in some “few-shot learning” models nowadays
– if interested, please refer to “Prototypical Networks for Few-shot Learning”)
17
Next Lecture
 Fixing LwP
 Nearest Neighbors

