
Machine Learning

2EL1730

Lecture 4
Non-parametric learning and nearest neighbor methods

Fragkiskos Malliaros and Maria Vakalopoulou

Friday, December 11, 2020


Some Updates

• The first individual assignment has been announced on Edunao
– Due on December 23 at 23:00
– You will have to submit it on Gradescope

2
Acknowledgements

• The lecture is partially based on material by


– Richard Zemel, Raquel Urtasun and Sanja Fidler (University of
Toronto)
– Chloé-Agathe Azencott (Mines ParisTech)
– Julian McAuley (UC San Diego)
– Dimitris Papailiopoulos (UW-Madison)
– Jure Leskovec, Anand Rajaraman, Jeff Ullman (Stanford Univ.)
• http://www.mmds.org
– Panagiotis Tsaparas (UOI)
– Evimaria Terzi (Boston University)
– Andrew Ng (Stanford University)
– Nina Balcan and Matt Gormley (CMU)
– Ricardo Gutierrez-Osuna (Texas A&M Univ.)
3
Last lectures

Supervised learning

4
Linear (Least-Squares) Regression

• Learning: finds the parameters that minimize some objective function

• We minimize the sum of the squares:

• (Stochastic) gradient descent


• Or, closed-form solution:
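In symbols (a standard formulation added here for readability, since the slide's own equations are images), with training pairs (x_i, y_i), i = 1, …, m:

  J(w) = \sum_{i=1}^{m} (y_i - w^T x_i)^2

and the closed-form (normal equations) solution, assuming X^T X is invertible:

  \hat{w} = (X^T X)^{-1} X^T y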
5
Logistic Regression

• How to turn a real-valued expression into a probability?

• Replace the sign() with the sigmoid or logistic function:

[Plot of the sigmoid function σ(z) against the score z]
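For reference (the formula itself is an image on the original slide), the logistic (sigmoid) function is

  \sigma(z) = \frac{1}{1 + e^{-z}},  applied to the score  z = w^T x + b,

so that the model outputs P(y = 1 | x) = \sigma(w^T x + b).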
6
Maximum Likelihood Estimate (MLE)

• Suppose that we have data

Maximum Likelihood
Estimate (MLE)

Recall that we applied MLE for parameter estimation in the logistic regression classifier

What happens if we have prior knowledge?

Maximum a Posteriori Estimate (MAP): incorporate prior knowledge through a prior distribution over the parameters
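In symbols (standard definitions, added since the slide's equations are images), for data D and parameters \theta:

  \hat{\theta}_{MLE} = \arg\max_{\theta} P(D \mid \theta)
  \hat{\theta}_{MAP} = \arg\max_{\theta} P(D \mid \theta) \, P(\theta)

where P(\theta) is the prior.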

7
Bayes Classifier

posterior = likelihood × prior / evidence

Bayes’ decision rule:
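In symbols (standard form; the slide's equation is an image):

  P(y \mid x) = \frac{P(x \mid y) \, P(y)}{P(x)}

and the decision rule picks the class with the largest posterior:

  \hat{y} = \arg\max_{y} P(y \mid x) = \arg\max_{y} P(x \mid y) \, P(y)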

8
Naïve Bayes Classification Model

Classification using the maximum a posteriori (MAP) rule:
(pick the hypothesis, i.e., the label/category, that is most probable)
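Written out (the standard naïve Bayes form, added here since the slide's formula is an image), with conditionally independent features x_1, …, x_d:

  \hat{y} = \arg\max_{y} P(y) \prod_{j=1}^{d} P(x_j \mid y)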

9
Linear Discriminant Analysis (LDA)

• Idea: project all the data points into a new space, normally of
lower dimension, which:
– Maximizes the between-class separability
– Minimizes their within-class variability

• We are looking for a projection (w) such that:
• Examples from the same class are projected very close to each other
• And, at the same time, the projected class means are as far apart as possible
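One standard way to write this objective (an addition here, the Fisher criterion, since the slide itself gives no formula) is

  \max_{w} J(w) = \frac{w^T S_B w}{w^T S_W w}

where S_B is the between-class scatter matrix and S_W the within-class scatter matrix.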

10
Non-parametric learning

11
Classification: Oranges and Lemons

• We can construct a linear decision boundary:

• Parametric model
• Fixed number of parameters

12
Classification as Induction

• Is there an alternative way to formulate the classification problem?

• Classification as induction
• Comparison to instances
already seen in training
• Non-parametric learning

13
Non-parametric learning
• Non-parametric learning algorithm (does not mean NO parameters)
• The complexity of the decision function grows with the number of data
points

• Contrast with linear/logistic regression (≈ as many parameters as features)

• Usually: the decision function is expressed directly in terms of the training examples

• Examples:
• K-nearest neighbors (today's lecture)
• Tree-based methods
• Some cases of SVMs

14
Parametric vs. Non-Parametric
• Parametric algorithms
– Pros: Simple, Fast, Less data
– Cons: Constrained, Limited complexity, Overfit

• Non-parametric algorithms:
– Pros: Flexibility, Power, Performance
– Cons: Need data, Slow, Overfit
15
How Would You Color the Blank
Circles?

16
How Would You Color the Blank
Circles?

17
Partitioning the Space

The training data partitions the entire space


18
Nearest Neighbors – The Idea
• Learning:
– Store all the training examples
– The function is only approximated locally
• Prediction:
– For a point x: assign the label of the training example closest to it

19
Nearest Neighbors – The Idea
• Learning:
– Store all the training examples
– The function is only approximated locally
• Prediction:
– For a point x: assign the label of the training example closest to it

– Classification
• Majority vote: predict the most frequent label among the k neighbors

– Regression
• Predict the average of the labels of the k neighbors

20
Instance-based Learning

• Learning
– Store training instances
• Prediction
– Compute the label for a new instance based on its similarity with the
stored instances
Where the “magic”
happens!
• Also called lazy learning
• Similar to case-based reasoning
• Doctors treating a patient based on how patients with similar
symptoms were treated
• Judges ruling court cases based on legal precedent

21
Computing distances and
similarities

22
Distance Function

• Distance function on a set X

• Properties of a distance function (or metric):
– Non-negativity
– Identity of indiscernibles
– Symmetry
– Triangle inequality
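Written out (standard definitions), a distance function d : X × X → R satisfies, for all x, y, z ∈ X:

  d(x, y) ≥ 0                      (non-negativity)
  d(x, y) = 0  ⇔  x = y            (identity of indiscernibles)
  d(x, y) = d(y, x)                (symmetry)
  d(x, z) ≤ d(x, y) + d(y, z)      (triangle inequality)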

23
Distance Between Instances

• Euclidean distance (L2)

• Manhattan distance (L1): the sum of the horizontal and vertical distances between points on a grid

• Lp-norm
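For two points x, y ∈ R^d (standard formulas, added for readability):

  d_2(x, y) = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2}                 (Euclidean, L2)
  d_1(x, y) = \sum_{j=1}^{d} |x_j - y_j|                          (Manhattan, L1)
  d_p(x, y) = \left( \sum_{j=1}^{d} |x_j - y_j|^p \right)^{1/p}   (Lp-norm)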

24
From Distance to Similarity

• Pearson’s correlation

• Assuming that the data is centered

Geometric interpretation?
25
Pearson’s Correlation

• Pearson's correlation (centered data): a normalized inner product

• Cosine similarity: the dot product can be used to measure similarities between vectors
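In symbols (standard definition):

  \cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}

and Pearson's correlation is exactly this cosine similarity computed on the centered vectors.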

26
Categorical Features

• Represent objects as the list of presence/absence (or counts) of the features that appear in them

• Example: molecules
– Features: atoms and bonds of a certain type
– C, H, S, O, N, …
– O-H, O=C, C-N, ...

27
Binary Representation (1/2)

0 1 1 0 0 1 0 0 0 1 0 1 0 0 1

no occurrence of the 1st feature; 1+ occurrences of the 10th feature

• Hamming distance between two binary representations
– Number of bits that are different (XOR operator)
– Equivalent to the L1 distance

28
Binary Representation (2/2)

0 1 1 0 0 1 0 0 0 1 0 1 0 0 1

no occurrence of the 1st feature; 1+ occurrences of the 10th feature

• Jaccard similarity (or Tanimoto similarity)
– Number of shared features (AND operator), normalized by the number of features present in either object (OR operator)

Jaccard index: intersection over union (Wikipedia: https://en.wikipedia.org/wiki/Jaccard_index)


29
Example

x = 010101001
y = 010011000

• Hamming distance
x = 010101001
y = 010011000
Thus, d(x,y) = 3

• Jaccard similarity
J = (# of 11) / ( # of 01 + # of 10 + # of 11)
= (2) / (1 + 2 + 2) = 2 / 5 = 0.4
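A minimal Python sketch (not from the lecture) that reproduces these two numbers:

x = "010101001"
y = "010011000"

# Hamming distance: number of positions where the bits differ (XOR)
hamming = sum(a != b for a, b in zip(x, y))

# Jaccard similarity: #11 / (#01 + #10 + #11), i.e. intersection over union
n11 = sum(a == "1" and b == "1" for a, b in zip(x, y))
n01 = sum(a == "0" and b == "1" for a, b in zip(x, y))
n10 = sum(a == "1" and b == "0" for a, b in zip(x, y))
jaccard = n11 / (n01 + n10 + n11)

print(hamming, jaccard)   # 3 0.4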

30
Let’s go back to the
kNN classifier

31
Nearest Neighbor Algorithm

• Training examples in the Euclidean space


• Idea: The label of a test data point is estimated from the known
value of the nearest training example
– The distance is typically defined to be the Euclidean one

Algorithm 1
1. Find example (x*, y*) from the stored training set closest to the
test instance x. That is:

2. Output y(x) = y* (The output label)
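In symbols (the slide's equation is an image), step 1 selects

  (x^*, y^*) = \arg\min_{(x_i, y_i)} d(x, x_i),

with d typically the Euclidean distance.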

32
k-Nearest Neighbors (kNN) Algorithm
1NN: every example in the blue shaded area will be misclassified as the blue class
3NN: every example in the blue shaded area will be classified correctly as the red class

• Algorithm 1 is sensitive to mis-labeled data (‘class noise’)


• Consider the vote of the k nearest neighbors (majority vote)

Algorithm 2
• Find the k examples (x*_i, y*_i), i = 1,…,k, closest to the test instance x
• The output is the majority class
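A minimal NumPy sketch of Algorithm 2 (an illustration only, not the lab's reference implementation; the function and variable names are chosen here):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distances from the test instance x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]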

33
Choice of Parameter k (1/2)

• Small k: noisy decision
– The idea behind using more than one neighbor is to average out the noise
• Large k
– May lead to better prediction performance
– If we set k too large, we may end up looking at samples that are not
neighbors (are far away from the point of interest)
– Also, computationally intensive. Why?
– Extreme case: set k=m (number of points in the dataset)
• For classification: the majority class
• For regression: the average value

34
Choice of Parameter k (2/2)

Set k by cross-validation, by examining the misclassification error

Rule of thumb for an initial guess: k ≈ √m (m: # of training instances)
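A possible scikit-learn sketch of this selection procedure (an illustration; it assumes a feature matrix X and labels y are already loaded):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

ks = range(1, 30, 2)                      # odd values of k to avoid ties
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in ks]
best_k = ks[int(np.argmax(scores))]       # k with the best cross-validated accuracy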

Source: https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/ 35
Advantages of kNN

• Training is very fast


– Just store the training examples
– Can use smart indexing procedures to speed-up testing
• The training data is part of the ‘model’
– Useful in case we want to do something else with it
• Quite robust to noisy data
– Averaging k votes
• Can learn complex functions (implicitly)

36
Drawbacks of kNN

• Memory requirements
– Must store all training data
• Prediction can be slow (you will figure this out yourself in the lab)
– Complexity of the query: O(knm)
– But kNN works best with lots of samples
– Can we further improve the running time?
• Efficient data structures (e.g., k-D trees)
• Approximate solutions based on hashing
• High dimensional data and the curse of dimensionality
– Computation of the distance in a high dimensional space may
become meaningless
– Need more training data
– Dimensionality reduction
Wikipedia: https://en.wikipedia.org/wiki/K-d_tree 37
k-D trees

• Definition
• A binary tree
• Any internal node implements a spatial partition by a
hyperplane H, splitting the point cloud into two equal subsets
• Right subtree: points on one side of H
• Left subtree: the remaining points
• The process halts when a node contains fewer than a given number of points (e.g., a single point)
• Build complexity: O(n log²(n))

38
k-D trees

• Input: training data points in d dimensions

• Output: k-d tree


Algorithm: build(points,k)
1. k = k%d
2. Project points on k-th axis
3. Create node representing the median point
4. Split into two equal subsets points1, points2
5. left_child = build(points1,k+1)
6. right_child = build(points2,k+1)
7. Return node
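A compact Python sketch of this build procedure (an illustration of the steps above; the class and function names are chosen here, and points are stored as rows of a NumPy array):

import numpy as np

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0):
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]               # step 1: cycle through the axes
    points = points[points[:, axis].argsort()]   # step 2: sort along that axis
    median = len(points) // 2                    # step 3: median point becomes the node
    return Node(points[median], axis,            # steps 4-7: recurse on the two halves
                left=build(points[:median], depth + 1),
                right=build(points[median + 1:], depth + 1))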

39
kNN – Some More Issues

• Normalize the scale of the attributes

• Simple option: linearly scale the range of each feature to be, e.g., in
the range of [0,1]

• Linearly scale each dimension to have 0 mean and variance 1


– Compute the mean μ and variance σ² for an attribute x_j and scale: (x_j - μ)/σ
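Both options in a short sketch (the scikit-learn class names are real; the toy array is made up for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])   # toy feature matrix

X_minmax = MinMaxScaler().fit_transform(X)    # each feature rescaled to [0, 1]
X_std = StandardScaler().fit_transform(X)     # each feature: zero mean, unit variance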

40
Decision Boundary of kNN

• Decision boundary in classification


– Line separating the positive from negative regions
• What decision boundary is the kNN building?
– The nearest neighbors algorithm does not explicitly compute
decision boundaries, but those can be inferred

41
Voronoi Tessellation
Consider the case of 1NN
• Voronoi cell of x:
– Set of all points of the space closer to x than to any other point of the training set
– Polyhedron
• Voronoi tessellation (or diagram) of the space
– Union of all Voronoi cells
• Complexity (for d = 2)
– O(n) space
– O(log n) query time

42
Voronoi Tessellation

• The Voronoi diagram defines the decision boundary of the 1NN


• The kNN algorithm also partitions the space, but in a more complex way
• For d > 2, the complexity grows to O(n^(d/2))

[Figure: decision boundaries for k = 1 and k = 3]
Wikipedia: https://en.wikipedia.org/wiki/Voronoi_diagram
43
kNN Variants

• Weighted kNN
– Weight the vote of each neighbor x_i according to its distance to the test point x

– Other kernel functions can be used to weight the distance of neighbors
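A common choice (one standard variant, added here since the slide's formula is an image) weights neighbor x_i by the inverse of its distance,

  w_i = \frac{1}{d(x, x_i)}   (or 1 / d(x, x_i)^2, or a Gaussian kernel),

and predicts the weighted majority vote (classification) or the weighted average (regression).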

Source: https://epub.ub.uni-muenchen.de/1769/1/paper_399.pdf

44
scikit-learn

http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
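Typical usage (a minimal sketch; it assumes X_train, y_train and X_test are already prepared):

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=5)   # k = 5, Euclidean distance by default
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)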

45
Next Class

• Trees/ Ensemble Methods

46
Thank You!

DiscoverGreece.com 47
