
k-Nearest Neighbour

Linear Regression
Logistic Regression
BUSINESS CASE - BLINKIT

Blinkit needs the optimal number of delivery partners for each store. Hence, stores are classified into 3 categories based on outgoing deliveries:

High Traffic
Moderate Traffic
Low Traffic
BLINKIT DATA

Will logistic regression work???

The data poses a multiclass problem, is non-linear, and is imbalanced. Logistic regression would require an extensive search for the correct polynomial features.

Hence, there is a need for a new algorithm without such feature engineering.
xq1 belongs to the (+) class.
xq2 belongs to the (o) class.

Just by looking at the neighbouring points, we are sure about xq1 and xq2.

The k-nearest neighbour (kNN) model works on the same intuition: the class of a data point (xq) depends on the class of its neighbouring points.
How does kNN work?

If xq = [2, 5] and the data contains 6 data points:

Step 1 : Find the Euclidean distance from xq to every data point, e.g. between [3, 6] and [2, 5].
Step 2 : Sort the data based on distance.
Step 3 : Pick the 3 data points having the minimum distance from xq.
Step 4 : Find the majority class of these selected data points —>> class label for xq.

Here the majority class is 2, so xq belongs to class 2.

This selection of data points is decided by the hyperparameter “k”, hence the name kNN.
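A minimal sketch of these four steps in Python/NumPy (the toy points, labels, and the query [2, 5] are illustrative assumptions, not data from the slides):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, xq, k=3):
    """Classify xq by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from xq to every training point.
    dists = np.sqrt(((X_train - xq) ** 2).sum(axis=1))
    # Steps 2 & 3: sort by distance and keep the k closest points.
    nearest = np.argsort(dists)[:k]
    # Step 4: majority class among those neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: 6 points with labels 1 or 2 (illustrative values).
X = np.array([[3, 6], [1, 4], [2, 7], [8, 1], [9, 2], [7, 3]])
y = np.array([2, 2, 2, 1, 1, 1])
print(knn_predict(X, y, np.array([2, 5]), k=3))  # -> 2
```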
POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

What happens if k=4?

Making predictions based on the 4 nearest neighbours:

2 data points >> class 1
2 data points >> class 2

Tie problem: kNN cannot make a prediction.

Hence, it is advisable to keep k as an odd value.


What happens if k=5?

Making predictions based on the 5 nearest neighbours: with more than two classes, there can still be a tie even if we keep the k value odd.

Hack! Randomly pick a class label from the tied classes.

Here, kNN can pick class 1 or class 2 for xq.
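One hedged way to implement this hack in code (the neighbour labels below are illustrative):

```python
import random
from collections import Counter

def majority_vote(neighbour_labels):
    """Majority vote; if several classes tie for the top count, pick one at random."""
    counts = Counter(neighbour_labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return random.choice(tied)

# k=4 with two neighbours of class 1 and two of class 2 -> tie, resolved randomly.
print(majority_vote([1, 1, 2, 2]))
```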


POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction


How does kNN have good performance on non-linear, multi-class data?

Assume the data contains three classes (+), (-), (o) and k=5. If most of xq's 5 nearest neighbours belong to class (+), then xq is assigned class (+).

kNN assumes the neighbourhood is homogeneous, i.e. the characteristics of the nearest neighbours and of xq will be the same. For the same reason, kNN fails when the data has a lot of noise/outliers.
POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction

● kNN assumes a homogeneous neighbourhood

● It is heavily impacted if outliers increase.

Bias-variance tradeoff in kNN: yes or no?

Suppose the data contains 2 outliers, and consider two query points xq1 and xq2.

For k=1:
As xq1 lies closest to the (-) outlier, its class prediction is (-).
As xq2 lies closest to the (+) outlier, its class prediction is (+).

kNN is trying to fit every data point, giving a rough decision boundary.

Taking the same data, what will be the class label for xq if k=72?

Among the 72 neighbours, (+) = 31 and (-) = 41, so xq -> (-) class, as (-) > (+).

Even when xq is closer to (+), kNN does not fit the training data.

As k increases, kNN underfits.


Summary!

Training time complexity: O(1). No computation is done by kNN at training time; it only stores the data.

Space complexity: O(n x d). kNN stores the entire training data (n points, d dimensions).

Test time complexity:

Step 1 : Find distance b/w training data and xq = O(n x d)

Step 2 : Sort data = O(n log n)

Step 3 : Pick k nearest neighbours = O(k)

Step 4 : Majority vote = O(k)

As k << n and d, O(k) is ignored.

Test time complexity = O(nd + n log n)
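As a rough illustration (a sketch using scikit-learn's KNeighborsClassifier with brute-force search, which the slides do not mention), fitting essentially just stores the data, while the distance and neighbour-selection cost is paid at prediction time:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))        # n = 10,000 points, d = 20 features
y = rng.integers(0, 3, size=10_000)

clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute")
clf.fit(X, y)        # roughly O(1) work beyond storing X and y
clf.predict(X[:1])   # distance computation + neighbour selection happen here
```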
POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction

● kNN assumes a homogeneous neighbourhood

● It is heavily impacted if outliers increase.

● If k increases, kNN underfits. (bias increases, variance decreases)

● If k decreases, kNN overfits. (bias decreases, variance increases)


POINTS TO REMEMBER

● Train time complexity —>> O(1)

● Test time complexity —>> O(nd + n log n)

● Space complexity —>> O(nd)


Diabetic Patient Example

Suppose we take 40 diabetic (+) class and 40 non-diabetic (-) class patients.

Features: Gender (M, F), Age, BP, Glucose Level, Blood Group (A+, B+, O+, AB+, AB-, O-, B-, A-).

kNN does not work on categorical data, as Euclidean distance needs numeric data!

Convert the categorical data into numerical data by ONE HOT ENCODING (OHE).
OHE to convert categorical data into numeric data:

OHE of Gender becomes: (n, 2)

Similarly, OHE of Blood Group becomes: (n, 8)

The total number of dimensions grows when One Hot Encoding is applied. As OHE increases dimensions, it leads to the curse of dimensionality.

Hence, Target Encoding shall be used.
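A hedged sketch of the dimension blow-up using scikit-learn's OneHotEncoder (the column values below are illustrative):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "BloodGroup": ["A+", "O-", "B+", "AB+"],
})

# sparse_output requires scikit-learn >= 1.2
enc = OneHotEncoder(sparse_output=False)
X = enc.fit_transform(df)

# Gender contributes 2 columns; Blood Group contributes one column per
# category seen (up to 8), so the encoded matrix is much wider than df.
print(X.shape)
```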
What is Curse of Dimensionality?

Suppose x1 and x2 have dimension = 4. Due to the low dimension, the Euclidean distance between x1 & x2 is very large (the distances remain informative).

Due to high dimension, the Euclidean distance between x1 & x2 becomes very small (the distances stop being informative), so Euclidean distance cannot be used.

Conclusion : Euclidean distance fails when dimension is high
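The slides state this conclusion without a demo; one common way to see it (an assumption beyond the slides) is that the gap between the nearest and farthest point shrinks relative to the distances themselves as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(d, n=1000):
    """Relative gap between the farthest and nearest point from a random query."""
    X = rng.random((n, d))
    xq = rng.random(d)
    dists = np.linalg.norm(X - xq, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for d in [4, 50, 1000]:
    print(d, round(distance_contrast(d), 3))
# The contrast shrinks as d grows: all points look almost equally far away,
# which is why Euclidean distance becomes uninformative in high dimensions.
```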


POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction

● kNN assumes a homogeneous neighbourhood

● It is heavily impacted if outliers increase.

● If k increases, kNN underfits. (bias increases, variance decreases)

● If k decreases, kNN overfits. (bias decreases, variance increases)


POINTS TO REMEMBER

● Train time complexity —>> O(1)

● Test time complexity —>> O(nd + n log n)

● Space complexity —>> O(nd)

● Euclidean distance fails when there is high dimensional data
What other distance to use?

Manhattan Distance: can be understood as the distance measured as we walk along a path (city blocks) from x1 to x2:

d(x1, x2) = Σ |x1i - x2i|

What other distance to use?

Minkowski Distance: a generalisation with parameter p (p=1 gives Manhattan, p=2 gives Euclidean):

d(x1, x2) = ( Σ |x1i - x2i|^p )^(1/p)
Manhattan distance & One Hot Encoding

OHE creates high dimensional, sparse data.

Manhattan Distance gives equal importance to all the features (even to irrelevant features).
Cosine Similarity for One Hot Encoding

Since cosine similarity focuses on the direction of vectors, it easily ignores irrelevant features.

cos(x1, x2) = (x1 · x2) / (||x1|| ||x2||)

It ranges from -1 (least similar) to 1 (most similar).
Distance metrics used for kNN

● Euclidean Distance – for low dimensional data

● Cosine similarity – for high dimensional data
● Manhattan – useful when the data is like a map
● Minkowski – for using a custom distance metric
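A short sketch computing the metrics listed above with SciPy/NumPy (the two example vectors are illustrative):

```python
import numpy as np
from scipy.spatial import distance

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 2.0])

print(distance.euclidean(x1, x2))        # sqrt of the sum of squared differences
print(distance.cityblock(x1, x2))        # Manhattan: sum of absolute differences
print(distance.minkowski(x1, x2, p=3))   # Minkowski with a custom p
print(1 - distance.cosine(x1, x2))       # cosine similarity = 1 - cosine distance
```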

POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction

● kNN assumes a homogeneous neighbourhood

● It is heavily impacted if outliers increase.

● If k increases, kNN underfits. (bias increases, variance decreases)

● If k decreases, kNN overfits. (bias decreases, variance increases)


POINTS TO REMEMBER

● Train time complexity —>> O(1)

● Test time complexity —>> O(nd + n log n)

● Space complexity —>> O(nd)

● Euclidean distance fails when there is high dimensional data

● Cosine similarity – for high dimensional data; ranges over [-1, 1]
● Manhattan – useful when the data is like a map
● Minkowski – for using a custom distance metric
How does kNN work so fast in Google searches?

Google Images uses kNN to show famous monuments just by searching a city name. This is made fast by a hashing algorithm: LSH (Locality Sensitive Hashing).
What is hashing?

Storing data as key-value pairs (analogous to a dictionary).

If query = Delhi, it returns India Gate, Red Fort, Qutub Minar.

It quickly returns the data: time complexity O(1).
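A tiny sketch of this key-value lookup with a Python dict (the second city entry is an illustrative addition):

```python
# Hash-table (dict) lookup: retrieving monuments for a city key in O(1) on average.
monuments = {
    "Delhi": ["India Gate", "Red Fort", "Qutub Minar"],
    "Agra": ["Taj Mahal"],  # illustrative extra key
}
print(monuments["Delhi"])  # ['India Gate', 'Red Fort', 'Qutub Minar']
```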


How does LSH work?

For the hash table, create a randomised hash function h(x). It gives the key for the hash table.

Suppose we take a random vector w and define the hash bit as 1 if w · x >= 0 and 0 otherwise; repeating this for a few random vectors gives a bit-vector key. Points with the same key are clubbed into one bucket of the hash table.

LSH’s role in speeding up kNN

LSH groups similar data points. Suppose for some xq, h(xq) = [0,1,0].

We run kNN only for the data points having h(x) = [0,1,0], instead of the whole data.

This reduces testing time complexity, as kNN is using only a subset of the data.
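A minimal sketch of this idea (random-hyperplane LSH, with illustrative data sizes and the bucket key built from the signs of three random projections):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hash(d, n_planes=3):
    """Random hyperplanes; the sign pattern of x @ W is the bucket key."""
    W = rng.normal(size=(d, n_planes))
    def h(x):
        return tuple((np.asarray(x) @ W >= 0).astype(int))
    return h

# Build the hash table over the training data.
X_train = rng.normal(size=(1000, 5))
h = make_hash(d=5)
buckets = {}
for i, x in enumerate(X_train):
    buckets.setdefault(h(x), []).append(i)

# At query time, run kNN only on the candidates in xq's bucket
# instead of the whole training set.
xq = rng.normal(size=5)
candidates = buckets.get(h(xq), [])
print(h(xq), len(candidates))
```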


POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction

● kNN assumes a homogeneous neighbourhood

● It is heavily impacted if outliers increase.

● If k increases, kNN underfits. (bias increases, variance decreases)

● If k decreases, kNN overfits. (bias decreases, variance increases)


POINTS TO REMEMBER

● Train time complexity —>> O(1)

● Test time complexity —>> O(nd + n log n)

● Space complexity —>> O(nd)

● Euclidean distance fails when there is high dimensional data

● Cosine similarity – for high dimensional data; ranges over [-1, 1]
● Manhattan – useful when the data is like a map
● Minkowski – for using a custom distance metric
POINTS TO REMEMBER

● LSH reduces testing time complexity by selecting a subset of the data determined by h(x).
What are the techniques of imputing?

● Mean or median of the Fj feature

● Analysing the data and manually imputing a value
● Mean and median of the whole data
kNN for Imputation

Step 1 : Exclude the feature Fj (the one with the missing value) from the data.

Step 2 : Find the distance between xi and the rest of the data, and pick the k nearest neighbours.

Step 3 : For these nearest neighbours, check their values of Fj and impute from them (e.g. their mean).
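These three steps are what scikit-learn's KNNImputer does; a hedged sketch with an illustrative toy matrix:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix with a missing value in the last column of the second row.
X = np.array([[1.0, 2.0, 4.0],
              [1.1, 2.1, np.nan],
              [8.0, 9.0, 3.0]])

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
# The NaN is replaced by the mean of that feature over the 2 nearest rows,
# with distances computed on the non-missing features.
print(X_filled)
```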


POINTS TO REMEMBER

● kNN is a non-parametric algorithm.

● kNN predicts the class of test data [xq] on the basis of its neighbourhood.

WORKING OF kNN:

● Find distance (xq and all training data)

● Sort distances

● Pick k nearest neighbours

● Majority vote for class prediction

● kNN assumes a homogeneous neighbourhood

● It is heavily impacted if outliers increase.

● If k increases, kNN underfits. (bias increases, variance decreases)

● If k decreases, kNN overfits. (bias decreases, variance increases)


POINTS TO REMEMBER

● Train time complexity —>> O(1)

● Test time complexity —>> O(nd + n log n)

● Space complexity —>> O(nd)

● Euclidean distance fails when there is high dimensional data

● Cosine similarity – for high dimensional data; ranges over [-1, 1]
● Manhattan – useful when the data is like a map
● Minkowski – for using a custom distance metric
POINTS TO REMEMBER

● LSH reduces testing time complexity by selecting a subset of the data determined by h(x).

● kNN can be used for imputation.


d Blue
dz Green

ds Green
d
dff da Blue
i di Blue

100
rs
i
i
HE

If IF

You might also like