KNN 2
Linear Regression
Logistic Regression
BUSINESS CASE - BLINKIT
Moderate Traffic
Low Traffic
BLINKIT DATA
Imbalanced data ⇒ need of a new algorithm
Xq1 belongs to (+) class
Xq2 belongs to (o) class
Just by looking at neighbouring points, we are sure about Xq1, Xq2.
Class of datapoint (xq) depends on the class of neighbouring points within Euclidean distance.
How does kNN work?
Step 1 : Compute distance of every data point from xq
Step 2 : Sort distances
Step 3 : Select k data points with minimum distance from xq
Step 4 : Find majority class of these selected data points ⇒ class label for xq
Majority class ⇒ xq belongs to the majority class of its neighbourhood.
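The steps above can be sketched in code. A minimal illustrative version (the toy data, function name, and use of NumPy are my own, not from the notes):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3):
    """Predict the class of query point x_q from its k nearest neighbours."""
    # Step 1: distance of every data point from x_q (Euclidean)
    dists = np.linalg.norm(X_train - x_q, axis=1)
    # Steps 2-3: sort distances, select the k points with minimum distance
    nearest = np.argsort(dists)[:k]
    # Step 4: majority class of these selected points -> class label for x_q
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: a (+) cluster near (0, 0) and an (o) cluster near (5, 5)
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array(['+', '+', '+', 'o', 'o', 'o'])

print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))  # '+'
print(knn_predict(X, y, np.array([5.5, 5.5]), k=3))  # 'o'
```

There is no separate training step: all the work happens when a query arrives.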
Parameter: k
[Figure: distances from neighbouring points to xq]
What happens if k=4?
Problem: with k = 4 the neighbourhood can be tied 2-2, so kNN cannot make predictions.
Hack! Choose an odd value of k.
WORKING OF kNN:
● Sort distances
Assume data contains three classes: (+), (-), (o), and k = 5.
[Figure: data points on a number line at distances 2, 4, 8, 16 from xq]
Class of xq = (+)
● kNN predicts class of test data [xq] on the basis of neighbourhood.
Bias-variance tradeoff in kNN... yes or no? Yes: small k gives low bias but high variance (overfitting); large k gives high bias but low variance (underfitting).
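A quick sketch of the two extremes of k, on hypothetical toy data (mirroring the trade-off stated above):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k):
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# 4 (+) points and 6 (-) points
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [4, 4], [4, 5], [5, 4], [5, 5], [4, 6], [6, 4]], dtype=float)
y = np.array(['+'] * 4 + ['-'] * 6)

# k = 1: each training point is its own nearest neighbour,
# so training error is zero (low bias, high variance)
assert all(knn_predict(X, y, x, k=1) == lbl for x, lbl in zip(X, y))

# k = N: every query gets the global majority class (high bias, low variance)
print(knn_predict(X, y, np.array([0.0, 0.0]), k=len(X)))  # '-' despite (0,0) being a (+) point
```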
[Figure: neighbourhood of xq1 with class counts for (+) and (-)]
Training time complexity: no computation done by kNN; it stores the data only.
Space complexity: O(N × d)
Test time complexity: O(N × d) per query
So kNN is heavy on memory and slow at test time (hard to deploy on, e.g., a mobile phone).
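These costs can be made concrete with a tiny sketch (the sizes N and d here are illustrative):

```python
import numpy as np

N, d = 1000, 20
X = np.random.default_rng(0).normal(size=(N, d))

# "Training" kNN is just storing the data: N x d float64 values
print(X.nbytes)  # 1000 * 20 * 8 = 160000 bytes

# One test query touches all N stored points, each d-dimensional: O(N x d) work
xq = np.zeros(d)
dists = np.linalg.norm(X - xq, axis=1)
print(dists.shape)  # (1000,)
```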
Features:
● Gender (M, F)
● BP
● Glucose Level
● Blood group
Target: Diabetic (+) class

kNN does not work on categorical data, as Euclidean distance needs numeric data!
Euclidean distance:
In low dimension, the Euclidean distance between x1 & x2 is very large.
In high dimension, the Euclidean distance between x1 & x2 is very small.
So in high-dimensional data, Euclidean distance cannot be used.
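This concentration of distances in high dimension can be checked with a small simulation (the dimensions and sample size are my own choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(42)

def distance_contrast(d, n=2000):
    """(max - min) / min of Euclidean distances from the origin
    to n random points in the unit cube of dimension d."""
    X = rng.uniform(size=(n, d))
    dists = np.linalg.norm(X, axis=1)
    return (dists.max() - dists.min()) / dists.min()

low, high = distance_contrast(2), distance_contrast(1000)
print(low, high)
assert low > high  # distances look almost alike in high dimension
```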
What other distance to use?
Manhattan distance
Minkowski distance (p = 1 gives Manhattan, p = 2 gives Euclidean)
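A short sketch of the Minkowski distance, which contains both Manhattan and Euclidean as special cases (the function name is my own):

```python
import numpy as np

def minkowski(x1, x2, p):
    """Minkowski distance: (sum_i |x1_i - x2_i|^p)^(1/p)."""
    return np.sum(np.abs(x1 - x2) ** p) ** (1 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

print(minkowski(a, b, p=1))  # 7.0 -> Manhattan distance
print(minkowski(a, b, p=2))  # 5.0 -> Euclidean distance
```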
Manhattan distance & One Hot Encoding
OHE creates a high-dimensional sparse data.
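A sketch of why Manhattan distance pairs naturally with one-hot encoded categories (the blood-group categories and helper function are hypothetical):

```python
import numpy as np

categories = ['A', 'B', 'AB', 'O']  # hypothetical blood-group categories

def one_hot(value):
    """One-hot encode a single categorical value."""
    return np.array([1.0 if value == c else 0.0 for c in categories])

# Manhattan distance between one-hot vectors: 0 if same category, 2 if different
print(np.sum(np.abs(one_hot('A') - one_hot('O'))))  # 2.0
print(np.sum(np.abs(one_hot('A') - one_hot('A'))))  # 0.0
```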
Cosine similarity: a distance metric used for kNN on high-dimensional data.
Ranges from (-1) to 1: (-1) means least similar, (+1) means most similar.
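Cosine similarity in code; a minimal sketch:

```python
import numpy as np

def cosine_similarity(x1, x2):
    """cos(theta) = (x1 . x2) / (||x1|| * ||x2||); ranges from -1 to 1."""
    return np.dot(x1, x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))

a = np.array([1.0, 0.0])
print(cosine_similarity(a, np.array([2.0, 0.0])))   # 1.0  (same direction: most similar)
print(cosine_similarity(a, np.array([0.0, 3.0])))   # 0.0  (orthogonal)
print(cosine_similarity(a, np.array([-1.0, 0.0])))  # -1.0 (opposite: least similar)
```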
POINTS TO REMEMBER
● kNN predicts class of test data [xq] on the basis of neighbourhood.
If query = Delhi:
Suppose we take a random vector, and define a hash h(x).
Hash table: bucket data points by their hash value.
We run kNN only for data points having h(x) = [0,1,0], instead of the whole data.
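A minimal sketch of this hashing trick, assuming a sign-based random-projection hash (the exact hash used in the notes is not shown; the sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_planes = 5, 3

# Three random vectors give a 3-bit hash such as [0, 1, 0]
W = rng.normal(size=(n_planes, d))

def h(x):
    """One bit per random vector: 1 if the projection onto it is positive."""
    return tuple((W @ x > 0).astype(int))

# Hash table: bucket every data point by its hash value
X = rng.normal(size=(100, d))
buckets = {}
for i, x in enumerate(X):
    buckets.setdefault(h(x), []).append(i)

# At query time, run kNN only on the query's bucket, not the whole data
xq = rng.normal(size=d)
candidates = buckets.get(h(xq), [])
print(len(candidates))  # usually far fewer than 100 points
```

Nearby points tend to fall on the same side of each random vector, so they land in the same bucket.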
kNN for Imputation
[Figure: kNN imputation example; nearest neighbours labelled Green, Blue, Blue]
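A minimal sketch of kNN-based imputation for a missing categorical value (the data and function name are my own, not from the notes):

```python
import numpy as np
from collections import Counter

def knn_impute(X, labels, i_missing, k=3):
    """Fill a missing categorical value with the majority label
    of the k nearest rows (Euclidean distance on the features)."""
    dists = np.linalg.norm(X - X[i_missing], axis=1)
    dists[i_missing] = np.inf  # exclude the row itself
    nearest = [j for j in np.argsort(dists) if labels[j] is not None][:k]
    return Counter(labels[j] for j in nearest).most_common(1)[0][0]

# Hypothetical data: row 4 has a missing colour label
X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, 4.9], [1.0, 1.1]])
labels = ['Blue', 'Blue', 'Green', 'Green', None]

labels[4] = knn_impute(X, labels, i_missing=4, k=3)
print(labels[4])  # 'Blue' (2 Blue vs 1 Green among the 3 nearest known rows)
```

For numeric features, scikit-learn ships a ready-made `KNNImputer` that averages the neighbours instead of taking a majority vote.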