
Supervised Learning
Lecture 6

K-Nearest Neighbor (KNN) Classifier and Application

Dr Mustansar Ali Ghazanfar
[email protected]

Objects, Feature Vectors, Points

[Figure: objects ("elliptical blobs") shown as points X(1), X(3), ..., X(25) in a two-dimensional feature space with axes x1 and x2; each object i is represented by its feature vector (x1(i), x2(i)).]

Nearest Neighbours

Each object is a point in n-dimensional feature space:

X(i) = (x1(i), x2(i), ..., xn(i))
X(j) = (x1(j), x2(j), ..., xn(j))

The distance between X(i) and X(j) is:

D(i, j) = sqrt( Σ_{k=1..n} ( xk(i) − xk(j) )² )

Nearest Neighbour Algorithm

Given training data (X(1), D(1)), (X(2), D(2)), ..., (X(N), D(N)):

Define a distance metric between points in input space. A common measure is the Euclidean distance:

D(i, j) = sqrt( Σ_{k=1..n} ( xk(i) − xk(j) )² )

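As a sketch, this metric vectorized with NumPy; the array names are illustrative, not from the lecture:

```python
import numpy as np

def euclidean_distances(X_train, x_query):
    """Distance from x_query to every training point (row of X_train)."""
    return np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
```
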
K-Nearest Neighbour Model

Given a test point X:

- Find the K nearest training inputs to X.
- Denote these points as (X(1), D(1)), (X(2), D(2)), ..., (X(k), D(k)).

K-Nearest Neighbour Model

Classification: the class assigned to X is

Y = the most common class in the set {D(1), D(2), ..., D(k)}

K-Nearest Neighbour Model

Example: Classify whether a customer will respond to a survey question using a 3-Nearest Neighbour classifier.

Customer   Age   Income   No. credit cards   Response
John       35    35K      3                  No
Rachel     22    50K      2                  Yes
Hannah     63    200K     1                  No
Tom        59    170K     1                  No
Nellie     25    40K      4                  Yes
David      37    50K      2                  ?

K-Nearest Neighbour Model

Example: 3-Nearest Neighbours. Euclidean distances from David, computed on (age, income in thousands, no. of credit cards):

Customer   Age   Income   No. credit cards   Response   Distance from David
John       35    35K      3                  No         15.16
Rachel     22    50K      2                  Yes        15
Hannah     63    200K     1                  No         152.23
Tom        59    170K     1                  No         122
Nellie     25    40K      4                  Yes        15.74
David      37    50K      2                  ?          —

K-Nearest Neighbour Model

Example: 3-Nearest Neighbours (continued)

The three nearest neighbours to David are Rachel (15, Yes), John (15.16, No), and Nellie (15.74, Yes), so their responses are No, Yes, Yes. By majority vote, David's predicted response is Yes.

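A minimal Python sketch of this worked example; the income scale (thousands) matches the distances above, but the variable names are illustrative, not from the lecture:

```python
from collections import Counter
import math

# Training data: (age, income in thousands, no. of credit cards) -> response
train = [
    ("John",   (35, 35, 3),  "No"),
    ("Rachel", (22, 50, 2),  "Yes"),
    ("Hannah", (63, 200, 1), "No"),
    ("Tom",    (59, 170, 1), "No"),
    ("Nellie", (25, 40, 4),  "Yes"),
]
david = (37, 50, 2)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Sort the training examples by distance to the query point
neighbours = sorted(train, key=lambda t: euclidean(t[1], david))

# Take the k = 3 nearest and vote
k = 3
votes = [label for _, _, label in neighbours[:k]]
print(Counter(votes).most_common(1)[0][0])  # -> "Yes"
```
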
K-Nearest Neighbour Model

Example: For the customer data above, pick the best K from the set {1, 2, 3} to build a K-NN classifier.

k-Nearest Neighbor Pseudocode

Training algorithm:
- Store all training examples <x, f(x)>.
- Find the best value for K.

Classification (testing) algorithm:
- Given a query instance xq to be classified,
- Let x1, ..., xk denote the k instances from the training examples nearest to xq.
- Return

  f̂(xq) = argmax_{v ∈ V} Σ_{i=1..k} δ(v, f(xi))

  where V is the set of classes, δ(a, b) = 1 if a = b, and δ(a, b) = 0 otherwise.

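This pseudocode translates into a short, runnable Python function; a sketch assuming a caller-supplied distance function:

```python
from collections import Counter

def knn_classify(train, x_q, k, dist):
    """train: list of (x, label) pairs; x_q: query instance;
    dist: distance function. Returns argmax_v sum_i delta(v, f(x_i))."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], x_q))[:k]
    votes = Counter(label for _, label in nearest)  # tallies delta(v, f(x_i))
    return votes.most_common(1)[0][0]
```
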
Nearest Neighbour Rule

- Consider a two-class problem where each sample consists of two measurements (x, y).
- k = 1: for a given query point q, assign the class of the single nearest neighbour.
- k = 3: compute the k nearest neighbours and assign the class by majority vote.

k-Nearest Neighbor Examples (discrete-valued target function)

[Figure: classifications of the same query points with k = 1 and k = 5.]

KNN Flavors

Distance-Weighted Nearest Neighbor Algorithm

Distance-weighted voting (for a discrete-valued target function):

f̂(xq) = argmax_{v ∈ V} Σ_{i=1..k} wi · δ(v, f(xi))

where wi = 1 / d(xq, xi)²

- Weights are inversely proportional to the square of the distance, so closer neighbours count for more.
- d(xq, xi) is the Euclidean distance.

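A sketch of distance-weighted voting under the same assumptions as the knn_classify sketch above; the zero-distance shortcut is a common convention, not from the slides:

```python
from collections import defaultdict

def weighted_knn_classify(train, x_q, k, dist):
    """Distance-weighted k-NN: each neighbour votes with weight 1/d^2."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], x_q))[:k]
    scores = defaultdict(float)
    for x_i, label in nearest:
        d = dist(x_i, x_q)
        if d == 0:
            return label  # exact match: return its class directly
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)
```
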
How many neighbors, K?

- K is a fixed constant.
- It determines the number of elements to be included in each neighbourhood.
- The neighbourhood determines the classification.
- Different K values can and will produce different classifications.

K-Nearest Neighbour Model: Picking K

Use N-fold cross-validation and pick the K that minimizes the cross-validation error:

- For each of the N training examples:
  - Find its K nearest neighbours (excluding the example itself).
  - Make a classification based on these K neighbours.
  - Calculate the classification error.
- Output the average error over all examples.
- Use the K that gives the lowest average error over the N training examples (see the sketch below).

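A leave-one-out sketch of this procedure (the case where the number of folds equals N); it reuses the illustrative knn_classify helper from earlier:

```python
def pick_best_k(train, candidate_ks, dist):
    """Leave-one-out cross-validation: return the K with lowest average error."""
    best_k, best_err = None, float("inf")
    for k in candidate_ks:
        errors = 0
        for i, (x, label) in enumerate(train):
            rest = train[:i] + train[i + 1:]  # hold out the example itself
            if knn_classify(rest, x, k, dist) != label:
                errors += 1
        err = errors / len(train)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```
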
Nearest Neighbour Complexity

- Expensive for high-dimensional data (d > 20 or so).
- O(Nd) complexity for both storage and query time, where:
  - N is the number of training examples,
  - d is the dimension of each sample.

Advantages/Disadvantages

Advantages:
- Training is very fast.
- Can learn complex target functions.
- Doesn't lose information (all training data is retained).

Disadvantages:
- Slow at query time.
- Easily fooled by irrelevant attributes.

Nearest Neighbour Issues

- Expensive:
  - To determine the nearest neighbour of a query point q, we must compute the distance to all N training examples.
  - Pre-sort training examples into fast data structures (kd-trees).
  - Remove redundant data (condensing).
- Storage requirements:
  - Must store all training data D_tr.
  - Remove redundant data (condensing).
  - Pre-sorting often increases the storage requirements.
- High-dimensional data:
  - Curse of dimensionality: the required amount of training data increases exponentially with dimension.
  - Computational cost also increases dramatically.

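As one illustration of the kd-tree point, a sketch assuming SciPy is available; the array shapes and names are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))   # 1000 training points in 3-D
tree = cKDTree(X_train)           # pre-sort into a kd-tree once

# Query the 5 nearest neighbours of a random point
dists, idx = tree.query(rng.random(3), k=5)
```
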
Condensing

- The aim is to reduce the number of training samples.
- For example, retain only the samples that are needed to define the decision boundary.

*Note: Not part of this lecture

Applications of KNN

- Predicting an unknown movie rating for a movie fan.

KNN in Collaborative Filtering (CF)

          Item1  Item2  Item3  Item4  Item5  Item6
User1     4      2      -      1      -      5
User2     5      5      5      -      -      1
User3     -      4      4      4      -      1
User4     -      3      3      -      5      -

KNN in CF

User1's rating for Item2 is unknown (marked x); we want to predict it from the other users:

          Item1  Item2  Item3  Item4  Item5  Item6
User1     4      x      -      1      -      5
User2     5      5      5      -      -      1
User3     -      4      4      4      -      1
User4     -      3      3      -      5      -

KNN in CF

Similarities between User1 and the other users (each lies in the range [-1, +1]):

- sim(u1, u2) = 0.5
- sim(u1, u3) = -0.1
- sim(u1, u4) = 0.3

Sum(|sim|) = 0.5 + 0.1 + 0.3 = 0.9

KNN in CF

The prediction for User1's rating of Item2 is a similarity-weighted average of the other users' ratings for that item:

Prediction = ( sim(u1,u2) * rating_User2
             + sim(u1,u3) * rating_User3
             + sim(u1,u4) * rating_User4 ) / Sum(|sim|)

           = (0.5 * 5 + (-0.1) * 4 + 0.3 * 3) / 0.9
           = (2.5 - 0.4 + 0.9) / 0.9
           = 3.0 / 0.9
           ≈ 3.33

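A small Python sketch of this weighted prediction; the similarity values and ratings are the ones from the example, and the structure is illustrative:

```python
# Ratings of the target item (Item2) by User1's neighbours,
# paired with their similarity to User1.
neighbours = [
    (0.5, 5),   # (sim(u1, u2), User2's rating of Item2)
    (-0.1, 4),  # (sim(u1, u3), User3's rating of Item2)
    (0.3, 3),   # (sim(u1, u4), User4's rating of Item2)
]

# Similarity-weighted average, normalized by the sum of |sim|
numerator = sum(sim * rating for sim, rating in neighbours)
denominator = sum(abs(sim) for sim, _ in neighbours)
prediction = numerator / denominator
print(round(prediction, 2))  # -> 3.33
```
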
How does KNN work here?

- Find similar users.
- Similarity measures:
  - Vector (cosine) similarity
  - Pearson correlation
- Select the set of K most similar users.
- Use their votes (ratings) for the prediction.

Pearson Correlation
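
In user-based CF, the Pearson correlation between users u and v over the set I of items both have rated is:

sim(u, v) = Σ_{i ∈ I} (r_{u,i} − r̄_u)(r_{v,i} − r̄_v)
            / ( sqrt( Σ_{i ∈ I} (r_{u,i} − r̄_u)² ) · sqrt( Σ_{i ∈ I} (r_{v,i} − r̄_v)² ) )

A sketch in Python; the dict-based rating maps are an assumption:

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items.
    ratings_u, ratings_v: dicts mapping item -> rating."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den = (math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
           * math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common)))
    return num / den if den else 0.0
```
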

Questions?
