
Part A 3. KNN Classification

The document discusses k-nearest neighbors (kNN) classification. kNN is a lazy learning algorithm that does not build a model from the training data. Instead, it compares a test instance to all training examples to find the k closest matches. It then assigns the test instance the most common class among those k neighbors. The document explains that kNN performance depends on selecting an appropriate value for k and a suitable distance metric. It notes some advantages of kNN, such as simplicity, but also disadvantages like slow classification speed when training data is large.


kNN (k-Nearest Neighbor)

k-Nearest Neighbors
Given a query item, find the k closest matches in a labeled dataset and return the most frequent label among them.
Example: with k = 3, all 3 nearest neighbors vote for "cat".
With a larger k: 2 votes for cat and 1 each for buffalo, deer, and lion; cat still wins.
K-Nearest Neighbor Learning
Basics

Learning methods based on decision trees, rule sets, posterior probabilities, hyperplanes, etc. are called eager learning methods because they learn a model from the training data.

k-nearest neighbor (kNN) is a lazy learning method in the sense that no model is
learned from the training data.

Learning only occurs when a test example needs to be classified.


K-Nearest Neighbor Learning
Working

• Let D be the training data set.


• Nothing is done on the training examples.
• When a test instance d is presented, the algorithm compares d with
every training example in D to compute the similarity or distance between them.
• The k most similar (closest) examples in D are then selected.
• This set of examples is called the k nearest neighbors of d.
• d then takes the most frequent class among the k nearest neighbors.
K-Nearest Neighbor Learning
k = 1 is usually not sufficient for determining the class of d due to noise and outliers
in the data.
A set of nearest neighbors is needed to accurately decide the class.

Algorithm kNN(D, d, k)
1. Compute the distance between d and every example in D.
2. Choose the k examples in D that are nearest to d; denote this set by P.
3. Assign d the class that is the most frequent (majority) class in P.
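
A minimal from-scratch sketch of this procedure, assuming numeric feature vectors and Euclidean distance (the function and variable names are illustrative, not from the slides):

import numpy as np
from collections import Counter

def knn_classify(D_X, D_y, d, k):
    # 1. Compute the distance between d and every example in D
    dists = np.linalg.norm(D_X - d, axis=1)
    # 2. Choose the k examples in D that are nearest to d (the set P)
    P = np.argsort(dists)[:k]
    # 3. Assign d the most frequent class in P
    return Counter(D_y[P]).most_common(1)[0][0]

# toy usage with the example data that appears later in these slides
X = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])
y = np.array(["BAD", "BAD", "GOOD", "GOOD"])
print(knn_classify(X, y, np.array([3, 7]), k=3))   # -> GOOD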
K-Nearest Neighbor Learning
The key component of a kNN algorithm is the distance/similarity function, which is
chosen based on applications and the nature of the data.

The k value that gives the best accuracy on the validation set is usually selected.

For relational data, the Euclidean distance is commonly used. For text documents,
cosine similarity is a popular choice.
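
A small sketch of that selection step, using scikit-learn's cross_val_score on the iris data that appears later in these slides (the 5-fold setup and the 1-25 search range are assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
best_k, best_score = 1, 0.0
for k in range(1, 26):
    # mean 5-fold cross-validated accuracy for this value of k
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            iris.data, iris.target, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)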
Distance
Minkowski Distance
If X = R^d, the Minkowski distance of order p > 0 between two points x, y ∈ R^d is defined as:
d_p(x, y) = ( Σ_{i=1..d} |x_i − y_i|^p )^(1/p)
(p = 2 gives the Euclidean distance and p = 1 the Manhattan distance.)
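
A quick numeric check of this formula on two of the example points used later (P5 = (3, 7) and P2 = (7, 4)); scipy's minkowski is only used for comparison:

import numpy as np
from scipy.spatial.distance import minkowski

x, y = np.array([3.0, 7.0]), np.array([7.0, 4.0])
for p in (1, 2, 3):
    d_p = np.sum(np.abs(x - y) ** p) ** (1.0 / p)   # the Minkowski distance of order p
    print(p, d_p, minkowski(x, y, p))               # p=1: 7, p=2: 5, p=3: ~4.5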
K-NN metrics
Euclidean Distance: Simplest, fast to compute.

Cosine Distance: Good for documents, images, etc.

Jaccard Distance: For set data; one minus the size of the intersection divided by the size of the union.

Hamming Distance: For string data; the number of positions at which the two strings differ.


K-NN metrics
Manhattan Distance: Coordinate-wise (city-block) distance.

Edit Distance: For strings, especially genetic data.

Mahalanobis Distance: Normalized by the sample covariance matrix; unaffected by linear transformations of the coordinates.
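
Most of the metrics listed above are available in scipy; a short sketch (the sample vectors and strings are made up for illustration, and Mahalanobis additionally needs the inverse covariance matrix of the data):

import numpy as np
from scipy.spatial import distance

u, v = np.array([1.0, 0.0, 2.0]), np.array([2.0, 1.0, 0.0])
print(distance.euclidean(u, v))        # Euclidean distance
print(distance.cityblock(u, v))        # Manhattan (city-block) distance
print(distance.cosine(u, v))           # cosine distance = 1 - cosine similarity

a = np.array([1, 1, 0, 1], dtype=bool) # set data encoded as membership indicators
b = np.array([1, 0, 0, 1], dtype=bool)
print(distance.jaccard(a, b))          # Jaccard distance

s, t = list("karolin"), list("kathrin")
print(distance.hamming(s, t))          # fraction of positions where the strings differ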
K-Nearest Neighbor Learning
Key Points
• Despite its simplicity, the classification accuracy of kNN can be quite strong.
• kNN performs as well as SVM on some text classification tasks.
• kNN is also very flexible: it can work with arbitrarily shaped decision boundaries.
• kNN is, however, slow at classification time. Because there is no model, each test instance must be compared with every training example, which can be quite time-consuming when the training set D and the test set are large.
• kNN does not handle a large number of features well.
• Another disadvantage is that kNN does not produce an understandable model, so it is not applicable when an interpretable model is required.
k-NN issues
The Data is the Model
• No training needed.
• Accuracy generally improves with more data.
• Matching is simple and fast (and single pass).
• Usually need data in memory, but can be run off disk.
Minimal Configuration:
• Only parameter is k (number of neighbors)
• Two other choices are important:
• Weighting of neighbors (e.g. inverse distance)
• Similarity metric (both choices appear in the sketch below)
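
In scikit-learn both of these choices map directly onto KNeighborsClassifier parameters; a minimal sketch on the iris data used later in the slides (the particular metric is just an example):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# weights='distance' gives closer neighbors a larger, inverse-distance vote;
# metric chooses the similarity/distance function (e.g. 'euclidean', 'manhattan')
knn = KNeighborsClassifier(n_neighbors=5, weights='distance', metric='manhattan')
knn.fit(iris.data, iris.target)
print(knn.score(iris.data, iris.target))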
Questions upon k-NN
https://www.analyticsvidhya.com/blog/2017/09/30-questions-test-k-nearest-neighbors-algorithm/
Classification Using K-Nearest Neighbor

Supervised (labeled data):
X1    X2     Class
10    100    Square
2     4      Root

Unsupervised (unlabeled data):
X1    X2
10    100
2     4
Nearest Neighbor Search

Given: a set P of n points in R^d

Goal: a data structure which, given a query point q, finds the nearest neighbor p of q in P
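
One common such structure is a k-d tree; a brief sketch using sklearn.neighbors.KDTree (the random point set is illustrative):

import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
P = rng.random((1000, 2))        # the point set P, here n = 1000 points in R^2
tree = KDTree(P)                 # build the search structure once
q = np.array([[0.5, 0.5]])       # query point q
dist, ind = tree.query(q, k=1)   # distance to and index of the nearest neighbor p
print(ind[0][0], dist[0][0])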
K-NN

(k, l)-NN: Reduce complexity by putting a threshold l on the majority; the association is restricted so that a class is assigned only when at least l of the k nearest neighbors agree (a short sketch follows after the K = 5 walkthrough below).

K = 5
K-NN

Select the 5 nearest neighbors (K = 5) by computing their Euclidean distances to the query.

K-NN

Decide by the majority class among these K = 5 nearest instances.
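
A hedged sketch of this thresholded rule, under the assumption that a label is returned only when at least l of the k nearest neighbors agree and the query is rejected otherwise (names are illustrative):

import numpy as np
from collections import Counter

def knn_with_threshold(X, y, query, k=5, l=4):
    # X: training features (NumPy array), y: training labels (NumPy array)
    dists = np.linalg.norm(X - query, axis=1)        # distance to every training example
    nearest = np.argsort(dists)[:k]                  # indices of the k nearest neighbors
    label, votes = Counter(y[nearest]).most_common(1)[0]
    # assign the majority class only if it has at least l votes; otherwise reject
    return label if votes >= l else None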
Example

Points    X1 (Acid Durability)    X2 (Strength)    Y = Classification

P1 7 7 BAD

P2 7 4 BAD

P3 3 4 GOOD

P4 1 4 GOOD
KNN Example

Points X1(Acid Durability) X2(Strength) Y(Classification)


P1 7 7 BAD
P2 7 4 BAD
P3 3 4 GOOD
P4 1 4 GOOD
P5 3 7 ?
Scatter Plot
Euclidean Distance From Each Point

KNN
Euclidean distance of P5 (3, 7) from:
P1 (7,7): sqrt((7-3)^2 + (7-7)^2) = 4
P2 (7,4): sqrt((7-3)^2 + (4-7)^2) = 5
P3 (3,4): sqrt((3-3)^2 + (4-7)^2) = 3
P4 (1,4): sqrt((1-3)^2 + (4-7)^2) = sqrt(13) ≈ 3.61

3 Nearest Neighbours

P1 (7,7): distance 4, class BAD
P2 (7,4): distance 5, class BAD
P3 (3,4): distance 3, class GOOD
P4 (1,4): distance sqrt(13) ≈ 3.61, class GOOD

The 3 nearest neighbours of P5 are P3, P4 and P1; two of the three are GOOD, so P5 is classified as GOOD.

KNN Classification

Points X1(Durability) X2(Strength) Y(Classification)


P1 7 7 BAD
P2 7 4 BAD
P3 3 4 GOOD
P4 1 4 GOOD
P5 3 7 GOOD
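
The same worked example can be reproduced with scikit-learn (a small sketch; the data is the P1 to P4 table above and the query is P5):

from sklearn.neighbors import KNeighborsClassifier

X = [[7, 7], [7, 4], [3, 4], [1, 4]]        # P1-P4: (Acid Durability, Strength)
y = ["BAD", "BAD", "GOOD", "GOOD"]
knn = KNeighborsClassifier(n_neighbors=3)   # the 3 nearest neighbours, as above
knn.fit(X, y)
print(knn.predict([[3, 7]]))                # P5 -> ['GOOD']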
Different Values of K
KNN (K=5)
# read in the iris data
from sklearn.datasets import load_iris
iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
y_pred = knn.predict(X)
# accuracy on the training data itself
print(metrics.accuracy_score(y, y_pred))
KNN (K=1)
# read in the iris data
from sklearn.datasets import load_iris
iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
y_pred = knn.predict(X)
# K=1 scores 100% on the training data: each point is its own nearest neighbor
print(metrics.accuracy_score(y, y_pred))
KNN (K=5) Train-test split
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
# print the shapes of X and y
# X is our features matrix with 150 x 4 dimension
print(X.shape)
# y is our response vector with 150 x 1 dimension
print(y.shape)
# STEP 1: split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)
# print the shapes of the new X objects
print(X_train.shape)
print(X_test.shape)
# print the shapes of the new y objects
print(y_train.shape)
print(y_test.shape)
# STEP 2: fit on the training set and evaluate on the held-out test set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
kNN classifier

from sklearn.datasets import load_iris
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target
print(X.shape)

# try K=1 through K=25 and record the training accuracy
k_range = range(1, 26)
# scores is a Python list (created with [])
scores = []
# loop over k = 1 .. 25 and append each accuracy score
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    y_pred = knn.predict(X)
    scores.append(metrics.accuracy_score(y, y_pred))
print(scores)

Output:
(150, 4)
[1.0, 0.98, 0.96, 0.96, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.98, 0.9733333333333334, 0.98, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667, 0.98, 0.9733333333333334, 0.98, 0.98, 0.98, 0.98, 0.98, 0.9733333333333334, 0.98]
Changing values of K…
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# print the shapes of X and y
# X is our features matrix with 150 x 4 dimension
print(X.shape)
# y is our response vector with 150 x 1 dimension
print(y.shape)

# STEP 1: split X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)

# try K=1 through K=25 and record the testing accuracy
k_range = range(1, 26)
# scores is a Python list (created with [])
scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))
print(scores)

Output:
[0.94999999999999996, 0.94999999999999996, 0.96666666666666667, 0.96666666666666667, 0.96666666666666667, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.98333333333333328, 0.96666666666666667, 0.98333333333333328, 0.96666666666666667, 0.96666666666666667, 0.96666666666666667, 0.96666666666666667, 0.94999999999999996, 0.94999999999999996]
import matplotlib.pyplot as plt
# allow plots to appear within the notebook
%matplotlib inline
# plot the relationship between K and testing accuracy: plt.plot(x_axis, y_axis)
plt.plot(k_range, scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Testing Accuracy')
