Week 7 Part 1: KNN (K-Nearest Neighbor) Classification

K-Nearest Neighbors (KNN) is a simple machine learning algorithm used for classification and regression. It makes predictions based on the labels of the k nearest training examples in feature space. For classification, it assigns a new example the majority class of its k nearest neighbors. For regression, it predicts the average target value of the k nearest neighbors. KNN is non-parametric, easy to implement, and interpretable, but computationally expensive for large datasets. Choosing an optimal value for k is important to avoid overfitting or underfitting.


Week 7, Part 1

K-NN
K-Nearest Neighbors (KNN) is a simple and versatile machine learning algorithm used for both classification and regression tasks. It is based on the principle of similarity: the predicted label or value of a new data point is determined by the labels or values of its k nearest neighbors in the training dataset.
How does KNN work?

Training: The KNN algorithm first requires a labelled training dataset. Each data point in the dataset has features (attributes) and a corresponding label (class or target value).

Prediction: When a new, unlabelled data point arrives, KNN searches for its k nearest neighbors in the training dataset based on a chosen distance metric (e.g., Euclidean distance).

Classification: For classification tasks, the algorithm assigns the new data point the majority class label of its k nearest neighbors.

Regression: For regression tasks, the algorithm predicts the value of the new data point as the average of the values of its k nearest neighbors, as illustrated in the sketch below.
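
As a concrete illustration of the steps above, here is a minimal from-scratch sketch in Python. The function names and toy data are illustrative, not from the slides, and numeric feature vectors are assumed:

import math
from collections import Counter

def euclidean(p, q):
    # Straight-line distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    # Rank every training point by its distance to the query point
    neighbors = sorted(zip(X_train, y_train), key=lambda pair: euclidean(pair[0], x_new))
    k_labels = [label for _, label in neighbors[:k]]
    if task == "classification":
        # Majority class among the k nearest neighbors
        return Counter(k_labels).most_common(1)[0][0]
    # Regression: average of the k nearest target values
    return sum(k_labels) / k

# Toy usage with made-up data
X_train = [[1.0, 1.1], [1.2, 0.9], [6.0, 6.2], [5.8, 6.1]]
y_train = ["A", "A", "B", "B"]
print(knn_predict(X_train, y_train, [1.1, 1.0], k=3))  # prints "A"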
Key characteristics of KNN

Non-parametric: KNN does not make any assumptions about the underlying data distribution, making it suitable for diverse datasets.

Easy to implement: The algorithm's concept and implementation are relatively straightforward, making it a popular choice for beginners.

Interpretable: KNN predictions are easily interpretable by examining the nearest neighbors used for the prediction.

Lazy learning: KNN doesn't explicitly build a model during the training phase. Instead, it stores the entire training data and performs computations only when making predictions.
KNN: Strengths & Weaknesses
Strengths:
• Simple and intuitive concept
• Versatile for both classification and regression
• Works well with small datasets
• Handles complex relationships between features
• Fairly robust to noisy data when k is large enough to average out individual outliers

Weaknesses:
• Computationally expensive for large datasets
• Sensitive to irrelevant features
• Can suffer from the "curse of dimensionality"
• Requires careful selection of the k value
KNN: Applications
❖Image recognition
❖Spam filtering
❖Customer segmentation
❖Recommendation systems
❖Anomaly detection
❖Time series forecasting
Why k-NN?
What is the k-NN algorithm?
Eager vs. Lazy Learners

In machine learning, a lazy learner is an algorithm that delays the learning process until a query is made to the system. This contrasts with eager learners, which build a model upfront during training.
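
To make the lazy/eager contrast concrete, here is a short scikit-learn sketch (assuming scikit-learn is installed; the toy arrays are made up). For KNN, fit() essentially just memorizes the training set, and the distance computations happen only at predict() time:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y = np.array([0, 0, 1, 1])

# "Training" is cheap: brute-force KNN simply stores X and y
clf = KNeighborsClassifier(n_neighbors=3, algorithm="brute")
clf.fit(X, y)

# The real work (distances, voting) happens at query time
print(clf.predict([[5, 6]]))  # -> [1]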
How do we choose the factor 'k'?

Choosing the optimal k value in the K-Nearest Neighbors (KNN) algorithm is crucial for its performance. A good k value leads to accurate predictions, while a bad choice can result in underfitting or overfitting.
To choose the value of k:

Small k: A very small k can lead to overfitting, where the KNN algorithm becomes too sensitive to noise and local variations in the training data, resulting in poor generalization to unseen data.

Large k: A very large k can lead to underfitting, where the KNN algorithm becomes insensitive to local patterns and fails to learn the underlying relationships between features and target variables.

Odd k: Choosing an odd k value helps to break voting ties in binary classification. In practice, candidate k values are often compared with cross-validation, as in the sketch after this list.
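
The sketch below picks k by cross-validation using scikit-learn. The synthetic dataset and the candidate range are illustrative assumptions, not from the slides:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, just to have something to score against
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Try odd k values only, to avoid voting ties in binary classification
scores = {}
for k in range(1, 20, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()  # mean accuracy over 5 folds

best_k = max(scores, key=scores.get)
print("best k:", best_k, "accuracy:", round(scores[best_k], 3))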
When do we use k-NN?
How does the k-NN algorithm work?
Euclidean distance to find nearest neighbors
Euclidean distance formula
Distance calculation using Euclidean distance
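
The formula itself (presumably shown on the original slide) is, for two n-dimensional points p and q:

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

i.e., the square root of the summed squared differences across all features.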
k = 3
Majority of neighbors
Use case 2: apply the nearest neighbor algorithm from node B
K-NN solution
Recap of k-NN
Use case 3: k-NN to predict diabetes
Diabetes dataset
Diabetes prediction
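
The slides' worked solution is not reproduced in this extract; the following is a hedged reconstruction of the typical workflow, assuming a Pima-style diabetes.csv with numeric feature columns and a binary Outcome label (the file name and column name are assumptions):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical file: Pima-style data with an "Outcome" column (0/1)
df = pd.read_csv("diabetes.csv")
X, y = df.drop(columns="Outcome"), df["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale features: otherwise KNN distances are dominated by large-valued columns
scaler = StandardScaler().fit(X_train)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(scaler.transform(X_train), y_train)
print(clf.score(scaler.transform(X_test), y_test))  # mean test accuracy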
K-NN classification recap
Use case 4: similarity
K-NN use case solution (contd.)
Rank these attributes
k = 1
k = 2
k = 3
End of Week 7 (Part 1)

KNN: K-Nearest Neighbor Classification
Solved numerical example on KNN classification
3 nearest values: N1, N2, N3
KNN solved example

End of Solved Use Cases of Week 7 (Part 1)
