
W3M3-KNN Regression

The document discusses k-Nearest Neighbor regression, a non-parametric machine learning algorithm. It describes how k-NN regression works, including finding the k nearest neighbors of a new data point and predicting the target value as the average of the neighbors' target values. The document also covers the advantages of k-NN regression, such as simplicity and interpretability, as well as its limitations, like sensitivity to outliers and computational complexity.

Uploaded by

2480054

k-Nearest Neighbor (k-NN) Regression
Anubha Gupta, PhD.
Professor
SBILab, Dept. of ECE,
IIIT-Delhi, India
Contact: [email protected]; Lab: http://sbilab.iiitd.edu.in
Machine Learning in Hindi

Motivation
Parametric ML Models
• Assume a specific functional form for the relationship between the features and target variable
• Estimate a fixed set of parameters that describe this functional form, typically using maximum likelihood estimation (MLE) or other optimization techniques
• Once the parameters are estimated, the model can make predictions on new data without having to reprocess the training data
• Examples: Simple Linear Regression, Logistic Regression, etc.

Non-Parametric ML Models
• Do not assume a specific functional form for the relationship between the features and target variable
• Estimate the relationship between the features and target variable directly from the training data
• The training data may be required each time a new prediction is to be made
• Examples: Decision trees, k-Nearest Neighbors, etc.
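
The contrast between the two families can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four data points, the helper names `fit_line` and `knn_predict_1d`, and the choice of least squares for the parametric fit are not taken from the slides. The parametric model keeps two numbers after training, while the non-parametric one must keep the training samples.

```python
# Illustrative data (an assumption for this sketch): four (x, y) pairs.
xs = [2.0, 5.0, 8.0, 11.0]
ys = [4.0, 8.0, 12.0, 16.0]


def fit_line(xs, ys):
    """Parametric: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def knn_predict_1d(xs, ys, x_new, k):
    """Non-parametric: needs the stored training data at prediction time."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


# After fitting, the line needs only its two parameters to predict.
slope, intercept = fit_line(xs, ys)
line_prediction = slope * 7.0 + intercept

# k-NN has no fitted parameters; it consults xs and ys for every prediction.
knn_prediction = knn_predict_1d(xs, ys, 7.0, k=3)

print(line_prediction, knn_prediction)
```

The two predictions need not agree: the line imposes a global functional form, whereas k-NN only looks at the samples closest to the query.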

Motivation
Linear Regression is a parametric ML model, while k-NN is a non-parametric model.

Non-parametric ML models
• are more flexible than parametric models
• can handle complex relationships between the features and target variable,
making them particularly useful for non-linear datasets (distance-based methods
such as k-NN can, however, struggle when the number of features is large).
• k-nearest neighbors (k-NN) is a popular non-parametric algorithm that can be
used for both regression and classification

k-Nearest Neighbor (k-NN) Regression

Learning Objectives
• k-Nearest Neighbor (k-NN) Regression
o Description
o Advantages
o Limitations

Description
A supervised non-parametric regression algorithm used for predicting continuous values

o Assumption: Similar samples have similar target values

o Lazy learning algorithm: It does not learn a model explicitly, so it defers the
learning process until a new data sample needs to be predicted. Instead, it stores
the training samples and uses them to make predictions on a new data sample.

o The value of k specifies the number of nearest neighbors that are considered while
making a prediction.

o Application: Widely used in various prediction tasks, such as predicting stock
prices, housing prices, and customer churn

Steps:
1. Calculate the distance between the new data sample to be predicted and every
sample of the training dataset

2. Select the k samples from the training dataset that are nearest to the new data
sample based on the calculated distances

3. Calculate the predicted value of the new data sample as the average of the target
values of these k nearest training samples
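
The three steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the slides do not name a distance metric, so Euclidean distance (`math.dist`, Python 3.8+) is an assumption, and the function name `knn_regress` and the two-feature training set are hypothetical.

```python
import math


def knn_regress(train_X, train_y, query, k):
    """Predict a continuous target as the mean of the k nearest targets."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Step 1: distance from the query to every training sample
    # (Euclidean distance is an assumption of this sketch).
    distances = [math.dist(x, query) for x in train_X]
    # Step 2: indices of the k training samples with the smallest distances.
    nearest = sorted(range(len(train_X)), key=lambda i: distances[i])[:k]
    # Step 3: average the target values of those k neighbors.
    return sum(train_y[i] for i in nearest) / k


# Hypothetical two-feature training set, for illustration only.
train_X = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_y = [10.0, 20.0, 30.0, 100.0]

prediction = knn_regress(train_X, train_y, query=(2.0, 2.5), k=2)
print(prediction)
```

Note that there is no training function: all of the work happens at prediction time, which is what makes k-NN a lazy learner.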

Advantages
1. Simplicity: Does not require any assumptions about the underlying
distribution of the data or any complex mathematical calculations

2. Versatility: Can work with both continuous target variables (regression) and
categorical target variables (classification)

3. Interpretability: It is an interpretable algorithm that can provide insight into
the relationship between the features and target variable. It is easy to understand
how the algorithm arrived at a prediction

4. Flexibility: Can be used with various distance metrics and weighting schemes
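
As an illustration of the flexibility point, the sketch below swaps the distance metric and replaces the plain average with an inverse-distance weighted average. The slides do not specify which metrics or weighting schemes are meant, so the Euclidean and Manhattan metrics, the 1/distance weights, and the function names here are all assumptions chosen for illustration.

```python
def euclidean(a, b):
    """Straight-line distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature tuples."""
    return sum(abs(p - q) for p, q in zip(a, b))


def weighted_knn_regress(train_X, train_y, query, k, metric=euclidean):
    """k-NN regression with inverse-distance weights (closer neighbors count more)."""
    # Keep the k (distance, target) pairs with the smallest distances.
    scored = sorted((metric(x, query), y) for x, y in zip(train_X, train_y))[:k]
    # An exact match has zero distance; return its target to avoid dividing by zero.
    for d, y in scored:
        if d == 0:
            return y
    weights = [1.0 / d for d, _ in scored]
    return sum(w * y for w, (_, y) in zip(weights, scored)) / sum(weights)


# Hypothetical one-feature training set, for illustration only.
train_X = [(2.0,), (5.0,), (8.0,), (11.0,)]
train_y = [4.0, 8.0, 12.0, 16.0]

weighted = weighted_knn_regress(train_X, train_y, (7.0,), k=3, metric=manhattan)
print(weighted)
```

With these weights the nearest neighbor dominates the prediction, so the result differs from the unweighted mean of the same three targets.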

Limitations
1. Selection of k: The value of k is an important hyperparameter that needs to be
chosen carefully. A small value of k can lead to overfitting, while a large value of k
can lead to underfitting.

2. Sensitivity to outliers and noise: it relies on the distance between data samples to
make predictions. Outliers and noise can distort the distance calculation and affect
the accuracy of the predictions

3. Computationally expensive and slow: It can be computationally expensive for
large datasets because each prediction requires computing the distance to every
sample of the training dataset. This can make the algorithm slow and memory-intensive,
especially for datasets with many features
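
One common way to choose k is to compare candidate values by cross-validation. The slides do not prescribe a selection procedure, so the leave-one-out scheme, the mean squared error criterion, the function names, and the small dataset below are assumptions made for this sketch.

```python
def knn_predict(xs, ys, x_new, k):
    """Mean target of the k training samples closest to x_new (one feature)."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


def loo_mse(xs, ys, k):
    """Leave-one-out mean squared error of k-NN regression for a given k."""
    total = 0.0
    for i in range(len(xs)):
        # Hold out sample i and predict it from the remaining samples.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (knn_predict(rest_x, rest_y, xs[i], k) - ys[i]) ** 2
    return total / len(xs)


# Illustrative, noise-free data (an assumption for this sketch).
xs = [2.0, 5.0, 8.0, 11.0]
ys = [4.0, 8.0, 12.0, 16.0]

scores = {k: loo_mse(xs, ys, k) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

On this tiny noise-free dataset the smallest k scores best; on noisy data a somewhat larger k may score better because averaging smooths out the noise. The loop also makes the cost visible: every held-out prediction scans all remaining training samples.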

Limitations
4. Curse of dimensionality: The distance between data samples becomes less
meaningful as the number of features increases. This can lead to overfitting
and poor performance.
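5. Class imbalance: When k-NN is used for classification, it can be biased towards
the majority class in imbalanced datasets because it tends to predict the class
having the majority of samples in the training dataset, which leads to poor
performance for the minority class. In k-NN regression, the analogous effect is that
predictions can be pulled towards target values that dominate the training data.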
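
The curse of dimensionality mentioned in point 4 can be observed with a small experiment. The sketch below is an assumption-laden illustration, not part of the slides: it draws uniformly random points, uses Euclidean distance, and reports a "relative contrast" (the gap between the farthest and nearest point, divided by the nearest distance). The function name and the chosen sizes are hypothetical.

```python
import math
import random


def relative_contrast(n_points, n_features, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_features)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_features)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Compare how distinguishable the nearest point is as features are added.
for d in (2, 20, 200):
    print(d, round(relative_contrast(200, d), 3))
```

As the number of features grows, the contrast shrinks: the nearest and farthest points end up at nearly the same distance from the query, so "nearest" carries less information.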

Let us Try!
Question: Given the following training data, use k-NN regression to predict the target value for a new
data sample with x = 7, using k = 3.
Training Data
x (feature)    y (output/target)
2              4
5              8
8              12
11             16
You may pause the video and try.

Let us Try!
Answer: Let us calculate the distance of the test sample (x_t = 7) from each sample of the
training dataset
x (feature)    y (output/target)    Distance from the test sample    Among the 3 nearest neighbors?
2              4                    5                                No
5              8                    2                                Yes
8              12                   1                                Yes
11             16                   4                                Yes

Prediction = average of the target values of the k nearest neighbors
           = (8 + 12 + 16) / 3 = 12
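
The answer can be checked with a few lines of Python. The sketch below uses the training data from the question and also shows how the prediction would change for other values of k; the variable names are illustrative.

```python
train = [(2, 4), (5, 8), (8, 12), (11, 16)]  # (x, y) pairs from the question
x_t = 7

# Sort the training samples by their distance |x - x_t| to the test sample.
by_distance = sorted(train, key=lambda pair: abs(pair[0] - x_t))

predictions = {}
for k in range(1, len(train) + 1):
    neighbors = by_distance[:k]
    # The prediction is the mean target of the k nearest samples.
    predictions[k] = sum(y for _, y in neighbors) / k
    print(k, neighbors, predictions[k])
```

For k = 3 this gives 12, matching the answer above. For k = 1, 2, and 4 it gives 12, 10, and 10 respectively, which shows that the prediction depends on the choice of k.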

Summary
In this module, we learned:
• Parametric vs. non-parametric models
• Important considerations while building an ML model
• k-Nearest Neighbor (k-NN) Regression
o Definition
o Advantages
o Limitations
• Example
