W3M3-KNN Regression
Anubha Gupta, PhD.
Professor
SBILab, Dept. of ECE,
IIIT-Delhi, India
Contact: [email protected]; Lab: http://sbilab.iiitd.edu.in
Machine Learning in Hindi
Motivation
Parametric ML Models
• Assume a specific functional form for the relationship between the features and the target variable
• Estimate a fixed set of parameters that describe this functional form, typically using maximum likelihood estimation (MLE) or other optimization techniques
• Once the parameters are estimated, the model can make predictions on new data without having to reprocess the training data
• Examples: Simple Linear Regression, Logistic Regression, etc.
Non-Parametric ML Models
• Do not assume a specific functional form for the relationship between the features and the target variable
• Estimate the relationship between the features and the target variable directly from the training data
• The training data may be required each time a new prediction is to be made
• Examples: Decision Trees, k-Nearest Neighbors, etc.
Motivation
Linear Regression is a parametric ML model, while k-NN is a non-parametric model.
Non-parametric ML models
• are more flexible than parametric models
• can handle complex relationships between the features and the target variable, making them particularly useful for high-dimensional and non-linear datasets.
k-Nearest Neighbors (k-NN) is a popular non-parametric algorithm that can be used for both regression and classification.
Learning Objectives
• k-Nearest Neighbor (k-NN) Regression
o Description
o Advantages
o Limitations
Description
A supervised non-parametric regression algorithm used for predicting continuous values.
o Lazy learning algorithm: it defers the learning process until a new data sample needs to be predicted because it does not learn a model explicitly. Instead, it stores the training samples and uses them to make predictions on a new data sample.
o The value of k denotes the number of nearest neighbors that are considered while making a prediction.
Steps (a code sketch follows this list):
1. Calculate the distance between the new data sample to be predicted and every sample of the training dataset.
2. Select the k training samples with the smallest distances (the k nearest neighbors).
3. Predict the target value of the new sample as the average of the target values of these k neighbors.
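A minimal from-scratch sketch of these steps, assuming NumPy and Euclidean distance; the function name knn_regress and the array layout are illustrative, not from the slides:

import numpy as np

def knn_regress(X_train, y_train, x_new, k=3):
    """Predict a continuous target for x_new by averaging the targets
    of its k nearest training samples (illustrative sketch)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Step 1: distance from the new sample to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)

    # Step 2: indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]

    # Step 3: prediction = mean of the neighbors' target values
    return y_train[nearest].mean()

# For the worked example later in this module (x = 7, k = 3):
print(knn_regress([[2], [5], [8], [11]], [4, 8, 12, 16], [7], k=3))  # 12.0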
Advantages
1. Simplicity: Does not require any assumptions about the underlying
distribution of the data or any complex mathematical calculations
2. Versatility: Can work with both continuous and categorical target variables
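As one illustration of this versatility, the sketch below assumes scikit-learn (not mentioned in the slides) and applies the same neighborhood idea to a continuous and to a categorical target:

from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[2], [5], [8], [11]]  # toy 1-D feature values from the example later on

# Continuous target: prediction is the mean of the k nearest targets
reg = KNeighborsRegressor(n_neighbors=3).fit(X, [4.0, 8.0, 12.0, 16.0])
print(reg.predict([[7]]))  # -> [12.]

# Categorical target: prediction is the majority class among the k nearest
clf = KNeighborsClassifier(n_neighbors=3).fit(X, ["low", "low", "high", "high"])
print(clf.predict([[7]]))  # -> ['high']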
Limitations
1. Selection of k: The value of k is an important hyperparameter that needs to be chosen carefully. A small value of k can lead to overfitting, while a large value of k can lead to underfitting (a cross-validation sketch follows this list).
2. Sensitivity to outliers and noise: k-NN relies on the distance between data samples to make predictions. Outliers and noise can distort the distance calculation and affect the accuracy of the predictions.
3. Curse of dimensionality: The distance between data samples becomes less meaningful as the number of features increases. This can lead to overfitting and poor performance.
4. Class imbalance: k-NN can be biased towards the majority class in imbalanced datasets because it tends to predict the class having the majority of samples in the training dataset. This can lead to poor performance for the minority class.
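For the selection-of-k limitation above, a common remedy is to tune k with cross-validation. The sketch below is an illustration only: it assumes scikit-learn and a synthetic toy dataset, neither of which appears in the slides; scaling the features also softens the distance-distortion issue.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic toy data standing in for a real regression dataset
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=100)

# Scale the features, then search over candidate values of k with 5-fold CV
model = make_pipeline(StandardScaler(), KNeighborsRegressor())
search = GridSearchCV(
    model,
    param_grid={"kneighborsregressor__n_neighbors": [1, 3, 5, 7, 9, 11]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)  # the k with the lowest cross-validated MSE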
Let us Try!
Question: Given the following training data, use KNN regression to predict the target value for a new
data sample with x=7 and k=3.
Training Data
x (feature)    y (output/target)
2              4
5              8
8              12
11             16
You may pause the video and try.
Let us Try!
Answer: Let us calculate the distance of the test sample (𝑥𝑡 = 7) from each sample of the training dataset.
x (feature)    y (output/target)    Distance from the test sample
2              4                    5
5              8                    2
8              12                   1
11             16                   4
The 3 nearest neighbors of the test sample are the training samples at x = 8, 5, and 11 (distances 1, 2, and 4).
Prediction = (8 + 12 + 16) / 3 = 12
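The same answer can be verified numerically; a small sketch assuming NumPy, with illustrative variable names:

import numpy as np

x_train = np.array([2, 5, 8, 11])
y_train = np.array([4, 8, 12, 16])
x_test, k = 7, 3

distances = np.abs(x_train - x_test)   # [5, 2, 1, 4]
nearest = np.argsort(distances)[:k]    # samples at x = 8, 5, 11
print(y_train[nearest].mean())         # (12 + 8 + 16) / 3 = 12.0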
Summary
In this module, we learned:
• Parametric vs. non-parametric models
• Important considerations while building an ML model
• k-Nearest Neighbor (k-NN) Regression
o Definition
o Advantages
o Limitations
• Example