Unit 2h
Machine Learning
K Nearest Neighbor
Classification
• Nearest Neighbor-based models are a class of machine learning algorithms that classify data by comparing it with stored training examples.
• These models are widely used in classification, regression, and recommendation tasks.
• The core idea is simple: for a given test point, the model finds the most similar data points from
the training set and uses their labels or values to make a prediction.
Nearest Neighbor Classifiers
• Basic idea:
– If it walks like a duck, quacks like a duck, then it’s probably a duck
– To classify a test record, compute its distance to every training record; the closest class is identified using distance measures such as Euclidean distance.
Introduction to Proximity Measures
Distance Metrics
● Euclidean distance
● Manhattan distance
Non-Metric Similarity Functions
● Variables measured on different scales are brought to a common scale through standardization:
z = (x − x̄) / s
Where,
x̄: mean of the variable
s: standard deviation
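The standardization step above can be sketched in Python; the sample values and the `standardize` helper name are illustrative, and the population (divide-by-n) standard deviation is assumed:

```python
# Standardize a variable: z = (x - mean) / standard deviation.
def standardize(values):
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation; some texts use the sample (n - 1) form.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

# Illustrative data: standardized scores always have mean 0 and std 1.
scores = standardize([2.0, 4.0, 6.0, 8.0])
print(scores)
```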
KNN methodology
● Compute the Euclidean distance between the new data point and each training instance.

New data point:
Humidity Temperature Rainfall
84       34          ?

● For example, consider the first observation with Humidity = 58 and Temperature = 19; its Euclidean distance to the new point is
[(58 − 84)² + (19 − 34)²]½ = √901 ≈ 30.017
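The distance computation above can be checked with a short sketch (the `euclidean` helper is illustrative):

```python
# Euclidean distance between the new record (Humidity=84, Temperature=34)
# and the first training observation (Humidity=58, Temperature=19).
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d = euclidean((84, 34), (58, 19))
print(round(d, 3))  # 30.017
```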
KNN Example
● Computed the Euclidean distance between the new data point and each instance, and sorted the data in ascending order:

Observation  Euclidean Distance (sorted)  Class label (Rainfall)
5            18.25                        1
12           18.97                        1
Correlation matrix R (CC denotes the correlation coefficient):

    x          y          z
x   CC(x,x)    CC(x,y)    CC(x,z)
y   CC(y,x)    CC(y,y)    CC(y,z)
z   CC(z,x)    CC(z,y)    CC(z,z)

    x          y          z
x   1.00       0.668657   −0.10131
y   0.668657   1.00       −0.28793
z   −0.10131   −0.28793   1.00
Principal Component Analysis
► Example:

CC(x,y) = [N Σxy − (Σx)(Σy)] / √{[N Σx² − (Σx)²] [N Σy² − (Σy)²]}

Where, N is number of data points.

x  y  xy  x²  y²
7  4  28  49  16
4  1   4  16   1
6  3  18  36   9
8  6  48  64  36
8  5  40  64  25
7  2  14  49   4
5  3  15  25   9
9  5  45  81  25
7  4  28  49  16
8  2  16  64   4

Similarly, CC of the other combinations can be seen in matrix R.
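As a sketch, CC(x,y) can be computed from the tabulated columns; the result matches the 0.668657 entry of matrix R:

```python
# Pearson correlation coefficient from the running sums of the table:
# CC = (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))
import math

x = [7, 4, 6, 8, 8, 7, 5, 9, 7, 8]
y = [4, 1, 3, 6, 5, 2, 3, 5, 4, 2]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

cc = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(cc, 6))  # 0.668657
```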
Principal Component Analysis
Finally, we can compute the factor scores from ZB, where Z is X converted
to standard score form. These columns are the principal factors.
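The factor-score computation can be sketched with numpy, assuming the two-variable (x, y) data from the example table; this standardizes X to Z, takes the eigenvectors B of the correlation matrix, and forms ZB:

```python
# Sketch of PCA factor scores: standardize X to Z, take the eigenvectors B of
# the correlation matrix R, and form the factor scores ZB.
# The data is the (x, y) table from the example; variable names are illustrative.
import numpy as np

X = np.array([[7, 4], [4, 1], [6, 3], [8, 6], [8, 5],
              [7, 2], [5, 3], [9, 5], [7, 4], [8, 2]], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standard scores
R = np.corrcoef(Z, rowvar=False)           # correlation matrix
eigvals, B = np.linalg.eigh(R)             # eigenvectors as columns of B
# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigvals)[::-1]
eigvals, B = eigvals[order], B[:, order]
scores = Z @ B                             # columns are the principal factors
print(eigvals)
```

For a 2×2 correlation matrix with off-diagonal r, the eigenvalues are 1 + r and 1 − r, so here they are roughly 1.6687 and 0.3313.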
QUESTION BANK
S.No  Question  Marks  BL  CO
1 a) Explain the concept of proximity measures and their importance in machine learning. [7M] 2 2
b) Differentiate between metric and non-metric similarity functions with suitable examples. [7M] 2 2
2 Solve a problem to calculate the Euclidean and Manhattan distances for given data points: [14M] 3 2
Example: Data points: A = [3, 4], B = [6, 8], calculate both distances.
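One way to work this problem (a sketch, not the prescribed solution):

```python
# Euclidean and Manhattan distances between A = [3, 4] and B = [6, 8].
import math

A, B = [3, 4], [6, 8]
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(A, B)))
manhattan = sum(abs(a - b) for a, b in zip(A, B))
print(euclidean, manhattan)  # 5.0 7
```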
3 a) Describe different distance measures such as Euclidean, Manhattan, and Minkowski with examples. [7M] 2 2
b) Problem: For two binary patterns A = [1, 0, 1, 1, 0] and B = [1, 1, 1, 0, 0], calculate the Jaccard similarity. [7M] 3 2
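A sketch for the Jaccard part, using the common binary-pattern definition |A ∩ B| / |A ∪ B| over the 1-positions:

```python
# Jaccard similarity for binary patterns: matches where both are 1,
# divided by positions where at least one is 1.
A = [1, 0, 1, 1, 0]
B = [1, 1, 1, 0, 0]

both = sum(1 for a, b in zip(A, B) if a == 1 and b == 1)
either = sum(1 for a, b in zip(A, B) if a == 1 or b == 1)
print(both / either)  # 0.5
```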
4 a) Explain the K-Nearest Neighbor (KNN) algorithm and describe its working with an example. [7M] 2 2
b) Problem: Given a dataset: [(2, 3, "A"), (4, 5, "B"), (6, 7, "A")], classify the test point (3, 4) for K=3. [7M] 3 2
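A sketch of the KNN classification asked for above (with K = 3 all three points vote):

```python
# KNN classification: K = 3, test point (3, 4).
import math
from collections import Counter

train = [(2, 3, "A"), (4, 5, "B"), (6, 7, "A")]
test = (3, 4)
K = 3

# Sort training points by Euclidean distance to the test point, keep K.
neighbors = sorted(train, key=lambda r: math.dist(test, r[:2]))[:K]
label = Counter(r[2] for r in neighbors).most_common(1)[0][0]
print(label)  # "A" — two of the three nearest neighbors carry label "A"
```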
5 Describe and solve a Radius Distance Nearest Neighbor (RNN) classification problem: [14M] 3 2
Use a sample dataset and classify points within a radius of 2 units.
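A sketch of radius-based classification; the question leaves the dataset open, so the points below are hypothetical:

```python
# Radius-neighbor classification: vote among all training points
# within the given radius of the test point.
# The small dataset here is hypothetical.
import math
from collections import Counter

train = [(1, 1, "A"), (2, 2, "A"), (5, 5, "B"), (6, 5, "B")]
test = (1.5, 1.5)
radius = 2.0

inside = [r[2] for r in train if math.dist(test, r[:2]) <= radius]
label = Counter(inside).most_common(1)[0][0] if inside else None
print(label)
```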
6 a) Differentiate between KNN classifier and KNN regression with examples. [7M] 2 2
b) Problem: Perform KNN regression on the given data points: [(1, 2), (2, 3), (3, 4)], predict for x=2.5 using K=2. [7M] 3 2
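A sketch for the regression part: average the y-values of the K nearest x-values.

```python
# KNN regression: K = 2, predict at x = 2.5.
data = [(1, 2), (2, 3), (3, 4)]
x_new, K = 2.5, 2

nearest = sorted(data, key=lambda p: abs(p[0] - x_new))[:K]
y_pred = sum(p[1] for p in nearest) / K
print(y_pred)  # 3.5 — mean of y at the two nearest x-values (2 and 3)
```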
7 Analyze the performance of classifiers using a confusion matrix and compute evaluation metrics: [14M] 4 2
Example: Confusion matrix: True Positives=50, False Positives=10, False Negatives=5, True Negatives=35.
8 a) Discuss the factors affecting the performance of regression algorithms in KNN. [7M] 2 2
b) Problem: Calculate the mean squared error (MSE) and mean absolute error (MAE) for a KNN regression problem: [7M] 4 2
Predicted values: [2.5, 3.7, 4.2], Actual values: [3, 4, 4.5].
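A sketch for the error metrics in the last problem:

```python
# MSE and MAE for the predicted vs actual values above.
pred = [2.5, 3.7, 4.2]
actual = [3.0, 4.0, 4.5]
n = len(pred)

mse = sum((p - a) ** 2 for p, a in zip(pred, actual)) / n
mae = sum(abs(p - a) for p, a in zip(pred, actual)) / n
print(round(mse, 4), round(mae, 4))  # 0.1433 0.3667
```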
Thank You!