Assignment 2 Solution


3. K-Nearest Neighbor Classifier
3.1 Lazy Classifier
a. When a new training example becomes available, among SVM, Naive Bayes and KNN,
which classifier(s) have to be re-trained from scratch?
SVM has to be re-trained from scratch. For KNN, we just add the new example to the
training set, and it is immediately available for prediction; nothing else has to be done.
Naive Bayes can also be updated easily: only the counts of data points change, and the
probability estimates are adjusted accordingly. For SVM, however, the new example might
change the set of support vectors entirely, so the model has to be re-trained from scratch.
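As an illustration, here is a minimal sketch of the incremental Naive Bayes update (the class name, helper names, and categorical counting scheme are assumptions for illustration, not part of the original solution):

from collections import defaultdict

class IncrementalNaiveBayes:
    # Categorical Naive Bayes kept as raw counts; probabilities are
    # derived from the counts, so adding an example is just incrementing.
    def __init__(self):
        self.n = 0
        self.class_counts = defaultdict(int)    # N(y)
        self.feature_counts = defaultdict(int)  # N(feature j = value v, class y)

    def add_example(self, x, y):
        # No re-training: only the counts touched by the new example change.
        self.n += 1
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            self.feature_counts[(j, v, y)] += 1

KNN is even simpler: "training" on a new example is a single append to the stored data.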
b. When a new test example becomes available, among SVM, Naive Bayes and KNN, which
classifier needs the most computation to infer the class label for this example, and what
is the time complexity of this inference, assuming that we have n training examples and
the number of features is significantly smaller than n?
KNN.
In KNN, we first need to calculate the distance from the new test example to each of the
n training examples. Since the number of features is negligible compared to n, it takes
O(1) time to calculate the distance from the test example to one training example, so
O(n) time is required to calculate the distances to all n training examples.
Selecting the k closest points then takes O(n log k), assuming a max-heap of size k is
used to keep track of the k smallest distances seen so far.
Selecting the label by majority vote over those k points takes O(k).
Hence the total complexity is O(n) + O(n log k) + O(k), which reduces to O(n) when k is
negligible compared to n.
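A minimal sketch of the O(n log k) selection step (heapq is Python's standard min-heap, so negated distances are stored to emulate a max-heap; the function name is illustrative):

import heapq

def k_closest(distances, k):
    # Keep a max-heap of size k over the distances seen so far. With negated
    # values, the heap root is the largest of the current k smallest distances,
    # and every push/replace costs O(log k), for O(n log k) overall.
    heap = []  # entries are (-distance, training index)
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))  # evict the current k-th closest
    return [i for _, i in heap]  # indices of the k closest training examples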
3.2 Implementation of KNN Classifier
a. Pseudocode
1. Download the 'mnist_train.csv' and 'mnist_test.csv' files from the site mentioned.
2. Load the first 6000 samples from the training set into X_train (samples) and y_train (labels).
3. Load the last 1000 samples from the test set into X_test (samples) and y_test (labels).
4. Calculate the Euclidean distance from each test sample to all the training samples and
store the distances in a matrix of dimension 1000 x 6000.
5. Predict labels for the test set from the distance matrix using the KNN classifier
algorithm with different values of k, and calculate the error for each k.
6. Plot the graph of error vs. the value of k. (An end-to-end sketch of these steps
follows below.)
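A minimal end-to-end sketch of steps 1-6 in Python, assuming NumPy, pandas and matplotlib, the two functions defined after the pseudocode below, and a CSV layout with the label in the first column (the file layout and the tested values of k are assumptions):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Steps 2-3: label assumed to be in column 0, pixel values in the rest.
train = pd.read_csv('mnist_train.csv', header=None).values
test = pd.read_csv('mnist_test.csv', header=None).values
X_train, y_train = train[:6000, 1:].astype(float), train[:6000, 0]
X_test, y_test = test[-1000:, 1:].astype(float), test[-1000:, 0]

# Step 4: 1000 x 6000 Euclidean distance matrix.
distance_matrix = calculate_distance_matrix(X_test, X_train)

# Steps 5-6: error for each k, then plot.
ks = [1, 3, 5, 7, 9, 11]
errors = [predict(distance_matrix, y_train, y_test, k) for k in ks]
plt.plot(ks, errors, marker='o')
plt.xlabel('k')
plt.ylabel('error')
plt.show()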
Function calculate_distance_matrix
For i = 0 to NumberOfTestSamples - 1
    difference = X_test[i] - X_train        (broadcast over all training samples)
    squared = difference^2                  (element-wise)
    summed = sum of squared over the feature axis j
    squareRooted = sqrt(summed)
    distance_matrix[i] = squareRooted
return distance_matrix
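A runnable version of this function as a NumPy sketch (the loop mirrors the pseudocode row by row):

import numpy as np

def calculate_distance_matrix(X_test, X_train):
    # Returns a (num_test x num_train) matrix of Euclidean distances.
    distance_matrix = np.zeros((X_test.shape[0], X_train.shape[0]))
    for i in range(X_test.shape[0]):
        difference = X_test[i] - X_train   # broadcast: (num_train, num_features)
        squared = difference ** 2
        summed = squared.sum(axis=1)       # sum over the feature axis
        distance_matrix[i] = np.sqrt(summed)
    return distance_matrix

The same matrix can also be computed without the loop from the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, at the cost of more memory.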

Function predict
For i = 0 to NumberOfTestSamples - 1
    distance_from_i = distance_matrix[i]
    sort distance_from_i, keeping track of the original training indices
    select the k closest points from distance_from_i
    obtain the classes of those k points from y_train
    y_pred[i] = majority label among those k classes
accuracy = (# of test samples with y_pred == y_test) / (# of test samples)
error = 1 - accuracy
return error
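A runnable sketch of this function (np.argsort stands in for "sort while keeping track of indices", and collections.Counter performs the majority vote; a full sort is O(n log n), slightly more than the heap-based O(n log k) bound discussed in 3.1b):

import numpy as np
from collections import Counter

def predict(distance_matrix, y_train, y_test, k):
    # Returns the test error of KNN for a given k, using a precomputed distance matrix.
    y_pred = np.empty(len(y_test), dtype=y_train.dtype)
    for i in range(distance_matrix.shape[0]):
        nearest = np.argsort(distance_matrix[i])[:k]      # k closest training indices
        labels = y_train[nearest]                         # their classes
        y_pred[i] = Counter(labels).most_common(1)[0][0]  # majority vote
    accuracy = np.mean(y_pred == y_test)
    return 1 - accuracy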
b. Curve of Error vs. Value of k
