ASSIGNMENT 2 - PROBLEMS IN KNN CLASSIFIER

1. High Dimensionality

Problem :

KNN relies on distance calculations to determine how similar or dissimilar data
points are. In high-dimensional spaces, the distance between points becomes
less meaningful because the points tend to be nearly equidistant from each
other. This phenomenon is known as the "curse of dimensionality": as the number
of dimensions increases, the volume of the space grows exponentially and data
points become sparse, so the model may struggle to find meaningful nearest
neighbors.

Solutions :

- Dimensionality Reduction : Use techniques such as (see the sketch after this
  list):

  - Principal Component Analysis (PCA) : Transforms the features into a
    lower-dimensional space while preserving as much variance as possible.

  - t-Distributed Stochastic Neighbor Embedding (t-SNE) : Particularly
    well-suited for visualizing high-dimensional data in two or three
    dimensions; best used for exploration rather than as a preprocessing step
    for the classifier itself.

- Feature Selection : Identify and retain only the most relevant features
  based on statistical tests, correlation matrices, or domain knowledge.
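
For instance, a minimal sketch (using scikit-learn, on a synthetic stand-in
dataset) that projects the data onto its top principal components before
fitting KNN:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data as a placeholder for a real dataset
X, y = make_classification(n_samples=1000, n_features=200, n_informative=20,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Reduce to 20 components, then classify in the reduced space
model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("Test accuracy (PCA + KNN):", model.score(X_test, y_test))
```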

2. Choice of K

Problem :

The parameter K (the number of nearest neighbors to consider) is crucial in
determining the performance of KNN. A small K can make the model sensitive to
noise in the data, while a large K can cause the model to overlook small,
potentially important patterns in the data.

Solutions :

- Cross-Validation : Use techniques like k-fold cross-validation to
  systematically evaluate how different values of K affect model performance.
  This involves dividing your dataset into k subsets, training the model on
  k-1 of them, and testing on the remaining subset. The process is repeated k
  times, the performance metrics are averaged, and the K with the best average
  score is chosen (see the sketch after this list).

- Error Analysis : Plot the training and validation errors for different
  values of K. Look for the K that minimizes the validation error; the
  training error will naturally rise as K grows, so focus on where validation
  performance is best.
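
For example, a minimal sketch that uses 5-fold cross-validation to search over
candidate values of K (scikit-learn; X_train and y_train are assumed to be
defined already):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Candidate neighborhood sizes; odd values avoid ties in binary problems
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11, 15, 21]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5,
                      scoring="accuracy")
search.fit(X_train, y_train)

print("Best K:", search.best_params_["n_neighbors"])
print("Cross-validated accuracy:", search.best_score_)
```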

3. Distance Metric

Problem :

The default distance metric in KNN is Euclidean distance, which might not
always be appropriate, especially if your features are on different scales or
represent different types of data (categorical vs. continuous). This can lead to
misleading neighbor calculations.

Solutions :

- Feature Scaling : Normalize (scale features to a range between 0 and 1) or
  standardize (transform features to have a mean of 0 and a standard deviation
  of 1) your features before applying KNN. This keeps any single feature from
  dominating the distance calculation simply because of its units or range.

- Alternative Distance Metrics : Experiment with other distance metrics (a
  short sketch follows this list), such as:

  - Manhattan Distance : Sums the absolute differences along each dimension;
    often useful for high-dimensional data.

  - Minkowski Distance : A generalized distance metric that includes both
    Euclidean (p = 2) and Manhattan (p = 1) distances as special cases.

  - Hamming Distance : Useful for categorical data.
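
As a rough sketch, scaling the features and switching metrics is a small
change in scikit-learn (X_train and y_train are placeholder names here):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale first so no feature dominates the distance, then use Manhattan distance
model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, metric="manhattan"),
)
model.fit(X_train, y_train)

# The default is Minkowski with p=2 (Euclidean); p=1 gives Manhattan distance
knn_minkowski = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=1)
```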

4. Imbalanced Dataset

Problem :

In imbalanced datasets, one class significantly outnumbers the other. KNN
tends to favor the majority class because the majority of a point's neighbors
will belong to that class, leading to poor performance in predicting the
minority class.

Solutions :

- Resampling Techniques :

  - Oversampling : Increase the number of instances in the minority class
    (e.g., using SMOTE - Synthetic Minority Over-sampling Technique).

  - Undersampling : Reduce the number of instances in the majority class.

- Weighted KNN : Modify the voting step so that closer neighbors, or
  minority-class neighbors, carry more weight when predicting the class,
  rather than letting a simple majority vote decide (see the sketch below).
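
As an illustration, distance-weighted voting is built into scikit-learn, and
SMOTE is available in the separate imbalanced-learn package (a sketch; X_train
and y_train are placeholder names):

```python
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# Distance-weighted voting: closer neighbors count for more in the vote
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")

# SMOTE: synthesize new minority-class samples before fitting the classifier
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_train, y_train)
knn.fit(X_resampled, y_resampled)
```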

5. Noise and Outliers

Problem :

KNN is sensitive to noise and outliers, which can skew the distance
calculations. An outlier can significantly impact the nearest neighbor
calculations and lead to incorrect predictions.

Solutions :

- Outlier Detection : Use techniques such as Z-score analysis, the IQR rule,
  or clustering algorithms (like DBSCAN) to identify and remove outliers
  before training the KNN model (a small sketch follows this list).

- Robust Distance Metrics : Consider using distance metrics that are less
sensitive to outliers, such as the Mahalanobis distance.
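
For example, a minimal sketch of IQR-based filtering with NumPy (the 1.5
multiplier is the conventional rule of thumb; X_train and y_train are
placeholder arrays):

```python
import numpy as np

def remove_iqr_outliers(X, y, k=1.5):
    """Drop rows where any feature falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = np.all((X >= lower) & (X <= upper), axis=1)
    return X[mask], y[mask]

X_clean, y_clean = remove_iqr_outliers(X_train, y_train)
```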

6. Computational Complexity

Problem :

A brute-force KNN query computes the distance to every training instance, so
each prediction costs roughly O(n · d), where n is the number of training
instances and d is the number of features. As the dataset grows, the
prediction time can become prohibitive, especially for real-time applications.

Solutions :

- KD-Tree : This data structure allows for faster nearest neighbor searches in
  lower dimensions by partitioning the space (see the sketch after this list).

- Ball Tree : Similar to KD-Tree but can be more efficient in higher
  dimensions.

- Approximate Nearest Neighbors : Algorithms like Annoy, FLANN, or FAISS can
  speed up the neighbor search by trading off some accuracy for speed.
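
In scikit-learn, the tree-based search structures are selected via the
algorithm parameter (a minimal sketch; X_train and y_train are assumed to
exist):

```python
from sklearn.neighbors import KNeighborsClassifier

# 'kd_tree' and 'ball_tree' trade index build time for faster queries;
# 'auto' (the default) lets scikit-learn choose based on the data
knn_kd = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
knn_ball = KNeighborsClassifier(n_neighbors=5, algorithm="ball_tree")

knn_kd.fit(X_train, y_train)
```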

7. Overfitting

Problem :

If the training set is small or not representative of the overall distribution, KNN
can overfit to the training data, especially with a small K.

Solutions :

- Increase Training Data : Collect more data or use data augmentation
  techniques if applicable.

- Use Cross-Validation : This can help assess the model's ability to
  generalize to unseen data (a short sketch follows).
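
For example, a quick generalization check with 5-fold cross-validation
(scikit-learn; X and y are placeholder arrays):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Average accuracy across folds is a better estimate than training accuracy
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```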

8. Feature Scaling

Problem :

If your features are not on the same scale, KNN may give undue weight to
features with larger ranges, leading to biased distance calculations.

Solutions :

- Standardization : Scale features so they have a mean of 0 and a standard
  deviation of 1.

- Normalization : Scale features to a specific range, typically [0, 1]. (Both
  are shown in the sketch below.)
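
A minimal sketch of both options with scikit-learn preprocessing (X_train and
X_test are placeholders; note that the scaler is fit on the training data only
and then applied to the test data):

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Standardization: mean 0, standard deviation 1
std_scaler = StandardScaler().fit(X_train)
X_train_std = std_scaler.transform(X_train)
X_test_std = std_scaler.transform(X_test)

# Normalization: rescale each feature to the range [0, 1]
minmax_scaler = MinMaxScaler().fit(X_train)
X_train_norm = minmax_scaler.transform(X_train)
X_test_norm = minmax_scaler.transform(X_test)
```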

Debugging Steps
1. Check Data Quality : Make sure your data is clean and free of errors (missing
values, incorrect labels, etc.). Use data exploration techniques to understand
your dataset better.

2. Visualize Data : Use plots (like scatter plots for 2D data or pair plots) to
identify clusters, outliers, or patterns in the data.

3. Evaluate Performance : After training the model, assess its performance
using metrics like (a quick way to compute these is sketched after this list):

- Accuracy : The percentage of correct predictions.

- Precision : The ratio of true positive predictions to the total predicted
positives.

- Recall : The ratio of true positive predictions to the total actual
positives.

- F1-Score : The harmonic mean of precision and recall, which gives a balanced
view.

4. Iterate : Based on the evaluations, make necessary adjustments to feature
selection, the distance metric, and hyperparameters. Keep testing and
validating your approach.

By carefully addressing these issues and iterating on your approach, you can
significantly improve the performance of your KNN classifier.
