AI LAB Assignment 09

Name: Aditi Kandari

PRN: 21070122209
Batch: TYCS AI4
Assignment 9
Aim: Apply the K-Nearest Neighbours (KNN) classifier algorithm to a sample case
study and dataset, and evaluate the results.
Objective: The objective of the provided code is to build a classification model
using the K-Nearest Neighbors (KNN) algorithm to predict the class of
shipping e-commerce orders based on various features.
Algorithm:
1. Data Loading: Load the shipping e-commerce dataset from a CSV file.
2. Data Preprocessing: Encode categorical variables such as
'Warehouse_block', 'Mode_of_Shipment', 'Product_importance', and
'Gender' into numerical form using label encoding.
3. Data Splitting: Split the dataset into features (X) and the target
variable (y), which holds the class labels, then partition it into
training and test sets.
4. Feature Scaling: Standardize the features using StandardScaler.
5. Model Training: Train a KNN classifier on the training data so it
learns the relationship between the features and their class labels.
6. Model Evaluation: Evaluate the trained model on the test set by
predicting the class labels and computing metrics such as accuracy and
the classification report.
7. Output: Print the accuracy and classification report to assess the
model's performance. (A code sketch of these steps follows the list.)
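Below is a minimal sketch of the pipeline described above, assuming a
scikit-learn workflow. The file name shipping_ecommerce.csv, the target
column name 'Class', and the 80/20 split are illustrative assumptions;
the categorical column names are the ones listed in the steps.

# Minimal sketch of the KNN pipeline (assumed file/column names noted below)
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1. Data loading (file name is an assumption)
df = pd.read_csv("shipping_ecommerce.csv")

# 2. Label-encode the categorical columns named in the text
for col in ["Warehouse_block", "Mode_of_Shipment", "Product_importance", "Gender"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# 3. Split into features (X) and target (y), then train/test sets
X = df.drop(columns=["Class"])  # 'Class' is an assumed target column name
y = df["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 4. Standardize features (fit on training data only to avoid leakage)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 5. Train the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# 6.-7. Evaluate on the test set and print metrics
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))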

Input:
Output:

Conclusion: In this experiment, we applied the K-NN classifier to the provided
dataset and evaluated its performance and results.
Post-lab Questions:
1. How did the choice of the number of neighbors (K) impact the
performance of the K-Nearest Neighbors (KNN) classifier in terms of
accuracy and precision?
The choice of the number of neighbors (K) in K-Nearest Neighbors (KNN)
impacts accuracy and precision as follows:
• A smaller K (e.g., K=1) fits the training data very closely and can
overfit, reducing generalization to test data. Because each prediction
depends on only the nearest points, precision may look high locally but
is sensitive to noise.
• A larger K (e.g., K=10) reduces overfitting but may oversmooth the
decision boundary, lowering accuracy. Precision can also drop because
more distant, less relevant points influence each prediction.
Choosing an appropriate K therefore means balancing model complexity
against generalization ability to optimize accuracy and precision, as the
sketch below illustrates.
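The following sketch, which assumes the scaled X_train/X_test split from
the pipeline sketch above, sweeps several K values and reports accuracy
and weighted precision for each:

# Sketch: effect of K on accuracy and precision (reuses split from above)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score

for k in [1, 3, 5, 10, 25]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    # 'weighted' averages per-class precision by class frequency
    prec = precision_score(y_test, y_pred, average="weighted", zero_division=0)
    print(f"K={k:>2}  accuracy={acc:.3f}  precision={prec:.3f}")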
2. What insights did the visualization techniques, such as confusion
matrices or ROC curves, provide about the strengths and weaknesses of
the KNN algorithm in the context of the applied case study?
Visualization techniques like confusion matrices and ROC curves provide
valuable insight into the performance of the K-Nearest Neighbors (KNN)
algorithm.
A confusion matrix tabulates predicted versus actual class counts, from
which per-class precision, recall, and F1 score can be derived; it
highlights which classes are confused with one another and exposes class
imbalance.
An ROC curve illustrates the trade-off between the true positive rate and
the false positive rate, helping to assess the algorithm's discrimination
ability and to select an operating threshold.
Together, these visualizations enable a thorough evaluation of the KNN
algorithm's strengths and weaknesses, aiding model refinement and
optimization for the specific case study. A sketch producing both plots
follows.
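A minimal sketch producing both plots, assuming the fitted knn model and
test split from the earlier pipeline sketch. The ROC curve additionally
assumes a binary target, since RocCurveDisplay handles the two-class case
directly; a multiclass target would need one-vs-rest curves per class.

# Sketch: confusion matrix and ROC curve for the fitted KNN model
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

# Confusion matrix: predicted vs. actual class counts
ConfusionMatrixDisplay.from_estimator(knn, X_test, y_test)
plt.title("KNN confusion matrix")
plt.show()

# ROC curve: true positive rate vs. false positive rate (binary targets)
RocCurveDisplay.from_estimator(knn, X_test, y_test)
plt.title("KNN ROC curve")
plt.show()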
3. How could additional preprocessing techniques or alternative
distance metrics be explored to further optimize the performance of
the KNN classifier on the given dataset?
Additional preprocessing techniques and alternative distance metrics that
could be explored include the following (see the sketch after this list):
1. Feature Selection: Apply feature selection techniques such as
Recursive Feature Elimination (RFE) or feature-importance ranking
to identify and retain only the most relevant features for training the
KNN classifier. This can reduce noise and improve model
performance.
2. Handling Imbalanced Data: Implement techniques like oversampling
(e.g., SMOTE), undersampling, or algorithms designed for
imbalanced data (e.g., distance-weighted KNN) to address class imbalance
in the dataset. Balancing the class distribution prevents the model
from being biased towards the majority class and improves its ability to
classify minority classes.
3. Dealing with Outliers: Explore methods to detect and handle outliers
in the dataset, such as removing them, transforming them, or using
robust distance metrics like the Mahalanobis distance. Outliers can
adversely affect the performance of the KNN algorithm by skewing
the distance calculations and misclassifying instances.
4. Alternative Distance Metrics: Investigate alternatives to the
default Euclidean distance used in KNN. For example, Manhattan
distance (the L1 norm) may be more suitable for high-dimensional data
or when features differ in scale. The Minkowski distance generalizes
both: adjusting its p parameter (p=1 gives Manhattan, p=2 gives
Euclidean) provides flexibility in how data similarity is measured.
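As a sketch of points 2 and 4, the snippet below (reusing the scaled
split from the pipeline sketch) compares alternative distance metrics and
a distance-weighted KNN variant; the commented SMOTE lines assume the
separate imbalanced-learn package is installed.

# Sketch: comparing distance metrics and a weighted KNN variant
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

variants = {
    "euclidean (default)": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "manhattan (L1)":      KNeighborsClassifier(n_neighbors=5, metric="manhattan"),
    "minkowski, p=3":      KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=3),
    "distance-weighted":   KNeighborsClassifier(n_neighbors=5, weights="distance"),
}
for name, model in variants.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:22s} accuracy={acc:.3f}")

# Optional: rebalance classes with SMOTE before fitting
# (assumes the imbalanced-learn package is available)
# from imblearn.over_sampling import SMOTE
# X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)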
