IML Assignment Report
Data preprocessing
● Separate the columns into numerical data and categorical data.
● Normalize the numerical columns.
● Use a label encoder to encode the categorical columns.
Here are the screenshots of the code for these steps.
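The three preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the report's actual code: the toy rows, the column names other than the `_encoded` suffix, and the choice of min-max scaling are assumptions, since the report does not state which normalization was used. The `label_encode` helper assigns integer codes in sorted label order, which is how a typical label encoder behaves.

```python
# Toy rows standing in for the heart disease data (values are illustrative).
rows = [
    {"age": 63, "chol": 233, "sex": "Male", "cp": "typical angina"},
    {"age": 41, "chol": 204, "sex": "Female", "cp": "atypical angina"},
    {"age": 57, "chol": 354, "sex": "Female", "cp": "asymptomatic"},
]

def split_columns(rows):
    """Separate column names into numerical and categorical by value type."""
    numerical, categorical = [], []
    for name, value in rows[0].items():
        if isinstance(value, (int, float)):
            numerical.append(name)
        else:
            categorical.append(name)
    return numerical, categorical

def min_max_normalize(values):
    """Scale values to [0, 1] (assumed scheme); constant columns map to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def label_encode(values):
    """Map each distinct category to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

numerical, categorical = split_columns(rows)
processed = [{} for _ in rows]

# Normalize every numerical column.
for name in numerical:
    scaled = min_max_normalize([r[name] for r in rows])
    for out, v in zip(processed, scaled):
        out[name] = v

# Encode every categorical column into a new "<name>_encoded" column.
for name in categorical:
    codes, _ = label_encode([r[name] for r in rows])
    for out, code in zip(processed, codes):
        out[name + "_encoded"] = code
```

With pandas and scikit-learn, the same steps would usually be written with `DataFrame.select_dtypes` and `sklearn.preprocessing.LabelEncoder`.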
3) KNN implementation code (from scratch, without using scikit-learn's built-in KNN)
1. The code defines a custom KNN classifier (KNNClassifier) as a class that extends
scikit-learn's BaseEstimator and ClassifierMixin.
2. The fit method is used to train the custom KNN model, and the predict method is used
to make predictions.
3. The euclidean_distance method calculates the Euclidean distance between two data
points.
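The structure described in the three points above might look like the following sketch. It is not the report's code: to stay dependency-free it omits the `BaseEstimator` and `ClassifierMixin` base classes, and the parameter name `k`, the default of 5, and the majority-vote rule in `predict` are assumptions.

```python
import math
from collections import Counter

class KNNClassifier:
    """Minimal k-nearest-neighbours classifier (illustrative sketch)."""

    def __init__(self, k=5):
        self.k = k  # number of neighbours that vote (assumed parameter name)

    def fit(self, X, y):
        # KNN is a lazy learner: "training" only stores the data.
        self.X_train = [list(row) for row in X]
        self.y_train = list(y)
        return self

    def euclidean_distance(self, a, b):
        # Square root of the sum of squared coordinate differences.
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def predict(self, X):
        predictions = []
        for x in X:
            # Indices of the training points, nearest first.
            order = sorted(
                range(len(self.X_train)),
                key=lambda i: self.euclidean_distance(self.X_train[i], x),
            )
            # Majority vote among the k nearest labels.
            votes = Counter(self.y_train[i] for i in order[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions

# Usage on a toy two-class problem.
model = KNNClassifier(k=3).fit([[0, 0], [0, 1], [5, 5], [6, 5]], [0, 0, 1, 1])
labels = model.predict([[0.2, 0.1], [5.5, 5.0]])
```

Sorting all distances costs O(n log n) per query, which is fine for a small dataset; larger datasets usually call for a partial sort or a spatial index.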
4) Accuracy score
• Calculate the accuracy score for both the custom and the predefined KNN models.
• Then plot a graph of the accuracies for both models.
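As a rough illustration of how accuracy can be computed for several values of k, the sketch below scores a compact nearest-neighbour predictor on a toy train/test split. The data, the helper `knn_predict`, and the values of k tried here are hypothetical; they do not reproduce the report's numbers.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training points nearest to x."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy 1-D data (hypothetical, not the heart disease dataset).
X_train = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.05], [0.3], [0.8], [1.05]]
y_test = [0, 0, 1, 1]

# Accuracy for each candidate number of neighbours.
accuracies = {}
for k in (1, 3, 5):
    y_pred = [knn_predict(X_train, y_train, x, k) for x in X_test]
    accuracies[k] = accuracy_score(y_test, y_pred)
```

For the predefined model, scikit-learn's `KNeighborsClassifier(n_neighbors=k)` together with `sklearn.metrics.accuracy_score` would give the comparison values; the resulting dictionary of accuracies can then be plotted against k.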
Custom KNN score:
Accuracies for different values of k:
k = 1: 0.7934782608695652
k = 3: 0.7934782608695652
k = 5: 0.8152173913043478
k = 7: 0.8097826086956522
k = 9: 0.8097826086956522
k = 11: 0.7934782608695652
6) Decision boundary
• The code visualizes the decision boundary of the custom KNN model for two selected features,
'cp_encoded' and 'sex_encoded', using a contour plot. This shows how the model
separates the different classes.
• The plot of the decision boundary is attached below.
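A contour plot of a decision boundary is built by predicting a label at every point of a grid spanning the two features. The sketch below computes only that grid of labels; the training points, the helper names, and k = 3 are assumptions, and the plotting call itself is left out.

```python
import math
from collections import Counter

def linspace(start, stop, steps):
    """Evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]

def decision_grid(predict, xs, ys):
    """Predicted label at every (x, y) point; rows follow ys, columns follow xs."""
    return [[predict([x, y]) for x in xs] for y in ys]

# Toy training points as [cp_encoded, sex_encoded] pairs (hypothetical values).
X_train = [[0, 0], [0, 1], [3, 0], [3, 1]]
y_train = [0, 0, 1, 1]

def predict(point, k=3):
    # Majority vote among the k nearest training points.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], point))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]

xs = linspace(0, 3, 4)  # grid along cp_encoded
ys = linspace(0, 1, 2)  # grid along sex_encoded
grid = decision_grid(predict, xs, ys)
```

Passing `xs`, `ys`, and `grid` to a filled-contour routine such as `matplotlib.pyplot.contourf` colours each region by its predicted class; a finer grid gives a smoother boundary.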
• The code successfully preprocessed the heart disease dataset, including filling missing values.
• It implemented a custom KNN classifier and evaluated its performance through 5-fold
cross-validation. The accuracy was computed for various values of k (the number of neighbors).
• The custom KNN model's accuracy was compared with that of scikit-learn's KNN model.
• The decision boundary for the custom KNN model was visualized for two selected features.
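The cross-validation mentioned above can be sketched as follows. This is an assumed, simplified version: folds are contiguous and unshuffled, `fit_predict` is a hypothetical callable standing in for any classifier, and a 1-nearest-neighbour rule is used only to keep the demonstration short.

```python
import math

def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, near-equal test folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(fit_predict, X, y, n_folds=5):
    """Mean accuracy over the folds; each fold is held out once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        y_pred = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(y_pred, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def one_nn(X_train, y_train, X_test):
    """1-nearest-neighbour predictions (stand-in classifier for the demo)."""
    return [
        y_train[min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))]
        for x in X_test
    ]

# Toy, well-separated 1-D data (hypothetical).
X = [[0.0], [1.0], [0.1], [1.1], [0.2], [0.9], [0.3], [1.2], [0.05], [1.05]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mean_accuracy = cross_val_accuracy(one_nn, X, y, n_folds=5)
```

On real data the rows are usually shuffled, or stratified by class, before the folds are formed so that each fold has a similar class balance.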