
IML Assignment Report

The document discusses preprocessing a heart disease dataset for classification using KNN models. It separates numerical and categorical data, normalizes values, and encodes categories. A custom KNN classifier is defined and fit to the data without using scikit-learn. Accuracy is calculated for different k values for both the custom and scikit-learn models and plotted. The custom model's decision boundary is visualized for two features to understand its classifications. In summary, the document implements a custom KNN classifier on preprocessed heart disease data, compares its accuracy to scikit-learn's KNN, and visualizes the custom model's decision boundary.

Link to the Colab file: Click here

Data preprocessing
● Separate the columns into numerical data and categorical data.
● Normalize the numerical columns.
● Encode the categorical columns with a label encoder.
Screenshots of the code for these steps follow.
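The preprocessing steps listed above can be sketched without any library, which makes it easier to see what the scaler and the label encoder actually do. This is only an illustration under stated assumptions: the report does not say which normalization was applied, so min-max scaling is assumed here; the rows and the "age" and "chol" column names are hypothetical placeholders; and the helper names min_max_scale and label_encode are invented for this sketch. Codes are assigned in sorted order, which mirrors the behaviour of scikit-learn's LabelEncoder.

```python
def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_encode(values):
    """Map each category to an integer code, assigned in sorted order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical toy rows, not taken from the real heart disease dataset.
rows = [
    {"age": 40, "chol": 200, "sex": "M", "cp": "typical"},
    {"age": 60, "chol": 300, "sex": "F", "cp": "atypical"},
    {"age": 50, "chol": 250, "sex": "M", "cp": "typical"},
]

# Step 1: separate numerical and categorical columns.
numerical_cols = ["age", "chol"]
categorical_cols = ["sex", "cp"]

processed = {}

# Step 2: normalize each numerical column.
for col in numerical_cols:
    processed[col] = min_max_scale([row[col] for row in rows])

# Step 3: label-encode each categorical column.
for col in categorical_cols:
    codes, _ = label_encode([row[col] for row in rows])
    processed[col + "_encoded"] = codes

print(processed)
```

The "_encoded" suffix follows the feature names 'cp_encoded' and 'sex_encoded' that the report uses later for the decision boundary.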

For categorical data


1) Data visualization

3) KNN implementation code (from scratch, without using the scikit-learn library)
1. The code defines a custom KNN classifier (KNNClassifier) as a class that extends scikit-learn's BaseEstimator and ClassifierMixin.
2. The fit method trains the custom KNN model, and the predict method makes predictions.
3. The euclidean_distance method calculates the Euclidean distance between two data points.
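Since the original code is only available as a screenshot, the following is a minimal sketch of a classifier with the three methods described above, not the author's implementation. To stay dependency-free it does not inherit from BaseEstimator or ClassifierMixin, and the default k=5, the tie-breaking behaviour, and the helper _predict_one are assumptions made for the example.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-nearest-neighbours classifier (illustrative sketch)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "training" only stores the data.
        self.X_train = [list(row) for row in X]
        self.y_train = list(y)
        return self

    @staticmethod
    def euclidean_distance(a, b):
        # Square root of the sum of squared coordinate differences.
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def _predict_one(self, x):
        # Sort training points by distance to x and keep the k closest.
        neighbours = sorted(
            zip(self.X_train, self.y_train),
            key=lambda pair: self.euclidean_distance(pair[0], x),
        )[: self.k]
        # Majority vote among the neighbours' labels.
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self._predict_one(x) for x in X]


# Hypothetical toy data: two small clusters with labels 0 and 1.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]]
y_train = [0, 0, 1, 1]

model = KNNClassifier(k=3).fit(X_train, y_train)
predictions = model.predict([[0.05, 0.1], [0.95, 0.9]])
print(predictions)
```

Choosing an odd k, as the report does, avoids tied votes in a two-class problem.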

4) Accuracy score
• Calculate the accuracy score for both the custom KNN model and scikit-learn's predefined KNN model.
• Then plot the accuracy against k for both models.
Custom KNN scores:
Accuracies for different values of k:
k = 1: 0.7934782608695652
k = 3: 0.7934782608695652
k = 5: 0.8152173913043478
k = 7: 0.8097826086956522
k = 9: 0.8097826086956522
k = 11: 0.7934782608695652

Scikit-learn KNN model scores:

Accuracies for different values of k:
k = 1: 0.7934782608695652
k = 3: 0.7554347826086957
k = 5: 0.782608695652174
k = 7: 0.782608695652174
k = 9: 0.7934782608695652
k = 11: 0.7663043478260869
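A table of accuracies like the ones above could be produced along the following lines. This is a sketch under assumptions rather than the report's code: the report mentions 5-fold cross-validation but not how the folds were formed, so a simple interleaved split (sample i goes to fold i % 5) is assumed; the data is a hypothetical, well-separated toy set, so the printed scores will not match the reported ones; and knn_predict and cross_val_accuracy are names invented for this example.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean accuracy over n_folds folds; sample i belongs to fold i % n_folds."""
    fold_scores = []
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Count how many held-out samples are classified correctly.
        correct = sum(
            knn_predict(X_train, y_train, X[i], k) == y[i] for i in test_idx
        )
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds


# Hypothetical toy data: two tight clusters far apart (not the real dataset).
X = [[0.01 * i, 0.01 * i] for i in range(15)]
X += [[5 + 0.01 * i, 5 + 0.01 * i] for i in range(15)]
y = [0] * 15 + [1] * 15

# Same neighbour counts as in the report: k = 1, 3, ..., 11.
accuracies = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5, 7, 9, 11)}

print("Accuracies for different values of k:")
for k, acc in accuracies.items():
    print(f"k = {k}: {acc}")
```

The resulting dictionary maps each k to its mean accuracy, which is the form a plotting library would need to draw the accuracy-versus-k curve for each model.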

6) Decision boundary
- The code visualizes the decision boundary of the custom KNN model for two selected features, 'cp_encoded' and 'sex_encoded', using a contour plot. This shows how the model separates the different classes.
• The plot of the decision boundary is attached below.
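The idea behind such a decision boundary plot is to predict a class at every point of a grid spanning the two features and then colour the grid by predicted class. The sketch below computes that grid in plain Python and prints it as text; a contour plot (for example matplotlib's contourf) would display the same grid as coloured regions. The training points, the grid resolution, k=3, and the helper names knn_predict and linspace are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Hypothetical training points: [cp_encoded, sex_encoded] with a class label.
X_train = [[0, 0], [0, 1], [1, 0], [2, 1], [3, 0], [3, 1]]
y_train = [0, 0, 0, 1, 1, 1]

# Pad the feature ranges slightly so the regions extend past the data.
xs = linspace(-0.5, 3.5, 9)  # cp_encoded axis
ys = linspace(-0.5, 1.5, 5)  # sex_encoded axis

# grid[r][c] holds the predicted class at the point (xs[c], ys[r]).
grid = [
    [knn_predict(X_train, y_train, [gx, gy], k=3) for gx in xs]
    for gy in ys
]

# Print with the largest sex_encoded value on the top row, like a plot.
for row in reversed(grid):
    print(" ".join(str(label) for label in row))
```

The line where the printed labels switch from one class to the other is the decision boundary for these two features.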
• The code successfully preprocessed the heart disease dataset by filling missing values, scaling numeric features, and encoding categorical features.
• It implemented a custom KNN classifier and evaluated its performance through 5-fold cross-validation. The accuracy was computed for various values of k (the number of neighbors) and visualized using a plot.
• The custom KNN model's accuracy was compared with that of scikit-learn's KNN model, and both models' accuracy scores were printed.
• The decision boundary for the custom KNN model was visualized for two selected features, allowing for a qualitative understanding of the model's classification boundaries.

