
ML Lab Week 7

The document outlines the implementation of the K-Nearest Neighbors (KNN) classifier using Scikit-learn, detailing the steps from library installation to model evaluation. It includes specific instructions for working with the Breast Cancer dataset, including data loading, preprocessing, and finding the optimal value for 'k'. The conclusion suggests that the best value of 'k' is around 5 based on the training and test scores graph.

Uploaded by akruthishare

WEEK-7

AIM: Implementation of KNN using sklearn

K-Nearest Neighbors (KNN) is one of the simplest yet most fundamental classification algorithms in Machine Learning. It belongs to the supervised learning category and is widely used for pattern recognition, data mining, and intrusion detection. It works well in many real-life scenarios because it is non-parametric, meaning it makes no underlying assumptions about the distribution of the data.

Here are the steps for implementing a KNN classifier using Scikit-learn (sklearn):

1. Install Required Libraries: Install Scikit-learn and other dependencies.

2. Import Libraries: Import the necessary libraries: numpy, pandas, train_test_split, StandardScaler, KNeighborsClassifier, accuracy_score, etc.

3. Load the Dataset: Load your dataset with load_iris() or any other dataset.

4. Split the Data into Training and Testing Sets: Split your data into training and test sets using train_test_split().

5. Feature Scaling (Optional but Recommended): Perform feature scaling using StandardScaler().

6. Initialize the KNN Classifier: Instantiate KNeighborsClassifier() and define the number of neighbors (k).

7. Train the Classifier: Train the model by calling fit().

8. Make Predictions: Make predictions on the test data using predict().

9. Evaluate the Model: Use metrics such as accuracy_score, confusion_matrix, and classification_report to evaluate performance.
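The nine steps above can be sketched end-to-end as a minimal example. The dataset (load_iris()), k = 5, and test_size = 0.3 here are illustrative choices, not prescribed by the steps:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Step 3: load the dataset
X, y = load_iris(return_X_y=True)

# Step 4: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 5: scale features (fit the scaler on training data only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Steps 6-7: initialize and train the classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Steps 8-9: predict on the test set and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

Note that the scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into preprocessing.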

K-Nearest Neighbors Classifier using sklearn for the Breast Cancer Dataset
Here's the complete code broken down into steps, from importing libraries to plotting the graphs:
Step 1: Importing the required Libraries
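The import statements for Step 1 are not shown in the original; a plausible set, inferred from the code that follows (pandas, seaborn, and matplotlib are assumptions based on later usage), is:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
```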
Step 2: Reading the Dataset
df = pd.read_csv('https://www.kaggle.com/datasets/yasserh/breast-cancer-dataset.csv')

# Separate dependent and independent variables
y = df['diagnosis']
X = df.drop('diagnosis', axis=1)
X = X.drop('Unnamed: 32', axis=1)
X = X.drop('id', axis=1)

# Splitting the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Step 3: Training the model
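The training code itself is not shown in the original. A minimal sketch follows; to keep it self-contained it uses sklearn's built-in load_breast_cancer() as a stand-in for the Kaggle CSV (in the lab you would reuse X_train and y_train from Step 2), and starts from k = 5 since the conclusion finds the optimum near that value:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the Kaggle CSV from Step 2: same task, built-in source
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Initialize and train the classifier (k = 5 is an assumed starting value)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test score:", knn.score(X_test, y_test))
```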

Step 4: Evaluating the model

We now try to find the optimum value for 'k', i.e., the number of nearest neighbors.
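The k-search loop is not shown in the original; a sketch of one common approach is given below. It records the training and test scores for a range of candidate k values, producing the K and training lists that the plotting code in Step 5 expects (load_breast_cancer() again stands in for the Kaggle CSV, and the range 1-29 is an assumed search space):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

K = list(range(1, 30))   # candidate values of k
training, test = [], []  # scores recorded for each k
for k in K:
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    training.append(clf.score(X_train, y_train))
    test.append(clf.score(X_test, y_test))
```

Plotting training against test scores over K then shows where the model stops overfitting (very small k) without underfitting (very large k).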

Step 5: Plotting the training and test scores graph

import seaborn as sns
import matplotlib.pyplot as plt

# K holds the candidate k values; training holds the training score for each k
ax = sns.stripplot(x=K, y=training)  # use x and y as keyword arguments
ax.set(xlabel='Values of k', ylabel='Training Score')
plt.show()

From the above scatter plot, we can conclude that the optimum value of k is around 5.
