Machine Learning Lab: Raheel Aslam (74-FET/BSEE/F16)

This document implements a K-Nearest Neighbors (KNN) classifier on the Haberman's Survival Data Set using scikit-learn. It loads and prepares the data, splits it into training and test sets, trains a KNN classifier with 3 neighbors, evaluates the model's performance using a confusion matrix and classification report, and plots the error rate for different values of K between 1 and 40.


LAB#07 Raheel Aslam (74-FET/BSEE/F16)

Machine Learning Lab


Code: KNN algorithm on the Haberman's Survival Data Set
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data"

# Assign column names to the dataset
names = ['Age of patient at time of operation', 'Patients year of operation',
         'Number of positive axillary nodes detected', 'Survival status']

# Read dataset into a pandas DataFrame
dataset = pd.read_csv(url, names=names)
dataset.head()

# Features (first three columns) and labels (survival status)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Split into 80% training and 20% test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Standardize features to zero mean and unit variance
# (fit on the training set only, then apply to both splits)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Train a KNN classifier with 3 neighbors
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Evaluate the model on the held-out test set
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Calculating error for K values between 1 and 40
error = []
for i in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error.append(np.mean(pred_i != y_test))

plt.figure(figsize=(12, 6))
plt.plot(range(1, 41), error, color='red', linestyle='dashed',
         marker='o', markerfacecolor='blue', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K Value')
plt.ylabel('Mean Error')
plt.show()
Output:
[[44  1]
 [11  6]]
              precision    recall  f1-score   support

           1       0.80      0.98      0.88        45
           2       0.86      0.35      0.50        17

    accuracy                           0.81        62
   macro avg       0.83      0.67      0.69        62
weighted avg       0.82      0.81      0.78        62
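The error-rate plot above is normally read by eye, but the same loop can also select the best K programmatically with np.argmin. The sketch below is a minimal, self-contained example of that idea; it uses a synthetic stand-in dataset from make_classification (an assumption, since it does not re-download the Haberman data), but the loop itself mirrors the lab code exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the Haberman data: 3 features, 2 classes
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Same preprocessing as the lab: standardize using training-set statistics
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Same K loop as the lab, but keep the errors and pick the minimum
ks = list(range(1, 41))
error = []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error.append(np.mean(knn.predict(X_test) != y_test))

best_k = ks[int(np.argmin(error))]
print("best K:", best_k, "with test error:", min(error))
```

Note that choosing K on the same test set used for the final report is optimistic; a cleaner variant would pick K by cross-validation on the training set (e.g. with sklearn.model_selection.cross_val_score) and evaluate once on the test set.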
