Machine Learning Lab Assessment 3
18BCE2301
Devangshu Mazumder
Aim:
Design and implement a KNN classifier using a CSV file.
CSV file: processed.cleveland.data
Abstract:
The abbreviation KNN stands for "K-Nearest Neighbours". It is a supervised machine learning
algorithm that can be used to solve both classification and regression problems. The number of
nearest neighbours considered when predicting or classifying a new, unknown sample is denoted by
the symbol 'K'.
KNN works by computing the distances between a query point and all the examples in the data,
selecting the specified number of examples (K) closest to the query, and then voting for the most
frequent label (in the case of classification) or averaging the labels (in the case of regression).
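As a small illustration of this procedure, the sketch below implements the classification case from scratch: compute the Euclidean distance from a query point to every training example, take the K closest examples, and return the majority label. The names knn_predict, X_train, y_train and x_query are introduced here purely for illustration and are not part of the scikit-learn code used later.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Euclidean distance from the query point to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the K training examples closest to the query
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of the K nearest neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# e.g. knn_predict(np.array([[0., 0.], [1., 1.], [5., 5.]]),
#                  np.array([0, 0, 1]), np.array([0.2, 0.1]), k=3) returns 0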
Sample Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
# Load the Cleveland heart-disease data; in the UCI file, missing values are marked
# with '?' (assumption: the local copy keeps that convention)
dataset = pd.read_csv("C:\\Users\\skull\\BCE2301\\vir\\env\\Machine Learning Lab CSE4020/processed.cleveland.data.csv",
                      na_values='?',
                      names=['age','sex','cp','trestbps','chol','fbs','restecg','thalach','exang','oldpeak','slope','ca','thal','output'])
# Work on a copy so that imputation does not silently alter the raw frame
df1 = dataset.copy()
print("**Before filling missing values**")
print(df1.loc[287])
print("**Mean of column 11 ('ca')**")
print(df1['ca'].mean())
# Replace every missing value with the mean of its column
df1.fillna(df1.mean(), inplace=True)
print("**After filling missing values**")
print(df1.loc[[166, 192, 287, 302]])
print("**Mean of column 12 ('thal')**")
print(df1['thal'].mean())
print("**After filling missing values**")
print(df1.loc[[87, 266]])
# The first 13 columns are features; 'output' is the target
feature_cols = list(df1.columns[0:13])
print("Feature columns: \n{}".format(feature_cols))
#Separate the data into feature data and target data
X = df1[feature_cols]
y = df1['output'].values
print("\nFeature values:")
X.head
#split the dataset into training and testing data
X_train,X_test , y_train, y_test = train_test_split(X,y, test_size=0.30, random_state=5)
print(X_train)
#Normalization
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
#Fit the scaler on the training data only, then apply the same transformation
#to both the training and the test set so no information leaks from the test data
scaler.fit(X_train)
X_train = scaler.transform(X_train)
print("**After Z-score normalization on X_train**")
print(X_train)
X_test = scaler.transform(X_test)
print("**After Z-score normalization on X_test**")
print(X_test)
print("KNN CLASSIFER")
clf2 = KNeighborsClassifier(n_neighbors=5)
clf2.fit(X_train,y_train)
y_predictions = clf2.predict(X_test)
cm1 = confusion_matrix(y_test, y_predictions)
print("Accuracy=",accuracy_score(y_test, y_predictions))
OUTPUT: