Summary
1. The document implements a KNN classifier with Euclidean distance to predict whether passengers on the Titanic survived, based on features such as passenger class, sex, and fare.
2. Different values of K are tested; the highest testing accuracy, 81%, is achieved at K=7.
3. Confusion matrices and accuracy scores are calculated for the Euclidean, Minkowski, and Manhattan distance metrics at various K values, with the best results obtained at K=7 for Manhattan distance.


Name: Mussab Bin Shahid

Sap-Id: 2024
Assignment: Machine-Learning
Dataset: https://github.com/rashida048/Datasets

Problem
We use the Titanic dataset, in which the 'Survived' column is 1 if the passenger survived and 0 if they did not. Our goal is to predict the 'Survived' feature. The dataset is simple, and intuition alone suggests that some columns cannot help with this prediction: for example, 'PassengerId', 'Name', 'Ticket', and 'Cabin' do not seem useful for predicting whether a passenger survived.
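As a quick sketch of this column pruning and of the sex encoding used in the next section (shown on a hypothetical three-row frame standing in for the real CSV, not the actual dataset):

```python
import pandas as pd

# Hypothetical mini-frame standing in for titanic_data.csv.
titanic = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Name': ['A', 'B', 'C'],
    'Pclass': [3, 1, 3],
    'Sex': ['male', 'female', 'female'],
    'Fare': [7.25, 71.28, 7.92],
    'Survived': [0, 1, 1],
})

# Keep only the intuitively predictive columns and encode Sex numerically.
kept = titanic[['Pclass', 'Sex', 'Fare', 'Survived']].copy()
kept['Sex'] = kept['Sex'].replace({'male': 0, 'female': 1})
print(kept['Sex'].tolist())  # [0, 1, 1]
```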

KNN implementation With Euclidean Distance


# Calculating distance using Euclidean distance
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

# Importing the dataset
titanic = pd.read_csv('titanic_data.csv')
titanic.head(5)
titanic1 = titanic[['Pclass', 'Sex', 'Fare', 'Survived']]

# The model needs numeric input, so encode Sex as 0 for male and 1 for female
titanic1['Sex'] = titanic1.Sex.replace({'male': 0, 'female': 1})
X = titanic1[['Pclass', 'Sex', 'Fare']]
y = titanic1['Survived']

# Splitting test data and training data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Implementing the KNN classifier with Euclidean distance
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
y_pred

# Calculating accuracy and computing the confusion matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print("Accuracy : ", accuracy_score(y_test, y_pred))
cm

# df = pd.DataFrame({'Real Values': y_test, 'Predicted Values': y_pred})
# df  # uncomment to compare the real data with the predictions

Choosing Value of K=3


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(y_test, y_pred))
cm
Accuracy : 0.7802690582959642
array([[118,  21],
       [ 28,  56]], dtype=int64)

With k=3, the testing accuracy is about 78%.

Choosing Value of K=7


Accuracy : 0.8116591928251121
array([[118,  21],
       [ 21,  63]], dtype=int64)
This is our sweet spot: at k=7 the model reaches its highest testing accuracy, 81%. I checked other K values as well:
At k=5 it is 78%
At k=9 it is 80%
At k=11 it is 79%
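The manual K sweep above can also be written as a loop. The following is a minimal from-scratch sketch of the mechanics on made-up toy points (not the Titanic data, and not the sklearn pipeline used above), just to illustrate how each K value is scored:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    # Euclidean distance from x to every training point, sorted ascending.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)) ** 0.5, label)
        for row, label in zip(train_X, train_y)
    )
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Made-up (Pclass, Sex, Fare)-like points with 0/1 labels.
train_X = [(3, 0, 7.3), (1, 1, 71.3), (3, 1, 7.9), (1, 0, 53.1), (2, 1, 30.0), (3, 0, 8.1)]
train_y = [0, 1, 1, 0, 1, 0]
test_X = [(3, 1, 7.5), (1, 0, 60.0)]
test_y = [1, 0]

for k in (1, 3, 5):
    preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
    acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
    print(f"k={k}: accuracy={acc:.2f}")
```

The same loop shape works with the sklearn classifier: rebuild it with each `n_neighbors` value, refit, and record the test accuracy.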

Confusion Matrix and Accuracy


The confusion matrix is a table that shows the number of correct and incorrect predictions on a classification problem when the true values of the test set are known.
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print("Accuracy : ", accuracy_score(y_test, y_pred))
cm
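Accuracy can be read directly off the confusion matrix: the diagonal entries are the correct predictions, so accuracy is the diagonal sum over all predictions. Using the K=7 matrix reported above (rows are actual classes, columns are predicted classes, following sklearn's convention):

```python
# K=7 confusion matrix reported above.
tn, fp = 118, 21   # actual 0: predicted 0 / predicted 1
fn, tp = 21, 63    # actual 1: predicted 0 / predicted 1

total = tn + fp + fn + tp        # 223 test passengers
accuracy = (tn + tp) / total     # correct predictions over all predictions
print(round(accuracy, 4))        # → 0.8117, matching the printed score
```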
Accuracy with Minkowski Distance

It also has its sweet spot at K=7.
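This is expected: Minkowski distance with p=2 is exactly Euclidean distance, so the two metrics pick the same neighbours and the sweet spot lands at the same K. A small self-contained illustration of the general formula (the function name and points are made up for the example):

```python
def minkowski(a, b, p):
    # General Minkowski distance: (sum |x - y|^p)^(1/p).
    # p=1 gives Manhattan distance, p=2 gives Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(minkowski(a, b, 1))  # Manhattan: |1-4| + |2-6| + |3-3| = 7.0
print(minkowski(a, b, 2))  # Euclidean: sqrt(9 + 16 + 0) = 5.0
```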

Accuracy with Manhattan k=3

Accuracy with Manhattan k=5

Accuracy with Manhattan k=7

Accuracy with Manhattan k=9


K=7 is again our sweet spot, where we get the greatest accuracy; above this value of K, accuracy declines.
