ML 2
1. Classify the email using the binary classification method. Email spam detection has two states: a) Normal state – Not Spam, b) Abnormal state – Spam.
Use K-Nearest Neighbors and Support Vector Machine for classification and analyze their performance. Dataset: the emails.csv dataset on Kaggle:
https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv
In [20]: import pandas as pd
df = pd.read_csv('emails.csv')
In [21]: df.head()
Out[21]:
Email No. the to ect and for of a you hou ... connevey jay valued lay infrastructure military allowing ff dry Prediction
0 Email 1 0 0 1 0 0 0 2 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 Email 3 0 0 1 0 0 0 8 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 Email 4 0 5 22 0 5 1 51 2 10 ... 0 0 0 0 0 0 0 0 0 0
4 Email 5 7 6 17 1 5 2 57 0 9 ... 0 0 0 0 0 0 0 1 0 0
In [22]: df.columns
Out[22]: Index(['Email No.', 'the', 'to', 'ect', 'and', 'for', 'of', 'a', 'you', 'hou',
...
'connevey', 'jay', 'valued', 'lay', 'infrastructure', 'military',
'allowing', 'ff', 'dry', 'Prediction'],
dtype='object', length=3002)
In [23]: df.isnull().sum()
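The classifier cells below use X_train, X_test, y_train, and y_test, which are not defined in the cells shown here. A minimal sketch of how they could be prepared, assuming the word-count columns are the features, Prediction is the label, and an 80/20 train/test split (the split ratio, random_state, and scaling step are assumptions):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Features: all word-count columns; drop the identifier and the label
X = df.drop(columns=['Email No.', 'Prediction'])
y = df['Prediction']

# Hold out 20% of the emails for testing (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling the counts helps distance-based models such as KNN (assumed step)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)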
KNN classifier
In [35]: from sklearn.neighbors import KNeighborsClassifier

# KNN classifier with k = 7 neighbors
knn = KNeighborsClassifier(n_neighbors=7)
# fit on the training data
knn.fit(X_train, y_train)
# predict on the test data
y_pred = knn.predict(X_test)
In [36]: print("Prediction",y_pred)
Prediction [0 0 1 ... 1 1 1]
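The assignment asks for a performance analysis; a minimal sketch of evaluating the KNN predictions on the test set follows (accuracy, confusion matrix, and classification report; the choice of metrics is an assumption):

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Compare KNN predictions against the held-out test labels
print("KNN accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))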
SVM classifier
In [27]: from sklearn.svm import SVC

# SVM classifier with cost C = 1
model = SVC(C=1)
# fit on the training data
model.fit(X_train, y_train)
# predict on the test data
y_pred = model.predict(X_test)
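The same metrics can be used to evaluate the SVM predictions and compare them with KNN; a minimal sketch (metric choice assumed; y_pred here refers to the SVM predictions from the cell above):

from sklearn.metrics import accuracy_score, confusion_matrix

# Evaluate SVM predictions on the same test split
print("SVM accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
# Comparing this accuracy with the KNN accuracy above indicates which
# classifier performs better on this dataset.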