SPPUML2

Machine learning lab assignment 2



Name : Kanase Aditya Madhukar

Roll No : 2441059

Batch : D

Classify emails using a binary classification method. Email spam detection has two
states: a) Normal state – not spam; b) Abnormal state – spam. Use K-Nearest
Neighbors and Support Vector Machine for classification and analyze their
performance.

Dataset: the emails.csv dataset on Kaggle
https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

In [69]: import numpy as np
         import pandas as pd
         from sklearn.model_selection import train_test_split
         from sklearn.svm import SVC
         from sklearn.neighbors import KNeighborsClassifier
         from sklearn import metrics
         from sklearn.linear_model import LogisticRegression

In [70]: df = pd.read_csv('emails.csv')

In [71]: df.head()

Out[71]:
   Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay
0    Email 1    0   0    1    0    0   0    2    0    0  ...         0    0       0    0
1    Email 2    8  13   24    6    6   2  102    1   27  ...         0    0       0    0
2    Email 3    0   0    1    0    0   0    8    0    0  ...         0    0       0    0
3    Email 4    0   5   22    0    5   1   51    2   10  ...         0    0       0    0
4    Email 5    7   6   17    1    5   2   57    0    9  ...         0    0       0    0

5 rows × 3002 columns


In [72]: df.tail()

Out[72]:
       Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay
5167  Email 5168    2   2    2    3    0   0   32    0    0  ...         0    0       0    0
5168  Email 5169   35  27   11    2    6   5  151    4    3  ...         0    0       0    0
5169  Email 5170    0   0    1    1    0   0   11    0    0  ...         0    0       0    0
5170  Email 5171    2   7    1    0    2   1   28    2    0  ...         0    0       0    0
5171  Email 5172   22  24    5    1    6   5  148    8    2  ...         0    0       0    0

5 rows × 3002 columns

In [73]: df.describe()

Out[73]:
               the           to          ect          and          for           of  ...
count  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  ...
mean      6.640565     6.188128     5.143852     3.075599     3.124710     2.627030  ...
std      11.745009     9.534576    14.101142     6.045970     4.680522     6.229845  ...
min       0.000000     0.000000     1.000000     0.000000     0.000000     0.000000  ...
25%       0.000000     1.000000     1.000000     0.000000     1.000000     0.000000  ...
50%       3.000000     3.000000     1.000000     1.000000     2.000000     1.000000  ...
75%       8.000000     7.000000     4.000000     3.000000     4.000000     2.000000  ...
max     210.000000   132.000000   344.000000    89.000000    47.000000    77.000000  ...

8 rows × 3001 columns

In [74]: df.shape

Out[74]: (5172, 3002)
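
Each row is one email: 3,000 word-count features (how often each of the dataset's 3,000 most common words appears in that email) plus the Email No. identifier and the Prediction label, where 0 marks not spam and 1 marks spam.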

In [75]: df.drop(['Email No.'],axis=1, inplace=True)


In [76]: df

Out[76]:
      the  to  ect  and  for  of    a  you  hou  in  ...  connevey  jay  valued  lay
0       0   0    1    0    0   0    2    0    0   0  ...         0    0       0    0
1       8  13   24    6    6   2  102    1   27  18  ...         0    0       0    0
2       0   0    1    0    0   0    8    0    0   4  ...         0    0       0    0
3       0   5   22    0    5   1   51    2   10   1  ...         0    0       0    0
4       7   6   17    1    5   2   57    0    9   3  ...         0    0       0    0
...   ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...       ...  ...     ...  ...
5167    2   2    2    3    0   0   32    0    0   5  ...         0    0       0    0
5168   35  27   11    2    6   5  151    4    3  23  ...         0    0       0    0
5169    0   0    1    1    0   0   11    0    0   1  ...         0    0       0    0
5170    2   7    1    0    2   1   28    2    0   8  ...         0    0       0    0
5171   22  24    5    1    6   5  148    8    2  23  ...         0    0       0    0

5172 rows × 3001 columns

In [77]: df.isna().sum()

Out[77]: the 0
to 0
ect 0
and 0
for 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3001, dtype: int64


In [78]: X = df.drop("Prediction", axis=1)  # features: the 3000 word-count columns
         y = df["Prediction"]               # target: 0 = not spam, 1 = spam
         print("Features: ", X)
         print("Target: ", y)

Features:        the  to  ect  and  for  of    a  you  hou  in  ...  allowing  ff  dry
0          0   0    1    0    0   0    2    0    0   0  ...         0   0    0
1          8  13   24    6    6   2  102    1   27  18  ...         0   1    0
2          0   0    1    0    0   0    8    0    0   4  ...         0   0    0
3          0   5   22    0    5   1   51    2   10   1  ...         0   0    0
4          7   6   17    1    5   2   57    0    9   3  ...         0   1    0
...      ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...       ...  ..  ...
5167       2   2    2    3    0   0   32    0    0   5  ...         0   0    0
5168      35  27   11    2    6   5  151    4    3  23  ...         0   1    0
5169       0   0    1    1    0   0   11    0    0   1  ...         0   0    0
5170       2   7    1    0    2   1   28    2    0   8  ...         0   1    0
5171      22  24    5    1    6   5  148    8    2  23  ...         0   0    0

[5172 rows x 3000 columns]

Target:  0       0
         1       0
         2       0
         3       0
         4       0
                ..
         5167    0
         5168    0
         5169    1
         5170    1
         5171    0
         Name: Prediction, Length: 5172, dtype: int64

In [79]: X = df.drop(columns='Prediction',axis = 1)
Y = df['Prediction']

In [80]: x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)  # 80/20 split (test set of 1035 emails)
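
Spam is the minority class here (roughly 30% of the emails), so it can help to stratify the split. A minimal sketch, assuming the same X and Y, with an arbitrary random_state for reproducibility:

    from sklearn.model_selection import train_test_split

    # stratify=Y keeps the spam/not-spam ratio the same in train and test;
    # random_state pins the shuffle so the results are reproducible
    x_train, x_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, stratify=Y, random_state=42)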

In [81]: knn = KNeighborsClassifier(n_neighbors=7)
         knn.fit(x_train, y_train)
         y_pred = knn.predict(x_test)

/home/comp/anaconda3/lib/python3.9/site-packages/sklearn/neighbors/_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)

In [82]: print("Prediction:\n")
         print(y_pred)

Prediction:

[0 1 0 ... 0 0 0]

In [83]: M = metrics.accuracy_score(y_test,y_pred)
print("KNN accuracy: ", M)

KNN accuracy: 0.8714975845410629
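
The choice of n_neighbors=7 above is not tuned; a quick way to pick k is cross-validation on the training set. A sketch, where the k grid is an arbitrary choice (odd values avoid ties in binary voting):

    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # mean 5-fold CV accuracy on the training set for each candidate k
    for k in [1, 3, 5, 7, 9, 11]:
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 x_train, y_train, cv=5)
        print(k, scores.mean())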

In [84]: C = metrics.confusion_matrix(y_test, y_pred)
         print("Confusion matrix: ", C)

Confusion matrix:  [[635  84]
 [ 49 267]]
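
In scikit-learn's convention rows are true classes and columns are predictions, so for KNN this reads: 635 true negatives, 84 false positives, 49 false negatives, 267 true positives. That gives spam precision 267/(267+84) ≈ 0.76 and spam recall 267/(267+49) ≈ 0.84. A classification_report prints these per-class figures directly; the target_names labels below are an assumption matching the dataset's 0/1 encoding:

    from sklearn import metrics

    # precision, recall and F1 for each class (0 = not spam, 1 = spam)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=["not spam", "spam"]))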

In [85]: model = SVC(C=1)  # cost parameter C = 1
         model.fit(x_train, y_train)
         y_pred = model.predict(x_test)

In [86]: kc = metrics.confusion_matrix(y_test, y_pred)
         print("SVM confusion matrix: ", kc)

SVM confusion matrix:  [[700  19]
 [189 127]]


In [87]: sc = metrics.accuracy_score(y_test,y_pred)
print("SVM accuracy: ", sc)

SVM accuracy: 0.7990338164251207
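
The RBF-kernel SVC is sensitive to feature scale, and these raw word counts range from 0 into the hundreds, which likely explains why the SVM misses 189 of 316 spam emails above (spam recall ≈ 0.40). A sketch of scaling before the SVM, with StandardScaler as one reasonable choice of scaler:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn import metrics

    # standardize each word-count column, then fit the same SVC(C=1)
    svm_scaled = make_pipeline(StandardScaler(), SVC(C=1))
    svm_scaled.fit(x_train, y_train)
    print("scaled SVM accuracy:",
          metrics.accuracy_score(y_test, svm_scaled.predict(x_test)))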

In [88]: model = LogisticRegression()
         model.fit(x_train, y_train)
         y_pred = model.predict(x_test)

/home/comp/anaconda3/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:814: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

In [90]: acc = metrics.accuracy_score(y_test, y_pred)
         print("Logistic Regression accuracy: ", acc)

Logistic Regression accuracy: 0.9594202898550724
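
The ConvergenceWarning above means lbfgs stopped at its iteration limit, so the fitted coefficients may not be fully converged. The warning itself names the two standard remedies, raising max_iter or scaling the data; a sketch combining both, where max_iter=1000 is an arbitrary generous choice:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics

    # scaling usually lets lbfgs converge quickly; max_iter=1000 is a safety margin
    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    logreg.fit(x_train, y_train)
    print("accuracy:", metrics.accuracy_score(y_test, logreg.predict(x_test)))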
