SPPUML2

Machine learning lab assignment 2



Name : Kanase Aditya Madhukar

Roll No : 2441059

Batch : D

Classify emails using a binary classification method. Email spam detection has two
states: a) Normal state – not spam; b) Abnormal state – spam. Use K-Nearest
Neighbors and Support Vector Machine for classification and analyze their
performance.

Dataset: the emails.csv dataset on Kaggle
https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

In [69]: import numpy as np
         import pandas as pd
         from sklearn.model_selection import train_test_split
         from sklearn.svm import SVC
         from sklearn.neighbors import KNeighborsClassifier
         from sklearn import metrics
         from sklearn.linear_model import LogisticRegression

In [70]: df = pd.read_csv('emails.csv')

In [71]: df.head()

Out[71]:
   Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay
0    Email 1    0   0    1    0    0   0    2    0    0  ...         0    0       0    0
1    Email 2    8  13   24    6    6   2  102    1   27  ...         0    0       0    0
2    Email 3    0   0    1    0    0   0    8    0    0  ...         0    0       0    0
3    Email 4    0   5   22    0    5   1   51    2   10  ...         0    0       0    0
4    Email 5    7   6   17    1    5   2   57    0    9  ...         0    0       0    0

5 rows × 3002 columns


In [72]: df.tail()

Out[72]:
       Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay
5167  Email 5168    2   2    2    3    0   0   32    0    0  ...         0    0       0    0
5168  Email 5169   35  27   11    2    6   5  151    4    3  ...         0    0       0    0
5169  Email 5170    0   0    1    1    0   0   11    0    0  ...         0    0       0    0
5170  Email 5171    2   7    1    0    2   1   28    2    0  ...         0    0       0    0
5171  Email 5172   22  24    5    1    6   5  148    8    2  ...         0    0       0    0

5 rows × 3002 columns

In [73]: df.describe()

Out[73]:
               the           to          ect          and          for           of  ...
count  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  ...
mean      6.640565     6.188128     5.143852     3.075599     3.124710     2.627030  ...
std      11.745009     9.534576    14.101142     6.045970     4.680522     6.229845  ...
min       0.000000     0.000000     1.000000     0.000000     0.000000     0.000000  ...
25%       0.000000     1.000000     1.000000     0.000000     1.000000     0.000000  ...
50%       3.000000     3.000000     1.000000     1.000000     2.000000     1.000000  ...
75%       8.000000     7.000000     4.000000     3.000000     4.000000     2.000000  ...
max     210.000000   132.000000   344.000000    89.000000    47.000000    77.000000  ...

8 rows × 3001 columns

In [74]: df.shape

Out[74]: (5172, 3002)
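
Each row is one email: 3,000 word-count features (how often each of the dataset's 3,000 most common words appears in that email) plus the Email No. identifier and the Prediction label, where 0 marks not spam and 1 marks spam.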

In [75]: df.drop(['Email No.'],axis=1, inplace=True)


In [76]: df

Out[76]:
      the  to  ect  and  for  of    a  you  hou  in  ...  connevey  jay  valued  lay
0       0   0    1    0    0   0    2    0    0   0  ...         0    0       0    0
1       8  13   24    6    6   2  102    1   27  18  ...         0    0       0    0
2       0   0    1    0    0   0    8    0    0   4  ...         0    0       0    0
3       0   5   22    0    5   1   51    2   10   1  ...         0    0       0    0
4       7   6   17    1    5   2   57    0    9   3  ...         0    0       0    0
...   ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...       ...  ...     ...  ...
5167    2   2    2    3    0   0   32    0    0   5  ...         0    0       0    0
5168   35  27   11    2    6   5  151    4    3  23  ...         0    0       0    0
5169    0   0    1    1    0   0   11    0    0   1  ...         0    0       0    0
5170    2   7    1    0    2   1   28    2    0   8  ...         0    0       0    0
5171   22  24    5    1    6   5  148    8    2  23  ...         0    0       0    0

5172 rows × 3001 columns

In [77]: df.isna().sum()

Out[77]: the 0
to 0
ect 0
and 0
for 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3001, dtype: int64


In [78]: X = df.drop("Prediction", axis=1)  # features: the 3000 word-count columns
         y = df["Prediction"]               # target: 0 = not spam, 1 = spam
         print("Features: ", X)
         print("Target: ", y)

Features:        the  to  ect  and  for  of    a  you  hou  in  ...  allowing  ff  dry
0          0   0    1    0    0   0    2    0    0   0  ...         0   0    0
1          8  13   24    6    6   2  102    1   27  18  ...         0   1    0
2          0   0    1    0    0   0    8    0    0   4  ...         0   0    0
3          0   5   22    0    5   1   51    2   10   1  ...         0   0    0
4          7   6   17    1    5   2   57    0    9   3  ...         0   1    0
...      ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...       ...  ..  ...
5167       2   2    2    3    0   0   32    0    0   5  ...         0   0    0
5168      35  27   11    2    6   5  151    4    3  23  ...         0   1    0
5169       0   0    1    1    0   0   11    0    0   1  ...         0   0    0
5170       2   7    1    0    2   1   28    2    0   8  ...         0   1    0
5171      22  24    5    1    6   5  148    8    2  23  ...         0   0    0

[5172 rows x 3000 columns]

Target:  0       0
         1       0
         2       0
         3       0
         4       0
                ..
         5167    0
         5168    0
         5169    1
         5170    1
         5171    0
         Name: Prediction, Length: 5172, dtype: int64

In [79]: X = df.drop(columns='Prediction',axis = 1)
Y = df['Prediction']

In [80]: x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)  # 80/20 split (test set of 1035 emails)
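
Spam is the minority class here (roughly 30% of the emails), so it can help to stratify the split. A minimal sketch, assuming the same X and Y, with an arbitrary random_state for reproducibility:

    from sklearn.model_selection import train_test_split

    # stratify=Y keeps the spam/not-spam ratio the same in train and test;
    # random_state pins the shuffle so the results are reproducible
    x_train, x_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, stratify=Y, random_state=42)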

In [81]: knn = KNeighborsClassifier(n_neighbors=7)
         knn.fit(x_train, y_train)
         y_pred = knn.predict(x_test)

/home/comp/anaconda3/lib/python3.9/site-packages/sklearn/neighbors/_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)

In [82]: print("Prediction:\n")
         print(y_pred)

Prediction:

[0 1 0 ... 0 0 0]

In [83]: M = metrics.accuracy_score(y_test,y_pred)
print("KNN accuracy: ", M)

KNN accuracy: 0.8714975845410629
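
The choice of n_neighbors=7 above is not tuned; a quick way to pick k is cross-validation on the training set. A sketch, where the k grid is an arbitrary choice (odd values avoid ties in binary voting):

    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # mean 5-fold CV accuracy on the training set for each candidate k
    for k in [1, 3, 5, 7, 9, 11]:
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 x_train, y_train, cv=5)
        print(k, scores.mean())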

In [84]: C = metrics.confusion_matrix(y_test, y_pred)
         print("Confusion matrix: ", C)

Confusion matrix:  [[635  84]
 [ 49 267]]
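
In scikit-learn's convention rows are true classes and columns are predictions, so for KNN this reads: 635 true negatives, 84 false positives, 49 false negatives, 267 true positives. That gives spam precision 267/(267+84) ≈ 0.76 and spam recall 267/(267+49) ≈ 0.84. A classification_report prints these per-class figures directly; the target_names labels below are an assumption matching the dataset's 0/1 encoding:

    from sklearn import metrics

    # precision, recall and F1 for each class (0 = not spam, 1 = spam)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=["not spam", "spam"]))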

In [85]: model = SVC(C=1)  # cost parameter C = 1
         model.fit(x_train, y_train)
         y_pred = model.predict(x_test)

In [86]: kc = metrics.confusion_matrix(y_test, y_pred)
         print("SVM confusion matrix: ", kc)

SVM confusion matrix:  [[700  19]
 [189 127]]


In [87]: sc = metrics.accuracy_score(y_test,y_pred)
print("SVM accuracy: ", sc)

SVM accuracy: 0.7990338164251207
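
The RBF-kernel SVC is sensitive to feature scale, and these raw word counts range from 0 into the hundreds, which likely explains why the SVM misses 189 of 316 spam emails above (spam recall ≈ 0.40). A sketch of scaling before the SVM, with StandardScaler as one reasonable choice of scaler:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn import metrics

    # standardize each word-count column, then fit the same SVC(C=1)
    svm_scaled = make_pipeline(StandardScaler(), SVC(C=1))
    svm_scaled.fit(x_train, y_train)
    print("scaled SVM accuracy:",
          metrics.accuracy_score(y_test, svm_scaled.predict(x_test)))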

In [88]: model = LogisticRegression()
         model.fit(x_train, y_train)
         y_pred = model.predict(x_test)

/home/comp/anaconda3/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:814: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

In [90]: acc = metrics.accuracy_score(y_test, y_pred)
         print("Logistic Regression accuracy: ", acc)

Logistic Regression accuracy: 0.9594202898550724
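
The ConvergenceWarning above means lbfgs stopped at its iteration limit, so the fitted coefficients may not be fully converged. The warning itself names the two standard remedies, raising max_iter or scaling the data; a sketch combining both, where max_iter=1000 is an arbitrary generous choice:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics

    # scaling usually lets lbfgs converge quickly; max_iter=1000 is a safety margin
    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    logreg.fit(x_train, y_train)
    print("accuracy:", metrics.accuracy_score(y_test, logreg.predict(x_test)))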
