0% found this document useful (0 votes)

487 views5 pages

Email Classification: Roll No-41463 (LP-3)

The document discusses classifying emails using binary classification with K-Nearest Neighbors and Support Vector Machine models. It analyzes the performance of these models on an email spam classification dataset from Kaggle. For KNN, the accuracy was 87% and for SVM the accuracy was 93%, showing SVM had slightly better performance on this task.

Uploaded by

fgfsgsg

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

487 views5 pages

Email Classification: Roll No-41463 (LP-3)

Uploaded by

fgfsgsg

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Roll No- 41463 (LP-3)

Email Classification

Classify the email using binary classification method. Email Spam detection has two
states: a) Normal State Not Spam b) Abnormal State Spam. Use K-Nearest Neighbors and
Support Vector Machine for Classification. Analyze their performance.

Dataset used: https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

(https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv)

In [1]: import numpy as np

import pandas as pd

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_error,mean_absolute_error
from sklearn.metrics import accuracy_score

In [2]: df = pd.read_csv("emails.csv")
df.head()

Out[2]:
Email
the to ect and for of a you hou ... connevey jay valued lay infrastructu
No.

Email
0 0 0 1 0 0 0 2 0 0 ... 0 0 0 0
1

Email
1 8 13 24 6 6 2 102 1 27 ... 0 0 0 0
2

Email
2 0 0 1 0 0 0 8 0 0 ... 0 0 0 0
3

Email
3 0 5 22 0 5 1 51 2 10 ... 0 0 0 0
4

Email
4 7 6 17 1 5 2 57 0 9 ... 0 0 0 0
5

5 rows × 3002 columns

In [3]: df.tail()

Out[3]:
Email
the to ect and for of a you hou ... connevey jay valued lay infrastru
No.

Email
5167 2 2 2 3 0 0 32 0 0 ... 0 0 0 0
5168

Email
5168 35 27 11 2 6 5 151 4 3 ... 0 0 0 0
5169

Email
5169 0 0 1 1 0 0 11 0 0 ... 0 0 0 0
5170

Email
5170 2 7 1 0 2 1 28 2 0 ... 0 0 0 0
5171

Email
5171 22 24 5 1 6 5 148 8 2 ... 0 0 0 0
5172

5 rows × 3002 columns

In [4]: df.info()

RangeIndex: 5172 entries, 0 to 5171

Columns: 3002 entries, Email No. to Prediction

dtypes: int64(3001), object(1)

memory usage: 118.5+ MB

In [5]: df.describe()

Out[5]:
the to ect and for of

count 5172.000000 5172.000000 5172.000000 5172.000000 5172.000000 5172.000000 5172.00000

mean 6.640565 6.188128 5.143852 3.075599 3.124710 2.627030 55.51740

std 11.745009 9.534576 14.101142 6.045970 4.680522 6.229845 87.57417

min 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.00000

25% 0.000000 1.000000 1.000000 0.000000 1.000000 0.000000 12.00000

50% 3.000000 3.000000 1.000000 1.000000 2.000000 1.000000 28.00000

75% 8.000000 7.000000 4.000000 3.000000 4.000000 2.000000 62.25000

max 210.000000 132.000000 344.000000 89.000000 47.000000 77.000000 1898.00000

8 rows × 3001 columns

In [6]: df.isnull().sum()

Out[6]: Email No. 0

the 0

to 0

ect 0

and 0

for 0

of 0

a 0

you 0

hou 0

in 0

on 0

is 0

this 0

enron 0

i 0

be 0

that 0

will 0

have 0

with 0

your 0

at 0

we 0

s 0

are 0

it 0

by 0

com 0

as 0

decisions 0

produced 0

ended 0

greatest 0

degree 0

solmonson 0

imbalances 0

fall 0

fear 0

hate 0

fight 0

reallocated 0

debt 0

reform 0

australia 0

plain 0

prompt 0

remains 0

ifhsc 0

enhancements 0

connevey 0

jay 0

valued 0

lay 0

infrastructure 0

military 0

allowing 0

ff 0

dry 0

Prediction 0

Length: 3002, dtype: int64

Splitting Train and Test dataset

In [7]: x = df.iloc[:,1:3001]
y = df.iloc[:,-1].values

In [8]: x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2,

a) Using K-Nearest Neighbours

In [9]: knn = KNeighborsClassifier(n_neighbors=8)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)

In [ ]:

Analyzing performance

In [10]: print("MSE: ", mean_squared_error(y_test, y_pred))

print("MAE: ", mean_absolute_error(y_test, y_pred))
print("RMSE: ", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 Score: ", metrics.r2_score(y_test, y_pred))
print("Accuracy Score for KNN: ", accuracy_score(y_test, y_pred))

MSE: 0.12560386473429952

MAE: 0.12560386473429952

RMSE: 0.3544063553807966

R2 Score: 0.40780091899790494

Accuracy Score for KNN: 0.8743961352657005

b) Using Support Vector Machine(SVM)

In [11]: svc = SVC(C=1.0, gamma='auto', kernel='rbf')
svc.fit(x_test, y_test)
y_pred = svc.predict(x_test)

Analyzing Performance
In [12]: print("MSE: ", mean_squared_error(y_test, y_pred))
print("MAE: ", mean_absolute_error(y_test, y_pred))
print("RMSE: ", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 Score: ", metrics.r2_score(y_test, y_pred))
print("Accuracy Score for KNN: ", accuracy_score(y_test, y_pred))

MSE: 0.07149758454106281

MAE: 0.07149758454106281

RMSE: 0.2673903224521464

R2 Score: 0.6629020615834228

Accuracy Score for KNN: 0.9285024154589372

In [ ]:

Unit - 3 Feature Engineering
No ratings yet
Unit - 3 Feature Engineering
29 pages
Research: Handbook
100% (2)
Research: Handbook
244 pages
C1 Advanced Unit 3 Test: Section 1: Vocabulary
No ratings yet
C1 Advanced Unit 3 Test: Section 1: Vocabulary
2 pages
Multi - Core Architectures and Programming - Lecture Notes, Study Material and Important Questions, Answers
0% (1)
Multi - Core Architectures and Programming - Lecture Notes, Study Material and Important Questions, Answers
49 pages
A Qualitative Study Exploring Female College Students' Instagram Use and Body Image
100% (1)
A Qualitative Study Exploring Female College Students' Instagram Use and Body Image
6 pages
Computer Science (Optional II) Grade 9-10: Micro Syllabus - Academic Year 2069
100% (1)
Computer Science (Optional II) Grade 9-10: Micro Syllabus - Academic Year 2069
6 pages
Fs Lab Manual
No ratings yet
Fs Lab Manual
57 pages
RDBMS Unit 5
No ratings yet
RDBMS Unit 5
39 pages
Chap - 4: Screen Designing: Visually Pleasing Composition
No ratings yet
Chap - 4: Screen Designing: Visually Pleasing Composition
23 pages
Software Quality: Robert Hughes and Mike Cotterell
No ratings yet
Software Quality: Robert Hughes and Mike Cotterell
46 pages
Model Question Paper
0% (1)
Model Question Paper
2 pages
UNIT IV (Well Posed Leaning Problems)
100% (1)
UNIT IV (Well Posed Leaning Problems)
16 pages
Sample Report 22-23 1
No ratings yet
Sample Report 22-23 1
30 pages
LP I ML Viva Questions
100% (1)
LP I ML Viva Questions
9 pages
Machine Learning Lab Manual (15CSL76)
No ratings yet
Machine Learning Lab Manual (15CSL76)
30 pages
Nested Classes
No ratings yet
Nested Classes
23 pages
Smart Health Disease Prediction Django
No ratings yet
Smart Health Disease Prediction Django
41 pages
Chapter - 2 Taxonomy of Bugs
No ratings yet
Chapter - 2 Taxonomy of Bugs
7 pages
Peephole Optimization PDF
No ratings yet
Peephole Optimization PDF
20 pages
DBMS ER Design Issues - Copy Unit.2
No ratings yet
DBMS ER Design Issues - Copy Unit.2
2 pages
Lab Manual
No ratings yet
Lab Manual
28 pages
Cs3391 Oops Unit 1 Notes Eduengg
No ratings yet
Cs3391 Oops Unit 1 Notes Eduengg
60 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
4 pages
Topic: Classification of Data Structure
No ratings yet
Topic: Classification of Data Structure
24 pages
M. Tech. (Sem-Ii) Theory Examination 2017-18 Distributed Data Base
100% (1)
M. Tech. (Sem-Ii) Theory Examination 2017-18 Distributed Data Base
2 pages
Lab Program
100% (1)
Lab Program
15 pages
3.multicore Architecture and Programming
0% (1)
3.multicore Architecture and Programming
3 pages
DS&BD Lab Manul
No ratings yet
DS&BD Lab Manul
98 pages
Principles of Pervasive Computing
No ratings yet
Principles of Pervasive Computing
15 pages
B.SC (CS) Real Syllabus
No ratings yet
B.SC (CS) Real Syllabus
75 pages
Oopcgl Mini Project
No ratings yet
Oopcgl Mini Project
6 pages
Mini ProjectA17
0% (1)
Mini ProjectA17
25 pages
Narrative Report in Facilitating Learner-Centered Teaching
100% (1)
Narrative Report in Facilitating Learner-Centered Teaching
5 pages
M.E. Bda 2021
No ratings yet
M.E. Bda 2021
64 pages
Advance Java Handwriting Notes
No ratings yet
Advance Java Handwriting Notes
5 pages
BECEd
No ratings yet
BECEd
1 page
Mini Project Report
No ratings yet
Mini Project Report
25 pages
Dbms
No ratings yet
Dbms
99 pages
Unit 3 Full Notes
No ratings yet
Unit 3 Full Notes
30 pages
Deep Learning Techniques Notes
No ratings yet
Deep Learning Techniques Notes
42 pages
IF4071 - Deep Learning Laboratory
No ratings yet
IF4071 - Deep Learning Laboratory
1 page
Unit 1: Database Management System (DBMS) Historical Perspective
100% (1)
Unit 1: Database Management System (DBMS) Historical Perspective
30 pages
Ec 467 Pattern Recognition
No ratings yet
Ec 467 Pattern Recognition
2 pages
wdl%uKYS, S Ajh (Aggression) : Ironne Jayasekara National Institute of Social Development
No ratings yet
wdl%uKYS, S Ajh (Aggression) : Ironne Jayasekara National Institute of Social Development
16 pages
Data Analytics Lab File Rohit
No ratings yet
Data Analytics Lab File Rohit
23 pages
Q4-Module 7-2nd Sem-3Is
No ratings yet
Q4-Module 7-2nd Sem-3Is
8 pages
Machine Learning Quantum
No ratings yet
Machine Learning Quantum
64 pages
Agriculture Management System-3
No ratings yet
Agriculture Management System-3
22 pages
Cs3451 Ios Unit 5 Notes
No ratings yet
Cs3451 Ios Unit 5 Notes
21 pages
Lab Manual - LP2 - Sem - II - 2022 - 23
No ratings yet
Lab Manual - LP2 - Sem - II - 2022 - 23
91 pages
CS3352 Foundations of Data Science Apr May 2023 Question Paper Download
No ratings yet
CS3352 Foundations of Data Science Apr May 2023 Question Paper Download
3 pages
Cp4251 Internet of Things
No ratings yet
Cp4251 Internet of Things
61 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
23 pages
Term Paper
No ratings yet
Term Paper
13 pages
OOSE Lab Report
No ratings yet
OOSE Lab Report
30 pages
Numericals CG Unit II
No ratings yet
Numericals CG Unit II
7 pages
Week 4 - Performance Task (12 & Abm 2)
No ratings yet
Week 4 - Performance Task (12 & Abm 2)
4 pages
SDL Practice Question Form
No ratings yet
SDL Practice Question Form
5 pages
Unit 2 Introduction To Deep Learning
No ratings yet
Unit 2 Introduction To Deep Learning
79 pages
CP4252 Machine Learning Lab Manual
No ratings yet
CP4252 Machine Learning Lab Manual
33 pages
7th Sem Updated Lab Manual
No ratings yet
7th Sem Updated Lab Manual
14 pages
Lect 1
No ratings yet
Lect 1
15 pages
Centers and Peripheries Revisited
No ratings yet
Centers and Peripheries Revisited
29 pages
ML Unit-1
No ratings yet
ML Unit-1
32 pages
From A Rational Standpoint: Analyzing Nuances in The Utility of Western Psychological Tests As Assessment Tools in The Philippines
No ratings yet
From A Rational Standpoint: Analyzing Nuances in The Utility of Western Psychological Tests As Assessment Tools in The Philippines
6 pages
Module 1 Meet and Greet
No ratings yet
Module 1 Meet and Greet
2 pages
Toc Mod 5 Notes
No ratings yet
Toc Mod 5 Notes
41 pages
Chapter 5 - Applying Theories of Learning To Health Care Practice (Part 2)
No ratings yet
Chapter 5 - Applying Theories of Learning To Health Care Practice (Part 2)
3 pages
Lecture 4 IMB 516
No ratings yet
Lecture 4 IMB 516
26 pages
Mengatur Barang SCR CT
No ratings yet
Mengatur Barang SCR CT
2 pages
Mzumbe 13 October 2020 Presentation Lisa Mzumbe
No ratings yet
Mzumbe 13 October 2020 Presentation Lisa Mzumbe
15 pages
TFN 100
No ratings yet
TFN 100
2 pages
STM Question Paper R18
No ratings yet
STM Question Paper R18
2 pages
Critics of Open System Theory
No ratings yet
Critics of Open System Theory
6 pages
Linguistics S2 P2
No ratings yet
Linguistics S2 P2
3 pages
Factors and Tables
No ratings yet
Factors and Tables
10 pages
Pull and Push Factors in Can Tho
No ratings yet
Pull and Push Factors in Can Tho
13 pages
Textbook ML - Removed
No ratings yet
Textbook ML - Removed
10 pages
Safety Intelligence As An Essential Perspective For Safety Management in The Era of Safety 4.0 - From A Theoretical To A Practical Framework
No ratings yet
Safety Intelligence As An Essential Perspective For Safety Management in The Era of Safety 4.0 - From A Theoretical To A Practical Framework
11 pages
Arduino - Architecture, Programming and Application
No ratings yet
Arduino - Architecture, Programming and Application
64 pages
Data Engineering UNIT-1
100% (1)
Data Engineering UNIT-1
14 pages
AI Startups Mapping in Africa
No ratings yet
AI Startups Mapping in Africa
47 pages
Pre Test
No ratings yet
Pre Test
37 pages
MajGEd 305 - Activity 2
No ratings yet
MajGEd 305 - Activity 2
2 pages
Transform and Conquer, Presorting
100% (1)
Transform and Conquer, Presorting
2 pages
VR in Education Presentation Script
No ratings yet
VR in Education Presentation Script
5 pages
Developmental Psychology II 2024
No ratings yet
Developmental Psychology II 2024
2 pages
Personal Statement ESEP-G2025
No ratings yet
Personal Statement ESEP-G2025
3 pages
List Participant
No ratings yet
List Participant
1 page
Introduction to Linux: Installation and Programming
From Everand
Introduction to Linux: Installation and Programming
N. B. Venkateswarlu
No ratings yet

Email Classification: Roll No-41463 (LP-3)

Uploaded by

Email Classification: Roll No-41463 (LP-3)

Uploaded by

Roll No- 41463 (LP-3)

Dataset used: https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

In [1]: import numpy as np

5 rows × 3002 columns

5 rows × 3002 columns

RangeIndex: 5172 entries, 0 to 5171

Columns: 3002 entries, Email No. to Prediction

dtypes: int64(3001), object(1)

memory usage: 118.5+ MB

count 5172.000000 5172.000000 5172.000000 5172.000000 5172.000000 5172.000000 5172.00000

mean 6.640565 6.188128 5.143852 3.075599 3.124710 2.627030 55.51740

std 11.745009 9.534576 14.101142 6.045970 4.680522 6.229845 87.57417

min 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.00000

25% 0.000000 1.000000 1.000000 0.000000 1.000000 0.000000 12.00000

50% 3.000000 3.000000 1.000000 1.000000 2.000000 1.000000 28.00000

75% 8.000000 7.000000 4.000000 3.000000 4.000000 2.000000 62.25000

max 210.000000 132.000000 344.000000 89.000000 47.000000 77.000000 1898.00000

8 rows × 3001 columns

Out[6]: Email No. 0

Length: 3002, dtype: int64

Splitting Train and Test dataset

In [8]: x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2,

a) Using K-Nearest Neighbours

In [10]: print("MSE: ", mean_squared_error(y_test, y_pred))

Accuracy Score for KNN: 0.8743961352657005

b) Using Support Vector Machine(SVM)

Accuracy Score for KNN: 0.9285024154589372

You might also like