Spam Detection

The paper discusses the use of machine learning algorithms, specifically Naive Bayes, Support Vector Machines, and Random Forests, for detecting email spam and malware. The Random Forest classifier achieved the highest accuracy of 97%, outperforming the other models. The authors aim to improve the model further to achieve 100% accuracy in future work.

e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science

Volume:02/Issue:09/September -2020 Impact Factor- 5.354 www.irjmets.com

EMAIL SPAM AND MALWARE DETECTION USING MACHINE LEARNING

Sudipta Ghosh*1, Subhojit Jalal*2
*1 Student, Department of Electronics and Communication Engineering, Amity University,
Kolkata, West Bengal, India.
*2 Student, Department of Mechanical and Automation Engineering, Amity University, Kolkata,
West Bengal, India.
ABSTRACT
Spam email is unwanted, unsolicited digital communication sent over the internet to an
individual, a company, or a group of individuals. Machine learning algorithms are commonly
used to detect spam email and malware. This paper applies three machine learning
algorithms to email spam detection: Naive Bayes, Support Vector Machines (SVM), and Random
Forests (Bagging). The algorithms are described and their accuracy scores are reported:
0.93 for Naive Bayes, 0.90 for SVM, and 0.97 for Random Forest. The Random Forest
classifier performed best among the three.
Keywords: email spam, classifier, Naive Bayes, Support Vector Machines, Random Forest (Bagging).
I. INTRODUCTION
Email is one of the most cost-effective and commonly used communication systems in the
world. Emails can be sent and received from any computer or mobile phone, anywhere in the
world, wherever an internet connection is available. But email is increasingly threatened
by spam: a shotgun approach that is uninvited, unwanted, and unwelcome to the receiver.
Spam is typically sent to a random audience or company and is often characterized by
misleading subject lines and poorly crafted text. It wastes the receiver's time and the
marketing department's money, and it damages company reputation. Spam also consumes
network capacity by generating a large amount of unwanted data. Recent statistics indicate
that around 40% of all emails are spam, which is about 15.4 billion emails per day and
costs internet users about $355 million per year. In this paper we propose a machine
learning approach to detect spam and malware in email. Three models are built to detect
spam emails: Naive Bayes, Support Vector Machines (SVM), and Random Forest.
We obtained our dataset from Kaggle, a data analysis website, analyzed it, and
investigated three models. First, we identified the spam emails in the dataset and
separated them from the rest; then we started the prediction.
II. METHODOLOGY

1)Naïve Bayes classifier method
In this project we classify emails typed in by the user as either 'Spam' or 'Not Spam'. Our
original dataset was a folder of 5172 text files containing the emails. This is a
text-classification problem: when a spam classifier looks at an email, it searches for
words that it has seen in previous spam emails.
CASE 1: Take the word 'Greetings'. Say it is present in both 'Spam' and 'Not Spam' mails.
CASE 2: Consider the word 'lottery'. Say it is present only in 'Spam' mails.
CASE 3: Consider the word 'cheap'. Say it is present only in 'Spam' mails.
If we now get a test email that contains all three words mentioned above, there is a high
probability that it is a 'Spam' mail.
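Naive Bayes is a widely used and effective algorithm for text-classification problems. It is
based on the classic Bayes' theorem, applied to every individual word in the test data under
the assumption that the words are independent of one another. The class with the higher
conditional probability is the predicted result.
Suppose our test email (S) is "You have won a lottery".
\[ P(S) = P(\text{You})\, P(\text{have})\, P(\text{won})\, P(\text{a})\, P(\text{lottery}) \tag{1} \]
Therefore,
\[ P(S \mid \text{Spam}) = P(\text{You} \mid \text{Spam})\, P(\text{have} \mid \text{Spam})\, P(\text{won} \mid \text{Spam})\, P(\text{a} \mid \text{Spam})\, P(\text{lottery} \mid \text{Spam}) \tag{2} \]
The same calculation gives
\[ P(S \mid \text{Not Spam}) = P(\text{You} \mid \text{Not Spam})\, P(\text{have} \mid \text{Not Spam})\, P(\text{won} \mid \text{Not Spam})\, P(\text{a} \mid \text{Not Spam})\, P(\text{lottery} \mid \text{Not Spam}) \tag{3} \]
If (2) > (3), then 'Spam'; else 'Not Spam'.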
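The comparison of the two conditional probabilities can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy corpus, the function names, and the Laplace smoothing constant `alpha` (added so that unseen words do not force a probability of zero) are all assumptions. As in the equations above, class priors are left out.

```python
from collections import Counter

# Hypothetical toy corpus (not the paper's Kaggle data), for illustration only.
SPAM = ["you have won a lottery", "cheap lottery tickets", "won cheap prize"]
HAM = ["greetings you have a meeting", "have a nice day", "greetings from the team"]


def word_counts(docs):
    """Count how often each word occurs across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def likelihood(message, counts, vocab_size, alpha=1.0):
    """P(S | class) as a product of per-word probabilities (Laplace-smoothed)."""
    total = sum(counts.values())
    prob = 1.0
    for word in message.lower().split():
        prob *= (counts[word] + alpha) / (total + alpha * vocab_size)
    return prob


def classify(message):
    """Return 'Spam' if P(S | Spam) > P(S | Not Spam), else 'Not Spam'."""
    spam_counts, ham_counts = word_counts(SPAM), word_counts(HAM)
    vocab_size = len(set(spam_counts) | set(ham_counts))
    p_spam = likelihood(message, spam_counts, vocab_size)  # product for the Spam class
    p_ham = likelihood(message, ham_counts, vocab_size)    # product for the Not Spam class
    return "Spam" if p_spam > p_ham else "Not Spam"
```

Calling `classify("You have won a lottery")` on this toy corpus favours the spam class, because 'won' and 'lottery' occur only in the spam examples. For long emails, summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow.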
2)Support Vector Machines
The Support Vector Machine is one of the most popular algorithms for classic classification
problems. SVMs work on the principle of the maximal margin: they find the decision boundary
that maximizes the margin between the support vectors of the two classes (in binary
classification). In practice the most effective variant is the soft-margin classifier, which
tolerates a few misclassifications: the model accepts a slightly higher bias (a slightly
poorer fit to the training data) in exchange for lower variance on unseen data.
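To make the soft-margin idea concrete, the sketch below trains a linear SVM by subgradient descent on the hinge loss. It is an illustration under stated assumptions rather than the model used in the paper: the 2-D points are hypothetical, the bias term is omitted for brevity, and the names `c`, `lr`, and `epochs` are illustrative parameters. The hinge loss of a sample is its slack: zero when the sample lies outside the margin, positive when it is inside the margin or misclassified.

```python
# Hypothetical 2-D feature vectors; label +1 = spam, -1 = not spam.
POINTS = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge_loss(w, x, y):
    """Slack of one sample: 0 outside the margin, positive otherwise."""
    return max(0.0, 1.0 - y * dot(w, x))


def train_linear_svm(points, labels, c=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + c * sum(hinge) by subgradient descent (no bias term)."""
    w = [0.0] * len(points[0])
    for _ in range(epochs):
        for x, y in zip(points, labels):
            violated = y * dot(w, x) < 1.0  # sample is inside the margin
            for j in range(len(w)):
                grad = w[j] - (c * y * x[j] if violated else 0.0)
                w[j] -= lr * grad
    return w


def predict(w, x):
    """Classify by the side of the decision boundary the point falls on."""
    return 1 if dot(w, x) >= 0 else -1
```

A larger `c` penalises slack more heavily and pushes the model toward a hard margin; a smaller `c` tolerates more margin violations in exchange for a wider margin.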
3)Random Forests (Bagging)
A random forest has nearly the same hyperparameters as a decision tree or a bagging
classifier. Ensemble methods such as this combine many weak models into a much more
powerful one.
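The bagging idea behind a random forest can be sketched as follows. This is a simplified illustration, not the classifier used in the paper: each "tree" is a one-feature decision stump, the single feature (a count of suspicious words per email) and the data are hypothetical, and the random feature subsampling of a full random forest is left out.

```python
import random
from collections import Counter

# Hypothetical data: (count of suspicious words, label), 1 = spam, 0 = not spam.
DATA = [(0, 0), (0, 0), (1, 0), (1, 0), (4, 1), (5, 1), (6, 1), (7, 1)]


def bootstrap(data, rng):
    """Sample len(data) items with replacement (the 'bagging' step)."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """One-feature decision stump: pick the threshold with the fewest training errors."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        # The stump predicts spam when x > t; count disagreements with the labels.
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_forest(data, n_estimators=100, seed=0):
    """Train one stump per bootstrap sample; the forest is a list of thresholds."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap(data, rng)) for _ in range(n_estimators)]


def predict(forest, x):
    """Majority vote over all stumps: 1 = spam, 0 = not spam."""
    votes = Counter(int(x > t) for t in forest)
    return votes.most_common(1)[0][0]
```

Each stump sees a slightly different resampling of the data, so individual thresholds vary, but the majority vote over `n_estimators` stumps is more stable than any single one.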
III. MODELING AND ANALYSIS
The models and materials used are presented in this section.

[Figure: heatmap generated for the model.]

The Naive Bayes model works well, with an accuracy of 0.93.

The SVM's performance is slightly poorer than that of Naive Bayes, with an accuracy of 0.90.
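Accuracy scores such as these are derived from a comparison of predicted and true labels, and a heatmap in this setting is commonly a plot of the confusion matrix; that the paper's heatmap is one is an assumption here. The sketch below computes both quantities for hypothetical labels (1 = spam, 0 = not spam) that are not taken from the paper's data.

```python
# Hypothetical true and predicted labels for ten emails.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
Y_PRED = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (1 = spam, 0 = not spam)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix


def accuracy(matrix):
    """Fraction of emails on the diagonal, i.e. classified correctly."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total
```

Accuracy alone can hide the difference between a spam email that gets through and a legitimate email that is blocked, so inspecting the off-diagonal cells separately is often worthwhile.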

IV. RESULTS AND DISCUSSION

The Random Forest classifier performs the best among the three. Decision tree classifiers
are strong classifiers on their own, and random forest is a popular ensemble model that uses
a forest of decision trees. Consequently,

combining the predictions of 100 trees (estimators = 100 here) creates a powerful model.
The model achieves an accuracy of 97%. In future work we will try to eliminate the
remaining errors, that is, to reach 100% accuracy. This model can be used in real-life
scenarios so that people do not face this problem in the future.
V. CONCLUSION
In this paper we successfully used machine learning algorithms to create three models, of
which the random forest classifier performs better than the other two. The model helps us
detect spam messages in email. As the accuracy did not reach 100%, we will try to improve
the model toward 100% accuracy as future work.