Harnessing the power of Machine Learning

For Email Spam Classification

Utkarsh Gupta
(6th Semester, Section 'G', Roll No. 67)
Computer Science and Engineering

Graphic Era Hill University

Dehradun, Uttarakhand

[email protected]

Abstract: Email traffic has grown steadily in day-to-day life, and spam has grown with it, becoming one of the biggest problems affecting our globally integrated communication system. Earlier solutions for filtering and hiding spam included blacklisting specific domains that send spam and manually detecting specific keywords. A lot of research has been done to make spam filtering more accurate in classifying emails as ham (real or valid email) or spam using ML classifiers. This system uses machine learning techniques to detect patterns of repetitive keywords that are classified as spam. Even so, we still receive a lot of spam email in our daily life. This is not a failure of the filters; it happens because spammers adopt emerging technology. Among the approaches developed to reduce email spam, filtering is an important technique, and research remains important in the field of spam classification.

Keywords: ML algorithm, email spam classifier, spam, spam filter

I. INTRODUCTION

Email has become an essential part of our lives, and internet usage has increased drastically over the past few years. Social media application usage has also increased. As the volume of mail arriving in our inboxes grows daily, email spam has grown as well. Data collected from the internet shows that the number of emails sent and received per day was 347.3 billion in 2023, a 4.3% increase over the 333.2 billion recorded in 2022. These figures show how email usage is increasing with the
passing years. With this growth in email volume, it is difficult to differentiate between a real email and a spam email, which raises cyber security concerns.

Spammers collect our email addresses through chatrooms, websites and newsgroups and sell them to other spammers, so the number of spam messages increases rapidly. According to 2023 data, 3.4 billion spam emails are sent every day. Google alone blocks approximately 100 million spam emails daily, and over 45% of the emails sent in 2022 were spam. To reduce spam we therefore need technology that distinguishes spam from real email. Google's system that delays the delivery of some Gmail messages for a short period has improved its performance in detecting phishing attacks, since these attacks are easier to spot when they are examined together. Delaying the delivery of some of these questionable emails allows a more thorough investigation while additional messages arrive and the algorithms are updated in real time. This intentional delay affects only 0.05 percent of emails.

Machine learning has revolutionized the way complex problems are solved. It provides powerful algorithms and techniques that learn from real-time or historical data and make accurate predictions, which helps tackle the challenge of email spam.

Leading email providers like Gmail, Yahoo Mail, and Outlook have combined a variety of machine learning (ML) techniques, including neural networks, in their spam filters to tackle the danger posed by email spam. These approaches learn to recognise spam emails and phishing communications by examining a large number of such messages across a large network of computers. Gmail and Yahoo Mail spam filters go beyond scanning emails against pre-existing rules, since machine learning can adapt to changing circumstances: as they continue filtering, they create new rules on their own from what they have learned.

In this paper, we explore machine learning-based email spam classification, from its fundamental ideas to its practical uses. We go into detail about the significance of feature engineering, data preparation, model selection, and assessment metrics, all of which are crucial for creating a reliable email spam classifier. A reliable classifier for email spam would have far-reaching effects: by clearing out the clutter in our inboxes, individuals may increase productivity, safeguard their privacy, and lessen the threats posed by harmful email. We discuss a number of topics related to classifying email spam in the following sections, such as the machine learning techniques that are frequently employed, the difficulties encountered in real-world situations, and methods for enhancing classifier performance over time. This paper offers useful information to anyone interested in the inner workings of this technology, whether they are aspiring data scientists, cybersecurity enthusiasts, or just curious.

II. LITERATURE SURVEY

One of the major problems of today's internet is spam email, which causes financial damage to individual users and to companies. Among the approaches developed to stop spam, filtering techniques are the most important. Filtering removes unsolicited emails from the user's inbox; such emails fill up mailboxes and waste users' time [1]. Two different methods are distinguished in [2]. In the first, rules are defined manually, for example in a rule-based expert system; this works when all classes are static and can be easily separated by a few features. The second method relies on machine learning techniques. Paper [3] uses a collection of criterion functions to define the clustering of spam messages, which amounts to finding similar keywords between messages in a cluster and can also be defined with the help of the k-nearest neighbor (KNN) algorithm. In [4], four classifiers are compared: Neural Network, SVM, Naïve Bayesian and J48. They were evaluated on different data and attribute sizes, and the final output is '1' for spam and '0' for not spam.

In papers [5]-[7], automatic anti-spam filtering is described as becoming an important feature of the internet for the
growing family of junk-filtering tools. In one method, researchers compute separate distance measures for numeric and nominal variables and then combine them into an overall distance measure. In a second method, the nominal variables are converted to numeric variables, and the distance measure is then calculated on those variables. The researchers in [8] analyzed the computational complexity of the algorithm and tested it on a number of data sets taken from different domains. The main observations are [9] that spammers send either only phishing emails or no phishing emails at all, [10] that most spammer communities send either only phishing or no phishing emails, and [11] that many different groups of spammers exhibit related behavior within communities or share the same IP addresses [12]. It has been explained that both methods generalize little from small examples and show similar generalization behavior on this type of problem, even with large training sets [13].

Goodman et al. summarized approaches other than machine learning in email spam filtering and stated that, while spam filtering is in the user's control, the real conflict is between the generators of spam and the researchers working against it [14].

A client can send or receive an email with a single click through an ISP. Client-level spam filtering provides a framework for an individual client to secure his or her mail transmission system. A client can do this by installing one of several existing frameworks on their PC or system. The installed framework interacts directly with the mail user agent (MUA) and filters the inbox by accepting and managing incoming messages [15].

Although spam filters lighten the burden on the receiver, there is still a need for an email spam detection system that gives results more efficiently and accurately. A system that gives user-specific results is also desirable, so that the system developed is user friendly.

III. METHODOLOGY

Most email spam cleaning techniques developed so far are based purely on text classification, so spam filtering has turned into a text classification problem. In my paper, work is done to extract an attribute vector from the text of each email. Three machine learning algorithms, SVC, Multinomial Naïve Bayes and Decision Tree Classifier, are used to train the model.

[Flowchart boxes: Dataset, Pre-process data, Split data, Train data, Test data, Make classification, Check for accuracy, Spam email, Not spam]

Figure 1: Architecture of the proposed system

1. Collection and preprocessing of data:

Dataset: Collection of a labeled dataset which contains spam and non-spam emails.

Preprocessing:

• Tokenization: Breaking the emails into separate words.
• Lowercasing: Converting each message to lowercase to ensure uniformity.
• Removing of Stop Words: Eliminating common words like "and" or "the".
• Feature Extraction: Converting the text data to numeric attributes such as TF-IDF (term frequency-inverse document frequency) values.

These keywords characterize non-spam emails, which are safe and do not contain any wrong information or any breach of cyber security.

Figure 2. Common keywords in non-spam mails

This graph shows the specific keywords used most often in recent emails that are not spam.

These keywords characterize spam emails, which are not safe and contain wrong or false information or viruses that may harm the system.

Figure 3. Common keywords in spam mails

2. Model Selection:

Support Vector Classifier (SVC):

Kernel Selection: Choosing the appropriate kernel (linear, radial basis function, etc.) based on implementation and performance.

Hyperparameter Tuning: Using techniques such as grid search or randomized search to find the optimal hyperparameters for the SVC model.

Multinomial Naive Bayes (MultinomialNB):

Text Vectorization: Converting the textual data into a numeric format suitable for MultinomialNB using a technique like TF-IDF.

Only two categories are required here: spam or non-spam (ham). Almost every statistics-based spam filter uses Bayesian probability to combine the statistics of individual tokens into an overall score and makes its decision based on that score. These filters usually first go through a training stage that collects each token's statistics. The statistic we are usually interested in for a token T is its spamminess, calculated as follows:

S[T] = C_spam(T) / (C_spam(T) + C_ham(T))

where C_spam(T) and C_ham(T) are the numbers of spam and ham messages that contain token T. An easy way to classify a message is to compute the combined score of its tokens for spam and compare it with the combined score for ham.

A message M is classified as spam if its overall spamminess product S[M] is greater than its hamminess product H[M].

Stage 1. Training - Parse every email into its constituent tokens and compute a probability for each token W: S[W] = C_spam(W) / (C_ham(W) + C_spam(W)). Save the spamminess values to a database for Stage 2.

Stage 2. Filtering - For every message M, scan the message for each incoming token Ti and query the database for its spamminess S(Ti). Then calculate the accumulated message probabilities S[M] and H[M].

Next, calculate the overall message filtering indication:

I[M] = f(S[M], H[M])

where f is a filter-dependent function.

If I[M] > threshold
    the message is declared spam
else
    the message is declared non-spam
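The two stages above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names, the tiny stop-word list, the clamping constant `eps`, and the choice of H[M] as the product of (1 - S(Ti)) are assumptions made for the example, and the decision rule is the simple comparison S[M] > H[M].

```python
from collections import Counter

STOP_WORDS = {"and", "the", "a", "to", "is", "at"}  # tiny illustrative list (assumption)

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?:;\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def train(messages):
    """Stage 1: count, per token, how many spam and ham messages contain it."""
    c_spam, c_ham = Counter(), Counter()
    for text, label in messages:
        target = c_spam if label == "spam" else c_ham
        target.update(set(tokenize(text)))  # each message counts a token once
    tokens = set(c_spam) | set(c_ham)
    # S[W] = C_spam(W) / (C_ham(W) + C_spam(W))
    return {w: c_spam[w] / (c_spam[w] + c_ham[w]) for w in tokens}

def classify(text, spamminess, eps=0.01):
    """Stage 2: look up S(Ti) for each known token, then compare S[M] with H[M]."""
    s_m = h_m = 1.0
    for tok in set(tokenize(text)):
        if tok not in spamminess:
            continue  # unseen tokens are skipped (assumption)
        s = min(max(spamminess[tok], eps), 1.0 - eps)  # clamp away from 0 and 1
        s_m *= s          # accumulated spamminess product S[M]
        h_m *= 1.0 - s    # accumulated hamminess product H[M] (assumed form)
    return "spam" if s_m > h_m else "ham"

# Toy training data, invented for illustration only.
training = [
    ("Call now to win a free prize", "spam"),
    ("Free entry, call today", "spam"),
    ("Are we still meeting at noon", "ham"),
    ("Call me after the meeting", "ham"),
]
table = train(training)
print(classify("Win a free prize today", table))  # spam
print(classify("Meeting at noon", table))         # ham
```

Clamping each score away from 0 and 1 keeps a single token from forcing either product to zero; a production filter would more likely sum logarithms to avoid underflow on long messages.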

Decision Tree Classifier:

Tree Depth: Experimenting with multiple tree depths to find a suitable balance between underfitting and overfitting.
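The depth experiment can be illustrated with a small hand-written tree. The sketch below is not the classifier used in this work: it is a simplified greedy tree over binary keyword features with Gini impurity, and the feature names, toy data and helper functions are all hypothetical. It only shows the shape of the experiment, which is to fit the same data at several depths and compare training accuracy with accuracy on held-out data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, max_depth, depth=0):
    """Greedy tree on binary features; a leaf is a label, a node is (feature, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        left = [i for i, r in enumerate(rows) if not r[f]]
        right = [i for i, r in enumerate(rows) if r[f]]
        if not left or not right:
            continue  # this feature does not split the data
        score = sum(len(ix) * gini([labels[i] for i in ix]) for ix in (left, right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority
    _, f, left, right = best
    return (f,
            build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth, depth + 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth, depth + 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if row[f] else left
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical features: [contains "free", contains "call", contains "meeting"]; 1 = spam.
train_x = [[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0],
           [0, 1, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
val_x = [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
val_y = [1, 0, 1, 0]

train_acc, val_acc = [], []
for depth in range(4):
    tree = build_tree(train_x, train_y, max_depth=depth)
    train_acc.append(accuracy(tree, train_x, train_y))
    val_acc.append(accuracy(tree, val_x, val_y))
    print(depth, train_acc[-1], val_acc[-1])
```

On real data, training accuracy typically keeps rising with depth while validation accuracy levels off or drops; the depth at which validation accuracy peaks is the usual choice.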

3. Training and Evaluation:

Training:

The dataset is split into training and testing sets, for example 70% for training and the remaining 30% for testing.

Each classifier (SVC, Multinomial Naïve Bayes, and Decision Tree Classifier) is trained on the training set.

Evaluation Metrics:

Each model's performance is assessed using metrics such as accuracy, precision, and the confusion matrix.

This graph shows the specific keywords used most often in recent spam emails; as we can see, the keyword "call" appears in most of them.

IV. RESULTS

In my paper, a model is trained with machine learning algorithms to detect whether a received mail is spam or not. The model uses the spam base dataset. After the dataset was selected, it was cleaned and processed so that no null attributes remained. The attributes of the email dataset were scaled using min_max_scaler before training the model with the three machine learning algorithms. The dataset was then divided into x and y attributes, which were further divided into x_train, x_test, y_train and y_test. These train and test sets were then used with the three machine learning algorithms.
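The scaling and splitting steps described above can be sketched without any library. The helpers below are hand-written stand-ins, not the calls used in this work: `min_max_scale` and `train_test_split` are illustrative names, the 30% test fraction follows the example in Section III, and the data and random seed are invented.

```python
import random

def min_max_scale(rows):
    """Rescale every column to [0, 1] using (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
            for row in rows]

def train_test_split(x, y, test_size=0.3, seed=0):
    """Shuffle the row indices and hold out a fraction of the rows for testing."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(x) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return ([x[i] for i in train], [x[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])

# Invented numeric attributes (two columns) and labels, for illustration only.
x = [[i, 100 - 5 * i] for i in range(10)]
y = [i % 2 for i in range(10)]

x_scaled = min_max_scale(x)
x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, test_size=0.3)
print(len(x_train), len(x_test))  # 7 3
```

The sketch scales before splitting, following the order described above. A common alternative is to compute the minimum and maximum on the training rows only and reuse them for the test rows, so that no information from the test set influences the features.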

The algorithms used in this approach are SVC, Multinomial NB and Extra Trees Classifier. The accuracies achieved by these algorithms are 97.29%, 95.93% and 97.77% respectively, and the overall combined accuracy is 97.87% with a precision of 93.28%.

This graph shows the comparison between SVC, K Neighbors Classifier, Multinomial NB, Decision Tree Classifier, Logistic Regression, Random Forest Classifier, AdaBoost Classifier, Bagging Classifier, Extra Trees Classifier, Gradient Boosting Classifier and XGB Classifier. Among them, Extra Trees Classifier, Multinomial NB and SVC show the best accuracy and precision.
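The results report a combined score for SVC, Multinomial NB and Extra Trees Classifier without stating how the three predictions are merged. One common option is hard majority voting, sketched below as an assumption, together with the accuracy, precision and confusion-matrix metrics named in Section III. The prediction lists are invented for illustration and are not outputs of the trained models.

```python
def majority_vote(*model_predictions):
    """Hard voting: predict spam (1) for a message when most models say spam."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*model_predictions)]

def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = ham, 1 = spam)."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    return [[tn, fp], [fn, tp]]

def accuracy(y_true, y_pred):
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

# Invented labels and per-model predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
svc_pred = [1, 1, 0, 0, 0, 1, 0, 1]
mnb_pred = [1, 0, 1, 0, 0, 0, 0, 1]
et_pred = [1, 1, 1, 0, 1, 0, 0, 0]

combined = majority_vote(svc_pred, mnb_pred, et_pred)
for name, pred in [("SVC", svc_pred), ("MNB", mnb_pred), ("ET", et_pred), ("vote", combined)]:
    print(name, accuracy(y_true, pred), precision(y_true, pred))
```

In this toy case each model makes different mistakes, so the vote corrects all of them. Voting helps only to the extent that the individual models' errors are not shared.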

After combining SVC, Multinomial NB and Extra Trees Classifier, their accuracy and precision show the best result.

Accuracy 0.9816247582205029
Precision 0.9917355371900827

V. CONCLUSION

Email spam and email fraud have become a pressing issue for internet communication. Spammers generate spam emails and misuse them in ways that can harm organizations and individuals. Many email spam filtering tools already exist, but the persistence of spammers and the development of new technology keep spam filtering a challenging research topic. These techniques can be used at the mail server or at the mail client to reduce the rate of spam messages, the risk of future loss, and storage usage. This system focuses specifically on separating emails into two categories, spam and non-spam, which has many implications for both organizations and individual users.

REFERENCES

[1] Kh. Ahmed, "An overview of content-based spam filtering techniques," Informatica, vol. 31, no. 3, pp. 269–277, 2007.

[2] I. Biro, J. Szabo, and A. A. Benczur, "Latent Dirichlet Allocation in Web Spam Filtering," in Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2008.

[3] A. Perkins, "The classification of search engine spam," http://www.ebrandmanagement.com/whitepapers/spam-classification.

[4] Seongwook Youn and Dennis McLeod, "A Comparative Study for Email Classification," Los Angeles, CA 90089, USA, 2006.

[5] I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, G. Paliouras, and C. D. Spyropoulos, "An Evaluation of Naive Bayesian Anti-Spam Filtering," in Proceedings of the Workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning, Barcelona, Spain, pp. 9–17, 2000.

[6] I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, and C. D. Spyropoulos, "An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Encrypted Personal Messages," in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 2000.

[7] C. Apte and F. Damerau, "Automated Learning of Decision Rules for Text Categorization," ACM Transactions on Information Systems, vol. 12, no. 3, pp. 233–251, 1994.

[8] X. Li and N. Ye, "A supervised clustering and classification algorithm for mining data with mixed variables," IEEE Transactions on Systems, Man, and Cybernetics Part A, vol. 36, no. 2, pp. 396–406, 200

[9] I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, G. Paliouras, and C. D. Spyropoulos, "An Evaluation of Naive Bayesian Anti-Spam Filtering," in Proceedings of the Workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning, Barcelona, Spain, pp. 9–17, 2000.

[10] I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, and C. D. Spyropoulos, "An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Encrypted Personal Messages," in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 2000.

[11] C. Apte and F. Damerau, "Automated Learning of Decision Rules for Text Categorization," ACM Transactions on Information Systems, vol. 12, no. 3, pp. 233–251, 1994.

[12] A. Bratko, B. Filipic, G. Cormack, T. Lynam, and B. Zupan, "Spam Filtering Using Statistical Data Compression Models," The Journal of Machine Learning Research, pp. 2673–2698, 2006.

[13] W. Cohen, "Learning rules that classify e-mail," in Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access, Palo Alto, California, 1996.

[14] K. Tretyakov, "Machine learning techniques in spam filtering," in Data Mining Problem-oriented Seminar, MTAT, vol. 3, no. 177, pp. 60–79, May 2004.

[15] O. Saad, A. Darwish, and R. Faraj, "A survey of machine learning techniques for Spam filtering," International Journal of Computer Science and Network Security (IJCSNS), vol. 12, no. 2, p. 66, 2012.
