Reverse of E-Mail Spam Filtering Algorithms To Maintain E-Mail Deliverability

The document proposes an algorithm to reduce the rate of false positive emails identified by spam filters. The algorithm performs a series of checks on emails before sending to alert the user about spammy elements and suggest changes. The goal is to help users ensure their emails are delivered by avoiding triggers that cause false positive identification.

Reverse of E-mail Spam Filtering Algorithms to Maintain E-mail Deliverability

Hussah AlRashid, Rasheed AlZahrani, Eyas ElQawasmeh

Information Systems Department
College of Computer and Information Sciences
King Saud University
Riyadh, Saudi Arabia

Abstract—Over the past years, research has highlighted the importance of enhancing the performance of e-mail spam filters to eliminate the risk of false negative e-mails. On the other hand, the problem of false positive e-mails has received less attention despite the fact that it is critical and may cause a failure in the delivery of important e-mails.

The aim of this research is to provide a solution to reduce the rate of false positive e-mails. It addresses the problem by exploring the behavior of the existing e-mail spam filters and highlighting the different reasons behind the failure of e-mail delivery. Based on this investigation, we developed an algorithm that helps e-mail users in ensuring the deliverability of their e-mails. The proposed algorithm is based on reversing the mechanism of spam filters on the client-side.

Keywords— Electronic mail, e-mail, ham, spam, false positive, false negative, spam filters, spam filtering.

I. INTRODUCTION
Nowadays, the exponential growth of the Internet has led to the creation of many means of electronic communication around the world. Electronic mail (or e-mail) is considered one of the most popular methods of communication.

But as we all know, technology is a double-edged sword. Although e-mail has several advantages, some people tend to misuse technology to support their own malicious intentions. In the case of e-mail, some people may use this technology to send a deceptive message that contains malicious code (such as a virus or worm) to harm the recipient. To overcome this issue, computer specialists developed what is known today as spam filters. Their work relies on powerful algorithms to classify e-mails into two categories: legitimate (ham) and illegitimate (spam) e-mails.

Spam filters were originally developed to prevent spam e-mail from reaching the recipient’s inbox. However, a few problems have occurred as a result of using spam filters. False positive and false negative e-mails are among the critical limitations of spam filters. False negative e-mails are those e-mails that are truly spam but are misclassified as legitimate e-mails [1]. On the other hand, false positive e-mails are legitimate e-mails that are misclassified as spam [1]. This paper focuses on the problem of false positive e-mails, trying to identify its causes and propose an algorithm that could help in reducing its rate.

The organization of this paper is as follows. The first section lists some of the previous work done in this area. The second section describes the details of the algorithm that we developed in order to reduce the false positive rate of e-mails. The third section presents the experiment that we carried out in order to evaluate the performance of the proposed algorithm. Finally, the last section concludes the work that has been done in this research and lists the future work that lays the foundation for further enhancements.

II. RELATED WORK
A lot of work has been concerned with improving the performance of e-mail spam filters to avoid the misclassification of e-mail messages. Many studies were aimed at reducing the spam filters’ false negative rate. However, to the authors’ knowledge, only a few studies have focused on improving the performance of spam filters to decrease their false positive rate.

Since the proposed algorithm takes advantage of the existing spam filters, we give a short overview of them in what follows.

Authors of [2] explored different spam filtering models to create a powerful algorithm that classifies e-mail messages based on the e-mail user behavior in addition to the message content. The algorithm is expected to understand the user’s behavior by learning which kind of messages the user wants to receive. This way, the spam filter understands the behavior of each user individually and can classify messages more precisely.

Authors of [3] demonstrated that spam filters could be classified based on the technique they use, such as machine learning, Bayesian theory, etc. Machine learning based classifiers rely on training the spam filter using a set of input data to enhance the overall performance of the classification process. The authors proposed a method for e-mail spam classification using a machine-learning algorithm called

ISBN: 978-1-4799-3724-0/14/$31.00 ©2014 IEEE 297

“Adaptive Boosting Algorithm”. The algorithm is expected to adapt its behavior based on the input it receives over time to decrease the false negative/positive rate.

Another type of spam classifier is the Bayesian based classifier. Such classifiers work by analyzing the content of the entire message to calculate the overall spam probability of the message [4]. Authors of [4] proposed an enhanced algorithm that aims at reducing the false positive rate. The algorithm is called “Minimum Risk Bayes Algorithm” and it uses the decision-making technique in the classification process.

Authors of [5] addressed the issue of false positive classification and how sensitive it is given that important messages may end up as spam. To solve this problem, the authors proposed a training algorithm for spam classification. A series of tests were conducted to evaluate the performance of the proposed algorithm.

III. PROPOSED ALGORITHM
There are two aspects that could help in solving this problem. The first aspect is the server-side and the second aspect is the client-side. The solution on the server-side is that spam filter algorithms could be enhanced to eliminate the possibility of false positive filtering. However, this solution is not applicable because there is a trade-off between false positive and false negative classification: in order to reduce the rate of one of them, the rate of the other will increase accordingly.

In this paper, we are going to focus on the client-side (specifically the sender’s side and not the recipient’s side). The proposed algorithm will help e-mail users in decreasing the spam probability of their e-mails. The algorithm will perform a series of checks on the message before sending it. Moreover, it will alert the user regarding the things that make his/her e-mail look like spam. It will also suggest some changes on the e-mail content if necessary.

The expected outcome of the proposed algorithm is to help users make their e-mails’ content look legitimate to pass the spam filters’ test successfully and land in the recipient’s inbox. If the user follows the algorithm’s suggestions, there is a great chance that the deliverability rate of his/her e-mail will increase.

The algorithm works as follows. When the user writes an e-mail message, the algorithm checks its contents. The checking process includes looking at the subject, words, URLs, From, To, CC, and BCC addresses, attachments, etc. These elements will be compared with the ones that are usually associated with spam e-mails. The user will then be provided with a list of things that might trigger spam filters along with the suggested recommendations.

The checking process of the proposed algorithm includes breaking down the problem into the following points:

BEGIN
Step.1: Obtain the body and header of the e-mail message.
Step.2: Get the subject of the message; if it is empty then ask the user to add a subject.
Step.3: If the subject contains spam trigger words; then list them and ask the user to change them.
Step.4: Get the body of the message; if it is empty then ask the user to add a small portion of text.
Step.5: If the body of the message contains spam trigger words; then list them and ask the user to change them.
Step.6: If the body of the message contains capitalized words; then convert them into lower case words.
Step.7: If the body of the message contains continuous punctuation marks and/or symbols; then minimize them.
Step.8: If the words within the e-mail body contain symbols; then list them and ask the user to change them.
Step.9: If the format styles are used excessively in the body of the message; then ask the user to reduce them.
Step.10: If the body of the message contains a blocked, shortened or IP-form URL; then point to it and ask the user to change/delete it.
Step.11: If the body of the message contains invalid HTML code; then point to it and ask the user to fix it.
Step.12: If an alternative text is not included with HTML e-mails; then ask the user to include it.
Step.13: If the message body contains embedded forms; then ask the user to remove them.
Step.14: If the body of the message is composed of images only; then ask the user to add some text.
Step.15: If the subject and body of the e-mail are identical; then ask the user to make some changes.
Step.16: Get the attachment type; if it is video, JavaScript, executable files or any other suspicious type; then inform the user about it.
Step.17: Get the “From” address; if it is not in the right format; then inform the user.
Step.18: If the “From” address is invalid; then inform the user.
Step.19: If the mail server of the “From” address is blocked; then inform the user.
Step.20: Get the “To” addresses; if there are more than 20 addresses; then ask the user to reduce the number and send the message to 20 users at a time.


Step.21: If the “To” addresses contain invalid, trap or bounce addresses; then ask the user to remove them.
Step.22: Display a message with the changes and/or recommendations to the user.
END

IV. PERFORMANCE RESULTS
Since the algorithm is designed to work on the client-side, we developed a webmail client for this purpose. The program is implemented with minimal functionality to achieve the goal of this research. Fig. 1 shows the interface of the algorithm.

Fig. 1. The Interface of the Webmail Application

An experiment was conducted to test the overall performance of the proposed algorithm. The following subsections describe the experiment settings.

A. Data Set
The data set used in the experiment is a collection of e-mails from an e-mail user. The e-mails were classified as spam by the user’s spam filter. However, the user considers those e-mails as legitimate. The false positive e-mails are used to evaluate the purpose of this research, which is ensuring the deliverability of legitimate e-mails to the recipient.

The data set also includes a set of actual spam e-mails. Spam e-mails are used to ensure that the algorithm will not pass actual spam. We want the performance of the proposed algorithm to be balanced so that it passes legitimate e-mails without affecting the security of e-mail users by increasing the chance of receiving actual spam.

B. Evaluation Measures
There are various measures to evaluate the performance of an algorithm. Here, we are going to focus on five measures: false positive rate (FPR), false negative rate (FNR), recall, precision and accuracy.

Table I presents the formulas for these measures. Note that $n_H$ and $n_S$ represent the number of ham and spam messages respectively, $n_{H \to H}$ and $n_{S \to S}$ represent the number of ham/spam messages that are classified correctly, $n_{H \to S}$ represents the number of ham classified as spam, and $n_{S \to H}$ represents the number of spam classified as ham.

TABLE I. EVALUATION MEASURES

Measure          Formula
FPR              $\frac{n_{H \to S}}{n_{H \to H} + n_{H \to S}}$
FNR              $\frac{n_{S \to H}}{n_{S \to S} + n_{S \to H}}$
Ham Recall       $\frac{n_{H \to H}}{n_{H \to H} + n_{H \to S}}$
Ham Precision    $\frac{n_{H \to H}}{n_{H \to H} + n_{S \to H}}$
Spam Recall      $\frac{n_{S \to S}}{n_{S \to S} + n_{S \to H}}$
Spam Precision   $\frac{n_{S \to S}}{n_{S \to S} + n_{H \to S}}$
Accuracy         $\frac{n_{H \to H} + n_{S \to S}}{n_{H \to H} + n_{S \to S} + n_{S \to H} + n_{H \to S}}$

C. Experiment Results
The experiment involves resending ten e-mails: five ham e-mails and five spam e-mails using the application we developed. The algorithm is applied on the messages before sending them and some changes are made based on the recommendations of the algorithm.

The results showed that two out of five of the false positive e-mails landed in the inbox, and the rest have been classified as spam. On the other hand, all five spam e-mails have been classified as spam. This gives us the following values: $n_{H \to H} = 2$, $n_{H \to S} = 3$, $n_{S \to S} = 5$, $n_{S \to H} = 0$. We used these values to calculate the formulas in Table I. The results of the evaluation are listed in Table II.

TABLE II. EVALUATION RESULTS

Measure          Result
FPR              60%
FNR              0%
Ham Recall       40%
Ham Precision    100%
Spam Recall      100%
Spam Precision   62.5%
Accuracy         70%
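The percentages in Table II follow directly from the four counts reported in the experiment and the formulas in Table I. The following is a minimal Python sketch of that arithmetic, not code from the paper: the class name `Counts`, its field names and the function `evaluate` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Classification counts (hypothetical field names, not from the paper)."""
    ham_as_ham: int    # nH -> H: ham delivered to the inbox
    ham_as_spam: int   # nH -> S: ham classified as spam (false positives)
    spam_as_spam: int  # nS -> S: spam classified as spam
    spam_as_ham: int   # nS -> H: spam delivered to the inbox (false negatives)


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: Counts) -> dict[str, float]:
    """Compute the measures of Table I as fractions between 0 and 1."""
    total = c.ham_as_ham + c.ham_as_spam + c.spam_as_spam + c.spam_as_ham
    return {
        "FPR": ratio(c.ham_as_spam, c.ham_as_ham + c.ham_as_spam),
        "FNR": ratio(c.spam_as_ham, c.spam_as_spam + c.spam_as_ham),
        "Ham Recall": ratio(c.ham_as_ham, c.ham_as_ham + c.ham_as_spam),
        "Ham Precision": ratio(c.ham_as_ham, c.ham_as_ham + c.spam_as_ham),
        "Spam Recall": ratio(c.spam_as_spam, c.spam_as_spam + c.spam_as_ham),
        "Spam Precision": ratio(c.spam_as_spam, c.spam_as_spam + c.ham_as_spam),
        "Accuracy": ratio(c.ham_as_ham + c.spam_as_spam, total),
    }


# Counts reported in the experiment: two ham e-mails reached the inbox,
# three were still classified as spam, and all five spam e-mails were caught.
counts = Counts(ham_as_ham=2, ham_as_spam=3, spam_as_spam=5, spam_as_ham=0)
results = evaluate(counts)

for name, value in results.items():
    print(f"{name}: {value:.1%}")
```

With the reported counts, the computed fractions agree with the values listed in Table II. The zero-denominator guard in `ratio` is a design choice of this sketch only; the paper does not discuss that case.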

All the ham messages in the data set are false positive cases, which means that the original false positive rate before applying the algorithm is 100%. The false positive rate for these messages after applying the algorithm is 60% (3/[2+3]=0.6), which is a decrease of 40 percentage points in the misclassification of ham e-mail.

The ham recall shows that 40% of the actual ham messages in the data set have been classified correctly. A ham precision of 100% indicates that all the messages that are classified as ham are actually ham and there are no false negative cases amongst them.

Regarding the rest of the measures, the 0% false negative rate and 100% spam recall show that the algorithm did not cause a trade-off between the security of e-mail users and the deliverability of messages.

Overall, the results indicate that the proposed algorithm is effective in increasing the deliverability of legitimate e-mails.

V. CONCLUSION AND FUTURE WORK
The proposed algorithm provides support for e-mail users to help them increase their e-mails’ deliverability. It can be integrated as an extra layer with existing e-mail clients or it can be implemented separately as a support tool.

Although the proposed algorithm has shown its potential in serving the aim of this research, there are several avenues for improvement. Further testing on the implemented part of the algorithm is required. In addition, the test sample could be expanded to include a wide range of e-mail messages from e-mail users of different backgrounds. Moreover, the e-mails used in the evaluation can be collected over a longer time span to cover more of the possible cases.

REFERENCES
[1] J. Duntemann. (2004). Degunking Your Email, Spam, and Viruses. [Online]. Available: http://www.ebrary.com.
[2] S. Hershkop and S. Stolfo. “Combining Email Models for False Positive Reduction.” In the Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, 2005, pp. 98-107.
[3] A. Ali and Y. Xiang. “Spam Classification Using Adaptive Boosting Algorithm.” In the Proceedings of the International Conference on Computer and Information Science, 2007, pp. 972-976.
[4] H. Yin and Z. Chaoyang. “An improved Bayesian Algorithm for Filtering Spam E-mail.” In the Proceedings of the International Symposium on Intelligence Information Processing and Trusted Computing, 2011, pp. 87-90.
[5] L. Zhen and Z. Ming-Tian. “Spam Filtering Issue: FPD Research between False Positive and False Negative.” In the Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007, pp. 526-534.
