Reverse of E-Mail Spam Filtering Algorithms To Maintain E-Mail Deliverability
Abstract—Over the past years, research has highlighted the importance of enhancing the performance of e-mail spam filters in order to eliminate the risk of false negative e-mails. The problem of false positive e-mails, on the other hand, has received less attention, even though it is critical and may cause important e-mails to go undelivered.

The aim of this research is to provide a solution that reduces the rate of false positive e-mails. It addresses the problem by exploring the behavior of existing e-mail spam filters and highlighting the different reasons behind e-mail delivery failures. Based on this investigation, we developed an algorithm that helps e-mail users ensure the deliverability of their e-mails. The proposed algorithm is based on reversing the mechanism of spam filters on the client side.

…legitimate e-mails that are misclassified as spam [1]. This paper focuses on the problem of false positive e-mails, aiming to identify their causes and to propose an algorithm that helps reduce their rate.

The organization of this paper is as follows. The first section lists some of the previous work done in this area. The second section describes the details of the algorithm we developed to reduce the false positive rate of e-mails. The third section presents the experiment we carried out to evaluate the performance of the proposed algorithm. Finally, the last section concludes the work done in this research and lists future work that lays the foundation for further enhancements.
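Ham Recall = $\frac{n_{H \to H}}{n_{H \to H} + n_{H \to S}}$

Ham Precision = $\frac{n_{H \to H}}{n_{H \to H} + n_{S \to H}}$

Spam Recall = $\frac{n_{S \to S}}{n_{S \to S} + n_{S \to H}}$

Spam Precision = $\frac{n_{S \to S}}{n_{S \to S} + n_{H \to S}}$

Accuracy = $\frac{n_{H \to H} + n_{S \to S}}{n_{H \to H} + n_{S \to S} + n_{S \to H} + n_{H \to S}}$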
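The formulas above translate directly into code. The following Python sketch computes them from the four confusion counts. The names `Counts`, `ratio`, and `evaluate` are hypothetical. Because the FPR and FNR formulas do not appear among the formulas as extracted, the sketch assumes the standard definitions FPR = n(H→S) / (n(H→H) + n(H→S)) and FNR = n(S→H) / (n(S→S) + n(S→H)). It also assumes that a ratio with a zero denominator is reported as 0.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts: e.g. h_to_s is the number of ham e-mails classified as spam."""
    h_to_h: int
    h_to_s: int
    s_to_s: int
    s_to_h: int


def ratio(num: int, den: int) -> float:
    """Return num / den, or 0.0 when den is zero (an assumed convention)."""
    return num / den if den else 0.0


def evaluate(c: Counts) -> dict[str, float]:
    """Compute the evaluation measures as fractions in [0, 1]."""
    total = c.h_to_h + c.h_to_s + c.s_to_s + c.s_to_h
    return {
        # Assumed standard definitions; not shown in the formula table as extracted.
        "FPR": ratio(c.h_to_s, c.h_to_h + c.h_to_s),
        "FNR": ratio(c.s_to_h, c.s_to_s + c.s_to_h),
        # The remaining measures follow the formulas above term by term.
        "Ham Recall": ratio(c.h_to_h, c.h_to_h + c.h_to_s),
        "Ham Precision": ratio(c.h_to_h, c.h_to_h + c.s_to_h),
        "Spam Recall": ratio(c.s_to_s, c.s_to_s + c.s_to_h),
        "Spam Precision": ratio(c.s_to_s, c.s_to_s + c.h_to_s),
        "Accuracy": ratio(c.h_to_h + c.s_to_s, total),
    }
```

Calling `evaluate` on the counts reported in the experiment results reproduces the values listed in Table II. Note that under the assumed definition, FPR is always 1 minus Ham Recall.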
An experiment was conducted to test the overall performance of the proposed algorithm. The following subsections describe the experiment settings.

A. Data Set
The data set used in the experiment is a collection of e-mails from one e-mail user. These e-mails were classified as spam by the user's spam filter, although the user considers them legitimate. These false positive e-mails serve the purpose of this research, which is ensuring that legitimate e-mails are delivered to the recipient.

The data set also includes a set of actual spam e-mails. These are used to verify that the algorithm does not let actual spam through. The performance of the proposed algorithm should be balanced: it should pass legitimate e-mails without compromising users' security by increasing the chance of receiving actual spam.

B. Evaluation Measures
There are various measures for evaluating the performance of an algorithm. Here, we focus on five: false positive rate (FPR), false negative rate (FNR), recall, precision, and accuracy. Recall and precision are computed separately for ham and spam e-mails.

C. Experiment Results
The experiment involves resending ten e-mails, five ham and five spam, using the webmail application we developed (Fig. 1). The algorithm is applied to each message before it is sent, and the message is modified according to the algorithm's recommendations.

Fig. 1. The Interface of the Webmail Application

The results showed that two of the five false positive e-mails landed in the inbox, while the other three were still classified as spam. All five spam e-mails were classified as spam. Writing $n_{X \to Y}$ for the number of e-mails of class $X$ (H for ham, S for spam) that were classified as $Y$, this gives $n_{H \to H} = 2$, $n_{H \to S} = 3$, $n_{S \to S} = 5$, and $n_{S \to H} = 0$. We used these values to calculate the formulas in Table I. The results of the evaluation are listed in Table II.

TABLE II. EVALUATION RESULTS

Measure          Result
FPR              60%
FNR              0%
Ham Recall       40%
Ham Precision    100%
Spam Recall      100%
Spam Precision   62.5%
Accuracy         70%
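As a check on Table II, substituting the experiment's counts ($n_{H \to H} = 2$, $n_{H \to S} = 3$, $n_{S \to S} = 5$, $n_{S \to H} = 0$) into the formulas of Table I gives the reported values. Table I as it appears here does not define FPR and FNR, so the last two lines assume the standard definitions; these reproduce the reported 60% and 0%.

$$
\begin{aligned}
\text{Ham Recall} &= \frac{2}{2 + 3} = 40\% \\
\text{Ham Precision} &= \frac{2}{2 + 0} = 100\% \\
\text{Spam Recall} &= \frac{5}{5 + 0} = 100\% \\
\text{Spam Precision} &= \frac{5}{5 + 3} = 62.5\% \\
\text{Accuracy} &= \frac{2 + 5}{2 + 5 + 0 + 3} = 70\% \\
\text{FPR} &= \frac{n_{H \to S}}{n_{H \to H} + n_{H \to S}} = \frac{3}{2 + 3} = 60\% \quad \text{(assumed definition)} \\
\text{FNR} &= \frac{n_{S \to H}}{n_{S \to S} + n_{S \to H}} = \frac{0}{5 + 0} = 0\% \quad \text{(assumed definition)}
\end{aligned}
$$

Under these definitions, FPR equals 1 minus Ham Recall and FNR equals 1 minus Spam Recall, which is consistent with the paired values in Table II.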