Enhancing Mail Server Security Using Machine
ISSN No:-2456-2165
Abstract - Even with advances in technologies for providing the utmost security to users, there is always a glitch in the implementation or the algorithm used, which compromises the user's security or, rather, privacy. Most standalone servers do not provide much protection beyond password security and spam control. The paper aims to build a machine learning model that moulds itself according to the user's e-mail usage pattern; hence, no third-party data sets are needed to train the model. Further in the paper, we focus on building an unsupervised machine learning model to improve the security of mail server protocols such as SMTP, IMAP, SSH, etc.

Keywords - Mail Server Security, Machine Learning, Security, SMTP, IMAP and SSH protocols.

I. INTRODUCTION
Whenever we speak of mail-related security issues, we think of security measures applied to messages, and most often of antivirus and antispam protection. However, this is only one stage in the complicated process of securing the mail server. Most of us use a password-based system to access our mail, which provides only a single layer of security. If the password becomes known to someone else, the account can easily be manipulated to send spam or used to extract the user's sensitive information. One way to enhance security is to limit the number of connections or authentication errors, the maximum number of commands, or the number and size of email messages, or to set a time-out for sessions.

Most common mail server security measures these days are concerned only with authenticating the user and building security policies based on a fixed set of rules. In this project, as an alternative to these simple and fixed rules, we take a different approach by building a machine learning system that is trained on the email habits of a user through their mail server and then provides security against account compromise vulnerabilities. Unlike most machine learning models, which need to be trained with large data sets before they can be used for decision making, we do not rely on third-party data sets; instead, we collect data from each mail server. Thus, for each instance our ML algorithm starts from zero data and grows upwards. Also, most machine learning algorithms require a data collection system, which can in turn act as a surveillance system.

Thus, our algorithm will run entirely on a local network in conjunction with the mail server. The mail server data will neither be shared with third parties nor stored there. In this way we can provide a high level of security by leveraging the benefits of machine learning for those who care about data privacy and don't want to [...] systems. Almost all proprietary software used for email transactions illegally tracks the email behavior of its users, collects these data sets and sells them in the market for higher shares. This is done without the users' consent, thereby violating their civil liberties.

So, for those users who are concerned about safeguarding and protecting their data from third-party services and who run their own mail servers, we build a machine learning model that uses unsupervised methods and builds itself individually according to each user's email behavior.

II. IMPLEMENTATION
In this survey paper, the existing methodologies of spam detection and authentication led us to a much more reliable way of authenticating users to their mail accounts.

A. Recognizing Abused Mail Accounts
The methodology used in [1] detects compromised mail accounts using metadata from the MTA (Mail Transfer Agent, e.g. SMTP), with information such as IP, country, Delivery Status Notification (DSN), sent time and Message Identifier (Message-ID):
• IP geolocation - Country of the authentication point, derived from the source IP.
• Delivery Status Notification - Status of delivery from one MTA to another.
• Message-ID - Correlates information between incoming and outgoing messages through the MTA.
• Timestamp - To correlate information in chronological order.
• Further information - Such as number of recipients, message size, etc.

The metadata from the MTA (Mail Transfer Agents) and MDA (Mail Delivery Agents) can be used to detect abused
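As a rough illustration of how the metadata fields listed for [1] (country of the source IP, DSN status, number of recipients) could feed a per-account check, the following minimal sketch keeps a small history per user and reports reasons for suspicion. It is not the method of [1]: the class names `MailEvent` and `AccountMonitor`, the reason strings and all thresholds are hypothetical choices made for this example. The only external fact relied on is that DSN status codes beginning with "5" denote permanent delivery failures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MailEvent:
    """One outgoing message as seen in MTA metadata (hypothetical record)."""
    user: str        # authenticated account
    country: str     # country geolocated from the source IP
    dsn_status: str  # DSN status code, e.g. "2.0.0" or "5.1.1"
    recipients: int  # number of recipients of the message


class AccountMonitor:
    """Keeps per-user history and flags events that deviate from it."""

    def __init__(self, max_recipients=20, max_bounce_ratio=0.5, min_messages=5):
        # All three thresholds are illustrative assumptions.
        self.max_recipients = max_recipients
        self.max_bounce_ratio = max_bounce_ratio
        self.min_messages = min_messages
        self.countries = defaultdict(set)  # user -> countries seen so far
        self.sent = defaultdict(int)       # user -> messages observed
        self.bounced = defaultdict(int)    # user -> permanent failures observed

    def check(self, ev):
        """Return a list of reasons why this event looks suspicious."""
        reasons = []
        known = self.countries[ev.user]
        # A country never seen before for this user (after the first event).
        if known and ev.country not in known:
            reasons.append("new_country")
        # Unusually many recipients in a single message.
        if ev.recipients > self.max_recipients:
            reasons.append("many_recipients")
        # Track permanent delivery failures (DSN class 5.x.x).
        self.sent[ev.user] += 1
        if ev.dsn_status.startswith("5"):
            self.bounced[ev.user] += 1
        ratio = self.bounced[ev.user] / self.sent[ev.user]
        if self.sent[ev.user] >= self.min_messages and ratio > self.max_bounce_ratio:
            reasons.append("high_bounce_ratio")
        # The history grows from zero data, one event at a time.
        known.add(ev.country)
        return reasons


monitor = AccountMonitor()
print(monitor.check(MailEvent("alice", "DE", "2.0.0", 1)))
print(monitor.check(MailEvent("alice", "BR", "2.0.0", 50)))
```

The first call establishes the user's baseline and reports nothing; the second call arrives from a previously unseen country with many recipients, so both reasons are returned. Because the history is kept per user and starts empty, the check needs no third-party data set.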
The methodology used here is the novel keyword extraction technique from ASR output [2]. This technique can be preferred over the others as it maximizes the chances of including the overall potential information that is necessary. It means that we can obtain only the important or necessary information for our model. The server logs that we obtain are a long text (string) from which we need very particular information such as IP address, location, etc. This keyword extraction model helps us achieve that requirement. After extracting these necessary keywords, they are clustered to build topically separated queries, and these are finally merged into a ranked set.

Keyword Extraction Methods: The dataset for this particular project can be multi-variate or multi-frequency data. There are many models that can handle such datasets with their semantic representations, such as WordNet, LSA, PLSA or LDA [2]. Some supervised machine learning methods are also used for such keyword extraction models.

Diverse Keyword Extraction Algorithm: This paper introduces various modeling techniques and their advantages for representing multi-frequency data. It first fragments the data, selects content words as keywords and applies various summarization methods. The important advantage of this model is that the conversation of data fragments here is maximized.

Fig. 1 shows keyword extraction rates for variable datasets.

C. Tracking User's Email Behavior through Usage Pattern
The BDI (Belief-Desire-Intention) model is an agent model that is widely used to simulate user behavior in complex and dynamic environments. When a user clicks on a malicious link sent via email, the user's host is exposed to a virus. Such user operations on mail are called the user's email behavior. To better protect the user from malicious URL links, spam, authentication problems, etc., the BP-BDI model [3] proposes a machine learning model that simulates the user's email behavior accurately and effectively.

When trying to understand a user's behavior, there are two notable difficulties: the difference and the complexity of online user behavior. Difference refers to the individual behavior pattern in mail activities such as sending, receiving and deleting email. Complexity arises because user online behavior becomes complicated in complex and dynamic environments. Hence, we cannot adopt the traditional procedure-oriented and object-oriented modeling methods to track the user's email behavior.

The main focus here is to track the user's email usage pattern to better train the model to adapt to the individual user's mail server. Therefore, in order to track the user's email behavior better, we modify the Belief-Desire-Intention model in two aspects: it carries out automatic updates to the belief set, and a behavioral learning module is constructed based on a BP (back propagation) neural network. In the BP-BDI model, the learning module mainly studies the behavior of clicking on URLs, so that the part of the user's email behavior that involves network security is accurately simulated.
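The two modifications described for the BDI model, an automatically updated belief set and a learning module for URL-click behavior, can be sketched in a few lines. This is a deliberately reduced illustration and not the BP-BDI model of [3]: desires and intentions are omitted, the belief set holds only known senders, and a single sigmoid unit trained by gradient descent (the one-layer special case of back propagation) stands in for the BP neural network. The names `ClickLearner` and `MailAgent`, the two features and the training samples are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ClickLearner:
    """Single sigmoid unit trained by batch gradient descent."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        """Estimated probability that the user clicks the link."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def train(self, samples, epochs=2000):
        n = len(samples)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in samples:
                err = self.predict(x) - y  # gradient of the log loss w.r.t. z
                for k, xk in enumerate(x):
                    grad_w[k] += err * xk
                grad_b += err
            self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n


class MailAgent:
    """Belief set plus learning module; desires and intentions are omitted."""

    def __init__(self):
        self.beliefs = {"known_senders": set()}
        self.learner = ClickLearner(n_features=2)

    def observe(self, sender):
        # Automatic belief update: every observed correspondent is remembered.
        self.beliefs["known_senders"].add(sender)

    def features(self, sender, url_domain):
        sender_domain = sender.split("@")[-1]
        return [
            1.0 if sender in self.beliefs["known_senders"] else 0.0,
            1.0 if url_domain == sender_domain else 0.0,
        ]

    def will_click(self, sender, url_domain):
        return self.learner.predict(self.features(sender, url_domain)) > 0.5


agent = MailAgent()
agent.observe("bob@example.org")
# Hypothetical history: this user clicks only links from known senders
# that point to the sender's own domain.
history = [([1.0, 1.0], 1), ([1.0, 0.0], 0), ([0.0, 1.0], 0), ([0.0, 0.0], 0)]
agent.learner.train(history)
print(agent.will_click("bob@example.org", "example.org"))
print(agent.will_click("eve@evil.test", "evil.test"))
```

A model trained this way describes what is normal for one user, so a click that the learner considers unlikely for that user could be treated as a signal worth checking. A real behavioral module would need richer features and a multi-layer network, as the BP-BDI description implies.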
The management module mainly realizes functions such as user information management, Domain keys private key management, domain information management, mailbox management and system configuration by the administrator.

Inter-domain authentication is used to authenticate the mail sender's domain. It is a combination of three authentications:
• Domain keys based on encryption: if authentication is successful, then "source address authentication is successful" and the information that Domain keys authentication is successful are inserted into the mail's header, and authentication is over.

Contrary to ART 1, where there are bottom-up and top-down weight vectors, Fuzzy ART has only one weight vector $w_j$ for each category $j$, which is initialized to $w_{j,1} = w_{j,2} = \dots = 1$ when the category is uncommitted.

Before a sample is put into ART, it has to be normalized to [0, 1] and enhanced with complement coding to prevent category proliferation problems [4]. The clusters are formed in layer F2. When an input x is presented to layer F1, the committed neurons and one uncommitted neuron compete in a winner-take-all manner to select the one with maximum activation according to the formula below:
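The Fuzzy ART steps described above (normalized input, complement coding, weights of uncommitted categories set to 1, winner-take-all competition) can be sketched as follows. Since the activation formula is not reproduced in this section, the sketch assumes the standard Fuzzy ART choice function $T_j = |x \wedge w_j| / (\alpha + |w_j|)$, where $\wedge$ is the component-wise minimum and $|\cdot|$ the sum of components, together with the usual vigilance test and learning rule. The class name `FuzzyART` and the parameter values are illustrative assumptions, not taken from [4].

```python
def complement_code(x):
    """Append 1 - x_i for every component; inputs must already lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum."""
    return [min(p, q) for p, q in zip(a, b)]


class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: how similar an input must be to join a cluster
        self.alpha = alpha  # small choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights = []   # one weight vector per committed category

    def train_one(self, x):
        """Present one sample and return the index of its category."""
        i = complement_code(x)
        # Committed categories compete with one uncommitted category (all ones).
        candidates = self.weights + [[1.0] * len(i)]
        scores = [
            sum(fuzzy_and(i, w)) / (self.alpha + sum(w)) for w in candidates
        ]
        # Winner-take-all: try categories in order of decreasing activation.
        for j in sorted(range(len(candidates)), key=lambda k: scores[k], reverse=True):
            w = candidates[j]
            overlap = fuzzy_and(i, w)
            # Vigilance test; the uncommitted category always passes it.
            if sum(overlap) / sum(i) >= self.rho:
                updated = [
                    self.beta * m + (1.0 - self.beta) * wk
                    for m, wk in zip(overlap, w)
                ]
                if j == len(self.weights):
                    self.weights.append(updated)  # commit a new category
                else:
                    self.weights[j] = updated
                return j
        raise RuntimeError("unreachable: the uncommitted category always resonates")


model = FuzzyART()
samples = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.95]]
print([model.train_one(s) for s in samples])
```

In this toy run the first two samples are close together and share a category, while the third is far away and causes a new category to be committed. This incremental behavior is what makes the approach attractive for a model that starts from zero data: clusters appear as the user's behavior is observed, without a separate training set.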
Fig. 2: Accuracy of user email behavior simulation in BPNN, BDI and BP-BDI models [3]

The BP-BDI is a model used to track the user's email behavior in a dynamic environment by constructing a learning model with a BP neural network. The above graph shows an accuracy of 80% while simulating the email user's behavior.

TABLE 1: Comparison between TS, WF and D(.75) extraction methods [2]

CONCLUSION
We have seen how the extent of security of a mail server has remained narrow and restricted to classic methods of security and authentication. Spam control is one such aspect of a mail server that has shown stronger intent and efforts towards detecting an abused user over a small number of mail instances [1]. Inheriting from the ideas of spam detection to detect an abused user through his logs [2], we can make use of models like BP-BDI [3] and IPv6 addressing [5] to draw a line between an abused and a legitimate user, and feed such processed data to an unsupervised, self-learning machine learning model that can train itself. A stronger, safer, adaptive and much more independent security model can thus be established for a mail server, securing it beyond the already existing password-based authentication.

REFERENCES
[1] Schäfer, C. (2017, April). Detection of compromised
email accounts used for spamming in correlation with origin-
destination delivery notification extracted from metadata. In
2017 5th International Symposium on Digital Forensic and
Security (ISDFS) (pp. 1-6). IEEE.
[7] Yuan, H., Maple, C., Chen, C., & Watson, T. (2018).
Cross-device tracking through identification of user typing
behaviours. Electronics Letters, 54(15), 957-959.
[8] Khanji, S., Jabir, R., Ahmad, L., Alfandi, O., & Said, H.
(2016, April). Evaluation of Linux SMTP server security
aspects—A case study. In 2016 7th International Conference
on Information and Communication Systems (ICICS) (pp. 252-
257). IEEE.