2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS)

A Comparative Performance Evaluation of Content Based Spam and Malicious URL Detection in E-mail

Sunil B. Rathod
PG Student, Department of Computer Engineering,
North Maharashtra University,
SES’s R. C. Patel Institute of Technology, Shirpur, India
[email protected]

Tareek M. Pattewar
Assistant Professor, Department of Computer Engineering,
North Maharashtra University,
SES’s R. C. Patel Institute of Technology, Shirpur, India
[email protected]

Abstract—E-mail communication is growing rapidly. An email contains text and URLs as content. The text can be suspicious, coming from an undesired sender and carrying unwanted content, and the URLs may be malicious, redirecting users to phishing websites. To stop such activity, a spam and malicious URL detection system is required, which benefits users by removing spam content and malicious URLs from email. We have used a data mining approach, supervised classification, which improves the system's accuracy and detects more spam and malicious URLs.

Keywords—Bayesian Classifier, Decision Tree, Malicious URL Detection, Spam Detection.

I. INTRODUCTION

E-mail is becoming the fastest and most economical mode of communication. The growing use of email has led to an increased rate of spam emails. In the information age, users rely on email to communicate with the world. Business organizations, individuals, and corporate industries all communicate by email, so it is an important part of education, business, and personal usage.

Spam:
Spam is unsolicited bulk email (UBE); a subset of it is unsolicited commercial email. Spam emails consume not only the user's time but also the effort needed to recognize the undesired messages, and they waste network bandwidth.

Content Based Spam Filter:
A content based filter works on the content of emails, i.e., text, URLs, and main headers such as the subject, for classification purposes. It is the method used here to filter spam.
An email has two parts, the body of the message and the header. The header stores information about the message, such as the sender and the date and time the email was received. Ambiguous data in the email is removed by preprocessing, and then the text is extracted.

II. RELATED WORK

Today's internet suffers from a major problem known as email spam. It annoys users and causes financial damage to companies. The techniques developed so far to stop spam are filtering methods. Spam emails are UBE, also known as junk emails, that are sent to many recipients who have not requested or subscribed to them. A spam filter removes spam or unwanted messages from the email inbox. Spam may also contain phishing URLs, which redirect users to phishing websites that seek personal credentials such as usernames and passwords for financial gain.

Dhanalakshmi R. and Chellappan C. implemented malicious URL detection in email. Lexical features, page rank, and host information are taken into consideration to classify URLs. The Phishtank corpus was used, and Bayesian classification was applied to improve the performance of the system [1].

Georgios Paliouras et al. presented a learning method to filter spam email. Two machine learning algorithms, Naïve Bayesian and a memory based learning approach, are considered for anti-spam filtering and compared in terms of performance. In both methods spam filtering accuracy improved; keyword based filters are also widely used for email [2].

Zhan Chuan and Lu Xian-liang presented an email filtering application using an improved Bayesian filter. They represent word frequency by vector weights and use word entropy for attribute selection; a formula is then derived that noticeably improves the performance [3].

Vikas P. Deshpande et al. presented an efficient naïve Bayesian method that aims to block all spam emails without blocking legitimate emails. To address this problem, they considered statistical classifiers such as the naïve Bayesian anti-spam filter and content based spam filters, which are adaptive in nature [4].

Sheng et al. showed that phishing campaigns are short-lived, with an average life of two hours. To identify and block such phishing URLs, they extracted features such as suspicious characters, number of dots, IP address, and hexadecimal characters [5].

978-1-4673-7437-8/15/$31.00 ©2015 IEEE


Pawan et al. discovered malicious URLs by enhancing blacklisting. One drawback of blacklists is that their update process is not fast enough, so they fail to identify phishing URLs in the early hours of a phishing attack [6].

Maher Aburrous et al. conducted a survey to identify the essential features that can improve accuracy and precision in malicious URL detection [7].

Congfu Xu et al. performed feature extraction on the Base64 encoding of images using an n-gram technique. An SVM is trained to efficiently distinguish spam images from legitimate images. The experiments show improved performance in terms of Accuracy, Precision, and Recall [8].

R. Malathi et al. presented a new spam detection method that employs text categorization, using supervised learning with a Bayesian neural network together with a rule based heuristic approach and statistical analysis tests to identify spam [9].

Sadeghian A. et al. presented spam detection based on interval type-2 fuzzy sets. This system gives the user more control over the categories of spam and permits personalization of the spam filter [10].

CANTINA+ classifies phishing URLs with a more exhaustive feature set and obtained a classification accuracy of 92.3%. Various related studies and case studies have analyzed the required feature set in order to reduce exhaustiveness and time consumption [11].

III. ALGORITHM STUDY

A. Bayesian Classifier:
The naïve Bayes classifier is a statistical classifier well known for email filtering. Spam emails are identified by classification. Naïve Bayes uses tokens (words) from spam and ham mails to calculate the probability that a mail is spam.

Mathematical Formulation:
The Bayesian classifier is based on Bayes' theorem. Despite its simplicity, naïve Bayes can often outperform more sophisticated classification methods.
To demonstrate the concept, consider the following equations [12]:

Prior probability of legitimate mail = Number of legitimate mails / Total number of mails (3.1)
Prior probability of spam mail = Number of spam mails / Total number of mails (3.2)
Likelihood of X-mail given legitimate = Number of legitimate mails in the vicinity of X-mail / Total number of legitimate mails (3.3)
Likelihood of X-mail given spam = Number of spam mails in the vicinity of X-mail / Total number of spam mails (3.4)
Posterior probability of X-mail being legitimate = Prior probability of legitimate mail × Likelihood of X-mail given legitimate (3.5)
Posterior probability of X-mail being spam = Prior probability of spam mail × Likelihood of X-mail given spam (3.6)

Finally, we classify X-mail as spam when that class membership has the largest posterior probability.

B. Decision Tree C4.5:
C4.5 was developed by Ross Quinlan. It is an extension of ID3 and is also known as a statistical classifier. As the successor of ID3, C4.5 creates a decision tree in the same way, using the concept of information entropy, which is a measure of the homogeneity of a learning set. At each node of the tree, C4.5 selects the attribute for dividing its set into subsets. Normalized information gain is the criterion for splitting the data. Information gain is the difference in information entropy associated with an attribute. The attribute with the highest normalized information gain is chosen to make the decision.

The performance of the system can be derived from Accuracy and Error Rate as follows:

(3.7)

(3.8)

IV. EXPERIMENT

A. Implementation using Bayesian Classifier:

1) Gmail Dataset and SpamAssassin Dataset:
This is a combination of a real-time dataset downloaded from Gmail and some emails taken in bulk from SpamAssassin, consisting of legitimate and spam emails. These emails, in HTML format, are the input to preprocessing.

2) Text Preprocessing:

a) HTML Tag Removal:
The input emails are in HTML format and therefore contain tags, so the tags need to be removed to purify the text.

b) Stopword Removal:
Terms on the stopword list are removed. The list consists of articles, prepositions, conjunctions, and certain high frequency words (such as some verbs and adverbs).

c) Tokenization:
Lexical analysis, also called tokenization, involves dividing the text content into strings of characters called tokens. Filtering techniques remove white space (blanks) and punctuation symbols when tokenizing.

d) Word Frequency (Term Frequency):
This counts the frequency of words according to their occurrence, which helps in deriving the word probability for spam and legitimate mails.
The term frequency (TF) of a term is defined as the overall frequency of the term in the entire corpus, i.e., across all email instances. To calculate the TF score, the frequencies of terms in individual emails were first calculated, and then all the frequencies of a term across the entire set of emails were added to find the TF score for a particular term t_k. Mathematically it can be expressed as

$TF(t_k) = \sum_{i=1}^{N} f(t_k, e_i)$ (4.1)

where f(t_k, e_i) is the frequency of term t_k in email e_i and N is the number of emails in the corpus.

Terms with a low TF score are eliminated, and those with a high score are selected.

3) Bayesian Classifier:
This is a method used for text classification; it provides an efficient learning algorithm for data mining. It uses Bayes' theorem together with the conditional independence assumption:

P(spam | word) = [P(word | spam) P(spam)] / P(word)

Considering the spam probability of words, it evaluates spam and legitimate mails for classification and then gives the performance measurement.

4) Performance Measurement:
Performance is evaluated in terms of Accuracy, Error, Time, Precision, and Recall for the base method using the Bayesian classifier.

B. Implementation using combination of both Bayesian Classifier and Decision Tree C4.5:
As the email body consists mainly of text and URLs, the text is classified with the Bayesian classifier, following the same process as in A) the base method using the Bayesian classifier, and the URLs are classified with the following method.

Fig. 1. Combination Approach of Content Based Spam Detection using Bayesian Classifier and malicious URLs Detection in Email using Decision Tree C4.5.

1) Phishtank Dataset and DMOZ Dataset:
Phishtank is a source of blacklisted phishing URLs; it accepts user submissions, which are verified by users. It is a set of URLs that are suspected and reported to Phishtank as phishing URLs.
DMOZ: It is used to obtain genuine web links for the dataset of legitimate (non-phishing) URLs.

2) URL Preprocessing:

a) IP Address:
IP addresses and hexadecimal characters are used to hide the actual URLs. For example, consider the URL “http://www.bankingcompany.com/online/transaction/website/phishing.html”, which is shortened using the IP address http://132.115.201.115 so that it looks legitimate and not suspicious.

b) Hexadecimal Character:
The URL can also be represented using hexadecimal values with a ‘%’ symbol, which may represent any special character. SpoofGuard identified the ‘@’ and ‘-’ symbols as the most prominent in phishing URLs. In a URL, the ‘@’ symbol acts as the center: the part to its left is discarded and the part to its right is the site that is actually visited. For example, the URL “http://www.citibank.com@phishingsite.com” leads to “phishingsite.com” and discards “www.citibank.com”. Such methods mask the phishing site and make it pose as a legitimate site.


c) Suspicious Character:
The presence of the ‘@’ symbol or other special characters (such as ‘.’, ‘=’, ‘$’, ‘^’) in either the host name or the path is treated as a suspicious feature.

d) Number of Dots:
The number of dots in a given URL of an email is observed to predict whether the URL is malicious or legitimate.
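
The four URL features described in a) to d) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `url_features`, the feature names, and the exact set of suspicious characters are hypothetical choices, and only the Python standard library (`urllib.parse`, `ipaddress`, `re`) is used.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Characters counted as suspicious. This set is an assumption based on the
# examples in the text ('@', '=', '$', '^', '-'); dots are counted separately.
SUSPICIOUS_CHARS = set("@=$^-")


def url_features(url: str) -> dict:
    """Extract simple lexical features from a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # hostname is the part after any '@', i.e. the host actually contacted.
    host = parts.hostname or ""

    # a) IP address used as the host instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    # b) Hexadecimal (percent-encoded) characters such as %2E.
    hex_count = len(re.findall(r"%[0-9A-Fa-f]{2}", url))

    # c) Suspicious characters in the host part or the path.
    scope = parts.netloc + parts.path
    suspicious_count = sum(ch in SUSPICIOUS_CHARS for ch in scope)

    # d) Number of dots in the whole URL.
    dot_count = url.count(".")

    return {
        "host_is_ip": host_is_ip,
        "hex_count": hex_count,
        "has_at_symbol": "@" in parts.netloc,
        "suspicious_count": suspicious_count,
        "dot_count": dot_count,
        "effective_host": host,
    }


if __name__ == "__main__":
    print(url_features("http://www.citibank.com@phishingsite.com"))
    print(url_features("http://132.115.201.115/login%2Ephp"))
```

For the ‘@’ example from the text, the effective host reported by `urlsplit` is the part to the right of the ‘@’ symbol, which matches the behaviour described in b). A feature vector of this kind is the sort of input a decision tree classifier could consume.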

3) Decision Tree C4.5:

The dataset from Phishtank is preprocessed and passed as input to Decision Tree C4.5 for classification; performance is then measured in terms of Accuracy, Time, and Error.
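
The splitting criterion that C4.5 applies to such preprocessed URL features, normalized information gain (gain ratio), can be sketched as follows. This is a minimal sketch for categorical attributes only; a full C4.5 implementation also handles continuous attributes, missing values, and pruning. The function names and the toy rows are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Normalized information gain of splitting `rows` on attribute `attr`."""
    total = len(rows)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)

    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - remainder

    # Split information: entropy of the partition sizes, used to normalize.
    split_info = -sum((len(s) / total) * math.log2(len(s) / total)
                      for s in subsets.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Attribute with the highest gain ratio, chosen for the next split."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


if __name__ == "__main__":
    # Hypothetical toy data: two boolean URL features per URL.
    rows = [
        {"host_is_ip": True, "has_at": False},
        {"host_is_ip": True, "has_at": True},
        {"host_is_ip": False, "has_at": False},
        {"host_is_ip": False, "has_at": True},
    ]
    labels = ["phishing", "phishing", "legitimate", "legitimate"]
    print(best_attribute(rows, labels, ["host_is_ip", "has_at"]))
```

In this toy data `host_is_ip` separates the two classes perfectly, so it receives the highest gain ratio and would be selected at the root node.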

Fig. 2. Accuracy of the Implementation for different volume of the Datasets

4) Testing Gmail Dataset:

This dataset is derived from Gmail and consists of spam and legitimate mails. It also needs to be preprocessed in two ways, A) preprocessing of text and B) preprocessing of URLs, to give clean text and URLs. Classification is then done by the combination of the Bayesian classifier and Decision Tree (C4.5), and the correctly classified and incorrectly classified instances (mails) are evaluated.
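
The text branch of this pipeline (HTML tag removal, stopword removal, tokenization, word frequency, and the Bayes rule of Section IV-A) can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the tiny stopword list, the regex-based tag stripping, the class name `NaiveBayesSpamFilter`, the add-one (Laplace) smoothing, and the four training mails are all assumptions introduced for the example.

```python
import html
import math
import re
from collections import Counter

# A tiny illustrative stopword list (assumption); a real list is much longer.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "for", "you", "at"}


def preprocess(raw_html: str) -> list:
    """HTML tag removal, tokenization, and stopword removal."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", raw_html))
    # Lowercase alphanumeric runs; white space and punctuation are dropped.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayesSpamFilter:
    """Word-based naive Bayes with two classes, 'spam' and 'legit'."""

    def fit(self, mails, labels):
        self.word_counts = {"spam": Counter(), "legit": Counter()}
        self.class_counts = Counter(labels)
        for mail, label in zip(mails, labels):
            self.word_counts[label].update(preprocess(mail))  # word frequency
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["legit"])
        return self

    def log_posterior(self, mail, label):
        # log prior + sum of log likelihoods (conditional independence).
        total_mails = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_mails)
        total_words = sum(self.word_counts[label].values())
        for token in preprocess(mail):
            # Add-one smoothing avoids zero probabilities (an assumption,
            # not something stated in the paper).
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def predict(self, mail):
        # The class with the largest posterior wins.
        return max(("spam", "legit"), key=lambda c: self.log_posterior(mail, c))


if __name__ == "__main__":
    mails = [
        "<p>Win a free prize now</p>",
        "<b>Free</b> money, claim your prize",
        "Meeting agenda for Monday",
        "Please review the project report",
    ]
    labels = ["spam", "spam", "legit", "legit"]
    model = NaiveBayesSpamFilter().fit(mails, labels)
    print(model.predict("<html>claim free prize</html>"))
    print(model.predict("project meeting report"))
```

Working in log space keeps the product of many small word probabilities from underflowing; the denominator P(word) from Bayes' rule is omitted because it is the same for both classes and does not change which class has the larger posterior.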

5) Performance Measurement:
As the combined classification model is built from the Bayesian classifier and Decision Tree C4.5, its performance is evaluated on parameters such as Accuracy (correctly classified instances), Error (incorrectly classified instances), Precision, and Recall:

Accuracy = (TN + TP) / (TN + TP + FN + FP) × 100

Error = 100 − Accuracy

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

Where,
TN: True Negative, Legitimate predicted as Legitimate
TP: True Positive, Spam predicted as Spam
FP: False Positive, Legitimate predicted as Spam
FN: False Negative, Spam predicted as Legitimate
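
As a quick check of these definitions, the four measures can be computed from confusion-matrix counts as in the sketch below. The function name `evaluate` and the counts in the usage example are hypothetical and are not results from the paper; Accuracy and Error are expressed as percentages, while Precision and Recall are left as fractions.

```python
def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy and Error in percent; Precision and Recall as fractions."""
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "error": 100.0 - accuracy,
        # Guard against empty denominators when no spam is predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for 100 test mails.
    print(evaluate(tp=45, tn=50, fp=2, fn=3))
```

With these illustrative counts, 95 of 100 mails are classified correctly, so Accuracy is 95.0 and Error is 5.0.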

V. EXPERIMENTAL RESULTS AND PERFORMANCE EVALUATION

A. Computation of system’s efficiency under different volume of Dataset for combination approach using Bayesian Classifier and Decision Tree (C4.5) Classifier:

Fig. 3. Error of the Implementation for different volume of the Datasets


Fig. 4. Time taken for Implementation for different volume of the Datasets

Fig. 6. Recall of the Implementation for different volume of the Datasets

B. Tabular Results:
TABLE I
Implementation Results using Bayesian Classifier

TABLE II
Implementation Results using Combination of Bayesian
Classifier and Decision Tree C4.5 Classifier

Fig. 5. Precision of Implementation for different volume of the Datasets


TABLE III
Comparative Performance Evaluation of A) Implementation using Bayesian Classifier and B) Implementation using Bayesian Classifier and Decision Tree (C4.5) Classifier
Where, A - Bayesian Classifier and B - Bayesian and C4.5 Classifier

VI. CONCLUSIONS

We have integrated content based spam detection using the Bayesian classifier with phishing URL detection using Decision Tree C4.5. We found that the performance of the combination approach of the Bayesian classifier and Decision Tree C4.5 is improved compared with the implementation using content based spam detection by the Bayesian classifier alone.

We evaluated the results across different volumes of the dataset. The implementation using the Bayesian classifier gives 94.86% accuracy, whereas the combination approach of the Bayesian classifier and Decision Tree C4.5 gives 95.54% accuracy. We can therefore say that the combination approach has improved the results in terms of accuracy and is an efficient method for content based spam detection and malicious URL detection in integrated form.

ACKNOWLEDGMENT
We are sincerely grateful to all the persons who helped us through this work to make it successful.

REFERENCES
[1] Dhanalakshmi Ranganayakulu and Chellappan C., “Detecting malicious URLs in E-Mail - An implementation”, in AASRI Conference on Intelligent Systems and Control, Vol. 4, pp. 125–131, 2013.
[2] G. Paliouras et al., “An Evaluation of Naive Bayesian Anti-Spam Filtering”, in Proceedings of the Workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning, Barcelona, Spain, pp. 9–17, 2000.
[3] Zhan Chuan et al., “An Improved Bayesian with Application to Anti-Spam Email”, in Journal of Electronic Science and Technology of China, Vol. 3, No. 1, Mar. 2005.
[4] Vikas P. Deshpande and Robert F. Erbacher, “An Evaluation of Naïve Bayesian Anti-Spam Filtering Techniques”, in Proceedings of the 2007 IEEE Workshop on Information Assurance, United States Military Academy, West Point, NY, 20-22 June 2007.
[5] Sheng, S. et al., “An empirical analysis of phishing blacklists”, in Proceedings of the CEAS’09, 2009.
[6] Pawan Prakash et al., “PhishNet: Predictive Blacklisting to Detect Phishing Attacks”, in Proceedings of the IEEE Infocom, pp. 1-5, 2010.
[7] Maher Aburrous et al., “Experimental Case Studies for Investigating E-Banking Phishing Techniques and Attack Strategies”, Cognitive Computing, DOI 10.1007/s12559-010-9042-7, Vol. 2, pp. 242-253, 2010.
[8] Congfu Xu et al., “An approach to image spam filtering based on base64 encoding and N-Gram feature extraction”, in IEEE International Conference on Tools with Artificial Intelligence, DOI 10.1109/ICTAI.2010.31, 2010.
[9] R. Malathi, “Email Spam Filter using Supervised Learning with Bayesian Neural Network”, Computer Science, H.H. The Rajah’s College, Pudukkottai-622 001, Tamil Nadu, India, Int J Engg Techsci, Vol. 2(1), pp. 89-100, 2011.
[10] Sadeghian, A. and Ariaeinejad, R., “Spam detection system: A new approach based on interval type-2 fuzzy sets”, in IEEE CCECE -000379, 2011.
[11] Xiang, G. et al., “CANTINA+: A feature-rich machine learning framework for detecting phishing Web sites”, in ACM Trans. Inf. Syst. Secur., Vol. 14, No. 2, pp. 1-21, 2011.
[12] Naïve Bayes Classifier. (2014, Dec.) [Online]. Available: http://www.statsoft.com/textbook/naive-bayes-classifier
