
Paper 19-Malicious URL Detection Based On Machine Learning

This document discusses malicious URL detection using machine learning techniques. It proposes a system that uses a new set of URL attributes and behaviors to train machine learning classifiers, namely support vector machines and random forests, to detect malicious URLs. The key contributions are the new features extracted from URLs' static and dynamic behaviors. Experimental results show that the proposed attributes and behaviors can help improve malicious URL detection significantly.

(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 11, No. 1, 2020

Malicious URL Detection based on Machine Learning


Cho Do Xuan1, Hoa Dinh Nguyen1, Tisenko Victor Nikolaevich3
1, 2 Information Security dept, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
1 Information Assurance dept, FPT University, Hanoi, Vietnam
3 Systems of automatic Design, Peter the Great St. Petersburg Polytechnic University, Russia, St. Petersburg, Polytechnicheskaya, 29

Abstract—Currently, the risk of network information insecurity is increasing rapidly in number and level of danger. The methods most used by hackers today are attacks on end-to-end technology and the exploitation of human vulnerabilities. These techniques include social engineering, phishing, pharming, etc. One of the steps in conducting these attacks is to deceive users with malicious Uniform Resource Locators (URLs). As a result, malicious URL detection is of great interest nowadays. Several scientific studies have proposed methods to detect malicious URLs based on machine learning and deep learning techniques. In this paper, we propose a malicious URL detection method using machine learning techniques based on our proposed URL behaviors and attributes. Moreover, big data technology is also exploited to improve the capability of detecting malicious URLs based on abnormal behaviors. In short, the proposed detection system consists of a new set of URL features and behaviors, a machine learning algorithm, and a big data technology. The experimental results show that the proposed URL attributes and behaviors can help improve the ability to detect malicious URLs significantly. This suggests that the proposed system may be considered an optimized and user-friendly solution for malicious URL detection.

Keywords—URL; malicious URL detection; feature extraction; feature selection; machine learning

I. INTRODUCTION

A Uniform Resource Locator (URL) is used to refer to resources on the Internet. In [1], Sahoo et al. presented the characteristics and the two basic components of a URL: the protocol identifier, which indicates what protocol to use, and the resource name, which specifies the IP address or the domain name where the resource is located. Each URL therefore has a specific structure and format. Attackers often try to change one or more components of the URL's structure to deceive users into following their malicious URLs. Malicious URLs are links that adversely affect users. These URLs redirect users to resources or pages on which attackers can execute code on users' computers, or redirect users to unwanted sites, malicious websites, other phishing sites, or malware downloads. Malicious URLs can also be hidden in download links that are deemed safe and can spread quickly through file and message sharing in shared networks. Some attack techniques that use malicious URLs include [2, 3, 4]: Drive-by Download, Phishing and Social Engineering, and Spam.

According to statistics presented in [5], in 2019, attacks using the malicious URL spreading technique ranked first among the 10 most common attack techniques. In particular, according to these statistics, the three main URL spreading techniques, namely malicious URLs, botnet URLs, and phishing URLs, are increasing in both the number of attacks and the level of danger.

Given the increase in the number of malicious URL distributions over consecutive years, there is clearly a need to study and apply techniques or methods to detect and prevent these malicious URLs.

Regarding the problem of detecting malicious URLs, there are currently two main trends: malicious URL detection based on signatures or sets of rules, and malicious URL detection based on behavior analysis techniques [1, 2]. Detection based on a set of signatures or rules can quickly and accurately detect known malicious URLs. However, this method cannot detect new malicious URLs that are not in the set of predefined signatures or rules. Detection based on behavior analysis techniques adopts machine learning or deep learning algorithms to classify URLs based on their behaviors. In this paper, machine learning algorithms are utilized to classify URLs based on their attributes. The paper also introduces a new URL attribute extraction method.

In our research, machine learning algorithms are used to classify URLs based on the features and behaviors of URLs. The features are extracted from static and dynamic behaviors of URLs and are new to the literature. These newly proposed features are the main contribution of the research. Machine learning algorithms are one part of the whole malicious URL detection system. Two supervised machine learning algorithms are used: Support Vector Machine (SVM) and Random Forest (RF).

The paper is organized as follows. Section II reviews recent works in the literature on malicious URL detection. The proposed malicious URL detection system using machine learning is presented in Section III, where the new features for the URL detection process are also described in detail. Experimental results and discussions are provided in Section IV. The paper is concluded in Section V.

II. RELATED WORKS

A. Signature based Malicious URL Detection

Studies on malicious URL detection using signature sets have been investigated and applied for a long time [6, 7, 8]. Most of these studies use lists of known malicious URLs. Whenever a new URL is accessed, a database query is

148 | P a g e
www.ijacsa.thesai.org
executed. If the URL is blacklisted, it is considered malicious and a warning is generated; otherwise, the URL is considered safe. The main disadvantage of this approach is that it is very difficult to detect new malicious URLs that are not in the given list.

B. Machine Learning based Malicious URL Detection

There are three types of machine learning algorithms that can be applied in malicious URL detection methods: supervised learning, unsupervised learning, and semi-supervised learning. These detection methods are based on URL behaviors.

In [1], a number of malicious URL detection systems based on machine learning algorithms have been investigated. Those machine learning algorithms include SVM, Logistic Regression, Naïve Bayes, Decision Trees, Ensembles, Online Learning, etc. In this paper, two algorithms, RF and SVM, are used. The accuracy of these two algorithms with different parameter setups is presented in the experimental results.

The behaviors and characteristics of URLs can be divided into two main groups: static and dynamic. In [9, 10, 11], the authors presented methods of analyzing and extracting static behaviors of URLs, including Lexical, Content, Host, and Popularity-based features. The machine learning algorithms used in these studies are Online Learning algorithms and SVM. Malicious URL detection using dynamic actions of URLs is presented in [12, 13]. In this paper, URL attributes are extracted based on both static and dynamic behaviors. Several attribute groups are investigated, including: the character and semantic groups; the abnormal group in websites and the host-based group; and the correlated group.

C. Malicious URL Detection Tools

• URL Void: URL Void is a URL checking program that uses multiple engines and domain blacklists. Examples of the engines used by URL Void are Google SafeBrowsing, Norton SafeWeb and MyWOT. The advantage of URL Void is its compatibility with many different browsers, and it can also support many other testing services. The main disadvantage of URL Void is that its malicious URL detection process relies heavily on a given set of signatures.

• Unmask Parasites: Unmask Parasites is a URL testing tool that downloads the provided links and parses their Hypertext Markup Language (HTML) code, especially external links, iframes and JavaScript. The advantage of this tool is that it can detect iframes quickly and accurately. However, this tool is only useful if users already suspect that something strange is happening on their sites.

• Dr.Web Anti-Virus Link Checker: Dr.Web Anti-Virus Link Checker is an add-on for Chrome, Firefox, Opera, and IE that automatically finds and scans malicious content behind download links on social networking sites such as Facebook, Vk.com, and Google+.

• Comodo Site Inspector: This is a malware and security hole detection tool. It helps users check URLs, or enables webmasters to set up daily checks that download all the specified sites and run them in a sandbox browser environment.

• Some other tools: Besides the typical tools mentioned above, there are other URL checking tools, such as UnShorten.it, VirusTotal, Norton Safe Web, SiteAdvisor (by McAfee), Sucuri, Browser Defender, Online Link Scan, and Google Safe Browsing Diagnostic.

From the analysis and evaluation of the malicious URL detection tools presented above, it is found that the majority of current malicious URL detection tools are signature-based detection systems. Therefore, the effectiveness of these tools is limited, particularly against new malicious URLs that are not yet covered by their signature sets.

III. MALICIOUS URL DETECTION USING MACHINE LEARNING

A. The Model

Fig. 1 presents the proposed malicious URL detection system using machine learning. The malicious URL detection model contains two stages: training and detection.

• Training stage: To detect malicious URLs, it is necessary to collect both malicious URLs and clean URLs. Then, all the malicious and clean URLs are correctly labeled and passed to attribute extraction. These attributes form the basis for determining which URLs are clean and which are malicious; they are presented in detail in this paper. Finally, this dataset is divided into two subsets: training data, used for training the machine learning algorithms, and testing data, used for the testing process. If the classification performance of the machine learning model is good (high classification accuracy), the model is used in the detection phase.

• Detection phase: The detection phase is performed on each input URL. First, the URL goes through the attribute extraction process. Next, these attributes are input to the classifier, which classifies the URL as clean or malicious.

B. URL Attribute Extraction and Selection

In [1], the authors listed the main attribute groups for malicious URL detection as follows.

Lexical features: these features include URL length, main domain length, maximum domain token length, average path length, and average token length in the domain.

Host-based features: these features are extracted from the host characteristics of the URLs. These attributes indicate the location of malicious servers, the identity of malicious servers, and the degree to which several host-based features contribute to the URL's malicious level.

Content-based features: these features are acquired when the whole web page is downloaded. The workload of extracting these features is quite heavy, since a lot of information needs to be extracted, and there may be security concerns about accessing that URL. However, with more information available about a particular site, a better prediction model is expected. The content-based features of a website can be extracted primarily from its HTML content and its use of JavaScript.

Above are the three main attribute groups commonly used by researchers to detect malicious URLs. However, each study makes its own decision on suitable attributes and characteristics for each particular experimental dataset. In this paper, the use of all three attribute groups is recommended. However, within each attribute group, some new URL attributes and characteristics are proposed to optimize the ability to detect malicious URLs. The new attributes for malicious URL detection in this research are listed in Tables I, II, and III.
Fig. 1. Malicious URL Detection Model using Machine Learning. [Diagram: training stage: URL → feature extraction, labeling → machine learning algorithm training; detection stage: URL → feature extraction → classification → safe URL or malicious URL.]
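The two-stage model of Fig. 1 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper trains RF and SVM classifiers with Spark, whereas this sketch substitutes a deliberately simple nearest-centroid classifier and a hypothetical three-feature extractor (URL length, number of dots, dashes in the hostname). The goal is only to make the two stages visible: the training stage extracts features from labeled URLs and fits a model, and the detection stage extracts features from one URL and classifies it. All function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def extract_features(url):
    """Hypothetical toy feature vector: URL length, number of dots, dashes in the hostname."""
    host = urlsplit(url).hostname or ""
    return [float(len(url)), float(url.count(".")), float(host.count("-"))]


def train(urls, labels):
    """Training stage: map each label to the mean feature vector of its URLs (nearest centroid)."""
    grouped = {}
    for url, label in zip(urls, labels):
        grouped.setdefault(label, []).append(extract_features(url))
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def detect(model, url):
    """Detection stage: return the label whose centroid is closest to the URL's features."""
    x = extract_features(url)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hand-made, hypothetical training URLs; "good" = safe and "bad" = malicious,
# matching the label convention of the paper's CSV dataset.
train_urls = [
    "https://example.com/",
    "https://www.python.org/",
    "http://login-verify-account.example.net/a.php",
    "http://secure-update-bank.example.org/x.html",
]
train_labels = ["good", "good", "bad", "bad"]
model = train(train_urls, train_labels)
print(detect(model, "http://verify-login-now.example.com/s.php"))  # bad
print(detect(model, "https://docs.example.com/"))                  # good
```

Replacing `train` and `detect` with a real classifier such as a random forest keeps the same structure: only the model-fitting and prediction steps change, while feature extraction is shared by both stages.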

TABLE. I. LIST OF URL FEATURES IN THE LEXICAL FEATURE GROUP
Feature group: Lexical group (all rows)

No | Feature | Data type | Description
1 | NumDots | numeric | Number of '.' characters in the URL
2 | SubdomainLevel | numeric | Number of subdomain levels
3 | PathLevel | numeric | Depth of the URL path
4 | UrlLength | numeric | Length of the URL
5 | NumDash | numeric | Number of dash characters '-'
6 | NumDashInHostname | numeric | Number of dash characters in the hostname
7 | AtSymbol | boolean | The URL contains the character '@'
8 | TildeSymbol | boolean | The URL contains the character '~'
9 | NumUnderscore | numeric | Number of underscore characters '_'
10 | NumPercent | numeric | Number of '%' characters
11 | NumQueryComponents | numeric | Number of query components
12 | NumAmpersand | numeric | Number of '&' characters
13 | NumHash | numeric | Number of '#' characters
14 | NumNumericChars | numeric | Number of numeric characters
15 | NoHttps | boolean | Check whether HTTPS is used in the website URL
16 | IpAddress | boolean | Check whether an IP address is used as the hostname of the website URL
17 | DomainInSubdomains | boolean | Check whether a TLD or ccTLD is used as part of a subdomain in the website URL
18 | DomainInPaths | boolean | Check whether a TLD or ccTLD is used in the path of the website URL
19 | HttpsInHostname | boolean | Check whether "https" appears in the hostname of the website URL
20 | HostnameLength | numeric | Length of the hostname
21 | PathLength | numeric | Length of the path
22 | QueryLength | numeric | Length of the query
23 | DoubleSlashInPath | boolean | The path contains a double slash '//'
24 | NumSensitiveWords | numeric | Number of sensitive words (i.e., "secure", "account", "webscr", "login", "ebayisapi", "sign in", "banking", "confirm") in the website URL
25 | EmbeddedBrandName | boolean | The domain contains a brand name
26 | PctExtHyperlinks* | float | Percentage of external hyperlinks in the HTML source code of the website
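A subset of the lexical features in Table I can be computed from the URL string alone, for example with Python's standard `urllib.parse` and `ipaddress` modules. The sketch below is an illustration under stated assumptions rather than the authors' extractor. The paper does not define exactly how SubdomainLevel, PathLevel, NoHttps or NumSensitiveWords are counted, so the interpretations in the comments are guesses. Features that need external data (EmbeddedBrandName, DomainInSubdomains, DomainInPaths) or page content (PctExtHyperlinks) are omitted. The function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

# Word list copied from the NumSensitiveWords row of Table I.
SENSITIVE_WORDS = ("secure", "account", "webscr", "login",
                   "ebayisapi", "sign in", "banking", "confirm")


def is_ip_address(host):
    """True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Subset of the Table I lexical features; interpretations marked below are assumptions."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path, query = parts.path, parts.query
    lowered = url.lower()
    ip_host = is_ip_address(host)
    return {
        "NumDots": url.count("."),
        # Assumption: labels beyond "domain.tld" (naive; ignores suffixes such as .co.uk).
        "SubdomainLevel": 0 if ip_host else max(host.count(".") - 1, 0),
        # Assumption: depth = number of non-empty path segments.
        "PathLevel": len([seg for seg in path.split("/") if seg]),
        "UrlLength": len(url),
        "NumDash": url.count("-"),
        "NumDashInHostname": host.count("-"),
        "AtSymbol": "@" in url,
        "TildeSymbol": "~" in url,
        "NumUnderscore": url.count("_"),
        "NumPercent": url.count("%"),
        "NumQueryComponents": len([q for q in query.split("&") if q]),
        "NumAmpersand": url.count("&"),
        "NumHash": url.count("#"),
        "NumNumericChars": sum(ch.isdigit() for ch in url),
        # Assumption: True when the scheme is not HTTPS.
        "NoHttps": parts.scheme.lower() != "https",
        "IpAddress": ip_host,
        "HostnameLength": len(host),
        "PathLength": len(path),
        "QueryLength": len(query),
        "DoubleSlashInPath": "//" in path,
        # Assumption: total occurrences of the listed words in the lower-cased URL.
        "NumSensitiveWords": sum(lowered.count(w) for w in SENSITIVE_WORDS),
    }


features = lexical_features("http://user@192.168.0.1/secure/login//confirm.php?id=12&acct=7#top")
print(features["IpAddress"], features["AtSymbol"], features["NumSensitiveWords"])  # True True 3
```

Each value is a cheap string operation, which fits the paper's emphasis on easy-to-calculate attributes. The resulting dictionaries can be turned into fixed-order vectors before they are passed to a classifier.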


TABLE. II. LIST OF URL FEATURES IN THE HOST-BASED FEATURE GROUP
Feature group: Host-based feature group (all rows)

No | Feature | Data type | Description
27 | PctExtResourceUrls* | float | Percentage of external resource URLs in the HTML source code of the website
28 | ExtFavicon* | boolean | Check whether the favicon is loaded from a hostname different from the hostname of the website URL
29 | InsecureForms* | boolean | Check whether a form action contains a URL without the HTTPS protocol
30 | RelativeFormAction* | boolean | Check whether a form action contains a relative URL
31 | ExtFormAction* | boolean | Check whether a form action contains an external URL
32 | AbnormalFormAction* | boolean | Check whether a form action contains an abnormal URL
33 | PctNullSelfRedirectHyperlinks* | float | Percentage of hyperlinks containing an empty value, a self-redirecting value such as "#" or the URL of the current website, or an abnormal value such as "file://E:/"
34 | FrequentDomainNameMismatch | boolean | Check whether the most frequent hostname in the HTML source code does not match the website URL
35 | FakeLinkInStatusBar* | boolean | Check whether the HTML source code contains a JavaScript onMouseOver command that displays a fake URL in the status bar
36 | RightClickDisabled | boolean | Check whether the HTML source code contains a JavaScript command that disables the right mouse click
37 | PopUpWindow | boolean | Check whether the HTML source code contains a JavaScript command that opens a pop-up window
38 | SubmitInfoToEmail | boolean | Check whether the HTML source code contains "mailto"
39 | IframeOrFrame | boolean | Check whether iframe or frame is used in the HTML source code
40 | MissingTitle | boolean | Check whether the title tag is empty in the HTML source code
41 | src_eval_cnt | int | Number of eval() calls in the HTML source code
42 | src_escape_cnt | int | Number of escape() calls in the HTML source code
43 | src_exec_cnt | int | Number of exec() calls in the HTML source code
44 | src_search_cnt | int | Number of search() calls in the HTML source code
45 | ImagesOnlyInForm* | boolean | Check whether forms in the HTML source code contain only images and no text
46 | rank_country | boolean | The current country rank of the website URL is in the Alexa top 1 million
47 | rank_host | boolean | The host rank of the website URL is in the Alexa top 1 million
48 | AgeDomain | int | Age of the domain since its registration
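Several of the page-content features in Table II can be approximated with Python's standard `html.parser` module. The sketch below is hypothetical and covers only PctExtHyperlinks, PctNullSelfRedirectHyperlinks, SubmitInfoToEmail, IframeOrFrame, MissingTitle and the src_*_cnt counters. Its matching rules are assumptions, not definitions from the paper: a link counts as external when its resolved hostname differs from the page hostname, and a call such as `eval(` is counted with a whole-word regular expression, so `unescape(` does not count as `escape(`. The sketch works on an HTML string that has already been downloaded. As Section III.B notes, fetching content from suspicious URLs raises security concerns.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PageParser(HTMLParser):
    """Collects <a href> values, (i)frame usage and the <title> text of one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.has_frame = False
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
        elif tag in ("iframe", "frame"):
            self.has_frame = True
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def count_calls(source, name):
    """Occurrences of `name(` as a whole word (eval( matches, unescape( does not match escape)."""
    return len(re.findall(r"\b" + re.escape(name) + r"\s*\(", source))


def content_features(page_url, html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    page_host = urlsplit(page_url).hostname
    external = null_or_self = 0
    for href in parser.hrefs:
        href = href.strip()
        absolute = urljoin(page_url, href)
        if (href == "" or href.startswith("#")
                or href.lower().startswith("file:") or absolute == page_url):
            null_or_self += 1  # empty, self-redirecting or abnormal link
        elif urlsplit(absolute).hostname not in (None, page_host):
            external += 1      # resolved hostname differs from the page hostname
    total = len(parser.hrefs)

    def pct(n):
        return 100.0 * n / total if total else 0.0

    return {
        "PctExtHyperlinks": pct(external),
        "PctNullSelfRedirectHyperlinks": pct(null_or_self),
        "SubmitInfoToEmail": "mailto" in html.lower(),
        "IframeOrFrame": parser.has_frame,
        "MissingTitle": parser.title.strip() == "",
        "src_eval_cnt": count_calls(html, "eval"),
        "src_escape_cnt": count_calls(html, "escape"),
        "src_exec_cnt": count_calls(html, "exec"),
        "src_search_cnt": count_calls(html, "search"),
    }


SAMPLE_PAGE_URL = "http://login.example.net/index.html"  # hypothetical page
SAMPLE_HTML = """<html><head><title></title></head><body>
<a href="#">Home</a>
<a href="http://other.example.org/x">Offers</a>
<a href="/help.html">Help</a>
<a href="file://E:/">Files</a>
<iframe src="http://other.example.org/frame"></iframe>
<form action="mailto:someone@example.org"><input type="submit"></form>
<script>eval(unescape("%61")); escape("x");</script>
</body></html>"""

print(content_features(SAMPLE_PAGE_URL, SAMPLE_HTML))
```

For the sample page, one of the four links is external (25%) and two are null or abnormal (50%). The page also uses an iframe, a mailto form and an empty title, which are the kinds of signals Table II is meant to capture.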

TABLE. III. LIST OF URL FEATURES IN THE CORRELATED FEATURE GROUP
Feature group: Correlated feature group (all rows)

No | Feature | Data type | Description
49 | UrlLengthRT* | -1, 0, 1 | Correlated length of URL
50 | PctExtResourceUrlsRT* | -1, 0, 1 | Correlated percentage of external URLs
51 | AbnormalExtFormActionR* | -1, 0, 1 | Correlated abnormal actions in form
52 | ExtMetaScriptLinkRT* | -1, 0, 1 | Correlated meta script link
53 | SubdomainLevelRT* | -1, 0, 1 | Correlated sub-domain level
54 | PctExtNullSelfRedirectHyperlinksRT* | -1, 0, 1 | Correlated null self-redirect hyperlinks

All attributes marked "*" in Tables I, II, and III are newly extracted and selected in this research. In previous research, authors tended to use feature extraction and selection methods based on a group of predefined features. However, those recommended features are specialized and not widely used. As a result, it is usually difficult to implement those features in other works and to re-evaluate their detection performance. In this work, we try to combine basic features to formulate new ones.

C. Machine Learning Algorithm Selection

The application of machine learning algorithms in detecting malicious URLs has been widely studied and applied [1]. In this paper, two commonly used supervised machine learning algorithms, RF and SVM [14, 15], are used.

In this research, machine learning algorithms are the last piece of the puzzle in our proposed malicious URL detection system. These algorithms are suitable for exploiting the usefulness of our new features for malicious URL detection. The machine learning algorithms themselves are already well investigated in the literature. In this work, SVM and RF are selected as examples to illustrate the performance of the whole detection system and are not our main focus. Readers are encouraged to implement other algorithms such as Naïve Bayes, Decision Trees, k-nearest neighbors, neural networks, etc.

In order to explore the effectiveness of these two algorithms, different parameter settings are tested.


IV. EXPERIMENTAL RESULTS

A. Dataset and Experiment Environments

1) Experiment dataset: The experimental dataset for the malicious URL detection model includes 470,000 URLs collected from [16, 17, 18, 19], of which about 70,000 URLs are malicious and 400,000 URLs are safe. All these URLs were checked with the VirusTotal tool to verify the label of each URL. The complete dataset is stored in CSV format. Each URL sample has the label "bad" for malicious or "good" for safe. Details of the data sources are as follows:

• Phishtank [16]: Phishtank is a website dedicated to sharing phishing URLs. Suspicious URLs can be submitted to Phishtank for verification. The data in Phishtank is updated hourly.
• URLhaus [17]: URLhaus is a project from abuse.ch that aims to share malicious URLs used for malware distribution.
• Alexa [18]: a database that ranks websites according to their popularity.
• Malicious_n_Non-Malicious URL [19]: a data source with more than 400,000 labeled URLs. In this database, 82% of all URLs are safe, while the remaining 18% are malicious.

2) Experimental setup: The dataset of safe and malicious URLs mentioned above is divided into two subsets. About 80% of the dataset, 470,000 URLs (400,000 safe URLs, 70,000 malicious URLs), is used for training, and about 20% of the dataset, about 10,000 URLs (5,000 malicious URLs, 5,000 safe URLs), is used for testing. The experiment is repeated many times with both the SVM and RF algorithms, using different parameter settings in different runs.

3) Experiment environment:
• Software environment: Python 3.6; Spark 2.3.0; Hadoop 2.7; Java (JDK) 8; Ubuntu 18.04.
• Hardware: 16 GB RAM; Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz.

B. Results and Discussions

1) Evaluation metrics:

Accuracy: the percentage of correct decisions among all testing samples:

$$\mathrm{acc} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (1)$$

where TP (true positive) is the number of malicious URLs correctly labeled; FN (false negative) is the number of malicious URLs misclassified as safe; TN (true negative) is the number of safe URLs correctly labeled; FP (false positive) is the number of safe URLs misclassified as malicious.

Confusion matrix: a two-way table (Table IV) showing how many samples of each real label are classified into each predicted label.

Precision: the percentage of malicious URLs correctly labeled (TP) among all URLs labeled malicious by the classifier (TP+FP):

$$\mathrm{precision} = \frac{TP}{TP + FP} \times 100\% \qquad (2)$$

Recall: the percentage of malicious URLs correctly labeled (TP) among all malicious URLs of the testing data (TP+FN):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \qquad (3)$$

F1-score: the harmonic mean of precision and recall. A high F1 value means the classifier is good.

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{Recall}}{\mathrm{precision} + \mathrm{Recall}} \qquad (4)$$

FPR (false positive rate) is calculated as:

$$\mathrm{FPR} = \frac{FP}{FP + TN} \times 100\% \qquad (5)$$

2) Results

• Training performance: To evaluate the training performance of the machine learning algorithms, the two data subsets are used individually. The two subsets differ in size and in label distribution, which may result in different training performances. The results are presented in Table V.

Experimental results show that RF with 100 trees gives the best predictive result. In return, the training time of RF is longer than that of SVM, while the testing time is not much different. The accuracy on the second dataset is lower because of the imbalance between safe and malicious URLs in the data. As expected, the RF algorithm, with its fast speed and high accuracy, is well suited to this classification problem. Besides, in our research, when machine learning algorithms are combined with Spark libraries, the training and testing time can be reduced significantly. Spark ML is a library package that provides and supports many machine learning algorithms such as SVM, RF, Naïve Bayes, Regression, Clustering, Collaborative Filtering, etc. It is a suitable tool for applying machine learning algorithms with fast and accurate processing on large datasets.

• Testing results: An additional small testing dataset, with 107 safe URLs and 118 malicious URLs, is used to evaluate the performance of the best machine learning algorithm discussed above, RF (100 trees). The results are presented in Table VI.

Confusion matrix parameters: TP rate: 92.174%; FPR: 12.037%; TN rate: 87.963%; FN rate: 7.826%.

TABLE. IV. CONFUSION MATRIX

 | Classified malicious URL | Classified safe URL
Real malicious URLs | TP | FN
Real safe URLs | FP | TN
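Equations (1) to (5) translate directly into code. The minimal sketch below computes them from confusion-matrix counts, treating malicious URLs as the positive class as in Table IV. The function names are illustrative. As a usage example it is applied to the counts reported in Table VI (TP = 109, FN = 9, TN = 96, FP = 11). Rates computed directly from these counts (recall of about 92.37%, FPR of about 10.28%) do not exactly match the percentages quoted in the testing results. The sketch only reproduces what the formulas give for the reported counts.

```python
def _pct(numerator, denominator):
    """numerator / denominator as a percentage; 0.0 when the denominator is 0."""
    return 100.0 * numerator / denominator if denominator else 0.0


def evaluate(tp, tn, fp, fn):
    """Equations (1)-(5): accuracy, precision, recall, F1 and FPR, all in percent."""
    precision = _pct(tp, tp + fp)                      # (2)
    recall = _pct(tp, tp + fn)                         # (3)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)              # (4), harmonic mean
    return {
        "accuracy": _pct(tp + tn, tp + tn + fp + fn),  # (1)
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": _pct(fp, fp + tn),                      # (5)
    }


# Counts read from Table VI (malicious = positive class):
# real malicious: 109 predicted malicious (TP), 9 predicted safe (FN);
# real safe: 96 predicted safe (TN), 11 predicted malicious (FP).
metrics = evaluate(tp=109, tn=96, fp=11, fn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
# accuracy: 91.11%, precision: 90.83%, recall: 92.37%, f1: 91.60%, fpr: 10.28%
```

Because F1 is computed from the percentage-valued precision and recall, it is also a percentage. It equals 2TP / (2TP + FP + FN) expressed in percent.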


TABLE. V. TRAINING PERFORMANCE OF MALICIOUS URL DETECTION SYSTEM

Dataset | Algorithm and parameters | Accuracy (%) | Precision (%) | Recall (%) | Training time (s) | Testing time (s)
10,000 URLs | SVM (100 iterations) | 93.39 | 94.67 | 92.51 | 2.32 | 0.01
10,000 URLs | SVM (10 iterations) | 93.35 | 94.84 | 92.71 | 3.11 | 0.01
10,000 URLs | RF (10 trees) | 99.10 | 98.43 | 97.45 | 2.78 | 0.01
10,000 URLs | RF (100 trees) | 99.77 | 98.75 | 97.85 | 3.34 | 0.01
470,000 URLs | SVM (100 iterations) | 90.70 | 93.43 | 88.45 | 272.97 | 2.12
470,000 URLs | SVM (10 iterations) | 91.07 | 93.75 | 88.85 | 280.33 | 2.31
470,000 URLs | RF (10 trees) | 95.45 | 90.21 | 95.12 | 372.97 | 2.02
470,000 URLs | RF (100 trees) | 96.28 | 91.44 | 94.42 | 480.33 | 2.30

TABLE. VI. TESTING RESULTS

 | Predicted safe URL | Predicted malicious URL
Real safe URL (107) | 96 | 11
Real malicious URL (118) | 9 | 109

V. CONCLUSIONS

In this paper, a method for malicious URL detection using machine learning is presented. The empirical results in Tables V and VI show the effectiveness of the proposed extracted attributes. In this study, unlike many other traditional publications, we neither use special attributes nor seek to create huge datasets to improve the accuracy of the system. Instead, easy-to-calculate attributes are combined with big data processing technologies to balance two factors: the processing time and the accuracy of the system. The results of this research can be applied in information security technologies and systems. The results of this article have been used to build a free tool [20] to detect malicious URLs in web browsers.

REFERENCES

[1] D. Sahoo, C. Liu, S.C.H. Hoi, "Malicious URL Detection using Machine Learning: A Survey," CoRR, abs/1701.07179, 2017.
[2] M. Khonji, Y. Iraqi, and A. Jones, "Phishing detection: a literature survey," IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2091–2121, 2013.
[3] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious javascript code," in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 281–290.
[4] R. Heartfield and G. Loukas, "A taxonomy of attacks and a survey of defence mechanisms for semantic social engineering attacks," ACM Computing Surveys (CSUR), vol. 48, no. 3, p. 37, 2015.
[5] Internet Security Threat Report (ISTR) 2019, Symantec. https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf [Last accessed 10/2019].
[6] S. Sheng, B. Wardman, G. Warner, L. F. Cranor, J. Hong, and C. Zhang, "An empirical analysis of phishing blacklists," in Proceedings of the Sixth Conference on Email and Anti-Spam (CEAS), 2009.
[7] C. Seifert, I. Welch, and P. Komisarczuk, "Identification of malicious web pages with static heuristics," in Telecommunication Networks and Applications Conference, 2008. ATNAC 2008. Australasian. IEEE, 2008, pp. 91–96.
[8] S. Sinha, M. Bailey, and F. Jahanian, "Shades of grey: On the effectiveness of reputation-based 'blacklists'," in Malicious and Unwanted Software, 2008. MALWARE 2008. 3rd International Conference on. IEEE, 2008, pp. 57–64.
[9] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 681–688.
[10] B. Eshete, A. Villafiorita, and K. Weldemariam, "Binspect: Holistic analysis and detection of malicious web pages," in Security and Privacy in Communication Networks. Springer, 2013, pp. 149–166.
[11] S. Purkait, "Phishing counter measures and their effectiveness – literature review," Information Management & Computer Security, vol. 20, no. 5, pp. 382–420, 2012.
[12] Y. Tao, "Suspicious URL and device detection by log mining," Ph.D. dissertation, Applied Sciences: School of Computing Science, 2014.
[13] G. Canfora, E. Medvet, F. Mercaldo, and C. A. Visaggio, "Detection of malicious web pages using system calls sequences," in Availability, Reliability, and Security in Information Systems. Springer, 2014, pp. 226–238.
[14] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[15] T. G. Dietterich, "Ensemble Methods in Machine Learning," International Workshop on Multiple Classifier Systems, Cagliari, Italy, 2000, pp. 1–15.
[16] Developer Information. https://www.phishtank.com/developer_info.php. [Last accessed 11/2019].
[17] URLhaus Database Dump. https://urlhaus.abuse.ch/downloads/csv/. [Last accessed 11/2019].
[18] Dataset URL. http://downloads.majestic.com/majestic_million.csv. [Last accessed 10/2019].
[19] Malicious_n_Non-MaliciousURL. https://www.kaggle.com/antonyj453/urldataset#data.csv. [Last accessed 11/2019].
[20] chrome.zip. https://drive.google.com/file/d/13G_Ndr4hMFx_qWyTEjHuOyJmHFWD0Gud/view?fbclid=IwAR0SLVCrvjHHGmoHZH97nXN3Bm-DMY7jG4SOsKZYLAZjTFgeoJADfli64-g. [Last accessed 12/2019].
