HTMLPhish Enabling Accurate Phishing Web Page Detection by Applying Deep Learning Techniques On HTML Analysis WCCI
[email protected], [email protected]
† Northumbria University, Newcastle upon Tyne, UK
Abstract: Recently, developing and carrying out phishing attacks has come to require little technical skill or cost. This has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based data-driven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of concatenating the word and character embeddings allows our model to manage new features and ensures easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents that provides a distribution of phishing to benign web pages obtainable in the real world, yielding over 93% Accuracy and True Positive Rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language.

Keywords-Phishing detection, Web pages, Classification model, Convolutional Neural Networks, HTML

I. INTRODUCTION
The infamous phishing attack is a social engineering technique that manipulates internet users into revealing private information that may be exploited for fraudulent purposes [1]. This form of cybercrime has recently become common because it can be carried out with little technical ability and without significant cost [2]. The proliferation of phishing attacks is evident in the 46% increase in the number of phishing websites identified between October 2018 and March 2019 by the Anti-Phishing Working Group (APWG) [3]. Most phishing attacks are started by an unsuspecting Internet user merely clicking on a link in a phishing email message that leads to a bogus website. The impact of phishing attacks on individuals, such as identity theft and psychological and financial costs, can be devastating.

A. Problem Definition
Recent research in phishing detection approaches has resulted in the rise of multiple technical methods such as augmenting password logins [4] and multi-factor authentication [5]. However, these techniques are usually server-side systems that require the Internet user to correspond with a remote service, which adds further delay in the communication channel. Another popular phishing detection system that relies on a centralised architecture is the phishing blacklist and whitelist method [6]. A URL visited by an internet user is compared with the URLs in these lists in real time. Although the list-based methods tend to keep the false positive rate low, a significant shortcoming is that the lists are not exhaustive, and they fail to detect zero-day phishing attacks. To mitigate these limitations, researchers have developed several anti-phishing techniques using machine learning models, as they are mostly client-side based and can generalise their predictions to unseen data.
Machine learning-based anti-phishing techniques typically follow a two-step approach: (1) the required representation of features is first extracted, then (2) a phishing detection machine learning model is trained using the feature vectors. To extract the feature representation from the lexical and static components of a web page, the machine learning models rely on the assumption that the infrastructure of phishing pages is different from that of legitimate pages. For example, in [7], phishing web pages are automatically detected based on handcrafted features extracted from the URL, HTML content, network, and JavaScript of a web page. Furthermore, natural language processing techniques are currently used to extract specific features such as the number of common phishing words, type of n-gram, etc. from the components of a web page [8], [9], [10].
While the above approaches have proven successful, they are nevertheless prone to several limitations, particularly in the context of HTML analysis: i. Inability to accommodate unseen features: As the accuracy of existing models depends on how comprehensive the feature set is and how impervious the feature set remains to future attacks, they will be unable to correctly detect new phishing web pages with evolved content and structure without a regular update of the feature set. ii. They require substantial manual feature engineering: Existing phishing detection machine learning models require specialised domain knowledge in order to ascertain the needed
features suitable to each task (e.g., number of white spaces in the HTML content, number of redirects, and iframes, etc.). This is a tedious process, and these handcrafted features are often targeted and bypassed in future attacks. It is also challenging to know the best features for one particular application.
To address the above issues, we propose HTMLPhish, a deep learning based data-driven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish uses both the character and word embedding techniques to represent the features of each HTML document. Then Convolutional Neural Networks (CNNs) are employed to model the semantic dependencies.
The following characteristics highlight the relevance of HTMLPhish to web page phishing detection:
(1) HTMLPhish analyses HTML directly to help preserve useful information. It also removes the arduous task required for the manual feature engineering process.
(2) HTMLPhish takes into consideration all the elements of an HTML document, such as text, hyperlinks, images, tables, and lists, when training the deep neural network model.
We experimentally demonstrate the significance of character and word embedding features of HTML contents in detecting phishing web pages. We then propose a state-of-the-art HTML phishing detection model, in which the character and word embedding matrices are concatenated before employing convolutions on the represented features. Our proposed approach ensures an adequate embedding of new feature vectors that enables straightforward extrapolation of the trained model to test data. Subsequently, we conduct extensive evaluations on a dataset of over 50,000 HTML documents collected over two months. This ensures our evaluation settings reproduce real-world situations in which models are trained on data generated up to the present point and applied to new data.
We summarise the main contributions of this paper as follows:
• Different from existing methods, our proposed model, HTMLPhish, to the best of our knowledge, is the first to use only the raw content of the HTML document of a web page to train a deep neural network model for phishing detection. Manual feature engineering is reduced as HTMLPhish learns the representation in the features of the HTML document, and we do not depend on any other complicated or specialist features for the task. Our proposed approach takes advantage of the word and character embedding matrix to present a phishing detection model that automatically accommodates new features and is therefore easily applied to test data.
• We conduct extensive evaluations on a dataset of more than 50,000 HTML documents collected in two months. The distribution of the instances in our dataset is similar to the ratio of phishing and legitimate web pages found in the real world. This ensures that our evaluation metrics and results are relevant to existing systems.
• Furthermore, we carried out a longitudinal study on the efficiency of HTMLPhish to infer the maximum retraining period for which the accuracy of the system does not reduce. Our result only recorded a minimal 4% decrease in accuracy on the test data. This confirms that HTMLPhish remains reliable and temporally robust over a long period.
We organised the remainder of the paper as follows: the next section provides an overview of related works on proposed techniques of detecting phishing on web pages. Section III gives the prior knowledge on Convolutional Neural Networks, and Section IV provides an in-depth description of our proposed model. Section V elaborates on the dataset collection, while the detailed results on the evaluations of our proposed model are found in Section VI. Finally, we conclude our paper in Section VII.

II. RELATED WORKS
In this section, we address the two topics most closely related to our work: phishing web page detection using feature engineering and the Deep Learning method (especially for NLP).

A. Feature Engineering for Phishing Web Page Detection
These techniques extract specific features from a web page such as JavaScript, HTML web page, URL, and network features. These are fed into machine learning algorithms to build a classification model. These machine learning techniques differ in the type of heuristics and number of feature sets used and the optimisation algorithm applied to the machine learning algorithm. These techniques are based on the fact that phishing and benign web pages have different content distributions of extracted features. The accuracy of heuristics and machine learning-based techniques critically depends on the type of features extracted and the machine learning algorithm applied. Many phishing detection techniques have been built on different proposed feature sets.
Varshney et al. [11] proposed LPD, a client-side based web page phishing detection mechanism. The strings from the URL and page title of a specified web page are extracted and searched on the Google search engine. If there is a match between the domain names of the top T search results and the domain name of the specified URL, the web page is considered to be legitimate. The result from their evaluations gave a true positive rate of 99.5%.
Smadi et al. [12] proposed a neural network model that can adapt to the dynamic nature of phishing emails using reinforcement learning. The proposed model can handle zero-day phishing attacks and also mitigate the problem of a limited dataset using an updated offline database. Their experiment yielded a high accuracy of 98.63% on fifty features extracted from a dataset of 12,266 emails.
The selection of features from various web page elements can be an expensive process from a security risk and technological workload angle. For example, it can be prolonged and somewhat problematic to extract specific feature sets. Besides, it needs specialist domain expertise to define which features are essential.
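To make the cost of manual feature engineering concrete, the following is a minimal sketch of the kind of handcrafted HTML features mentioned in the Problem Definition (white-space count, iframes). It is an illustration only and is not the feature set of any cited work: the class `TagCounter`, the function `handcrafted_features`, and the feature names are hypothetical, and the link and form counts are added purely as further examples.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag occurs in an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def handcrafted_features(html):
    """Return a small dictionary of manually chosen features (hypothetical set)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "num_whitespace": sum(ch.isspace() for ch in html),
        "num_iframes": parser.counts.get("iframe", 0),
        "num_links": parser.counts.get("a", 0),
        "num_forms": parser.counts.get("form", 0),
    }
```

Every entry in the returned dictionary has to be chosen, implemented, and maintained by hand, which is the burden that learning directly from the raw HTML is meant to avoid.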
B. Deep Learning
Due to its performance in many applications, Deep Learning has attracted increased interest in recent years [13], [14], [15]. The core concept is to learn the feature representation directly from unprocessed data without any manual feature engineering. Under this premise, we want to use Deep Learning to detect phishing HTML content by directly learning how features from the raw HTML string are represented instead of using specialist features that are manually engineered.
As we want to train our Deep Learning networks using textual features, it is, therefore, essential to discuss NLP as it relates to Deep Learning. Deep learning techniques have been successful in a lot of NLP tasks, for example, in document classification [16], machine translation [17], etc. Recurrent neural networks (e.g., LSTM [18]) have been extensively applied due to their ability to exhibit temporal behaviour and capture sequential data. However, CNNs have become excellent substitutes for LSTMs, especially showing excellent performance in text classification and sentiment analysis, as CNNs learn to recognize patterns across space [19].
Very few attempts have been made to use Deep Learning to detect phishing web pages using web page components. Bahnsen et al. [20] proposed a phishing classifying scheme that used features of the URLs of a web page as input and implemented the model on an LSTM network. The results yielded an accuracy of 98.7% on a corpus of 2 million phishing and legitimate URLs. The authors of [21] proposed a CNN based model which combines the outputs of two Convolutional layers to detect malicious URLs.
However, our review did not find any existing approach that detects malicious phishing web pages using only HTML documents with Deep Learning. HTMLPhish learns the semantic information present only in the characters and words in an HTML document to determine the maliciousness of the web page. Our thorough analysis shows that phishing web pages can be detected using only their HTML document content.

III. PRELIMINARIES
We define the problem of detecting phishing web pages using their HTML content as a binary classification task for prediction of two classes: legitimate or phishing. We are given a dataset with $T$ HTML documents $\{(html_1, y_1), \ldots, (html_T, y_T)\}$, where $html_t$ for $t = 1, \ldots, T$ represents an HTML document, while $y_t \in \{0, 1\}$ is its label. $y_t = 1$ corresponds to a phishing HTML document while $y_t = 0$ is a legitimate HTML document.

A. Deep Neural Network for Phishing HTML Document Detection
The deep neural network that underlies HTMLPhish is a Convolutional Neural Network (CNN). To detail a basic CNN for HTML document classification, an HTML document is comprised of a string of characters or words. Our goal is to obtain an embedding matrix $html \to s \in \mathbb{R}^{maxlen \times d}$, in a way that $s$ is made up of sets of adjoining inputs $s_i$, for $i \in \{1, 2, \ldots, maxlen\}$, in a string, in which the input can be individual characters or words from the HTML document. Each input is subsequently transformed into an embedding $s_i \in \mathbb{R}^d$, the $i$th column of $S$, where the dimension $d$ is the vector size, which is automatically initialised and learnt together with the remainder of the model.
In this paper, the embedding matrix was automatically initialised, and for parallelisation, all sequences were padded to the same length $maxlen$.
The CNN performs a convolution operation $\otimes$ over $s \in \mathbb{R}^{maxlen \times d}$ using:
$$c_i = f(M \otimes s_{i:i+n-1} + b_i)$$
followed by a non-linear activation, where $b_i$ is the bias, $M$ is the convolving filter, and $n$ is the kernel size of the convolution operation. After the convolution, a pooling step is applied (which in our model is the Max Pooling) in order to decrease the feature dimension and determine the most important features.
The CNN is capable of exploiting the temporal relation of $n$ kernel size in its input using the filter $M$ to convolve on each segment of $n$ kernel size. A CNN model typically contains several sets of filters with different kernel sizes ($n$). Those are the model hyperparameters that are set by the user. In this deep neural network, the convolution layer is usually followed by a Pooling layer. The features from the Pooling layer are then passed to dense layers to perform the required classification. The entire network is then trained by using backpropagation.
Note: In order to differentiate our state-of-the-art model from the baseline models, for the rest of this paper, we will use the term HTMLPhish-Full to indicate HTMLPhish trained with the proposed model unless otherwise stated, while HTMLPhish-Character and HTMLPhish-Word represent the deep neural network model using only the character and word embedding respectively.

IV. THE PROPOSED MODEL
In this section, we elaborate on the architecture of our proposed deep neural network model HTMLPhish-Full. The network architecture seen in Figure 3 shows HTMLPhish-Full has two input layers. The first input layer processes the raw HTML document into an embedding matrix made up of character-level feature representations, while the second input layer does the same with words. These two branches are concatenated in a dense layer called the Concatenation layer. Therefore, the embedding matrix in this model is the concatenation, written $C_{em} + W_{em}$, of the character-level embedding matrix and the word embedding matrix, where $C_{em} \to c \in \mathbb{R}^{maxlen_1 \times d}$ and $W_{em} \to w \in \mathbb{R}^{maxlen_2 \times d}$. The features in the Concatenation layer allow the preservation of the original information in the HTML content. In the Concatenation layer, the contents of both embedding layers are put alongside each other to yield a 3-dimensional layer [$C_{em} + W_{em} \to$ (None, 180, 100) + (None, 2000, 100) = (None, 2180, 100)].
To generate the character-level embedding matrix $C_{em}$, the model learns an embedding, which takes the characteristics of the characters in an HTML document. To do so, all the distinct characters, including punctuation marks in the corpus, are
listed. We obtained 167 unique characters. We set the length of the sequences $maxlen_1 = 180$ characters. Every HTML document with strings greater than 180 characters is cut from the 180th character, and any HTML document with fewer than 180 characters is padded up to 180 with zeroes. Before each character in our work is embedded into a d-dimensional vector, we conduct a tokenization on the characters in the HTML document and segment the characters into tokens as shown in Figure 1. An index is associated with each token before being applied to a d-dimensional character embedding vector where d is set at 100, which is automatically initialised and learnt together with the remainder of the model. To facilitate its implementation, each HTML document $html$ is transformed into a matrix, $html \to c \in \mathbb{R}^{maxlen_1 \times d}$, where $d = 100$ and $maxlen_1 = 180$.
For the word embedding matrix $W_{em}$, firstly, the raw HTML document is processed into word-level representations by the word embedding layer. To achieve this, all the different words in the HTML documents of the training corpus are listed using the following approach: An HTML document is split into individual words while treating all punctuation characters as separate tokens. For example, as shown in Figure 1, <!DOCTYPE html> will be split into ['<', '!', 'DOCTYPE', 'html']. We surmise that punctuation marks provide important information for phishing HTML document detection since punctuation marks are more prevalent and useful in the context of HTML documents than in ordinary languages. HTML contains a sequence of markup tags that are used to frame the elements on a website. Tags contain keywords and punctuation marks that define the formatting and display of the content on the Web browser. The listed unique words are used to create a dictionary where every word becomes a feature. We obtained about 321,009 unique words in our dataset. We also padded the HTML documents to make the lengths of the HTML documents uniform in terms of number of words ($maxlen_2 = 2000$). Each unique word is then embedded into a d-dimensional vector, where d is set at 100, which is automatically initialised and learned together with the remainder of the model. All the HTML documents are converted to their respective matrix representation ($maxlen_2 \times d$), on which the CNN is applied, where $d = 100$ and $maxlen_2 = 2000$. Figure 1 shows an overview of the character and word embedding layer.

[Fig. 1: Configuration of the Embedding Layer. The example input <!DOCTYPE html><html class="no_js" id="facebook" is split into tokens (characters: <, !, D, O; words: <, !, DOCTYPE, html), mapped to a sequence of integers (1 54 5 83 and 1 54 4 6), and then to an embedding matrix, for the character embedding and the word embedding respectively.]

We can now introduce Convolutional layers using the HTML document matrix (for all the HTML documents $s_t$, $\forall t = 1, \ldots, T$) as the corpus. We applied 32 Convolutional filters $M \in \mathbb{R}^{d \times n}$ where $n = 8$. The Max-Pooling layer, whose features are then passed to a 10 unit dense layer, comes after the Convolutional filters. The dense layer, which is regularised by dropout, finally connects to a Sigmoid layer. Then using the ADAM optimisation algorithm [22], we train the model through backpropagation.

A. Baseline Models
The baseline models, HTMLPhish-Character and HTMLPhish-Word, whose architectures are detailed in Figure 3, are CNN models trained either on character-level embeddings or word-level embeddings, respectively. The embedding matrices described above are applied to 32 Convolutional filters $M \in \mathbb{R}^{d \times n}$ where $n = 8$. The next layer after the Convolutional filters is the Max-Pooling layer, whose features are then passed to a 10 unit dense layer. The Dense layer, which is also regularised by dropout, finally connects to a Sigmoid layer. Also, the models are trained through backpropagation using the ADAM optimisation algorithm.

V. DATASET
Data collection plays an essential role in phishing web page detection. In our approach, we collated HTML documents using a web crawler. We used the Beautiful Soup [23] library in Python to create a parser that dynamically extracted the HTML document from each final landing page. We chose to use Beautiful Soup for the following reasons:
(1) it has functional versatility and speed in parsing HTML contents, and
(2) Beautiful Soup does not correct errors when analysing the HTML Document Object Model (DOM).
The HTML documents in our corpus include all the contents of an HTML document, such as text, hyperlinks, images, tables, lists, etc. Figure 2 shows an overview of the data collection stage.

TABLE I: HTML Documents Used in this Paper
Dataset               D1                 D2
Date generated        11-18 Nov, 2018    10-17 Jan, 2019
Legitimate Web Pages  23,000             24,000
Phishing Web Pages    2,300              2,400
Total                 25,300             26,400

A. Data Collection
Since phishing campaigns follow temporal trends in the composition of web pages, the earliest data obtained should always be used for training and the most recent data collected for testing [24]. Different phishing pages created during the same time may well have the same infrastructure. This could exaggerate an over-trained classification model's predictive output. To ensure our evaluation settings reproduce real-world situations in which models are trained on data generated up to the present point and applied on new web pages, we
[Figure: pipeline overview (caption not recovered). Data Collection (User, Web page, Extract HTML) → Preprocessing (Tokenization, Length padding) → Deep Neural Network (Embedding, Convolutional Filters, Dense Layer, Sigmoid Layer) → Output Label.]
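The preprocessing described in Section IV (tokenization, integer indexing, and cutting or zero-padding to a fixed length) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the regular expression is one possible way to treat every punctuation character as its own token, and two details are assumptions the paper does not specify, namely that padding zeroes are appended at the end and that tokens missing from the index are mapped to 0.

```python
import re


def char_tokens(html):
    """Character-level tokens: every character is its own token."""
    return list(html)


def word_tokens(html):
    """Word-level tokens: runs of word characters, with each punctuation
    character kept as a separate token and white space discarded."""
    return re.findall(r"\w+|[^\w\s]", html)


def build_index(token_lists):
    """Map each distinct token to a positive integer; 0 is reserved for padding."""
    index = {}
    for tokens in token_lists:
        for tok in tokens:
            index.setdefault(tok, len(index) + 1)
    return index


def encode(tokens, index, maxlen):
    """Convert tokens to integers, cut at maxlen, and pad with trailing zeroes.
    Unknown tokens are mapped to 0 (an assumption, not stated in the paper)."""
    ids = [index.get(tok, 0) for tok in tokens][:maxlen]
    return ids + [0] * (maxlen - len(ids))


# Usage with the sequence lengths given in the paper.
doc = '<!DOCTYPE html><html class="no_js" id="facebook">'
char_index = build_index([char_tokens(doc)])
word_index = build_index([word_tokens(doc)])
char_ids = encode(char_tokens(doc), char_index, maxlen=180)
word_ids = encode(word_tokens(doc), word_index, maxlen=2000)
```

Both integer sequences have a fixed length regardless of the document size, which is what allows all documents to be stacked into one batch before the embedding layers.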
[Figure: network architectures (caption not recovered). Each branch takes an Input HTML Document, followed by 32 Convolutional Filters With 8 Kernel Sizes, Max Pooling, and a Dense Layer (10 Units, Activation = ReLU).]
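The layer sizes stated in Section IV and Table II can be traced with a short shape calculation. The sketch below is illustrative only: it assumes a convolution without padding and with stride 1, and a max pooling whose stride equals the pool size, none of which the paper states explicitly. The function names are hypothetical.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution without padding (assumed)."""
    return (length - kernel_size) // stride + 1


def max_pool_output_length(length, pool_size):
    """Output length of max pooling with stride equal to pool_size (assumed)."""
    return length // pool_size


# Values taken from the paper: sequence lengths, embedding size, Table II.
MAXLEN_CHAR, MAXLEN_WORD, EMBED_DIM = 180, 2000, 100
FILTERS, KERNEL_SIZE, POOL_SIZE = 32, 8, 2


def htmlphish_full_shapes():
    """Per-document shapes (batch dimension omitted) for HTMLPhish-Full."""
    concat_len = MAXLEN_CHAR + MAXLEN_WORD
    conv_len = conv1d_output_length(concat_len, KERNEL_SIZE)
    pool_len = max_pool_output_length(conv_len, POOL_SIZE)
    return [
        ("char_embedding", (MAXLEN_CHAR, EMBED_DIM)),
        ("word_embedding", (MAXLEN_WORD, EMBED_DIM)),
        ("concatenation", (concat_len, EMBED_DIM)),
        ("convolution", (conv_len, FILTERS)),
        ("max_pooling", (pool_len, FILTERS)),
        ("dense1", (10,)),
        ("dense2_sigmoid", (1,)),
    ]


for name, shape in htmlphish_full_shapes():
    print(f"{name:16s} {shape}")
```

Under these assumptions the concatenated embedding has shape (2180, 100), which agrees with the (None, 2180, 100) shape reported in Section IV once the batch dimension is added.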
collected a dataset of HTML documents from phishing and legitimate web pages over 60 days.
Also, to ensure the deployability of our model to real-world systems, our data set is required to provide a distribution of phishing to benign web pages obtainable on the Internet in the real world (≈ 10/100) [25], [26]. This matters because, when a balanced dataset (1/1) is used, the results can yield a baseline error [27]. Consequently, our training dataset D1, consisting of HTML documents from 23,000 legitimate URLs and 2,300 phishing URLs, was collected between 11 November 2018 and 18 November 2018. The D1 dataset was used to train and validate the three different variants of our model (HTMLPhish-Character, HTMLPhish-Word, and HTMLPhish-Full). From 10 January 2019 to 17 January 2019, the testing data set D2, consisting of HTML documents from 24,000 legitimate URLs and 2,400 phishing URLs, was generated.
Note that D1 ∩ D2 = ∅. Also, our testing dataset D2 is slightly larger than our training dataset D1. This is because learning with less data and performing well on broader test data indicates that the detection technique generalises. This ensures that the features and model of classification include specific features from legitimate and phishing web pages and that the approach can be applied to the vast number of online Web pages. In total, our corpus was made up of 47,000 legitimate HTML documents and 4,700 phishing HTML documents, as shown in Table I.
The legitimate URLs were drawn from Alexa.com's top 500,000 domains, while the phishing URLs were gathered from continuously monitoring Phishtank.com. The web pages in our dataset were written in different languages. Therefore, this does not limit our model to only detecting English web pages. We manually sanitised our corpus to ensure it contained no replicas or web pages that point to empty content. Alexa.com offers a top list of working websites that internet users frequently visit, so it is an excellent source to be used for our aim.

VI. EVALUATION OF HTMLPHISH VARIANTS
A. Experimental Setup
Table II details the selected parameters we found gave the best performance on our dataset, bearing in mind the unavoidable hardware limitation, for our proposed HTMLPhish variants:
• HTMLPhish-Character
• HTMLPhish-Word
• HTMLPhish-Full

TABLE II: HTMLPhish-Full Deep Neural Network
Layers                                Values                                Activation
Embedding                             Dimension = 100                       -
Convolution                           Filter = 32, Filter Size = 8          ReLU
Max Pooling                           Pool Size = 2                         -
Dense1                                No. of Neurons = 10, Dropout = 0.5    ReLU
Dense2                                No. of Neurons = 1                    Sigmoid
Total Number of Trainable Parameters  412,388,597                           -

The three CNN models were implemented in Python 3.5 on a Tensorflow backend with a learning rate of 0.0015 in the Adam optimizer [22]. The batch size for training and testing the model was set to 20.
All HTMLPhish and baseline experiments were conducted on an HP desktop with Intel(R) Core CPU, Nvidia Quadro P600 GPU, and CUDA 9.0 toolkit installed.

B. Evaluation Metrics
Because of the severely imbalanced nature of our dataset, we evaluated the performance of our models in terms of the Area under the ROC Curve (AUC). We also used the receiver operating characteristic (ROC) curve in our evaluation. The ROC curve is a probability curve, while the AUC depicts how well the model can distinguish between two classes, which for our model are legitimate or phishing. The higher the AUC value, the better the performance of the model. The ROC curve is plotted with the true positive rate (TPR) against the false positive rate (FPR), where
$$TPR = \frac{TP}{TP + FN} \quad \text{and} \quad FPR = \frac{FP}{TN + FP},$$
and TP, FP, TN, and FN stand for the numbers of True Positives, False Positives, True Negatives, and False Negatives, respectively.
Additionally, we employed the precision, True Positive Rate, and F-1 score metrics to evaluate the performance of HTMLPhish and the baseline models. The True Positive Rate computes the ratio of phishing HTML documents that are detected by the models. In contrast, the precision metric computes the ratio of detected phishing HTML documents that are actual phishes to the total number of detected phishing HTML documents.

C. Overall Result
To record the performance of HTMLPhish-Full and the baseline models on the D1 dataset, we split the dataset into 80% for training, 10% for validation, and 10% for testing. Also, taking cognizance of how our data is severely imbalanced, we ensured we manually shuffled the datasets before training.
The ROC curves of HTMLPhish and its variants are shown in Figure 4. From the result detailed in Table III, in general, HTMLPhish-Full significantly outperforms the other two variants: HTMLPhish-Character and HTMLPhish-Word. While HTMLPhish-Character and HTMLPhish-Word have similar performances, HTMLPhish-Full takes advantage of the strengths of both and produces more consistently better results. Also, HTMLPhish-Full offered a significant jump in AUC over the other variants, while HTMLPhish-Word performs slightly worse amongst the three.
On the D1 dataset, HTMLPhish-Full provided a 98% accuracy and 2% False Positive Rate. The False Positive Rate indicates the ratio of legitimate web pages that are incorrectly identified as a phish. A minimal rate is helpful when the model will be deployed in real-world scenarios as users will
TABLE III: Result of HTMLPhish and Baseline Evaluations on the D1 dataset
Models                Accuracy  Precision  True Positive Rate  F-1 Score  AUC   Training time
HTMLPhish-Full        0.98      0.97       0.98                0.97       0.93  6.75 mins
HTMLPhish-Word        0.94      0.93       0.94                0.93       0.88  10 mins
HTMLPhish-Character   0.95      0.92       0.95                0.94       0.90  3.5 mins
[28]                  0.97      0.96       0.97                0.96       0.93  5.25 mins
[20]                  0.95      0.94       0.95                0.94       0.91  18 mins
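The threshold-based metrics reported in Table III follow from the confusion counts defined in Section VI-B. The sketch below shows the arithmetic only; the function name is hypothetical and the counts in the usage example are made-up numbers chosen to mimic a roughly 1:10 class ratio, not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, TPR, FPR, and F-1 from confusion counts.
    Ratios with a zero denominator are reported as 0.0 (a convention assumed here)."""

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    tpr = ratio(tp, tp + fn)  # true positive rate (recall)
    fpr = ratio(fp, fp + tn)  # false positive rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "tpr": tpr,
        "fpr": fpr,
        "f1": ratio(2 * precision * tpr, precision + tpr),
    }


# Hypothetical counts: 100 phishing and 900 legitimate documents.
example = classification_metrics(tp=90, fp=10, tn=890, fn=10)
```

With imbalanced data, accuracy can stay high even when many phishing pages are missed, which is why the precision, TPR, and AUC columns are reported alongside it.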
not be inappropriately blocked from accessing legitimate web Furthermore, CNN’s using only character level embedding
pages. struggles to differentiate information for scenarios where
Considering the computational complexity of HTMLPhish- phishing HTML documents try to imitate benign HTML
Full, it can be seen that on a dataset of over 25,000 HTML documents through small modifications to one or few words in
documents, HTMLPhish-Full can be speedily trained within the HTML document[29]. This is because the Convolutional
7 minutes. Once trained, HTMLPhish-Full can evaluate an filters will likely yield similar output from a sequence of
HTML document in 1.4 seconds. characters with a similar spelling. Therefore, CNNs using
only character embeddings are not enough to obtain structural
D. Comparison with State-Of-The-Art Techniques information from the HTML document in detail. That is
the reason word embeddings must be taken into account.
We compared HTMLPhish-Full with the methodology,
Consequently, HTMLPhish-Full takes advantage of both word
speed, and performance of existing state-of-the-art models in
and character embedding matrices to accommodate unseen
[20] and [28]. [28] is a Deep Neural Network with multiple
words in the test data, and therefore yield a better result than
layers of CNNs that takes as input word tokens from a URL to
the other variants and baseline models.
determine the maliciousness of the associated web page. On
the other hand, [20] takes as input the character sequence of a
URL and models its sequential dependencies using Long short-