See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/370658818

Survey On Machine Learning Paradigms For Phishing Website Detection

Article in International Journal of Engineering & Technology · May 2023



All content following this page was uploaded by Madan Baduwal on 12 June 2023.



Survey On Machine Learning Paradigms For Phishing Website Detection

Madan Baduwal 1, Prakash Madai 2, Tasnimul Alam 3, Quan Yuan 4
Department of Computer Science, University of Texas Permian Basin
{baduwal m63609,madai p57558,alam m61291,yuan q}@utpb.edu

Abstract

Phishing attacks continue to be a major security threat for individuals and organizations alike. Phishers use tactics such as social engineering and imitation websites to trick their targets into revealing sensitive details such as account IDs, usernames, and passwords. These attacks cause billions of dollars in losses annually. Machine learning (ML) has shown great promise in detecting such attacks by identifying patterns and anomalies in large datasets. However, balancing feature selection against model selection is a tedious task in ML-based phishing detection. A small number of features combined with traditional machine learning algorithms, e.g., Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), XGBoost, and Naive Bayes (NB), is not enough for generalizability, and it is difficult for deep learning (DL) algorithms to learn patterns from the ambiguous behavior that separates phishing from non-phishing websites. This paper presents a comprehensive survey of the datasets, features, and ML paradigms that have been employed for the detection of phishing websites. The results of this survey provide insight into the accuracies of different ML techniques and the current state of the art in phishing detection, and can serve as a resource for researchers and practitioners working in this field.

Keywords: Cyber Security, Cybercrime, Phishing, Phishing Detection, Machine Learning, Deep Learning

1. Introduction

As of January 2023, there were 5.16 billion individuals worldwide using the internet, accounting for 64.4% of the global population. Of this figure, 4.76 billion people, or 59.4% of the world's population, were using social media platforms [30]. The internet has revolutionized various aspects of people's lives, including communication, shopping, chatting, and office work. With the onset of the pandemic in late 2019, many conventional industries have transitioned from offline to online services, such as retail, catering, education, entertainment, healthcare, banking, and finance.

Phishing websites have become a significant threat to online security. These fraudulent websites are created with the intent of deceiving users into disclosing their confidential information, such as passwords, credit card details, and other sensitive personal data [49]. Phishing attacks are executed using several tactics, such as link manipulation, filter evasion, website forgery, covert redirects, and social engineering. Spoofing web pages that mimic legitimate websites is the most common approach used in these attacks. Phishing attacks were among the top concerns highlighted in the 2022 Internet Crime Report issued by the U.S. Federal Bureau of Investigation's Internet Crime Complaint Center (IC3). The IC3's statistics for 2022 revealed that internet-based theft, fraud, and exploitation remain widespread, resulting in $10.3 billion in financial losses that year. The IC3 received 800,944 complaints in 2022, including reports of business email compromise (BEC) and email account compromise (EAC) [14].

Figure 1. Total number of phishing attacks detected by APWG

The Anti-Phishing Working Group (APWG) emphasizes that phishing attacks have grown in recent years. Figure 1

illustrates the total number of phishing attacks detected by APWG in the third quarter of 2022 and the last quarter of 2021. During the third quarter of 2022, APWG recorded 1,270,883 phishing attacks, a new record and the most severe quarter for phishing observed by APWG to date. The total for August 2022 was 430,141 attacks, the highest monthly total reported. The number of phishing attacks reported to APWG has more than quintupled since the first quarter of 2020, when APWG observed 230,554 attacks. The rise in Q3 2022 is attributable in part to the increasing number of attacks reported against several specific targets, which suffered large numbers of attacks from persistent phishers (Statistical Highlights for the 3rd Quarter 2022 [5]).

With the increasing sophistication of phishing attacks, conventional anti-phishing techniques are becoming less effective. To address this problem, machine learning has emerged as a promising solution for real-time detection of phishing websites. Our study examines several machine learning methods, including Logistic Regression, KNN, Support Vector Machine, Random Forest, Ada-Boost, Gradient Boosting, XGBoost, Naive Bayes, Feed Forward Neural Networks, Convolution Neural Networks (CNN), Recurrent Neural Networks (RNN), and Transformers [35].

The paper is organized as follows. In Section 2 we list some widely used phishing techniques. In Section 3 we discuss different types of phishing and phishing attack prevention methods. Section 4 reviews the background and related work on phishing. In Section 5 we discuss the datasets usually used in machine learning approaches. Section 6 reviews the preprocessing and feature engineering usually used in machine learning for phishing detection. Section 8 outlines various methodologies for detecting website phishing that leverage machine learning methods; specifically, we provide a detailed explanation of the overall architecture of the machine learning-based solution for detecting phishing networks. In Section 9 we show the evaluation results of the suggested machine learning methods. Finally, in Section 10 we conclude by discussing future implications and insights stemming from the methods used.

Furthermore, the paper provides a critical analysis of the current state of the art, identifying the strengths and weaknesses of each machine learning paradigm in phishing detection. Finally, the survey concludes by discussing the challenges and future research directions in machine learning-based anti-phishing techniques.

In summary, this paper's primary objective is to provide a comprehensive survey of machine learning paradigms for phishing website detection. The survey will be useful for researchers, practitioners, and policymakers in the field of cybersecurity, providing insights into the latest trends and developments in machine learning-based antiphishing techniques.

2. Phishing Techniques

This section describes a range of phishing techniques frequently used by criminals to deceive and manipulate individuals.

2.1. Spoofing

Phishers create fake websites that closely resemble legitimate ones, including similar designs, logos, and URLs. They aim to deceive users into believing they are on a trusted site, leading them to enter their login credentials, personal information, or financial details, which are then captured by the attackers [7].

2.2. Link manipulation

Phishing primarily revolves around links, with attackers using various techniques to deceive users into clicking on them. One such technique is to manipulate the URL to mimic a legitimate one, for example by presenting malicious URLs on websites as hyperlinks with legitimate names. Another approach, known as typosquatting, is to create misspelled URLs that closely resemble legitimate ones, for instance, facebuuk.com. A much more sophisticated variant of typosquatting is IDN spoofing, which uses non-English characters that bear a striking resemblance to their English counterparts. For example, an attacker may use a Cyrillic "c" or "a" instead of the corresponding English letters, which makes the deceit much more difficult to recognize [6].

2.3. Filter evasion

Phishers often display the content of their fraudulent websites as images or use Adobe Flash to make it difficult for some phishing detection techniques to detect their malicious activities. To counter this type of attack, optical character recognition (OCR) must be used [23].

2.4. Website forgery

In this form of attack, phishers manipulate the JavaScript code of a legitimate website to carry out phishing activities. These types of attacks, also referred to as cross-site scripting, are exceptionally difficult to detect since the victim is interacting with a genuine website [23].

2.5. Covert redirect

This attack is aimed at websites that use the OAuth 2.0 and OpenID protocols. In this scenario, when attempting to grant token access to a legitimate website, users accidentally provide their token to a malicious service. Nevertheless, this technique has received little attention due to its limited significance [45].
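As an illustration of the IDN spoofing described in Section 2.2, the following minimal Python sketch flags hostname labels that mix letters from more than one script. It is not a method from the surveyed literature: the function names `scripts_in_label` and `looks_like_idn_spoof` are hypothetical, and taking the first word of a character's Unicode name as its script is a simplifying assumption.

```python
import unicodedata


def scripts_in_label(label):
    """Return the coarse scripts of the letters in one hostname label.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "CYRILLIC") is a good enough proxy for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts


def looks_like_idn_spoof(hostname):
    """Flag a hostname if any dot-separated label mixes several scripts."""
    return any(len(scripts_in_label(label)) > 1
               for label in hostname.split("."))


legit = "facebook.com"
# U+043E (CYRILLIC SMALL LETTER O) replaces the second Latin "o".
spoof = "faceb\u043eok.com"

print(looks_like_idn_spoof(legit))  # False
print(looks_like_idn_spoof(spoof))  # True
```

A label written entirely in one non-Latin script is not flagged by this check, so in practice it would be only one feature among many rather than a complete defense.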

2.6. Social engineering

Social engineering phishing is a deceptive attack that employs psychological manipulation to trick users into divulging their security information. This type of attack typically involves multiple steps: first, the phisher researches the potential vulnerabilities of the targets; then, the phisher tries to gain the target's trust; finally, the phisher creates a situation in which the target inadvertently divulges important information. Social engineering phishing techniques include baiting, scareware, pretexting, and spear phishing [22].

3. Phishing Detection Approaches

Several methods have been proposed to prevent phishing attacks at each stage of the attack flow. Some of these methods involve training users to recognize and prepare for future attacks, while others operate automatically to alert and protect the user [2]. These methods can be categorized as follows:

3.1. Rule-based Approaches

Rule-based approaches use a set of predefined rules or heuristics to identify and block phishing websites. These rules are typically based on known patterns and characteristics of phishing attacks, such as URL or domain name similarity to legitimate websites, suspicious keywords, and IP reputation [9].

One advantage of rule-based approaches is that they are relatively simple and easy to implement. However, they are not very effective against new and sophisticated phishing attacks that do not fit the predefined rules.

3.2. Signature-based Approaches

Signature-based approaches use signatures or fingerprints of known phishing attacks to identify and block new phishing websites. These signatures can be based on various factors such as URLs, IP addresses, email addresses, and content.

The advantage of signature-based approaches is that they are very effective at detecting known phishing attacks. However, they may not be effective against new and previously unseen phishing attacks.

3.3. Machine Learning Approaches

Machine learning algorithms can be used to detect phishing attacks by analyzing various features such as website content, URL structure, IP address, and user behavior. By learning from a large dataset of known phishing attacks, they can detect previously unseen phishing attacks [36].

Some of the commonly used machine learning algorithms for phishing detection include Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting, Naive Bayes, Feed Forward Neural Networks, Convolution Neural Networks (CNN), Recurrent Neural Networks (RNN), and Transformers.

The advantage of machine learning approaches is that they can adapt to new and previously unseen phishing attacks, making them more effective than rule-based and signature-based approaches. However, machine learning algorithms require a large amount of training data and may produce false positives and false negatives.

3.4. Hybrid Approaches

Hybrid approaches combine multiple detection techniques to improve accuracy and reduce false positives. For example, a hybrid approach might use rule-based techniques to quickly identify known phishing websites and machine learning algorithms to detect previously unseen attacks.

The advantage of hybrid approaches is that they can provide better protection against phishing attacks than a single detection technique. However, they can be more complex to implement and require more computational resources [25].

3.5. User Education

Educating users about the risks and warning signs of phishing attacks can help them identify and avoid phishing websites. This approach relies on user awareness and vigilance rather than technology-based detection methods.

The advantage of user education is that it can be effective in preventing phishing attacks if users are well-informed and cautious. However, it may not be effective against new and sophisticated phishing attacks that exploit human vulnerabilities.

4. Background and Related Work

Figure 2 depicts the life cycle of a typical phishing attack. Phishing attacks often start with an attacker sending a fraudulent email or message that appears to be from a trusted source, such as a bank or a social media website. The email or message contains a link that directs the victim to a phishing website, which is designed to look like a legitimate website. The victim is then prompted to enter sensitive information, such as login credentials, credit card information, or personal details. Once the victim submits this information, it is collected by the attacker for financial gain or other malicious purposes.

First, an attacker creates a phishing website that looks very similar to a legitimate website. Attackers use various methods to forge the URL of a legitimate website,

particularly the domain name and network resource directory, by using spelling mistakes, similar alphabetic characters, and other techniques. For example, the link "https://aimazon.amz-z7acyuup9z0y16.xyz/v" imitates https://www.amazon.com. More details about phishing techniques are given in Section 2. Thus, while the URL of a phishing website may be visible to the user by hovering over the link, it can still be challenging for an average user to recognize it as a fake URL that imitates a legitimate one. Additionally, cybercriminals often use scripts to copy logos, web layouts, and text from genuine web pages, which allows them to create phishing websites that closely resemble legitimate ones. Imitation of web content is therefore also a crucial factor in successful phishing attacks. Attackers also often create fake form submission pages that ask users to input sensitive information such as login credentials, payment information, and password recovery information. These pages are designed to look exactly like the legitimate pages in order to trick users into giving away their sensitive information.

Figure 2. Phishing Life Cycle

In the second step, attackers rely heavily on social engineering tactics to manipulate users into clicking on the phishing link. They often use fear and urgency to create a sense of pressure on the user, such as threatening to suspend their account or urging them to take immediate action to avoid a negative consequence. They may also try to gain the user's trust by impersonating a familiar brand or authority figure. Phishing attacks can be delivered through various channels, including email, SMS, voice messages, QR codes, and spoofed mobile applications. In some cases, attackers use multiple channels to increase their chances of success. Once the user clicks on the phishing link, they are directed to a fake website that looks similar to the legitimate one. Attackers often use scripts to obtain logos, web layouts, and text from genuine web pages to make the fake website look more convincing. The user may be prompted to enter sensitive information, such as login credentials or payment details, which will be collected by the attacker.

In the next step, users submit their personal information, such as login credentials, payment information, and other sensitive data, on the fake website, and the attackers receive all of it. This is a critical step in the phishing process, as it allows the attackers to gain access to the user's accounts or use the information for fraudulent purposes. It is important for users to be cautious and verify the authenticity of websites before submitting any sensitive information.

The final stage involves using the user's genuine details to fabricate a genuine website request, resulting in the misappropriation of the user's account funds. It is common for people to use identical login credentials across various websites, enabling the attacker to pilfer multiple accounts from a single individual. Some cybercriminals employ stolen data for other unlawful pursuits. Phishing tactics have evolved since their inception in 1987, keeping pace with advancements in internet technology. As online payment mechanisms gained popularity, attackers shifted their focus to online payment phishing.

The article "A comprehensive survey of AI-enabled phishing attacks detection techniques," published in 2020 in the journal Telecommunication Systems, discusses different methods that use artificial intelligence (AI) to detect phishing attacks. These attacks involve the impersonation of a trustworthy entity to steal sensitive information from individuals or organizations. The authors surveyed recent research papers and categorized the techniques into rule-based, machine learning-based, hybrid, and deep learning-based [8]. They emphasized the importance of using AI-based techniques to improve phishing attack detection, as traditional approaches may not be effective against increasingly sophisticated attempts. The article also suggests future research directions, such as developing techniques to detect zero-day phishing attacks and considering user behavior and contextual information. Overall, the article is a useful resource for those interested in developing and implementing AI-enabled phishing attack detection techniques.

In "Phishing Detection: A Machine Learning Approach," Singh et al. reviewed machine learning-based techniques for detecting phishing attacks [38]. The authors provided an overview of the history of phishing and highlighted major phishing attack reports. They classified phishing attacks into two types: social engineering attacks and malware-based phishing. The authors also categorized features into three groups: source code features, URL features, and image features. These feature categories were rule-based and used to detect phishing attacks.

In 2020, Vijayalakshmi et al. surveyed major detection techniques and presented a taxonomy for detecting phishing [43]. They referred to

a statistical report from APWG to illustrate the trend of phishing attacks between 2017 and 2019. The paper introduced a taxonomy of automated phishing detection solutions, which classified the solutions into three categories based on the input parameters: web address-based methods, webpage content-based solutions, and hybrid approaches. The web address-based approaches were further divided into list-based, heuristic rule-based, and learning-based approaches, while web content-based solutions were categorized into rule-based and machine learning-based solutions. The authors listed most of the state-of-the-art methodologies for each category and presented the details of each solution. They then compared all the methods on several evaluation criteria, such as classification performance, limitations, third-party service independence, and zero-hour attack detection. Finally, the authors suggested that hybrid approaches would achieve high accuracy rates and be suitable for real-time systems, and that deep learning-based solutions would be a promising direction for future research.

Kalaharsha and Mehtre, in their paper "Detecting Phishing Attacks: A Survey," surveyed phishing detection solutions, which they classified into multiple categories based on the techniques and input parameters used. The authors introduced different types of phishing attacks and three phishing techniques [19]. They listed 18 methods and 9 datasets for detecting phishing websites and compared the accuracy of all the models. Furthermore, the paper presented some challenges in detecting phishing attacks, such as reducing false-positive rates and overfitting.

The paper by Jain and Gupta, titled "A Comprehensive Survey on Analyzing Phishing Attack Techniques, Detection Methods, and Some Existing Challenges," provides a detailed overview of phishing attack techniques, detection methods, and challenges associated with combating phishing attacks [18]. The authors gather statistical reports on the prevalence of and motivation behind phishing attacks and describe various techniques that attackers use to target both PCs and smartphones. They also present different defense methods that have been proposed to detect and prevent phishing attacks, and they analyze and compare existing anti-phishing approaches published from 2006 to 2017, highlighting their advantages and limitations. Towards the end of the paper, Jain and Gupta discuss several major challenges associated with detecting and preventing phishing attacks, such as selecting efficient features, identifying tiny URLs, and detecting attacks on smartphones. Overall, the paper provides a comprehensive overview of the state-of-the-art techniques for detecting and preventing phishing attacks, and highlights the need for further research to address the remaining challenges.

A recent paper, "Phishing or Not Phishing? A Survey on the Detection of Phishing Websites" by Rasha Zieni, Luisa Massari, and Maria Carla Calzarossa, investigates the effectiveness of different methods for detecting phishing websites [51]. The authors conducted a survey to evaluate the accuracy of several approaches, including machine-learning techniques, feature-based techniques, and heuristic methods. They found that a combination of several techniques is necessary to achieve a high level of accuracy in detecting phishing websites. They also identified some challenges in this field, such as the need for a large and diverse dataset of phishing websites and the difficulty of staying up to date with constantly evolving phishing techniques. The paper provides insights that could help researchers and practitioners develop better methods for detecting phishing websites.

5. Datasets

Each approach relies on data as its source, and this data is a critical factor in determining its performance. In other words, the quality and quantity of the data used to develop and test an approach heavily influence its accuracy and effectiveness. Therefore, obtaining reliable and relevant data is essential to achieving good outcomes in any data-driven approach. There are two approaches to data collection: using published datasets and extracting data by retrieving URLs directly from the internet. Figure 3 shows several major data sources. In the published datasets, each row contains several features extracted from a URL and a class label. The original URL strings can be collected from websites by using open APIs or data mining scripts.

6. Preprocessing And Feature Engineering

6.1. Preprocessing

Preprocessing is a crucial task in Natural Language Processing (NLP), involving the cleaning and transformation of unstructured text data to extract useful and meaningful information. In NLP, preprocessing techniques are applied to extract interesting and non-trivial knowledge from text data that is often disorganized and difficult to work with. By employing appropriate preprocessing techniques, such as tokenization, stemming, and stop-word removal, NLP models can better capture the meaning and context of textual data, leading to improved performance and more accurate results [20].

6.1.1 Tokenization

Tokenization refers to the act of dividing a continuous stream of text into smaller, meaningful elements, such as words, phrases, or symbols, known as tokens. The main goal of tokenization is to analyze the individual words or elements within a sentence. Once tokenization is complete,

Data Source                                           | Type                                             | Remarks
https://phishtank.com (accessed on 18 July 2021) [1]  | Website                                          | 11,000 phishing websites and over 8,000 legitimate websites
UCI [26]                                              | Published dataset                                | 11,055 instances with 30 features
Mendeley [41]                                         | Published dataset                                | 10,000 instances with 48 features
ISCX-URL-2016 [10]                                    | Published dataset                                | 35,000 legitimate URLs; 10,000 phishing URLs
https://openphish.com (accessed on 18 July 2021)      | Real-time information about new phishing attacks | Valid phishing URLs
https://commoncrawl.org (accessed on 18 July 2021)    | Website                                          | Legitimate URLs
https://www.alexa.com (accessed on 18 July 2021) [3]  | Website                                          | Website URLs

Figure 3. Major data sources for detecting phishing websites.

the resulting list of tokens serves as the input for subsequent processing tasks.

6.1.2 Stemming

Stemming is a linguistic process that reduces the different forms of a word to a single base form known as the stem. This simplifies the analysis of text data by grouping all the variations of a word under a single representation. For instance, the words "presentation," "presented," and "presenting" could all be stemmed to the base form "present." This technique is useful in natural language processing (NLP) applications, such as information retrieval and text classification, where variations in word forms can interfere with accurate analysis.

6.1.3 Stop Word Removal

Many words in documents recur very frequently but carry essentially no meaning, as they are used only to join words together in a sentence. These stop words are very common words such as 'and', 'are', and 'this'. It is commonly understood that stop words do not contribute to the context or content of textual documents, and because of their high frequency of occurrence, their presence in text mining is an obstacle to understanding the content of the documents. Since they are not useful in the classification of documents, they are removed.

7. Feature Engineering

Feature selection is the process of automatically selecting the features that contribute the most to the machine learning model. Having closely relevant features in the input can enhance the performance of the model, decrease training time (especially in deep learning models), and reduce overfitting issues. Generally, feature selection methodologies can be classified into three categories: the filter method, the wrapper method, and the embedded method.

7.1. Filter Method

The filter method is a type of feature selection method that uses statistical measures to rank the features based on their correlation with the target variable. Features with higher correlation values are considered more important and are selected for use in the model. Some common statistical measures used in the filter method include the chi-squared test, the correlation coefficient, mutual information, and the ANOVA F-test. The filter method is simple and computationally efficient, but it does not consider the interactions between features and may not identify the most relevant features for the model.

7.2. Wrapper Method

The wrapper method is another type of feature selection method, which trains the model on a subset of features and evaluates its performance. The wrapper method searches over combinations of features and selects the subset that maximizes the model's performance. This method takes into account the interactions between features and is often more accurate than the filter method. However, it is computationally expensive and may overfit the model to the training data.

7.3. Embedded Method

The embedded method is a type of feature selection method that combines feature selection and model training into a single step. The embedded method is typically used in
algorithms that have built-in feature selection mechanisms, such as regularized regression, decision trees, and gradient boosting. The embedded method optimizes the model and selects the most relevant features simultaneously. This method is efficient and accurate, but it may be limited to the specific algorithm used and may not perform well on other models.

8. Machine Learning Approach

Figure 4. Phishing Detection Approaches

8.1. Logistic regression

Logistic Regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, Logistic Regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes. Logistic regression works well when the relationship in the data is almost linear; however, if there are complex nonlinear relationships between variables, it performs poorly. In addition, it relies on more statistical assumptions than some other techniques [47, 27].

Let's recall the equation of simple linear regression:

$$\hat{y} = \beta_0 + \beta_1 x$$

where $\beta_0$ and $\beta_1$ are the regression coefficients and $x$ is the input feature. In logistic regression, we pass the output of the linear regression, $z = \hat{y}$, to a function known as the sigmoid function. The sigmoid function is of the following form:

$$h(x) = g(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

8.2. K-Nearest Neighbors

K-Nearest Neighbors (KNN) is one of the simplest algorithms used in machine learning for regression and classification problems. It is non-parametric and lazy: it makes no assumption about the underlying data distribution. KNN uses feature similarity to predict the values of new data points, which means that a new data point is assigned a value based on how closely it matches the points in the training set. The similarity between records can be measured in many different ways. A popular choice is the Euclidean distance, given by

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

Once the neighbors are found, the summary prediction can be made by returning the most common outcome or taking the average. As such, KNN can be used for classification or regression problems. There is no model to speak of other than holding the entire training dataset [28].

8.3. Support Vector Machine

Support vector machines (SVMs) are one of the most popular classifiers. The idea behind SVM is to find the boundary that separates two classes with the maximum distance to the closest points of each class. This technique is a supervised learning model used for linear and nonlinear classification.

Nonlinear classification is performed using a kernel function to map the input to a higher-dimensional feature space. Although SVMs are very powerful and are commonly used in classification, they have some weaknesses. Training is computationally expensive. They are also sensitive to noisy data and are therefore prone to over-fitting.

Some kernels are more widely used than others. Some of the widely used kernels are:

Figure 5. Support Vector Machine

8.3.1 Linear Kernel

$$K(x, z) = x^{\top} z$$

8.3.2 Polynomial Kernel
The polynomial kernel represents the similarity of vectors over the training sample in a polynomial feature space of the original vectors.

$$K(x, z) = (1 + x^{\top} z)^{p}$$

Here $p$ is the degree of the polynomial. With $p = 2$ we have a quadratic model, with $p = 3$ a cubic one, and so on. When $p = 1$, we have a linear kernel (up to a constant offset).

8.3.3 Radial Basis Function Kernel (RBF kernel)
RBF is a very popular kernel and is used widely in many applications. The RBF kernel for two feature vectors $x$ and $z$ is defined as

$$K(x, z) = e^{-\frac{\lVert x - z \rVert^{2}}{2\sigma^{2}}}$$

In the above equation, the term $\lVert x - z \rVert^{2}$ is the squared Euclidean distance between $x$ and $z$, and $\sigma$ is a free parameter that controls the width of the kernel [27].

8.4. Random Forest
Random forest is a popular machine learning algorithm that is widely used in data science and research. It is an ensemble learning method that combines multiple decision trees to create a robust and accurate model. In random forest, each decision tree is constructed using a subset of the available features and training data, and the final prediction is made by aggregating the results of all the trees. This approach helps to reduce overfitting and improve the overall performance of the model. Random forest can be applied to a wide range of research problems, including classification, regression, and feature selection, making it a versatile and powerful tool in data science [28, 27].

8.5. Ada-Boost
In some respects, Ada-Boost is like Random Forest: both group weak classification models to form a strong classifier. A single model may categorize objects poorly, but if we combine several classifiers, selecting a set of samples in each iteration and assigning an appropriate weight to each classifier in the final vote, the overall classification can be good. Trees are created sequentially as weak learners, and incorrectly predicted samples are corrected by assigning a larger weight to them after each round of prediction, so the model learns from previous errors. The final prediction is the weighted majority vote (or the weighted median in the case of regression problems). In short, the Ada-Boost algorithm repeatedly selects the training set based on the accuracy of the previous round of training. The weight of each classifier trained in an iteration depends on the accuracy it achieves on the weighted training set.

8.6. Gradient Boosting
Gradient Boosting trains many models incrementally and sequentially. The main difference between Ada-Boost and Gradient Boosting is how the algorithms identify the shortcomings of weak learners such as decision trees. While the Ada-Boost model identifies the shortcomings by using high-weight data points, Gradient Boosting does the same by using gradients of the loss function. The loss function is a measure indicating how good the model's coefficients are at fitting the underlying data. A logical understanding of the loss function depends on what we are trying to optimize.

8.7. XGBoost
XGBoost is a refined and customized version of Gradient Boosting designed to provide better performance and speed. The most important factor behind the success of XGBoost is its scalability in all scenarios. XGBoost reportedly runs more than ten times faster than popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings. The scalability of XGBoost is due to several important algorithmic optimizations. These innovations include a novel tree learning algorithm for handling sparse data and a theoretically justified weighted quantile sketch procedure that enables handling instance weights in approximate tree learning. Parallel and distributed computing make learning faster, which enables quicker model exploration. More importantly, XGBoost exploits out-of-core computation and enables data scientists to process hundreds of millions of examples on a desktop. Finally, it is even more exciting to combine these techniques to make an end-to-end system that scales to even larger data with the least

amount of cluster resources.

8.8. Stacking
This technique involves training multiple models and using the predictions of each model as input to a meta-model that makes the final prediction.
Othman and Hassan [29] proposed a stacking model using RF, KNN, DT, LDA, and BNB as base classifiers (level 0) and Logistic Regression (LR) as the meta-model (level 1) for detecting phishing websites. They used data from Grega et al. [42], which contains phishing and legitimate website instances. There are two versions of this dataset: the first has a total of 58,645 instances (legitimate: 27,998, phishing: 30,647), and the second has 88,647 instances (legitimate: 58,000, phishing: 30,647), that is, more instances labeled legitimate. The datasets contain 111 features in total, excluding the class. The features are divided into six groups: URL properties, domain properties, URL directory properties, URL file properties, URL parameter properties, and URL external metrics and resolving data. Stacking models improve the performance of the classifiers in terms of precision, F-measure, and ROC area. Using stacking models, the authors report that the ensemble model achieves an accuracy of 97.49% on dataset 1 and 98.69% on dataset 2.

8.9. Artificial Neural Networks

Figure 6. Neural network

Artificial neural networks (ANNs) are a learning model roughly inspired by biological neural networks. These models are multilayered, each layer containing several processing units called neurons. Each neuron receives its input from its adjacent layers and computes its output with the help of its weights and a non-linear function called the activation function. In feed-forward neural networks like the one in Figure 6, data flows from the first layer to the last layer. Different layers may perform different transformations on their input. The weights of the neurons are set randomly at the start of training and are gradually adjusted with the gradient descent method to get close to the optimal solution. The power of neural networks is due to the non-linearity of the hidden nodes. As a result, introducing non-linearity in the network is very important so that it can learn complex functions [27].

8.10. CNN
Convolutional Neural Networks (CNNs) are a type of deep learning algorithm that has proven to be highly effective in a wide range of computer vision tasks, such as image classification, object detection, and segmentation.
CNNs are designed to automatically learn features from input data by using convolutional layers, which apply filters to extract useful patterns and features from the input. The output of each convolutional layer is then passed through non-linear activation functions, such as ReLU, to introduce non-linearity into the network.
Pooling layers are often used after convolutional layers to reduce the spatial size of the input and increase the robustness of the network to variations in input data.
In addition to convolutional and pooling layers, CNNs also typically include fully connected layers, which map the extracted features to the final output. The entire network is trained end-to-end using backpropagation and gradient descent to minimize a given loss function [24, 21, 15, 40, 48, 39, 46, 37, 50, 17, 16, 11].

8.11. RNN
A recurrent neural network (RNN) is a type of neural network that is commonly used for processing sequential data. RNNs have a feedback loop in their architecture, which allows them to take into account previous inputs and their own previous state when making predictions.
Mathematically, an RNN can be expressed as follows:
At time step $t$, the hidden state of the RNN is denoted as $h_t$, and the output is denoted as $y_t$. The input at time $t$ is denoted as $x_t$.
The hidden state at time $t$ is calculated as:

$$h_t = f(W_{hh} h_{t-1} + W_{hx} x_t + b_h)$$

where $W_{hh}$ is the weight matrix for the hidden state, $W_{hx}$ is the weight matrix for the input, and $b_h$ is the bias term for the hidden state. The function $f$ is an activation function that applies a nonlinearity to the sum of the weighted inputs.
The output at time $t$ is calculated as:

$$y_t = g(W_{yh} h_t + b_y)$$

where $W_{yh}$ is the weight matrix that maps the hidden state to the output, $b_y$ is the bias term for the output, and $g$ is an activation function that applies a nonlinearity to the sum of the weighted inputs.

During training, the RNN learns the optimal values of
the weight matrices and bias terms by minimizing a loss
function, such as mean squared error, with respect to these
parameters. This is typically done using backpropagation
through time, which is an extension of the backpropagation
algorithm used in standard feedforward neural networks
[33].
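
The recurrence for the hidden state and output described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes f is tanh and g is the logistic sigmoid (the text leaves both unspecified), and the names `rnn_step`, `rnn_forward`, `W_hx`, `W_hh`, and `W_yh` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_hx, W_hh, b_h, W_yh, b_y):
    """One time step of a simple RNN; returns (h_t, y_t).

    Assumption: f = tanh for the hidden state, g = sigmoid for the output.
    """
    recurrent = matvec(W_hh, h_prev)   # contribution of the previous state
    from_input = matvec(W_hx, x_t)     # contribution of the current input
    h_t = [math.tanh(r + i + b) for r, i, b in zip(recurrent, from_input, b_h)]
    pre_y = [p + b for p, b in zip(matvec(W_yh, h_t), b_y)]
    y_t = [1.0 / (1.0 + math.exp(-v)) for v in pre_y]
    return h_t, y_t


def rnn_forward(inputs, h0, W_hx, W_hh, b_h, W_yh, b_y):
    """Run the RNN over a whole sequence, carrying the hidden state forward."""
    h = h0
    outputs = []
    for x_t in inputs:
        h, y = rnn_step(x_t, h, W_hx, W_hh, b_h, W_yh, b_y)
        outputs.append(y)
    return outputs, h
```

Because the same weights are reused at every step, the hidden state is the only channel through which earlier inputs can influence later outputs, which is what backpropagation through time has to differentiate through.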

8.12. LSTM
Long Short-Term Memory (LSTM) is a type of recurrent
neural network (RNN) that has been proven to be highly ef-
fective in a wide range of sequence modeling tasks, such as
natural language processing, speech recognition, and time
series forecasting.
LSTM networks are designed to overcome the vanish-
ing gradient problem of traditional RNNs, which occurs
when gradients become exponentially small as they prop-
agate back through time, making it difficult to learn long-
term dependencies. LSTM networks address this issue by
introducing a memory cell, which allows the network to se-
lectively remember or forget information over time.
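
To make the vanishing gradient problem concrete, the sketch below looks at a scalar RNN of the form h_t = tanh(w * h_{t-1} + b). Each step back in time multiplies the gradient by |w| * (1 - tanh(a)^2), where a is the pre-activation. As a simplifying assumption the same factor is reused at every step; the function names and the default values of `w` and `pre_activation` are illustrative only.

```python
import math


def backprop_factor(w, pre_activation):
    """|d h_t / d h_{t-1}| for the scalar RNN h_t = tanh(w * h_{t-1} + b)."""
    return abs(w) * (1.0 - math.tanh(pre_activation) ** 2)


def gradient_after(steps, w=0.9, pre_activation=0.5):
    """Gradient magnitude after `steps` steps back in time.

    Assumption: the per-step factor is identical at every time step.
    """
    return backprop_factor(w, pre_activation) ** steps


if __name__ == "__main__":
    for steps in (1, 10, 50):
        print(steps, gradient_after(steps))
```

Whenever the per-step factor is below 1, the product shrinks exponentially with the number of steps, so inputs far in the past contribute almost nothing to the weight update.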
The memory cell is composed of three gates: an input
gate, an output gate, and a forget gate. The input gate con-
trols which information is updated and added to the mem-
ory cell, the forget gate determines which information is
removed from the memory cell, and the output gate decides
which information from the memory cell is used to make
predictions.
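
The gate mechanism described above can be sketched as a single LSTM time step in plain Python. This follows the commonly used formulation in which each gate is a sigmoid of an affine function of the concatenated previous hidden state and current input; the `params` dictionary layout and the function names are assumptions made for illustration.

```python
import math


def sigmoid(v):
    """Logistic function, used for the three gates."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vector):
    """Compute weights @ vector + bias for list-based matrices."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns (h_t, c_t).

    `params` maps "forget", "input", "output", and "candidate" to a
    (weights, bias) pair acting on the concatenation [h_prev, x_t].
    """
    z = h_prev + x_t  # list concatenation, not addition
    f = [sigmoid(v) for v in affine(*params["forget"], z)]      # what to drop
    i = [sigmoid(v) for v in affine(*params["input"], z)]       # what to add
    o = [sigmoid(v) for v in affine(*params["output"], z)]      # what to expose
    g = [math.tanh(v) for v in affine(*params["candidate"], z)] # new content
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is copied forward almost unchanged, which is what lets information survive across many time steps.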
The weights of the gates are learned during training us-
ing backpropagation through time, a variant of backpropa-
gation that is used to train RNNs. The network is trained
to minimize a given loss function, such as cross-entropy
for classification tasks or mean squared error for regression
tasks [12].

8.13. Transformer
The Transformer is a deep learning algorithm that was introduced in 2017 and has since become a popular choice for a wide range of natural language processing tasks, including language translation, text summarization, and question answering [34].

Figure 7. Transformer

The Transformer is based on a self-attention mechanism, which allows it to weigh the importance of different parts of the input sequence when making predictions. This is accomplished by computing an attention score for each pair of input positions, which reflects the relevance of one position to another. The attention scores are then used to compute a weighted sum of the input embeddings, which forms the output of the self-attention layer.
The Transformer also includes a multi-head attention mechanism, which allows it to attend to different aspects of the input simultaneously. This is achieved by splitting the input into multiple heads, each of which computes a separate attention score and output embedding. The outputs of the multiple heads are then concatenated and passed through a feedforward network to produce the final output.
In addition to self-attention and multi-head attention, the Transformer also includes positional encoding, which allows it to take into account the order of the input sequence. The positional encodings are added to the input embeddings to provide the network with positional information.

8.14. K-Means Clustering
K-means clustering is a popular machine learning algorithm used for unsupervised learning tasks, such as clustering and data segmentation. The goal of K-means is to partition a given dataset into K distinct clusters, where each cluster represents a group of similar data points.
The algorithm works by first randomly selecting K initial centroids, which serve as the centers of each cluster. Then,

each data point is assigned to the cluster whose centroid is closest to it. This is done by computing the Euclidean distance between each data point and the centroids, and assigning the point to the cluster with the nearest centroid.
Once all data points have been assigned to a cluster, the centroids are recalculated as the mean of all the data points in their respective cluster. This process of assigning points to clusters and updating the centroids is repeated until convergence is reached, typically when the centroids no longer move or the change is below a specified threshold.

9. Evaluation Metrics

During the testing process, performance evaluation is conducted by dividing the original dataset into training data and test data, typically 80% and 20% respectively. The classifier's behavior is evaluated on the testing dataset using four statistical numbers: TP (the number of correctly identified positive data points), TN (the number of correctly identified negative data points), FP (the number of negative data points labeled as positive by the classifier), and FN (the number of positive data points labeled as negative by the model). The details are presented in the table.

                          Prediction outcome
                          p                n                total
    actual value   p'     True Positive    False Negative   P'
                   n'     False Positive   True Negative    N'
                   total  P                N

Various metrics are commonly utilized to assess performance. One of the most widely used is classification accuracy, which is calculated as the proportion of accurate predictions to the total number of predictions made:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FN + FP}$$

In binary classification scenarios with balanced classes, it is widely recognized that random guessing would result in an accuracy of 50%.
However, in the case of imbalanced datasets, a high accuracy score does not necessarily indicate a high-quality model. For instance, consider a dataset of 10,000 websites, of which 9,000 are legitimate and 1,000 are phishing sites. Even if the prediction model simply labeled every website as legitimate, it would still achieve an accuracy score of 90%, which could be misleading. In such cases, precision becomes an important metric. Precision is the percentage of correctly identified positive data points among those that the model predicted as positive. The number of false-positive cases (FP) indicates the rate of false warnings, which is critical in real-time phishing detection systems since it directly impacts user experience and trust.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall is another important metric, which measures the fraction of positive data points that are correctly identified as positive by the model, out of all truly positive data points. False-negative cases (FN) represent the number of phishing URLs that the model failed to detect. In the context of security systems, such as phishing detection, false-negative cases could lead to security breaches and data leakage, which could harm users significantly. Therefore, it is crucial to minimize the number of false negatives.
In such scenarios, issuing false-positive alarms, which indicate the presence of a phishing attack where none exists, can be less damaging than missing a real attack. False alarms could be annoying and may affect user experience, but it is still preferable to err on the side of caution, as missing a real attack could have severe consequences.

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F-measure or F-score is a commonly used metric that takes into account both precision and recall to provide an overall assessment of the model's performance. It is typically calculated as the weighted harmonic mean of precision and recall and is expressed as follows:

$$F_\beta = \frac{(\beta^2 + 1) \times \text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}}, \quad \beta \in (0, \infty)$$

In the F-measure formula, the parameter $\beta$ is used to weigh the importance of precision and recall differently. When $\beta$ is set to 1, precision and recall are given equal importance, and the metric is referred to as the F1-score. The F1-score is a widely used measure in binary classification tasks and is particularly useful when precision and recall are of equal importance. In other words, the F1-score provides a balance between precision and recall, which is crucial in real-world scenarios.
The F-score does the best job of any single statistic, but all four work together to describe the performance of a classifier:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{TP}{TP + \frac{1}{2}(FP + FN)}$$

In addition to the evaluation metrics discussed earlier, many researchers utilize the N-fold cross-validation technique to assess the performance of phishing detection models. This technique is widely used, especially when dealing with small datasets, and involves dividing the original data samples into N subsets after shuffling the dataset randomly. One of the subsets is used for testing the model, while the remaining subsets are used for training the model. The process is repeated N times, and each time a different subset

is used for testing. The results obtained from each fold are then averaged to obtain an overall performance estimate for the model.
Typically, N is set to 10 or 5, but this can vary depending on the dataset size and complexity of the model being evaluated. The N-fold cross-validation technique is a valuable tool for evaluating model performance as it provides a more robust estimate of the model's performance, even with limited data. It also helps to reduce the risk of overfitting, which can occur when the model is trained on a small, biased subset of the data.

10. Conclusion
Phishing is a persistent and successful security threat that impacts both individuals and targeted organizations. Despite its longevity, it remains one of the most prevalent attack methods utilized today. As attackers continue to develop increasingly sophisticated social engineering and evasion tactics, detecting and preventing these attacks becomes ever more challenging. As such, research is essential in identifying effective countermeasures.
Our research survey has revealed that there has been a significant focus on detecting phishing websites through various research efforts. Machine learning-based methods have gained popularity due to their ability to detect zero-hour attacks and handle newly discovered phishing web pages efficiently. However, to combat phishing more effectively, it is crucial to anticipate the tactics used by attackers and address the gaps in current research. In the following paragraphs, we outline the main research gaps identified from our survey.
One of the research gaps identified is the increased use of URL shortening services by attackers to mask the true phishing URLs, which poses a challenge for list-based approaches and the management of blacklists. This also affects machine learning-based approaches, as most URL features recommended in the literature become irrelevant in this scenario, resulting in the failure of detection mechanisms. Additionally, there are other unresolved issues associated with features that are related to attackers' evasion techniques. It is not enough to simply retrain a machine learning model whenever new data becomes available; there is a need to quickly identify the tactics employed by these ever-evolving attacks and automatically extract suitable features. Therefore, further research efforts should be directed towards addressing these challenges.
Exploring model explainability is another valuable area of research within the context of machine learning. Being able to understand how machine learning algorithms make decisions, such as which characteristics indicate whether a web page is legitimate or phishing, has significant implications for the design of security systems. Furthermore, investigating adversarial attacks can help improve the robustness of machine learning models.
It is also crucial to note that in order to advance the state of the art, experiments must be reproducible. This means that all implementation details must be clearly stated, and the datasets used should be made available to the public.
Lastly, we believe that education is crucial for individuals, who are often the weakest link in the chain. Therefore, some form of phishing countermeasure education should be included in any solution.

References
[1] PhishTank - Join the Fight against Phishing. https://www.phishtank.com/index.php. Accessed on 18 July 2021.
[2] Mafaz Alanezi. Phishing detection methods: A review. Technium: Romanian Journal of Applied Sciences and Technology, 3:19–35, 11 2021.
[3] Alexa. Keyword research, competitive analysis, & website ranking - alexa, no year. Accessed: 18 July 2021.
[4] Ali Aljofey, Qingshan Jiang, Qiang Qu, Mingqing Huang, and Jean-Pierre Niyigena. An effective phishing detection model based on character level convolutional neural network from url. Sensors, 21(5):1612, 2021.
[5] APWG. Phishing activity trends report, 2022. https://docs.apwg.org/reports/apwg_trends_report_q3_2022.pdf.
[6] V. B. et al. Study on phishing attacks. International Journal of Computer Applications, 2018.
[7] P. Babu, Lalitha Bhaskari, and CH. Satyanarayana. A comprehensive analysis of spoofing. International Journal of Advanced Computer Sciences and Applications, 01 2011.
[8] Abdul Basit, Muhammad Zafar, Xiaodong Liu, Aneeqa Rashid Javed, Zafar Jalil, and Kashif Kifayat. A comprehensive survey of ai-enabled phishing attacks detection techniques. Telecommunication Systems, 76(1):139–154, 2020.
[9] Ram Basnet, Andrew Sung, and Qingzhong Liu. Rule-based phishing attack detection. 04 2012.
[10] Canadian Institute for Cybersecurity - UNB. URL 2016 - Datasets - Research - Canadian Institute for Cybersecurity - UNB, 2016. Accessed: 18 July 2021.
[11] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017.
[12] Ashit Kumar Dutta. Detecting phishing websites using machine learning technique, 2021.
[13] Muna Elsadig, Ashraf Osman Ibrahim, Shakila Basheer, Manal Abdullah Alohali, Sara Alshunaifi, Haya Alqahtani, Nihal Alharbi, and Wamda Nagmeldin. Intelligent deep machine learning cyber phishing url detection based on bert features extraction. Applied Sciences, 11(2):876, 2021.
[14] FBI. Internet crime complaint center (ic3), annual reports, 2022. https://www.ic3.gov/Home/AnnualReports.
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

Model or Algorithm               Type     Dataset                                                         Accuracy (%)
Recurrent Neural Networks [32]   Single   46,839 instances from PhishTank, OpenPhish, and Common Crawl    99.08
BLSTM [44]                       Single   245 instances from UCI                                          95.47
CNN [4]                          Single   83,857 instances from PhishTank, UCI                            98.58
BERT [13]                        Single   549,346 instances from Kaggle, UCI                              96.66
VAE [31]                         Single   100,000 instances from Kaggle ISCX-URL-2016, UCI                97.45

Figure 8. Comparison of five major state-of-the-art deep learning solutions.

[16] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
[17] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, pages 448–456, 2015.
[18] Akanksha K. Jain and Brij B. Gupta. A survey of phishing attack techniques, defence mechanisms and open research challenges. Enterprise Information Systems, pages 1–39, 2021.
[19] Poornachandra Kalaharsha and B.M. Mehtre. Detecting phishing sites - an overview. arXiv preprint arXiv:2103.12739, 2021.
[20] S Kannan, V Gurusamy, and S Vijayarani. Preprocessing techniques for text mining. International Journal of Computer Applications, 101(5):1–7, 2014.
[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[22] Katharina Krombholz, Heidelinde Hobel, Markus Huber, and Edgar Weippl. Advanced social engineering attacks. Journal of Information Security and Applications, 22:113–122, 2015.
[23] I-F Lam, W-C Xiao, S-C Wang, and K-T Chen. Counteracting phishing page polymorphism: An image layout analysis approach. In International Conference on Information Security and Assurance, pages 270–279. Springer, 2009.
[24] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[25] Amaad Mirza, Sohail Asghar, Ayesha Zafar, and Saira Gilani. A hybrid model to detect phishing-sites using supervised learning algorithms. pages 1126–1133, 12 2016.
[26] Rami M. A. Mohammad, Lee McCluskey, and Fadi Thabtah. UCI machine learning repository: Phishing websites data set. https://archive.ics.uci.edu/ml/datasets/Phishing+Websites, 2015. Accessed on 26 March 2015.
[27] Bhanu Teja Mummadi. Detection of phishing websites using supervised learning, 2022.
[28] Mahmoud Othma. An empirical study towards an automatic phishin, 2022.
[29] Mahmoud Othman and Hesham Hassan. An empirical study towards an automatic phishing attack detection using ensemble stacking model. Journal of Information Security and Applications, 61:102872, 2021.
[30] Ani Petrosyan. Worldwide digital population 2023, 2023. https://www.statista.com/statistics/617136/digital-population-worldwide/.
[31] Manoj Kumar Prabakaran, Abinaya Devi Chandrasekar, and Parvathy Meenakshi Sundaram. An enhanced deep learning-based phishing detection mechanism to effectively identify malicious urls using variational autoencoders. Security and Communication Networks, 2021, 2021.
[32] Aman Rangapur, Tarun Kanakam, and P Dhanvanthini. Phishdefence: Phishing detection using deep recurrent neural networks. IEEE Access, 9:51488–51500, 2021.
[33] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[34] Muhammad Sanwal. A hybrid phishing detection model based on transformer characterbert from urls, 2021.
[35] Iqbal Sarker. Machine learning: Algorithms, real-world applications and research directions. SN Computer Science, 2, 03 2021.
[36] Vahid Shahrivari, Mohammad Mahdi Darabi, and Mohammad Izadi. Phishing detection using machine learning techniques. arXiv preprint arXiv:2009.11116, 2020.
[37] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[38] Chhaya Singh. Phishing website detection based on machine learning: A survey. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pages 274–279, Coimbatore, India, March 2020. IEEE.
[39] Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. In International Conference on Machine Learning, pages 646–654. PMLR, 2015.
[40] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.

[41] Chia Li Tan. Phishing dataset for machine learning: Feature evaluation. 2018.
[42] Grega V., Iztok F., and Vili P. Datasets for phishing websites detection. Data in Brief, 3:105196, 2020.
[43] M. Vijayalakshmi, S. M. Shalinie, and Ming-Hsuan Yang. Web phishing detection techniques: A survey on the state-of-the-art, taxonomy and future directions. IET Networks, 9(4):235–246, 2020.
[44] Shan Wang, Sulaiman Khan, Chuyi Xu, Shah Nazir, and Abdul Hafeez. Deep learning-based efficient model development for phishing detection using random forest and blstm classifiers. Sensors, 21(3):976, 2021.
[45] Ibrahim Waziri. 1 website forgery : Understanding phishing attacks &. 2015.
[46] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference, pages 87.1–87.12. BMVA Press, 2016.
[47] Ammara Zamir. Phishing web site detection using diverse machine learning algorithms, 2020.
[48] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer, 2014.
[49] H. Zhang. A survey on phishing detection techniques, 2020.
[50] Xieyuanli Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Polynet: A pursuit of structural diversity in very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3900–3908, 2017.
[51] Rasha Zieni, Luisa Massari, and Maria Carla Calzarossa. Phishing or not phishing? a survey on the detection of phishing websites. IEEE Access, 7:112758–112778, 2019.
