Phishing Website Detection

Mohd Shariyab
Computer Science and Engineering
SRMCEM
Lucknow, India
[email protected]

Praveen Singh
Computer Science and Engineering
SRMCEM
Lucknow, India
[email protected]

Dr. Sadhana Rana (Professor)
Computer Science and Engineering
SRMCEM
Lucknow, India
[email protected]

Abstract: Phishing is an online threat in which an attacker impersonates an authentic and trustworthy organization to obtain sensitive information from a victim. Like other online threats such as trolling, it has long been considered a problem. However, recent advances in phishing detection, such as machine learning-based methods, have helped in combating these attacks. This paper therefore develops and compares four models to investigate how efficiently machine learning can detect phishing domains. It also compares the most accurate of the four models with existing solutions in the literature. The work carried out in this study updates previous systematic literature surveys, with more focus on the latest trends in phishing detection techniques. This study enhances readers' understanding of the different types of phishing website detection techniques, the data sets used, and the comparative performance of the algorithms used. Our findings show that the model based on gradient boosting is the most accurate of the four techniques and outperforms other solutions in the literature.

Keywords: phishing detection, machine learning, phishing domains, artificial neural networks, support vector machine, decision tree, random forest.

I. INTRODUCTION
The rapid evolution of technology has brought unprecedented convenience to our lives, but it has also given rise to a significant threat: phishing attacks. Social engineering attacks are a common security threat used to obtain private and confidential information by tricking the user without being detected. Phishing attacks are fraudulent emails, text messages, phone calls, or websites designed to trick users into downloading malware or into sharing sensitive information and personal data. Personal data can include bank account details, card numbers, social media IDs, or login credentials.

Phishing is the most common type of social engineering attack, that is, the practice of deceiving, pressuring, or manipulating people into sending information or assets to the wrong people. Social engineering attacks rely on human error and pressure tactics to succeed. The attacker typically masquerades as a person or organization the victim trusts (e.g., a coworker, a boss, or a company the victim or the victim's employer does business with) and creates a sense of urgency that drives the victim to act rashly. Hackers and fraudsters use these tactics because it is easier and less expensive to trick people than it is to hack into a computer or network.

Typically, a phishing attack exploits social engineering to lure the victim by sending a spoofed link that redirects the victim to a fake web page. The spoofed link is placed on popular web pages or sent to the victim via email. The fake web page is made to look like the legitimate one. Thus, rather than reaching the real web server, the victim's request is directed to the attacker's server. Current solutions such as antivirus software, firewalls, and dedicated software do not fully prevent web spoofing attacks.

The use of Secure Sockets Layer (SSL) and digital certificates issued by a certificate authority (CA) also does not protect web users against such attacks. In a web spoofing attack, the attacker diverts the request to a fake web server. In fact, certain types of SSL and CA certificates can be forged while everything appears to be legitimate. It has been reported that a secure browsing connection does virtually nothing to protect users, especially from attackers who know how "secure" connections actually work. This paper develops an anti-web spoofing solution
based on inspecting the URLs of fake web pages. The solution applies a series of steps to check the characteristics of a website's Uniform Resource Locator (URL).

Our phishing detection website project is a proactive response to the escalating cyber threats that exploit human vulnerability. The website is designed to combat phishing attempts by employing advanced algorithms, machine learning, and real-time data analysis. By leveraging these technologies, our platform will empower users to identify and thwart phishing attacks effectively, thereby safeguarding their sensitive information from falling into the wrong hands.

II. BACKGROUND
Several machine learning algorithms are currently in use and have proven efficient in phishing domain detection. Some of these are:

1. Random Forest
Random forest is a supervised ensemble learning algorithm for classification and regression used in predictive modeling and machine learning [1]. Random forest has attracted attention due to its speed and high accuracy. It aggregates the predictions of many decision trees to select the best result: the majority class (the most common value among the trees) for classification, or the average prediction for regression. The data set is first divided into two parts: training and testing. Random forest then randomly draws many samples from the training set. For each sample, a decision tree is built that splits each node into two children using the optimal split. Each tree then votes for a prediction, and the prediction with the most votes is chosen as the final result. The main hyperparameters in random forest are used either to increase the predictive power of the model or to make the model faster [2]. More trees can improve performance and make predictions more stable, but they also increase processing time. Setting limits on the maximum and minimum number of leaves can also improve the performance of the algorithm. Once the training step is completed, the model can be applied to the test data, so that the predicted results can be compared with the expected results [3]. Figure 2 shows how each tree produces a different output when given an independent random sample.

Random forest is used for its ability to reduce generalization error, and its accuracy improves as the forest grows in size. After the features are picked randomly, the accuracy depends largely on the correlation between the trees. The random forest can be characterized by tracking the error and the correlation between trees as it is built. As a consequence, the relevance of a variable can be measured.

Figure 2. A comparison of DT and RF [4].
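The sample-then-vote procedure described for random forest can be illustrated with a minimal pure-Python sketch. It is a toy under stated assumptions, not the implementation used in this paper: each "tree" is reduced to a single-split decision stump, the function names (`fit_stump`, `fit_forest`, `predict_forest`) are hypothetical, and the labels are simply the values 1 and -1.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Each side predicts its majority label (empty side: overall majority).
            l_lab = Counter(left).most_common(1)[0][0] if left else default
            r_lab = Counter(right).most_common(1)[0][0] if right else default
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)   # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Every stump votes; the label with the most votes is the final result."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: two numeric features, labels 1 / -1.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [4, 5], [5, 4], [5, 5], [4, 4]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

Using an odd number of trees avoids tied votes in a two-class problem. A real forest would grow full decision trees rather than stumps, which is where the leaf-count hyperparameters come in.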
2. Support Vector Machine
SVM is a supervised learning method used for pattern recognition and regression. Learning theory can identify the key factors needed to learn successfully with specific, simple algorithms, but most real-world applications require more complex models and algorithms (such as neural networks), which are difficult to analyze theoretically. SVM sits at the intersection of learning theory and practice: the models it builds are complex enough (for example, they contain a large class of neural networks) and yet simple enough to be analyzed mathematically. This is because an SVM is a linear algorithm in a high-dimensional space [5]. As shown in Figure 3, SVM predicts labels by constructing a decision boundary (a hyperplane) that separates the two groups. The hyperplane is determined by the data points closest to it, the support vectors, and the distance between the data points and the hyperplane is used to classify each group.

Previous research has demonstrated that the hyperplane with the greatest margin of separation between the two classes offers the highest generalization performance [6]. The best hyperplane is found by solving a convex optimization problem that minimizes a quadratic function under linear inequality constraints. The solution may be expressed in terms of support vectors, which are a subset of the training instances. The support vectors carry all the information required to solve the classification problem, since the result remains the same even if all other vectors are removed.

Figure 3. Support vector machine [2].

3. Gradient Boosting
Gradient Boosting algorithms have emerged as a focal point in machine learning research owing to their exceptional performance across a wide range of predictive tasks. In research papers, Gradient Boosting is frequently scrutinized for its ability to enhance predictive accuracy, particularly when confronted with extensive and intricate datasets. Scholars often delve into the algorithm's nuances, proposing enhancements such as novel loss functions, regularization methods, or optimization strategies to improve performance or tackle specific challenges like overfitting. Moreover, the applicability of Gradient Boosting across diverse domains such as finance, healthcare, natural language processing, and computer vision is a common subject of investigation, with researchers examining its efficacy against other machine learning techniques and tailoring its implementation to accommodate specific data characteristics or tasks.

Because the sequential nature of Gradient Boosting can make scalability a concern, research papers frequently explore methods to improve efficiency, including parallelization, distributed computing, and hardware acceleration. Additionally, efforts to enhance the interpretability of Gradient Boosting models are prevalent, with researchers devising techniques such as feature importance analysis, partial dependence plots, and model visualization to elucidate the inner workings of these complex algorithms.

Through benchmarking and comparative studies, researchers aim to elucidate the strengths and weaknesses of Gradient Boosting, thus contributing to the advancement of machine learning methodologies and applications.

Gradient Boosted Trees is a technique whose base learner is CART (Classification and Regression Trees). The diagram below explains how gradient-boosted trees are trained for regression problems.
The ensemble consists of M trees. Tree1 is trained using the feature matrix X and the labels y. Its predictions, labeled y1(hat), are used to determine the training set residual errors r1. Tree2 is then trained using the feature matrix X and the residual errors r1 of Tree1 as labels. The predicted results r1(hat) are then used to determine the residual r2. The process is repeated until all M trees forming the ensemble are trained. An important parameter in this technique is shrinkage: the prediction of each tree in the ensemble is shrunk by multiplying it by the learning rate (eta), which ranges between 0 and 1. There is a trade-off between eta and the number of estimators: a lower learning rate must be compensated for with more estimators in order to reach a given model performance. Once all the trees are trained, predictions can be made. Each tree predicts a label, and the final prediction is given by the formula

y(pred) = y1 + (eta * r1) + (eta * r2) + … + (eta * rN)

4. Logistic Regression
Logistic regression uses a logistic function, called the sigmoid function, to map predictions to probabilities. The sigmoid function is an S-shaped curve that converts any real value to a value between 0 and 1.

Moreover, if the output of the sigmoid function (the estimated probability) is greater than a predefined threshold, the model predicts that the instance belongs to the class. If the estimated probability is less than the predefined threshold, the model predicts that the instance does not belong to the class.

The sigmoid function is referred to as the activation function for logistic regression and is defined as:

f(value) = 1 / (1 + e^(−value))

where,
• e = base of natural logarithms
• value = numerical value one wishes to transform.
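The two formulas that close this section, the shrinkage-weighted sum for gradient-boosted trees (subsection 3) and the sigmoid threshold rule for logistic regression (subsection 4), can be sketched together in pure Python. This is a minimal illustration under stated assumptions rather than the paper's code: there is a single numeric feature with at least two distinct values, each "tree" is a one-split regression stump, a threshold of 0.5 is assumed, and all function names are hypothetical.

```python
import math

def fit_stump(xs, targets):
    """Regression stump: one threshold, mean target on each side (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max value so the right side is never empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left_mean, right_mean)

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_boosted_trees(xs, ys, n_trees=20, eta=0.1):
    tree1 = fit_stump(xs, ys)                        # Tree1 is trained on the labels y
    preds = [stump_predict(tree1, x) for x in xs]    # y1(hat)
    residual_trees = []
    for _ in range(n_trees - 1):
        residuals = [y - p for y, p in zip(ys, preds)]   # current residual errors
        tree = fit_stump(xs, residuals)                  # next tree fits the residuals
        residual_trees.append(tree)
        # Shrinkage: each new tree's prediction is multiplied by eta.
        preds = [p + eta * stump_predict(tree, x) for p, x in zip(preds, xs)]
    return tree1, residual_trees, eta

def boosted_predict(model, x):
    """y(pred) = y1 + eta * r1 + eta * r2 + ..., using each tree's prediction."""
    tree1, residual_trees, eta = model
    return stump_predict(tree1, x) + sum(eta * stump_predict(t, x) for t in residual_trees)

def sigmoid(value):
    """f(value) = 1 / (1 + e^(-value)); adequate for moderate input magnitudes."""
    return 1.0 / (1.0 + math.exp(-value))

def predict_class(value, threshold=0.5):
    """Return 1 if the estimated probability exceeds the threshold, else 0."""
    return 1 if sigmoid(value) > threshold else 0
```

With a small eta, each added tree removes only part of the remaining residual, which is why a lower learning rate needs more estimators to reach the same training error.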
III. RELATED WORK
In general, users ignore website URLs. This increases their chances of falling for phishing domains, which can be prevented by determining whether the URL is genuine. Unfortunately, modern methods for detecting phishing attacks have limited accuracy and detect only 20% of attempts. Machine learning techniques for phishing detection can produce better results, but they are time-consuming and do not scale well, even with small databases. Additionally, heuristic-based phishing detection suffers from false positives. Previous research on anti-phishing models has focused on strategies to improve performance.

However, the use of feature reduction and ensemble models can increase the accuracy of these models. Machine learning algorithms for phishing domain detection are popular, and the task has become a simple classification problem. To build an ML detection model, the data must contain features related to both the phishing and the legitimate website classes. Previous studies have shown that detection accuracy is high when using robust machine learning techniques. Various feature selection strategies are used to reduce the number of features. To train a machine learning model to distinguish phishing attacks from legitimate traffic, a dataset needs to be provided as input.

When features are reduced, dataset visualization becomes more efficient and easier to understand. The DT, C4.5, k-NN, and SVM algorithms are the most prominent: they have been used in many research projects that investigated phishing attacks and have produced the most accurate and effective results. As empirical tests show, manually tuned parameters, long training periods, and poor detection accuracy are prevalent problems.

Despite these benefits, researchers have noted the limits of their studies. Many pointed out that ensemble learning techniques had not been applied and that feature selection and reduction had not been performed. A range of strategies has been applied to combat phishing attacks. One paper [7] used different classifiers, such as naive Bayes and SVM. Similarly, the authors in [8] utilized random forest to differentiate phishing attacks from normal websites.

Table 1. Comparison table of the latest research focusing on machine learning phishing detection techniques

IV. METHODOLOGY
Using the Kaggle dataset, four phishing detection models were developed: gradient boosting, SVM, random forest, and logistic regression. Normalization was employed as a preprocessing strategy to improve the models' accuracy. The proposed models were able to detect different types of attacks in the dataset. The following subsections discuss the dataset used and the implemented algorithm, respectively.

1. Dataset Used
The dataset is taken from Kaggle, https://www.kaggle.com/eswarchandt/phishing-website-detector. It is a collection of website URLs for more than 11,000 websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1). Overall, the dataset has 11054 samples and 32 columns.

2. Implemented Algorithm
To increase accuracy, this paper utilized the
MinMax normalization as a preprocessing step in each proposed model. Normalization is a useful strategy for improving the accuracy of machine learning models, and it is required for some models to work properly. The MinMax normalization technique in the suggested model compresses the data to the range [0, 1], which improves the quality of the model's training input (see Equations (1) and (2)).

X_std = (X − X.min) / (X.max − X.min)    (1)

X_scaled = X_std × (max − min) + min    (2)

To enhance model performance and reduce complexity, we used a data normalization strategy, as shown in Table 2. The algorithm selects the significant features from the initial dataset, that is, those that determine the prediction outcome, by filtering the 30 features. The dataset is split 80/20 into training and testing sets, respectively, and 5-fold cross-validation is used, which presented the best performance in the latest research. The prediction model is then trained using several machine learning models. This is particularly useful for making predictions, as utilizing many models ensures that the results are not biased toward a single model. To account for this, we present the results of all the models together and compare their maximum accuracies. If most of the models indicate that a domain is phishing, then the domain is classified as a phishing attempt.

Classifier            Data       Accuracy   Precision   Recall    F1-measure
Gradient Boosting     Training   98.99%     98.66%      99.12%    98.99%
                      Testing    97.78%     97.81%      98.21%    98.01%
SVM                   Training   98.46%     98.24%      98.70%    98.47%
                      Testing    97.06%     97.47%      97.24%    97.36%
Random Forest         Training   96.27%     95.61%      96.99%    96.30%
                      Testing    95.52%     96.09%      95.86%    95.98%
Logistic Regression   Training   92.74%     91.93%      93.72%    92.81%
                      Testing    92.89%     94.46%      92.70%    93.57%

V. MODEL'S FLOWCHART
Phishing is a concern to many individuals. However, existing methods, such as browser security indicators, cannot reliably detect phishing websites. Due to the limits of current technology, users must evaluate on their own whether a URL is phishing or not. As a result, an automated technique for phishing website identification should be explored for increased cyber safety. This study shows how a feature extraction approach and a prediction model based on a random forest classifier help increase the likelihood that a user will correctly identify a phishing website.

Each of the developed models, as shown in Figure 7, employs a feature selection technique to increase its accuracy. The data analysis heat map picks the features that are most crucial in affecting the predicted result by filtering the most relevant features out of the original dataset. As a result, irrelevant features have no effect on the model's efficiency or prediction.

Figure 7. Model's flowchart.
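The preprocessing and voting steps that feed this flow (MinMax scaling per Equations (1) and (2), the 80/20 split, and the majority vote across models) can be sketched in a few lines of pure Python. The function names, the fixed random seed, and the handling of constant columns are assumptions made for illustration; the paper does not specify them.

```python
import random
from collections import Counter

def minmax_scale(column, new_min=0.0, new_max=1.0):
    """Apply Equations (1) and (2) to one feature column."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: avoid division by zero
        return [new_min for _ in column]
    scaled = []
    for x in column:
        x_std = (x - lo) / (hi - lo)                           # Equation (1)
        scaled.append(x_std * (new_max - new_min) + new_min)   # Equation (2)
    return scaled

def split_80_20(samples, seed=0):
    """Shuffle a copy of the samples and split it 80% training / 20% testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def majority_vote(predictions):
    """Return the label predicted by most models (labels are 1 or -1).

    With an even number of models a tie is possible; Counter then returns
    the tied label that appeared first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For example, a feature column holding the values -1, 0 and 1 is mapped to 0.0, 0.5 and 1.0, and four model outputs of 1, -1, 1, 1 yield a final label of 1.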
VI. FUTURE WORKS
Phishing detection is a critical task in cybersecurity. Machine learning algorithms have been used to detect phishing websites with high accuracy. Future work in this area can focus on the following:

• Improving the accuracy of phishing detection: Researchers can explore new machine learning algorithms and techniques to improve the accuracy of phishing detection. For example, researchers can use deep learning algorithms to detect phishing websites based on their visual content.

• Detecting zero-day phishing attacks: Zero-day phishing attacks are new and unknown attacks that have not been seen before. Researchers can develop machine learning algorithms that can detect zero-day phishing attacks by analyzing the behavior of users and the network.

• Detecting phishing attacks on mobile devices: With the increasing use of mobile devices, phishing attacks on mobile devices are becoming more common. Researchers can develop machine learning algorithms that can detect phishing attacks on mobile devices by analyzing the user's behavior and the characteristics of the mobile device.

• Developing real-time phishing detection systems: Real-time phishing detection systems can detect phishing attacks as they happen, allowing users to take immediate action to protect themselves. Researchers can develop machine learning algorithms that can detect phishing attacks in real time by analyzing network traffic and user behavior.

VII. CONCLUSION
In this work, we investigated the practicality and efficiency of using machine learning for phishing detection. We developed four machine learning models based on support vector machine (SVM), logistic regression, gradient boosting, and random forest (RF) techniques. We then selected the best-performing of the four models and compared its performance with other solutions in the literature. The overall results show that the gradient boosting model achieved the highest performance and outperforms other schemes in the literature.

The most important way to protect users from phishing attacks is education and awareness. Internet users must be aware of the security tips given by experts. Every user should also be trained not to blindly follow links to websites where they have to enter sensitive information. It is essential to check the URL before entering the website. In the future, the system can be upgraded to detect the web page automatically and to improve the compatibility of the application with the web browser. Additional work can also be done by adding other characteristics for distinguishing fake web pages from legitimate ones. The PhishChecker application can also be extended into a mobile web application for detecting phishing on mobile platforms.

There are many features that can be improved in this work. The heuristics can be further developed to detect phishing attacks in the presence of embedded objects such as Flash. Identity extraction is an important operation, and it was improved with an Optical Character Recognition (OCR) system to extract text and images. More effective inference rules for identifying a given suspicious web page, and strategies for discovering whether it is a phishing target, should be designed in order to further improve the overall performance of this system.

Moreover, it is an open challenge to develop a robust malware detection method that retains its accuracy for future phishing emails. In addition, dynamic and static features complement each other, and therefore both are considered important in achieving high accuracy.
VIII. REFERENCES

1. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
2. Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Open: Berlin/Heidelberg, Germany, 2017.
3. Brownlee, J. Train-Test Split for Evaluating Machine Learning Algorithms. Mach. Learn. Mastery 2020, 23. Available online: https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/ (accessed on 25 December 2021).
4. Jeremybeauchamp English: A Visual Comparison between the Complexity of Decision Trees and Random Forests. 2020. Available online: https://commons.wikimedia.org/wiki/File:Decision_Tree_vs._Random_Forest.png (accessed on 27 December 2021).
5. Sönmez, Y.; Tuncer, T.; Gökal, H.; Avcı, E. Phishing Web Sites Features Classification Based on Extreme Learning Machine. In Proceedings of the 2018 6th International Symposium on Digital Forensic and Security (ISDFS), IEEE, Antalya, Turkey, 22–25 March 2018; pp. 1–5.
6. Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: Cambridge, UK, 2000.
7. James, J.; Sandhya, L.; Thomas, C. Detection of Phishing URLs Using Machine Learning Techniques. In Proceedings of the 2013 International Conference on Control Communication and Computing (ICCC), Thiruvananthapuram, India, 13–15 December 2013; Available online: https://ieeexplore.ieee.org/abstract/document/6731669 (accessed on 26 September 2021).
8. Liew, S.W.; Sani NF, M.; Abdullah, M.T.; Yaakob, R.; Sharum, M.Y. An Effective Security Alert Mechanism for Real-Time Phishing Tweet Detection on Twitter. Comput. Secur. 2019, 83, 201–207. Available online: https://www.sciencedirect.com/science/article/pii/S0167404818309040 (accessed on 26 September 2021).
9. Hutchinson, S.; Zhang, Z.; Liu, Q. Detecting Phishing Websites with Random Forest. In Proceedings of the Machine Learning and Intelligent Communications, Hangzhou, China, 6–8 July 2018; Meng, L., Zhang, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 470–479.
10. Patil, V.; Thakkar, P.; Shah, C.; Bhat, T.; Godse, S.P. Detection and Prevention of Phishing Websites Using Machine Learning Approach. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 19–18 August 2018; pp. 1–5.
11. Joshi, A.; Pattanshetti, P.T.R. Phishing Attack Detection Using Feature Selection Techniques; Social Science Research Network: Rochester, NY, USA, 2019.
12. Ubing, A.; Kamilia, S.; Abdullah, A.; Zaman, N.; Supramaniam, M. Phishing Website Detection: An Improved Accuracy through Feature Selection and Ensemble Learning. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 252–257.
13. Li, Y.; Yang, Z.; Chen, X.; Yuan, H.; Liu, W. A Stacking Model Using URL and HTML Features for Phishing Webpage Detection. Future Gener. Comput. Syst. 2019, 94, 27–39.
14. Zamir, A.; Khan, H.U.; Iqbal, T.; Yousaf, N.; Aslam, F.; Anjum, A.; Hamdani, M. Phishing Web Site Detection Using Diverse Machine Learning Algorithms. Electron. Libr. 2020, 38, 65–80.
15. Alsariera, Y.A.; Adeyemo, V.E.; Balogun, A.O.; Alazzawi, A.K. AI Meta-Learners and Extra-Trees Algorithm for the Detection of Phishing Websites. IEEE Access 2020, 8, 142532–142542.
16. Ali, W.; Malebary, S. Particle Swarm Optimization-Based Feature Weighting for Improving Intelligent Phishing Website Detection. IEEE Access 2020, 8, 116766–116780.
17. Adebowale, M.A.; Lwin, K.T.; Sanchez, E.; Hossain, M.A. Intelligent Web-Phishing Detection and Protection Scheme Using Integrated Features of Images, Frames and Text. Expert Syst. Appl. 2019, 115, 300–313. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0957417418304925 (accessed on 26 September 2021).
18. El Aassal, A.; Baki, S.; Das, A.; Verma, R.M. An In-Depth Benchmarking and Evaluation of Phishing Detection Research for Security Needs. IEEE Access 2020, 8, 22170–22192. Available online: https://ieeexplore.ieee.org/abstract/document/8970564 (accessed on 27 September 2021).
19. Subasi, A.; Kremic, E. Comparison of Adaboost with MultiBoosting for Phishing Website Detection. Procedia Comput. Sci. 2020, 168, 272–278. Available online: https://www.sciencedirect.com/science/article/pii/S1877050920303902 (accessed on 27 September 2021).
20. Mao, J.; Bian, J.; Tian, W.; Zhu, S.; Wei, T.; Li, A.; Liang, Z. Phishing Page Detection via Learning Classifiers from Page Layout Feature. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 43. Available online: https://jwcn-eurasipjournals.springeropen.com/articles/10.1186/s13638-019-1361-0 (accessed on 27 September 2021).
21. A Novel Machine Learning Approach to Detect Phishing Websites. Available online: https://ieeexplore.ieee.org/abstract/document/8474040/ (accessed on 27 September 2021).
22. Chen, Y.H.; Chen, J.L. AI@ntiPhish - Machine Learning Mechanisms for Cyber-Phishing Attack. IEICE Trans. Inf. Syst. 2019, 102, 878–887. Available online: https://www.jstage.jst.go.jp/article/transinf/E102.D/5/E102.D_2018NTI0001/_article/-char/ja/ (accessed on 27 September 2021).
23. Abdelhamid, N.; Thabtah, F.; Abdel-Jaber, H. Phishing Detection: A Recent Intelligent Machine Learning Comparison Based on Models Content and Features. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics, Beijing, China, 22–24 July 2017; Available online: https://ieeexplore.ieee.org/abstract/document/8004877 (accessed on 27 September 2021).
24. Jain, A.K.; Gupta, B.B. Towards Detection of Phishing Websites on Client-Side Using Machine Learning Based Approach. Telecommun. Syst. 2018, 68, 687–700. Available online: https://link.springer.com/article/10.1007/s11235-017-0414-0 (accessed on 27 September 2021).
25. Lakshmi, L.; Reddy, M.P.; Santhaiah, C.; Reddy, U.J. Smart Phishing Detection in Web Pages Using Supervised Deep Learning Classification and Optimization Technique ADAM. Wirel. Pers. Commun. 2021, 118, 3549–3564.
26. Sahingoz, O.K.; Buber, E.; Demir, O.; Diri, B. Machine Learning Based Phishing Detection from URLs. Expert Syst. Appl. 2019, 117, 345–357. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0957417418306067 (accessed on 27 September 2021).
27. Jagadeesan, S. URL Phishing Analysis Using Random Forest. Int. J. Pure Appl. Math. 2018, 118, 4159–4163.
28. Niranjan, A.; Haripriya, D.K.; Pooja, R.; Sarah, S.; Deepa Shenoy, P.; Venugopal, K.R. EKRV: Ensemble of KNN and Random Committee Using Voting for Efficient Classification of Phishing; Springer: Singapore, 2019; Available online: https://link.springer.com/chapter/10.1007/978-981-13-1708-8_37 (accessed on 27 September 2021).
29. Chiew, K.L.; Tan, C.L.; Wong, K.; Yong, K.S.; Tiong, W.K. A New Hybrid Ensemble Feature Selection Framework for Machine Learning-Based Phishing Detection System. Inf. Sci. 2019, 484, 153–166. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0020025519300763 (accessed on 27 September 2021).
30. Pandey, A.; Gill, N.; Sai Prasad Nadendla, K.; Thaseen, I.S. Identification of Phishing Attack in Websites Using Random Forest-SVM Hybrid Model. In Proceedings of the Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018), Vellore, India, 6–8 December 2018; Springer International Publishing: Midtown Manhattan, NY, USA, 2020. Available online: https://link.springer.com/chapter/0.1007/978-3-030-16660-1_12 (accessed on 27 September 2021).
31. Ali, W.; Ahmed, A.A. Hybrid Intelligent Phishing Website Prediction Using Deep Neural Networks with Genetic Algorithm-Based Feature Selection and Weighting. IET Inf. Secur. 2019, 13, 659–669.
32. Aljofey, A.; Jiang, Q.; Qu, Q.; Huang, M.; Niyigena, J.P. An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL. Electronics 2020, 9, 1514. Available online: https://www.mdpi.com/2079-9292/9/9/1514 (accessed on 27 September 2021).
33. Shie, E.W.S. Critical Analysis of Current Research Aimed at Improving Detection of Phishing Attacks. Sel. Comput. Res. Pap. 2020, 45, 45–53.
34. Maurya, S.; Jain, A. Deep Learning to Combat Phishing. J. Stat. Manag. Syst. 2020, 23, 945–957.
35. Mao, J.; Bian, J.; Tian, W.; Zhu, S.; Wei, T.; Li, A.; Liang, Z. Detecting Phishing Websites via Aggregation Analysis of Page Layouts. Procedia Comput. 2018, 129, 224–230. Available online: https://www.sciencedirect.com/science/article/pii/S187705091830276X (accessed on 27 September 2021).
36. Yang, L.; Zhang, J.; Wang, X.; Li, Z.; Li, Z.; He, Y. An Improved ELM-Based and Data Preprocessing Integrated Approach for Phishing Detection Considering Comprehensive Features. Expert Syst. Appl. 2021, 165, 113863. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0957417420306734 (accessed on 27 September 2021).
