Phishing
http://dx.doi.org/10.18576/amis/180624
Abstract: Phishing websites are a significant threat, constantly evolving to deceive users into revealing sensitive information. While
current anti-phishing systems rely on URLs, website content, and third-party data, they often struggle to keep pace with these dynamic
scams. This study addresses these challenges by introducing a novel approach that analyzes the effectiveness of URL-based features,
JavaScript characteristics, and anomaly-based indicators in detecting malicious web links. To overcome the issues of data imbalance
and feature selection, our approach incorporates SMOTE oversampling and a Decision Tree-Recursive Feature Elimination cross-
validation (DT-RFECV) wrapper method. The selected features are then used to train an ensemble stacking model that combines
Decision Trees, Random Forests, and Bagging. The framework was rigorously evaluated on two benchmarking datasets and achieved
impressive accuracy rates of 97.7% on Dataset-1 and 97.5% on Dataset-2 using ten features, underscoring the effectiveness of our
approach. Our proposed framework significantly contributes to the internet community’s defense against phishing scams with its unique
features, ensemble model construction, and promising results.
Keywords: URLs, DT-RFECV, Machine learning, ensemble stacking model, phishing scam, financial inclusion
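The wrapper idea behind DT-RFECV can be illustrated with a minimal, standard-library sketch: fit a small decision tree, rank features by their impurity decrease, drop the weakest feature, and repeat. This is a simplified illustration and not the authors' implementation. It omits the cross-validation step that DT-RFECV uses to choose the number of features, and the helper names (`gini`, `best_split`, `tree_importances`, `dt_rfe`) and the toy data are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y, features):
    """Return (impurity decrease, feature, threshold) of the best split, or None."""
    best, n, parent = None, len(y), gini(y)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def tree_importances(X, y, features, depth=3, imp=None):
    """Grow a small tree and accumulate each feature's sample-weighted impurity decrease."""
    if imp is None:
        imp = {f: 0.0 for f in features}
    if depth == 0 or len(set(y)) < 2:
        return imp
    split = best_split(X, y, features)
    if split is None or split[0] <= 0:
        return imp
    gain, f, t = split
    imp[f] += gain * len(y)
    for keep_left in (True, False):
        rows = [(row, label) for row, label in zip(X, y) if (row[f] <= t) == keep_left]
        tree_importances([r for r, _ in rows], [l for _, l in rows], features, depth - 1, imp)
    return imp

def dt_rfe(X, y, n_keep):
    """Drop the least important feature, refit, and repeat until n_keep remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        imp = tree_importances(X, y, features)
        features.remove(min(features, key=imp.get))
    return features

# Hypothetical toy data: five random features, only feature 2 determines the label.
rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(60)]
y = [1 if row[2] > 0.5 else 0 for row in X]
selected = dt_rfe(X, y, n_keep=1)
```

In practice a library implementation with cross-validated scoring would replace this loop; the sketch only shows why uninformative features are eliminated first.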
categories: whitelist/blacklist, deep learning, machine learning, and heuristics [1,2,3,4,5].
Although blacklisting and whitelisting are the most widely used anti-phishing techniques, their ability to withstand zero-day attacks is uncertain because they rely on a centralized database to verify the legitimacy of the website [6,7,8,9]. Heuristic-based anti-phishing solutions rely on a third party to assess the website’s validity. Although the web page’s content, page ranking, and other aspects are included in the heuristic-based approach, the reliability of the data, which is taken from a third party, is controversial [10,11,12]. To combat new phishing attacks and alleviate technical hindrances, machine learning-based anti-phishing solutions have been presented in various venues [13,14,15]. The main advantage of using AI techniques for this task is their ability to learn hidden patterns and detect unseen fake information in phishing web links. Over the past few years, ransomware has become the most prevalent type of cybercrime, with phishing being the most widely employed distribution method. Even though an ML-based anti-phishing solution might lessen the impact of a zero-day attack, it requires well-designed, up-to-date features from both legitimate and malicious URLs. An XGB-based anti-phishing solution has been built upon URL character order, hyperlink-specific and TF-IDF plaintext, and noisy character features of HTML. Innovative rule-based techniques have also been presented in [16,17,18] for detecting phishing scams in online banking. The candidate SVM-based phish-detector has been built upon different features to detect the fake information. However, the prospective phish detector independently determines its capabilities from other sources such as search engines, network browser histories, and blacklists. Additionally, the features are language-dependent because they were taken from the webpage’s content. Effective features maximize the detection rate of phishing crimes. Filter-based feature selection methods [19,20,21] demonstrate encouraging outcomes [22,23,24].
Filter-based metrics [21,25] used statistical tools requiring less computing power. However, there is some uncertainty over their ability to forecast suitable features dynamically. In contrast to the filter tool, wrapper-based methods take advantage of the machine learning model’s capacity to identify compelling features. Attempts to extract highly influential features [21,26,27,28] introduce a novel feature selection method based on the wrapper method and yield superior results. When the number of features is huge, it takes longer to define them, but in the end, it improves the classifier’s performance. The phishing scam detection presented in [29,30,31] used features from diverse categories: URLs, domains, HTML and JavaScript, and abnormal features. Sixteen machine-learning models were trained using two datasets. One dataset had balanced classes (equal numbers of benign and malignant samples), while the other was imbalanced. The top ten significant features were extracted to train various machine learning models. In terms of accuracy, the XGBoost and Random Forest (RF) models outperformed the others. The challenges in developing ML models are obtaining sufficient samples for each output class label and choosing a suitable feature selection methodology. The work presented in this paper demonstrates the ML model’s efficiency for phishing website detection using suitable data sampling techniques and efficient feature selection methods. To reduce response time, the language-independent phishing webpage detection mechanism operates without relying on external data. The presented novel approach utilizes features from multiple sources, including URL, address domain, JavaScript, URL file, and directory attributes. To optimize efficiency, a minimal set of features from diverse categories is used to train various machine learning models. Feature significance was determined using DT-RFECV (Decision Tree-Recursive Feature Elimination with cross-validation).
Decision-Tree-Recursive Feature Elimination offers a valuable tool for feature selection, providing benefits such as feature ranking, improved model performance, computational efficiency, versatility, and enhanced interpretability. The selected features were then used to develop an ensemble stacking model. The key contributions of this research are summarized below:
–Predominant feature sets are computed using DT-RFECV.
–The significance of these derived features is analyzed through various ML models.
–An ensemble stacking model is designed to mitigate cyber-threats posed by phishing scams.
–The model’s resilience is evaluated using two datasets.
–Results are compared to a recently released phishing detection framework to assess its competence.
The rest of the manuscript is organized in the following way. Section 2 examines earlier research and methods for identifying phishing scams. Section 3 describes the dataset used and the feature selection. Section 4 elaborates on the methodology applied in this research. The results are reported in Section 5, and the study’s conclusions are presented in Section 6.

2 Related Works

This section analyses the various aspects of the ML-based phishing detection frameworks demonstrated in multiple venues. An ML-based anti-phishing solution was presented in [32,33,34] to mitigate phishing scams. Experimentation was conducted using collected samples from various sources. Nineteen significant features were selected using Pearson correlation analysis. Various ML models were trained on features extracted from URLs, login forms, hyperlinks, CSS, and web identity. The detailed results
c 2024 NSP
Natural Sciences Publishing Cor.
Appl. Math. Inf. Sci. 18, No. 6, 1481-1493 (2024) / www.naturalspublishing.com/Journals.asp 1483
demonstrate the effectiveness of different feature types for the problem. Response time is low since the model is entirely built on client-side features. Similarly, Al-Shanableh et al. [35] and Jain & Gupta [36] utilized client-side features, mainly hyperlink information. ML models were constructed based on twelve features extracted from hyperlinks. Consequently, a positive aspect of the presented solution is its applicability to websites in any human language. Conversely, its effectiveness is contingent on the website being designed using HTML. To strengthen phishing scam detection, a two-phase framework was built based on URL and source code features [37,38,39]. In the first phase, similarity-based attributes are used to generate a fingerprint, which is then compared to stored fingerprints to identify potential malicious websites. In the second phase, approximately 21 features extracted from URLs and source code are used to train an ensemble model employing Random Forest, XGBoost, and Extra Trees classifiers. Since no webpage exists independently on the internet, every webpage is connected to various resources, such as forwarding pages. In most cases, phishing attacks may neglect to conceal this information.
The frameworks presented in [40] use heterogeneous information networks (HIN) to understand the semantic and syntactic relationship among the various objects that constitute the web page and compute the Phish score for the nodes, and the node attributes [39] are used to train ML models. A deep learning-based phishing detection framework was presented by [41] based on the merits of character-level and word-level embedding for the input URL information. The study did not focus on data imbalance, which might lead to overfitting issues.
Recent research indicates that employing optimization techniques for hyperparameter tuning [42,43,44,45] and feature selection (Ramaiah et al., 2024) significantly enhances the performance of machine learning models. Similarly, in the context of phishing detection, authors in [46] utilize Genetic Algorithms (GA) to optimize hyperparameters for various machine learning models. The study employs three datasets to demonstrate its robustness against evolving phishing attacks. Although the inclusion of GA improves results, the iterative nature of the process increases computational time. To infer deeper insights into the syntactic and semantic information in the text extracted from source code, the authors in [47] employ various word embedding algorithms to enhance detection accuracy. The resulting word embeddings are used to generate feature vectors, subsequently engaged to design ensemble and multimodal phishing detectors. The phishing framework presented in [48] proposes eight features extracted from the URL, one from the plaintext character level and six from hyperlinks. An additional seven features from the literature were incorporated, resulting in 15 features used by the authors to compile the customized dataset. To mitigate performance degradation over time, a vanilla neural network (VNN) model was trained using the continual learning (CL) algorithms Learning Without Forgetting (LWF) and Elastic Weight Consolidation (EWC). These CL algorithms enable the VNN to acquire new information while preserving previously learned knowledge.
To enhance the robustness of phishing website detection, the authors in [49] considered a dataset comprising 112 attributes. The study explores various scenarios to assess the resilience of the phishing detection framework. Data imbalance was addressed using SMOTEENN, and 13 constant features were identified and removed to improve response time. Subsequently, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were employed to reduce feature dimensionality. Based on the results, ML-based phishing detection models were unaffected by PCA and LDA, but removing constant features significantly improved detection accuracy. Similarly, authors in [50] also employed PCA for dimensionality reduction on a balanced dataset. SVM and DNN models were trained and evaluated. It would be beneficial to include details about the PCA parameters, such as the number of principal components retained, and consider exploring additional ML models for a more comprehensive comparison. To address the technical challenges of small datasets, a large dataset containing more samples for both phishing and legitimate categories was presented in [51]. An Optimal Feature Vectorization Algorithm (OFVA) was introduced to extract 41 features, including 10 novel ones, effectively detecting phishing scams. Content-related features were excluded to reduce response time. Authors in [52] developed an SVM-based phishing detection model utilizing URL-based features. A chi-square metric was employed to select nine significant features from an initial set of sixteen. The SVM model with a polynomial kernel function performed better than its radial basis function counterpart.
The literature offers numerous robust solutions for detecting phishing websites using ML models (Table 1), but there is a need to cultivate the comprehensive nature of such models. This involves enabling ML models to understand phishing websites deeply and develop resilience against emerging threats. One of the challenges associated with the candidate problem is insufficient labeled samples. Very few publicly available datasets [50] maintain equal samples for benign and malign labels. In the cited literature, the work presented in [40,48], and [49] used datasets where the number of benign samples is higher than the number of malign samples. Conversely, Ejaz et al. [53], Bahaghighat et al. [49], Tamal et al. [51], and Shombot et al. [52] utilize datasets with more malign samples than benign samples.
When training a machine learning model to detect phishing websites, having an imbalanced dataset with significantly more benign than malicious samples can pose challenges. The model might become overly focused on recognizing benign patterns, leading to false negatives where phishing websites are incorrectly classified as safe. With fewer malicious samples, the
1484 M. Ramaiah et al.: Enhanced Phishing Detection: An Ensemble Stacking Model...
Fig. 1: The architecture of the Proposed Anti-phishing Model.

a compilation of characteristics extracted from both phishing and legitimate websites. 48 features and 10,000 samples are available on both phishing and legitimate labels. The graphical representation is displayed in Figure 2. Four categories of features constitute the dataset, offering better insight into web pages: sixteen address-based features, four domain-based features, twenty-one abnormal-based features, and six JavaScript-based features.

3.2 Pre-processing

The dataset DS-1 has an equal number of phishing and legitimate samples. It has been ensured that the dataset is devoid of null values. This indicates that each instance’s features have valid values and no missing data. The absence of null values enables continuous analysis and modeling without the need for imputation or managing missing values. Each instance in the dataset is unique, indicating no duplicate records are present. The uniqueness of tuples ensures that each instance contributes independently to the machine learning process, preventing any duplication bias that could distort the results. A box plot analysis was conducted on the dataset to check for outliers, and it identified none. This suggests that the data points do not contain extreme values that would skew the analysis or affect the model’s functionality, and instead fall within a reasonable range.
In DS-2, all instances with null values have been eliminated. This procedure verifies that the remaining data is comprehensive and contains no missing values or redundant records. The dataset DS-2 had an imbalanced distribution of legitimate and phishing instances, with 58,000 legitimate instances and 30,647 phishing instances, respectively. The Synthetic Minority Over-Sampling Technique (SMOTE) was applied to address this class imbalance and provide a more balanced dataset. To match the number of legitimate instances in the majority class, SMOTE creates synthetic instances of the minority class, which, in this case, are phishing instances. Consequently, the dataset was rebalanced to contain 58,000 instances each of the phishing and legitimate classes. The sample statistics before and after applying SMOTE can be found in Figures 3(a) and 3(b). This phase ensures the quality and integrity of the dataset prior to analysis.
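The interpolation step at the heart of SMOTE can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the library implementation the authors may have used: the function name `smote_like_oversample`, the neighbour count `k`, and the toy feature vectors are all hypothetical. Each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: 6 legitimate vs 3 phishing feature vectors.
legitimate = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25], [0.05, 0.1], [0.25, 0.15]]
phishing = [[0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]

new_points = smote_like_oversample(phishing, n_new=len(legitimate) - len(phishing), k=2)
balanced_phishing = phishing + new_points

# Applied to the DS-2 counts quoted above, balancing 30,647 phishing instances
# against 58,000 legitimate ones requires this many synthetic phishing instances:
synthetic_needed_ds2 = 58_000 - 30_647  # 27,353
```

Because every synthetic point is a convex combination of two existing minority samples, the new instances stay inside the region already occupied by the minority class rather than duplicating records exactly.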
Table 4: Significant features through DT-RFECV from Dataset-1 (DS-1)
Table 6: Tested results compared with the cutting-edge methods’ results (DS-1)
Methods      A      P      R      F1     S
DT           0.96   0.96   0.960  0.96
RF           0.975  0.969  0.981  0.975
LR           0.910  0.922  0.894  0.908
GRB          0.954  0.942  0.966  0.954
ADB          0.935  0.925  0.945  0.935
SVM          0.762  0.733  0.816  0.772
KNN          0.893  0.875  0.914  0.894
GNB          0.829  0.927  0.710  0.804
BC           0.971  0.971  0.971  0.971
P-Stacking   0.975  0.971  0.978  0.975

triggered by the presence of an external link or an empty link, the work that is being presented has features such as PctExtHyperlinks, PctExtNullSelfRedirectHyperlinksRT, and PctNullSelfRedirectHyperlinks that strengthen its defences against malicious resources and the effects of self-redirection. The FrequentDomainNameMismatch feature could help detect man-in-the-middle attacks by hackers.
In Experiment 2, the Dataset-2 has 111 features. The results obtained through various conventional ML models along with the ensemble stacking model using the significant features selected through DT-RFECV are furnished in Table 7. Features derived through the DT-RFECV are provided in Table 7 and the corresponding feature names are furnished in Table 3. Table 8 displays the results obtained through the presented ensemble stacking model along with the other baseline models. As the results presented in Table 8 show, the presented model performs better than the baseline models. The ROC curves obtained by the various ML models can be found in Figure 6.

Fig. 6: Comparison of AUC values of various Machine learning models.

To portray the superiority of the presented work, the experimental results are compared with the results in [46,49] and furnished in Table 9. For the candidate dataset, the frameworks in [46] apply optimization techniques to derive the hyper-parameters of various machine learning models. Notably, the anti-phishing frameworks in [46] utilize all 111 features, whereas the presented solution achieves better results using only ten features. The phishing scam detection method proposed in [49] employs SMOTEENN to address the class imbalance issue. To reduce dimensionality, statistical methods were used to eliminate 13 constant features, and PCA was applied. The results obtained in [49] were then compared with the proposed ensemble stacking model. Compared to XGBoost, the performance of the proposed method is better in terms of number of features and method used. The graphical representation of the same is shown in Figure 7.

Table 9: Comparative results with the cutting-edge methods (DS-2)
Methods        A      P      R
[19] GA-ADB    0.940  0.909  0.920
[19] GA-RF     0.964  0.946  0.950
[19] GA-XGB    0.973  0.962  0.961
[19] GA-BC     0.969  0.953  0.959
[23] LR        0.953  0.941  0.968
[23] NB        0.930  0.909  0.961
[23] XGB       0.992  0.991  0.994
P-Stacking     0.975  0.971  0.978

In terms of security analysis, the primary component that has the biggest impact on the outcome is the set of features that DT-RFECV selected, listed in Table 7. Of the 10 semantic features from DS-2, a few relate to URLs, URL directories, and domain-based networks. The highly significant features from six other categories, time_domain_activation (OF5), asn_ip (OF4), time_response (OF2), ttl_hostname (OF10), time_domain_expiration (OF6), and qty_nameservers (OF6), account for the accuracy achieved in phishing scam detection. The proposed fine-tuned stacking ensemble method achieves superior performance on Dataset 1 compared to recent works [46,49]. This was evident in both accuracy and recall metrics. Additionally, for Dataset 2, our method surpassed the solution presented in [46,49].
Funding
[9] S. Purkait, Examining the effectiveness of phishing filters against DNS based phishing attacks. Information & Computer Security, 23, 333-346 (2015).
[10] R. Rao, S.T. Ali, Phishshield: a desktop application to detect phishing webpages through heuristic approach. Procedia Computer Science, 54, 147-156 (2015).
[11] Q.Y. Shambour, M.M. Abualhaj, A. Abu-Shareha, A.H. Hussein, Q.M. Kharma, Mitigating Healthcare Information Overload: a Trust-aware Multi-Criteria Collaborative Filtering Model. Journal of Applied Data Sciences, 5, 1134-1146 (2024).
[12] R. Al Khouri, M. Al Fauri, The Impact of Working Capital Management on the Profitability of Jordanian Companies Listed on the Amman Stock Exchange. Al-Balqa Journal for Research and Studies, 26, 77-97 (2023).
[13] D.A. Al-Husban, S.I.S. Al-Hawary, I.R. AlTaweel, N.A. Al-Husban, M.F. Almaaitah, F.M. Aldaihani, D.I. Mohammad, The impact of intellectual capital on competitive capabilities: evidence from firms listed in ASE. In The effect of information technology on business and marketing intelligence systems (pp. 1707-1723). Cham: Springer International Publishing (2023).
[14] M.I. Alkhawaldeh, F.M. Aldaihani, B.A. Al-Zyoud, S.I.S. Al-Hawary, N.A. Shamaileh, A.A. Mohammad, O.A. Al-Adamat, Impact of internal marketing practices on intention to stay in commercial banks in Jordan. In The effect of information technology on business and marketing intelligence systems (pp. 2231-2247). Cham: Springer International Publishing (2023).
[15] R. Yang, K. Zheng, B. Wu, C. Wu, X. Wang, Phishing website detection based on deep convolutional neural network and random forest ensemble learning. Sensors, 21, 8281 (2021).
[16] M.S. Alshura, S.S. Tayeh, Y.S. Melhem, F.N. Al-Shaikh, H.M. Almomani, F.L. Aityassine, A.A. Mohammad, Authentic leadership and its impact on sustainable performance: the mediating role of knowledge ability in Jordan customs department. In The effect of information technology on business and marketing intelligence systems (pp. 1437-1454). Cham: Springer International Publishing (2023).
[17] A.A. Mohammad, I.A. Khanfar, B. Al Oraini, A. Vasudevan, I.M. Suleiman, M. Ala’a, User acceptance of health information technologies (HIT): an application of the theory of planned behavior. Data and Metadata, 3, 394-394 (2024).
[18] M. Moghimi, A.Y. Varjani, New rule-based phishing detection method. Expert systems with applications, 53, 231-242 (2016).
[19] N. Al-shanableh, M. Alzyoud, R.Y. Al-husban, N.M. Alshanableh, A. Al-Oun, M.S. Al-Batah, S. Alzboon, Advanced Ensemble Machine Learning Techniques for Optimizing Diabetes Mellitus Prognostication: A Detailed Examination of Hospital Data. Data and Metadata, 3, 363-363 (2024).
[20] N. Al-shanableh, M.S. Alzyoud, E. Nashnush, Enhancing Email Spam Detection Through Ensemble Machine Learning: A Comprehensive Evaluation Of Model Integration And Performance. Communications of the IIMA, 22, 2 (2024).
[21] M. Ramaiah, V. Chandrasekaran, V. Ravi, N. Kumar, An intrusion detection system using optimized deep neural network architecture. Transactions on Emerging Telecommunications Technologies, 32, e4221 (2021).
[22] A.S. Al-Adwan, M. Alsoud, N. Li, T.E. Majali, J. Smedley, A. Habibi, Unlocking future learning: Exploring higher education students’ intention to adopt meta-education. Heliyon, 10, e29544 (2024).
[23] M. Alsharaiah, M. Abualhaj, L. Baniata, A. Al-saaidah, Q. Kharma, M. Al-Zyoud, An innovative network intrusion detection system (NIDS): Hierarchical deep learning model based on Unsw-Nb15 dataset. International Journal of Data and Network Science, 8, 709-722 (2024).
[24] R. Mangayarkarasi, C. Vanmathi, V. Ravi, A robust malware traffic classifier to combat security breaches in industry 4.0 applications. Concurrency and Computation: Practice and Experience, e7772 (2023).
[25] A.A. Mohammad, I.A. Khanfar, B. Al-Oraini, A. Vasudevan, I.M. Suleiman, Z. Fei, Predictive analytics on artificial intelligence in supply chain optimization. Data and Metadata, 3, 395-395 (2024).
[26] S. Abusaleh, M. Arabasy, M. Abukeshek, T. Qarem, Impacts of E-learning on the Efficiency of Interior Design Education (A comparative study about the efficiency of interior design education before and during the novel Coronavirus (COVID-19) pandemic). Al-Balqa Journal for Research and Studies, 27, 47-63 (2024).
[27] H. Hmoud, A.S. Al-Adwan, O. Horani, H. Yaseen, J. Al Zoubi, Factors influencing business intelligence adoption by higher education institutions. Journal of Open Innovation: Technology, Market, and Complexity, 9, 100111 (2023).
[28] A.A. Mohammad, F.L. Aityassine, Z.N. Al-fugaha, M. Alshurideh, N.S. Alajarmeh, A.A. Al-Momani, A.M. Al-Adamat, The Impact of Influencer Marketing on Brand Perception: A Study of Jordanian Customers Influenced on Social Media Platforms. In Business Analytical Capabilities and Artificial Intelligence-Enabled Analytics: Applications and Challenges in the Digital Era (pp. 363-376). Cham: Springer Nature Switzerland (2024).
[29] A.A. Mohammad, M.Y. Barghouth, N.A. Al-Husban, F.M. Aldaihani, D.A. Al-Husban, A.A. Lemoun, S.I.S. Al-Hawary, Does Social Media Marketing Affect Marketing Performance. In Emerging Trends and Innovation in Business and Finance (pp. 21-34). Singapore: Springer Nature Singapore (2023).
[30] M.M. Abualhaj, Q.Y. Shambour, A. Alsaaidah, A. Abu-Shareha, S. Al-Khatib, M.O. Hiari, Enhancing Spam Detection Using Hybrid of Harris Hawks and Firefly Optimization Algorithms. Journal of Applied Data Sciences, 5, 901-911 (2024).
[31] R. Ghoneim, M. Arabasy, The Role of Artworks of Architectural Design in Emphasizing the Arab Identity. Al-Balqa Journal for Research and Studies, 27, 1-14 (2024).
[32] A.A. Mohammad, M.M. Al-Qasem, S.M. Khodeer, F.M. Aldaihani, A.F. Alserhan, A.A. Haija, S.I.S. Al-Hawary, Effect of Green Branding on Customers Green Consciousness Toward Green Technology. In Emerging Trends and Innovation in Business and Finance (pp. 35-48). Singapore: Springer Nature Singapore (2023).
[33] M. Odeh, S.S. Badrakhan, N. Flayyih, M.O. Sabri, Z. Abdijabar, H. Alsabatin, S. Hammad, Quantifying the Impact of the COVID-19 Pandemic on Quality Assurance Practice. Appl. Math., 18, 989-996 (2024).
[34] A.K. Jain, B.B. Gupta, Towards detection of phishing websites on client-side using machine learning based approach. Telecommunication Systems, 68, 687-700 (2018).
[35] N. Al-Shanableh, M. Al-Zyoud, R.Y. Al-Husban, N. Al-Shdayfat, J.F. Alkhawaldeh, N.S. Alajarmeh, S.I.S. Al-Hawary, Data Mining to Reveal Factors Associated with Quality of life among Jordanian Women with Breast Cancer. Appl. Math., 18, 403-408 (2024).
[36] A.K. Jain, B.B. Gupta, A machine learning based approach for phishing detection using hyperlinks information. Journal of Ambient Intelligence and Humanized Computing, 10, 2015-2028 (2019).
[37] L. Mobaideen, A. Adaileh, The Impact Of Organizational Culture On Improving Institutional Performance In Aqaba Special Economic Zone Authority In Jordan. Al-Balqa Journal for Research and Studies, 27, 1-21 (2024).
[38] A.S. Al-Adwan, M.M. Al-Debei, The determinants of Gen Z’s metaverse adoption decisions in higher education: integrating UTAUT2 with personal innovativeness in IT. Education and Information Technologies, 29, 7413-7445 (2024).
[39] R.S. Rao, A.R. Pais, Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach. Journal of Ambient Intelligence and Humanized Computing, 11, 3853-3872 (2020).
[40] B. Guo, Y. Zhang, C. Xu, F. Shi, Y. Li, M. Zhang, HinPhish: An effective phishing detection approach based on heterogeneous information networks. Applied Sciences, 11, 9733 (2021).
[41] F. Zheng, Q. Yan, C.M. Victor, F. Leung, Y.U. Richard, Z. Ming, HDP-CNN: High way deep pyramid convolution neural network combining word-level and character-level representations for phishing website detection. Computers & Security, 114, 102584 (2022).
[42] N. Al-shanableh, S. Anagreh, A.A. Haija, M. Alzyoud, M. Azzam, H.M. Maabreh, S.I.S. Al-Hawary, The Adoption of RegTech in Enhancing Tax Compliance: Evidence from Telecommunication Companies in Jordan. In Business Analytical Capabilities and Artificial Intelligence-enabled Analytics: Applications and Challenges in the Digital Era (pp. 181-195). Cham: Springer Nature Switzerland (2024).
[43] F.Y. Al-Kasassbeh, S.M. Awaisheh, M.A. Odeibat, S.M. Awaesheh, L. Al-Khalaileh, M. Al-Braizat, Digital Human Rights in Jordanian Legislation and International Agreement. International Journal of Cyber Criminology, 18, 37-57 (2024).
[44] N. Al-Dabbas, The Scope and Procedures of the Expert Recusal in the Arbitration Case: A Fundamental Analytical Study in Accordance with Jordanian Law. Al-Balqa Journal for Research and Studies, 27, 291-306 (2024).
[45] A.M. Vincent, P. Jidesh, An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Scientific Reports, 13, 4737 (2023).
[46] M. Al-Sarem, F. Saeed, Z.G. Al-Mekhlafi, B.A. Mohammed, T. Al-Hadhrami, M.T. Alshammari, T.S. Alshammari, An optimized stacking ensemble model for phishing websites detection. Electronics, 10, 1285 (2021).
[47] R.S. Rao, A. Umarekar, A.R. Pais, Application of word embedding and machine learning in detecting phishing websites. Telecommunication Systems, 79, 33-45 (2022).
[48] A. Aljofey, Q. Jiang, A. Rasool, H. Chen, W. Liu, Q. Qu, Y. Wang, An effective detection approach for phishing websites using URL and HTML features. Scientific Reports, 12, 8842 (2022).
[49] M. Bahaghighat, M. Ghasemi, F. Ozen, A high-accuracy phishing website detection method based on machine learning. Journal of Information Security and Applications, 77, 103553 (2023).
[50] K. Elumalai, D. Bose, Advancement of Phishing Attack Detection Using Machine Learning. Journal of Electrical Systems, 20, 1208-1213 (2024).
[51] M.A. Tamal, M.K. Islam, T. Bhuiyan, A. Sattar, N. Prince, Unveiling suspicious phishing attacks: enhancing detection with an optimal feature vectorization algorithm and supervised machine learning. Frontiers in Computer Science, 6, 1428013 (2024).
[52] E.S. Shombot, G. Dusserre, R. Bestak, N.B. Ahmed, An application for predicting phishing attacks: A case of implementing a support vector machine learning model. Cyber Security and Applications, 2, 100036 (2024).
[53] A. Ejaz, A.N. Mian, S. Manzoor, Life-long phishing attack detection using continual learning. Scientific Reports, 13, 11488 (2023).
[54] A. Almomani, M. Alauthman, M.T. Shatnawi, M. Alweshah, A. Alrosan, W. Alomoush, B.B. Gupta, Phishing website detection with semantic features based on machine learning classifiers: a comparative study. International Journal on Semantic Web and Information Systems, 18, 1-24 (2022).

M. Ramaiah received her Ph.D. Degree in Information Technology and Engineering from Vellore Institute of Technology, M.E. in Computer Science from Anna University. She is working as a Professor in the School of Computer Science Engineering and Information Systems at VIT University, Vellore, India. She has attended many national and international conferences and published articles in reputed journals. Her research interest includes cyber-security, Blockchain, Image Processing, Machine Learning, and Artificial Intelligence. Her Orcid ID is: https://orcid.org/0000-0003-3088-6001.

V. Chandrasekaran is a Senior Professor at the School of Computer Science Engineering and Information Systems, Vellore Institute of Technology (VIT), Vellore Campus, India. She holds a Ph.D. in Information Technology and Engineering from VIT University, a Master’s