Review Paper
[1] Decision Tree (DT), Random Forest (RF), Gradient Boost (GB) | Gradient Boost achieved the highest accuracy, with precision of 99.0%, recall of 99.4%, and F1 score of 98.6% | DT: 96.0%, RF: 96.9%, GB: 98.9%
[2] Combined blacklisting and applied ML algorithms (XGBoost, RF, DT, and Multilayer Perceptrons) to a dataset with features. Phishing URLs collected from PhishTank and OpenPhish. | Among these, XGBoost was found to be the most accurate model. | XGBoost: 96.7%, RF: 92.5%, DT: 90.5%, Multilayer Perceptrons: 88%
[3] Random Forest (RF), Artificial Neural Networks (ANN), Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), Naive Bayes (NB), XGBoost | Random Forest (RF) was the best-suited model based on its highest accuracy and overall performance in detecting phishing URLs. | RF: 94.59%, ANN: 94.35%, XGBoost: 92.95%, DT: 92.59%, KNN: 91.49%, LR: 91.31%, NB: 88.35%, SVM: 87.03%
[4] Support Vector Machine (SVM) and Naïve Bayes (NB) with features based on maximum relevance with minimum redundancy. PhishTank (2,541 phishing URLs) and Alexa (2,500 legitimate URLs) datasets. | Both Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers have TPR of 99.96, FNR of 0.04, TNR of 99.96, and FPR of 0.04. | SVM: 99.96%, NB: 99.96%
[5] XGBoost, LightGBM, Graph Neural Network (GNN), and CatBoost applied. Performance evaluated using accuracy, precision, recall, and F1-score. | LightGBM gave the highest accuracy, with precision 0.93 and recall 0.93 | XGBoost: 92.09%, LightGBM: 93.29%, GNN: 70%, CatBoost: 92.98%
[7] DT and RF applied to a Kaggle dataset with 30 features. PCA used for feature selection. Performance evaluated using accuracy, precision, recall, and F1-score. | RF outperformed DT, addressing overfitting and variability effectively. | RF: 97%, DT: 91.94%
[8] Applied SVM on data from PhishTank and Alexa, with internal and external features, and PCA for dimensionality reduction. | RF and NB classifiers had better accuracies. In terms of AUC, Gaussian Naive Bayes had a slightly higher value of 0.991. | SVM: 95.66%, RF: 94.27%
[11] Four classifiers (DT, SVM, Naïve Bayes, Neural Network) applied to a UCI dataset with 1,353 labeled URLs and 9 extracted features. | DT had 91.5% accuracy but required pruning to address overfitting. Ensemble methods were recommended for improvement. | DT: 91.5%, SVM: 86.69%, Naïve Bayes: 86.14%, Neural Network: 84.87%
[9] The dataset was divided into split ratios of 50:50, 70:30, and 90:10. Decision Tree (DT), Random Forest (RF), and SVM classifiers were applied. | The Random Forest classifier demonstrated superior accuracy and the lowest false negative rate. | 50:50 split ratio: 96.72%, 70:30 split ratio: 96.84%, 90:10 split ratio: 97.14%
[10] A balanced dataset was utilized to train classifiers such as Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), and k-Nearest Neighbors (k-NN), using features derived from the lexical structure of URLs. | Random Forest and Naive Bayes demonstrated superior accuracy | Random Forest: 98.03%, Gaussian Naive Bayes: 97.18%
[11] The examined classifiers are Logistic Regression, Decision Tree, Support Vector Machine, AdaBoost, Random Forest, Neural Networks, KNN, Gradient Boosting, and XGBoost. | Very good performance from the ensemble classifiers, namely Random Forest and XGBoost, in both computation time and accuracy | Logistic Regression: 92.6%, Decision Tree: 96.5%, Random Forest: 97.2%, AdaBoost: 93.6%, KNN: 95%, SVM: 94.9%, Gradient Boosting: 94.8%, XGBoost: 98.3%

6) PERFORMANCE EVALUATION METRICS

A set of selected metrics is used to evaluate the performance of the system. These metrics are Accuracy, Precision, Recall, F1 Score, and the ROC curve, all derived from the values of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

In the context of URL classification:

True Positive (TP): The number of phishing URLs correctly detected as phishing.

True Negative (TN): The number of legitimate URLs correctly detected as legitimate.

False Positive (FP): The number of legitimate URLs incorrectly classified as phishing.

False Negative (FN): The number of phishing URLs incorrectly classified as legitimate.
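
The four counts defined above can be turned into the listed metrics with a few lines of arithmetic. The following is a minimal sketch rather than the method of any surveyed study: the names `ConfusionCounts`, `count_outcomes`, and `evaluate` are hypothetical, labels are assumed to be encoded as 1 for phishing and 0 for legitimate, and the numbers in the usage section are illustrative only. The ROC curve is left out because it needs classifier scores, not just hard predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four confusion-matrix counts."""
    tp: int  # phishing URLs correctly detected as phishing
    tn: int  # legitimate URLs correctly detected as legitimate
    fp: int  # legitimate URLs incorrectly classified as phishing
    fn: int  # phishing URLs incorrectly classified as legitimate


def count_outcomes(y_true, y_pred):
    """Tally TP/TN/FP/FN, assuming label 1 = phishing and 0 = legitimate."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            raise ValueError(f"unexpected labels: {actual!r}, {predicted!r}")
    return ConfusionCounts(tp, tn, fp, fn)


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c):
    """Compute accuracy, precision, recall, and F1 from the four counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from any study in the table.
    counts = ConfusionCounts(tp=90, tn=85, fp=15, fn=10)
    for name, value in evaluate(counts).items():
        print(f"{name}: {value:.4f}")
```

Returning 0.0 for an empty denominator is one possible convention. A stricter evaluation might instead report such a metric as undefined.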
A confusion matrix presents these four values together and thereby indicates the performance of the classification model.

OBSERVATIONS

Phishing attacks are constantly evolving, and new types of attacks appear often. Hence no single detection approach or algorithm can be labelled as the best one that always gives exact results. The literature survey shows that Random Forest gives better results in most scenarios. However, the performance of each algorithm varies depending on the dataset used, the train-test split ratio, the feature selection techniques applied, and so on. Researchers prefer to create machine learning models that perform phishing detection with the best values for the evaluation metrics and the least training time. Future work should therefore focus on these aspects of phishing detection.

6. CONCLUSION

With the growing demand for the security of personal, financial, and professional data in the digital era, phishing detection has become a critical area of research. URL-based analysis is one approach that improves both detection speed and detection accuracy. By extracting features from a given URL and applying feature selection and dimensionality reduction techniques, models are refined by eliminating unnecessary data and focusing on the most informative features. Numerous machine learning algorithms, including Random Forest, XGBoost, and Support Vector Machines, have shown strong performance on phishing URL classification. In this paper, we reviewed prior work on phishing detection, focusing on the different methodologies and their performance. The review provides a solid basis for future researchers working to improve phishing detection systems.
REFERENCES
[1] 2023 Internet Crime Report FBI. Retrieved from: https://www.ic3.gov/Media/PDF/AnnualReport/2023_IC3Report.pdf
[2] Dr. Nitin N. Sakhare, Jyoti L. Bangare, Dr. Radhika G. Purandare, Disha S. Wankhede, Pooja Dehankar, “Phishing Website
Detection Using Advanced Machine Learning Techniques”, International Journal of Intelligent Systems and Applications in
Engineering 2024.
[3] Sucharitha, B., Chandini, B., Kumar, D. S., Surendra, M., & Kumar, G. K. (2024). Detecting phishing websites using machine
learning. IJARCCE, 13(4). https://doi.org/10.17148/ijarcce.2024.134145
[4] Machikuri Santoshi Kumari, Chiguru Keerthi Priya, Gondhi Bhavya Haridas Neha, Monisha Awasthi, Surendra Tripathi,
“Viable Detection of URL Phishing using Machine Learning Approach”, 15th International Conference on Materials Processing and
Characterization (ICMPC 2023).
[5] A.A. Orunsolu, A. S. Sodiya, and A. T. Akinwale, “A predictive model for phishing detection,” Journal of King Saud University
– Computer and Information Sciences, vol. 34, no. 2, pp. 232–247, 2022.
[6] Korkmaz, M., Sahingoz, O. K., & Diri, B. (2020). Detection of Phishing Websites by Using Machine Learning-Based URL
Analysis. Presented at the 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT),
July 1-3, 2020, IIT Kharagpur, India. IEEE.
[7] Mohammad Nazmul Alam, Dhiman Sarma et al., “Phishing attacks detection using machine learning approach,” 3rd
International Conference on Smart Systems and Inventive Technology (ICSSIT), 2020.
[8] Junaid Rashid, “Phishing Detection Using Machine Learning Technique”, First International Conference of Smart Systems and
Emerging Technologies (SMARTTECH), 2020.
[9] Vahid Shahrivari, Mohammad Mahdi Darabi, Mohammad Izadi “Phishing Detection Using Machine Learning Techniques”
arXiv preprint arXiv:2009.11116, 2020. Retrieved from arXiv.
[10] Jitendra Kumar, A. Santhanavijayan, B. Janet, Balaji Rajendran, and Bindhumadhava BS, “Phishing website classification and
detection using machine learning,” International Conference on Computer Communication and Informatics (ICCCI), 2020.
[11] Arun Kulkarni, Leonard L. Brown, “Phishing Websites Detection using Machine Learning”, IJACSA International Journal of
Advanced Computer Science and Applications, Vol. 10, No. 7, 2019.
[12] Rishikesh Mahajan, and Irfan Siddavatam, “Phishing website detection using machine learning algorithms,” International
Journal of Computer Applications (0975-8887), vol. 181, no. 23, 2018.