Phishing URL Detection Using CNN-LSTM and Random Forest Classifier
Submitted: 2023, Oct 27; Accepted: 2023, Nov 06; Published: 2024, May 27
Citation: Gurung, H., Nepal, R., Nepal, S. (2023). Phishing URL Detection Using CNN-LSTM and Random Forest Classifier.
Int J Med Net, 2(5), 01-06.
Abstract
This paper presents the classification of phishing URLs versus legitimate URLs using machine learning and
deep learning techniques. Phishing is the act of stealing private information by impersonating a legitimate
entity. A machine learning model, a Random Forest classifier, is trained on features extracted from the URL's
address bar, its domain, and its HTML and JavaScript. Separately, a hybrid CNN-LSTM model is trained to learn
the character-sequence features of the URL and classify it. The dataset is public data downloaded from Kaggle
and contains 11,430 URLs: 5,715 legitimate URLs and 5,715 phishing URLs. The previously trained models are then
used to classify the URL in the current address bar as legitimate or phishing. This paper therefore focuses on
the study and development of phishing-site detection models that learn the properties of URLs through feature
extraction and classify them as accurately as possible.
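The two model inputs described above can be sketched as follows. This is a minimal illustration, not the authors' code. The address-bar features chosen here are assumptions, since the abstract does not list them: URL length, raw IPv4 host, an '@' symbol, dot count, a hyphen in the domain, and HTTPS. The character vocabulary `VOCAB`, the padded length `MAX_LEN = 200`, and the function names are also hypothetical. Domain-based and HTML/JavaScript features are left out because they would typically require network lookups or fetching the page.

```python
import re
from urllib.parse import urlparse

# Hypothetical address-bar features; the paper's exact feature list is not given here.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def address_bar_features(url: str) -> list[int]:
    """Return a small numeric feature vector for a Random Forest (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is already lower-cased by urlparse
    return [
        len(url),                              # overall URL length
        1 if IPV4_HOST.match(host) else 0,     # raw IPv4 address used as the host
        1 if "@" in url else 0,                # '@' can hide the real destination
        host.count("."),                       # number of dots in the hostname
        1 if "-" in host else 0,               # hyphen in the domain
        1 if parsed.scheme == "https" else 0,  # HTTPS scheme present
    ]

# Hypothetical character vocabulary and sequence length for the CNN-LSTM input.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_INDEX = {c: i + 2 for i, c in enumerate(VOCAB)}  # 0 = padding, 1 = unknown
MAX_LEN = 200

def encode_url_chars(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Map each character to an integer index, then truncate or zero-pad to max_len."""
    ids = [CHAR_INDEX.get(c, 1) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

For example, `address_bar_features("http://192.168.0.1/login@secure")` returns `[31, 1, 1, 3, 0, 0]`. The first function yields tabular rows that a Random Forest can consume directly. The second yields fixed-length integer sequences that an embedding layer in front of the CNN-LSTM would expect.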
Keywords: Phishing Website Detection, Convolutional Neural Network, Long Short-Term Memory Network, Random Forest,
Machine Learning
The same test data was also used to evaluate the CNN-LSTM model, which achieved an accuracy of 94.7% on it.
The confusion matrix for the CNN-LSTM model evaluated against the test data is shown in Table 2.
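Accuracy, sensitivity and precision, the parameters reported in this section, all follow from the four confusion-matrix counts. The sketch below shows the standard formulas. It assumes phishing is the positive class, since the text does not say which class is positive. `ConfusionMatrix` and `summarize` are hypothetical helpers, not part of the paper.

```python
from dataclasses import dataclass

@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier, treating 'phishing' as the positive class (assumed)."""
    tp: int  # actual phishing, predicted phishing
    fn: int  # actual phishing, predicted legitimate
    fp: int  # actual legitimate, predicted phishing
    tn: int  # actual legitimate, predicted legitimate

    def accuracy(self) -> float:
        # Fraction of all URLs classified correctly.
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def sensitivity(self) -> float:
        # Also called recall or true-positive rate: phishing URLs that were caught.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        # Of the URLs flagged as phishing, the fraction that really were phishing.
        return self.tp / (self.tp + self.fp)

def summarize(values: list[float]) -> tuple[float, float, float]:
    """Average, highest and lowest value of a metric across several runs."""
    return sum(values) / len(values), max(values), min(values)
```

The following counts are illustrative and are not the paper's results. `ConfusionMatrix(tp=90, fn=10, fp=5, tn=95)` gives an accuracy of 0.925, a sensitivity of 0.9 and a precision of about 0.947. `summarize` produces the "average, highest, lowest" triple used when a metric is reported over several runs.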
Table 3 and Figure 4 above show the calculated values of the performance parameters, i.e., Actual Legitimate,
Predicted Legitimate, Actual Phishing, Predicted Phishing, Accuracy, Sensitivity and Precision. Here, the average
accuracy of the algorithm is 70.25%, with a highest value of 71.1% and a lowest of 67.13%. Similarly, the average
sensitivity is 78.16%, with a highest value of 80.27% and a lowest of 76.66%. Finally, the average precision
obtained for this dataset is 67.75%, with a highest value of 71.3% and a lowest of 64%.
4.3.2 Deep Learning Model
The results of the performance parameter calculation for the deep learning model on the dataset are presented
in the following table and graph:
Competing Interests
• The authors have no relevant financial or non-financial interests to disclose.
• The authors have no competing interests to declare that are relevant to the content of this article.
Copyright: ©2023 Sopnil Nepal, et al. This is an open-access article distributed under the terms of the
Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Int J Med Net, 2023 https://opastpublishers.com Volume 2 | Issue 5 | 6