Phishing Attacks Detection

A Machine Learning-Based Approach

Fatima Salahdine (1, 2), Zakaria El Mrabet (1), Naima Kaabouch (1)
(1) School of Electrical Engineering & Computer Sciences, University of North Dakota, Grand Forks, ND-58203, USA
(2) Department of Electrical and Computer Engineering, The University of North Carolina at Charlotte, Charlotte, NC-28223, USA
{fatima.salahdine, zakaria.elmrabet, naima.kaabouch}@und.edu

Abstract- Phishing attacks are one of the most common social engineering attacks targeting users' emails to fraudulently steal confidential and sensitive information. They can be used as part of larger attacks launched to gain a foothold in corporate or government networks. Over the last decade, a number of anti-phishing techniques have been proposed to detect and mitigate these attacks. However, they are still inefficient and inaccurate. Thus, there is a great need for efficient and accurate detection techniques to cope with these attacks. In this paper, we propose a phishing attack detection technique based on machine learning. We collected and analyzed more than 4000 phishing emails targeting the email service of the University of North Dakota. We modeled these attacks by selecting 10 relevant features and building a large dataset. This dataset was used to train, validate, and test the machine learning algorithms. For performance evaluation, four metrics were used, namely probability of detection, probability of miss-detection, probability of false alarm, and accuracy. The experimental results show that the best detection performance among the tested classifiers is achieved by an artificial neural network.

Keywords- Security; Phishing attacks; Machine learning

978-1-6654-0690-1/21/$31.00 ©2021 IEEE

I. INTRODUCTION

With more than 7 billion email accounts worldwide in 2021 and over 3 million emails sent per second, email services have become an indispensable tool for personal and professional transactions. However, the massive use of email services has drawn the attention of attackers as a promising field for launching attacks. Compromising an email account directly is challenging, or almost impossible, since email service providers offer secure end-to-end (E2E) communication. Thus, attackers opt for social engineering strategies that compromise email accounts by manipulating human behavior to obtain critical and confidential information [1].

Phishing attacks are performed by sending forged emails that appear to come from an authentic entity to a victim or a group of victims [2][3]. They aim at obtaining users' confidential data or installing malware on their machines. For instance, attackers send an email with a redirection link to a malicious website where the user is asked to provide sensitive data, such as a bank account number or login credentials. The attacker can also attach a file to the fake email that, once opened by the victim, automatically triggers the execution of embedded malware.

To cope with phishing attacks and mitigate their potential risks, a number of techniques have been proposed. These techniques can be classified into four categories: rule-based, whitelist and blacklist, heuristic, and hybrid. The rule-based approach uses data mining techniques to train a model on a specific dataset with a certain number of features and then extracts phishing attack rules. For instance, a rule-based phishing detection approach was proposed for banking services in which several features were selected, including IP address, SSL certificate, web address length, number of dots in the URL, and blacklist keywords. In [4], the authors proposed a data mining tool called Multi-label Classifier Associative Classification, in which 16 features were selected, including IP address, long URL, URL containing the @ symbol, prefix and suffix, and DNS record. In [5], a rule-based technique was described in which 17 features were selected and different classifiers were used, namely C4.5, RIPPER, PRISM, and CBA. The results show that C4.5 outperforms the other algorithms in terms of detection rate and accuracy. Rule-based approaches are easy to implement; however, they have some shortcomings, including a low accuracy rate.

Other techniques are based on whitelist and blacklist approaches [6][7]. In [6], a whitelist-based approach was proposed in which a number of features related to legitimate websites were recorded, such as URL, IP address, and login user interface. When the user visits a website that does not match any entry in this list, the requested website is classified as malicious. In [7], a blacklist-based approach was proposed in which the URL of the suspicious webpage is divided into several parts and compared to a list of phishing websites. The list of suspicious websites is gathered from several sources, including spam traps and open phishing email databases. Whitelist and blacklist approaches are inefficient in dealing with new webpages that are not included in those pre-established lists. In addition, these lists require frequent updates, which can be computationally expensive.

For heuristic techniques, feature sets are selected, and the impact of each set on the detection likelihood is investigated. The tested feature sets can range from the URL and IP address to the HTML DOM of the webpage. For instance, a heuristic-based technique was proposed in which 20 heuristic features were selected [8]. The results show that the URL-based and HTML-based heuristics are effective and outperform the blacklist-based approach. In [9], a heuristic-based approach called CANTINA+ was proposed to extract the most frequent words in the webpage and search for them on a search engine. The webpage is classified as legitimate if it appears in the first
results returned by the search engine, since the first reported webpages are the most visited and are therefore likely to be legitimate. However, an attacker can manipulate these rankings and make malicious webpages appear among the first search results.

A number of hybrid detection techniques have been proposed that combine the fuzzy-logic approach with other data mining techniques [10][11][12]. In [10], a hybrid approach was proposed that achieves an accuracy of 98.5% with 288 features. It requires a considerable number of features, which makes its implementation complex. In [11], a hybrid approach was proposed that reaches an accuracy of 86.38% with 27 features. However, it was not clear how the features were extracted. A target identification algorithm was designed to identify phishing webpages [12]. It relies on third-party services to investigate in depth the content of the suspicious link and verify its source, which may result in more processing time.

In this paper, we investigate the efficiency of machine learning approaches in detecting phishing emails. After analyzing the research problem's requirements and the training dataset, we selected three models, namely support vector machine (SVM), logistic regression (LR), and artificial neural network (ANN). We explored variations with different kernel types and different architectures. The dataset used to train and test the classifiers was built from real attacks launched against the email service of the University of North Dakota.

The rest of this paper is organized as follows. Section II describes the methodology of the proposed approach. Section III discusses and compares the simulation results. Finally, a conclusion is given at the end.

II. METHODOLOGY

A. Features selection

A typical email is composed of a header and a body. The email header has a specific structure consisting of several fields related to the sender and the receiver, including their IP addresses, the subject, and the date. The email body has no specific format; it can be customized and differs from one email to another. However, some items can be found in any typical email, such as text, links to websites, attached files, and the sender's signature. Since not all of the email content is relevant for distinguishing legitimate emails from malicious ones, it is important to select and extract only the specific features that characterize phishing emails. In this paper, we used ten relevant features, eight of which are extracted from the email body while the remaining two come from the email header. These features are: sender email address, attached file extension, blacklist keywords, secure socket layer (SSL) certificate, certificate authority (CA), redirection URL, hiding links, clear IP address, website traffic, and webpage age. An individual feature may not reveal the legitimacy of an email, but combining several features increases the likelihood of detecting potential phishing emails.

1) SSL certificate
When a user is requested to enter confidential data on legitimate websites, the data exchanged between the server and the end user is encrypted, which can be achieved through the HTTP protocol with an additional secure socket layer, i.e., HTTPS [13]. However, most phishing emails include HTTP links without any supplementary secure layer, exposing the data to potential unauthorized access and loss. Thus, if an email includes a secure HTTP (HTTPS) link, then it is considered legitimate; otherwise, it is considered malicious.

Feature 1: if HTTPS link → legitimate email; otherwise → suspicious email

2) Certificate authority
Not every HTTPS link guarantees a secure connection to the server that keeps sensitive data undisclosed to third parties, since the SSL certificate can be issued by an unauthentic entity or be self-signed. An SSL certificate alone is therefore insufficient to decide whether an HTTPS link is secure. Investigating the identity of the entity that issued the certificate is crucial in verifying the email's legitimacy [14]. Thus, if an SSL certificate is not issued by a trusted and credible authority such as GoDaddy, Comodo, or Symantec, then the email is suspicious.

Feature 2: if authentic CA → legitimate SSL; otherwise → suspicious SSL certificate

3) Blacklist keywords
Phishing emails commonly share some keywords and short phrases. These keywords convey a sense of urgency, such as "Click Now", "Verify Now", "Valid in 24h", and "Update Now". The presence of such keywords in the email body provides clues about the illicitness of the email. In this paper, we established a list of suspicious keywords used by attackers to grab the attention of the victim [15]. If the email includes one or more blacklist words, then it is malicious.

Feature 3: if email word ∈ {Blacklist} → suspicious email; otherwise → legitimate email

4) Redirection URL
Some phishing emails include a link that implicitly redirects the user to a hidden server, such as a proxy server, before reaching the requested website. This server handles the communication between the user, the malicious website, and the legitimate website [16]. The GET request of the HTTP protocol is used to verify the legitimacy of a URL.

Feature 4: if GET(link_URL) ≠ link_URL → suspicious email; otherwise → legitimate email

5) Hiding links
An alternative way to hide the actual website URL is to use hiding links, which rely on two techniques: URL shorteners and customized HTML emails. In the former, the attacker wraps the real URL in a short one, such as "goo.gl" or "j.mp". In the latter,
the attacker forges an HTML email with Cascading Style Sheets and JavaScript to customize the webpage link with a personalized clickable text or image. Thus, an email is suspicious if it includes a short URL or a link hidden behind text or an image.

Feature 5: if link_URL is short_URL or link_URL is hidden inside (image or "clicked text") → suspicious email; otherwise → legitimate email

6) Clear IP address
Some phishing emails include links with a clear IP address; "https://50.10.125.26/index.php" is an example that indicates the illegitimacy of the email. Attackers use an IP address instead of a domain name because malicious webpage links last for less than three days, and attackers do not buy a domain name for such a short period of time. Thus, if a link includes a clear IP address, then the email is suspicious.

Feature 6: if link_URL includes IP address → suspicious email; otherwise → legitimate email

7) Website traffic
Legitimate websites receive a steady number of requests per day. A legitimate website has a rank less than or equal to 150,000 in the Alexa database. However, phishing websites are rarely visited because they have a short lifetime, so their traffic is low.

Feature 7: if traffic rank ≤ 150,000 → legitimate email; otherwise → suspicious email

8) Age of the webpage
Since most phishing webpages have a short lifetime, the age of a webpage can provide information about its legitimacy. An authentic website is usually more than one year old. Thus, if the email includes a link to a webpage that is less than one year old, then it is suspicious.

Feature 8: if webpage age > 1 year → legitimate email; otherwise → suspicious email

9) Sender's email address
In some phishing emails, there is an inconsistency between the email subject and the address of the sender. For instance, some malicious emails appear to be sent by an authentic entity, such as Microsoft or Dropbox, because the email's subject states something similar to "the user X has shared some files with you" or "Reinitialize the password." However, the sender's email address includes a strange domain name such as "@sharing.dboxfile.com" or "@dropbox.com." Such an inconsistency can be relevant in detecting malicious senders. Thus, if a domain name does not belong to the list of credible domain names, then the email is suspicious.

Feature 9: if @domain name ∈ {list of credible domain names} → legitimate email; otherwise → suspicious email

10) Attached file extension
This feature is used to increase the likelihood of detecting phishing emails. Some phishing emails include an attached file containing an embedded payload. This payload can be an executable shell script giving the attacker the privileges to execute commands on the user's machine. One of the known tools used by attackers to forge phishing emails is the Social-Engineer Toolkit, installed by default on Kali Linux. It generates a file containing the payload with a ".exe" or ".dll" extension. If the attached file has a ".exe" extension, then the email is suspicious.

Feature 10: if file_name.exe → suspicious email; otherwise → legitimate email

B. Classification techniques

In this work, we compared the performance of three classifiers, namely support vector machine (SVM), logistic regression (LR), and artificial neural network (ANN). SVM is a machine learning algorithm used for solving classification and regression problems [17]. It is based on a hyperplane classifier that separates two distinct classes while maximizing the margin between them. Let the dataset D be given as {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i is a training tuple with associated class label y_i. Each y_i can take one of two values, either +1 or −1, corresponding in our case to the class 'phishing email' or 'legitimate email', respectively. SVM finds the best decision boundary to separate these two classes using a hyperplane h, which can be defined as

$$h(x) = W \cdot x + B = \sum_{i=1}^{N} \alpha_i y_i \,(x_i \cdot x) + B \qquad (1)$$

where W is the weight vector, B is the bias, N is the number of training tuples, x_i is a training tuple, and α_i is its Lagrange multiplier. In the case of non-linear data, one can first transform the data through a non-linear mapping into a higher-dimensional space and then use a linear model to separate the data. This mapping is performed implicitly by a kernel function K, and equation (1) can be rewritten as

$$h(x) = \sum_{i=1}^{N} \alpha_i y_i \, K(x_i, x) + B \qquad (2)$$

where K(x_i, x) is the kernel function. In this paper, we used the polynomial function as a kernel. SVM classifies a new email based on its position with respect to the hyperplane: if h(x) ≥ 0, i.e., the email's features lie on or above the hyperplane, then it belongs to the phishing email class.

LR is a supervised machine learning technique used for predicting a discrete output class, in particular for binary classification [3]. It can use different hypothesis functions for predicting a binary-valued output. In this paper, the sigmoid function is considered as the hypothesis function. It is given by
$$h_\theta\big(x^{(i)}\big) = \frac{1}{1 + e^{-\sum_{j=1}^{n} w_j x_j^{(i)}}} \qquad (3)$$

where w_j is the weight associated with each input x_j and n is the number of features. In this paper, we opted for gradient descent (GD) as the optimization technique to find the weights that minimize the prediction error.

ANN is a supervised machine learning algorithm used for classification and regression. It is composed of an input layer, one or multiple hidden layers, and an output layer, where each layer is composed of several neurons. A neuron is a computation unit that takes a set of weighted inputs and produces an output using an activation function. There are several activation functions, including the sigmoid function, the hyperbolic tangent function, and the rectified linear unit (ReLU) function [15]. Training an ANN model involves forward propagation and backward propagation. For each instance in the dataset, forward propagation computes the predicted output, compares it with the actual one, and calculates the error between these two values. To minimize this error, backward propagation updates the weights associated with each input using gradient descent. Forward and backward propagation are repeated until the ANN reaches a minimum error value. The output of the j-th neuron of the l-th layer is given by

$$a_j^{(l)} = g\Big(\sum_{i=1}^{s_{l-1}} w_{j,i}^{(l)} \, a_i^{(l-1)} + b_j^{(l)}\Big) \qquad (4)$$

where g is the activation function, s_{l-1} is the number of neurons in layer l−1, a_i^{(l-1)} is the output of the i-th neuron of layer l−1 (with a^{(1)} = x for the input layer), w_{j,i}^{(l)} is the weight on the link between the i-th neuron of layer l−1 and the j-th neuron of layer l, and b_j^{(l)} is the bias.

The activation function of the output layer L of an ANN with one output neuron is given as follows:

$$h_\theta(x) = g\Big(\sum_{i=1}^{s_{L-1}} w_{1,i}^{(L)} \, a_i^{(L-1)}\Big) \qquad (5)$$

ANNs learn their weights and biases using the GD technique. Given a training set {(x^{(1)}, y^{(1)}), …, (x^{(m)}, y^{(m)})}, the cross-entropy cost function J(W) is given by

$$J(W) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\log\big(h_\theta(x^{(i)})_k\big) + \big(1-y_k^{(i)}\big)\log\big(1-h_\theta(x^{(i)})_k\big)\Big] + \frac{\lambda}{2m}\sum_{l=2}^{L}\sum_{i=1}^{s_{l-1}}\sum_{j=1}^{s_l}\big(w_{j,i}^{(l)}\big)^2 \qquad (6)$$

where λ is the regularization parameter, m is the training data size, K is the number of output classes, h_θ is the hypothesis function, L is the number of layers, s_l is the number of neurons in layer l, and w_{j,i}^{(l)} is the weight on the link between the i-th neuron of layer l−1 and the j-th neuron of layer l, as in (4).

Training consists of minimizing the cross-entropy cost function J(W). Backpropagation updates all the weights simultaneously to minimize the cost function. In the case of a sigmoid activation, the hypothesis is given as:

$$g(z) = \frac{1}{1 + e^{-z}} \qquad (7)$$

where z = w^T x is the weighted sum of the features x. In this binary classification, there are two cases based on the value of z: (i) if z > 0, the hypothesis function satisfies h_θ(x) > 0.5, which corresponds to the presence of an attack (y = 1); (ii) if z < 0, the hypothesis function satisfies h_θ(x) < 0.5, which corresponds to the absence of an attack (y = 0).

III. RESULTS AND DISCUSSION

To train, validate, and test the models, we built a dataset of 4000 real phishing emails. These emails were collected from the University of North Dakota email system from May 22, 2017, to June 20, 2018. The collected data include some redundant emails because some attackers sent the same forged email to multiple users or used it to conduct the same attack several times. Thus, we cleaned the dataset by removing duplicated and redundant emails, which reduced the number of instances to 2000 phishing emails. The legitimate emails were collected from legitimate accounts and sent by authentic entities. To keep phishing and legitimate emails equally distributed in the dataset and to avoid bias towards either class, 2000 legitimate emails were selected. Thus, the final dataset contains 4000 instances of legitimate and phishing emails, as presented in Table I.

TABLE I. COLLECTED PHISHING DATASET
Total samples               4000
Total phishing emails       2000
Total legitimate emails     2000
Total training samples      2800
Total testing samples       1200
Total number of features    10

Since some classifiers cannot be trained on categorical data, the dataset went through a pre-processing step in which all nominal values were converted into numerical values. The same encoding was used to map nominal values to numerical values across the entire dataset. In addition, the dataset went through a feature scaling process that standardizes each feature to zero mean and a standard deviation of 1. These processes can reduce the processing time of some classifiers and avoid divergence issues that could otherwise arise. The performance of the algorithms was evaluated using four metrics: Pd, Pfa, Pmd, and accuracy. Pd is the probability of detecting an email as suspicious when it is suspicious. Pfa is the probability of detecting an email as suspicious when it is legitimate. Pmd is the probability of classifying an email as legitimate when it is suspicious. Accuracy is the probability that the classifier assigns an email, whether phishing or legitimate, to its correct class. These metrics are expressed as

$$P_d = \frac{\text{Number of detected suspicious emails}}{\text{Number of suspicious emails}} \qquad (8)$$

$$P_{fa} = \frac{\text{Number of falsely detected suspicious emails}}{\text{Number of legitimate emails}} \qquad (9)$$

$$P_{md} = \frac{\text{Number of missed suspicious emails}}{\text{Number of suspicious emails}} \qquad (10)$$
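Eqs. (8) to (10) are ratios over the confusion matrix. The sketch below computes them, together with accuracy, from true and predicted labels. It assumes label 1 = phishing (suspicious) and 0 = legitimate, and it normalizes Pfa by the number of legitimate emails, following the prose definition of Pfa. The function name detection_metrics is hypothetical and does not come from the paper.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return Pd, Pfa, Pmd, and accuracy for a binary phishing detector.

    Hypothetical label convention: 1 = phishing (suspicious), 0 = legitimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if any(v not in (0, 1) for v in list(y_true) + list(y_pred)):
        raise ValueError("labels must be 0 (legitimate) or 1 (phishing)")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing correctly detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged (false alarm)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate correctly accepted
    n_phish, n_legit = tp + fn, fp + tn
    if n_phish == 0 or n_legit == 0:
        raise ValueError("y_true must contain both classes")
    return {
        "Pd": tp / n_phish,                  # Eq. (8)
        "Pfa": fp / n_legit,                 # Eq. (9)
        "Pmd": fn / n_phish,                 # Eq. (10), always equal to 1 - Pd
        "accuracy": (tp + tn) / len(pairs),  # share of emails put in the correct class
    }


# Made-up labels for illustration only (not the paper's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(detection_metrics(y_true, y_pred))
# {'Pd': 0.75, 'Pfa': 0.25, 'Pmd': 0.25, 'accuracy': 0.75}
```

Suppose the test set is balanced between the two classes. The paper does not state how the 1200 test emails are split, so this is an assumption. In that case, accuracy reduces to (Pd + 1 − Pfa)/2. For the ANN row of Table IV this gives (0.903 + 0.985)/2 ≈ 0.944, which is consistent with the reported 94.5% up to rounding.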
Examples of results are presented in Table II through Table IV. ANN performance is affected by many parameters, including the number of hidden layers, the number of neurons in each hidden layer, and the activation function. To find the set of parameters that maximizes ANN performance, we conducted several experiments on the generated dataset with different combinations of these parameters. Examples of results are presented in Table II. As can be seen, the ANN with two hidden layers of 100 neurons each and the ReLU activation function has the best performance. It achieves the highest Pd of 90.3%, the lowest Pmd of 9.7%, and the highest accuracy of 94.5%.

TABLE II. ANN PERFORMANCE EVALUATION
Hidden layers (neurons) / activation   Pd       Pfa     Pmd      Accuracy
(100) / ReLU                           90.10%   1.40%   9.90%    94.40%
(100,100) / ReLU                       90.30%   1.50%   9.70%    94.50%
(100) / tanh                           90.00%   1.40%   10.00%   94.30%
(100,100) / tanh                       90.10%   1.50%   9.90%    94.30%
(100) / sigmoid                        88.90%   1.40%   11.10%   93.80%
(100,100) / sigmoid                    88.70%   1.40%   11.30%   93.70%

TABLE III. SVM'S PERFORMANCE WITH SEVERAL KERNELS
Kernel         Pd      Pfa     Pmd     Accuracy
Linear SVM     29.8%   44.8%   70.2%   42.6%
Cubic SVM      63.4%   54.5%   36.6%   54.4%
RBF SVM        82.3%   27.7%   17.7%   77.3%
Sigmoid SVM    43.3%   24.7%   56.7%   59.4%

As the performance of SVM depends on the kernel used for email classification, four kernels were considered: linear, polynomial (cubic), radial basis function (RBF), and sigmoid. Examples of results are presented in Table III. Comparing the SVM variants, the RBF kernel achieves a Pd of 82.3%, a Pfa of 27.7%, a Pmd of 17.7%, and an overall accuracy of 77.3%. Thus, it provides better results than the other kernels, although the sigmoid kernel yields a slightly lower Pfa (24.7%).

To investigate the impact of the regularization parameter on LR performance, several experiments were performed using different values of this parameter. Examples of results are given in Figs. 1 to 4. Fig. 1 represents Pd against the regularization parameter. It can be seen that Pd increases with the regularization parameter, reaching its maximum of 87.2% at 0.006. For values higher than 0.06, Pd decreases slightly and then remains constant at 87.1%.

Fig. 1. Pd as a function of the regularization parameter.

Fig. 2 represents Pfa as a function of the regularization parameter. As one can see, Pfa has three different regimes. For the range [0, 0.1], Pfa is constant with an average of 6.5%. For the range [0.1, 0.4], Pfa decreases as the regularization parameter increases, reaching its lowest value of 1.4% at 0.4. For values higher than 0.4, Pfa remains constant at 1.4%.

Fig. 2. Pfa versus the regularization parameter of logistic regression.

Fig. 3 represents Pmd against the regularization parameter of LR. It can be seen that for the range [0, 0.08], Pmd decreases as the regularization parameter increases, reaching its minimum of 12.8% at 0.08. For values higher than 0.08, increasing the regularization parameter has no impact on Pmd, which remains constant at around 12.8%.
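The LR regularization sweep can be sketched as below. The paper does not spell out its LR training loop. This sketch assumes the same λ/(2m)·||w||² penalty used in Eq. (6) is applied to LR, and it trains with plain batch gradient descent, as stated in Section II-B. The function names, the learning rate, the iteration count, and the synthetic data are all hypothetical, and the printed numbers are unrelated to Figs. 1 to 4. The 280/120 split only mirrors the 70/30 ratio of Table I.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function g(z) = 1 / (1 + e^(-z)), as in Eq. (7)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X: np.ndarray, y: np.ndarray, lam: float,
                 lr: float = 0.1, n_iter: int = 2000):
    """Batch gradient descent on an L2-regularized cross-entropy loss.

    Loss = mean cross-entropy + (lam / (2 m)) * ||w||^2 (assumed penalty form);
    the bias b is not penalized. X: (m, n) standardized features, y: labels in {0, 1}.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y                # prediction error per email
        w -= lr * (X.T @ err / m + (lam / m) * w)   # data gradient + penalty gradient
        b -= lr * err.mean()
    return w, b


def accuracy(X: np.ndarray, y: np.ndarray, w: np.ndarray, b: float) -> float:
    """Fraction of samples whose thresholded prediction (h >= 0.5) matches y."""
    return float(np.mean((sigmoid(X @ w + b) >= 0.5) == y))


# Synthetic stand-in for 10 standardized email features (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X @ rng.normal(size=10) + 0.5 * rng.normal(size=400) > 0).astype(float)
X_train, y_train, X_test, y_test = X[:280], y[:280], X[280:], y[280:]

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    w, b = train_logreg(X_train, y_train, lam)
    print(f"lambda={lam:>6}: ||w||={np.linalg.norm(w):.3f}, "
          f"test accuracy={accuracy(X_test, y_test, w, b):.3f}")
```

Larger λ shrinks the weight vector, which trades fit on the training set for robustness. Where λ is actually "best" depends entirely on the data. Note that scikit-learn's LogisticRegression exposes C, the inverse of the regularization strength, rather than λ itself.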
Fig. 3. Pmd versus the regularization parameter of logistic regression.

Fig. 4 represents the accuracy as a function of the regularization parameter. One can see that the accuracy increases when the regularization parameter is less than 1, while it is constant for values higher than 1. It reaches its maximum value of 92.9% when the regularization parameter is 0.4. Thus, LR performs better with a regularization parameter higher than 0.7.

Fig. 4. Accuracy versus the regularization parameter of logistic regression.

TABLE IV. COMPARISON BETWEEN ANN, SVM, AND LR
Algorithm                                  Pd      Pfa     Pmd     Accuracy
ANN (100,100), ReLU function               90.3%   1.5%    9.7%    94.5%
SVM, Gaussian radial basis function        82.3%   27.7%   17.7%   77.3%
LR, regularization parameter = 0.7         87.1%   1.4%    12.9%   92.9%

Table IV evaluates the performance of the three classifiers based on the four metrics. For ANN, we selected two hidden layers of 100 neurons each with the ReLU activation function, since this configuration produces the best results compared to other activation functions. For SVM, the Gaussian radial basis function kernel is selected since it produces better results in terms of Pd, Pmd, and accuracy. For LR, a regularization parameter equal to 0.7 produces the best results. Based on the best performance of each classifier, a performance comparison between these algorithms is given in Table IV. As one can see, the ANN with two hidden layers and the ReLU function has the highest Pd and accuracy and the lowest Pmd compared to LR and SVM. Its Pfa of 1.5% is close to the lowest value, 1.4%, achieved by LR.

CONCLUSION
In this paper, we proposed a phishing attack detection technique using machine learning. Three classifiers were trained and tested on the dataset. For each classifier, a parametric study was conducted, and the best results are reported for evaluation. For SVM, the highest accuracy is obtained with the Gaussian radial basis function kernel. For LR, the highest accuracy is given by a regularization parameter of 0.4. For ANN, the highest accuracy is achieved with two hidden layers of 100 neurons each and the ReLU activation function. Therefore, the proposed model allows fast and accurate phishing attack detection.

REFERENCES
[1] F. Salahdine and N. Kaabouch, "Social engineering attacks: A survey," Future Internet, vol. 11, art. 89, pp. 1–17, 2019.
[2] R. Mohammad, F. Thabtah, and L. McCluskey, "Intelligent rule-based phishing websites classification," IET Inf. Secur., pp. 153–160, 2014.
[3] F. Salahdine and N. Kaabouch, "Security threats, detection, and countermeasures for physical layer in cognitive radio networks: A survey," Physical Commun. J., 2020.
[4] J. He and Y. Zhu, "Social engineering/phishing," Encycl. Soc. Netw. Anal. Min., pp. 1777–1783, 2014.
[5] M. Moghimi and A. Varjani, "New rule-based phishing detection method," Expert Syst. Appl., vol. 53, pp. 231–242, 2016.
[6] B. Gupta, N. Arachchilage, and K. Psannis, "Defending against phishing attacks: Taxonomy of methods, current issues and future directions," Telecommun. Syst., vol. 67, pp. 247–267, 2018.
[7] J. Hong, T. Kim, and S. Kim, "Phishing URL detection with lexical features and blacklisted domains," Adaptive Auton. Secur. Cyber Syst., pp. 253–267, 2020.
[8] Y. Huang, Q. Yang, J. Qin, and W. Wen, "Phishing URL detection via CNN and attention-based hierarchical RNN," IEEE Int. Conf. Trust, Security, Privacy Comput. Commun., pp. 112–119, 2019.
[9] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Syst. Appl., vol. 53, pp. 231–242, 2016.
[10] G. Ramesh, I. Krishnamurthi, and K. Kumar, "An efficacious method for detecting phishing webpages through target domain identification," Decision Support Systems, vol. 61, pp. 12–22, 2014.
[11] Y. Suga, "SSL/TLS servers status survey about enabling forward secrecy," Int. Conf. Network-Based Information Systems, pp. 501–505, 2014.
[12] A. Albarqi, E. Alzaid, F. Ghamdi, S. Asiri, and J. Kar, "Public key infrastructure: A survey," J. Inf. Secur., vol. 6, no. 1, pp. 31–37, 2015.
[13] S. Krishnamurthy and A. Ve, "Information retrieval models: Trends and techniques," Web Semant. Textual Vis. Inf. Retr., pp. 17–42, 2017.
[14] A. Kharraz, W. Robertson, and E. Kirda, "Surveylance: Automatically detecting online survey scams," IEEE Symp. Secur. Privacy, pp. 723–739, 2018.
[15] Y. Reddy and N. Varma, "Review on supervised learning techniques," Emerg. Res. Data Eng. Syst. Comput. Commun. J., pp. 577–587, 2020.
[16] C. Bircanoğlu and N. Arıca, "A comparison of activation functions in artificial neural networks," Signal Proc. Commun. App. Conf., pp. 1–4, 2018.
[17] Y. Arjoune, F. Salahdine, Md. Islam, E. Ghribi, and N. Kaabouch, "A novel jamming attacks detection approach based on machine learning for wireless communication," Int. Conf. Inf. Netw., pp. 1–6, 2020.

No ratings yet
Final Paper On Phishing Domains Detection Using Deep Learning
11 pages
Web Phishing Detection Using ML
No ratings yet
Web Phishing Detection Using ML
5 pages
Phishing Attacks Detection Using Machine Learning Approach
No ratings yet
Phishing Attacks Detection Using Machine Learning Approach
7 pages
Phishing Web Site Detection Using Diverse Machine Learning Algorithms
No ratings yet
Phishing Web Site Detection Using Diverse Machine Learning Algorithms
16 pages
20mis0106 VL2023240102875 Pe003
No ratings yet
20mis0106 VL2023240102875 Pe003
42 pages
Sat - 26.Pdf - Phishing Website Detection Using Novel Machine Learning Fusion Approach
No ratings yet
Sat - 26.Pdf - Phishing Website Detection Using Novel Machine Learning Fusion Approach
11 pages
Contents 1
No ratings yet
Contents 1
19 pages
Jain 2018
No ratings yet
Jain 2018
14 pages
155-Article Text-230-3-10-20230813
No ratings yet
155-Article Text-230-3-10-20230813
7 pages
V6I602
No ratings yet
V6I602
8 pages
IJCRTI020051
No ratings yet
IJCRTI020051
4 pages
A Hybrid Model To Detect Phishing-Sites Using Supervised Learning Algorithms
No ratings yet
A Hybrid Model To Detect Phishing-Sites Using Supervised Learning Algorithms
8 pages
Comparative Analysis of Features Based Machine Learning Approaches For Phishing Detection
No ratings yet
Comparative Analysis of Features Based Machine Learning Approaches For Phishing Detection
6 pages
CSE3502-Final J Comp Report
No ratings yet
CSE3502-Final J Comp Report
20 pages
Phishing Seminar
No ratings yet
Phishing Seminar
19 pages
Network Security Report
No ratings yet
Network Security Report
42 pages
Batch-5 Journal-6 ECE-D New
No ratings yet
Batch-5 Journal-6 ECE-D New
6 pages
Detection of Phishing Websites Using Mac
No ratings yet
Detection of Phishing Websites Using Mac
3 pages
Phishing Detection Using Machine Learning
No ratings yet
Phishing Detection Using Machine Learning
9 pages
Leveraging Advanced Machine Learning Techniques For Phishing Website Detection
No ratings yet
Leveraging Advanced Machine Learning Techniques For Phishing Website Detection
6 pages
Automated Phishing Detection Through URL Analysis and Machine Learning
No ratings yet
Automated Phishing Detection Through URL Analysis and Machine Learning
9 pages
CyberSec Review3 Team10
No ratings yet
CyberSec Review3 Team10
28 pages
Phishing Seminar
No ratings yet
Phishing Seminar
19 pages
Detection of Url Based Phishing Attacks Using Machine Learning IJERTV8IS110269
No ratings yet
Detection of Url Based Phishing Attacks Using Machine Learning IJERTV8IS110269
8 pages
Detection of Phishing WebsitesUsing Random Forest and XGBOOST
No ratings yet
Detection of Phishing WebsitesUsing Random Forest and XGBOOST
14 pages
LIS 2022 New 1-154-160
No ratings yet
LIS 2022 New 1-154-160
7 pages
61k Valid Corporative PDF Free
No ratings yet
61k Valid Corporative PDF Free
105 pages
CISSP Domain 1 v3 Complete
100% (1)
CISSP Domain 1 v3 Complete
92 pages
Data Security Considerations - (Backups, Archival Storage and Disposal of Data)
No ratings yet
Data Security Considerations - (Backups, Archival Storage and Disposal of Data)
3 pages
CTF 1
No ratings yet
CTF 1
14 pages
Usb Wibu Key Dongle Emulator 12
No ratings yet
Usb Wibu Key Dongle Emulator 12
3 pages
Digital Banking Notes
No ratings yet
Digital Banking Notes
7 pages
11.3.1.2 Lab - CCNA Security ASA 5506-X Comprehensive
No ratings yet
11.3.1.2 Lab - CCNA Security ASA 5506-X Comprehensive
19 pages
Classical Encryption Techniques: M. Odeo Lecturer
No ratings yet
Classical Encryption Techniques: M. Odeo Lecturer
39 pages
Al-Bayati 2021
No ratings yet
Al-Bayati 2021
97 pages
TOC CCNP Security Cisco Network With Firepower
No ratings yet
TOC CCNP Security Cisco Network With Firepower
8 pages
Shree Swaminarayan Institute of Technology, Bhat: Computer Engineering Department Subject: Information Security (3170720)
No ratings yet
Shree Swaminarayan Institute of Technology, Bhat: Computer Engineering Department Subject: Information Security (3170720)
3 pages
Security Gateway Datasheet
No ratings yet
Security Gateway Datasheet
5 pages
#1300 Database
No ratings yet
#1300 Database
40 pages
Cybersecurity Through Secure Software Development
No ratings yet
Cybersecurity Through Secure Software Development
11 pages
Sij Ash V7 N3 001
No ratings yet
Sij Ash V7 N3 001
6 pages
Module 1-Platform Technologies
No ratings yet
Module 1-Platform Technologies
5 pages
Krishnamurthy 2021 Martin Luther King JR On Democratic Propaganda Shame and Moral Transformation
No ratings yet
Krishnamurthy 2021 Martin Luther King JR On Democratic Propaganda Shame and Moral Transformation
32 pages
AI and IoT in Smart Ports - Improving Real-Time Decision Making For Supply Chain Resilience
No ratings yet
AI and IoT in Smart Ports - Improving Real-Time Decision Making For Supply Chain Resilience
10 pages
Unit 1: What Is Cybercrime?
No ratings yet
Unit 1: What Is Cybercrime?
11 pages
Adam Slavny: Reasonableness and Risk
No ratings yet
Adam Slavny: Reasonableness and Risk
21 pages
Web Programming Assignment-4
No ratings yet
Web Programming Assignment-4
4 pages
Springboard Courses 2020
No ratings yet
Springboard Courses 2020
27 pages
Plag Report Am0rahosiputywq
No ratings yet
Plag Report Am0rahosiputywq
11 pages
DND CRM
No ratings yet
DND CRM
33 pages
Model-Based Quantitative Network Security Metrics A Survey
No ratings yet
Model-Based Quantitative Network Security Metrics A Survey
30 pages
1 PB
No ratings yet
1 PB
8 pages
SSL Handshake
No ratings yet
SSL Handshake
1 page
Geoinformatics 2006 Vol06
No ratings yet
Geoinformatics 2006 Vol06
50 pages
Css Exp 5
No ratings yet
Css Exp 5
4 pages
DBP's - Moodle: Navigation Login
No ratings yet
DBP's - Moodle: Navigation Login
2 pages
Packet Tracer - Configure SSH: Addressing Table
No ratings yet
Packet Tracer - Configure SSH: Addressing Table
5 pages
HACKING TIPS AND TRICKS: The Art and Science of Cybersecurity and Penetration Testing (2024 Guide for Beginners)
From Everand
HACKING TIPS AND TRICKS: The Art and Science of Cybersecurity and Penetration Testing (2024 Guide for Beginners)
GOODWIN DOYLE
No ratings yet