A Structured Synopsis for Phishing Website Identification, Proactive Prevention, and Detection

Prof. Bhairavi Pawar1, Dr. U.C. Patkar2, Arushi Bhatnagar3, Atharva Nawarange4, Khushali Khairnar5, Mayur Madane6

1 Prof., Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Lavale, Pune, India
2 Prof., Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Lavale, Pune, India
3 Student, Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Lavale, Pune, India
4 Student, Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Lavale, Pune, India
5 Student, Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Lavale, Pune, India
6 Student, Department of Computer Engineering, Bharati Vidyapeeth’s College of Engineering, Lavale, Pune, India

Abstract

Nowadays, recognising and tracking down phishing websites in real time is a complex and dynamic problem involving numerous factors and requirements. Because the process of detecting phishing websites involves many ambiguities, fuzzy logic can be a useful instrument for investigating and recognising such websites. Fuzzy logic offers a more intuitive way of handling qualitative variables than applying rigid rules. For phishing website assessment, a technique for resolving fuzziness is proposed, together with an open and intelligent phishing website detection method. This approach identifies various parameters of a phishing website using machine learning approaches and fuzzy logic. Phishing website attributes, 30 characteristics in total, can be applied to detect scams with high accuracy.

Keyword

Phishing Website, Machine Learning, Prevention Technique, Fuzzy logic, Web Service, Classification, Clustering.

1. Introduction

The Internet plays an important part in today's business, trade, education, and technological activities. It also serves as an important component of everyday life. Unfortunately, the absence of security offers attackers an excellent opportunity for financial gain. Emails are not secured as they are sent over the Internet. Any information that is provided or received through websites is so important and sensitive that it needs the strongest security measures to prevent malicious individuals from misusing it.

Phishing websites are nothing more than fake websites that malevolent individuals or hackers design to look like real websites. Many of these websites bear a strong visual similarity to the sites they imitate. Some of them have exactly the same design as the original webpages.

Phishing combines technological techniques with social engineering to mislead an individual into disclosing personal data. Phishing is now recognised as the most popular technique employed by thieves on the internet. Phishing attacks are becoming more prevalent and effective. Because it can lead to identity theft and financial losses, the impact of phishing is significant. Phishing attacks have created challenges for online banking and e-commerce users.

Phishing, a social engineering attack, is the technique most frequently used by attackers to obtain personal information from internet users, especially usernames, passwords, and credit card information. Attackers may also use phishing attacks to distribute malware over the network. Phishing attacks may take a variety of forms. Some of the more well-known ones are spoofing, malware-based phishing, DNS-based phishing, data theft, email/spam, web-based delivery, and phone phishing. Other forms of phishing attack, including those that rely on deep learning techniques, are also present.

Phishing attacks, aimed at password authentication and sensitive personal information, are a well-established and financially motivated crime. They erode user confidence in addition to causing huge financial harm to individuals, companies, and financial organisations.

As a result, confidential information is leaked, compromising information security, and consumers could face financial or other harm. Compared with viruses and hacking, phishing is a more recent form of web crime. A growing number of phishing sites have been identified in recent years. The term "phishing", as in "website phishing", is derived from the word "fishing".

Fig: Phishing attack diagram [6]

2. Literature Review

Maher Aburrous et al. [1] assessed e-banking phishing websites, proposing an innovative method for overcoming "fuzziness" and putting forward an intelligent, robust, and efficient e-banking phishing site identification model. Their model uses a combination of data mining techniques and fuzzy logic to identify the characteristics of e-banking phishing websites, assess their strategies by identifying the forms of phishing, and create different criteria for each layer of the structured e-banking phishing model.

An innovative approach to detecting phishing websites was proposed by Xun Dong et al. [2]. It is based on an analysis of users' online behaviour, such as the websites they visit and the data they submit. Attackers are unable to take advantage of these user patterns, so identification based on them is not only highly precise but also naturally resistant to emerging deception techniques.

Aanchal Jain and colleagues [3] have proposed a novel method for detecting phishing attacks. They present a prototype web browser that examines every incoming email and may be used as a detection agent for phishing attempts. Using email data collected over time, they demonstrate that their approach can detect more phishing attacks than existing systems.

Phishing and other fraudulent email communications pose some of the greatest security dangers. Automated or semi-automatic detection of malicious email is a useful tool for countering such risks. For these reasons, Sudarshan Chawathe [4] examines the identification of such communications using fuzzy rules. An experimental evaluation on a real data set compares fuzzy rule-based classification with other classifiers, such as those based on decision trees and crisp rules.

Phishers construct fraudulent websites that mimic authentic websites, tricking users into visiting the dangerous website. Consumers therefore need to be aware of fraudulent websites in order to protect their sensitive data. Non-technical users in particular may find it challenging to distinguish between fake and authentic websites. Additionally, the number of phishing websites is rising quickly. The objective of Shweta Dasharath Shirsat [5] is to illustrate phishing detection with fuzzy logic and to analyse the findings using various de-fuzzification techniques.

PhishAri is a method that Anupama Aggarwal et al. [7] created to identify phishing attacks on Twitter in real time. They employ URL characteristics and Twitter-specific features to determine whether a tweet containing a URL is phishing or not.

Among the Twitter-specific features they use are the tweet content and its properties, for example length, hashtags, and mentions. The characteristics of the Twitter user who posted the tweet, such as the number of tweets, account age, and follower-followee ratio, are also used as Twitter parameters.

Phishing is a type of cyber-attack in which an attacker assumes the identity of a reliable source in order to obtain private or sensitive data. Phishers also take advantage of people's trust in a website's appearance by creating webpages that mimic real websites in every way. Sadia Afroz et al. [8] propose PhishZoo, a phishing identification method that uses profiles of the appearance of trusted websites to detect phishing.

3. Problem Identification

Worldwide, phishing affects both people and businesses. Because it is carried out across borders, it is challenging to find the culprits. Furthermore, the phishers' "fast-flux" technique conceals the true location of the phishing site by using a wide range of proxy servers and URLs.

At the same time, blacklisting such websites is difficult because maintaining the server takes a lot of labour. Phishing attacks target system weaknesses brought about by human factors. Users' vulnerabilities make them the weakest link in the security chain, and many cyber-attacks exploit them. Different organisations use different approaches to handle the challenge.

4. Required Model

The project involves supervised machine learning. Classification and regression are the two primary categories of supervised machine learning problems.

Since each input URL is categorised as either legitimate (0) or phishing (1), this data set falls under the classification category.

The classification algorithms used to train on this dataset are:

• Kernel Support Vector Machines
• Decision Trees
• Random Forest Classifier
• XG Boost
• Multilayer Perceptrons

A. Kernel Support Vector Machine

Kernel Support Vector Machine is a supervised machine learning algorithm. It is mostly used for classification problems, but it can be used for both classification and regression problems. An SVM solves a margin maximization problem. [10]

B. Decision Trees

A decision tree is a non-parametric supervised learning approach used for classification and regression problems. It has a hierarchical tree structure made up of a root node, branches, internal nodes, and leaf nodes. Decision trees yield models that are simple to comprehend. [9]

C. Random Forest Classifier

Random forest is an ensemble learning approach. It assigns a new data point to a class by combining the predictions of several decision trees. Because it makes use of several decision trees, this model is regarded as powerful. [6]

D. XG Boost

XGBoost (Extreme Gradient Boosting) is a distributed gradient boosting library optimised for efficiency and scalability in machine learning model training. It is an ensemble learning technique that produces a stronger prediction by aggregating the predictions of several weak models. It has gained popularity and widespread use because it can handle large datasets and achieve state-of-the-art performance in a variety of machine learning tasks, including regression and classification.

E. Multilayer Perceptrons

A multi-layer perceptron (MLP) is a neural network with several layers. Its dense, fully connected layers convert an input of any dimension to the required output dimension. Neural networks are formed by joining neurons together, with the outputs of some neurons acting as inputs to other neurons.

5. System Architecture

The system's architecture is depicted in Figure 1. The relevant classifier receives the URLs to be classified as either legitimate or phishing. The classifier is trained on the training dataset to classify URLs as phishing or legitimate, and it then uses the patterns it has learned to classify newly supplied input.

A list of feature values is created by extracting the IP address, domain, length of the URL, presence of a favicon, and other attributes from the URL. This list is fed into classifiers such as Random Forest, Decision Tree, Kernel SVM, XG Boost, and Multilayer Perceptrons.

Fig: System Architecture

6. Experimentation and Results

Accuracy: Accuracy is used as a measure of the model's effectiveness. It is calculated as the proportion of correctly classified instances among all instances, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

Precision: Precision is a measure of how accurate a model's positive predictions are. It is defined as the ratio of true positive predictions to the total number of positive predictions made by the model.

$$\text{Precision} = \frac{TP}{TP + FP}$$

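As a minimal sketch of how the evaluation metrics in this section follow from the confusion-matrix counts, the Python snippet below computes accuracy, precision, recall, and F1-score from TP, FP, FN, and TN. The function name `classification_metrics` and the example counts are illustrative assumptions; they are not values from the experiments reported in this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero, which is one
    common convention (an assumption here, not something the paper specifies).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denominator = 2 * tp + fn + fp
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a phishing (positive) vs. legitimate (negative) test set.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F1 expression in terms of counts, 2TP / (2TP + FN + FP), is algebraically the same as the harmonic mean of precision and recall, so either form can be used.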
Recall: Recall measures the effectiveness of a classification model in identifying all relevant instances in a dataset. It is the ratio of the number of true positive (TP) instances to the sum of true positive and false negative (FN) instances.

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1-Score: F1-score is used to evaluate the overall performance of a classification model. It is the harmonic mean of precision and recall.

$$\text{F1-Score} = \frac{2TP}{2TP + FN + FP}$$

| Algorithm | Train Accuracy | Test Accuracy |
|---|---|---|
| Decision Tree | 0.810 | 0.818 |
| Random Forest | 0.815 | 0.822 |
| XG Boost | 0.867 | 0.864 |
| SVM | 0.801 | 0.807 |
| Multilayer Perceptrons | 0.862 | 0.865 |

We applied five machine learning based classification algorithms. After applying the algorithms, precision and recall values were noted; they are shown in the table below.

| Algorithm | Precision | Recall |
|---|---|---|
| Decision Tree | 0.850 | 0.813 |
| Random Forest | 0.8611 | 0.8245 |
| XG Boost | 0.869 | 0.863 |
| SVM | 0.849 | 0.810 |
| Multilayer Perceptrons | 0.865 | 0.864 |

7. Conclusion

In this work, implementing the framework with accuracy, economy, and efficiency is the key theme. To complete the task, five supervised machine learning classification models were used. The benefits and drawbacks, as well as the effectiveness, of the five classification models were considered: random forest classifier, XG Boost, decision tree, Multilayer Perceptrons, and kernel support vector machine.

References

[1] Aburrous, Maher & Hossain, Mohammed & Dahal, Keshav & Thabtah, Fadi. (2010). Intelligent phishing detection system for ebanking using fuzzy data mining. Expert Systems with Applications. 37. 7913-7921. 10.1016/j.eswa.2010.04.044.

[2] Dong, Xun & Clark, John & Jacob, Jeremy. (2008). User behaviour based phishing websites detection. 783-790. 10.1109/IMCSIT.2008.4747332.

[3] Jain, Aanchal & Richariya, Vineet. (2011). Implementing a Web Browser with Phishing Detection Techniques.

[4] Chawathe, Sudarshan. (2018). Improving Email Security with Fuzzy Rules. 1864-1869. 10.1109/TrustCom/BigDataSE.2018.00282

[5] S. D. Shirsat, "Demonstrating Different Phishing Attacks Using Fuzzy Logic," 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, 2018, pp. 57-61, doi: 10.1109/ICICCT.2018.8473309.

[6] A. Lakshmanarao, P. Surya Prabhakara Rao, M M Bala Krishna, "Phishing website detection using novel machine learning fusion approach," 78-1-7281-9537-7/21, 2021 IEEE.

[7] A. Aggarwal, A. Rajadesingan and P. Kumaraguru, "PhishAri: Automatic realtime phishing detection on twitter," 2012 eCrime Researchers Summit, Las Croabas, 2012, pp. 1-12, doi: 10.1109/eCrime.2012.6489521.

[8] Afroz, Sadia & Greenstadt, Rachel. (2011). PhishZoo: Detecting Phishing Websites by Looking at Them. Proceedings - 5th IEEE International Conference on Semantic Computing, ICSC 2011. 368-375. 10.1109/ICSC.2011.52

[9] Anshul Saini, "Decision Tree Algorithm – A Complete Guide," 2021.

[10] Malika Rastogi, Anmol Chhetri, "Survey on Detection and Prevention of Phishing Websites Using Machine Learning," 2021.
