Survey On Phishing Websites Detection Using Machine Learning
https://doi.org/10.22214/ijraset.2022.42843
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue V May 2022- Available at www.ijraset.com
Abstract: Phishing is a widespread method of tricking unsuspecting people into disclosing personal information through fake
websites. Phishing website URLs are designed to steal personal information such as user names, passwords, and online banking
details. Phishers employ webpages that are visually and semantically identical to legitimate websites. As technology advances,
phishing strategies have become more sophisticated, necessitating anti-phishing measures to detect them.
Machine learning is an effective tool for combating phishing attacks. This study examines the features used in detection
as well as machine learning-based detection approaches.
Phishing is popular among attackers because it is easier to persuade someone to click on a malicious link that appears to be
legitimate than it is to break through a computer's protection measures. The malicious links in the message body are made to
look as if they lead to the spoofed organisation by using that organisation's logos and other legitimate material.
We'll go through the characteristics of phishing domains (also known as fraudulent domains), the qualities that distinguish them
from real domains, why it's crucial to detect them, and how they can be discovered using machine learning and natural
language processing techniques.
Keywords: Phishing, personal information, machine learning, malicious links, phishing domain characteristics.
I. INTRODUCTION
Phishing has become a major concern for security professionals in recent years because it is relatively easy to create a
fake website that appears identical to a legitimate one.
Although experts can detect bogus websites, not all users can, and as a result they become victims of phishing attacks. The
attacker's main goal is to steal bank account credentials. Phishing attacks are becoming more successful because of a lack of
user awareness. Because phishing exploits user weaknesses, it is difficult to mitigate, which makes improving phishing
detection tools critical. Phishing is a widespread form of fraud in which a malicious website imitates a genuine one with
the sole purpose of obtaining sensitive data, such as passwords, account details, or credit card numbers. Although
anti-phishing software and techniques exist for detecting potential phishing attempts in messages and typical phishing
content on websites, phishers devise new and hybrid techniques to circumvent publicly available software and frameworks.
Phishing is a type of fraud that uses social engineering to gain access to sensitive and personal data, such as passwords and
credit card details, by impersonating a trustworthy person or business in electronic communication.
Phishing uses spoofed messages that appear legitimate and claim to originate from legitimate sources such as financial
institutions, e-commerce sites, and so on, to entice users to visit fake sites via the links provided.
II. LITERATURE SURVEY
Huang et al. (2009) proposed a framework for detecting phishing using page segment similarity, which analyses Uniform
Resource Locator (URL) tokens to improve prediction accuracy; phishing pages typically keep their CSS style similar to that
of their target pages.
Marchal et al. (2017) proposed a technique for distinguishing phishing websites that relies on the analysis of legitimate
site server log information. Their application for detecting phishing websites has a number of distinguishing
characteristics, including high precision, complete autonomy, language independence, speed of decision, resilience to
dynamic phishing, and resilience to evolution in phishing methods.
Mustafa Aydin et al. proposed a classification approach for phishing website identification that extracts website URL
features and analyses subset-based feature selection methods. It uses feature extraction and selection techniques to detect
phishing websites.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2376
The extracted URL features and the resulting feature matrix are examined through five separate analyses: Alphanumeric
Character Analysis, Keyword Analysis, Security Analysis, Domain Identity Analysis, and Rank Based Analysis. Most of these
features are textual properties of the URL itself, while others rely on third-party services.
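As a rough illustration of the purely textual part of such an analysis, the following Python sketch counts character classes and keyword occurrences in a URL. It is not the feature set of the cited study: the function name, the returned feature names, and the keyword list are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical keyword list for illustration; not taken from the cited study.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def textual_url_features(url: str) -> dict:
    """Count character classes and keyword hits in a URL string."""
    host = urlparse(url).hostname or ""
    lowered = url.lower()
    return {
        "length": len(url),
        "digits": sum(ch.isdigit() for ch in url),
        "letters": sum(ch.isalpha() for ch in url),
        # Everything that is neither a letter nor a digit (":", "/", "?", ...).
        "special": sum(not ch.isalnum() for ch in url),
        "host_dots": host.count("."),
        # Number of distinct keywords from the list that appear in the URL.
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }
```

Each returned dictionary can serve as one row of a feature matrix; features that depend on third-party services (for example rank lookups) are deliberately left out of this sketch.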
PhishStorm, developed by Samuel Marchal et al., is an automated phishing detection system that can analyse any URL in real
time to identify potential phishing sites. It is presented as an automatic real-time URL phishingness rating system that
protects users from phishing content, and it can also serve as a website reputation rating system that returns a
phishingness score for URLs.
Fadi Thabtah et al. compared a large number of machine learning approaches on real phishing datasets using various metrics.
The goal of this comparison is to highlight the benefits and drawbacks of machine learning predictive models, as well as
their actual effectiveness against phishing attacks. According to the experimental results, covering approach models are
more appropriate as anti-phishing solutions.
Muhammet Baykara et al. proposed the Anti Phishing Simulator, which provides information about the phishing detection
problem and how to detect phishing emails. Spam emails are added to the database by a Bayesian algorithm. Phishing attackers
use JavaScript to place a legitimate URL in the address bar of the browser. The study recommends using the e-mail text as a
keyword only for advanced word processing.
III. MOTIVATION
Detecting and stopping phishing websites remains an important area of research. The variety of phishing strategies makes
effective detection essential, above all for the protection of individuals and organisations. In phishing, the Uniform
Resource Locator (URL) is crucial: it is the address through which websites are opened and through which links redirect
users from one page to the next. The point of vulnerability in phishing is the hyperlink, because a redirect may lead either
to the legitimate website or to the phishing site.
V. PROPOSED SYSTEM
This section explains the proposed phishing attack detection model. The proposed methodology focuses on detecting phishing
attacks using the properties of phishing websites, the Blacklist, and the WHOIS database. According to experts, a few criteria
can be used to distinguish between real and fake web pages. The features that have been chosen include URLs, domain
identity, security and encryption, source code, page style and contents, the web address bar, and the social (human) factor.
This research is limited to URL and domain name characteristics. The URL and domain name features that are checked include
the use of an IP address, a long URL, a prefix or suffix added to the domain, redirection using the "//" symbol, and the
presence of the "@" symbol. These characteristics are examined using a set of rules in order to distinguish phishing webpage
URLs from legitimate website URLs.
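The rules themselves are not listed in this paper, so the following Python sketch only illustrates how such URL and domain name checks might be written. The length threshold, the interpretation of "prefix or suffix" as a hyphen in the host name, the minimum number of triggered rules, and all function names are assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Illustrative threshold; the paper does not state a value.
LONG_URL_THRESHOLD = 75

IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_rule_flags(url: str) -> dict:
    """Return one boolean per rule; True means the rule finds the URL suspicious."""
    host = urlparse(url).hostname or ""
    # Look for "//" only after the scheme separator ("http://", "https://").
    scheme_end = url.find("://")
    rest = url[scheme_end + 3:] if scheme_end != -1 else url
    return {
        "ip_address_host": bool(IPV4_HOST.match(host)),
        "long_url": len(url) >= LONG_URL_THRESHOLD,
        # Assumption: a prefix or suffix is attached to the domain with "-".
        "prefix_suffix_dash": "-" in host,
        "double_slash_redirect": "//" in rest,
        "at_symbol": "@" in url,
    }


def looks_like_phishing(url: str, min_flags: int = 2) -> bool:
    """Flag a URL when at least min_flags rules fire (min_flags is an assumption)."""
    return sum(url_rule_flags(url).values()) >= min_flags
```

For example, `looks_like_phishing("http://secure-paypal.example.com@10.0.0.5//verify")` triggers the IP address, "//", and "@" rules, whereas `https://www.example.com/` triggers none. Blacklist and WHOIS lookups are not covered by this sketch.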
VII. IMPLEMENTATION
A. Algorithms Used
Three machine learning classification models, Decision Tree, Random Forest, and Support Vector Machine, have been selected to
detect phishing websites.
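The paper does not give implementation details for these models. As a minimal sketch of the step at the core of a decision tree (and of each tree in a random forest), the Python code below picks the binary feature whose split yields the lowest weighted Gini impurity. The toy data, feature names, and function names are invented for illustration, and the support vector machine is not covered here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, labels):
    """Return (feature_index, weighted_gini) for the best 0/1 feature to split on.

    feature_index is None when no feature improves on the unsplit impurity.
    """
    n = len(labels)
    best = (None, gini(labels))
    for j in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (j, score)
    return best


# Toy data invented for illustration: columns are [has_at_symbol, has_ip_host].
rows = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [1, 0]]
labels = ["phishing", "phishing", "legitimate",
          "legitimate", "legitimate", "phishing"]
feature, impurity = best_binary_split(rows, labels)
```

A full decision tree applies this selection recursively to each resulting subset, and a random forest trains many such trees on resampled data and random feature subsets before combining their votes.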
VIII. RESULTS
A. Home Page
B. Input URL
IX. CONCLUSION
User education and awareness are the most significant means of protecting users from phishing attacks. Internet users should
be aware of all security recommendations made by professionals. Every user should also be taught not to follow links blindly
to websites where sensitive information must be entered, and to check the URL before visiting a website. In the future, the
system could be upgraded to automatically detect the web page and the application's compatibility with the web browser.
Additional work can be done to distinguish fraudulent web pages from authentic ones by adding further characteristics. To
detect phishing on the mobile platform, the PhishChecker application could be developed into a mobile application.
X. FUTURE SCOPE
We'll look into the links between phishing sites and hosting and DNS registration providers in more detail. We'll also look at other
features like Content Security Policies, certificate authorities, and TLS fingerprinting that can be used. In addition, we will compare
SVMs and neural networks to other machine learning techniques such as random forest classifiers for speed and accuracy. Finally,
we'll check for aspects in the underlying HTML structure, such as tag counts, tag positioning, use of and counts of specific
JavaScript functions, inline and included CSS, and so on.
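As a hedged sketch of what such HTML structure features could look like, the Python code below uses the standard library `html.parser` module to count tags, inline `style` attributes, and linked stylesheets. The class name, function name, and the selection of features are assumptions for illustration; counting specific JavaScript functions and tag positioning are not attempted here.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Collect tag counts plus inline and linked CSS counts from HTML text."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.inline_style_attrs = 0
        self.linked_stylesheets = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        attr_map = dict(attrs)
        if "style" in attr_map:
            self.inline_style_attrs += 1
        # attr values can be None for bare attributes, hence the "or ''".
        if tag == "link" and "stylesheet" in (attr_map.get("rel") or "").lower():
            self.linked_stylesheets += 1


def html_structure_features(html: str) -> dict:
    """Return a small dictionary of structural counts for one HTML document."""
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    return {
        "forms": parser.tag_counts["form"],
        "inputs": parser.tag_counts["input"],
        "scripts": parser.tag_counts["script"],
        "style_blocks": parser.tag_counts["style"],
        "inline_style_attrs": parser.inline_style_attrs,
        "linked_stylesheets": parser.linked_stylesheets,
    }
```

The resulting counts could be appended to the URL-based features before training a classifier.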
REFERENCES
[1] G. Canbek, "A Review on Information, Information Security, and Security Processes," Politek. Derg., vol. 9, no. 3, 2006, pp. 165–174.
[2] L. McCluskey, F. Thabtah, and R. M. Mohammad, "Intelligent rule-based phishing websites classification," IET Inf. Secur., vol. 8, no. 3, pp. 153–160, 2014.
[3] R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting phishing websites using a self-structuring neural network," Neural Comput. Appl., vol. 25, no. 2, pp. 443–458, 2014.
[4] W. Hadi, F. Aburub, and S. Alhawari, "A new fast associative classification algorithm for detecting phishing websites," Appl. Soft Comput. J., vol. 48, pp. 729–734, 2016.
[5] "Multi-label rules for phishing classification," Appl. Comput. Informatics, vol. 11, no. 1, pp. 29–46, 2015.
ACKNOWLEDGMENT
We express our sincere gratitude to our guide, Assistant Professor Mr. B. Ravi Raju, for his suggestions and support during
every stage of this work. We also convey our deep sense of gratitude to Professor Dr. K. S. Reddy, Head of the Information
Technology department.