Phishing Website Detection Using Machine Learning Techniques
MAHAVIDYALAYA
(Deemed to be University u/s 3 of UGC Act 1956)
(Accredited with “A” by NAAC)
Enathur, Kanchipuram – 631561. Tamil Nadu
www.kanchiuniv.ac.in
11179A125
Kondeti Prem Sai Swaroop
11179A127
Konka Renuka
Guided By
Ms. S. Kavishree
Assistant Professor
Dept. of CSE
Final Review
Date : 15th April 2021
Contents
⚫ Abstract
⚫ Existing System
⚫ Proposed System
⚫ Literature Reviews
⚫ Modules
⚫ Innovativeness in our approach
⚫ Components/Software needed
⚫ Sample Code
⚫ Result
Abstract
⚫ Websites are now used for almost everything, from e-commerce to entertainment.
This project asks whether a given website is fraudulent or legitimate, and detecting
that is its main theme. Conventionally, the browser's protection service decides
whether a website is harmful: if a site redirects to unusual or malicious sites, it
is marked as harmful with a symbol before the URL. Even with this protection
enabled, however, a phishing website can go undetected, because a phishing site is
not malicious in the usual sense: it steals data without the user even knowing it.
To detect such sites, we train ML models using different algorithms to identify
phishing sites based on features extracted from the URL.
⚫ Keywords: Phishing Website; Machine Learning Model (ML Model); URL
(Uniform Resource Locator).
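The URL feature extraction mentioned in the abstract could look roughly like the sketch below. The slides do not list the features actually used, so the feature names and the shortener list here are illustrative assumptions, not the project's feature set.

```python
import re
from urllib.parse import urlparse

# Assumed list of URL shortening services, for illustration only.
SHORTENERS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co"}

def extract_url_features(url: str) -> dict:
    """Compute a few simple features from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Host written as a dotted IPv4 address instead of a domain name
        "has_ip_host": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        # '@' anywhere in the URL
        "has_at_symbol": int("@" in url),
        "url_length": len(url),
        # Number of non-empty path segments
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "has_dash_in_host": int("-" in host),
        "uses_https": int(parsed.scheme == "https"),
        "is_shortened": int(host in SHORTENERS),
    }

features = extract_url_features("http://192.168.10.5/secure-login/update@account")
print(features)
```

Each URL becomes a row of numbers like this, and rows of this kind are what the classifiers are trained on.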
Existing System
• The existing system uses classifiers, a fusion algorithm, and a Bayesian
model to detect phishing sites. A text classifier classifies the text content
of a page and an image classifier classifies its image content.
• The Bayesian model estimates the threshold value. The fusion algorithm combines
the results of both classifiers and decides whether the site is phishing or not.
• The threshold value is chosen by the developer alone, which leads to false
positives and false negatives. A false positive means the estimated probability
of being a phishing webpage is greater than the threshold value, but the
webpage is not a phishing webpage.
• A false negative means the estimated probability of being a phishing webpage
is less than the threshold value, but the webpage is a phishing webpage. This
lowers the level of security. The existing system handles only one kind of
phishing attack, and when a site is found to be phishing it only warns the
user.
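The effect of the threshold on false positives and false negatives can be shown with a small sketch. The scores and labels below are made up for illustration and the function name is hypothetical. It is not the existing system's code.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when a page is flagged as phishing if score > threshold.

    scores: estimated probabilities of being a phishing page
    labels: true classes, 1 = phishing, 0 = legitimate
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score > threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1  # false positive: legitimate page flagged as phishing
        elif not flagged and label == 1:
            fn += 1  # false negative: phishing page not flagged
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Made-up scores and labels, for illustration only.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 0, 1, 1, 0, 0]

low = confusion_counts(scores, labels, threshold=0.35)
high = confusion_counts(scores, labels, threshold=0.85)
print("threshold 0.35:", low)
print("threshold 0.85:", high)
```

On this toy data the lower threshold produces one false positive and no false negatives, while the higher threshold removes the false positive but misses two phishing pages. A hand-picked threshold only moves errors from one kind to the other.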
Proposed System
• The proposed system can handle two types of phishing threats: normal
phishing and in-session phishing.
• In-session phishing: This is a major kind of phishing attack. The user is
distracted by an alert message such as “Your session has timed out, please log
in again”. The user is then redirected to a phishing site, and the phisher obtains
the user's account number and password. Using them, the phisher can transfer funds
from the authorized user's account without that user's knowledge.
Literature Reviews
Author | Name of the Paper | Algorithm Used
Sahar Abdelnabi, Katharina Krombholz, Mario Fritz | VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity | Visual Similarity
Abdulhamit Subasi, Emir Kremic | Comparing AdaBoost with MultiBoosting for Phishing Website Detection | MultiBoosting
Modules
• The dataset has labelled examples, so this is a supervised machine learning task.
There are two major types of supervised machine learning problems:
classification and regression.
• This dataset poses a classification problem, as each input URL is
classified as phishing (1) or legitimate (0). The supervised classification
models trained on the dataset in this notebook are:
• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines
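The overall workflow (split the labelled rows, then compare candidate models by accuracy) can be sketched in plain Python as below. The toy rows are made up, and the fixed rules only stand in for the classifiers listed above. The helper names `train_test_split` and `accuracy` are defined here for illustration and are not library calls.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for features, label in rows if model(features) == label)
    return hits / len(rows)

# Toy rows (made up): (features, label), label 1 = phishing, 0 = legitimate.
rows = [
    ({"has_ip_host": 1, "has_at_symbol": 1}, 1),
    ({"has_ip_host": 1, "has_at_symbol": 0}, 1),
    ({"has_ip_host": 0, "has_at_symbol": 1}, 1),
    ({"has_ip_host": 0, "has_at_symbol": 0}, 0),
    ({"has_ip_host": 0, "has_at_symbol": 0}, 0),
    ({"has_ip_host": 0, "has_at_symbol": 0}, 0),
    ({"has_ip_host": 1, "has_at_symbol": 0}, 1),
    ({"has_ip_host": 0, "has_at_symbol": 1}, 1),
    ({"has_ip_host": 0, "has_at_symbol": 0}, 0),
    ({"has_ip_host": 0, "has_at_symbol": 0}, 0),
]

# Stand-in "models": fixed rules, used only to show the comparison loop.
models = {
    "always_legitimate": lambda f: 0,
    "ip_rule": lambda f: f["has_ip_host"],
    "ip_or_at_rule": lambda f: int(f["has_ip_host"] or f["has_at_symbol"]),
}

train_rows, test_rows = train_test_split(rows)
# A real classifier would be fitted on train_rows here; fixed rules need no fitting.
for name, model in models.items():
    print(f"{name}: test accuracy = {accuracy(model, test_rows):.2f}")
```

Keeping the test rows out of training is what makes the accuracy figures comparable across models.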
Modules Contd…
• Decision Tree Classifier: Decision trees are widely used in classification
and regression tasks. They learn a hierarchy of if/else questions about the
input features, and answering those questions in order leads quickly to a
decision.
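The if/else idea can be illustrated with the smallest possible tree, a single split (a "stump"). The sketch below picks the one binary feature whose if/else split makes the fewest mistakes on made-up rows. A full decision tree repeats this choice recursively on each branch. The feature names are assumptions.

```python
def fit_stump(rows, feature_names):
    """Pick the binary feature whose if/else split misclassifies the fewest rows.

    Returns (feature_name, label_if_set): predict label_if_set when the
    feature is 1, and the opposite label when it is 0.
    """
    best = None
    for name in feature_names:
        for label_if_set in (0, 1):
            errors = 0
            for features, label in rows:
                predicted = label_if_set if features[name] else 1 - label_if_set
                if predicted != label:
                    errors += 1
            if best is None or errors < best[0]:
                best = (errors, name, label_if_set)
    return best[1], best[2]

def predict_stump(stump, features):
    """Answer the single if/else question stored in the stump."""
    name, label_if_set = stump
    return label_if_set if features[name] else 1 - label_if_set

# Made-up rows: (features, label), label 1 = phishing, 0 = legitimate.
rows = [
    ({"has_ip_host": 1, "uses_https": 0}, 1),
    ({"has_ip_host": 1, "uses_https": 1}, 1),
    ({"has_ip_host": 0, "uses_https": 1}, 0),
    ({"has_ip_host": 0, "uses_https": 0}, 0),
    ({"has_ip_host": 0, "uses_https": 1}, 0),
    ({"has_ip_host": 1, "uses_https": 0}, 1),
]

stump = fit_stump(rows, ["uses_https", "has_ip_host"])
print(stump)
print(predict_stump(stump, {"has_ip_host": 1, "uses_https": 1}))
```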
Modules Contd…
• XGBoost Classifier: XGBoost works the same way for classification and
regression, and it is designed for speed and performance. It applies gradient
boosting to decision trees.
• Autoencoder Neural Network: This is a neural network that has the same number
of input neurons as output neurons. Its hidden layers have fewer neurons, which
are referred to here as predictors. The input neurons pass information to the
predictors, which process it to produce the output.
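To make "gradient boosting applied to decision trees" concrete, here is a minimal sketch of plain gradient boosting with squared-error loss over one-split trees. It is not the XGBoost library, which adds regularisation and many optimisations. The data, the two numeric features (URL length and number of dots), and all function names are assumptions for illustration.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    return best[1:]

def stump_value(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosted(X, y, n_rounds=10, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = sum(y) / len(y)
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - score.
        residuals = [yi - si for yi, si in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row) for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def raw_score(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, row) for s in stumps)

def predict(model, row):
    """Label 1 = phishing when the boosted score reaches 0.5, else 0 = legitimate."""
    return 1 if raw_score(model, row) >= 0.5 else 0

def training_sse(model, X, y):
    return sum((yi - raw_score(model, row)) ** 2 for row, yi in zip(X, y))

# Made-up rows: [url_length, number_of_dots], label 1 = phishing, 0 = legitimate.
X = [[20, 1], [25, 1], [30, 2], [75, 4], [90, 5], [60, 1], [22, 4], [85, 2]]
y = [0, 0, 0, 1, 1, 0, 1, 1]

model = fit_boosted(X, y)
print([predict(model, row) for row in X])
print(round(training_sse(model, X, y), 4))
```

No single split separates this toy data, but the sum of a few stumps does, which is the point of boosting: each new tree corrects what the previous ones got wrong.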
Innovativeness in our approach
• Our approach can tell legitimate sites from phishing ones without inspecting
the source code of the site.
• Comparing the algorithms to find the best-performing one shows how the models
behave under various circumstances.
Components/Software Needed
• Jupyter Notebook
• System with Python installed
Sample Code
• Dataset Splitting and ML Model Creation
Sample Code
• Module Comparison
Sample Code
• XGBoost Classifier
Sample Code
• XGBoost Classifier Contd…
Output
• When we enter the name of one of the sites the model was trained with in the
input column, the result is displayed as Legitimate or Phishing.
[Figures: Legitimate result | Phishing result]
Thank You