
Phishing Website Detection Using Machine Learning Techniques

This document describes a project that aims to detect phishing websites using machine learning techniques. It discusses detecting both normal phishing sites and "in-session" phishing sites, which redirect users after a session timeout. Several machine learning algorithms are evaluated, including decision trees, random forests, neural networks, XGBoost, autoencoders, and support vector machines. The document outlines the modules, components needed, sample code, and expected output of detecting if a website is legitimate or phishing. The proposed approach aims to detect phishing sites without analyzing website source code and could later be developed into a browser extension.

Uploaded by

Prem Sai Swaroop

SRI CHANDRASEKHARENDRA SARASWATHI VISWA MAHAVIDYALAYA
(Deemed to be University u/s 3 of the UGC Act, 1956)
(Accredited with “A” by NAAC)
Enathur, Kanchipuram – 631561, Tamil Nadu
www.kanchiuniv.ac.in

Phishing Website Detection using Machine Learning Techniques

11179A125
Kondeti Prem Sai Swaroop
11179A127
Konka Renuka
Guided By
Ms. S. Kavishree
Assistant Professor
Dept. of CSE
Final Review
Date: 15th April 2021
Contents
⚫ Abstract
⚫ Existing System
⚫ Proposed System
⚫ Literature Reviews
⚫ Modules
⚫ Innovativeness in our approach
⚫ Components/Software needed
⚫ Sample Code
⚫ Result

Abstract
⚫ Websites are now used for almost everything, from e-commerce to entertainment.
In this project, our focus is the website itself: whether it is fraudulent or
legitimate. Detecting which of the two a website is forms the main theme of the
project. Conventionally, the browser's protection service decides whether a
website is harmful: if a site redirects to unusual or malicious sites, it is
marked as harmful with a symbol before the URL. Even with this protection
enabled, the browser often fails to detect a phishing website, because a phishing
site need not behave like a conventional malicious site; it simply steals the
user's data without the user even knowing it. To detect such sites, we train ML
models using different algorithms to classify a site as phishing or legitimate
based on features extracted from its URL.
⚫ Keywords: Phishing Website; Machine Learning Model (ML Model); URL
(Uniform Resource Locator).

Existing System
• The existing system uses classifiers, a fusion algorithm and a Bayesian model
to detect phishing sites. A text classifier classifies the text content of a
page, and an image classifier classifies its image content.
• The Bayesian model estimates a threshold value, and the fusion algorithm
combines the results of both classifiers to decide whether the site is phishing.
• The threshold value is decided by the developer alone, which leads to false
positives and false negatives. A false positive occurs when the probability of
a webpage being phishing is greater than the threshold, but the webpage is not
actually a phishing page.
• A false negative occurs when the probability of a webpage being phishing is
less than the threshold, but the webpage actually is a phishing page. False
negatives reduce the level of security. In addition, the existing system
handles only one kind of phishing attack, and when it does detect a phishing
site, it only warns the user.
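The trade-off between false positives and false negatives can be seen with a few made-up scores. This is a minimal sketch, assuming the fusion step simply averages the text and image classifiers' phishing probabilities; the existing system's actual fusion rule, its scores and its threshold are not given, so `fuse`, `count_errors` and all numbers are hypothetical.

```python
# Hypothetical illustration: averaging as the fusion rule and the example
# thresholds are assumptions, not the existing system's actual choices.

def fuse(text_score: float, image_score: float) -> float:
    """Combine the text and image classifiers' phishing probabilities."""
    return (text_score + image_score) / 2


def count_errors(pages, threshold):
    """Count false positives and false negatives at a given threshold.

    pages: list of (text_score, image_score, is_phishing) tuples.
    """
    false_pos = false_neg = 0
    for text_score, image_score, is_phishing in pages:
        flagged = fuse(text_score, image_score) > threshold
        if flagged and not is_phishing:
            false_pos += 1  # legitimate page wrongly flagged as phishing
        elif not flagged and is_phishing:
            false_neg += 1  # phishing page that slipped through
    return false_pos, false_neg


# Made-up scores for four pages: (text score, image score, actually phishing?)
pages = [(0.9, 0.8, True), (0.7, 0.6, False), (0.3, 0.2, True), (0.1, 0.2, False)]
for threshold in (0.2, 0.5, 0.7):
    print(threshold, count_errors(pages, threshold))
# 0.2 (1, 0)
# 0.5 (1, 1)
# 0.7 (0, 1)
```

Lowering the threshold removes the false negative at the cost of a false positive, and raising it does the opposite, which is why a single developer-chosen threshold is a weak point.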

Proposed System
• The proposed system can handle two types of phishing threats: normal phishing
and in-session phishing.

• In-session Phishing: This is a major kind of phishing attack. The user is
diverted by an alert message such as "Your session has timed out, please log in
again". The user is then redirected to a phishing site, where the phisher obtains
the user's account number and password. Using them, the phisher can transfer
funds from the legitimate user's account without that user's knowledge.

• Detecting in-session phishing is a major challenge. Using URL feature
extraction, we extract various details of each domain and train the model on
what a valid site's URL normally allows, such as its length, the number of digits
and the characters it contains. In this way, the proposed system can detect
in-session phishing sites too.
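URL feature extraction of this kind can be sketched with the standard library alone. The features below (length, digit count, "@" symbol, IP-address host, hyphen in the domain, dots in the host, HTTPS, path depth) are an assumed, illustrative selection of lexical features; the project's exact feature set is not listed on the slides, and `extract_features` is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Extract a few lexical features from a URL (illustrative selection)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError if host is not an IP
        has_ip = 1
    except ValueError:
        has_ip = 0
    return {
        "url_length": len(url),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "has_ip_host": has_ip,
        "hyphen_in_domain": int("-" in host),
        "dots_in_host": host.count("."),
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


print(extract_features("https://www.example.com/login"))
print(extract_features("http://198.51.100.7/secure-login@bank.example"))
```

Each URL becomes a fixed-length row of numbers, which is the form the classifiers listed under Modules expect as input.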

Literature Reviews
Author | Name of the Paper | Algorithm Used
--- | --- | ---
Sahar Abdelnabi, Katharina Krombholz, Mario Fritz | VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity | Visual Similarity
Abdulhamit Subasi, Emir Kremic | Comparing AdaBoost with MultiBoosting for Phishing Website Detection | MultiBoosting
Kieran Rendall, Antonia Nisioti and Alexios Mylonas | Multi-layered Phishing Detection | Multi-layer Algorithm
Jiann-Liang Chen, Yi-Wei Ma and Kuan-Lung Huang | Intelligent Visual Similarity based Phishing Websites Detection | Visual Similarity

Modules
• From the dataset, it is clear that this is a supervised machine learning task.
There are two major types of supervised machine learning problems:
classification and regression.

• This dataset poses a classification problem, as each input URL is classified
as phishing (1) or legitimate (0). The supervised classification models trained
on the dataset in this notebook are:

• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines

Modules Contd…
• Decision Tree Classifier: Decision trees are widely used in classification and
regression tasks. A tree makes its prediction through a sequence of if/else
questions about the input features, which makes its decisions fast and easy to
interpret.
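The if/else structure can be made visible with scikit-learn, which can print a fitted tree as nested rules. This is a minimal sketch on a hypothetical six-row dataset with two assumed features (`has_ip_host`, `url_length`); it is not the project's data or configuration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [has_ip_host, url_length]; label 1 = phishing, 0 = legitimate
X = [[1, 80], [1, 95], [0, 120], [0, 30], [0, 25], [0, 40]]
y = [1, 1, 1, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned model as if/else rules on the named features
print(export_text(tree, feature_names=["has_ip_host", "url_length"]))

# Classify two new (made-up) URLs described by the same two features
print(tree.predict([[1, 90], [0, 35]]))  # [1 0]
```

On this toy data a single question (is `url_length` at most 60?) already separates the classes, so the tree stops at depth 1 even though depth 2 was allowed.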

• Random Forest Classifier: A random forest is a collection of decision trees.
Many decision trees are trained, each on a random sample of the data, and their
predictions are combined by voting or averaging to produce the final result.
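In scikit-learn's implementation, the combination step averages the class probabilities of the individual trees. The sketch below checks that on synthetic data from `make_classification`, which stands in for the project's URL features (an assumption; the real dataset is not shown).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a URL feature matrix with binary labels
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Collect each tree's class probabilities and average them
per_tree = np.stack([t.predict_proba(X) for t in forest.estimators_])
averaged = per_tree.mean(axis=0)

# The forest's own output is exactly this average
print(np.allclose(averaged, forest.predict_proba(X)))  # True
```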

• Multilayer Perceptrons: Multilayer perceptrons are feed-forward neural
networks. Data passes through an input layer, one or more hidden layers and an
output layer in turn, each layer transforming the output of the previous one,
and the output layer gives the final decision.
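The layered, feed-forward structure shows up directly in the weight matrices of a fitted network. This sketch uses scikit-learn's `MLPClassifier` on synthetic data; the two hidden layers of 16 and 8 neurons are an arbitrary choice for illustration, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: 10 features, binary labels
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Two hidden layers (sizes chosen only for illustration)
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=1)
mlp.fit(X, y)

# One weight matrix per connection between consecutive layers: 10 -> 16 -> 8 -> 1
print([w.shape for w in mlp.coefs_])  # [(10, 16), (16, 8), (8, 1)]
print(mlp.n_layers_)                  # 4 (input, two hidden, output)
```

For a binary problem scikit-learn uses a single logistic output neuron, which is why the last matrix has one column.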

Modules Contd…
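• XGBoost Classifier: XGBoost (eXtreme Gradient Boosting) can be used for both
classification and regression and is designed for speed and performance. It
applies gradient boosting to decision trees: trees are added one at a time, each
correcting the errors of the trees before it.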
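The core idea of gradient boosting can be sketched in plain Python. This is plain first-order gradient boosting with one-split trees (stumps) on log-loss, a simplified relative of what XGBoost does; XGBoost itself also uses second-order gradient information, regularisation and deeper trees. All names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a single if/else) to the residuals."""
    best = None
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Binary classifier built by adding stumps one at a time (log-loss)."""
    base_rate = sum(ys) / len(ys)  # assumes both classes are present
    f0 = math.log(base_rate / (1 - base_rate))
    trees = []

    def raw_score(x):
        return f0 + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_rounds):
        # Negative gradient of log-loss w.r.t. the raw score is y - p:
        # each new stump is fitted to the errors of the trees so far
        residuals = [y - sigmoid(raw_score(x)) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return lambda x: sigmoid(raw_score(x))


xs = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]  # hypothetical single feature
ys = [0, 0, 0, 1, 1, 1]
predict = gradient_boost(xs, ys)
print(predict([0.85]) > 0.5, predict([0.15]) < 0.5)  # True True
```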

• Autoencoder Neural Network: An autoencoder is a neural network with the same
number of input neurons as output neurons. Its hidden layers have fewer neurons,
which forces the network to learn a compressed representation of the input. The
input neurons pass information through this compressed representation, and the
output layer reconstructs the input from it.
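An autoencoder can be sketched with scikit-learn's `MLPRegressor` by training it to reproduce its own input through a narrow hidden layer. Using reconstruction error to flag unusual URLs (training only on legitimate ones and treating a high error as suspicious) is one common way to turn an autoencoder into a detector; it is an assumption here, as are the random feature vectors and the 95th-percentile cut-off.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical scaled feature vectors of legitimate URLs (8 features each)
X_legit = rng.random((200, 8))

# 8 input neurons -> 3-neuron bottleneck -> 8 output neurons
autoencoder = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
autoencoder.fit(X_legit, X_legit)  # target equals input: learn to reconstruct it


def reconstruction_error(X):
    """Mean squared difference between each row and its reconstruction."""
    return ((autoencoder.predict(X) - X) ** 2).mean(axis=1)


errors = reconstruction_error(X_legit)
threshold = np.percentile(errors, 95)  # assumed cut-off for "unusual"

print([w.shape for w in autoencoder.coefs_])  # [(8, 3), (3, 8)]
suspicious = np.full((1, 8), 5.0)  # made-up vector far from the training data
print(reconstruction_error(suspicious) > threshold)
```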

• Support Vector Machines: Support vector machines, also known as support vector
networks, analyse data for classification or regression tasks. Trained on a
labelled dataset, an SVM finds the boundary (hyperplane) that separates the two
categories with the widest margin, and each new sample is assigned to a category
according to the side of the boundary on which it falls.
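The "which side of the boundary" rule can be checked directly in scikit-learn: for a two-class `SVC`, a positive decision value means the second class (here 1, phishing) and a negative one means the first. The synthetic data and the linear kernel are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data with labels 0 (legitimate) and 1 (phishing)
X, y = make_classification(n_samples=200, n_features=6, random_state=2)
svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Signed distance-like score from the separating hyperplane
scores = svm.decision_function(X)

print(len(svm.support_vectors_), "support vectors")
# A sample is labelled 1 exactly when it falls on the positive side
print(np.array_equal(svm.predict(X), (scores > 0).astype(int)))  # True
```

Only the support vectors (the training points closest to the boundary) determine where the hyperplane lies.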

Innovativeness in our approach
• Our approach can distinguish legitimate sites from phishing sites without
inspecting the source code of the site.

• It can later be developed into a browser extension with a GUI.

• Using multiple ML techniques can be more accurate than depending on a single
technique or algorithm.

• When the model is applied, the techniques used can be adapted to the nature of
the site.

• Finding the best-performing algorithm shows how the model is likely to work
under various circumstances.

Components/Software Needed
• Jupyter Notebook
• System with Python installed

Techniques used to train the model:


• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines

Sample Code
• Dataset Splitting and ML Model Creation
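The original code for this slide survives only as a heading, so here is a sketch of typical dataset splitting and model creation with scikit-learn. The randomly generated binary features, the labelling rule, the 80/20 split and `max_depth=5` are all assumptions for illustration, not the project's actual dataset or settings.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(12)
X = rng.integers(0, 2, size=(1000, 8))   # hypothetical binary URL features
y = (X[:, 0] | X[:, 1]).astype(int)      # hypothetical label: 1 = phishing

# Hold out 20% of the rows to measure accuracy on unseen URLs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12)

tree = DecisionTreeClassifier(max_depth=5)
tree.fit(X_train, y_train)

print(X_train.shape, X_test.shape)  # (800, 8) (200, 8)
print("test accuracy:", accuracy_score(y_test, tree.predict(X_test)))
```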

Sample Code
• Module Comparison
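A comparison of the modules can be sketched by training each classifier on the same split and ranking them by test accuracy. This covers the four models available in scikit-learn; XGBoost and the autoencoder need other libraries and are left out here. The synthetic dataset and all hyperparameters are assumptions, so the printed numbers say nothing about the project's real results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the URL feature dataset
X, y = make_classification(n_samples=1000, n_features=16, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=12),
    "Random Forest": RandomForestClassifier(max_depth=5, random_state=12),
    "Multilayer Perceptrons": MLPClassifier(
        hidden_layer_sizes=(32, 32), max_iter=1000, random_state=12),
    "Support Vector Machines": SVC(kernel="linear"),
}

results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    results.append((name, train_acc, test_acc))

# Rank by test accuracy, best first
results.sort(key=lambda row: row[2], reverse=True)
for name, train_acc, test_acc in results:
    print(f"{name:<25} train={train_acc:.3f} test={test_acc:.3f}")
```

Reporting training accuracy alongside test accuracy helps spot models that overfit (high train, noticeably lower test).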

Sample Code
• XGBoost Classifier

Sample Code
• XGBoost Classifier Contd…

Output
• In the input column, when we enter the URL of any of the sites the model was
trained with, the result is displayed as Legitimate or Phishing.
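The input-to-label step can be sketched end to end: turn the entered URL into features, ask a trained model, and map its 1/0 prediction to "Phishing"/"Legitimate". The four features, the six training URLs (reserved example domains and documentation IP ranges) and the `check` helper are all hypothetical.

```python
from urllib.parse import urlparse

from sklearn.tree import DecisionTreeClassifier


def features(url):
    """Hypothetical minimal feature vector for a URL."""
    host = urlparse(url).hostname or ""
    return [len(url), sum(c.isdigit() for c in url), int("@" in url), int("-" in host)]


# Made-up training URLs (label 1 = phishing, 0 = legitimate)
train_urls = [
    ("https://www.example.com/", 0),
    ("https://docs.example.org/guide", 0),
    ("https://shop.example.net/cart", 0),
    ("http://198.51.100.7/secure-login@bank.example/verify", 1),
    ("http://login-update.example-secure.test/acc0unt/2021", 1),
    ("http://203.0.113.9/@pay.example/confirm?id=88231", 1),
]
model = DecisionTreeClassifier(random_state=0).fit(
    [features(u) for u, _ in train_urls], [label for _, label in train_urls])


def check(url):
    """Return the label shown to the user for an entered URL."""
    return "Phishing" if model.predict([features(url)])[0] == 1 else "Legitimate"


print(check("https://www.example.com/"))  # Legitimate
print(check("http://198.51.100.7/secure-login@bank.example/verify"))  # Phishing
```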

(Screenshots of sample output: one site classified as Legitimate and one as Phishing)

Thank You
