Phishing Website Detection Using Machine Learning Techniques

This document describes a project that aims to detect phishing websites using machine learning techniques. It discusses detecting both normal phishing sites and "in-session" phishing sites, which redirect users after a session timeout. Several machine learning algorithms are evaluated, including decision trees, random forests, neural networks, XGBoost, autoencoders, and support vector machines. The document outlines the modules, components needed, sample code, and expected output of detecting if a website is legitimate or phishing. The proposed approach aims to detect phishing sites without analyzing website source code and could later be developed into a browser extension.

Uploaded by

Prem Sai Swaroop

SRI CHANDRASEKHARENDRA SARASWATHI VISWA MAHAVIDYALAYA
(Deemed to be university u/s 3 of UGC act 1956)
(Accredited with “A” by NAAC)
Enathur, Kanchipuram – 631561. Tamilnadu
www.kanchiuniv.ac.in

Phishing Website Detection using Machine Learning Techniques

11179A125
Kondeti Prem Sai Swaroop
11179A127
Konka Renuka
Guided By
Ms. S. Kavishree
Assistant Professor
Dept. of CSE
Final Review
Date : 15th April 2021
Contents
⚫ Abstract
⚫ Existing System
⚫ Proposed System
⚫ Literature Reviews
⚫ Modules
⚫ Innovativeness in our approach
⚫ Components/Software needed
⚫ Sample Code
⚫ Result

Abstract
⚫ Websites are now used for almost everything, from e-commerce to entertainment.
In this project the question we ask about a website is whether it is fraudulent or
legitimate. Detecting this property is the main theme of the project. Conventionally,
the browser's protection service flags a website as harmful when it redirects to
unusual or malicious sites, and marks such sites with a symbol before the URL. Even
with the browser's firewall enabled, however, a phishing website can go undetected,
because a phishing site is not malicious in that sense: it steals data without the
user knowing it. To detect such sites, we train an ML model using different
algorithms to identify phishing sites based on URL feature extraction.
⚫ Keywords: Phishing Website; Machine Learning Model (ML Model); URL
(Uniform Resource Locator).

Existing System
• The existing system uses classifiers, a fusion algorithm, and a Bayesian
model to detect phishing sites. A text classifier classifies the text content
of a page and an image classifier classifies its image content.
• The Bayesian model estimates the threshold value. The fusion algorithm combines
the results of both classifiers and decides whether the site is phishing or not.
• The threshold value is decided by the developer alone, which leads to
false positives and false negatives. A false positive occurs when the estimated
probability of being a phishing webpage is greater than the threshold value but
the webpage is not a phishing webpage.
• A false negative occurs when the estimated probability of being a phishing
webpage is less than the threshold value but the webpage is a phishing webpage.
This reduces the level of security. The existing system handles only one kind of
phishing attack, and when it identifies a phishing site it only warns the user.
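
The effect of a hand-picked threshold can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the existing system's code: the function names and the probability scores are made up for the example.

```python
def classify(prob_phishing, threshold):
    """Flag a page as phishing when its probability exceeds the threshold."""
    return prob_phishing > threshold


def count_errors(samples, threshold):
    """Count (false positives, false negatives) for one threshold.

    samples: list of (prob_phishing, is_phishing) pairs.
    """
    false_pos = sum(1 for p, actual in samples
                    if classify(p, threshold) and not actual)
    false_neg = sum(1 for p, actual in samples
                    if not classify(p, threshold) and actual)
    return false_pos, false_neg


# Made-up scores, for illustration only.
samples = [(0.90, True), (0.70, False), (0.40, True), (0.20, False)]

for threshold in (0.3, 0.5, 0.8):
    fp, fn = count_errors(samples, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold removes false negatives at the cost of false positives, and raising it does the opposite, which is why a threshold fixed by hand is a weak point.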

Proposed System
• The proposed system can handle two types of phishing threats: normal
phishing and in-session phishing.

• In-session Phishing: This is a major kind of phishing attack. The user is
diverted by an alert message such as “Your session has timed out, please log in
again”. The user is then redirected to a phishing site, where the phisher obtains
the user's account number and password. With these, the phisher can transfer funds
from the authorized user's account without that user's knowledge.

• Detection of in-session phishing is a huge challenge. Using URL feature
extraction, we extract various details of each domain and train the model on the
digit length and character set permitted for a valid site. In this way our
proposed system can detect in-session phishing sites too.
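
A minimal sketch of URL feature extraction, using only the Python standard library. The slides do not list the project's actual features, so the feature names and the choice of features here (length, digit count, IP-address host, and so on) are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url):
    """Turn a URL into a small dictionary of numeric features.

    The feature set is illustrative, not the project's actual list.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        # A host written as a raw IPv4 address instead of a domain name.
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "dots_in_host": host.count("."),
        "uses_https": int(parsed.scheme == "https"),
    }


for url in ("https://www.example.com/", "http://192.168.10.5/login@secure"):
    print(url, extract_url_features(url))
```

Each URL becomes a fixed-length numeric record, which is the form a classifier needs and which requires no access to the page's source code.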

Literature Reviews
Author | Name of the Paper | Algorithm Used
Sahar Abdelnabi, Katharina Krombholz, Mario Fritz | VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity | Visual Similarity
Abdulhamit Subasi, Emir Kremic | Comparing AdaBoost with MultiBoosting for Phishing Website Detection | MultiBoosting
Kieran Rendall, Antonia Nisioti and Alexios Mylonas | Multi-layered Phishing Detection | Multi-layer Algorithm
Jiann-Liang Chen, Yi-Wei Ma and Kuan-Lung Huang | Intelligent Visual Similarity based Phishing Websites Detection | Visual Similarity

Modules
• The dataset is labelled, so this is a supervised machine learning task.
There are two major types of supervised machine learning problems:
classification and regression.

• This data set is a classification problem, as each input URL is
classified as phishing (1) or legitimate (0). The supervised machine learning
(classification) models trained on the dataset in this notebook are:

• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines

Modules Contd…
• Decision Tree Classifier: Decision trees are widely used in classification
and regression tasks. A tree reaches its decision by asking a sequence of
if/else questions about the input features, which makes prediction fast and
easy to interpret.

• Random Forest Classifier: A random forest is a collection of decision trees.
Multiple trees are trained and their results are combined, by averaging or
voting, to give a better result than a single tree.

• Multilayer Perceptrons: Multilayer perceptrons are feed-forward neural
networks. The input passes through several layers of neurons in sequence, and
the final layer produces the decision.
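
The first two ideas can be sketched in plain Python: a decision tree is a chain of if/else questions, and a forest combines several trees by majority vote. The three trees below are written by hand with made-up thresholds purely to show the mechanics; in a real model the questions and thresholds are learned from training data.

```python
from collections import Counter


def tree_a(features):
    # Each tree is a chain of if/else questions; 1 = phishing, 0 = legitimate.
    if features["host_is_ip"]:
        return 1
    if features["url_length"] > 75:
        return 1
    return 0


def tree_b(features):
    if features["has_at_symbol"]:
        return 1
    return 0


def tree_c(features):
    if features["digit_count"] > 10 and not features["uses_https"]:
        return 1
    return 0


def forest_predict(trees, features):
    """Return the class that receives the most votes from the trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


suspicious = {"host_is_ip": 1, "url_length": 32, "has_at_symbol": 1,
              "digit_count": 9, "uses_https": 0}
ordinary = {"host_is_ip": 0, "url_length": 24, "has_at_symbol": 0,
            "digit_count": 0, "uses_https": 1}

trees = [tree_a, tree_b, tree_c]
print(forest_predict(trees, suspicious))  # two of three trees vote 1
print(forest_predict(trees, ordinary))    # all trees vote 0
```

Using an odd number of trees avoids ties when there are only two classes.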

Modules Contd…
• XGBoost Classifier: XGBoost works the same way for classification and
regression and is designed for speed and performance. It applies gradient
boosting to decision trees.

• Autoencoder Neural Network: An autoencoder is a neural network that has the
same number of input neurons as output neurons. Its hidden layers have fewer
neurons, which are called predictors here. The input neurons pass information
to the predictors, which process it to produce the output.

• Support Vector Machines: Support vector machines, also known as support
vector networks, analyse data for classification or regression tasks. After
training on the labelled data set, the model assigns each new input to one of
two categories.
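
The autoencoder shape described above (equal input and output width, narrower hidden layer) can be sketched without any library. The layer sizes, the sigmoid activation, and the random untrained weights are assumptions chosen for illustration; the sketch shows only the structure, not a trained model.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def random_layer(n_in, n_out, rng):
    """Untrained weights and zero biases for a layer of n_out neurons."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
N_FEATURES, N_HIDDEN = 6, 3  # the hidden layer is narrower than input/output

encoder = random_layer(N_FEATURES, N_HIDDEN, rng)
decoder = random_layer(N_HIDDEN, N_FEATURES, rng)

x = [0.32, 0.09, 1.0, 1.0, 0.3, 0.0]  # made-up, already scaled feature vector
code = dense(x, *encoder)             # compressed representation
reconstruction = dense(code, *decoder)

print(len(x), len(code), len(reconstruction))  # 6 3 6
```

Training would adjust the weights so that `reconstruction` stays close to `x`, forcing the narrow hidden layer to keep only the most informative structure in the features.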

Innovativeness in our approach
• Our approach can distinguish legitimate sites from phishing sites without
inspecting the source code of the site.

• This can later be developed into a browser extension with a GUI.

• Using multiple ML techniques is more precise than depending on a single
technique or algorithm.

• During application, the techniques we use will be adapted according to the
nature of the site.

• Finding the best-performing algorithm shows how the model behaves under
various circumstances.

Components/Software Needed
• Jupyter Notebook
• System with Python installed

Techniques involved in training the model:

• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines

Sample Code
• Dataset Splitting and ML Model Creation
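
A minimal sketch of this step, assuming scikit-learn is installed. The synthetic data from `make_classification` stands in for the project's URL feature table, and the 80/20 split, the tree depth, and the random seed are arbitrary choices for the example, not values taken from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the URL feature table: 1000 rows, 16 numeric
# features, binary label (1 = phishing, 0 = legitimate).
X, y = make_classification(n_samples=1000, n_features=16, random_state=12)

# Hold out 20% of the rows to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12
)

# Create and fit one of the listed models; the others follow the same pattern.
model = DecisionTreeClassifier(max_depth=5, random_state=12)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

Comparing training and test accuracy gives a first indication of overfitting: a large gap means the model has memorised the training rows.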

Sample Code
• Module Comparison
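
One way to compare models is to score each on the same held-out labels and rank them. The sketch below uses placeholder model names and made-up prediction lists, so the numbers it prints say nothing about how the algorithms in this project actually performed.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def compare_models(y_true, predictions_by_model):
    """Return (model name, accuracy) pairs, best model first."""
    rows = [(name, accuracy(y_true, preds))
            for name, preds in predictions_by_model.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions_by_model = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 0, 0, 1, 1, 0],
    "model_c": [1, 0, 1, 1, 0, 0, 1, 0],
}

ranking = compare_models(y_true, predictions_by_model)
for name, acc in ranking:
    print(f"{name}: {acc:.3f}")
```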

Sample Code
• XGBoost Classifier

Sample Code
• XGBoost Classifier Contd…

Output
• When the name of any site the model was trained with is entered in the input
column, the result is displayed as Legitimate or Phishing.

[Screenshots: Legitimate result; Phishing result]
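
The mapping from the model's class label to the displayed text can be sketched as follows. `check_site` and `toy_predict` are hypothetical names; `toy_predict` is a stand-in rule so the example runs without a trained model.

```python
LABELS = {0: "Legitimate", 1: "Phishing"}


def check_site(url, predict):
    """Return the display text for a URL.

    predict: any callable that maps a URL to 0 or 1; in the project this
    would wrap feature extraction plus the trained model.
    """
    return LABELS[int(predict(url))]


def toy_predict(url):
    # Stand-in rule for illustration only, not a real classifier.
    return 1 if "@" in url else 0


print(check_site("https://www.example.com/", toy_predict))
print(check_site("http://192.168.10.5/login@secure", toy_predict))
```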

Thank You
