Phishing Final

Uploaded by

shailumarri29

Abstract

Phishing attacks involve cybercriminals tricking users into revealing sensitive information like passwords
and bank details. This paper explores the use of Machine Learning (ML) to detect phishing URLs by
extracting features from the URLs through lexical analysis. It evaluates the performance of several ML
algorithms, including Decision Tree (DT), Gradient Boosting Classifier (GB), Random Forest (RF), Support
Vector Machine (SVM), and CatBoost Classifier (CB), based on their detection accuracy.
The study finds that ML models can effectively classify URLs as phishing or legitimate, with different
algorithms showing varying levels of performance. The paper highlights how feature extraction from
URL structures plays a critical role in improving phishing detection accuracy.
In conclusion, ML offers a proactive approach to identifying phishing websites, with the Gradient Boosting
classifier achieving the best accuracy (97.4%) among the algorithms compared.
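As a rough illustration of what lexical analysis of a URL can look like, the sketch below computes a handful of string-based features using only the Python standard library. The function name `lexical_features` and the particular features chosen are assumptions made for this example; the slides do not list the exact features used in the project.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string (illustrative only)."""
    # Add a scheme if it is missing so that urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Crude check: the host consists only of digits and dots (e.g. an IPv4 address).
        "host_is_ip": host.replace(".", "").isdigit(),
    }


# Hypothetical URL, used only to show the output format.
features = lexical_features("http://192.168.0.1/secure-login@update")
for name, value in features.items():
    print(f"{name}: {value}")
```

Each URL becomes a fixed-length record of numbers and flags, which is the kind of input a classifier such as a Decision Tree or Random Forest can be trained on.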
Introduction

Phishing attacks are a major security threat, with attackers creating fake websites that resemble legitimate
ones to steal sensitive information, such as bank account credentials. Traditional methods like
blacklisting URLs and IP addresses are limited, as attackers can bypass them using techniques like
URL obfuscation and fast-flux. Heuristic-based methods can detect zero-hour phishing attacks but
have a high false positive rate. To improve detection, machine learning techniques are being used, as
they can analyze features of both legitimate and phishing URLs to more accurately identify phishing
websites, including those not yet recognized.
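The weakness of blacklisting can be seen in a small sketch. It assumes a naive blacklist that does exact string matching; the URL and the names `BLACKLIST` and `is_blacklisted` are hypothetical and not taken from the project.

```python
from urllib.parse import unquote

# Hypothetical blacklist holding one known phishing URL.
BLACKLIST = {"http://phish.example/login"}


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup, as a naive blacklist would perform it."""
    return url in BLACKLIST


known = "http://phish.example/login"
# Same path, but "l" is written as its percent-encoding "%6C".
obfuscated = "http://phish.example/%6Cogin"

print(is_blacklisted(known))          # True
print(is_blacklisted(obfuscated))     # False
print(unquote(obfuscated) == known)   # True
```

The obfuscated URL decodes to the blacklisted one, yet the exact-match lookup misses it. Real blacklists normalize URLs before matching, but attackers can still register new domains faster than lists are updated.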
Problem statement

The project on "Phishing URL Detection Using Machine Learning" aims to address the escalating
threat of phishing attacks by developing an intelligent system. The challenge lies in the dynamic
nature of phishing techniques, which requires a machine learning model capable of accurately detecting
malicious URLs in real time. Key objectives include effective feature engineering, selection of
appropriate algorithms for model training, achieving real-time processing capabilities, ensuring
generalization across diverse phishing attacks, and implementing robust data security measures. The
project ultimately seeks to contribute to online security by providing a proactive and adaptive
solution to identify and mitigate phishing threats.
Motivation

The motivation for using machine learning (ML) in phishing URL detection arises from the limitations of
traditional detection methods and the increasing sophistication of phishing attacks. Traditional techniques
like blacklisting URLs or IP addresses are easily bypassed by attackers using techniques such as URL
obfuscation and fast-flux, rendering them ineffective in detecting new, unknown phishing websites
(zero-hour attacks). Heuristic-based methods, while capable of detecting some phishing attempts, often suffer
from high false positive rates, making them unreliable. Machine learning, however, can analyze large datasets of
URLs and identify complex patterns that distinguish legitimate sites from phishing ones, enabling real-time
detection with greater accuracy. Additionally, ML models can continuously learn from new data, adapting to
evolving phishing tactics and automating the detection process, making it a scalable solution to combat phishing
threats.
Technical specifications
Hardware requirements

• RAM : 4 GB
• Storage : 128 GB
• Processor : Intel Core i3 or above

Software requirements

• Operating System : Windows / macOS
• Programming language : Python
• Backend Framework : Flask
• Frontend : HTML, CSS
• IDE : VS Code, Jupyter Notebook
• Libraries : NumPy, Pandas, Matplotlib, scikit-learn
Literature survey
Design and methodology
Web page URLs serve as the input data for a machine learning classification task. The first step is
pre-processing, in which the URLs are cleaned and normalized to ready them for feature extraction.
Subsequently, features are extracted from the URLs to
convert them into a format suitable for machine learning. A Gradient Boosting classifier, a type of
ensemble learning method, is chosen as the model for its ability to combine weak models into a
robust predictor. The training of the classifier is performed using a dataset containing labeled URLs,
indicating their respective classes (e.g., malicious or benign). Once trained, the classifier can be
applied to new URLs to predict their classes. The implementation, exemplified in Python using
scikit-learn, includes steps such as splitting the dataset, using TF-IDF vectorization for feature
extraction, and assessing model performance through accuracy evaluation on a test set. This process
offers a systematic approach to discerning the nature of URLs and can be adapted for various
applications in web security and content filtering.
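The workflow described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the URLs are made-up toy data, and the character n-gram settings, split ratio, and random seeds are choices made for this example.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy, made-up URLs for illustration only; label 1 = phishing, 0 = legitimate.
legit = [f"https://www.example{i}.com/about" for i in range(20)]
phish = [f"http://secure-login{i}.verify-account.xyz/update?id={i}" for i in range(20)]
urls = legit + phish
labels = [0] * len(legit) + [1] * len(phish)

# Split the labeled URLs into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    urls, labels, test_size=0.25, stratify=labels, random_state=0
)

# TF-IDF over character n-grams turns each URL string into a numeric vector,
# which is then fed to the Gradient Boosting classifier.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    GradientBoostingClassifier(random_state=0),
)
model.fit(X_train, y_train)

# Evaluate accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Character-level n-grams are used here because URLs do not split cleanly into words; a word-level TF-IDF would be an equally valid reading of the description above. Accuracy on toy data like this says nothing about performance on real phishing datasets.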
Block Diagram
Implementation

• Data collection
• Data preprocessing
• Feature extraction
• Model training
• Prediction using various algorithms (Gradient Boosting tree classifier, Decision Tree, Random Forest)
• Evaluating the model
• Result
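The steps listed above can be sketched end to end as follows, comparing the three named algorithms by accuracy. The data is made up, and the `simple_features` helper with its three features is an assumption for illustration rather than the project's actual feature set.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def simple_features(url):
    """Three illustrative lexical features: length, digit count, dot count."""
    return [len(url), sum(c.isdigit() for c in url), url.count(".")]


# Data collection: made-up URLs (label 1 = phishing, 0 = legitimate).
legit = [f"https://shop{i}.example.com/" for i in range(30)]
phish = [f"http://198.51.100.{i}/account.verify.login.update/{i * 7919}" for i in range(30)]

# Preprocessing and feature extraction.
X = [simple_features(u) for u in legit + phish]
y = [0] * len(legit) + [1] * len(phish)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Model training and prediction with each algorithm.
models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Evaluation: list the models from best to worst accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Keeping the models in a dictionary makes it easy to add further classifiers, such as Logistic Regression or SVM, without changing the evaluation loop.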
Result
Conclusion and Future scope
• In this project, we implemented several Machine Learning algorithms, including Decision Tree,
Gradient Boosting, Logistic Regression, Random Forest, Support Vector Machine, and CatBoost.
These algorithms are among the most widely used for phishing URL classification. We adopted a lexical
analysis approach to extract URL features and calculated the accuracy of each algorithm.
We presented the results in a table that compares the algorithms by accuracy score.
The Gradient Boosting classifier achieved the best accuracy score, 97.4%. In future work, we aim to
build on the results presented in this document to develop a model capable of detecting both
simple URLs and those based on Machine Learning. We will also consider the following:
• Introduce the URL HTML Encoding and URL Hit approaches to extract URL features
• Use other performance metrics: Specificity, Confusion matrix.
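As a sketch of what these additional metrics involve, the example below builds the four confusion-matrix counts by hand and derives specificity from them. The labels and predictions are invented for illustration and are not results from the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = phishing."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Invented example: 3 phishing URLs and 5 legitimate URLs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)   # share of legitimate URLs correctly passed
sensitivity = tp / (tp + fn)   # share of phishing URLs correctly caught

print(f"Confusion matrix: TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy={accuracy:.2f} Specificity={specificity:.2f} Sensitivity={sensitivity:.2f}")
```

Specificity matters for phishing detection because a low value means legitimate sites are being blocked, which accuracy alone can hide when the classes are imbalanced.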
References

• Kuraku & Kalla (2023): Examines machine learning and NLP for phishing detection,
focusing on models like Random Forests and SVM.

• IEEE Survey (2024): Reviews machine learning techniques, challenges, and datasets
in phishing URL detection.

• Hybrid Features Study: Explores combining URL features with hyperlink structures for
improved detection.
