Phishing URL Detection Using ML: Project Report

This document summarizes a project report on detecting phishing URLs using machine learning. It contains 13 chapters, including an abstract, introduction, literature review on related work and their limitations, problem formulation, objectives and proposed algorithm, methodology, flow diagrams, implementation, results and discussion, conclusion and future work, code snippets, appendix, and references. The literature review analyzes and compares three previous studies on phishing detection using machine learning approaches and identifies their advantages and limitations. The proposed method uses supervised learning techniques like ensemble learning algorithms and resource description framework models to classify websites with high true positive and accuracy rates while reducing false positives.

Uploaded by

Krishna Arjun

Phishing URL Detection using ML

PROJECT REPORT

Submitted by

18BCE0256 - Y. Krishna Chaitanya

18BCE0257 - M. Raja Sekhar
19BCE0512 - S. Venkata Phani Kumar

SUBMITTED FOR THE COURSE

CSE4003 Cyber Security

( Slot E1 )

Computer Science and Engineering

Under the guidance of

Prof. Manikandan. K

TABLE OF CONTENTS

CHAPTER NO. TITLE

1 Abstract
2 Introduction
3 Literature Review
4 Problem Formulation
5 Objectives and Algorithm
6 Methodology
7 Flow Diagrams
8 Implementation
9 Results and Discussion
10 Conclusion and Future Work
11 Code Snippets
12 Appendix
13 References

1. Abstract :

Phishing is one of the major threats of the internet era. In a phishing
attack, a legitimate website is cloned and victims are lured to the fake
site, where they give away personal and confidential information, which
can prove costly. Most websites display a disclaimer warning users about
phishing, but users tend to ignore it, and there is not much more that
the websites themselves can do. Because phishing has persisted for a long
time, many approaches for detecting phishing websites have been proposed,
but very few, if any, accurately identify the target websites of these
attacks.

In our proposed method we identify phishing websites using a combined
approach: we construct Resource Description Framework (RDF) models and
use ensemble learning algorithms to classify websites. The system is
trained with supervised learning techniques. The approach achieves a
true positive rate of 98.8%. Because we used a random forest classifier,
which can handle missing values in the dataset, we were able to reduce
the false positive rate of the system to 1.5%. By combining the strengths
of RDF and ensemble learning methods, the system achieves an accuracy of
98.68%.
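The true positive rate, false positive rate and accuracy quoted above all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic only; the counts are hypothetical and are not the results of this project.

```python
# Hypothetical confusion-matrix counts, chosen only for illustration;
# they are NOT the results reported in this project.
tp, fn = 90, 10   # phishing sites: correctly flagged / missed
fp, tn = 5, 95    # legitimate sites: wrongly flagged / correctly passed


def rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return true positive rate, false positive rate and accuracy."""
    return {
        "tpr": tp / (tp + fn),                     # share of phishing sites caught
        "fpr": fp / (fp + tn),                     # share of legitimate sites flagged
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


result = rates(tp, fn, fp, tn)
print(result)
```

A high true positive rate together with a low false positive rate matters more than accuracy alone, because accuracy can look good even when one class is handled poorly.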

2. Introduction:

As COVID-19 spreads around the world, the use of the web and online
services is accelerating, confirming the importance of this technology in
the modern world.

Phishing attacks are among the most widely recognized online security
threats. The purpose of this fraud is to imitate a real website, such as
an internet banking, e-commerce, or social networking site, in order to
acquire confidential data such as usernames, passwords, and financial and
health-related information from potential victims.

3. Literature Review :

Title / Objectives / Limitations

1. Title: Detection of phishing URLs using Machine Learning techniques

Objectives: Classification models that detect phishing web sites by the
analysis of lexical and host-based features of URLs. They analyzed
different classifying algorithms in the Waikato Environment for
Knowledge Analysis (WEKA) workbench and MATLAB.

Limitations: The primary advantage of blacklists is that querying is a
low overhead operation: the lists of malicious sites are precompiled, so
the only computational cost of deployed blacklists is the lookup
overhead. However, the need to construct these lists in advance gives
rise to their disadvantage that blacklists become stale. Network
administrators block existing malicious sites, and enforcement efforts
take down criminal enterprises behind those sites. There is a constant
pressure on criminals to construct new sites and to find new hosting
infrastructure. As a result, new malicious URLs are introduced and
blacklist providers must update their lists yet again. However, in this
process, criminals are always ahead because Web site construction is
inexpensive. Moreover, free services for blogs e.g., Blogger and personal
hosting e.g., Google Sites, Microsoft Live Spaces provide another
inexpensive source of disposable sites.

2. Title: Detection and Prevention of Phishing Websites using Machine

Objectives: In this paper, they discussed three approaches for detecting
phishing websites. First is by analyzing various features of the URL,
second is by checking legitimacy of a website by knowing where the
website is being hosted and who is managing it, the third approach uses
visual appearance based analysis for checking genuineness of the
website. We make use of Machine Learning techniques and algorithms for
evaluation of these different features of URL and websites.

Limitations: The drawback of this system is detecting some minimal false
positive and false negative results. These drawbacks can be eliminated
by introducing much richer features to feed to the machine learning
algorithm that would result in much higher accuracy.

3. Title: Phishing Detection: A Recent Intelligent Machine Learning
Comparison based on Models Content and Features.

Objectives: In this paper, they critically analysed recent studies
related to phishing in the research literature based on ML techniques.
We show how these ML approaches derive the classification models and
their advantages and disadvantages. More importantly, we investigate
in-depth eight ML techniques on real datasets related to phishing and
perform thorough comparisons of these techniques. The aim of the
comparisons is to determine a suitable approach that may serve as an
anti phishing tool, based on the model content as well as the detection
rate of phishing activities.

Limitations: Decision trees, Bayes Net, and SVM achieved good detection
rates. However, models extracted by decision trees showed very large
amounts of information which may overwhelm novice users and security
experts, and thus will be hard to manage or understand. Moreover, Bayes
Net and SVM showed good performance with respect to accuracy, yet their
models are hard to understand by end-users.

4. Title: Phishing Website Detection Using URL-Assisted Brand Name
Weighting System

Objectives: Weight to the extracted words from website content. To form
the final weights, the URL weighting system computes further weight to
be added up with the initial weight of the words. Based on the final
weights, a few of the words are selected as the brand name. The brand
name is then submitted to the search engine to retrieve the domain name
with the highest number of occurrences among the search results.

Limitations: Finally, WHOIS lookups are performed to obtain domain name
owners. A successful match of the domain name owner will conclude the
query website as a legitimate website. Otherwise, the query website will
be labelled as a phishing website. A detailed explanation of each
component will be discussed in the following subsection. This section
discusses the steps to extract plain text and URLs from the HTML source
code. When a web page is loaded, the browser creates a Document Object
Model (DOM) of the page. DOM defines the HTML content in structured
nodes.

5. Title: Phishing Detection via Identification of Website Identity

Objectives: Phishers will rip off the visual components (i.e., logo,
emblem or trademark) from the legitimate website and use them in their
phishing website. In order to detect phishing websites, the first
question to ask is: how to differentiate a phishing website from a
legitimate website given the fact that they look identical? If we can
somehow determine the real identity of a query website (if the query
website is a phishing website, the real identity will be the identity of
the targeted website), we can then differentiate them.

Limitations: Knowing the phishers will use the visual components ripped
off from the legitimate website, especially the logo in their phishing
websites, this motivates us to propose an anti-phishing method based on
the identification of website identity through the logo. This is
rational as the logo is usually representing the identity of a
legitimate website. In this paper, the proposed method involves two main
processes: logo segmentation and website identity identification.

4. Problem Formulation :

Most existing methodologies implement a blacklisting and whitelisting
process: the URL is compared with existing lists of URLs, and the
decision is based on the list to which the URL belongs.
In our project we extract features from the URL and build a model using
ML; we then use that model to predict the legitimacy of an input URL.
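To make the contrast concrete, here is a minimal sketch of a pure list lookup. The list contents and the function name `list_lookup` are hypothetical and exist only for illustration; the point is that a URL on neither list cannot be judged at all.

```python
from urllib.parse import urlparse

# Hypothetical lists for illustration; real deployments use much larger feeds.
BLACKLIST = {"goggle.com", "apple-icloud-login.com"}
WHITELIST = {"google.com", "apple.com"}


def list_lookup(url: str) -> str:
    """Classify a URL by list membership only."""
    host = (urlparse(url).hostname or "").lower()
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "safe"
    return "unknown"  # a newly registered phishing domain ends up here


print(list_lookup("http://goggle.com/login"))              # on the blacklist
print(list_lookup("https://google.com/"))                  # on the whitelist
print(list_lookup("http://secure-paypa1-login.example/"))  # on neither list
```

A feature-based model still produces a prediction for the third URL, which is the motivation for the approach taken in this project.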

5. Objectives and Algorithm :

The main objective of this project is to predict whether a given URL is
phishing or safe.

5.1. Decision Trees (DTs) :

Decision Trees are a non-parametric supervised learning method used for
classification and regression. The goal is to create a model that predicts
the value of a target variable by learning simple decision rules inferred
from the data features. A tree can be seen as a piecewise constant
approximation.

Merits :
● Simple to understand and to interpret. Trees can be visualised.
● Requires little data preparation. Other techniques often require data
normalisation, the creation of dummy variables, and the removal of
blank values. Note, however, that this module does not support
missing values.
● DT can handle both numerical and categorical data.
● Decision trees provide a clear indication of which fields are most
important for prediction or classification.

Demerits :

● Decision-tree learners can create over-complex trees that do not
generalise well from the data. This is called overfitting. Mechanisms
such as pruning, setting the minimum number of samples required
at a leaf node, or setting the maximum depth of the tree are
necessary to avoid this problem.
● Decision trees can be unstable because small variations in the data
might result in a completely different tree being generated.
● Decision trees can be computationally expensive to train.
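The overfitting control described above can be sketched with scikit-learn's `DecisionTreeClassifier`. This is an illustration under stated assumptions, not the project's code: the data is synthetic and noisy, and the limits `max_depth=4` and `min_samples_leaf=10` are arbitrary example values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data stands in for the real URL features (an assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Unrestricted tree: grows until every training sample is classified.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Constrained tree: depth and leaf-size limits act as pre-pruning.
shallow = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                                 random_state=0).fit(X_train, y_train)

for name, tree in [("deep", deep), ("shallow", shallow)]:
    print(name, "depth:", tree.get_depth(),
          "train:", round(tree.score(X_train, y_train), 3),
          "test:", round(tree.score(X_test, y_test), 3))
```

The unrestricted tree memorises the training set, noise included, so a gap between its training and test scores is the usual symptom of overfitting.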

5.2. Logistic Regression :

Logistic regression is one of the most popular machine learning
algorithms and is a supervised learning technique. It predicts a
categorical dependent variable from a given set of independent
variables, so the outcome must be a categorical or discrete value such
as Yes or No, 0 or 1, True or False. Rather than returning exactly 0 or
1, however, it returns a probability between 0 and 1, which is then
thresholded to obtain the class.
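The probability comes from passing a weighted sum of the features through the logistic (sigmoid) function. The sketch below uses only the standard library; the weights, bias and feature values are hypothetical and are not taken from the trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and features, for illustration only.
weights = [0.8, -1.2, 0.5]
bias = -0.3
features = [1.0, 0.0, 2.0]

p = predict_proba(features, weights, bias)
label = 1 if p >= 0.5 else 0   # threshold the probability to get a class
print(round(p, 3), label)
```

The 0.5 threshold is a common default; it can be moved to trade false positives against false negatives.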

On the basis of the categories, Logistic Regression can be classified into
three types:

● Binomial: In binomial logistic regression, the dependent variable has
only two possible types, such as 0 or 1, Pass or Fail, etc.

● Multinomial: In multinomial logistic regression, the dependent
variable has 3 or more possible unordered types, such as "cat", "dog",
or "sheep".

● Ordinal: In ordinal logistic regression, the dependent variable has 3
or more possible ordered types, such as "Low", "Medium", or "High".

5.3. SVM - Support Vector Machine :

Support Vector Machine (SVM) is one of the most popular supervised
learning algorithms. It is used for both classification and regression
problems, but primarily for classification.

The goal of the SVM algorithm is to find the best decision boundary
that separates n-dimensional space into classes, so that new data
points can be placed in the correct category. This best decision boundary
is called a hyperplane. SVM chooses the extreme points/vectors that help
in creating the hyperplane. These extreme cases are called support
vectors, hence the name Support Vector Machine.

SVM is classified into two types:

● Linear SVM: used for linearly separable data. If a dataset can be
divided into two classes by a single straight line (a hyperplane in
higher dimensions), the data is termed linearly separable and the
classifier is called a Linear SVM classifier.

● Non-linear SVM: used for data that is not linearly separable. If a
dataset cannot be divided by a straight line, the data is termed
non-linear and the classifier is called a Non-linear SVM classifier.
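The difference between the two types can be seen on data that no straight line separates. The sketch below assumes scikit-learn and uses its synthetic `make_circles` dataset; the RBF kernel is one common non-linear choice and is not necessarily the one used in this project.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)   # non-linear kernel

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print("linear:", round(linear_acc, 3), "rbf:", round(rbf_acc, 3))
print("support vectors per class (rbf):", rbf_svm.n_support_)
```

The `n_support_` attribute reports how many training points ended up as support vectors, which are the only points that determine the boundary.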

5.4. Random Forest :

Random forest is a popular machine learning algorithm that belongs to
the supervised learning technique. It can be used for both classification
and regression problems. It is based on the concept of ensemble
learning, which is the process of combining multiple classifiers to solve
a complex problem and to improve the performance of the model.

Random Forest is a classifier that contains a number of decision trees
built on various subsets of the given dataset and combines their outputs
to improve predictive accuracy. Instead of relying on one decision
tree, the random forest takes the prediction from each tree and predicts
the final output based on the majority vote.

A greater number of trees in the forest generally makes the predictions
more stable and reduces the risk of overfitting compared with a single
tree, although the gain levels off as more trees are added.

Assumptions for Random Forest:

Since the random forest combines multiple trees to predict the class of
a sample, some decision trees may predict the correct output while
others may not. Taken together, however, the trees are expected to
predict the correct output. Two assumptions therefore make for a better
Random Forest classifier:

● The feature variables of the dataset should contain some actual values,
so that the classifier can predict accurate results rather than
guessed results.

● The predictions from the individual trees must have very low correlations.

Advantages of Random Forest:

● Random Forest is capable of performing both classification and
regression tasks.

● It is capable of handling large datasets with high dimensionality.

● It enhances the accuracy of the model and reduces the overfitting issue.

Disadvantages of Random Forest:

● Although random forest can be used for both classification and
regression tasks, it is less suitable for regression tasks.

● Training and storing many trees makes it more costly than a single
decision tree.
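The voting idea can be made visible by asking each tree in a fitted forest for its own prediction. The sketch below assumes scikit-learn and synthetic data; it illustrates the mechanism and is not the project's training code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the phishing dataset (an assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Collect each tree's prediction and take a hard majority vote.
votes = np.stack([tree.predict(X_test) for tree in forest.estimators_])
majority = (votes.mean(axis=0) > 0.5).astype(int)

# scikit-learn itself averages the trees' class probabilities, so its
# output can differ from a hard vote on a few borderline samples.
agreement = (majority == forest.predict(X_test)).mean()
print("trees:", len(forest.estimators_))
print("test accuracy:", round(forest.score(X_test, y_test), 3))
print("agreement with hard vote:", round(agreement, 3))
```

Because every tree is trained on a different bootstrap sample with random feature subsets, the individual predictions are only weakly correlated, which is what makes combining them worthwhile.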

6. Methodology :

Phishing Domain Detection using Feature Engineering

There are many algorithms and a wide variety of data types for phishing
detection in the academic literature and in commercial products. A
phishing URL and the corresponding page have several features which can
be differentiated from a malicious URL. For example, an attacker can
register a long and confusing domain to hide the actual domain name
(Cybersquatting, Typosquatting). In some cases attackers can use direct
IP addresses instead of the domain name. This type of event is out of
our scope, but it can be used for the same purpose. Attackers can also
use short domain names which are irrelevant to legitimate brand names
and don't have any FreeUrl addition. These types of websites are also out
of our scope, because they are more relevant to fraudulent domains than
to phishing domains.

Besides URL-Based Features, academic studies use different kinds of
features in the machine learning detection process. The features
collected from academic studies on phishing domain detection with
machine learning techniques are grouped as given below.

1. URL-Based Features
2. Domain-Based Features
3. Page-Based Features
4. Content-Based Features


URL-Based Features

The URL is the first thing to analyse when deciding whether a website is
phishing or not. As mentioned before, URLs of phishing domains have
some distinctive points. Features related to these points are obtained
when the URL is processed. Some URL-Based Features are given below.
1. Digit count in the URL
2. Total length of the URL
3. Whether the URL is typosquatting (google.com → goggle.com)
4. Whether it includes a legitimate brand name (apple-icloud-login.com)
5. Number of subdomains in the URL
6. Whether the Top Level Domain (TLD) is one of the commonly used ones
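The six features listed above can be sketched with the standard library alone. Everything named here is an assumption made for illustration: the `COMMON_TLDS` and `BRANDS` sets, the function names, and the rule that an edit distance of exactly 1 from a brand name counts as typosquatting. The project's own extraction module may work differently.

```python
from urllib.parse import urlparse

# Illustrative assumptions, not the project's actual lists.
COMMON_TLDS = {"com", "org", "net", "edu", "gov"}
BRANDS = {"google", "apple", "paypal"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def url_features(url: str) -> dict:
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Naive split: treats the last label as the TLD, so multi-part
    # suffixes such as "co.uk" are not handled.
    tld = labels[-1]
    name = labels[-2] if len(labels) >= 2 else ""
    return {
        "digit_count": sum(ch.isdigit() for ch in url),
        "url_length": len(url),
        "subdomain_count": max(len(labels) - 2, 0),
        "common_tld": tld in COMMON_TLDS,
        # Brand appears in the host, but the host is not the brand's own domain.
        "embeds_brand": any(b in host and name != b for b in BRANDS),
        # One edit away from a brand name, e.g. google -> goggle.
        "typosquat": any(edit_distance(name, b) == 1 for b in BRANDS),
    }


print(url_features("http://goggle.com"))
print(url_features("https://apple-icloud-login.com/signin"))
```

Each URL becomes a fixed-length dictionary of numeric and boolean values, which is the form a classifier needs as input.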

machine learning algorithms and each algorithm has its own working
mechanism. In this project, we have explained the Decision Tree
Algorithm, because I think this algorithm is simple and powerful.

Modules included:
1. Data training : We used the Random Forest algorithm to train on our
data set.
2. FrontEnd and Server maintenance : A localhost server is created and
all the required HTML files are hosted there. This module takes care of
the flow of data among the program files.
3. Extracting URL features : This module takes the URL, passes it through
various filters, and extracts features such as domain, protocol,
sub-domain, SSL certificate, etc.
4. Predicting URL type : This module takes the output of the previous
module, processes it, and assigns the URL a flag value, which later
helps in identifying its safety.

7. Flow Diagrams :

Fig1. Module flow diagram

Fig2. Model selection flow diagram

8. Implementation :

9. Results and Discussion :

● System info :

○ Hardware specifications :
- Intel core processor ( i5 - recommended )
- Memory : 2GB ( 4GB - recommended )
- Disk space : 1GB ( >1GB - recommended )

○ Software specification :
- Windows OS
- Python3 : with required modules
- Jupyter notebook ( Google Colab - recommended )
- IDE to code the front end ( VS Code - recommended )
- Browser ( Chrome - recommended )
- Modules supporting server maintenance

● Dataset :
- Phishcoop.csv ( taken from UCI-repo )
- Contains 11055 entries each with 32 - attributes
- No null entries
- 6157 - positive examples, 4898 - negative examples
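The class counts above give a useful reference point: the accuracy that a trivial model would reach by always predicting the larger class. The short calculation below uses only the numbers reported for the dataset.

```python
# Class counts as reported for the dataset above.
positive, negative = 6157, 4898
total = positive + negative

# Accuracy of a trivial model that always predicts the larger class.
majority_baseline = max(positive, negative) / total

print(total)                          # matches the 11055 entries reported
print(round(majority_baseline, 4))
```

Any trained classifier should be judged against this baseline rather than against 50%, because the two classes are not perfectly balanced.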

● Input type :
- URL of a site to be verified

● We used different algorithms to generate our model:
○ Decision Tree
○ Logistic Regression
○ Support-Vector Machine
○ Random Forest
Of these, Random Forest gave the highest accuracy score, 98.6%, so we
used it to generate finalised_model.pkl, which predicts the input URLs.

● RF achieved an accuracy of 98.6% in testing; after storing the
predicted values and testing again, it gave an accuracy of 99.4%.
● The model generated by RF is saved and passed to a validation.py file
to predict the legitimacy of the input URL.
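Saving a fitted model and loading it again in a separate script can be sketched with Python's `pickle` module. This is an assumption about how the `.pkl` file could be produced, not a copy of the project's code; the data is synthetic and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the extracted URL features (an assumption).
X, y = make_classification(n_samples=500, n_features=30, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Save the fitted model; the file name follows the one mentioned above.
model_path = os.path.join(tempfile.mkdtemp(), "finalised_model.pkl")
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)

# Later, e.g. in a separate validation script, load it and predict.
with open(model_path, "rb") as fh:
    loaded = pickle.load(fh)

same = np.array_equal(model.predict(X), loaded.predict(X))
print("predictions identical after reload:", same)
```

Pickle files can execute code when loaded, so a model file should only be unpickled if it comes from a trusted source, and with the same library version that wrote it.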

10. Conclusion and Future work :

This project helps in detecting phishing attacks carried out against
individuals and organizations. It proposes the use of technological
factors and the human factor to counter the threat posed by phishing
attacks, and we believe it can effectively help to reduce their impact
on individuals.

Because of the threat posed by phishing attacks, more research should be
carried out to add to existing knowledge and solutions. Hackers are still
creating new ways to exploit human trust. A more adequate technique for
model testing should also be considered, so that a model can be better
validated before its deployment in the real world.

Future work :
Our project has some limitations in checking multiple domains and IP
addresses, which we plan to overcome in future. We will then try to
package it as a Chrome extension and deploy it for real use.

11. Code Snippets :

Fig3. Loading dataset

Fig4. Decision tree rules generation

Fig5. Decision tree accuracy score

Fig6. Logistic regression correlation among features

Fig7. Logistic regression accuracy score

Fig8. SVM accuracy score

Fig9. Random Forest implementation

Fig10. Random Forest accuracy score

Fig11. Accuracy scores

Fig12. Accuracy comparison graph
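
The figures in this chapter were screenshots and are not reproduced in this text. As a rough stand-in, the sketch below shows how the four classifiers could be trained and compared with scikit-learn. It runs on synthetic data with default hyperparameters, both of which are assumptions, so its scores will not match the accuracies reported in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; scores will not match the figures in this report.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support-Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
best = max(scores, key=scores.get)
print("best model:", best)
```

Using one fixed train/test split for all models keeps the comparison fair; cross-validation would give a more reliable ranking at the cost of more training time.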

12. Appendix :

● Colab file :
https://colab.research.google.com/drive/1ehQDur3iPhPpa2r2GArdtF5Qv6DtRPjR?usp=sharing

● Project files :
https://github.com/Krishnachaitanya-learn/Phishing_detection4QgbIchQfxkmOCllw4CX4X_GV?usp=sharing

13. References :

1. James, J., Sandhya, L., & Thomas, C. (2013, December).

Detection of phishing URLs using machine learning techniques.
In 2013 International conference on control communication and
computing (ICCC) (pp. 304-309). IEEE.
2. Patil, V., Thakkar, P., Shah, C., Bhat, T., & Godse, S. P. (2018,
August). Detection and prevention of phishing websites using
machine learning approach. In 2018 Fourth international conference
on computing communication control and automation (ICCUBEA)
(pp. 1-5). IEEE.

3. Abdelhamid, N., Thabtah, F., & Abdel-jaber, H. (2017, July).
Phishing detection: A recent intelligent machine learning comparison
based on models content and features. In 2017 IEEE international
conference on intelligence and security informatics (ISI) (pp. 72-77).
IEEE.
4. Bhat, T., & Godse, S. P. (2018, August). In 2018 Fourth international
conference on computing communication control and automation
(ICCUBEA) (pp. 1-5). IEEE.
5. Thomas, C. (2019, December). Detection of phishing URLs
using machine learning techniques. recent intelligent machine
learning comparison based on models content and features. In
2020 IEEE international conference on intelligence and security
informatics (ISI) (pp. 71-87).
