Major Project Final Report
TAMILNADU, INDIA
PHISHING WEBSITES DETECTION USING MACHINE
LEARNING
ABSTRACT
LIST OF ACRONYMS AND
ABBREVIATIONS
INTRODUCTION
Introduction
The aim of this project is to develop a machine learning system that automatically
identifies and flags potentially dangerous phishing websites. The ultimate goal is to
give users an extra layer of protection against scams and phishing attacks, minimizing
the risk of identity theft and financial fraud. By analyzing website content, behavior,
and other relevant factors, a machine learning algorithm can learn to distinguish
between legitimate and fraudulent websites and to make more accurate predictions as it
sees more data. This helps users make better-informed decisions about which websites
to trust and which to avoid.
Project Domain
The scope of phishing website detection using machine learning is significant and
offers a range of potential benefits. Some of the key areas where machine learning
can be applied to this problem include:
Early Detection
Real-Time Detection
With machine learning, it is possible to detect phishing sites in real-time, which can
help prevent potential victims from falling prey to these scams. One approach to real-
time phishing website detection using machine learning is to use supervised learning
algorithms, such as decision trees, random forests, and support vector machines, that
are trained on a large dataset of known phishing and legitimate websites. These
algorithms can then be used to classify new websites as either phishing or legitimate
based on their features.
Accuracy
Scalability
As the number of new phishing sites continues to grow, machine learning can help
scale the detection and screening process, allowing organizations to stay ahead of
the curve. By leveraging large datasets of known phishing and legitimate websites,
machine learning algorithms can identify patterns and features that are common to
phishing sites, such as the use of misleading URLs, the presence of suspicious links,
and the incorporation of fake login pages. This allows for the automation of the
detection process and the efficient screening of large volumes of web traffic.
Reduced Cost
LITERATURE REVIEW
J. Shad et al. [4] note that the Web can harm users by stealing confidential
information such as account IDs, user names, and passwords. Phishing is a social
engineering attack, increasingly directed at mobile devices, that can result in
financial losses. The paper describes a number of detection techniques based on URL
and hyperlink features that can be used to differentiate between phishing and
legitimate websites. It identifies six main approaches: heuristic, blacklist, fuzzy
rule, machine learning, image processing, and CANTINA-based. It provides a good
overview of the phishing problem, current machine learning solutions, and directions
for future study of phishing threats using machine learning.
A. Kumar et al. [5], in the International Conference on Internet of Things: Smart
Innovation and Usages (pp. 1-6, IEEE), propose a machine learning-based approach for
detecting phishing websites by analyzing features such as the URL, domain, and
content. The approach is evaluated on a dataset of phishing and legitimate websites
and achieves high accuracy.
M. Karabatak et al. [7] note that numerous anti-phishing systems are being developed
to identify phishing content in online communication systems. Despite the
availability of many anti-phishing systems, phishing continues unabated because of
inadequate detection of zero-day attacks, unnecessary computational overhead, and
high false rates. Although machine learning approaches have achieved promising
accuracy rates, the choice and performance of the feature vector limit their
effective detection. The work proposes an enhanced machine learning-based predictive
model to improve the efficiency of anti-phishing schemes.
Naik et al., [8] proposed a new approach to detecting phishing websites using
convolutional neural networks (CNNs) and a feature extraction technique based on
the distribution of ASCII characters in the URL. This paper provides a survey of the
existing literature on phishing website detection using machine learning. It covers
different approaches and techniques, as well as challenges and future directions.
T. Peng et al. [9] observe that phishing attacks are among the most common and least
defended security threats today. They present an approach that uses natural language
processing techniques to analyze text and detect inappropriate statements that are
indicative of phishing attacks.
Y. Sonmez et al. [10] note that phishing commonly targets credulous people, making
them disclose personal information through counterfeit websites. Phishing URLs aim to
steal personal information such as usernames, passwords, and online banking details.
Phishers use websites that are visually and semantically similar to the real ones. As
technology advances, phishing techniques evolve rapidly, and anti-phishing mechanisms
are needed to detect and prevent them. Machine learning is a powerful tool for
combating phishing attacks. The paper surveys the features used for detection and the
machine learning techniques used to detect phishing.
PROJECT DESCRIPTION
Existing System
In the existing system, detection relies on manual human intervention, which scales
poorly and is error-prone. Legacy and conventional data mining algorithms cannot deal
with huge volumes of data and are slower and less accurate. The existing system is
slow to process its modules, takes more time to compute the accuracy of the
classifiers, and produces less accurate results than the proposed system.
Proposed System
Machine learning is a cutting-edge approach used in a wide range of applications. It
can handle very large amounts of data, benefits from refined and revised algorithms,
and can draw on the heavy processing power of GPUs. The proposed system processes
data faster than the existing system, takes less time to provide results, and gives
better accuracy.
Feasibility Study
The feasibility of the project is to show if machine learning can effectively detect
phishing websites. The accuracy, precision, and recall of the models will determine
if they can be implemented for real-world applications. The study will also reveal
the most effective features for identifying phishing websites, which can be used to
improve the accuracy of the models.
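The metrics named above can be computed directly from true and predicted labels. The following is a minimal sketch rather than the project's evaluation code: the function name `binary_metrics`, the sample labels, and the convention that label 1 means "phishing" are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = phishing, 0 = legitimate (assumed convention).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision matters when false alarms are costly (legitimate sites being blocked), while recall matters when missed phishing sites are costly, so looking at both is usually more informative than accuracy alone.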
Economic Feasibility
This study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can pour into the research
and development of the system is limited. The expenditures must be justified. Thus,
the developed system is well within the budget and this was achieved because most
of the technologies used are freely available. Only the customized products had to
be purchased.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. A system must not place a high demand on the available
technical resources, since this in turn places high demands on the client. The
developed system has modest requirements, as only minimal or no changes are required
to implement it.
Social Feasibility
The aspect of the study is to check the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user
must not feel threatened by the system, instead must accept it as a necessity. The
level of acceptance by the users solely depends on the methods that are employed
to educate the user about the system and to make him familiar with it. His level of
confidence must be raised so that he is also able to make some constructive criticism,
which is welcomed, as he is the final user of the system.
System Specification
Requirement analysis is a very critical process that enables the success of a system
or software project to be assessed. Requirements are generally split into two types:
Functional and non-functional requirements.
Functional Requirements
These are the requirements that the end user specifically demands as basic facilities
that the system should offer. All these functionalities need to be necessarily incor-
porated into the system as a part of the contract. These are represented or stated in
the form of input to be given to the system, the operation performed and the output
expected. They are basically the requirements stated by the user which one can see
directly in the final product, unlike the non-functional requirements.
Non-functional requirements
These are basically the quality constraints that the system must satisfy according to
the project contract. The priority or extent to which these factors are implemented
varies from one project to another. They are also called non-behavioral requirements.
They basically deal with issues like:
• Portability
• Security
• Maintainability
• Reliability
• Scalability
• Performance
• Reusability
Examples of non-functional requirements
• Emails should be sent with a latency of no greater than 12 hours from such an
activity.
• The processing of each request should be done within 10 seconds
• The site should load in 3 seconds whenever simultaneous users are 10000
Hardware Specification
Software Specification
The details of the User and the Admin are private and should not be released unless
the user agrees to share them for a specific reason. For example, the details of the
user will not be shared with the Admin until the user chooses a course to enroll in
and fills in the details required for the enrollment process. Similarly, the user
cannot access the Admin's data until the Admin agrees to share it.
Details provided during registration determine, on a role basis, which services can
be provided or used through this web application. A user cannot use the admin
features until admitted as an admin.
METHODOLOGY
General Architecture
The architecture in Figure 4.1 has a data collection component at the beginning,
followed by a preprocessing component. The preprocessed data would then be fed into a
feature selection
component, followed by a model selection component, and then a model training
component. The trained model would then be evaluated using a model evaluation
component. The final component would be a model deployment component, which
would deploy the model in a real-time environment. Finally, a monitoring and up-
dates component would be used to ensure that the model remains effective over time.
Data Flow Diagram
Inputs:
The inputs for the system would be the URLs or web addresses of the websites that
need to be checked for phishing. This could be provided by a human user or gener-
ated automatically by a system that collects URLs from various sources.
Preprocessing:
The URLs would then be preprocessed to extract the relevant features that could be
used in the machine learning model. Features could include characteristics such as
the length of the URL, the number of subdomains, the presence of suspicious key-
words, etc.
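The preprocessing step can be sketched with the standard library alone. This is an illustrative sketch, not the project's feature extractor: the function name `extract_url_features`, the keyword list, and the rule that every host label beyond the last two counts as a subdomain are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def extract_url_features(url):
    """Turn a URL string into a small dictionary of numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = [part for part in host.split(".") if part]
    # Rough heuristic (assumption): everything before the last two labels
    # ("example.com") is treated as a subdomain.
    num_subdomains = max(len(labels) - 2, 0)
    lowered = url.lower()
    keyword_hits = sum(1 for kw in SUSPICIOUS_KEYWORDS if kw in lowered)
    return {
        "url_length": len(url),
        "num_subdomains": num_subdomains,
        "suspicious_keywords": keyword_hits,
        "uses_https": int(parsed.scheme == "https"),
    }


features = extract_url_features("http://secure.login.example.com/verify?id=1")
print(features)
```

The two-label heuristic miscounts hosts under multi-part suffixes such as `co.uk`; a production extractor would need a public-suffix list to handle those.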
Machine Learning Model: The preprocessed data would be fed into a machine
learning model that has been trained to detect phishing websites. The model could
use techniques such as supervised learning, unsupervised learning, or a combination
of both, to classify the websites as legitimate or phishing.
Outputs: The output of the system would be a prediction of whether each website
is a phishing site or not. This could be presented to the user in various formats, such
as a list of URLs with phishing scores or a visualization that highlights suspicious
websites.
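One of the output formats mentioned above, a list of URLs with phishing scores, might look like the following sketch. It assumes that the model produces a score between 0 and 1 per URL and that 0.5 is the decision threshold; the function name `format_report` and the example scores are made up for illustration.

```python
def format_report(scores, threshold=0.5):
    """Return report lines sorted from most to least suspicious."""
    lines = []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for url, score in ranked:
        label = "PHISHING" if score >= threshold else "legitimate"
        lines.append(f"{score:.2f}  {label:<10}  {url}")
    return lines


# Made-up scores, for illustration only.
example_scores = {
    "http://example.com/": 0.03,
    "http://secure-login.example.net/verify": 0.91,
    "http://example.org/account/update": 0.58,
}
report = format_report(example_scores)
print("\n".join(report))
```

Sorting by score puts the URLs that most need attention at the top, which is usually what a reviewer wants from such a list.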
Feedback: The system could also include a feedback loop that allows users to
report false positives or false negatives, which can be used to improve the accuracy
of the model over time.
Database: The data flow diagram would also include a database that stores the
URLs and their corresponding classifications. This could be used to train the ma-
chine learning model or to monitor the performance of the system over time.
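The feedback loop and the database described above can be combined in one small table. The sketch below uses Python's built-in `sqlite3` purely for illustration; the table name `classified_urls`, its columns, and the helper functions are hypothetical and are not taken from the project's schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the URL classification store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS classified_urls ("
        " url TEXT PRIMARY KEY,"
        " label TEXT NOT NULL CHECK (label IN ('phishing', 'legitimate')),"
        " user_corrected INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_prediction(conn, url, label):
    """Store the model's prediction, replacing any earlier entry."""
    conn.execute(
        "INSERT OR REPLACE INTO classified_urls (url, label, user_corrected) "
        "VALUES (?, ?, 0)",
        (url, label),
    )
    conn.commit()


def record_feedback(conn, url, correct_label):
    """Store a user-reported correction (false positive or false negative)."""
    conn.execute(
        "UPDATE classified_urls SET label = ?, user_corrected = 1 WHERE url = ?",
        (correct_label, url),
    )
    conn.commit()


def training_rows(conn):
    """Return (url, label) pairs that could feed the next training run."""
    query = "SELECT url, label FROM classified_urls ORDER BY url"
    return conn.execute(query).fetchall()


conn = open_store()
record_prediction(conn, "http://example.com/", "legitimate")
record_prediction(conn, "http://example.net/login", "legitimate")
# A user reports a false negative: the second site was actually phishing.
record_feedback(conn, "http://example.net/login", "phishing")
rows = training_rows(conn)
print(rows)
```

Keeping a `user_corrected` flag makes it possible to measure how often the model is overruled, which is one way to monitor performance over time.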
Overall, the data flow diagram for phishing website detection using machine learn-
ing would be designed to efficiently process large volumes of data and provide accu-
rate predictions of phishing websites. Through a combination of machine learning
algorithms and user feedback loops, the system could continue to improve and adapt
to new threats over time.
Use Case Diagram
Figure 4.3 depicts the different actors and their interactions with the machine
learning system, as well as the different use cases and their relationships. For exam-
ple, the user may interact with the system through a web browser, while the phishing
website may attempt to deceive the user by masquerading as a legitimate website.
The machine learning system would sit in between these two actors, analyzing the
web traffic and identifying potential threats. The system would then take appropriate
actions such as alerting the user, blocking access to the phishing website, or updat-
ing the machine learning model. The reports generated by the system would provide
valuable insights into the effectiveness of the machine learning system in detecting
and preventing phishing attacks.
Class Diagram
Figure 4.4 depicts the different classes and their relationships within the phishing
website detection system. The main class serves as the interface between the user
and the phishing website, relying on sub-systems such as the Feature Extractor to
detect and prevent phishing attacks. The different classes and their relationships
provide a
clear understanding of how the system functions and how different components work
together to achieve the overall goal of detecting and preventing phishing attacks.
Sequence Diagram
Figure 4.5 depicts the different components of the system and their interactions,
showing the flow of data and control between them. For example, the diagram would
show how the user's web request is processed and how the system uses its sub-systems
to detect and prevent phishing attacks. The diagram would also show
how the system communicates with the user through their web browser to provide
warnings and alerts when potential phishing attacks are detected. Overall, the se-
quence diagram would provide a clear understanding of the different steps involved
in detecting and preventing phishing attacks using machine learning.
Collaboration Diagram
Figure 4.6 depicts the interactions and communication between these components,
showing how they collaborate to detect and prevent phishing attacks using machine
learning. For example, the diagram would show how the user's web browser communicates
with the detection system to send web requests and receive warnings and alerts when
potential phishing attacks are detected. The diagram would also show how the system
communicates with its sub-systems for analyzing website data, training and evaluating
machine learning models, and updating the system periodically, and how it
communicates with potentially malicious websites to analyze their content and
determine whether they are phishing websites. Overall, the collaboration diagram
would provide a clear understanding of the different components of the system and
how they collaborate to achieve the goal of detecting and preventing phishing attacks
using machine learning.
Activity Diagram
Figure 4.7 depicts the different activities involved in detecting and preventing
phishing attacks using machine learning, showing the flow of control between them.
For example, the diagram would show how data is collected, how features are extracted
by the Feature Extractor, how machine learning algorithms are selected, and how the
machine learning model is trained and evaluated. The diagram would also show how the
system analyzes website content, alerts the user, blocks the website if necessary,
and periodically updates the machine learning model to improve its accuracy. Overall,
the activity diagram would provide a clear understanding of the different activities
involved in detecting and preventing phishing attacks using machine learning
algorithms.
Algorithm and Pseudo Code
3. Implement the App.py component, which will serve as the main component of the
website.
5. The algorithms used are AdaBoost, XGBoost, Random Forest Classifier, and Gradient
Boosting Classifier.
6. The above classifiers are trained on the collected and preprocessed data.
7. Collect a dataset of URLs with labels indicating whether they are phishing or
legitimate. Preprocess the data by extracting relevant features such as URL length,
domain age, and presence of certain keywords.
8. Split the dataset into training and testing sets. The training set will be used to
train the models, and the testing set will be used to evaluate the performance of
the models.
9. Train the model on the training set by sequentially adding weak classifiers to
the model. Each weak classifier is trained on a subset of the training data and
assigned a weight based on its performance. The final model is a weighted
combination of the weak classifiers.
10. Evaluate the performance of the models on the testing set by calculating metrics
such as accuracy, precision, recall, and F1-score.
11. Tune the hyperparameters of the models to optimize their performance.
Hyperparameters include the number of weak classifiers and the learning rate.
12. Deploy the trained models to detect phishing websites in real time by inputting
a URL and predicting whether it is phishing or legitimate.
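Step 9 describes boosting: weak classifiers are added one at a time, and the final model is their weighted combination. The sketch below illustrates that idea in plain Python with one-feature threshold rules ("decision stumps") as weak classifiers. It is an assumption-laden illustration, not the project's training code: it uses the classic AdaBoost reweighting of samples rather than training on subsets, labels are +1 and -1, and the tiny dataset is made up.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Weak classifier: +polarity if the feature exceeds the threshold."""
    return polarity if row[feature] > threshold else -polarity


def best_stump(X, y, weights):
    """Find the stump with the lowest weighted error on the training data."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    w for w, row, t in zip(weights, X, y)
                    if stump_predict(row, feature, threshold, polarity) != t
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, n_rounds=3):
    """Return a list of (alpha, feature, threshold, polarity) tuples."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        error, feature, threshold, polarity = best_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # weight of this stump
        model.append((alpha, feature, threshold, polarity))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [
            w * math.exp(-alpha * t * stump_predict(row, feature, threshold, polarity))
            for w, row, t in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def adaboost_predict(model, row):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, thr, pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1


# Made-up one-feature data; +1 = phishing, -1 = legitimate (assumed encoding).
X = [[1], [2], [3], [4], [5]]
y = [1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
print(predictions)
```

No single threshold separates this toy data, so one stump alone misclassifies the first sample, while the weighted vote of three stumps fits all five. The number of rounds plays the role of the "number of weak classifiers" hyperparameter mentioned in step 11.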
Pseudo Code
# Preprocess data
features = extract_features(website_data)

# Train model
model = train_model(training_data, labels)

# Evaluate model
accuracy = evaluate_model(model, test_data, test_labels)

# Output result
if prediction == 'phishing':
    print("WARNING: This website may be a phishing site!")
else:
    print("This website appears to be legitimate.")

def login():
    if request.method == 'POST':
        useremail = request.form['useremail']
        session['useremail'] = useremail
        userpassword = request.form['userpassword']
        # NOTE: SQL built with string formatting is open to injection;
        # placeholders passed to the database driver are safer.
        sql = "select count(*) from user where Email='%s' and Password='%s'" % (useremail, userpassword)
        x = pd.read_sql_query(sql, db)
        print(x)
        print('########################')
        count = x.values[0][0]

        if count == 0:
            msg = "user Credentials Are not valid"
            return render_template("login.html", name=msg)
        else:
            s = "select * from user where Email='%s' and Password='%s'" % (useremail, userpassword)
            z = pd.read_sql_query(s, db)
            session['email'] = useremail
            pno = str(z.values[0][5])    # column 5: Mob
            print(pno)
            name = str(z.values[0][1])   # column 1: Name
            print(name)
            session['pno'] = pno
            session['name'] = name
            return render_template("userhome.html", myname=name)
    return render_template('login.html')

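The login listing above builds its SQL query with string formatting, which lets crafted input change the meaning of the query. A common remedy is to pass the values separately as query parameters. The sketch below shows the idea with Python's built-in `sqlite3`, which is an assumption made to keep the example self-contained: the project uses MySQL, where the placeholder syntax depends on the driver. The helper `check_credentials` and the sample row are hypothetical.

```python
import sqlite3


def check_credentials(conn, email, password):
    """Return the matching user's name, or None if the credentials are invalid."""
    row = conn.execute(
        'SELECT Name FROM "user" WHERE Email = ? AND Password = ?',
        (email, password),  # values are bound, never spliced into the SQL text
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "user" (Id INTEGER PRIMARY KEY, Name TEXT, Email TEXT, Password TEXT)'
)
# Made-up sample row for illustration.
conn.execute(
    'INSERT INTO "user" (Name, Email, Password) VALUES (?, ?, ?)',
    ("Alice", "alice@example.com", "s3cret"),
)

print(check_credentials(conn, "alice@example.com", "s3cret"))
print(check_credentials(conn, "alice@example.com", "' OR '1'='1"))
```

With bound parameters, the injection attempt in the second call is compared as an ordinary string and simply fails to match.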
    if request.method == "POST":
        model = int(request.form['selected'])
        print(model)
        path = os.listdir(app.config['upload_folder'])
        file = os.path.join(app.config['upload_folder'], path[0])
        df = pd.read_csv(file)

        print(df.columns)
        print('#######################################################')

        print(df)
        if model == 1:
            from sklearn.ensemble import RandomForestClassifier
            rfr = RandomForestClassifier()
            rfr.fit(x_train, y_train)
            pred = rfr.predict(x_test)
            score1 = accuracy_score(y_test, pred) * 100
            print(score1)
            msg = 'The accuracy obtained by Random Forest Classifier is ' + str(score1) + str('%')
            return render_template('model.html', msg=msg)
        elif model == 2:
            classifier = AdaBoostClassifier()
            classifier.fit(x_train, y_train)
            pred = classifier.predict(x_test)
            score2 = accuracy_score(y_test, pred) * 100
            print(score2)
            msg = 'The accuracy obtained by Ada Boost Classifier is ' + str(score2) + str('%')
            return render_template('model.html', msg=msg)
        elif model == 3:
            from xgboost import XGBClassifier
            xgb = XGBClassifier()
            xgb.fit(x_train, y_train)
            pred = xgb.predict(x_test)
            score3 = accuracy_score(y_test, pred) * 100
            print(score3)
            msg = 'The accuracy obtained by XGBoost Classifier is ' + str(score3) + str('%')
            return render_template('model.html', msg=msg)
        elif model == 4:
            cf = SVC(kernel='linear')
            cf.fit(x_train, y_train)
            pred = cf.predict(x_test)
            score4 = accuracy_score(y_test, pred) * 100
            print(score4)
            msg = 'The accuracy obtained by Support Vector Machine is ' + str(score4) + str('%')
            return render_template('model.html', msg=msg)
        elif model == 5:
            gb = GradientBoostingClassifier()
            gb.fit(x_train, y_train)
            pred = gb.predict(x_test)
            score5 = accuracy_score(y_test, pred) * 100
            print(score5)
            msg = 'The accuracy obtained by Gradient Boosting Classifier is ' + str(score5) + str('%')
            return render_template('model.html', msg=msg)
    return render_template('model.html')

        # fit the model
        forest.fit(X_train, y_train)
        print('aa')
        print(url1)
        print(type(url1))
        my_features = featureExtraction(url1)
        prob_of_doms = top_doms[1].values
        if my_features[0] in prob_of_doms:
            return render_template('prediction.html', msg='success')
        else:
            pred_1 = forest.predict([my_features[1:]])
            print(pred_1)
            if pred_1 == 0:
                msg = ""
            else:
                msg = ""
            return render_template('prediction.html', result=pred_1, msg=msg)
    return render_template('prediction.html')
This pseudo-code includes an 'App' component that fetches project data and renders
the classifier components. It gathers a large dataset of labeled website examples,
including both legitimate and phishing websites, and converts the website data into
a format that can be used by the machine learning algorithm. This may include
extracting features such as URL length, domain age, and the presence of suspicious
keywords. It uses a supervised learning algorithm, such as a decision tree, random
forest, XGBoost, Gradient Boosting, or Support Vector Machine, to train the model on
the preprocessed training data, and a separate test dataset to evaluate the
performance of the trained model. Finally, the model is used to classify new websites
as either legitimate or potential phishing sites.
Module Description
Data Collection
As you know, machines initially learn from the data that you give them. It is of
the utmost importance to collect reliable data so that your machine-learning model
can find the correct patterns. The quality of the data that you feed to the machine
will determine how accurately your model is performing. If you have incorrect or
outdated data, you will have wrong outcomes or predictions which are not relevant.
Put together all the data you have and randomize it. This helps make sure that the
data is evenly distributed and that the ordering does not affect the learning
process. Clean the data to remove unwanted rows and columns, missing values, and
duplicate values, and convert data types where needed. You might even have to
restructure the dataset by changing the rows and columns or their indices. Visualize
the data to understand how it is structured and to understand the relationships
between the variables and classes present.
Split the cleaned data into two sets: a training set and a testing set. The training
set is the set your model learns from. The testing set is used to check the accuracy
of your model after training.
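The cleaning, randomizing, and splitting steps can be sketched with the standard library. This is an illustrative sketch under stated assumptions: rows are `(url, label)` pairs, 20% of the cleaned rows go to the testing set, and the function name `clean_and_split`, the seed, and the sample rows are made up.

```python
import random


def clean_and_split(rows, test_fraction=0.2, seed=42):
    """Drop incomplete and duplicate rows, shuffle, and split into train/test."""
    seen = set()
    cleaned = []
    for url, label in rows:
        if not url or label is None:   # missing value
            continue
        if url in seen:                # duplicate URL
            continue
        seen.add(url)
        cleaned.append((url, label))
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(cleaned)
    n_test = max(1, round(len(cleaned) * test_fraction)) if cleaned else 0
    return cleaned[n_test:], cleaned[:n_test]


# Made-up rows: ten valid entries, one duplicate, two incomplete ones.
rows = [(f"http://site{i}.example/", i % 2) for i in range(10)]
rows += [("http://site0.example/", 0), ("", 1), ("http://site3.example/", None)]
train, test = clean_and_split(rows)
print(len(train), len(test))
```

Shuffling before the split matters when the source file is ordered by class; otherwise the testing set could end up containing only one kind of website.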
Model Training
Training is the most important step in machine learning. In training, you pass the
prepared data to your machine-learning model to find patterns and make predictions.
It results in the model learning from the data so that it can accomplish the task set.
Over time, with training, the model gets better at predicting.
Making Predictions
In the end, you can use your model to make predictions on unseen data. Finally, the
application generates a graph that shows which model achieved the highest accuracy
and reports whether a given URL is legitimate or phishing.
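A text-only version of such a comparison graph can be sketched as follows. The accuracy figures are the ones reported in the test results of this report; the function name `accuracy_chart`, the bar width, and the shortened model names are assumptions made for illustration, and the project's own graph may be produced differently.

```python
def accuracy_chart(accuracies, width=40):
    """Return one text bar per model, highest accuracy first."""
    lines = []
    ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
    for name, accuracy in ranked:
        bar = "#" * round(accuracy / 100 * width)
        lines.append(f"{name:<18} {bar} {accuracy:.2f}%")
    return lines


# Accuracies (in percent) as reported in the test results of this report.
reported = {
    "Random Forest": 89.78,
    "XGBoost": 83.54,
    "Gradient Boosting": 82.34,
    "AdaBoost": 78.67,
    "SVM": 78.67,
}
chart = accuracy_chart(reported)
best_model = max(reported, key=reported.get)
print("\n".join(chart))
print("Best model:", best_model)
```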
Steps to execute/run/implement the project
Home page
Here users view the home page of the phishing website prediction web application
which shows the interface like register, login, logout, etc.
Uploading dataset
Download the dataset from Kaggle, where a list of legitimate and phishing websites
has already been collected and uploaded. Submit this dataset in the load data
section.
The uploaded dataset is then shown on the screen; it contains the phishing and
legitimate URLs.
Testing
Test the dataset using the selected algorithms, such as AdaBoost, Support Vector
Machine, and Random Forest Classifier.
Prediction
Get the accuracy of the selected algorithms and evaluate the model’s performance,
and use the model to classify new websites as either legitimate or potential phishing
sites.
IMPLEMENTATION AND TESTING
Input Design
In an information system, input is the raw data that is processed to produce output.
During input design, the developers must consider input devices such as PC, MICR,
and OMR. The quality of system input determines the quality of system output.
Output Design
The design of output is the most important task of any system. During output design,
developers identify the type of outputs needed and consider the necessary output
controls and prototype report layouts.
Testing
Types of Testing
Several types of tests are performed, depending on the project's functionality and
requirements. Some of them are described below.
Unit testing
Unit testing is a software development process in which the smallest testable parts
of an application, called units, are individually and independently tested for proper
operation. This testing helps the developer to understand whether all the components
are working accordingly or not.
Input
<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <meta content="width=device-width, initial-scale=1.0" name="viewport">

  <!-- Favicons -->
  <link href="static/assets/img/favicon.png" rel="icon">
  <link href="static/assets/img/apple-touch-icon.png" rel="apple-touch-icon">

</head>

<body>

  <nav class="nav-menu d-none d-lg-block">

    <ul>
      <li><a href="/">Home</a></li>
      <li class="active"><a href="/login">Log In</a></li>
      <li><a href="/registration">Register</a></li>

    </ul>

  <div class="overlay">
    <div class="gtco-container">
      <div class="row">
        <div class="col-md-12 col-md-offset-0 text-center">
          <div class="display-t">
            <div class="display-tc animate-box" data-animate-effect="fadeIn">
              <center>

                <!-- <h3 style="bottom: 151px; color: rgb(11, 203, 236); top: -222;">{{msg}}</h3> -->
              </center>
              <!-- <h3 style="color: rgb(11, 203, 236); bottom: 115px;"> Welcome To the website </h3> -->
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
  <!-- ======= Hero Section ======= -->
  <section id="hero">
    <div id="heroCarousel" class="carousel slide carousel-fade" data-ride="carousel">

      <!-- <div class="carousel-inner" role="listbox">-->

      <!-- Slide 1 -->
      <div class="carousel-item active" style="background-image: url(static/assets/img/slide/2.jpg)">

Test result
Integration testing
Ensure that the system can correctly receive input data from various sources, such
as user input or website data feeds. Test the preprocessing stage to ensure that the
data is cleaned and formatted correctly. Verify that the feature extraction stage is
accurately extracting relevant features from the preprocessed data. Test the feature
selection techniques to ensure that they are effectively selecting the most relevant
features for the machine learning model. Test the machine learning model selection
process to ensure that the appropriate model is chosen for the detection task. Verify
that the model is trained correctly on the preprocessed and feature-selected data.
Test the evaluation metrics to ensure that the performance of the model is accurately
measured. Verify that the model parameters and feature selection techniques are
optimized to improve its performance. Test the deployment process to ensure that the
optimized model is successfully integrated into the system for real-world phishing
website detection. Verify that the model is being monitored and updated periodically
to maintain its effectiveness against evolving phishing tactics.
Input
/*
SQLyog Enterprise - MySQL GUI v6.56
MySQL - 5.5.5-10.1.13-MariaDB : Database - phishing

/* Data for the table `user` */

insert into `user`(`Id`,`Name`,`Email`,`Password`,`Age`,`Mob`) values (1,'Balaram','balaram@gmail.com','1234','26','7853011277');

Test result
System testing
Input
<section id="hero">
  <div id="heroCarousel" class="carousel slide carousel-fade" data-ride="carousel">

    <h1 style="color: white;">{{msg}}</h1>

    <ol class="carousel-indicators" id="hero-carousel-indicators"></ol>

    <div class="carousel-container">
      <center><h4 style="color: white;">{{msg}}</h4></center>

      <div class="container">

        {% block body %}

        {% if msg == 'success' %}
        <h3 style="color: white; background: green"><i>The website is "Legitimate"</i></h3>
        {% elif result == [0] %}
        <h3 style="color: white; background: green"><i>The website is "Legitimate"</i></h3>
        {% elif result == [1] %}
        <h3 style="color: white; background: green"><i>The website is "phishing"</i></h3>
        {% endif %}
        {% endblock %}
        <h3 style="color: white">ENTER URL</h3>
        <form action="{{ url_for('prediction') }}" method="post">
          <input type="url" name="a" style="width: 500px" placeholder="Enter URL"><br><br>

        </form>

</body>

</html>
33
Test result
The graph in Figure 5.3 shows the accuracy obtained by each model. The Random
Forest classifier achieves the highest accuracy (89.78%), followed by the XG-Boost
classifier (83.54%) and the Gradient Boosting classifier (82.34%). The AdaBoost
classifier and the Support Vector Machine obtain the same accuracy (78.67%).
Test Result
Figure 5.4 shows model training: each classifier is trained and its accuracy is
reported, and the model selected on that basis is then ready to predict whether a
given URL is legitimate or phishing.
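Selecting a model on the basis of accuracy can be sketched as follows. The accuracy values are the ones reported for Figure 5.3; the dictionary layout and the helper names `select_best_model` and `rank_models` are assumptions made for this illustration.

```python
# Accuracies in percent, as reported for Figure 5.3.
reported_accuracy = {
    "Random Forest": 89.78,
    "XG-Boost": 83.54,
    "Gradient Boosting": 82.34,
    "AdaBoost": 78.67,
    "Support Vector Machine": 78.67,
}


def select_best_model(scores: dict) -> str:
    """Return the name of the model with the highest accuracy."""
    return max(scores, key=scores.get)


def rank_models(scores: dict) -> list:
    """Return (name, accuracy) pairs sorted from most to least accurate."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


best = select_best_model(reported_accuracy)
```

Accuracy alone can hide an imbalance between false positives and false negatives, so precision and recall are worth comparing alongside it before fixing the choice.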
RESULTS AND DISCUSSIONS
Phishing websites are fraudulent websites designed to trick users into giving away
their personal or financial information. Phishing attacks are becoming more
sophisticated, and traditional anti-phishing measures are no longer enough. A
phishing website detection system based on machine learning is therefore proposed.
Machine learning algorithms are designed to detect patterns in data that human
analysts may not be able to spot. The proposed system uses such algorithms to
analyze website features such as domain age, certificate issuer, IP address location,
and HTML code. The result of this analysis is then compared with the characteristics
of known phishing websites to determine whether the website poses a threat.
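One of the features named above, domain age, can be reduced to a single binary value once a registration date is known. The sketch below assumes the creation date has already been obtained, for example from a WHOIS lookup. The function name `domain_age_feature`, the 180-day threshold, and the decision to treat a missing date as suspicious are illustrative assumptions, not values taken from this project.

```python
from datetime import date
from typing import Optional


def domain_age_feature(created: Optional[date], today: date, min_age_days: int = 180) -> int:
    """Return 1 (suspicious) for young or undated domains, otherwise 0.

    `min_age_days` is an assumed threshold chosen only for this example.
    """
    if created is None:
        # No registration date available: treated as suspicious by assumption.
        return 1
    age_days = (today - created).days
    return int(age_days < min_age_days)
```

Passing `today` explicitly instead of calling `date.today()` inside the function keeps the feature reproducible when a dataset is rebuilt later.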
Once the system identifies a potential phishing website, it alerts the user to the
threat. The user can then choose to proceed with caution or avoid the site
altogether. Overall, the proposed system has the potential to significantly reduce
the risk of phishing attacks. However, no system is foolproof, and the human factor
remains critical to online safety. Users must therefore remain vigilant and aware of
potential threats to their personal and financial information, even when using
machine learning-based detection systems.
Comparison of Existing and Proposed System
As summarized in Table 6.1, the proposed system uses machine learning algorithms,
feature engineering, and feature selection to improve the accuracy of phishing
website detection. It also applies more advanced detection techniques to identify
sophisticated phishing websites. Finally, it generates alerts and reports so that
users and authorities are notified quickly of phishing websites.
Sample Code
import os
from flask import *
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from urllib.parse import urlparse
import ipaddress
import re
from bs4 import BeautifulSoup
import whois
import urllib
import urllib.request
from datetime import datetime
import requests

app = Flask(__name__)
app.secret_key = "fghhdfgdfgrthrttgdfsadfsaffgd"

app.config['upload_folder'] = r'uploads'
top_doms = pd.read_csv('top-1m.csv', header=None)

@app.route('/')
def home():
    return render_template('index.html')

        if count == 0:
            msg = "user Credentials Are not valid"
            return render_template("login.html", name=msg)
        else:
            s = "select * from user where Email='%s' and Password='%s'" % (useremail, userpassword)
            z = pd.read_sql_query(s, db)
            session['email'] = useremail
            pno = str(z.values[0][5])
            print(pno)
            name = str(z.values[0][1])
            print(name)
            session['pno'] = pno
            session['name'] = name
            return render_template("userhome.html", myname=name)
    return render_template('login.html')

@app.route('/registration', methods=["POST", "GET"])
def registration():
    if request.method == 'POST':
        username = request.form['username']
        useremail = request.form['useremail']
        userpassword = request.form['userpassword']
        conpassword = request.form['conpassword']
        Age = request.form['Age']

        contact = request.form['contact']
        if userpassword == conpassword:
            sql = "select * from user where Email='%s' and Password='%s'" % (useremail, userpassword)
            cur.execute(sql)
            data = cur.fetchall()
            db.commit()
            print(data)
            if data == []:

    print(df.columns)
    print('#######################################################')

    print(df)
    if model == 1:
        from sklearn.ensemble import RandomForestClassifier
        rfr = RandomForestClassifier()
        rfr.fit(x_train, y_train)
        pred = rfr.predict(x_test)
        score1 = accuracy_score(y_test, pred) * 100
        print(score1)
        msg = 'The accuracy obtained by Random Forest Classifier is ' + str(score1) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 2:
        classifier = AdaBoostClassifier()
        classifier.fit(x_train, y_train)
        pred = classifier.predict(x_test)
        score2 = accuracy_score(y_test, pred) * 100
        print(score2)
        msg = 'The accuracy obtained by Ada Boost Classifier is ' + str(score2) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 3:
        from xgboost import XGBClassifier
        xgb = XGBClassifier()
        xgb.fit(x_train, y_train)
        pred = xgb.predict(x_test)
        score3 = accuracy_score(y_test, pred) * 100
        print(score3)
        msg = 'The accuracy obtained by XGBoost Classifier is ' + str(score3) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 4:
        cf = SVC(kernel='linear')
        cf.fit(x_train, y_train)
        pred = cf.predict(x_test)
        score4 = accuracy_score(y_test, pred) * 100
        print(score4)
        msg = 'The accuracy obtained by Support Vector Machine is ' + str(score4) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 5:
        gb = GradientBoostingClassifier()
        gb.fit(x_train, y_train)
        pred = gb.predict(x_test)
        score5 = accuracy_score(y_test, pred) * 100
        print(score5)
        msg = 'The accuracy obtained by Gradient Boosting Classifier is ' + str(score5) + str('%')
        return render_template('model.html', msg=msg)
    return render_template('model.html')

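The login and registration queries in this listing are built by string formatting, which lets a crafted e-mail address or password alter the SQL statement. A parameterized query avoids this. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the project's MySQL connection; the helper name `count_matching_users` and the sample row are hypothetical. With a MySQL driver the placeholder is typically `%s` rather than `?`.

```python
import sqlite3


def count_matching_users(db: sqlite3.Connection, email: str, password: str) -> int:
    """Count rows matching the credentials, passing values as bound parameters."""
    row = db.execute(
        "SELECT COUNT(*) FROM user WHERE Email = ? AND Password = ?",
        (email, password),
    ).fetchone()
    return row[0]


# In-memory table with the same columns as the `user` table (placeholder data).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE user (Id INTEGER, Name TEXT, Email TEXT, Password TEXT, Age TEXT, Mob TEXT)"
)
db.execute(
    "INSERT INTO user VALUES (?, ?, ?, ?, ?, ?)",
    (1, "Example", "user@example.com", "secret", "30", "0000000000"),
)
```

Because the values never become part of the SQL text, an input such as `' OR '1'='1` is compared as an ordinary string and matches no row.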
Output
Figure 6.1 shows the home page displayed when a user opens the website. It contains
the load data, view data, select model, prediction, graph, and log out sections.
Figure 6.2: Prediction
Figure 6.2 shows the available algorithms. The user selects an algorithm to obtain
its accuracy and then enters a URL to check whether it is legitimate or phishing.
CONCLUSION AND FUTURE ENHANCEMENTS
Conclusion
This project reviewed algorithms and approaches that researchers have used to detect
phishing websites with machine learning. From the papers reviewed, we conclude that
most of the work relies on familiar machine learning algorithms such as XG-Boost,
Decision Tree, Random Forest, and the MLP classifier, which produces neural network
results. Some authors proposed new systems, such as Phish Score and Phish Checker,
for detection. Different combinations of features were evaluated with regard to
accuracy, precision, and recall. As the number of phishing websites increases day by
day, some features may need to be added or replaced with new ones to detect them.
Several things can be polished or added in future work. We opted to use two data
mining classifiers in this project, namely the ID3 and Naive Bayes classifiers.
There are other classifiers, such as the Bayesian network classifier, the Neural
Network classifier, and the C4.5 classifier. These were not included in our project
and could be added in the future to provide more results for comparison.
Future Enhancements
In the future, a structured phishing dataset would allow phishing detection to be
performed much faster. A combination of two or more classifiers could also be used
to maximize accuracy. We also plan to explore phishing detection techniques that use
lexical features, network-based features, content-based features, webpage-based
features, and HTML and JavaScript features of web pages, which can improve the
performance of the system. In particular, we extract features from URLs and pass
them through the classifiers.
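As an illustration of the lexical features mentioned above, the sketch below derives a few values from a URL string using only the standard library. The choice of features, the feature names, and the function names `host_is_ip` and `lexical_features` are assumptions for illustration and do not reproduce the project's feature set.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Compute a small, hypothetical set of lexical features for one URL."""
    parts = urlparse(url)
    host = parts.hostname or ""
    # Depth = number of non-empty path segments.
    depth = len([segment for segment in parts.path.split("/") if segment])
    return {
        "have_ip": int(host_is_ip(host)),
        "have_at": int("@" in url),
        "url_length": len(url),
        "url_depth": depth,
        "hyphen_in_host": int("-" in host),
    }
```

The resulting dictionary can be turned into one row of the feature table that is passed to the classifiers.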
INDUSTRY DETAILS
SOURCE CODE
import os
from flask import *
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from urllib.parse import urlparse
import ipaddress
import re
from bs4 import BeautifulSoup
import whois
import urllib
import urllib.request
from datetime import datetime
import requests

app = Flask(__name__)
app.secret_key = "fghhdfgdfgrthrttgdfsadfsaffgd"

app.config['upload_folder'] = r'uploads'

@app.route('/')
def home():
    return render_template('index.html')

def login():
    if request.method == 'POST':
        useremail = request.form['useremail']
        session['useremail'] = useremail
        userpassword = request.form['userpassword']
        sql = "select count(*) from user where Email='%s' and Password='%s'" % (useremail, userpassword)
        # cur.execute(sql)
        # data = cur.fetchall()
        # db.commit()
        x = pd.read_sql_query(sql, db)
        print(x)
        print('########################')
        count = x.values[0][0]

        if count == 0:
            msg = "user Credentials Are not valid"
            return render_template("login.html", name=msg)
        else:
            s = "select * from user where Email='%s' and Password='%s'" % (useremail, userpassword)
            z = pd.read_sql_query(s, db)
            session['email'] = useremail
            pno = str(z.values[0][5])
            print(pno)
            name = str(z.values[0][1])
            print(name)
            session['pno'] = pno
            session['name'] = name
            return render_template("userhome.html", myname=name)
    return render_template('login.html')
@app.route('/registration', methods=["POST", "GET"])
def registration():
    if request.method == 'POST':
        username = request.form['username']
        useremail = request.form['useremail']
        userpassword = request.form['userpassword']
        conpassword = request.form['conpassword']
        Age = request.form['Age']

        contact = request.form['contact']
        if userpassword == conpassword:
            sql = "select * from user where Email='%s' and Password='%s'" % (useremail, userpassword)
            cur.execute(sql)
            data = cur.fetchall()
            db.commit()
            print(data)
            if data == []:

                db.commit()
                flash("Registered successfully", "success")
                return render_template("login.html")
            else:
                flash("Details are invalid", "warning")
                return render_template("registration.html")
        else:
            flash("Password doesn't match", "warning")
            return render_template("registration.html")
    return render_template('registration.html')

    print(df.columns)
    X = df.drop(['Label', 'Domain', 'Web_Traffic'], axis=1)
    y = df.Label
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20)

    print(df)
    if model == 1:
        from sklearn.ensemble import RandomForestClassifier
        rfr = RandomForestClassifier()
        rfr.fit(x_train, y_train)
        pred = rfr.predict(x_test)
        score1 = accuracy_score(y_test, pred) * 100
        print(score1)
        msg = 'The accuracy obtained by Random Forest Classifier is ' + str(score1) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 2:
        classifier = AdaBoostClassifier()
        classifier.fit(x_train, y_train)
        pred = classifier.predict(x_test)
        score2 = accuracy_score(y_test, pred) * 100
        print(score2)
        msg = 'The accuracy obtained by Ada Boost Classifier is ' + str(score2) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 3:
        from xgboost import XGBClassifier
        xgb = XGBClassifier()
        xgb.fit(x_train, y_train)
        pred = xgb.predict(x_test)
        score3 = accuracy_score(y_test, pred) * 100
        print(score3)
        msg = 'The accuracy obtained by XGBoost Classifier is ' + str(score3) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 4:
        cf = SVC(kernel='linear')
        cf.fit(x_train, y_train)
        pred = cf.predict(x_test)
        score4 = accuracy_score(y_test, pred) * 100
        print(score4)
        msg = 'The accuracy obtained by Support Vector Machine is ' + str(score4) + str('%')
        return render_template('model.html', msg=msg)
    elif model == 5:
        gb = GradientBoostingClassifier()
        gb.fit(x_train, y_train)
        pred = gb.predict(x_test)
        score5 = accuracy_score(y_test, pred) * 100
        print(score5)
        msg = 'The accuracy obtained by Gradient Boosting Classifier is ' + str(score5) + str('%')
        return render_template('model.html', msg=msg)
    return render_template('model.html')

    domain = urlparse(url).netloc
    if re.match(r"^www.", domain):
        domain = domain.replace("www.", "")
    return domain

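The fragment above strips a leading "www." from the host name of a URL. A self-contained variant is sketched below. The function name `get_domain` is an assumption, since the original definition line is not shown. Unlike `str.replace`, which removes every occurrence of "www." in the host, this version removes the prefix only once and only at the start.

```python
from urllib.parse import urlparse


def get_domain(url: str) -> str:
    """Return the network location of `url` without a leading 'www.'."""
    domain = urlparse(url).netloc
    if domain.startswith("www."):
        # Drop only the leading prefix; later occurrences are left untouched.
        domain = domain[len("www."):]
    return domain
```

For example, `get_domain("https://www.example.com/path")` yields `example.com`, while a host such as `www.www.example.com` keeps its second label.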
References
[1] Anjali, S., Kumar, S., Singh, S. (2017). Machine learning techniques for
detecting phishing websites. International Journal of Computer Applications,
164(6),14-18.,2017.
[2] Al-Rfou, R., Dawoud, A., Alafandi, A., Saad, M. . Ensemble machine learn-
ing for detecting phishing websites. Journal of Intelligent Fuzzy Systems, 37(4),
5587-5596.2019.
[3] C.Santhosh Kumar, C. S., Sundararajan, E., Kalpana, R, Phishing website de-
tection using random forest and hybrid feature selection. International Journal of
Advanced Science and Technology, 28(3), 67-72.,2019.
[4] J. Shad and S. Sharma, “A Novel Machine Learning Approach to Detect Phishing
Websites Jaypee Institute of Information Technology,” 425–430, 2018.
[5] Kumar, A., Singhal, M., Jain, M. Machine learning-based approach for detecting
phishing websites. In 2019 International Conference on Internet of Things Smart
Innovation and Usages (IoT-SIU) (pp. 1-6). IEEE, 2019.
[6] K. Shima et al., “Classification of URL bitstreams using bag of bytes,” in 2018
21st Conference on Innovation in Clouds, Internet and Networks and Workshops
(ICIN), vol. 91, pp. 1–5.2018
[7] M. Karabatak and T. Mustafa, “Performance comparison of classifiers on re-
duced phishing website dataset,” 6th Int. Symp. Digit. Forensic Secur. ISDFS
1–5, 2018.
[8] Naik, S. P., Patil, R. B., Jadhav, D. A. Detection of phishing websites using
CNN and feature extraction technique. International Conference on Computing
Methodologies and Communication (ICCMC) (pp. 285-289). IEEE.2019.
[9] T. Peng, I. Harris, and Y. Sawa, “Detecting Phishing Attacks Using Natural Lan-
guage Processing and Machine Learning,” Proc. - 12th IEEE Int. Conf. Semant.
Comput. ICSC 300–301, 2018.
[10] Y. Sonmez, T. Tuncer, H. Gokal, and E. Avci, “Phishing web sites feature clas-
sification based on extreme learning machine,” 6th Int. Symp. Digit. Forensic
Secure. ISDFS - Proceeding, vol. 2018–Janua, 1–5, 2018.
50