Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Web Phishing Detection using Machine Learning

1Seemantula Nischal; 2Bhunesh.K; 3Sudhish Reddy D; 4Seemantula Namratha
1,2,3Student, Department of CSE, R.M.D. Engineering College, Kavaraipettai
4Student, Department of IT, SSN College of Engineering

IJISRT23JUL457 www.ijisrt.com 120

ABSTRACT
This research paper outlines a methodology to determine URL legitimacy and detect phishing attempts. Python
modules like whois, socket, re, ipaddress, and BeautifulSoup are employed to extract features such as IP address, URL
length, domain name, subdomains, and favicon presence. These values are stored as a list and used to train classifiers.
Kernel SVM, KNN, Random Forest, and decision tree classifiers are implemented.

The Kernel SVM classifier (sklearn.svm.SVC) with the "rbf" kernel handles nonlinearity. Decision tree classification
is based on the "entropy" criterion using the sklearn.tree module. Random Forest combines multiple decision trees, with
final classification based on majority voting. The paper presents a user-friendly UI design for websites focused on phishing
detection. These websites utilize machine learning algorithms to assess URL authenticity and provide user feedback.
Integration of frameworks like Bootstrap and Particles.js enhances visual appeal and user experience.

The machine learning algorithm analyzes website content, structure, and other factors to determine legitimacy,
presenting results with a legitimacy percentage and associated risks. The study explores Flask, a flexible Python web
framework for rapid development of online applications. Flask provides built-in routing, templating, and supports
machine learning integration, enabling user input and result retrieval. It simplifies machine learning model deployment as
web services through APIs, facilitating integration with other applications.

Additionally, the research emphasizes Anaconda as an essential tool for data science and machine learning projects.
Anaconda offers efficient package management, simplifying installation, removal, and updating of required libraries. It
provides a comprehensive set of tools for the complete data science workflow, including data exploration, cleaning, model
construction, and deployment. Integration with Jupyter Notebook further enhances its capabilities.

In conclusion, this research paper presents a comprehensive approach for URL legitimacy assessment and phishing
detection, combining Python modules, machine learning classifiers, user-friendly UI design, Flask framework, and the
benefits of Anaconda for data science and machine learning projects.

TABLE OF CONTENTS

S No. Title

1. Chapter One Introduction
2. Chapter Two Literature Review
3. Chapter Three Fundamentals
4. Chapter Four System Requirement Specification
5. Chapter Five System Design
6. Chapter Six Implementation
7. Chapter Seven Testing and Validation
8. Chapter Eight Laboratory Investigation
9. Chapter Nine Final Comments and Future Works
10. References

CHAPTER ONE
INTRODUCTION
Artificial intelligence (AI) is a field of study concerned with building systems that perform tasks normally requiring human
intelligence, through the development of theories, methods, techniques, and applications. Machine learning (ML), a branch of AI,
is closely related to computational statistics and uses data to train models that make predictions. ML draws on advances in
related scientific disciplines, which supply it with methods, theory, and application domains. Data mining, although distinct
from ML, overlaps with it and relies heavily on unsupervised learning. Unsupervised ML can be used to learn
and establish patterns for different entities, enabling the detection of significant anomalies.

Cybersecurity refers to a range of technologies and practices aimed at protecting computers, networks, programs, and data
from attacks, unauthorized access, modification, or destruction. It encompasses network protection systems and computer
protection systems, which may include firewalls, antivirus software, and intrusion detection systems (IDS). IDSs help identify and
determine unauthorized network behavior, such as usage, replication, modification, and destruction.

Intrusion detection systems employ different types of network analysis techniques, including misuse-based network analysis,
anomaly-based network analysis, and hybrid network analysis.

• Misuse-based detection identifies known attacks by matching the specific characteristics associated with those attacks.

• Anomaly-based detection models the normal behavior of a system and flags deviations from this normal behavior as
anomalies.

• Hybrid detection combines anomaly detection and misuse detection to improve the identification of known intrusions
while reducing false positives for unknown attacks.

The application of machine learning (ML) technologies in the field of cybersecurity is experiencing rapid growth, as
depicted in Figure 1.1. ML provides a powerful solution against zero-day attacks by categorizing IP traffic and isolating malicious
data for effective intrusion detection. Ongoing research focuses on leveraging measured traffic parameters and machine learning
approaches to strengthen cybersecurity measures.

Phishing, a technique first described in 1987, is a form of online theft that aims to obtain personal information and
identity. Phishing attacks often involve the creation of deceptive websites that closely resemble legitimate ones, making it
challenging to discern their fraudulent nature. By gaining the trust of unsuspecting users, attackers deceive them into sharing
confidential and identifying information. As online transactions, such as bill payments and money transfers, have become
increasingly prevalent [12], the ability to recognize and identify fraudulent websites is of utmost importance. According to data
from the Anti-Phishing Working Group, there were a total of 647,592 different phishing sites recorded as of September 2018 [13].
Once attackers gain access to user passwords, they can easily carry out malicious activities.

Due to the rise in phishing attacks, various solutions have been proposed to address this issue. Several methodologies have
been developed to establish frameworks aimed at safeguarding against phishing attacks. Detection techniques for phishing attacks
encompass blacklisting, fuzzy rule-based approaches, whitelisting, machine learning-based methods, heuristic approaches, and
image-based techniques [14][15]. Numerous studies [16][17][18] discuss a range of strategies and tactics for detecting different
types of phishing attacks [19][20][21]. Phishing websites often appear authentic, making it challenging for individuals to
differentiate them from legitimate sites. Various web browsers incorporate anti-phishing measures to mitigate this risk [22].

➢ Motivation
Many anti-phishing techniques are available to help protect against phishing websites. Mozilla Firefox,
Safari, and Google Chrome use the Google Safe Browsing (GSB) [13] service to proactively block phishing websites. Other
widely used products, including McAfee SiteAdvisor, Quick Heal, Avast, and Netcraft, offer extensive protection measures. GSB
employs a blacklist strategy to analyze URLs. However, a notable limitation of GSB is its inability to detect phishing websites if
the blacklist is not regularly updated. Similarly, Netcraft labels a website as phishing only if it is explicitly blacklisted.
The warning notification is displayed when the user selects the appropriate button to obtain the risk rating. The risk arises when
the user fails to review the rating or proceeds despite the warning. Some software solutions, such as Quick Heal and Avast, provide
online protection against security breaches. During testing, the functionality of Avast antivirus software was evaluated. However,
the Avast browser failed to identify a suspicious URL that was successfully detected by both Netcraft and GSB.

➢ Problem Statement
The problem arises from a thorough examination and analysis of the machine learning-based classification method used for
identifying phishing websites. Our objective is to develop a system that can:
• Effectively and swiftly classify websites as either legitimate or phishing.
• Reduce the time and cost involved in the detection process.

➢ Aim and Objectives
The project aims to accomplish the following objectives:
• Explore different automated techniques for phishing detection.
• Determine and define appropriate machine learning approaches.
• Select a suitable dataset that addresses the problem statement.
• Utilize relevant algorithms to devise a solution for combating phishing attacks.

➢ Scope
The project specifically focuses on employing machine learning (ML) methods for network analysis and intrusion detection,
with a particular emphasis on detecting phishing website attacks.

➢ Challenges
The project entails several challenges, including:
• Identifying an appropriate dataset that aligns with the research goals.
• Conducting thorough feature extraction, necessitating a comprehensive understanding of various modules and achieving the
desired outcomes from each module.

➢ Organization of Thesis

Chapter 1 provides an overview of the utilization of machine learning in cybersecurity. It outlines the problem statement,
objectives, scope, and challenges encountered during the project.

Chapter 2 presents a review of the relevant literature on phishing attacks and their detection using machine learning
approaches. It summarizes previous studies conducted in this domain, highlighting their contributions and limitations.

Chapter 3 covers the fundamentals, describing the classification algorithms considered for detecting phishing URLs.

Chapter 4 outlines the specific software and hardware requirements needed for the system. It discusses the fundamental
prerequisites for the project and provides insights into the Python modules utilized in the implementation process.

The design of the system is elaborated in Chapter 5, which includes the representation of the system through architecture
diagrams, data flow diagrams, and activity diagrams. These visual representations offer a comprehensive understanding of the
system's functionality from various perspectives, including the system itself, the user, and its runtime behavior.

The implementation of the project is thoroughly addressed in Chapter 6. This chapter encompasses essential aspects such as
the selection and utilization of the dataset, the step-by-step implementation process, and the application of relevant classifiers.

Chapter 7 focuses on the analysis of test cases, examining the comparison between predicted and actual outputs to validate
the accuracy and effectiveness of the system.

Chapter 8 presents the obtained results and provides an overview of the project's environmental setup, offering insights into
the performance and efficiency of the system.

Finally, in Chapter 9, the project is concluded with a summary of the key findings. Additionally, potential future
improvements and advancements are outlined, indicating areas for further enhancement and development of the system.

CHAPTER TWO
LITERATURE REVIEW
➢ Detecting Phishing Websites Using Machine Learning
Publication Year: 2019; Authors: Amani Alswailem, Bashayr Alabdullah, Norah Alrumayh, Dr. Aram Alsedrani; Published In: IEEE

Phishing attacks pose a significant threat as they employ social engineering techniques and malware to deceive individuals
and organizations. The attackers send emails or texts containing URLs that appear genuine but are intended to trick recipients into
revealing sensitive information. Machine learning techniques have been employed to detect and mitigate these phishing links. The
primary objective of this literature review is to increase awareness about phishing attacks and explore preventive strategies. Given
the alarming frequency of phishing emails being sent daily, it is essential to focus on prevention through user education,
technological solutions, and established protocols.

➢ Phishing Website Detection Based on Machine Learning: A Survey
Publication Year: 2020; Authors: Smt. Meenu, Charu Singh; Published In: ICACCS

Phishing attacks, characterized by their use of social engineering tactics and malware, represent a form of cybercrime where
individuals and organizations are lured into divulging sensitive information. Typically, phishers employ deceptive email or text
messages containing web URLs that mimic legitimate sites but are designed to extract confidential data. Given the pervasive
nature of phishing attempts, it becomes a challenge for individuals and organizations to identify and counteract them effectively.
In this context, the utilization of machine learning techniques has gained prominence as a means to enhance phishing website
detection. This survey aims to provide a comprehensive overview of the existing research in this domain, exploring various
machine learning approaches and methodologies employed in combating phishing attacks.

➢ Machine Learning Techniques for Detecting Phishing Websites: Reviewing Promises and Challenges
Publication Year: 2020; Authors: Eman Abdelfattah, Ismail Keshta, Ammar Odeh; Published In: IEEE

This scholarly article addresses the issue of detecting phishing websites, which aim to deceive internet users and obtain
sensitive information through social engineering and malware. The authors investigate the application of machine learning
techniques such as Random Forest, Support Vector Machine, and Naive Bayes for identifying phishing attempts. However, they
highlight that deep learning approaches have shown better performance in this area. The challenges faced by machine learning
methods, including overfitting and limited training data, are also discussed. The research emphasizes the importance of user
education and proposes an automated approach to detect phishing websites effectively.

➢ Machine Learning-based URL Analysis for Phishing Website Detection
Publication Year: 2020; Authors: Mehmet Korkmaz, Banu Diri, Ozgur Koray Sahingoz; Published In: ICCCNT

This scholarly paper examines the growing trend of conducting real-world activities through mobile devices, which has led
to security vulnerabilities, including phishing attacks. Various techniques for detecting phishing websites have been developed,
with a particular focus on machine learning-based anomaly detection due to its dynamic nature. The authors propose a machine
learning-based system that analyzes URLs using eight different algorithms. The effectiveness of the system is evaluated by
comparing its results with previous work using three different datasets, demonstrating a high success rate in detecting phishing
attempts.

➢ Phishing Detection System Using Machine Learning
Publication Year: 2019; Authors: Che-Yu Wu, Cheng-Chung Kuo, Chu-Sing Yang; Published In: ICEA

This research article investigates the issue of phishing attempts on the internet, which exploit human vulnerabilities to trick
users into revealing sensitive information. The authors present a detection method based on analyzing URLs, employing string
similarity calculations and the Support Vector Machine technique in machine learning. The aim is to achieve accurate detection of
unknown phishing pages with a low rate of false positives.

➢ Review of Phishing Detection Approaches
Publication Year: 2020; Authors: Mouhammd Alkasassbeh, AlMaha Abu Zuraiq; Published In: IEEE

This comprehensive review paper explores the problem of phishing attempts on the internet and the potential financial losses
they can cause for individuals. The study examines various strategies for detecting phishing, including content-based, heuristic-
based, and fuzzy rule-based methods. Additionally, the article compares the performance of deep learning-based approaches with
traditional machine learning methods for website recognition.

➢ Detection of Phishing Websites Using Machine Learning
Publication Year: 2020; Authors: Mohammed Hazim Alkawaz, Stephanie Joanne Steven, Asif Iqbal Hajamydeen; Published In: IEEE

This study presents a system for detecting phishing websites that aims to raise awareness among users about the potential
risks associated with accessing such websites. The system employs machine learning techniques and sends email and pop-up
notifications to users when they attempt to visit blacklisted or phishing websites. The main objective is to prevent users from
falling victim to phishing attacks and inadvertently disclosing sensitive information. Additionally, the system can be utilized for
identification and authentication purposes to enhance overall cybersecurity measures against phishing attempts.

CHAPTER THREE
FUNDAMENTALS
In the domain of machine learning and statistics, classification algorithms play a crucial role in supervised learning tasks,
enabling computers to learn from input data and make accurate predictions or categorizations. This chapter focuses on exploring
different classification algorithms specifically designed for detecting phishing URLs.

➢ Logistic Regression
Logistic regression, also known as a logit model, is a widely used statistical technique for classification and predictive
analytics. It estimates the probability of a specific event occurring based on a set of independent variables. The model expresses
the log-odds of the event as a linear combination of these variables, and the logistic (sigmoid) function converts the log-odds
into a probability, restricting the output to values between 0 and 1. The model's coefficients, known as beta parameters, are
typically estimated using maximum likelihood estimation (MLE). Goodness of fit is commonly assessed using measures such as the
Hosmer-Lemeshow test.

Fig 3.1 Graphical Representation of Logistic Regression
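
For reference, the relationships described in this section can be written out as follows. The notation is introduced here for illustration and is not taken from the original paper: p is the probability of the event (for example, that a URL is phishing), x_1, ..., x_k are the independent variables, the beta terms are the coefficients, and y_i is the observed label of the i-th training example.

```latex
\begin{align}
\operatorname{logit}(p) &= \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} \\
\ell(\beta) &= \sum_{i=1}^{n} \Bigl[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \Bigr]
\end{align}
```

The first line is the log-odds, the second converts it to a probability between 0 and 1, and the third is the log-likelihood that MLE maximizes with respect to the coefficients.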

➢ Naïve Bayes Classifier

The Naïve Bayes classifier is a supervised machine learning technique widely used for text categorization tasks. It belongs to
the generative learning algorithm family and aims to model the distribution of inputs across different classes or categories. Unlike
discriminative classifiers such as logistic regression, Naïve Bayes does not prioritize learning the importance of specific features
in discriminating between classes.

The Naïve Bayes classifier is often referred to as a probabilistic classifier because it relies on the principles of Bayesian
statistics. Bayes' Theorem allows for the inversion of conditional probabilities, enabling the incorporation of new information to
update prior probabilities into posterior probabilities. This classifier is particularly useful for tasks involving sequential events,
where new data influences the initial probability estimates.

Fig 3.2 Naïve Bayes Classifier
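
The update from prior to posterior probability can be illustrated with a small worked example. This is a minimal sketch under stated assumptions: the function name, the prior, and the per-feature likelihoods are hypothetical values chosen for illustration, and the features are treated as conditionally independent given the class, which is the "naïve" assumption.

```python
def posterior_phishing(prior, likelihoods_phish, likelihoods_legit):
    """Return P(phishing | features) using Bayes' theorem.

    prior             -- P(phishing) before seeing any feature
    likelihoods_phish -- P(feature_i observed | phishing) for each feature
    likelihoods_legit -- P(feature_i observed | legitimate) for each feature
    Features are assumed conditionally independent given the class.
    """
    joint_phish = prior
    joint_legit = 1.0 - prior
    for p_phish, p_legit in zip(likelihoods_phish, likelihoods_legit):
        joint_phish *= p_phish
        joint_legit *= p_legit
    # Normalise so the two class posteriors sum to 1.
    return joint_phish / (joint_phish + joint_legit)


# Hypothetical numbers: 20% of URLs are phishing; two observed features
# are each more likely to appear in phishing URLs than in legitimate ones.
result = posterior_phishing(0.2, [0.6, 0.7], [0.05, 0.3])
print(round(result, 3))
```

With these illustrative numbers, the two observed features raise the estimated probability of phishing well above the 20% prior.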

➢ K-Nearest Neighbor Algorithm
The K-nearest neighbors (KNN) classifier is a straightforward and easily implementable machine learning technique used
for classification and regression tasks.

Fig 3.3 K Nearest Neighbour Algorithm

KNN is based on the assumption that similar data points are located close to each other. It captures the concept of similarity
by measuring the distances between points, as depicted in Figure 3.3, and assigns a new point to the class most common among
its k nearest neighbors. Various distance metrics, such as the Euclidean distance, can be used to calculate these distances.
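
The distance-and-vote idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used in the project: the function names and the two-feature sample points are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical feature vectors: (URL length / 100, number of subdomains).
points = [(0.2, 1), (0.3, 1), (0.25, 2), (0.9, 4), (1.1, 5), (0.95, 3)]
labels = ["legitimate", "legitimate", "legitimate",
          "phishing", "phishing", "phishing"]

prediction = knn_predict(points, labels, (1.0, 4), k=3)
print(prediction)
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.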

➢ Kernel Support Vector Machine

The kernel support vector machine (SVM) builds on the idea of mapping data into additional dimensions so that classes that
are not linearly separable in the original dimensions become separable. The kernel trick allows SVM to operate in this
higher-dimensional space without explicitly computing the transformed coordinates. Figure 3.4 and Figure 3.5 provide visual
representations of the impact of the kernel and transformations in SVM.

Fig 3.4 Kernel Support Vector Machine


Fig 3.5 Impact of the Kernel and Transformations in SVM

In SVM, the kernel is chosen deliberately to achieve a clear separation between the classes. Commonly used kernels include the
polynomial kernel, the sigmoid kernel, and the Gaussian kernel, which is also known as the radial basis function (RBF) kernel.
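
The kernel trick can be checked numerically on a small case. The sketch below is an illustration under stated assumptions: it uses a degree-2 polynomial kernel on 2-D points because its explicit feature map is small enough to write out, and all function names and sample values are hypothetical. An RBF kernel function is included for comparison; its implicit feature space is infinite-dimensional, so only the kernel value itself can be computed.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def explicit_map(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-gamma * squared_distance)


a, b = (1.0, 2.0), (3.0, 0.5)
kernel_value = poly2_kernel(a, b)                      # no mapping needed
mapped_value = dot(explicit_map(a), explicit_map(b))   # same value via 3-D map
print(kernel_value, mapped_value, rbf_kernel(a, b))
```

Both routes give the same inner product, which is why the SVM never needs the higher-dimensional coordinates themselves.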

➢ Decision Tree
A decision tree is a fundamental method used for categorizing instances. It consists of nodes, branches, and leaf nodes.
Nodes evaluate specific properties, branches connect nodes or lead to leaf nodes, and leaf nodes represent the final outcome.

Fig 3.6 Illustrative Example of a Decision Tree

The process of building a decision tree involves recursively partitioning the data in a binary manner. It divides the data based
on certain criteria and repeats this process on each branch. In decision tree classification, a new instance is classified by following
a series of tests represented by the decision tree structure.
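
One common criterion for choosing each split is entropy, which measures how mixed the class labels in a node are. The sketch below shows how the information gain of a candidate binary split could be computed. It is a minimal illustration with hypothetical function names and labels, not the splitting code of any particular library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted_child_entropy = (
        (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    )
    return entropy(parent) - weighted_child_entropy


# Hypothetical node with two legitimate (0) and two phishing (1) URLs.
parent = [0, 0, 1, 1]
perfect_split = information_gain(parent, [0, 0], [1, 1])
useless_split = information_gain(parent, [0, 1], [0, 1])
print(perfect_split, useless_split)
```

A tree builder would evaluate this gain for every candidate test at a node and keep the one with the highest value, then repeat the process on each branch.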

➢ Random Forest Classifier
A random forest classifier is an ensemble of multiple decision trees. Each tree predicts a class, and the final prediction of the
model is determined by majority voting among the individual trees. Figure 3.7 illustrates the process of random forest
classification.

Fig 3.7 Process of Random Forest Classification

The random forest works by combining the predictions of a large number of relatively uncorrelated models (trees). This ensemble
approach typically outperforms any single tree within the forest.
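
The two ingredients of this ensemble, training each tree on a different resampled view of the data and combining the trees by vote, can be sketched as follows. This is an illustration only: the helper names and sample rows are hypothetical, and the tree training itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done for each tree in bagging."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = ["url_a", "url_b", "url_c", "url_d", "url_e"]
sample = bootstrap_sample(rows, rng)

# Hypothetical predictions from five individually trained trees for one URL.
votes = ["phishing", "legitimate", "phishing", "phishing", "legitimate"]
final = majority_vote(votes)
print(sample, final)
```

Because each tree sees a different sample, their individual errors tend to differ, and the vote averages many of them out.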

➢ Gradient Boost Classifier

Gradient boosting is an ensemble learning strategy that aims to minimize training errors by combining weak learners into a
strong learner. The algorithm trains models sequentially, with each model compensating for the weaknesses of its predecessor.
The gradient boost classifier, depicted in Figure 3.8, aggregates the weak rules from each classifier to create a strong prediction
rule.

Fig 3.8 Gradient Boost Classifier

Boosting and bagging are two types of ensemble learning methods. Boosting trains weak learners sequentially, while
bagging trains them in parallel. Boosting is often suitable for scenarios with low variance and high bias, whereas bagging is
effective for models with high variance and low bias.
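
The sequential, error-correcting behavior of boosting can be seen in a deliberately simplified sketch. The assumptions are stated plainly: a single numeric feature, squared-error loss (so the negative gradient is just the residual), and one-split decision stumps as the weak learners. The function names, learning rate, and data are hypothetical, and this is not the algorithm of any specific boosting library.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put data on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gradient_boost(xs, ys, rounds=10, learning_rate=0.5):
    """Train stumps sequentially, each one on the current residuals."""
    base = sum(ys) / len(ys)              # initial constant model
    preds = [base] * len(ys)
    stumps = []
    history = [mse(ys, preds)]
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(mse(ys, preds))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history


# Hypothetical data: small feature values are legitimate (0), large are phishing (1).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
predict, history = gradient_boost(xs, ys)
print(history[0], history[-1], predict(1), predict(6))
```

The training error recorded in `history` shrinks with each round, since every new stump is fitted to what the previous ones got wrong.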

➢ CatBoost Classifier
CatBoost is a gradient boosting library commonly used for regression and classification tasks, with built-in support for
categorical features. As in other gradient boosting methods, each new model in the ensemble is trained on the residuals of the
previous models rather than on the original target variable. The algorithm employs gradient descent optimization to
determine the contribution of each model to the ensemble.

CatBoost can handle challenging scenarios such as noisy data, missing values, and outliers.
However, careful parameter tuning is essential to prevent computational inefficiency and overfitting.

CHAPTER FOUR
SYSTEM REQUIREMENT SPECIFICATION
➢ Hardware Requirements:
• Processor (CPU): Intel Pentium Dual Core or higher
• Minimum hard disk capacity: 512MB of space
• Minimum RAM: 4GB

➢ Software Requirements:
• Programming language: Python
• Operating system: Windows 8.1 or above
• IDE: Anaconda with Python version 3.x

➢ Supporting Python Modules:
Python provides a convenient mechanism for defining and using modules. Modules are files that
contain definitions, which can be imported and used in other modules or the main module. Table 1 presents some of the modules
employed in this project.

Table 1 Supporting Python Modules

S.No.  Python Module   Description
1      ipaddress       Provides the capabilities to create, manipulate, and operate on IPv4 and IPv6 addresses and networks.
2      re              Offers regular expression matching operations similar to those found in Perl.
3      urllib.request  Defines functions and classes for opening URLs, particularly over HTTP.
4      BeautifulSoup   A Python package for parsing HTML and XML documents, commonly used for web scraping.
5      socket          Provides access to the BSD socket interface.
6      requests        Supports sending HTTP requests from Python.
7      whois           Implements the WHOIS query and response protocol for retrieving information about Internet resource owners.
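
To make the role of these modules concrete, the sketch below turns a URL into a short list of feature values using only standard-library modules (ipaddress, re, and urllib.parse). It is a hedged illustration, not the project's extraction code: the function names, the choice and order of features, the "@" check, and the dot-counting heuristic for subdomains are assumptions made here. Features that need network access, such as WHOIS lookups or favicon checks with BeautifulSoup, are left out.

```python
import ipaddress
import re
from urllib.parse import urlparse


def uses_ip_address(hostname):
    """True if the host part of the URL is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(hostname)
        return True
    except ValueError:
        return False


def extract_features(url):
    """Return [uses_ip, url_length, subdomain_count, has_at_symbol]."""
    hostname = urlparse(url).hostname or ""
    uses_ip = uses_ip_address(hostname)
    # Assumption: dots beyond the first approximate the subdomain depth.
    # This ignores multi-part suffixes such as "co.uk".
    subdomains = 0 if uses_ip else max(hostname.count(".") - 1, 0)
    # Assumption: an "@" in the URL is treated as a suspicious marker.
    has_at = 1 if re.search(r"@", url) else 0
    return [1 if uses_ip else 0, len(url), subdomains, has_at]


ip_url = "http://192.168.0.1/login"
named_url = "https://secure.login.example.com/"
print(extract_features(ip_url))
print(extract_features(named_url))
```

Each returned list has the same length and ordering, which is what allows the values to be passed to a classifier as one row of input.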

CHAPTER FIVE
SYSTEM DESIGN
➢ System Architecture

Fig 5.1 System Architecture

The system architecture, depicted in Figure 5.1, involves supplying URLs to the relevant classifier for categorization as
either authentic or phishing. Trained classifiers utilize patterns identified from the training dataset to classify the provided input.
Information such as IP address, URL length, domain, and favicon presence is extracted from the URLs, generating a list of their
values. This list is then inputted into classifiers like KNN, kernel SVM, Decision tree, and Random Forest. The performance of
these models is evaluated, resulting in an accuracy score. Based on the provided list, the trained classifier predicts whether the
URL is legitimate or phishing.
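
The train, evaluate, and compare flow described above might look like the following with scikit-learn. This is a sketch under stated assumptions: the data is synthetic (generated with make_classification as a stand-in for the extracted feature lists), and the split ratio, n_neighbors, n_estimators, and random_state values are illustrative choices. Only the "rbf" kernel and the "entropy" criterion come from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data stands in for the real URL feature lists.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Kernel SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Scoring on a held-out split, rather than on the training data, gives a fairer picture of how each classifier would behave on URLs it has not seen.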

➢ Data Flow Diagrams

Data Flow Diagrams (DFDs) are graphical representations illustrating the flow of data within a system. They depict the
processes involved, from input to report generation, and the connections between system entities. DFDs can be categorized into
different levels of detail, namely 0, 1, and 2. The Gane-Sarson method is utilized for drawing DFDs in this chapter.


5.2.1 Data Flow Diagram - Level 0

A Context Diagram, representing a DFD level 0 diagram, provides a high-level overview of the entire system. Figure 5.2
displays the system's DFD level 0, portraying the system as a high-level process connected to external entities. It aims to be easily
comprehensible by stakeholders, developers, and data analysts.

5.2.2 Data Flow Diagram - Level 1

DFD level 1 offers a more detailed representation of the Context diagram, breaking down the high-level process into
subprocesses. Figure 5.3 depicts the system's DFD level 1.


Fig 5.3 DFD Level 1 of the System

5.2.3 Data Flow Diagram - Level 2

DFD level 2 delves further into the processes involved in the system, encompassing feature extraction, dataset splitting, and
classifier construction. Figure 5.4 illustrates the system's DFD level 2, providing a comprehensive understanding of its
functioning.


Fig 5.4 DFD Level 2 of the System

➢ UML Activity Diagram

Fig 5.5 Behavioral Diagram that Visualizes the Control Flow

The system's activity diagram, presented in Figure 5.5, is a behavioral diagram that visualizes the control flow from a
starting point to an ending point.

CHAPTER SIX
IMPLEMENTATION

This chapter outlines the methodology employed to determine the legitimacy of a URL and identify potential phishing
attempts. Python modules such as whois, socket, re, ipaddress, and BeautifulSoup are used to extract feature values
such as IP address, URL length, domain name, subdomains, and favicon presence. These values are stored in a list,
which serves as the input for training the classifier. When a URL is entered, it is converted into a Python list representing
these characteristics. The implementation includes Kernel SVM, KNN, Random Forest, and decision tree
classifiers.

For Kernel SVM, the classifier used is sklearn.svm.SVC, with the nonlinear algorithm specified by setting the kernel
parameter to "rbf". The decision tree classifier is implemented using the sklearn.tree module and evaluates splits based on the
"entropy" criterion. The Random Forest classifier comprises multiple decision trees, and the final classification is based on the
majority vote from individual trees.

The user interface (UI) is a website created specifically for web phishing detection. The website incorporates
machine learning algorithms to evaluate the authenticity of URLs and provide users with feedback. The UI is designed to be
user-friendly and intuitive, enabling users to easily recognize and avoid fraudulent phishing websites. Frameworks such as
Bootstrap and Particles.js are integrated into the website to enhance its visual appeal and overall user experience. The
machine learning algorithm analyzes website content, structure, and other factors to determine its legitimacy, and the results
are presented in an accessible format, including a legitimacy percentage and information about associated risks.

➢ Flask is a flexible Python web framework widely used for rapid development of web applications. Flask provides built-in
functions such as routing and templating and can be extended using third-party libraries. It is commonly employed for
building webpages that incorporate machine learning, enabling users to input data and obtain results. Additionally, Flask
facilitates the deployment of machine learning models as web services through APIs, allowing integration with other
applications.

➢ Anaconda is a Python distribution widely used for data science and machine learning projects. Anaconda offers efficient
package management, simplifying the installation, removal, and updating of the libraries needed for these tasks. It
provides a comprehensive set of tools and packages that support the complete data science workflow, including data
exploration, cleaning, model construction, and deployment. The integration of Jupyter Notebook further enhances its
capabilities.

CHAPTER SEVEN
TESTING AND VALIDATION
This chapter focuses on testing and validating the proposed system by comparing the algorithm's results with the actual
outcomes. Each algorithm undergoes rigorous testing with both legitimate and phishing URLs, and the results are carefully
analyzed and presented.

➢ Unit Testing
Unit testing is conducted to thoroughly assess the functionality of individual modules and ensure their suitability for
implementation.

CHAPTER EIGHT
LABORATORY INVESTIGATION
The performance of each classifier is evaluated using a confusion matrix (CM), which tabulates the predicted labels against the
actual labels. The CM provides the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives
(FN), from which measures such as accuracy, precision, and recall are derived.
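As a minimal sketch of how these four counts lead to the usual performance measures, the following example computes them for a small set of labels. It assumes that phishing is treated as the positive class (label 1); the labels shown are made up for illustration and are not results from this study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for two equal-length label sequences."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted phishing, actually phishing
            else:
                fp += 1  # predicted phishing, actually legitimate
        else:
            if a == positive:
                fn += 1  # predicted legitimate, actually phishing
            else:
                tn += 1  # predicted legitimate, actually legitimate
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of URLs flagged as phishing that really are phishing."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual phishing URLs that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Illustrative labels only: 1 = phishing, 0 = legitimate.
    actual = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    print(tp, tn, fp, fn)
    print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```

For a phishing detector, a false negative (a phishing URL reported as legitimate) is usually the more costly error, so recall is worth reporting alongside accuracy.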

CHAPTER NINE
FINAL COMMENTS AND FUTURE WORKS
This chapter presents the final conclusions drawn from the study and provides recommendations for future research and
improvements.

➢ The Verdict
In our rapidly advancing technological world, phishing is evolving into a more sophisticated threat. With the global economy
shifting towards cashless and paperless transactions, phishing hinders this progress. Trust in the internet's reliability has
diminished, and while Artificial Intelligence (AI) can be a valuable tool for information gathering, individuals who cannot
identify security risks should avoid online financial transactions. Phishers primarily target the installation industry and
cloud benefits. This research addresses the issue by using Machine Intelligence (MI) to detect phishing websites. The objective
is to develop an effective, accurate, and affordable detection mechanism based on machine learning techniques. The study was
implemented in Python using the Anaconda distribution. The proposed approach employed four machine learning classifiers and
compared their performance. The Random Forest classifier achieved the highest accuracy, 96.82%. However, accuracy may vary with
the dataset, and other algorithms may surpass the Random Forest classifier on different data. The ensemble classifier Dashier
exhibited good precision and can effectively identify legitimate URLs in real-world scenarios (Pages 99–104 of the 2016
Conference on Digital Information Processing, Data Mining, and Wireless Communications (DIPDMWC)).

