Phishing Phase1 Report

This document presents a literature review of prior research on phishing website detection using machine learning techniques. Several studies are summarized that used datasets of phishing and legitimate URLs to evaluate classifiers including support vector machines, random forests, neural networks, decision trees and Naive Bayes. Accuracies of over 90% were typically achieved, with one study obtaining 99.18% using an RNN-GRU model. Features examined included attributes from the URL, web traffic data, port numbers and IP addresses. Some limitations around small datasets and discrete features are also noted.

Uploaded by

5082 SAKTHIVEL


PHISHING WEBSITE ANALYZER

USING MACHINE LEARNING

IT8811 - PROJECT WORK
PHASE 1 - REPORT

Submitted by

NARESH R (312420205062)

RAJASEKARAN B (312420205074)

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

St. JOSEPH’S INSTITUTE OF TECHNOLOGY, CHENNAI- 600 119

(An Autonomous Institution)

ANNA UNIVERSITY,CHENNAI 600025

OCTOBER 2023

ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “PHISHING WEBSITE ANALYZER USING MACHINE
LEARNING” is the bonafide work of Naresh R (312420205062) and Rajasekaran B
(312420205074), who carried out the mini project work under my supervision.

SIGNATURE
Dr. S. KALARANI M.E., Ph.D.,
Professor
HEAD OF THE DEPARTMENT
Department of Information Technology
St. Joseph’s Institute of Technology
Old Mamallapuram Road
Chennai-600119

SIGNATURE
Ms. S. Anslam Sibi M.E., (Ph.D).,
Assistant Professor
SUPERVISOR
Department of Information Technology
St. Joseph’s Institute of Technology
Old Mamallapuram Road
Chennai-600119

Submitted for the Viva-Voce held on

(INTERNAL EXAMINER) (EXTERNAL EXAMINER)

CERTIFICATE OF EVALUATION

College Name : St. Joseph’s Institute of Technology

Branch & Semester : Information Technology (VII)

S.NO  NAMES OF STUDENTS             TITLE OF THE PROJECT         NAME OF THE SUPERVISOR WITH DESIGNATION
1.    Naresh R (312420205062)       “PHISHING WEBSITE ANALYZER   Dr. L. Javid Ali
2.    Rajasekaran B (312420205074)  USING MACHINE LEARNING”

The reports of the project work submitted by the above students for the Project in
Information Technology of Anna University were evaluated and confirmed to be
reports of the work done by the above students.

(INTERNAL EXAMINER) (EXTERNAL EXAMINER)

ABSTRACT:

The rapid growth of internet services has been accompanied by a range of malicious
attempts to trick individuals into performing undesired actions, and attackers continue
to devise new techniques such as phishing.
Using fake websites, attackers collect sensitive information such as user data, login
credentials, social security numbers, and banking information. Recognizing whether a
website is legitimate or phishing is a difficult problem.
This paper proposes a phishing website analyzer based on machine learning. The model
predicts whether a website is legitimate or phishing using different classification
algorithms and natural language processing (NLP) based features.

LIST OF FIGURES

FIG NO NAME OF THE FIGURE PAGE

4.1 ARCHITECTURE DIAGRAM

4.2 USE CASE DIAGRAM

4.3 ACTIVITY DIAGRAM

4.4 SEQUENCE DIAGRAM

4.5 COMPONENT DIAGRAM

TABLE OF CONTENTS

CHAPTER   TITLE

          ABSTRACT
          LIST OF FIGURES
1         INTRODUCTION
            System Overview
            Scope of the project
2         LITERATURE SURVEY
3         SYSTEM ANALYSIS
            Existing System
            Proposed System
            Advantages of the Proposed System
            Disadvantages of the Proposed System
            Requirement Specification
              Software Requirement
              Hardware Requirements
4         SYSTEM DESIGN
            Architecture Diagram
            Use case diagram
            Activity diagram
            Sequence diagram
            Component diagram
5         SYSTEM IMPLEMENTATION
            Data Collection Module
            Data Preprocessing Module
            Machine Learning Model Module
            User Interface Module
            Continuous Update Module
            Security and Ethical Module
6         CONCLUSION AND FUTURE ENHANCEMENTS

CHAPTER 1

INTRODUCTION
1.1 SYSTEM OVERVIEW
The term "phishing" is derived from ‘fishing’ for victims. Attackers, known as phishers,
lure users by creating fraudulent websites that imitate the design of popular, legitimate
sites on the internet.
The main focus of this paper is real-time detection of phishing web pages by examining the
URL of the web page with different machine learning algorithms.
First, a large number of legitimate and fraudulent web page URLs are collected from the
dataset and Natural Language Processing (NLP) based features are extracted. Then the
machine learning algorithms Logistic Regression, Support Vector Machine, Naive Bayes,
Random Forest, and K-Nearest Neighbor are implemented to measure the efficiency of the
proposed system.
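
To make the idea of URL-based, NLP-style features concrete, the following is a minimal sketch of how a URL might be tokenized and turned into a small feature vector. The feature names, the helper functions tokenize_url and extract_url_features, and the keyword list are assumptions chosen for illustration; the report does not specify the exact feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_WORDS = {"login", "verify", "secure", "account", "update", "signin"}


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase word-like tokens on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


def extract_url_features(url: str) -> dict[str, float]:
    """Compute a few simple lexical features from the URL string alone."""
    # Add a scheme when it is missing so that urlparse can find the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    tokens = tokenize_url(url)
    return {
        "url_length": float(len(url)),
        "num_dots": float(host.count(".")),
        "num_digits": float(sum(ch.isdigit() for ch in url)),
        "has_ip_host": float(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "has_at_symbol": float("@" in url),
        "num_suspicious_words": float(sum(tok in SUSPICIOUS_WORDS for tok in tokens)),
    }
```

A feature dictionary of this kind could then be passed to any of the classifiers listed above; which features actually help would have to be established experimentally.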

1.2 AIM OF THE PROJECT

The aim of the "Phishing Website Analyzer" project is to develop a system that
effectively detects and prevents phishing websites using Natural Language
Processing (NLP) and machine learning. The project focuses on enhancing online
security by addressing the limitations of existing systems or the lack of a
systematic approach to phishing detection. The primary goal is to achieve accurate
phishing URL detection, reducing false positives and false negatives. The system
will provide real-time or on-demand analysis of websites, enabling timely threat
detection and response. Continuous adaptation to evolving phishing techniques
through updates and model retraining is a key objective. A user-friendly interface
will simplify URL analysis, while ethical considerations will ensure user privacy
and prevent misuse. Additionally, user education efforts will empower users to
recognize and protect themselves against phishing threats. The project's ultimate
aim is to contribute to a safer online environment by prioritizing user security and
privacy.
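
The goal of reducing false positives and false negatives can be made measurable. The sketch below shows one way to compute accuracy together with the false positive and false negative rates from predicted and true labels. It assumes that label 1 means phishing and label 0 means legitimate; the function name detection_metrics is hypothetical.

```python
def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Return accuracy, false positive rate and false negative rate.

    Assumes label 1 = phishing and label 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

Reporting the two error rates separately matters here because a missed phishing site and a wrongly blocked legitimate site have different costs for the user.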

CHAPTER 2

LITERATURE SURVEY

N. Choudhary, K. Jain, S. Jain : This study emphasizes the value of relying only on
attributes derived from the URL. The dataset used in the study was obtained from Kaggle
and PhishTank. The researchers used a hybrid approach that combined Principal Component
Analysis (PCA) with Support Vector Machine (SVM) and Random Forest algorithms to reduce
the dataset's dimensionality while retaining the important information, and it achieved
an accuracy rate of 96.8%, higher than the other techniques investigated.

A. Lakshmanarao, P. Surya, M Bala Krishna : This study collected a dataset of phishing
websites from the UCI repository and applied various machine learning techniques,
including decision trees, AdaBoost, support vector machines (SVM), and random forests,
to analyze selected features (such as web traffic, port, URL length, IP address, and
URL_of_Anchor). The most effective model for detecting phishing websites was chosen, and
two priority-based algorithms (PA1 and PA2) were proposed. The team used a new fusion
classifier in conjunction with these algorithms and attained an accuracy rate of 97%
when compared with previous work in phishing website detection.

L. Tang, Q. Mahmoud : The approach proposed in this study uses URLs collected from a
variety of platforms, including Kaggle, PhishStorm, PhishTank, and ISCX-UR, to identify
phishing websites. A notable contribution is a browser plug-in that can quickly recognize
phishing risks and issue warnings. Various datasets and machine learning techniques were
investigated, and the proposed RNN-GRU model outperformed SVM, Random Forest (RF), and
Logistic Regression with a maximum accuracy rate of 99.18%. On the other hand, the
suggested method is not always accurate in identifying whether short links are phishing
risks.

A. Kulkarni & L. Brown: A machine learning system was created to categorize websites
based on URLs from the University of California, Irvine Machine Learning Repository.
Four classifiers were used: SVM, decision tree, Naive Bayes, and neural network.
Experiments with models built from a training set showed that the classifiers were able
to differentiate authentic websites from fake ones with an accuracy rate of over 90%.
Limitations include a small dataset and all features being discrete, which may not be
suitable for some classifiers.

Tyagi, J. Shad, S. Sharma, S. Gaur, Gagandeep Kaur : This research focuses on the use
of various machine learning algorithms to identify whether a website is legitimate or a
phishing site based on its URL. The study's most important contribution is a new model
built on the Generalized Linear Model (GLM) that combines the results of two different
methods. With a 98.4% accuracy rate, the Random Forest and GLM combination produced the
best results for detecting phishing websites.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

This section describes the limitations and drawbacks of the current state of phishing
website detection, in particular the absence of a systematic approach:

Absence of a System: Currently, there is no dedicated system or method in place for
phishing website detection. This means that users are not provided with any protection or
warnings when visiting potentially harmful websites.

Lack of Accuracy: Without a proper system, there is a higher risk of users encountering
phishing websites without realizing it. This leads to a lack of accuracy in identifying and
blocking malicious sites, potentially resulting in users falling victim to phishing attacks.

Inability to Adapt: In the absence of a dedicated system, there is no mechanism for
adapting to evolving phishing techniques. Traditional methods for detecting phishing
websites may be outdated and unable to keep up with the sophistication of modern attacks.

User Vulnerability: Users are left vulnerable to phishing attacks due to the absence of an
effective detection system. This can result in financial losses, data breaches, and
compromised personal information.

Challenges: The existing system, which is effectively a lack of a system, presents various
challenges in ensuring the safety and security of online activities. Users have to rely on
their own judgment and awareness to identify potential threats.

3.1.1 DISADVANTAGES:

• Lack of Systematic Approach: Due to the absence of a dedicated system, there is no
systematic approach to identify and block phishing websites. Users are left vulnerable to
potentially harmful sites.
• Accuracy Issues: The absence of a formal system leads to accuracy issues. Users might
encounter phishing websites without proper detection, leading to potential financial losses,
data breaches, and compromised personal information.
• Inability to Adapt: The current system, essentially the lack of one, struggles with
adapting to evolving phishing techniques. Traditional methods for detecting phishing
websites may not be equipped to handle the ever-evolving sophistication of modern
attacks.
• User Vulnerabilities: Users are currently vulnerable to phishing attacks due to the
absence of an effective detection system. This puts their financial security and personal
data at risk.
• Reliance on User Judgment: Without a dedicated system, users have to rely on their own
judgment and awareness to identify potential threats, which is not foolproof.

3.2 PROPOSED SYSTEM

This section introduces the proposed phishing website analyzer, highlighting its key
features and benefits:

Novel Approach: The proposed system represents a novel approach to phishing website
detection. It addresses the limitations of the existing system by providing a structured
method for identifying and blocking phishing websites.

Accurate Detection: The core advantage of the proposed system is its ability to
significantly enhance phishing URL detection accuracy. By leveraging machine learning
and NLP techniques, it can distinguish between legitimate and phishing websites with a
high level of precision.

Real-time Analysis: The system allows for real-time or on-demand analysis of websites.
Users can input URLs for immediate analysis, which is crucial for timely threat detection
and response.

3.2.1 ADVANTAGES

Enhanced Security Measures: The proposed system incorporates advanced security
measures to protect against evolving phishing tactics. It can adapt to new threats by
regularly updating its database and retraining the machine learning model.

User-friendly Interface: A user-friendly interface has been designed to ensure that users
can easily interact with the system. This includes a straightforward process for submitting
URLs for analysis, with clear and intuitive feedback on the potential threat level.

Continuous Updates: To stay effective, the system is designed to continuously update its
dataset and retrain the machine learning model. This ensures that it remains current and
can identify the latest phishing techniques.

Ethical Considerations: Ethical considerations are integral to the system's design. User
privacy and consent are respected, and measures are in place to prevent misuse of the
system by malicious actors.

Costs and Resources: The development and implementation of the proposed system
come with associated costs, such as hardware, software, and manpower. However, these
costs are justified by the system's ability to enhance online security.

User Education: While the system offers advanced protection, it is important to note that
user education remains a key component of online security. The system complements user
awareness efforts but does not replace them.

3.3 REQUIREMENT SPECIFICATION

The requirements for a phishing website analyzer project involve both software and hardware
elements, as well as other considerations such as data and ethical requirements. Here's a detailed
breakdown of these requirements:

3.3.1 Software Requirements:

Operating System: The project should specify the supported operating systems for
running the software. Common choices include Windows, macOS, and Linux.

Programming Language: Define the programming language for system development.
Python is often used for machine learning and NLP applications.

Machine Learning Libraries: Specify the machine learning libraries that will be used for
implementing the detection algorithms. Common libraries include scikit-learn,
TensorFlow, and PyTorch.

NLP Libraries: Mention the natural language processing libraries required for analyzing
textual content. Popular choices include NLTK, spaCy, and Gensim.

Web Scraping Tools: Identify the tools or libraries needed to collect website data for the
dataset. Tools like Beautiful Soup or Scrapy are often used.

Database Management System: Specify the database management system for storing
and managing datasets. Options include MySQL, PostgreSQL, MongoDB, or SQLite.

User Interface Development Tools: Detail the tools, frameworks, or libraries used to
create a user-friendly front-end for users to interact with the system. Common choices
include Flask, Django, or JavaScript frameworks like React or Angular.

Security Tools: Consider including security tools or libraries for securing the system
against potential attacks and ensuring data privacy.

Web Hosting: If the system includes a web-based component, specify the web hosting
service or server requirements.
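
As an illustration of the database requirement listed above, the sketch below stores labelled URLs in SQLite through Python's built-in sqlite3 module. The choice of SQLite, the table name urls, its columns, and the helper functions are all assumptions made for this example; the report only names SQLite as one of several options.

```python
import sqlite3


def open_dataset(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the URL dataset. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " url TEXT PRIMARY KEY,"
        " label INTEGER NOT NULL CHECK (label IN (0, 1)))"
    )
    return conn


def add_url(conn: sqlite3.Connection, url: str, label: int) -> bool:
    """Insert a labelled URL; return False if the URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO urls (url, label) VALUES (?, ?)", (url, label)
    )
    conn.commit()
    return cur.rowcount == 1


def load_dataset(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return all (url, label) pairs in insertion order."""
    return conn.execute("SELECT url, label FROM urls ORDER BY rowid").fetchall()
```

Using the URL as the primary key means duplicate submissions are rejected at the storage layer, and the parameterized queries avoid building SQL from user-supplied strings.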

3.3.2 Hardware Requirements:

Computational Resources: Ensure that you have sufficient computational resources for
training machine learning models. This may involve powerful CPUs and GPUs.

Storage Capacity: Allocate enough storage capacity for maintaining datasets, model
checkpoints, and other relevant data. SSDs are often preferred for faster data access.

Internet Connectivity: A reliable internet connection is necessary, especially if your
system will conduct real-time website analysis.

CHAPTER 4

SYSTEM DESIGN

4.1 SYSTEM ARCHITECTURE

The system architecture for our phishing URL detection project leverages transfer
learning and encompasses data loading and preprocessing modules, feature extraction,
and custom classification. This architecture aims to provide a precise and scalable solution
for phishing URL detection while accommodating future enhancements for improved
performance and accessibility.

4.2 ACTIVITY DIAGRAM

This activity diagram illustrates the core workflow of phishing website detection using
natural language processing.

Fig 4.3 Activity diagram

4.3 SEQUENCE DIAGRAM

In the sequence diagram for our phishing URL detection, we can visualize the interaction
between actors.

Fig 4.4 Sequence diagram

CHAPTER 5

SYSTEM IMPLEMENTATION

5.1 MODULES

Data Collection Module:

Responsible for gathering a dataset of URLs that includes both legitimate and phishing
sites.

Data Preprocessing Module:

Cleans and preprocesses the URL data by removing duplicates and special characters and
normalizing the data.
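
A minimal sketch of such a preprocessing step is given below. It normalizes each URL and drops blank entries and duplicates. The function names normalize_url and preprocess_urls are hypothetical, and the specific normalization rules (lowercasing the host, dropping the fragment and trailing slash) are assumptions. The sketch removes only surrounding whitespace and keeps the punctuation inside the URL, on the assumption that those characters may be needed later for feature extraction.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Trim whitespace, lowercase scheme and host, drop fragment and trailing slash."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def preprocess_urls(rows: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Normalize (url, label) rows and drop blanks and duplicates, keeping first-seen order."""
    seen: set[str] = set()
    cleaned: list[tuple[str, int]] = []
    for url, label in rows:
        if not url or not url.strip():
            continue  # skip empty entries
        norm = normalize_url(url)
        if norm in seen:
            continue  # skip duplicates after normalization
        seen.add(norm)
        cleaned.append((norm, label))
    return cleaned
```

Deduplicating after normalization, rather than before, catches entries that differ only in letter case or a trailing slash.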

Machine Learning Model Module:

Implements the machine learning algorithms for URL classification, such as Logistic
Regression, Naive Bayes, or Random Forest.
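
To illustrate one of the algorithms named above, the following is a small from-scratch sketch of a multinomial Naive Bayes classifier over URL tokens with add-one smoothing. It is not the project's implementation; the class name TokenNaiveBayes, the tokenizer, and the label convention (1 = phishing, 0 = legitimate) are assumptions for this example. In practice a library such as scikit-learn would normally be used instead.

```python
import math
import re
from collections import Counter


def tokenize(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class TokenNaiveBayes:
    """Multinomial Naive Bayes over URL tokens with add-one (Laplace) smoothing."""

    def fit(self, urls: list[str], labels: list[int]) -> "TokenNaiveBayes":
        self.class_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.class_counts}
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict(self, url: str) -> int:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            denom = sum(self.token_counts[c].values()) + len(self.vocab)
            for tok in tokenize(url):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

Working in log space avoids numerical underflow when many token probabilities are multiplied together.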

User Interface Module:

Develops the user interface for users to interact with the system, allowing them to input
URLs for analysis and displaying results.

Continuous Update Module:

Handles regular updates and retraining of the machine learning model to keep the system
effective against evolving phishing techniques.
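
One simple way to organize such updates is to retrain whenever a fixed number of newly labelled URLs has accumulated. The sketch below illustrates that policy. The class name RetrainScheduler, the batch-size trigger, and the retrain callback are hypothetical; the report does not state how or when retraining is triggered.

```python
from typing import Callable


class RetrainScheduler:
    """Collect newly labelled URLs and retrain once enough new samples arrive."""

    def __init__(
        self,
        retrain: Callable[[list[tuple[str, int]]], None],
        batch_size: int = 100,
    ) -> None:
        self.retrain = retrain          # callback that rebuilds the model
        self.batch_size = batch_size    # new samples required before retraining
        self.dataset: list[tuple[str, int]] = []
        self.pending = 0                # samples added since the last retrain

    def add_sample(self, url: str, label: int) -> bool:
        """Store one labelled URL; return True if this call triggered a retrain."""
        self.dataset.append((url, label))
        self.pending += 1
        if self.pending >= self.batch_size:
            self.retrain(list(self.dataset))  # retrain on the full dataset so far
            self.pending = 0
            return True
        return False
```

A time-based schedule or a drop in measured accuracy could serve as the trigger instead; the batch-size rule is used here only because it is easy to reason about.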

Security and Ethical Module:

Implements security measures to protect the system and user data and ensures ethical
considerations and privacy measures are in place.
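
As one example of a security measure, user-submitted URLs can be validated before they reach the analysis pipeline. The sketch below rejects empty or overlong input, control characters, non-web schemes, and URLs without a host. The function name validate_submitted_url, the length cap, and the scheme allowlist are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048                 # assumed cap, not taken from the report
ALLOWED_SCHEMES = {"http", "https"}   # assumed allowlist


def validate_submitted_url(raw: str) -> str:
    """Return the trimmed URL if it is acceptable, otherwise raise ValueError."""
    url = raw.strip()
    if not url or len(url) > MAX_URL_LENGTH:
        raise ValueError("URL is empty or too long")
    if any(ch.isspace() or ord(ch) < 32 for ch in url):
        raise ValueError("URL contains whitespace or control characters")
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("only http and https URLs are accepted")
    if not parts.hostname:
        raise ValueError("URL has no host")
    return url
```

Validation of this kind protects the system itself; it says nothing about whether the URL is phishing, which remains the classifier's job.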

CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENT

6.1 CONCLUSION

In conclusion, the development of a phishing website analyzer using Natural Language
Processing (NLP) and machine learning represents a significant step toward enhancing
online security and protecting users from phishing threats. This project addresses the
limitations of the existing systems, offering a novel approach to accurately detect and
block phishing websites. The proposed system's advantages include enhanced accuracy,
real-time analysis, advanced security measures, a user-friendly interface, continuous
updates, and a strong emphasis on ethical considerations.

The system's modules, from data collection and preprocessing to machine learning, user
interface, continuous updates, and security and ethics, form a cohesive framework that
ensures efficient phishing detection and user protection.

Future Enhancements:

The phishing website analyzer project can be further improved and expanded in several
ways:

Advanced Machine Learning Models: Explore more advanced machine learning models
and deep learning techniques for even higher accuracy in phishing detection.

Behavioral Analysis: Implement behavioral analysis of websites in addition to NLP-based
analysis to enhance detection capabilities.

Real-time Alerts: Develop a real-time alerting system that can notify users when they
visit a potentially phishing website, enhancing proactive protection.

User Feedback Mechanism: Incorporate a user feedback mechanism to allow users to
report suspicious websites, thereby enhancing the system's learning and adaptability.

Integration with Browsers: Create browser extensions or plugins that integrate directly
with popular web browsers to provide seamless protection.

Multi-language Support: Extend the system's language support to detect phishing
websites in multiple languages.

Mobile Application: Develop a mobile application for on-the-go URL analysis and
protection.

Collaboration with ISPs: Collaborate with internet service providers (ISPs) to implement
phishing website detection at the network level, preventing users from accessing harmful
websites.

AI-Driven Analysis: Incorporate artificial intelligence (AI) components for improved
decision-making and threat identification.

Blockchain-based Data Security: Implement blockchain technology to secure and
protect the dataset, ensuring data integrity and privacy.

Open-source Initiative: Consider making the project open-source to encourage
collaboration and contributions from the cybersecurity community.

User Education Campaigns: Continue to focus on user education, with awareness
campaigns and resources to empower users to identify phishing threats.
