
© 2025 JETIR April 2025, Volume 12, Issue 4 www.jetir.org (ISSN-2349-5162)

PHISHNET: Online Spam and Phishing Detection Using Machine Learning
Vankudothu Jeevan
Dept. of Electronics and Computer Engineering
Sreenidhi Institute of Science and Technology
Yamnampet, Ghatkesar, Hyderabad, Telangana 501301
[email protected]

Bushaboina Uday Kumar
Dept. of Electronics and Computer Engineering
Sreenidhi Institute of Science and Technology
Yamnampet, Ghatkesar, Hyderabad, Telangana 501301
[email protected]

Nangunuri Sai Teja
Dept. of Electronics and Computer Engineering
Sreenidhi Institute of Science and Technology
Yamnampet, Ghatkesar, Hyderabad, Telangana 501301
[email protected]

K Sreelatha
Assistant Professor
Dept. of Electronics and Computer Engineering
Sreenidhi Institute of Science and Technology
Yamnampet, Ghatkesar, Hyderabad, Telangana 501301
[email protected]

Mohan Dholvan
Professor
Dept. of Electronics and Computer Engineering
Sreenidhi Institute of Science and Technology
Yamnampet, Ghatkesar, Hyderabad, Telangana 501301
[email protected]

Abstract: Phishing attacks, which use phony websites, emails, and SMS messages to target people and organizations, are a serious threat in the digital age. The goal of this project, "PHISHNET," is to use cutting-edge machine learning techniques to create a reliable and scalable defense against phishing threats. Using algorithms such as Random Forest, Gradient Boosting, and Naïve Bayes, the system is built to identify and stop phishing attempts in real time, defending against constantly changing online threats. The project's scope covers three main areas: website, email, and SMS phishing detection. It incorporates proactive prevention measures, such as a blocking mechanism for websites identified as phishing, and real-time classification mechanisms to identify malicious activity. A user-friendly interface ensures easy interaction and supports informed decision-making by displaying phishing risk levels as safety percentages. The literature review stresses the need for updated techniques that handle small datasets and computational limitations, as well as issues with current systems, such as false positives and negatives.

Keywords: Phishing, Machine learning, XGBoost, Random Forest, Naïve Bayes.

I. INTRODUCTION
PHISHNET (Phishing Attack Threat Intelligence System) is a cybersecurity initiative that aims to identify and evaluate possible phishing threats using machine learning. Phishing, which uses phony websites, emails, and messages to trick users, remains a major attack technique as cybercrime increases. This project uses machine learning algorithms to determine whether URLs, emails, and messages are authentic or fraudulent. By incorporating threat intelligence techniques, PHISHNET improves detection capabilities and aids in the prevention of phishing attacks.

The system analyzes user-inputted URLs, emails, and messages and classifies them as either safe or malicious based on a number of features. It employs advanced feature extraction methods to improve accuracy and minimize false positives. Because of its scalable architecture, PHISHNET can continuously learn and adapt to new phishing techniques. The project aims to give users, businesses, and cybersecurity professionals an effective and reliable way to combat phishing threats.

Because of its intuitive interface, PHISHNET makes it simple to detect phishing attempts without the need for technical knowledge. To keep ahead of new phishing techniques, it makes use of real-time data analysis and regularly updates its detection models. Because it can be integrated into a variety of platforms, the system is a flexible tool for people and organizations looking for improved cybersecurity. By employing machine learning-driven threat intelligence, PHISHNET offers a proactive method of detecting and reducing phishing threats and supports a safer online experience for users.

Users who unintentionally respond to misleading messages, open malicious emails, or click on fraudulent links risk financial losses and security breaches. Attackers employ advanced techniques to produce phony emails and websites that closely mimic authentic sources, making it challenging for users to distinguish authentic content from fraudulent content. This growing threat highlights the need for an intelligent system that can identify and stop phishing attempts before they cause harm.

JETIR2504A40 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org k1

II. LITERATURE SURVEY

Manuel Sánchez-Paniagua et al. (2022) [1] present a Logistic Regression model with TF-IDF feature extraction that achieves 96.50% accuracy on a dataset of login URLs for phishing detection. Although useful, the study emphasizes that TF-IDF and Logistic Regression alone are not enough to identify changing phishing tactics. To increase accuracy and lower false positives, the authors advise combining several machine learning algorithms. They also stress the significance of feature engineering and dataset diversity for improving phishing detection. The study lays the groundwork for creating stronger models with machine learning methods like XGBoost, SVM, and decision trees.

Arathi Krishna V et al. (2021) [2] investigate phishing detection using neural networks, random forest, and SVM classifiers, and achieve 90.70% accuracy in URL analysis. The study highlights the necessity of using a variety of machine learning algorithms in order to detect different types of phishing attacks. It draws attention to the ways in which various classifiers enhance detection precision and lower false positives. According to the research, integrating different models improves the capacity to assess and counteract various phishing threat types.

Taun Dung Pham et al. (2021) [3] propose training WGAN-GP, a generative adversarial network (GAN), on available phishing URL data to generate malicious URLs. After integrating the generated phishing URLs into the existing URL database, they train two URL classifiers, LSTM and GRU, and compare the results. The work notes that a machine learning approach requires identifying the state of the art and choosing the best-suited algorithm.

Happy Chapla et al. (2019) [4] describe a model that incorporates features extracted from both phishing and legitimate URLs. The work stresses that useful lexical features must be identified, and the authors report accurate results for their system.

III. SYSTEM ARCHITECTURE

1. User Interface (UI)

Description: The front-end where users interact with PHISHNET to input data and view results.

Components:

• Web portal (main interface for interaction)
• Real-time phishing detection results
• Risk level visualization (e.g., safe, suspicious, dangerous)
• Upload/input options for URL, email content, or SMS text
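The risk-level visualization listed above could be driven by a small mapping from the model's phishing probability to a label and a safety percentage. The following is a minimal sketch only: the function names and the two thresholds (0.3 and 0.7) are assumptions made for illustration and are not taken from the PHISHNET implementation.

```python
def safety_percentage(phishing_probability: float) -> float:
    """Convert a phishing probability (0..1) into a safety percentage (0..100)."""
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("phishing_probability must be between 0 and 1")
    return round((1.0 - phishing_probability) * 100.0, 1)


def risk_level(phishing_probability: float,
               suspicious_at: float = 0.3,
               dangerous_at: float = 0.7) -> str:
    """Map a phishing probability to one of the three UI risk labels.

    Both thresholds are assumed defaults chosen for illustration.
    """
    if phishing_probability >= dangerous_at:
        return "dangerous"
    if phishing_probability >= suspicious_at:
        return "suspicious"
    return "safe"


# Example: a model output of 0.95 is shown as "dangerous" with 5.0% safety.
label = risk_level(0.95)
safety = safety_percentage(0.95)
```

In practice the thresholds would need to be tuned on validation data, since they directly control the trade-off between false positives and false negatives.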

2. Backend (Machine Learning Layer)

Description: The core of phishing detection using machine learning algorithms.

Components:

ML Model Deployment:

• Models like XGBoost, Random Forest, Naïve Bayes
• Integrated directly into the backend logic (no API call needed)

Model Prediction Logic:

• Direct method calls to prediction functions

Detection Modules:

• Website Detection: Uses URL analysis and webpage features
• Email Detection: Analyzes headers, content, and links
• SMS Detection: Scans text for suspicious links or phrases

3. Data Layer (Phishing Datasets)

Description: Source of truth for training and evaluating models.

Components:

• Preloaded datasets for different attack vectors (web, email, SMS)
• Local or cloud-based storage for:
  o Model training
  o Testing/evaluation
• Dataset versioning

4. Blocking & Alerting Mechanism

Description: Prevents or warns users about malicious content.

Components:

• Website Blocking: Displays strong warnings when a URL is flagged
• Email/SMS Alerts: Textual warning messages indicating the threat level

5. Real-Time Threat Monitoring

Description: Monitors phishing trends and updates detection logic periodically.

Components:

• Threat feed integration (via direct scripts or file ingestion)
• Scheduled retraining/updating of models

Fig.1. Project Architecture

IV. SYSTEM FLOW

Workflow and Data Flow
The system workflow follows a structured approach to phishing detection:


1. User Input Stage

• User provides:
  o A URL (to check for a phishing website)
  o An email text (to analyze for phishing signs)
  o An SMS message (to scan for suspicious content)
• Inputs are sent directly to the backend via form submission or HTTP request.

2. Preprocessing Stage

• Input data is cleaned and converted into features for the model:
  o For URLs: Domain name, presence of "https", length, special characters, etc.
  o For Emails: Structure, content, phishing keywords, presence of links.
  o For SMS: Common phishing patterns and malicious links.

3. Feature Extraction & Transformation

• Preprocessed data is transformed into the format expected by the ML model.
• Example: Categorical encoding, vectorization, etc.
• These features are passed to the corresponding ML model (e.g., Random Forest, Gradient Boosting).

4. ML Model Prediction

• The selected machine learning model evaluates the input.
• It predicts the phishing probability or label:
  o Phishing
  o Safe
  o Suspicious

5. Output Generation

• The system generates a clear output message based on the model's prediction.
• Example: “⚠️ This URL is likely a phishing site (95% confidence)”
• Risk level: Safe, Moderate Risk, High Risk
• May include suggestions (e.g., "Do not click", "Report this message").

Fig.2. User-Interface Diagram

V. IMPLEMENTATION

Implementing PHISHNET requires integrating several technologies to deliver accuracy, efficiency, and a smooth user experience. This section covers front-end and back-end development, database management, and machine learning integration.

The user-friendly, responsive web interface of PHISHNET makes it simple for users to enter suspicious URLs, emails, or SMS messages for phishing detection. The following are the main elements of the front-end:

User Interface (UI): The user interface, built with HTML, CSS, and JavaScript, provides a smooth and visually appealing experience. In addition to displaying phishing risk levels and safety scores, it offers simple navigation and fast access to phishing detection features, including input forms for websites, emails, and SMS.

Frameworks and Libraries: The front-end uses React.js to produce dynamic, interactive user interface elements. React updates only the portions of the user interface that have changed, which keeps rendering efficient and the user experience seamless. It also provides quick response times and real-time updates on the results of phishing detection.

Back-End Implementation:

As the central processing layer, the back-end manages data, performs AI calculations, and detects phishing attempts. It is in charge of processing user inputs, executing machine learning models, and delivering phishing risk assessments. Important elements consist of:

1. Web Framework: The back-end is built in Python, using a framework such as Flask to provide a scalable and reliable architecture that can handle numerous phishing detection requests.
2. Machine Learning Models: To perform real-time phishing classification for websites, emails, and SMS, the back-end incorporates pre-trained machine learning models (XGBoost, Random Forest, and Naïve Bayes). After processing incoming data, these models categorize it as either legitimate or phishing.

Data Processing: Incoming user data (URLs, emails, and SMS messages) is processed by the back-end, which then classifies it using the relevant machine learning model and sends the safety percentages and phishing risk results to the front-end for display.

PHISHNET stores and manages user queries, system results, and phishing detection logs using a structured database.

Database System: To provide efficient and dependable storage and retrieval of phishing detection results and system interactions, a SQL-based relational database (like MySQL or PostgreSQL) is used.

Data Optimization: Indexing and query optimization techniques are used to improve performance and give prompt access to sizable datasets of phishing URLs, emails, and SMS data for real-time analysis.

Historical Data Storage: Information about phishing attempts, user interactions, and detection results is kept in the database. This data helps refine the detection models and increase overall accuracy over time.

Machine Learning Integration for PHISHNET

The machine learning elements are essential to improving PHISHNET's capacity to identify and stop phishing attempts.

Models for Phishing Detection: Websites, emails, and SMS messages are categorized as either legitimate or phishing using algorithms like Random Forest, XGBoost, and Naïve Bayes. These models are retrained on freshly labeled phishing data to improve their accuracy over time.

Training Models: In order to improve its capacity to identify new phishing techniques and reduce false positives or negatives, the system regularly updates and retrains models using fresh data.

Real-Time Detection: The system incorporates real-time information from user queries to keep website, email, and SMS detection current while adjusting to new phishing techniques as they appear.

Fig.3. Sequence Diagram
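To make the Naïve Bayes component and the idea of updating a model with freshly labeled data more concrete, the following is a minimal from-scratch sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the PHISHNET code: the paper does not state which library is used, so the class name `NaiveBayesTextClassifier`, the `update` method, and the six training messages are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()              # label -> messages seen
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocabulary = set()

    def update(self, text: str, label: str) -> None:
        """Add one labeled message; can be called again as new data arrives."""
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def predict_proba(self, text: str) -> Dict[str, float]:
        """Return the posterior probability of each label for the message."""
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Turn log scores into probabilities; subtracting the max avoids underflow.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}


# Tiny made-up training set, for illustration only.
TRAINING = [
    ("Your account is suspended, verify your password now", "phishing"),
    ("You won a prize, click the link to claim", "phishing"),
    ("Urgent: confirm your bank details via this link", "phishing"),
    ("Are we still meeting for lunch tomorrow", "safe"),
    ("Please send me the report when you can", "safe"),
    ("Happy birthday, see you at the party", "safe"),
]

classifier = NaiveBayesTextClassifier()
for message, label in TRAINING:
    classifier.update(message, label)

probabilities = classifier.predict_proba("Click this link to verify your account")
predicted = max(probabilities, key=probabilities.get)
```

Because the classifier only stores counts, calling `update` with newly labeled messages is enough to refresh it, which is one simple way the retraining described above could be realized for text inputs. Tree-based models such as Random Forest or XGBoost would instead typically be retrained in batches.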

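The URL preprocessing described in the system flow (domain name, presence of "https", length, special characters) can be approximated with the Python standard library alone. The sketch below is an assumption-laden illustration: the function name `extract_url_features`, the exact feature set, and the example URL are hypothetical, and a real model would likely use more features than these.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string."""
    # urlparse only fills in the host when a scheme is present, so a
    # placeholder scheme is added for bare inputs like "example.com/login".
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "domain": host,
        # False for bare inputs, because the placeholder scheme is "http".
        "uses_https": parsed.scheme == "https",
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_special_chars": len(re.findall(r"[^A-Za-z0-9]", url)),
        "has_at_symbol": "@" in url,
        "host_is_ip_address": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
    }


# Example: a raw IP host, no HTTPS, and an "@" in the path are common warning signs.
features = extract_url_features("http://192.168.10.5/login@secure-update.example.com")
```

The resulting dictionary would still need to be converted into a numeric vector (for example by encoding the booleans as 0/1 and dropping or hashing the domain string) before being passed to a model such as Random Forest.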

VI. RESULTS

Fig.4. User Interface page

Fig.5. Email Detection

Fig.6. URL Detection

Fig.7. SMS Detection

VII. CONCLUSION

• By identifying and stopping phishing attempts on websites, emails, and SMS messages, our project seeks to improve internet security.


• By offering a single platform for phishing detection and prevention, we make protection easier for users and enhance their overall online safety.
• The system offers real-time phishing detection, assisting users in making well-informed security decisions.
• The incorporation of machine learning techniques improves detection accuracy and lowers false positives.
• All things considered, by assisting users in making wise decisions and avoiding phishing scams, our project makes the internet a safer place.

VIII. FUTURE ENHANCEMENTS

• Improved Machine Learning Models: Combining more sophisticated deep learning and natural language processing methods could further increase detection accuracy.
• Support for Mobile Applications: PHISHNET could be made available as a mobile app to examine phishing links and messages sent through messaging apps and SMS.
• Multi-Language Support: To improve accessibility worldwide, the system could be extended to identify phishing attempts in various languages.
• Cloud-Based Deployment: PHISHNET can be hosted as a cloud-based service that offers real-time phishing detection without the need for local installations.
• Blockchain for Data Integrity: Investigating the use of blockchain technology to safely store phishing data and guard against manipulation.

IX. REFERENCES

[1] Adarsh Mandadi, Saikiran Bopanna, Vishnu Ravella, Dr. R Kavitha, Phishing Website Detection Using Machine Learning, 2022 IEEE 7th International Conference for Convergence in Technology (I2CT).
[2] Edward Wijaya, Gracella Noveliora, Kharisma Dwi Utami, Rojali, Ghinaa Zain Nabiilah, Spam Detection in Short Message Service (SMS) Using Naïve Bayes, SVM, LSTM, and CNN, 2023 10th International Conference on Information Technology, Computer and Electrical Engineering (ICITACEE).
[3] Ammar Odeh, Ismail Keshta, Eman Abdelfattah, Machine Learning Techniques for Detection of Website Phishing: A Review for Promises and Challenges, 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC).
[4] V Dharani, Divyashree Hegde, Mohana, Spam SMS (or) Email Detection and Classification using Machine Learning, 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT).
[5] Kerin Pithawala, Sakshi Jagtap and Preksha Cholachgud, Detecting Phishing of Short Uniform Resource Locators using classification techniques, 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT).
