Government Polytechnic Khamgaon

Computer Department

Capstone Project Presentation on


Phishing Website Detection System using ML

Presented By -

2100210097 27 Aasawari Kshirsagar
2100210098 28 Rasika Majgaonkar
2100210112 37 Vaishnavi Sable
2100210125 49 Tanvi Wankhede

Guided By: Prof. V. M. Bande
CONTENTS

• Introduction
• Project Overview
• Problem Statement
• Methodology
• Project Plan and Timeline
• Challenges Encountered
• Achievements and Progress
• Lessons Learned
• Next Steps
• Conclusion
• References
INTRODUCTION

• A phishing website attack is a type of cyber threat in which attackers create deceptive websites that mimic legitimate ones, aiming to trick users into divulging sensitive information.

• The ultimate goal of a phishing attack is to exploit the victim's trust and obtain sensitive information that can be used for fraudulent activities, unauthorized access, or identity theft.

• Phishing website detection involves the use of machine learning techniques to identify and block phishing websites.
PROJECT OVERVIEW

• The Phishing Website Detection project aims to create a robust system that accurately identifies whether a user-entered website is a phishing site or not.

• The project aims to improve the accuracy of identifying phishing websites compared to existing models, addressing the growing social issue of increased phishing attacks despite strong security measures.

• The ultimate objective is to contribute to overcoming this social problem by implementing a highly effective phishing detection system.
PROBLEM STATEMENT
METHODOLOGY

01 Data Collection
Utilized the Kaggle dataset as our primary source of data. Preprocess the dataset by handling missing values, removing duplicates, and normalizing features to ensure data quality and consistency.

02 Feature Extraction
We focus on feature selection and engineering. We carefully choose features that are highly indicative of phishing behavior. By selecting and engineering these features, we aim to provide our models with the necessary information to make accurate predictions.

03 Model Implementation and Training
Implemented machine learning models using the selected features. Trained these models on the preprocessed dataset to learn patterns and relationships between features and phishing behavior.

04 Deployment and Monitoring
Preparing for the deployment of trained models in the upcoming weeks. This involves finalizing model selection, conducting thorough testing, and ensuring compatibility with the deployment environment.
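The three preprocessing steps named under Data Collection (removing duplicates, handling missing values, normalizing features) can be sketched as follows. This is a minimal pure-Python illustration, not the project's actual code: the two feature columns, the mean-fill strategy, and min-max scaling are assumptions chosen for the example, and the deck does not say which strategies were used.

```python
from statistics import mean


def preprocess(rows):
    """Deduplicate rows, fill missing values with the column mean, then min-max scale."""
    # 1. Remove exact duplicate rows while keeping the original order.
    unique_rows = list(dict.fromkeys(tuple(row) for row in rows))

    # 2. Replace missing values (None) with the mean of the observed values in that column.
    filled_columns = []
    for column in zip(*unique_rows):
        observed = [value for value in column if value is not None]
        fill_value = mean(observed) if observed else 0.0
        filled_columns.append([fill_value if value is None else value for value in column])

    # 3. Min-max normalize each column to the range [0, 1].
    scaled_columns = []
    for column in filled_columns:
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append([(value - low) / span if span else 0.0 for value in column])

    # Transpose back from columns to a list of rows.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical feature rows (assumed for illustration): [url_length, number_of_dots]
raw = [
    [20, 1],
    [20, 1],      # exact duplicate of the first row
    [60, None],   # missing value
    [100, 5],
]
clean = preprocess(raw)
```

With a real dataset the same steps would more likely be done with a data-frame library; the point here is only the order of operations: deduplicate, fill, then scale.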
TOOLS AND TECHNOLOGIES

• Anaconda Environment with Python: Anaconda provides a convenient environment for managing Python packages and dependencies.

• Python Flask, HTML, CSS, JS: For designing the user interface and backend integration.

• Machine Learning with Python Libraries: For training our models using Scikit-learn's algorithms and evaluating their performance. Once trained, the model is integrated into the Flask application to perform real-time detection.
PROJECT PLAN AND TIMELINE
Week 1
We created the user interface for our project.

Week 2
We studied different datasets and decided which dataset to use.

Week 3
We learnt about the algorithms which we decided to implement.

Week 4
We implemented the algorithms and trained our models on the selected dataset.

Week 5
We compared the accuracy of all the implemented models and applied hyperparameter tuning to improve it.

CHALLENGES ENCOUNTERED

1. Finding Suitable Environment -

We settled on Anaconda, as it already includes most of the required libraries pre-installed, such as scikit-learn and pandas.

2. Accuracy -

We use a technique called hyperparameter tuning to increase the accuracy of the algorithms and to find the parameter values that contribute most to that accuracy.
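Hyperparameter tuning as described above can be illustrated with an exhaustive grid search: every combination of candidate values is scored and the best one is kept. The sketch below is an assumption-laden toy, not the project's tuning code. The validation rows, the rule-based classifier, and the parameter names `max_length` and `max_dots` are all hypothetical, and the deck does not state which search method was used.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical validation set: (url_length, number_of_dots, label), label 1 = phishing.
VALIDATION = [
    (25, 1, 0), (30, 2, 0), (40, 2, 0),
    (75, 3, 1), (90, 5, 1), (35, 6, 1),
]


def rule_accuracy(max_length, max_dots):
    """Accuracy of a toy rule: flag a URL when it is too long or has too many dots."""
    correct = 0
    for length, dots, label in VALIDATION:
        predicted = 1 if (length > max_length or dots > max_dots) else 0
        correct += predicted == label
    return correct / len(VALIDATION)


best_params, best_score = grid_search(
    {"max_length": [20, 50, 80], "max_dots": [1, 4]},
    rule_accuracy,
)
```

In a scikit-learn workflow the same idea is usually combined with cross-validation, so that the chosen values are not fitted to a single validation split.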
MODELS IMPLEMENTED

• Ensemble Technique

1] Bagging
Random Forest Algorithm
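The bagging idea behind Random Forest can be sketched in a few lines: train several simple models on bootstrap samples (drawn with replacement) and let them vote. This is an illustrative simplification under stated assumptions, not a Random Forest: it uses one-feature threshold rules ("stumps") instead of full decision trees, omits the random feature selection that Random Forest adds, and the single feature (URL length) and the data are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature threshold rule: predict 1 when value > threshold."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({value for value, _ in samples}):
        correct = sum((value > threshold) == bool(label) for value, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def train_bagging(samples, n_estimators=15, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]
        stumps.append(train_stump(bootstrap))
    return stumps


def predict_bagging(stumps, value):
    """Majority vote over all stumps: 1 = phishing, 0 = not phishing."""
    votes = Counter(int(value > threshold) for threshold in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (url_length, label), label 1 = phishing.
DATA = [(10, 0), (15, 0), (20, 0), (25, 0), (70, 1), (80, 1), (90, 1), (95, 1)]
stumps = train_bagging(DATA)
short_url_prediction = predict_bagging(stumps, 5)
long_url_prediction = predict_bagging(stumps, 100)
```

Averaging many models trained on different resamples reduces the variance of any single model, which is the main motivation for bagging.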
XGBOOST

• Ensemble Technique
2] Boosting
[Diagram not recovered; surviving output label: Not Phishing]
LOGISTIC REGRESSION

• Logistic regression is a statistical method used for binary classification by estimating the probability of a binary
outcome based on one or more predictor variables.
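A minimal sketch of how logistic regression turns predictor variables into a probability: a weighted sum of the features is passed through the sigmoid function, and the weights are fitted by gradient descent on the log loss. The single normalized feature, the learning rate, and the epoch count are assumptions for illustration; the project itself reports using Scikit-learn rather than hand-written training code.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that sample x belongs to class 1 (phishing)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y  # gradient of the log loss
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical data: one normalized feature per URL, label 1 = phishing.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(X, y)
low_score = predict_proba(weights, bias, [0.1])
high_score = predict_proba(weights, bias, [0.9])
```

A URL is then classified as phishing when its estimated probability exceeds a chosen cutoff, commonly 0.5.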
K - NEAREST NEIGHBOUR (KNN)

• The K-NN algorithm works by finding the K nearest neighbors to a given data point based on a distance metric,
such as Euclidean distance, and assigning the point the majority class among those neighbors.
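The K-NN procedure can be sketched directly: compute the Euclidean distance from the query to every training point, keep the K closest, and take a majority vote over their labels. The two normalized features and the sample points below are hypothetical, chosen only to make the example self-contained.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two normalized features per website.
POINTS = [(0.1, 0.1), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.7), (0.85, 0.85)]
LABELS = ["legitimate", "legitimate", "legitimate", "phishing", "phishing", "phishing"]

near_legitimate = knn_predict(POINTS, LABELS, (0.2, 0.2))
near_phishing = knn_predict(POINTS, LABELS, (0.9, 0.9))
```

Because the vote depends on raw distances, K-NN is sensitive to feature scale, which is one reason normalizing features during preprocessing matters.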
LESSONS LEARNED

• Python proficiency: Learned Python, including libraries such as NumPy, Pandas, Matplotlib and Scikit-learn, to understand and implement machine learning algorithms.
• Environment choice: Selected Anaconda as the primary environment for its efficient package management system and support for data science tools.
• Dataset selection: Identified and acquired a dataset suited to the project's requirements.
• Data preprocessing: Prioritized data preprocessing to ensure high-quality input for model training.
• Hyperparameter tuning: Recognized the importance of hyperparameters, and allocated more resources and time for hyperparameter tuning.
CONCLUSION
REFERENCES

• https://ieeexplore.ieee.org/document/9730579
• https://ieeexplore.ieee.org/document/10169697
• https://ieeexplore.ieee.org/document/10249799
• https://ieeexplore.ieee.org/document/9824544
• https://ieeexplore.ieee.org/document/10049452
THANK YOU
