Detection of Phishing Web Page Using Machine Learning

Uploaded by

tharanrr3

DETECTION OF PHISHING WEB PAGE
EC9560 DATA MINING
2020/E/155
Contents
• Objective
• Scope Identification
• Data Description
• Comprehensive study on Data
• Data pre-processing
OBJECTIVE

To develop an ML classifier that predicts whether a website is phishing or legitimate, based on the URL of the web page.
Scope
URL Pattern Analysis
Examining the relationships between URL characteristics (such as length, special characters, and number of redirections) and phishing likelihood. This scope is crucial for identifying patterns in URLs that may distinguish phishing websites from legitimate ones. Insights from this analysis can help improve the model's accuracy in real-time phishing detection, supporting cybersecurity measures and user safety.
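
As an illustration of the URL characteristics named above, here is a minimal sketch that derives three of them from a raw URL string. The function name, the feature names, and the set of characters counted as "special" are assumptions made for this example; the dataset's actual feature definitions are not given on the slides.

```python
# Hypothetical set of characters treated as "special" for this sketch.
SPECIAL_CHARS = set("@-_?=&%.#~+")


def url_features(url: str) -> dict:
    """Compute a few simple URL characteristics (illustrative only)."""
    # Everything after the scheme separator, e.g. "example.com//login".
    rest = url.split("://", 1)[-1]
    return {
        "url_length": len(url),
        "special_char_count": sum(ch in SPECIAL_CHARS for ch in url),
        # A "//" after the scheme is used here as a rough proxy
        # for an embedded redirection.
        "redirect_count": rest.count("//"),
    }


features = url_features("http://example.com//login?user=a@b.com")
print(features)
```

Each URL becomes one row of numeric features, which is the form a classifier needs.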

DATA DESCRIPTION
• This dataset was collected from Kaggle.
• Each row represents a website, with the last column indicating whether it is phishing or not.
• Contains a total of 100077 web pages, each with 20 features.
Comprehensive study on Data
• About the Data
• Data Visualization
• Distribution Analysis
• Correlation Analysis

About the Data

Handling Null Values
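
The slide does not state how null values were handled. As one possible approach, the sketch below counts missing values per column with pandas and drops incomplete rows. The toy DataFrame and its column names are hypothetical stand-ins for the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset; column names are hypothetical.
df = pd.DataFrame({
    "url_length": [54.0, np.nan, 23.0, 120.0],
    "n_redirection": [0.0, 1.0, np.nan, 2.0],
    "phishing": [0, 1, 0, 1],
})

# Number of missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# One option (an assumption, not the slides' stated method):
# drop every row that contains at least one missing value.
df_clean = df.dropna().reset_index(drop=True)
```

Imputation (for example with a column median) is an alternative when dropping rows would lose too much data.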

Data Visualization & Distribution Analysis

Box Plot

Pair Plot

Correlation Analysis

Correlation Matrix
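
A correlation matrix like the one on this slide can be computed with pandas. The sketch below uses a small made-up DataFrame with hypothetical column names, not the actual dataset, and ranks the features by their Pearson correlation with the label.

```python
import pandas as pd

# Made-up values for illustration; column names are hypothetical.
df = pd.DataFrame({
    "url_length": [20, 35, 50, 80, 120],
    "n_special_chars": [2, 4, 5, 9, 14],
    "phishing": [0, 0, 0, 1, 1],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
target_corr = corr["phishing"].drop("phishing").sort_values(ascending=False)
print(target_corr)
```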

DATA PREPROCESSING
• REMOVE DUPLICATES
• OUTLIER DETECTION AND REMOVAL
• FEATURE SCALING
• TRAIN TEST SPLIT
Handle Duplicates
• As all features are numerical, some values, such as 0, will naturally repeat.
• No need for categorical encoding
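
A minimal sketch of duplicate handling with pandas, assuming "duplicate" means a row that is identical to an earlier row in every column. The toy DataFrame and column names are hypothetical. Note that a value such as 0 repeating within a column does not make a row a duplicate.

```python
import pandas as pd

# Toy data: rows 0 and 1 are identical; rows 2 and 3 differ in one column.
df = pd.DataFrame({
    "url_length": [54, 54, 23, 23],
    "n_redirection": [0, 0, 0, 1],
    "phishing": [1, 1, 0, 0],
})

# Count rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# Keep the first occurrence of each distinct row.
df_unique = df.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(df_unique))
```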

Outlier detection and Removal
• Z-Score method
• Removes the rows with values beyond a threshold.
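
The Z-score method can be sketched as follows with NumPy. The threshold of 3 standard deviations is a common convention assumed here; the slides do not state the value used. The function name and the random test data are also made up for illustration.

```python
import numpy as np


def remove_outliers_zscore(X, threshold=3.0):
    """Drop rows where any feature's |z-score| exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = np.abs((X - mean) / std)
    keep = (z <= threshold).all(axis=1)  # True for rows to keep
    return X[keep], keep


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0, 1] = 50.0  # plant one obvious outlier in row 0
X_clean, keep = remove_outliers_zscore(X)
print(X.shape[0], "->", X_clean.shape[0])
```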

Feature Scaling
• Features and target were separated
• A standard scaler standardizes the data, bringing all the features to a similar scale
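
The two steps above can be sketched in NumPy: split off the last column as the target, then standardize each feature to zero mean and unit variance. This is the same transformation that scikit-learn's StandardScaler applies; the small array below is made up for illustration.

```python
import numpy as np

# Made-up rows; the last column is the phishing label.
data = np.array([
    [54.0, 3.0, 1.0],
    [23.0, 1.0, 0.0],
    [120.0, 9.0, 1.0],
    [40.0, 2.0, 0.0],
])

# Separate features (all but last column) from the target (last column).
X, y = data[:, :-1], data[:, -1]

# Standardize: subtract the column mean, divide by the column std.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

The target column is left unscaled because it is a class label, not a measurement.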

Train-Test Split
• 15% for test
• 15% for validation
• 70% for training
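
A minimal sketch of a 70/15/15 split using shuffled row indices in NumPy, with the remaining 70% used for training. The function name and the fixed seed are assumptions for this example, and the slides do not say whether the split was stratified by class.

```python
import numpy as np


def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Return shuffled index arrays for train, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]  # everything left over
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = train_val_test_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index arrays can then select rows from the feature matrix and the target, for example X[train_idx] and y[train_idx].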

Future Works
• FEATURE SELECTION
• MODEL BUILDING
• MODEL EVALUATION

THANK YOU
THARANYA A. R
2020/E/155
