NK DT Project
2022-2023
GEETHANJALI COLLEGE OF ENGINEERING AND TECHNOLOGY
Communication Engineering
Dept. of ECE,
GCET
CERTIFICATE
being submitted by T NITIN KUMAR, S IRFAN and PAKHILESH, bearing roll numbers 21R11A04R4,
21R11A04R3 and 21R11A04Q5 respectively, in partial fulfillment of the requirements for the
Signature HoD-ECE
Name of in-charge faculty Designation
CONTENTS
References
ABSTRACT
With the growing integration of the Internet into social life, the Internet is changing
how people learn and work, while also exposing us to increasingly serious security attacks.
Recognizing network threats, especially attacks not seen before, is a primary issue that
needs immediate attention.
The aim of phishing site URLs is to collect private information such as user identities,
passwords and online financial transactions. Phishers use sites that are visually and
semantically similar to authentic websites.
Since most users go online to access the services provided by government and financial
organizations, there has been a significant increase in phishing threats and attacks in
recent years. As technology advances, phishing methods have evolved rapidly, and this
should be countered by using anti-phishing techniques to detect phishing. Machine learning
is a powerful tool that can be used against phishing attacks.
This study develops a model that can predict whether a URL is legitimate or phishing.
Cyber security professionals are now looking for trustworthy and stable techniques for
detecting phishing websites. By extracting and evaluating numerous features of authentic
and phishing URLs, this project uses machine learning to detect phishing URLs. In
conclusion, the study provides a model for classifying URLs as phishing or legitimate.
This would be very valuable in helping individuals and companies identify phishing
attacks by checking the validity of any link supplied to them.
Keywords: Phishing attacks, legitimate, trustworthy, Machine Learning, Personal
Information, Malicious links, Phishing domain characteristics
Chapter 1. Introduction to Design Thinking
Design Thinking was popularized by design consultancy firms like IDEO and the Stanford
d.school. It draws inspiration from design processes but has been adapted and integrated
into various disciplines due to its effectiveness in fostering creativity and problem-solving.
The central tenet of Design Thinking is the focus on understanding and empathizing
with end-users. By putting users at the center of the process, designers can create solutions
that truly address their needs, preferences, and pain points.
Design Thinking is not a linear process but rather iterative, meaning that it involves
continuous cycles of exploration, ideation, prototyping, and testing. Each iteration brings
the design closer to an optimal solution through continuous refinement.
Design Thinking is versatile and applicable to a wide array of industries and sectors.
It has been successfully used in product development, service design, business strategy,
social innovation, healthcare, education, and more.
Visual tools, such as sketches, storyboards, and mind maps, play a significant role in
Design Thinking. They help externalize ideas, facilitate communication, and promote a
shared understanding among team members.
Various tools and methods are employed throughout the Design Thinking process,
such as persona development, journey mapping, brainstorming techniques, rapid
prototyping, and usability testing.
Design Thinking aligns with the principles of Human-Centered Design (HCD), which
emphasizes the importance of designing for the needs and experiences of people. HCD and
Design Thinking often go hand in hand in creating impactful solutions.
Design Thinking has been associated with numerous successful case studies where
innovative products,
services, or systems have been developed to address real-world problems. Its ability to
foster user-centricity has led to the creation of solutions that resonate with their intended
audiences.
Figure 1.1: Steps of design thinking
Design Thinking and User Experience (UX) design are closely intertwined. UX design
applies Design
Thinking principles to create intuitive, enjoyable, and user-friendly products and services.
The iterative nature
of Design Thinking aligns well with the continuous improvement cycle that
characterizes UX design.
Design Thinking and Agile methodologies share some similarities, such as their
iterative nature and
emphasis on user feedback. Both approaches aim to deliver value early and often, but they
differ in their core
focus. Design Thinking concentrates on problem-solving and user empathy, while Agile
focuses on project
management and software development.
Design Thinking is not limited to business and product development; it has also been
effectively used in
social innovation and addressing complex societal challenges. Nonprofits,
governments, and social enterprises
leverage Design Thinking to create impactful solutions for issues like poverty,
healthcare, education, and
sustainability.
Despite its benefits, Design Thinking also faces some challenges. Ensuring effective
collaboration within
diverse teams, maintaining the right balance between creativity and feasibility, and
avoiding "design for
design's sake" are some of the common challenges designers encounter.
As Design Thinking gains popularity, many institutions and organizations offer Design
Thinking courses
and certifications. These programs equip individuals with the knowledge and skills to apply
the methodology in
various settings.
Designers using the Design Thinking approach should also consider ethical
implications. Understanding
the potential consequences of a design solution on different stakeholders is crucial to ensure
that the outcome
aligns with ethical standards and avoids unintended negative impacts.
Design Thinking has found its way into educational settings, transforming the way
students learn and
solve problems. It encourages active learning, critical thinking, and creativity, preparing
students to become
adaptable problem-solvers in the real world.
Chapter 2. Identifying the problem statement: Empathy Phase
The Empathy phase is one of the foundational stages of the Design Thinking
process. During this phase, designers and problem solvers aim to deeply
understand the perspectives, needs, desires, and challenges of the users or
stakeholders for whom they are designing a solution. It involves putting aside
preconceptions and immersing oneself in the users' experiences to gain valuable
insights that will guide the rest of the design process.
Figure 2.1: Spectrum of empathy
1. Observation: Designers observe users in their natural environment, paying close
attention to their behaviors, actions, and interactions. This helps identify patterns
and understand how users currently approach and deal with the problem at hand.
2. Interviewing: Engaging in one-on-one interviews with users allows
designers to delve deeper into their thoughts, feelings, and motivations. Open-
ended questions encourage users to express their needs and preferences, leading
to more profound insights.
3. Empathetic Listening: Empathy is at the core of this phase. Designers
actively listen to users without judgment, seeking to understand their emotions
and perspectives. By putting themselves in the users' shoes, designers can develop
a more comprehensive understanding of their experiences.
4. Building Empathy Tools: Designers often create empathy tools, such as
empathy maps and personas, to synthesize and visualize the collected user
insights. Empathy maps help in organizing observations, emotions,
thoughts, and pain points, while personas represent fictional characters embodying
specific user characteristics.
Figure 2.2: Empathy phase
needs and pain points that users might not articulate explicitly. These hidden
insights can lead to innovative solutions that address real and meaningful problems.
3. Building Empathy in the Team: By actively engaging in empathetic
research, the design team cultivates a shared understanding and empathy for
the users. This shared empathy lays the foundation for collaborative problem-
solving and creative idea generation.
4. Enhancing Creativity: Empathy provides designers with a rich pool of
experiences and emotions to draw upon during the subsequent stages of the
Design Thinking process, fueling creativity and ideation.
The Empathy phase sets the stage for a successful Design Thinking process. By
gaining a profound understanding of users, designers can define the problem more
accurately and develop solutions that are tailored to meet real user needs,
resulting in more impactful and user-centric outcomes.
CHAPTER 2
LITERATURE SURVEY
2.1 Overview of the Study
2.2 Literature Survey
1. "OFS-NN: An Effective Phishing Websites Detection Model Based on Optimal
   Feature Selection and Neural Network" (2019)
   Proposed method: The method has 3 stages: it defines a new index, designs an
   optimal feature selection algorithm, and produces the OFS-NN model.
   Limitation: The continuous growth of features that are sensitive to phishing
   attacks requires collecting more features for the OFS.

3. "Phishing Website Detection based on Multidimensional Features driven by
   Deep Learning" (2019)
   Proposed method: The method has the following stages:
   1. Character sequence features of the URL are extracted and used for fast
      classification.
   2. The LSTM (long short-term memory) network is used to capture the context
      semantic and dependency features of URL character sequences.
   3. Softmax classifies the extracted features.
   Limitation: It requires more computation and is therefore an expensive method.

4. "WC-PAD: Web Crawling based Phishing Attack Detection" (2019)
   Proposed method: It is a 3-phase approach to phishing attack detection. The 3
   phases of WC-PAD are 1) DNS blacklist, 2) heuristics-based approach and
   3) web crawler based approach. Both feature extraction and phishing attack
   detection make use of the web crawler.
   Limitation: Time consuming, as each website has to go through all three phases.

   combined by using a CNN of 3 layers to create precise feature representations
   of URLs. That is then used for training the classifier of phishing URLs.

6. "An Adaptive Machine Learning Based Approach for Phishing Detection Using
   Hybrid Features" (2020)
   Proposed method: A phishing detection system was developed using a machine
   learning classifier called XCS. It is an online, adaptive ML technique that
   evolves a set of rules called classifiers. This model derives 38 features
   from the source code of the webpage and from URLs.

7. "A new method for Detection of Phishing Websites: URL Detection" (2020)
   Proposed method: The three major phases in this work are Parsing, Heuristic
   Classification of data, and Performance Analysis. Each of these phases uses
   various distinctive methods for data processing to get better results.
   Limitation: Does not give full information about the techniques used.
From the above, ML methods play a vital role in many cybersecurity
applications and remain a promising path that attracts more such
investigations. In practice, several barriers limit their
implementation. As discussed, many approaches have been proposed
earlier for detecting phishing website attacks, and each has its own
limitations. Therefore, the aim of the project is the detection of
phishing website attacks using a novel machine learning technique.
2.3 Analysis of Existing System
literature survey, in which various research papers were analyzed. We
specified some important points of each paper and included related
diagrams or graphs. In the comparison section we have mainly
highlighted a few important advantages and disadvantages of each
paper and compared those papers. This chapter also introduces the
drawbacks of the existing system and the functionality of the
proposed system and its advantages.
CHAPTER 3
ANALYSIS
3.1 Overview of System Analysis
below:
3.4 System Architecture
a file is known as a module; definitions from a module can be imported into
other modules or into the main module.
Some of the modules used in the project are shown in Table 3.1.
3.6 Machine learning models
value is less than 0.5. In this way, we can apply Logistic
Regression to classification problems and get accurate predictions.
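The 0.5 threshold described above can be illustrated with a minimal sketch. The weights, bias and feature values below are hypothetical and chosen only for illustration; they are not taken from the project's trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else class 0."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical weights for two illustrative features (assumption, not project values).
weights = [1.5, -2.0]
bias = 0.0

label_a = predict_label([2.0, 0.5], weights, bias)  # score z = 2.0, probability above 0.5
label_b = predict_label([0.5, 2.0], weights, bias)  # score z = -3.25, probability below 0.5
print(label_a, label_b)
```

A probability at or above the threshold is assigned to one class and a probability below it to the other; which class corresponds to "phishing" depends on how the labels are encoded in the dataset.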
support vector machine uses the kernel trick, which implicitly maps a lower-dimensional
space to a higher-dimensional space.
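As a small illustration of the kernel trick, the sketch below uses a degree-2 polynomial kernel on 2-D inputs. This kernel is an assumption chosen for simplicity; the report does not state which kernel the project uses. The kernel value computed in the original 2-D space equals the dot product in an explicit 3-D feature space, so the higher-dimensional mapping never has to be computed.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel on 2-D inputs: k(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, 0.5)
kernel_value = poly_kernel(x, y)                       # computed in 2-D
explicit_value = dot(feature_map(x), feature_map(y))   # computed in 3-D
print(kernel_value, explicit_value)
```

Both values agree up to floating-point rounding, which is why a classifier can work with kernel values alone.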
CHAPTER 4
DESIGN
4.1 System Modelling
4.2 UML Activity Diagram
Figure 4.1: UML activity diagram
4.3 Data Flow Diagrams
4.1.1 Data Flow Diagram – Level 0
Figure 4.3: DFD - level 1
4.1.3 Data Flow Diagram – Level 2
DFD level 2 goes one step deeper into the subprocesses of Level 1.
Fig 4.4 shows the DFD level 2 of the system. It might require more
text to reach the necessary level of detail about the functioning of
the system. Level 2 gives a more detailed view of the system by
grouping the processes involved into three categories, namely
preprocessing, feature scaling and classification. It also
graphically depicts each of these categories in detail and gives a
complete idea of how the system works.
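The three categories named above can be pictured as a chain of functions. The following is only a sketch under stated assumptions: dropping rows with missing values, min-max scaling and the placeholder threshold rule are illustrative stand-ins, not the project's actual preprocessing steps or trained classifier.

```python
def preprocess(rows):
    """Preprocessing stage: drop rows that contain a missing (None) value."""
    return [row for row in rows if None not in row]


def min_max_scale(rows):
    """Feature scaling stage: rescale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def classify(row, threshold=0.5):
    """Classification stage: placeholder rule standing in for a trained model."""
    return 1 if sum(row) / len(row) > threshold else 0


# Toy rows with two numeric features each (illustrative values only).
raw = [[10, 200], [None, 50], [30, 400], [20, 300]]
scaled = min_max_scale(preprocess(raw))
labels = [classify(row) for row in scaled]
print(scaled)
print(labels)
```

Keeping each stage as a separate function mirrors the way the Level 2 diagram separates the categories, and lets any stage be replaced without touching the others.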
Figure 4.4: DFD - level 2
4.2 Summary
The system's architecture, the processes involved from input to
output at varying levels of complexity, and the system's behaviour
are graphically represented in the above chapter for a better
understanding of the system.
CHAPTER 5
IMPLEMENTATION
5.1 Technology Used
PYTHON
because it allows teams to work collaboratively without significant
language and experience barriers.
Additionally, Python supports the use of modules and packages,
which means that programs can be designed in a modular style and
code can be reused across a variety of projects. Once you have
developed a module or package you need, it can be scaled for use in
other projects, and it’s easy to import these modules.
One of the most promising benefits of Python is that both the
standard library and the interpreter are available free of charge, in
both binary and source form. There is no exclusivity either, as
Python and all the necessary tools are available on all major
platforms. Therefore, it is an enticing option for developers who
don't want to worry about paying high development costs.
That makes Python accessible to almost anyone. If you have the time
to learn, you can create some amazing things with the language.
MACHINE LEARNING
PANDAS
in need of a high-performance, flexible tool for data analysis. Prior
to Pandas, Python was mainly used for data munging and
preparation, and contributed very little to data analysis.
Pandas solved this problem. Using Pandas, we can accomplish five
typical steps in the processing and analysis of data, regardless of
its origin: load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and
commercial domains, including finance, economics, statistics and
analytics.
NUMPY
Figure 5.2 Flowchart of the proposed System
Figure 5.3 Flowchart of the web interface
5.1 Dataset
5.2 Process involved in implementation
The first step of the research work was determining the right
dataset. The dataset selected for this task was collected from Kaggle.
There are several reasons for selecting this dataset:
1. The dataset is large, so working with it is intriguing.
2. The dataset has 30 features, giving a wide range of
features and making the predictions a little more accurate.
3. The URLs are quite evenly distributed between the 2 categories.
5.5.1 Splitting:
The dataset was split into a training set and a testing set, with 75%
for training and 25% for testing, using the “train test split” method.
The splitting was done after assigning the dependent variables and
independent variables.
5.5.2 Preprocessing:
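The 75%/25% split described under Splitting can be sketched in plain Python as follows. This is an illustrative re-implementation of the idea, not the library routine the project called; the function name `split_train_test`, the seed and the toy data are assumptions.

```python
import random


def split_train_test(features, labels, test_fraction=0.25, seed=42):
    """Shuffle row indices, then split features and labels into train and test parts."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Toy data: 8 rows of independent variables (X) and one dependent variable (y).
X = [[i, i * 2] for i in range(8)]
y = [i % 2 for i in range(8)]
x_train, x_test, y_train, y_test = split_train_test(X, y)
print(len(x_train), len(x_test))  # 6 2
```

The independent and dependent variables are indexed with the same shuffled positions, so each feature row stays paired with its own label after the split.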
5.5.3 Feature extraction
The home page contains a section where a user can enter a URL and
predict whether it is phishing or legitimate. It predicts the state of
the URL based on the selected features. The purpose of this page is to
help its users validate a URL link.
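To give an idea of how a URL entered by the user might be turned into features, the sketch below computes a few simple lexical features with the Python standard library. The feature names and the choice of features are hypothetical and are not the 30 features of the dataset used in the project.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # succeeds only if the host is a raw IP address
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "dots_in_host": host.count("."),
        "has_hyphen_in_host": "-" in host,
    }


suspicious = extract_url_features("http://192.168.0.1/login@secure")
ordinary = extract_url_features("https://www.example.com/account")
print(suspicious)
print(ordinary)
```

A dictionary like this could then be converted into a feature vector and passed to a trained classifier; how the project's web application does this is not shown in this section.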
5.6.2 FAQs Page
The FAQs Page contains a series of questions and answers about
phishing attacks and how users can protect themselves from being
attacked by phishing sites.
Figure 5.5 Code for the web application
5.4 SUMMARY
CHAPTER 6
RESULTS AND DISCUSSIONS
6.1 Table and Graphs of results
Figure 6.1 Graph of accuracy
6.2 Results comparison and graphs
The phishing website classification model is built by implementing
the random forest, logistic regression and support vector machine
algorithms. The goal of this project is to compare the performance
of different classifiers and find the best approach for classifying
phishing and non-phishing websites. These algorithms were
implemented in Python.
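The comparison of classifiers by accuracy can be sketched as follows. The labels and predictions are toy values made up for illustration and the model names are placeholders; they are NOT the results of this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy test labels and per-model predictions (illustrative only, not project results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 1],
    "model_b": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_c": [1, 1, 0, 1, 0, 0, 0, 0],
}

scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(scores)
print(best_model)
```

The same accuracy function is applied to every model on the same test labels, so the scores are directly comparable.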
CHAPTER 7
7.1 Testing Types
7.2.1 Validation Testing
S. No.   Input URL   Expected Output   Actual Output   Remarks
7.3 Summary
8.1 Conclusion
aimed to build a phishing detection mechanism using machine learning
tools and techniques which is efficient, accurate and cost-effective. The
project was carried out in the Anaconda environment and was written in
Python. The proposed method used four machine learning classifiers to
achieve this, and a comparative study of the four algorithms was made. A
good accuracy score was also achieved.
The project can also include other variants of phishing like smishing,
vishing, etc. to complete the system. Looking even further out, the
methodology needs to be evaluated on how it might handle collection
growth. The collections will ideally grow incrementally over time so
there will need to be a way to apply a classifier incrementally to the new
data, but also potentially have this classifier receive feedback that might
modify it over time.
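One way to picture a classifier that is applied incrementally and adjusted by feedback is online logistic regression trained with stochastic gradient descent, updating on one labelled example at a time. This is a sketch of the idea only; the class name, learning rate and toy data stream are assumptions, and this is not a method used in the project.

```python
import math


class OnlineLogisticClassifier:
    """Logistic regression that learns from one labelled example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Probability that x belongs to class 1."""
        z = sum(w * v for w, v in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        """Hard label using the usual 0.5 threshold."""
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def update(self, x, label):
        """Apply one gradient step using a newly labelled example (feedback)."""
        error = self.predict_proba(x) - label
        self.weights = [w - self.learning_rate * error * v
                        for w, v in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


# Toy stream of (features, label) pairs arriving over time (illustrative only).
model = OnlineLogisticClassifier(n_features=2)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
for x, label in stream:
    model.update(x, label)

print(model.predict([1.0, 0.0]), model.predict([0.0, 1.0]))
```

Because each update touches only the current example, the model never needs to be retrained on the whole collection as it grows.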
8.3 Recommendation
Through this project, one can know a lot about phishing attacks and
how to prevent them. This project can be taken further by creating a
browser extension that can be installed on any web browser to detect
phishing URL Links.
REFERENCES
[8] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, and C.
Wang. Machine learning and deep learning methods for cybersecurity. IEEE
Access, 6:35365–35381, 2018.
[9] Neha R. Israni and Anil N. Jaiswal. A survey on various phishing
and anti-phishing measures. International Journal of Engineering
Research and Technology, 4, 2015.
[10] Pingchuan Liu and Teng-Sheng Moh. Content based spam e-mail
filtering. pages 218–224, 10 2016.
[11] N. Agrawal and S. Singh. Origin (dynamic blacklisting) based
spammer detection and spam mail filtering approach. In 2016 Third
International Conference on Digital Information Processing, Data
Mining, and Wireless Communications (DIPDMWC), pages 99–104,
2016.
[12] Vikas Sahare, Sheetalkumar Jain, and Manish Giri. Survey: anti-phishing
framework using visual cryptography on cloud. JAFRC, 2,
01 2015.
[13] S. Patil and S. Dhage. A methodical overview on phishing
detection along with an organized way to construct an anti-phishing
framework. In 2019 5th International Conference on Advanced
Computing Communication Systems (ICACCS), pages 588–593,
2019.
[14] Dipesh Vaya, Sarika Khandelwal, and Teena Hadpawat. Visual
cryptography: A review. International Journal of Computer
Applications, 174:40–43, 09 2017.
[15] Saurabh Saoji. Phishing detection system using visual
cryptography, 03 2015.
(ICDMW), pages 7–12, 2018.
[19] J. Mao, W. Tian, P. Li, T. Wei, and Z. Liang. Phishing-alarm:
Robust and efficient phishing detection via page component similarity.
IEEE Access, 5:17020–17030, 2017.
[20] G. J. W. Kathrine, P. M. Praise, A. A. Rose, and E. C. Kalaivani.
Variants of phishing attacks and their detection techniques. In 2019
3rd International Conference on Trends in Electronics and Informatics
(ICOEI), pages 255–259, 2019.
[21] Muhammet Baykara and Zahit Gurel. Detection of phishing
attacks. pages 1–5, 03 2018.
[22] Prof. Gayathri Naidu. A survey on various phishing detection and
prevention techniques. International Journal of Engineering and
Computer Science, 5(9), May 2016.
[23] E. Zhu, Y. Chen, C. Ye, X. Li, and F. Liu. Ofs-nn: An effective
phishing websites detection model based on optimal feature selection
and neural network. IEEE Access, 7:73271–73284, 2019.
[24] Mahdieh Zabihimayvan and Derek Doran. Fuzzy rough set
feature selection to enhance phishing attack detection, 03 2019.
[25] P. Yang, G. Zhao, and P. Zeng. Phishing website detection based
on multidimensional features driven by deep learning. IEEE Access,
7:15196–15209, 2019.
[26] T. Nathezhtha, D. Sangeetha, and V. Vaidehi. Wc-pad: Web
crawling based phishing attack detection. In 2019 International
Carnahan Conference on Security
Technology (ICCST), pages 1–6, 20