Phishing Detection System Through Hybrid
• Phishing is one of the most significant security issues on networks and the Internet. Many researchers have
attempted to protect users from phishing cyberattacks by detecting phishing URLs using machine learning,
deep learning, blacklists, and whitelists. Previous studies have proposed and implemented two groups of
phishing detection systems: list-based and machine-learning-based phishing identification systems. This
section is accordingly divided into two parts, covering previous list-based and machine-learning-based studies.
LIST-BASED PHISHING IDENTIFICATION SYSTEM
• List-based phishing identification systems use two different lists, a whitelist and a blacklist, to classify
webpages as authorized (legitimate) or phishing. Whitelist-based phishing identification systems maintain a
list of protected, reliable websites from which users can safely obtain the data they need.
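A list-based check reduces to a lookup of the URL's host name in the two lists. The sketch below assumes exact host-name matching and uses small, made-up lists (`WHITELIST`, `BLACKLIST`) and hypothetical helper names; a deployed system would load curated, regularly updated lists and may match more elaborately.

```python
from urllib.parse import urlparse

# Hypothetical example lists; a real system would load curated, regularly updated lists.
WHITELIST = {"example.com", "python.org"}
BLACKLIST = {"examp1e-login.com", "secure-update-account.net"}


def hostname_of(url: str) -> str:
    """Return the lower-cased host name of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def list_based_check(url: str) -> str:
    """Classify a URL as 'legitimate', 'phishing', or 'unknown' by list lookup."""
    host = hostname_of(url)
    # Blacklist is checked first, so a host on both lists is treated as phishing.
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "legitimate"
    # Neither list knows this host, so a purely list-based system cannot decide.
    return "unknown"


if __name__ == "__main__":
    for u in ["https://www.python.org/downloads/",
              "http://examp1e-login.com/verify",
              "https://brand-new-site.xyz/"]:
        print(u, "->", list_based_check(u))
```

The third URL illustrates the limitation of list lookups: a host that appears on neither list can only be reported as unknown.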
DISADVANTAGES:
1. Complexity: Hybrid models can be complex to design, implement, and maintain, requiring expertise in multiple machine learning techniques.
2. Resource Intensive: Training and deploying hybrid models can demand more computational resources than single-algorithm models.
3. Interpretability: As hybrid models involve multiple algorithms, understanding why a certain decision was made can be challenging, impacting the interpretability of the system.
4. Training Data: Developing hybrid models often requires diverse and representative training data for each algorithm, which can be time-consuming and require careful curation.
5. Hyperparameter Tuning: Hybrid models typically have more hyperparameters to tune, making the optimization process more intricate.
6. Overfitting: With the inclusion of multiple algorithms, there is a risk of overfitting, where the model fits the training data too closely and performs poorly on new data.
7. Algorithm Compatibility: Integrating different algorithms into a cohesive hybrid system may be challenging due to differences in their underlying methodologies.
8. Maintenance: Hybrid models may need continuous maintenance and updates as algorithms evolve and new phishing tactics emerge.
9. Model Complexity Trade-off: While hybrid models aim to improve accuracy, there is a trade-off between complexity and performance. The increased complexity might not always translate into substantial gains in accuracy.
10. Implementation Challenges: Building a hybrid model requires expertise in multiple machine learning algorithms, which can make the development process more intricate.
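Items 5 and 6 (hyperparameter tuning and overfitting) are usually addressed together: each hyperparameter combination is scored by its mean validation score over k cross-validation folds, so a setting that only fits the training data well is not selected. The sketch below shows this in plain Python under stated assumptions: contiguous folds (the data is assumed to be shuffled beforehand), a user-supplied `score_fn` that would normally fit a model on the training indices and score it on the validation indices, and a toy threshold rule with made-up data standing in for a real model. All names are hypothetical; scikit-learn's `GridSearchCV` is the usual library route.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds of range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for i in range(k):
        # The first n % k folds get one extra sample so every index is used once.
        size = n // k + (1 if i < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def parameter_grid(grid):
    """Yield every combination of hyperparameter values in a dict of lists."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def grid_search_cv(score_fn, grid, n, k=5):
    """Return the grid point with the best mean validation score across k folds."""
    best_params, best_score = None, float("-inf")
    for params in parameter_grid(grid):
        avg = mean(score_fn(params, train, val) for train, val in k_fold_indices(n, k))
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score


# Hypothetical toy data: one numeric URL feature and a label (1 = phishing).
XS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
YS = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def threshold_accuracy(params, train, val):
    """Toy score_fn: a fixed threshold rule needs no fitting, so `train` is unused."""
    preds = [1 if XS[i] >= params["threshold"] else 0 for i in val]
    return sum(p == YS[i] for p, i in zip(preds, val)) / len(val)


if __name__ == "__main__":
    print(grid_search_cv(threshold_accuracy, {"threshold": [3, 6, 9]}, n=len(XS), k=5))
```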
PROPOSED SYSTEM
The major contributions of this study are as follows:
• Phishing URL-based cyberattack detection is proposed to prevent cybercrime and protect people's privacy.
• The dataset consists of more than 11,000 phishing URL records whose attributes help classify phishing URLs.
• Several machine learning models are applied: decision tree (DT), logistic regression (LR), naive Bayes (NB),
random forest (RF), gradient boosting machine (GBM), support vector classifier (SVC), K-nearest neighbors
classifier (KNN), and the proposed hybrid model LSD (LR+SVC+DT) with soft and hard voting, which can
accurately classify phishing URL threats.
• Cross-validation with grid-search hyperparameter tuning, combined with the canopy feature selection
technique, is used with the proposed LSD hybrid model to improve prediction results.
• The proposed methodology is evaluated using accuracy, precision, recall, specificity, and F1-score.
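The LSD model combines its three members (LR, SVC, DT) by voting. Hard voting takes the majority of the predicted class labels; soft voting averages the predicted class probabilities and picks the class with the highest average. The sketch below shows both rules in plain Python; the per-model probabilities are made up for illustration and are not outputs from this study, and LR is read as logistic regression. In scikit-learn the same two rules are available through `VotingClassifier(voting="hard")` and `VotingClassifier(voting="soft")`, where soft voting needs members that expose `predict_proba` (for example `SVC(probability=True)`).

```python
from collections import Counter


def argmax(values):
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(labels):
    """Majority vote over predicted class labels (ties go to the smallest label)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def soft_vote(probas, weights=None):
    """Average class-probability vectors (optionally weighted), then take argmax."""
    weights = weights or [1.0] * len(probas)
    n_classes = len(probas[0])
    avg = [sum(w * p[c] for w, p in zip(weights, probas)) / sum(weights)
           for c in range(n_classes)]
    return argmax(avg)


if __name__ == "__main__":
    # Hypothetical outputs for ONE URL; class 0 = legitimate, class 1 = phishing.
    model_probas = {"LR": [0.45, 0.55], "SVC": [0.40, 0.60], "DT": [1.0, 0.0]}
    labels = [argmax(p) for p in model_probas.values()]
    print("hard:", hard_vote(labels), "soft:", soft_vote(list(model_probas.values())))
```

With these numbers the two rules disagree: LR and SVC lean weakly towards phishing and win the hard vote, while the confident DT probability pulls the soft-voting average towards legitimate.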
ADVANTAGES:
1. Improved Accuracy: Hybrid machine learning models combine the strengths of multiple algorithms, potentially leading to higher accuracy in phishing detection compared to individual models.
2. Feature Extraction: Hybrid models can effectively extract and combine features from different algorithms, enabling them to capture a wider range of characteristics that might indicate phishing.
3. Robustness: Hybrid models are often more robust against noise and variations in data, as they can balance out the weaknesses of individual algorithms.
4. Reduced False Positives: By combining multiple algorithms, hybrid models can mitigate the tendency for false positives, resulting in fewer legitimate URLs being misclassified as phishing.
5. Adaptability: Hybrid models can be adapted to changing phishing tactics and techniques, as different algorithms may excel at detecting certain types of phishing attacks.
6. Enhanced Generalization: The combination of different algorithms can lead to improved generalization, allowing the model to perform well on unseen and evolving phishing URLs.
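Claims such as improved accuracy and reduced false positives can be checked with the evaluation parameters listed for the proposed system, all of which follow from the confusion-matrix counts. Specificity, TN / (TN + FP), directly measures advantage 4, since a false positive here is a legitimate URL flagged as phishing. The sketch below assumes phishing is the positive class (label 1); the function name and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, specificity, and F1 from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # legitimate flagged as phishing
    fn = sum(t == positive and p != positive for t, p in pairs)  # phishing missed

    def ratio(num, den):
        # Guard against empty denominators; report 0.0 instead of dividing by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = phishing, 0 = legitimate.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```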
MODULES