
International Journal of Scientific Research in Engineering and Management (IJSREM)

Volume: 08 Issue: 11 | Nov - 2024 SJIF Rating: 8.448 ISSN: 2582-3930

DDOS Attack Classifier Using Machine Learning

Aryan Chauhan1, Prince Gandhi2, Nisarg Patel3, Dhara Parikh4

1,2,3 Department of Information Technology, Institute of Information Technology, Krishna School of Emerging
Technology & Applied Research, KPGU University, Varnama, Vadodara, Gujarat, India

4 Assistant Professor, Department of Information Technology and Engineering, Krishna School of Emerging
Technology & Applied Research, KPGU University, Varnama, Vadodara, Gujarat, India

---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The increasing frequency and complexity of Distributed Denial of Service (DDoS) attacks present substantial challenges to network security. Traditional Intrusion Detection Systems (IDS) often struggle to detect intricate attack patterns in real time. This research introduces a machine learning-based classifier designed to accurately identify and classify DDoS attacks. Leveraging the Intrusion Detection Evaluation Dataset (CIC-IDS2017), which contains realistic and labeled network traffic, we assess multiple models: Random Forest, Logistic Regression, Gradient Boosting, and Naive Bayes. By incorporating advanced feature selection and hyperparameter tuning, this classifier effectively minimizes false positives and demonstrates strong performance across metrics like accuracy, precision, and recall, making it a promising candidate for real-time DDoS detection in modern networks.

Key Words: Distributed Denial of Service (DDoS), Machine Learning, Intrusion Detection System (IDS), Random Forest, Logistic Regression, Naive Bayes, Gradient Boosting, Exploratory Data Analysis (EDA)

1. Introduction

Distributed Denial of Service (DDoS) attacks have emerged as a critical challenge in modern network security, aimed at disrupting the availability of services by overwhelming them with excessive, malicious traffic. In a DDoS attack, multiple compromised systems, often organized into a botnet, are used to flood a target (such as a server, network, or application) with an unmanageable volume of requests. This traffic surge causes the target system to slow down or crash, leading to service disruptions that impact businesses, public services, and critical infrastructure. The rise of internet-connected devices, particularly in the Internet of Things (IoT) domain, has exacerbated this issue by providing attackers with a vast pool of unsecured devices that can be easily co-opted into large-scale botnets.

DDoS attacks vary significantly in type and sophistication, generally falling into categories like volumetric attacks, protocol attacks, and application-layer attacks. Volumetric attacks, such as UDP and ICMP floods, aim to exhaust the bandwidth of a network by overwhelming it with massive amounts of data. Protocol attacks exploit vulnerabilities in network protocols, using methods like SYN floods and fragmented packet attacks to consume server resources. Application-layer attacks, such as HTTP floods, target specific functions of applications, causing a slower but equally disruptive denial of service. The complexity and variety of these attack types make DDoS attacks particularly challenging to detect and mitigate, as they require systems that can handle a wide range of patterns and adapt to evolving tactics.

Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) are fundamental defenses against DDoS attacks. However, traditional IDS/IPS approaches often rely on signature-based methods, which detect attacks based on known patterns or rules. These approaches are limited in their ability to recognize new, emerging attack patterns and are prone to high false positive rates in complex network environments. To address these limitations, security researchers are increasingly turning to machine learning-based methods, which use data analysis and pattern recognition to detect anomalous behavior that may signify a DDoS attack. Unlike rule-based methods, machine learning models can adapt to new attack patterns, making them well-suited for the dynamic and evolving nature of network traffic.

The goal of this study is to leverage machine learning techniques to classify DDoS attacks, offering a more flexible and robust approach to network defense. Machine learning classifiers analyze network traffic data, learning patterns associated with malicious activity, and can thereby achieve higher detection accuracy than traditional techniques. By employing algorithms such as Random Forest, Gradient Boosting, Logistic Regression, and Naive Bayes, this study seeks to identify classifiers that not only detect DDoS attacks with high accuracy but also minimize false positives, which is critical for reducing unnecessary alerts and resource consumption.

© 2024, IJSREM | www.ijsrem.com | Page 1



The architecture of a DDoS attack typically includes multiple phases: the recruitment of devices into a botnet, the orchestration of the attack, and the delivery of coordinated traffic to the target system. The accompanying diagram illustrates the architecture of a DDoS attack, showing how compromised devices ("IoT devices" or "bots") are directed by an attacker to simultaneously flood a target. This multi-layered structure, which often involves traffic from various IP addresses, complicates detection efforts, as it can make malicious traffic appear similar to legitimate user activity.

[Figure: architecture of a DDoS attack]

With the continuous rise in DDoS incidents across sectors, from finance to healthcare, it is essential to develop classifiers that can offer real-time, accurate detection. By exploring the performance of different machine learning models, this research aims to contribute insights that can guide the development of more effective and adaptive DDoS mitigation strategies, ultimately helping to safeguard critical network infrastructures against future attacks.

2. Related Work

Several studies have highlighted the importance of robust DDoS detection mechanisms, increasingly favoring machine learning techniques over traditional rule-based or statistical methods. These conventional approaches, reliant on predefined signatures and thresholds, struggle to adapt to the evolving nature of DDoS attacks, often resulting in high false-positive rates. In contrast, machine learning methods offer greater adaptability and precision by analyzing complex traffic patterns, enabling more effective detection with minimal human intervention.

Recent research has applied various machine learning classifiers to DDoS detection, demonstrating improved accuracy. The authors of [1] utilized models like Random Forest and Support Vector Machines to enhance detection rates, while [2] showed that machine learning models can achieve high accuracy for medium-scale attacks. However, scalability remains a challenge in large-scale attack scenarios, highlighting the need for more efficient, adaptable classifiers.

The researchers in [3] found that supervised machine learning models outperform rule-based methods in IoT environments. However, they noted a reliance on simulation-based validation, which limits generalizability to real-world conditions. This underscores the need for models that handle diverse attack data effectively, a challenge our study addresses through feature selection and preprocessing techniques to improve model generalization.

Feature selection has been emphasized in studies such as [4], where techniques like PCA and SVD optimized computational efficiency without sacrificing accuracy. Similarly, our approach integrates feature encoding, normalization, and high-impact feature selection, reducing complexity and enabling models like Gradient Boosting to achieve high precision and recall, as confirmed by our findings.

While deep learning models, such as Convolutional Neural Networks (CNN), have shown promise in DDoS detection, as noted in [5], they require extensive computational resources, posing challenges for real-time applications. To address this, our study focuses on efficient machine learning models like Random Forest and Gradient Boosting, which balance accuracy and processing demands, making them suitable for real-time and scalable DDoS detection.

In summary, while existing work has advanced DDoS detection, gaps remain in achieving scalability, real-world generalizability, and computational efficiency. Our solution addresses these challenges through a carefully selected ensemble of machine learning models optimized by thorough preprocessing and feature selection. These improvements lead to a high-accuracy classifier that meets the demands of real-time DDoS detection in diverse network environments, reducing false positives and enhancing adaptability across attack scenarios.

3. Methodology

In this research, we designed a machine learning-based classifier to detect and classify DDoS attacks by following a structured approach involving dataset loading, data preprocessing, model training, evaluation, and comparison.


A. Dataset

For this research, we utilized the Friday-WorkingHours-Afternoon-DDos dataset [6] from the CICIDS2017 dataset collection by the Canadian Institute for Cybersecurity (CIC). Widely adopted for its realistic simulation of modern network conditions, the CICIDS2017 dataset is commonly used in network security research and encompasses various types of attacks, including DDoS. Specifically, this subset consists of labeled network traffic data collected during simulated DDoS attack scenarios.

The dataset contains 79 features capturing detailed network parameters such as packet length, flow duration, flag counts, and data rates, which serve as inputs for machine learning models to distinguish between benign and malicious traffic. Key features include Flow Duration, Total Fwd Packets, Flow IAT Mean, and Packet Length Mean, among others, which provide crucial insights into traffic patterns typical of DDoS attacks.

This dataset was selected for its thorough labeling and comprehensive feature set, enabling robust model training and evaluation for DDoS attack detection.

B. Data Preprocessing

Data preprocessing is a crucial step to prepare the dataset for model training. This process involves cleaning the data, encoding labels, normalization, data exploration, and splitting the dataset, as described below:

Column Cleaning: Leading and trailing spaces in column names were removed to standardize feature labels and avoid errors during data manipulation.

Target Verification: The target classes (Label column) were inspected to confirm unique values, showing 'BENIGN', 'DDoS', and NaN as possible values. Missing values were subsequently removed to ensure a clean dataset.

Data Cleaning: Missing values were dropped from the dataset, resulting in a clean dataset ready for analysis. Non-numeric columns, such as Label, were converted into numerical values (BENIGN = 0, DDoS = 1) for model compatibility.

Class Distribution: The distribution of classes in the dataset was checked post-cleaning to ensure an adequate representation of each class for training purposes.

Exploratory Data Analysis (EDA): EDA was performed to understand the data distribution and statistical properties of each feature. Descriptive statistics were generated to provide a summary of each feature's distribution, aiding in identifying relevant features for detecting DDoS patterns. Distribution histograms for key features, such as Flow Duration, Total Fwd Packets, and Packet Length Mean, illustrate the spread and concentration of values, helping to visualize patterns typical of benign and DDoS traffic. These three features are highlighted as examples of primary indicators for DDoS detection; however, similar analyses were conducted for all major features to provide a comprehensive understanding of the dataset.
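The cleaning steps described above (stripping column names, dropping incomplete rows, encoding the label, and checking class balance) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it uses only the Python standard library so that it is self-contained, the four CSV rows are made-up stand-ins for the real file, and the names `RAW_CSV`, `LABEL_CODES`, and `load_clean_rows` are hypothetical. In practice a data-frame library such as pandas would be the more usual choice.

```python
import csv
import io
import statistics
from collections import Counter

# Tiny stand-in for the dataset CSV (hypothetical rows, not real data).
# Some column names carry leading spaces and the third row is incomplete.
RAW_CSV = """ Flow Duration, Total Fwd Packets, Label
1200,3,BENIGN
98000,41,DDoS
,5,BENIGN
87000,37,DDoS
"""

# Label encoding stated in the text: BENIGN = 0, DDoS = 1.
LABEL_CODES = {"BENIGN": 0, "DDoS": 1}


def load_clean_rows(text):
    """Strip column names, drop incomplete rows, and encode the label."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        # Column cleaning: remove leading/trailing spaces from names and values.
        row = {k.strip(): (v or "").strip() for k, v in raw.items()}
        # Data cleaning: skip rows with a missing value or an unknown label.
        if any(v == "" for v in row.values()) or row["Label"] not in LABEL_CODES:
            continue
        label = LABEL_CODES[row.pop("Label")]
        features = {name: float(value) for name, value in row.items()}
        rows.append((features, label))
    return rows


rows = load_clean_rows(RAW_CSV)

# Class distribution after cleaning.
class_counts = Counter(label for _, label in rows)

# Simple descriptive statistics for one feature, as a stand-in for EDA.
durations = [features["Flow Duration"] for features, _ in rows]
summary = {
    "mean": statistics.mean(durations),
    "min": min(durations),
    "max": max(durations),
}

print(class_counts)
print(summary)
```

Rows are dropped before label encoding so that a missing label never reaches the mapping step, which mirrors the order of the Target Verification and Data Cleaning steps.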


Data Splitting: The data was split into training and testing sets using a 70:30 ratio, with the training set used to fit models and the test set reserved for evaluating model performance. Features (X) and target (y) were separated before applying the split, ensuring clear boundaries between predictors and the label.

C. Model Training

To classify DDoS attacks effectively, we trained four machine learning models, each with unique strengths in handling binary classification tasks. These models were selected for their diversity in approach, allowing us to compare their effectiveness in distinguishing between benign and malicious traffic. Each model was trained on the preprocessed training data, and hyperparameter tuning was applied to maximize accuracy and generalizability.

Random Forest: An ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting.

Logistic Regression: A linear model for binary classification, predicting class probability with a logistic function.

Gradient Boosting: An iterative method that adds models sequentially to correct previous errors, minimizing a loss function.

Naive Bayes: A probabilistic model that uses Bayes' theorem, assuming feature independence.

D. Model Evaluation and Comparison

To assess model performance, several evaluation metrics were employed to capture various aspects of classification quality. The counts TP, TN, FP, and FN used in the formulas are defined under Confusion Matrix below.

Accuracy: Measures the proportion of correct predictions out of total predictions.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision and Recall: These metrics evaluate the classifier's ability to identify DDoS attacks accurately. Precision (positive predictive value) represents the relevance of positive predictions, defined as

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (sensitivity) measures how effectively the model detects DDoS attacks:

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1 Score: The harmonic mean of precision and recall, providing a single balanced metric, especially useful for imbalanced datasets.

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

ROC-AUC Curve: The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate, with the Area Under the Curve (AUC) providing a single score to evaluate overall performance across various classification thresholds.

Confusion Matrix: A confusion matrix is a table that visually represents the classification performance by summarizing true positives, true negatives, false positives, and false negatives. It provides insights into each model's strengths and weaknesses, showing how well the classifier distinguishes between DDoS and non-DDoS traffic.

True Positives (TP): Correctly identified DDoS attacks.
True Negatives (TN): Correctly identified benign traffic.
False Positives (FP): Benign traffic incorrectly classified as DDoS.
False Negatives (FN): DDoS traffic incorrectly classified as benign.

Each model was evaluated based on the metrics mentioned above, and results were visualized using ROC-AUC curves and confusion matrices. The Random Forest and Gradient Boosting models outperformed Logistic Regression and Naive Bayes, achieving high precision, recall, and AUC scores, indicating they are the most suitable for real-time DDoS detection applications.

4. Experiment Results

The performance of each model was evaluated using several key metrics: accuracy, precision, recall, F1 score, and ROC-AUC, which collectively provide insights into the models' abilities to accurately classify DDoS and benign traffic. Below is a detailed analysis of each model's performance and a comparison of their strengths and weaknesses for DDoS detection.

Naive Bayes achieved high recall, effectively identifying most DDoS traffic, but had lower precision, leading to more false positives. This is likely due to its assumption of feature independence, which can be limiting in complex data.

Logistic Regression showed balanced performance with high recall and good precision, making it a solid baseline for DDoS detection, though it lags behind ensemble models. It is efficient and suitable for low-resource environments.


Gradient Boosting excelled with near-perfect results, offering 100% precision and very high recall, minimizing false positives and negatives. Its iterative approach makes it ideal for high-accuracy scenarios.

Random Forest's performance closely matched Gradient Boosting, with high precision and recall, making it highly reliable for DDoS detection. Its ensemble of decision trees prevents overfitting and captures DDoS traffic patterns well.

Comparative Analysis:

The following table summarizes the accuracy, precision, recall, and F1 score for each model:

[Table: accuracy, precision, recall, and F1 score for each model]

Both Gradient Boosting and Random Forest achieved the highest performance in terms of accuracy and F1 score, with minimal false positives, making them well-suited for precise and reliable DDoS detection. Naive Bayes, while having a high recall, was prone to false positives, whereas Logistic Regression provided a solid balance across metrics.

[Figure: Confusion Matrices]

[Figure: ROC-AUC Curve]

The evaluation metrics and visual comparisons indicate that Gradient Boosting and Random Forest are the most effective models for DDoS attack classification, achieving the highest accuracy, precision, and F1 scores. These models demonstrate minimal false positives and false negatives, making them reliable candidates for real-time DDoS detection systems. Naive Bayes and Logistic Regression, while performing adequately, were outperformed by the ensemble models, particularly in handling the complexity of DDoS traffic patterns.
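The evaluation metrics used in this comparison can all be derived from the four confusion-matrix counts plus the classifier's scores. The sketch below is a minimal, standard-library illustration of those definitions. The labels and scores are invented for the example and are not the paper's results, and the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are hypothetical. The AUC is computed through its rank interpretation, the probability that a randomly chosen DDoS flow receives a higher score than a randomly chosen benign flow, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with DDoS encoded as 1 and BENIGN as 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from the confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the fraction of (DDoS, benign) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 4 DDoS flows and 6 benign flows with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, tn, fp, fn)
auc = roc_auc(y_true, scores)

print(tp, tn, fp, fn)
print(metrics)
print(round(auc, 4))
```

Because precision and recall share the TP numerator but differ in the denominator, a model that flags too much traffic as DDoS keeps recall high while precision falls, which is the pattern reported for Naive Bayes.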

5. Conclusion

In conclusion, this study successfully developed and evaluated machine learning classifiers to detect DDoS attacks using the CICIDS2017 dataset, focusing on Random Forest and Gradient Boosting as top-performing models. These classifiers demonstrated strong accuracy, precision, and recall, with minimal false positives, making them suitable for real-time DDoS detection. By addressing the challenges of high accuracy and low false positive rates, this approach shows promise in maintaining the integrity and availability of network services. Future work could explore deep learning models and hybrid methods to improve adaptability and generalization across complex datasets, as well as optimization algorithms to enhance computational efficiency. Testing these models in live network environments will be critical to ensure their practical applicability and robustness under real-world conditions.


References

1) DDoS Attacks Detection and Classification Based on Deep Learning Model - Research Square, Tlemcen University, 2023
2) An Approach of DDoS Attack Detection Using Classifiers - Emerging Research in Computing, Information, Communication, and Applications, Springer India, 2015
3) Survey and classification of Dos and DDos attack detection and validation approaches for IoT environments - Elsevier, 2024
4) Denial of Service Attack Classification Using Machine Learning with Multi-Features - MDPI, 2022
5) Classification of DDoS attack traffic on SDN network environment using deep learning - Springer Open, 2024
6) Dataset: https://www.unb.ca/cic/datasets/ids-2017.html
