
DDoS Detection Using Machine Learning

The document discusses using machine learning for detecting distributed denial of service (DDoS) attacks. It loads and explores a dataset containing network traffic features and labels, identifies correlated and uninformative features, and prepares the data for analysis and modeling to classify examples as benign or attacks.

Uploaded by

soham pawar

s-detection-using-machine-learning

March 2, 2024

[1]: import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_curve, roc_curve, auc)
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from xgboost import plot_importance

[2]: ddos=pd.read_csv("APA-DDoS-Dataset.csv")

[3]: ddos

[3]: ip.src ip.dst tcp.srcport tcp.dstport ip.proto \


0 192.168.1.1 192.168.23.2 2412 8000 6
1 192.168.1.1 192.168.23.2 2413 8000 6
2 192.168.1.1 192.168.23.2 2414 8000 6
3 192.168.1.1 192.168.23.2 2415 8000 6
4 192.168.1.1 192.168.23.2 2416 8000 6
… … … … … …
151195 192.168.19.1 192.168.23.2 37360 8000 6
151196 192.168.19.1 192.168.23.2 37362 8000 6
151197 192.168.19.1 192.168.23.2 37364 8000 6
151198 192.168.19.1 192.168.23.2 37366 8000 6
151199 192.168.19.1 192.168.23.2 37368 8000 6

frame.len tcp.flags.syn tcp.flags.reset tcp.flags.push \


0 54 0 0 1
1 54 0 0 1

2 54 0 0 1
3 54 0 0 1
4 54 0 0 1
… … … … …
151195 66 0 0 0
151196 66 0 0 0
151197 66 0 0 0
151198 66 0 0 0
151199 66 0 0 0

tcp.flags.ack … tcp.seq tcp.ack \


0 1 … 1 1
1 1 … 1 1
2 1 … 1 1
3 1 … 1 1
4 1 … 1 1
… … … … …
151195 1 … 1 1
151196 1 … 1 1
151197 1 … 1 1
151198 1 … 1 1
151199 1 … 1 1

frame.time Packets Bytes \


0 16-Jun 2020 20:18:15.071112000 Mountain Dayli… 8 432
1 16-Jun 2020 20:18:15.071138000 Mountain Dayli… 10 540
2 16-Jun 2020 20:18:15.071146000 Mountain Dayli… 12 648
3 16-Jun 2020 20:18:15.071152000 Mountain Dayli… 10 540
4 16-Jun 2020 20:18:15.071159000 Mountain Dayli… 6 324
… … … …
151195 16-Jun 2020 22:10:46.923006000 Mountain Dayli… 10 1146
151196 16-Jun 2020 22:10:46.935672000 Mountain Dayli… 10 1151
151197 16-Jun 2020 22:10:46.957469000 Mountain Dayli… 10 1144
151198 16-Jun 2020 22:10:46.970971000 Mountain Dayli… 10 1175
151199 16-Jun 2020 22:10:46.984798000 Mountain Dayli… 10 1146

Tx Packets Tx Bytes Rx Packets Rx Bytes Label


0 4 216 4 216 DDoS-PSH-ACK
1 5 270 5 270 DDoS-PSH-ACK
2 6 324 6 324 DDoS-PSH-ACK
3 5 270 5 270 DDoS-PSH-ACK
4 3 162 3 162 DDoS-PSH-ACK
… … … … … …
151195 6 560 4 586 Benign
151196 6 560 4 591 Benign
151197 6 560 4 584 Benign
151198 6 560 4 615 Benign

151199 6 560 4 586 Benign

[151200 rows x 23 columns]

[5]: ddos.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 151200 entries, 0 to 151199

Data columns (total 23 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 ip.src 151200 non-null object

1 ip.dst 151200 non-null object

2 tcp.srcport 151200 non-null int64

3 tcp.dstport 151200 non-null int64

4 ip.proto 151200 non-null int64

5 frame.len 151200 non-null int64

6 tcp.flags.syn 151200 non-null int64

7 tcp.flags.reset 151200 non-null int64

8 tcp.flags.push 151200 non-null int64

9 tcp.flags.ack 151200 non-null int64

10 ip.flags.mf 151200 non-null int64

11 ip.flags.df 151200 non-null int64

12 ip.flags.rb 151200 non-null int64

13 tcp.seq 151200 non-null int64

14 tcp.ack 151200 non-null int64

15 frame.time 151200 non-null object

16 Packets 151200 non-null int64

17 Bytes 151200 non-null int64

18 Tx Packets 151200 non-null int64

19 Tx Bytes 151200 non-null int64

20 Rx Packets 151200 non-null int64

21 Rx Bytes 151200 non-null int64

22 Label 151200 non-null object

dtypes: int64(19), object(4)

memory usage: 26.5+ MB

[6]: ddos.isna().sum()

[6]: ip.src 0
ip.dst 0
tcp.srcport 0
tcp.dstport 0
ip.proto 0
frame.len 0
tcp.flags.syn 0
tcp.flags.reset 0
tcp.flags.push 0
tcp.flags.ack 0
ip.flags.mf 0
ip.flags.df 0
ip.flags.rb 0
tcp.seq 0
tcp.ack 0
frame.time 0
Packets 0
Bytes 0
Tx Packets 0
Tx Bytes 0
Rx Packets 0
Rx Bytes 0
Label 0
dtype: int64

[7]: ddos.duplicated().sum()

[7]: 0

There are no duplicates or nulls that need to be dropped, so we can proceed with the analysis.
[8]: ddos.groupby('Label').size()

[8]: Label
Benign 75600
DDoS-ACK 37800
DDoS-PSH-ACK 37800
dtype: int64

We have 75,600 Benign examples and 75,600 DDoS attack examples (37,800 DDoS-ACK and 37,800 DDoS-PSH-ACK), so the two classes are balanced.
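
The balance can also be expressed as proportions rather than raw counts. The following is a minimal sketch that rebuilds a label column from the counts reported above (the series itself is synthetic, not read from the CSV) and uses `value_counts(normalize=True)`:

```python
import pandas as pd

# Synthetic labels mirroring the counts reported above.
labels = pd.Series(
    ["Benign"] * 75600 + ["DDoS-ACK"] * 37800 + ["DDoS-PSH-ACK"] * 37800,
    name="Label",
)

# Share of each original class.
class_share = labels.value_counts(normalize=True)

# Share after collapsing both attack types into one "DDoS" class:
# where() keeps "Benign" and replaces everything else with "DDoS".
binary_share = labels.where(labels == "Benign", "DDoS").value_counts(normalize=True)

print(class_share)
print(binary_share)
```

With a 50/50 split between Benign and attack traffic, plain accuracy is a reasonable first metric.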

0.1 Exploring Relations between features


[10]: sns.pairplot(ddos, hue='Label', height=2, diag_kind='kde')  # `size` was renamed to `height` in seaborn
plt.show()

The pairplot shows that many features take a single value across the whole column. Such features carry no information for classification and can be dropped; the correlation matrix helps identify exactly which ones.
[16]: # Select only the numeric columns of the DataFrame
numeric_data = ddos.select_dtypes(include='number')

correlation_matrix = numeric_data.corr()
fig, ax = plt.subplots(figsize=(15, 8))
sns.heatmap(correlation_matrix, annot=True, ax=ax, cmap="RdPu")
plt.title('Correlation Between the Variables')
# plt.xticks(rotation=45)
plt.show()
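
Single-valued columns can also be listed directly instead of being read off the plots. A minimal sketch, assuming a toy frame whose values are invented for illustration (only the column names echo the dataset):

```python
import pandas as pd

# Toy frame: invented values; only the column names echo the dataset.
toy = pd.DataFrame({
    "tcp.dstport": [8000, 8000, 8000, 8000],
    "ip.proto": [6, 6, 6, 6],
    "frame.len": [54, 54, 66, 66],
    "tcp.flags.push": [1, 1, 0, 0],
})

# A column with a single distinct value cannot help separate the classes.
n_unique = toy.nunique()
constant_columns = n_unique[n_unique == 1].index.tolist()
print(constant_columns)

# Such columns have zero variance, so their correlations are undefined (NaN),
# which is why they show up as empty cells in a correlation heatmap.
print(toy.corr()["tcp.dstport"].isna().all())
```

On the real data the same two lines would be applied to `ddos` rather than `toy`.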

[9]: columns_to_drop = ['tcp.dstport', 'ip.proto', 'tcp.flags.syn', 'tcp.flags.reset',
                   'tcp.flags.ack', 'ip.flags.mf', 'ip.flags.rb', 'tcp.seq', 'tcp.ack']

ddos_new = ddos.drop(columns=columns_to_drop).copy()
ddos_new

[9]: ip.src ip.dst tcp.srcport frame.len tcp.flags.push \


0 192.168.1.1 192.168.23.2 2412 54 1
1 192.168.1.1 192.168.23.2 2413 54 1
2 192.168.1.1 192.168.23.2 2414 54 1
3 192.168.1.1 192.168.23.2 2415 54 1
4 192.168.1.1 192.168.23.2 2416 54 1
… … … … … …
151195 192.168.19.1 192.168.23.2 37360 66 0
151196 192.168.19.1 192.168.23.2 37362 66 0
151197 192.168.19.1 192.168.23.2 37364 66 0
151198 192.168.19.1 192.168.23.2 37366 66 0
151199 192.168.19.1 192.168.23.2 37368 66 0

ip.flags.df frame.time \
0 0 16-Jun 2020 20:18:15.071112000 Mountain Dayli…
1 0 16-Jun 2020 20:18:15.071138000 Mountain Dayli…
2 0 16-Jun 2020 20:18:15.071146000 Mountain Dayli…
3 0 16-Jun 2020 20:18:15.071152000 Mountain Dayli…
4 0 16-Jun 2020 20:18:15.071159000 Mountain Dayli…

… … …
151195 1 16-Jun 2020 22:10:46.923006000 Mountain Dayli…
151196 1 16-Jun 2020 22:10:46.935672000 Mountain Dayli…
151197 1 16-Jun 2020 22:10:46.957469000 Mountain Dayli…
151198 1 16-Jun 2020 22:10:46.970971000 Mountain Dayli…
151199 1 16-Jun 2020 22:10:46.984798000 Mountain Dayli…

Packets Bytes Tx Packets Tx Bytes Rx Packets Rx Bytes \


0 8 432 4 216 4 216
1 10 540 5 270 5 270
2 12 648 6 324 6 324
3 10 540 5 270 5 270
4 6 324 3 162 3 162
… … … … … … …
151195 10 1146 6 560 4 586
151196 10 1151 6 560 4 591
151197 10 1144 6 560 4 584
151198 10 1175 6 560 4 615
151199 10 1146 6 560 4 586

Label
0 DDoS-PSH-ACK
1 DDoS-PSH-ACK
2 DDoS-PSH-ACK
3 DDoS-PSH-ACK
4 DDoS-PSH-ACK
… …
151195 Benign
151196 Benign
151197 Benign
151198 Benign
151199 Benign

[151200 rows x 14 columns]

We don't need frame.time either, so we drop it as well.


[10]: ddos_new= ddos_new.drop(columns=['frame.time'])
ddos_new

[10]: ip.src ip.dst tcp.srcport frame.len tcp.flags.push \


0 192.168.1.1 192.168.23.2 2412 54 1
1 192.168.1.1 192.168.23.2 2413 54 1
2 192.168.1.1 192.168.23.2 2414 54 1
3 192.168.1.1 192.168.23.2 2415 54 1
4 192.168.1.1 192.168.23.2 2416 54 1
… … … … … …

151195 192.168.19.1 192.168.23.2 37360 66 0
151196 192.168.19.1 192.168.23.2 37362 66 0
151197 192.168.19.1 192.168.23.2 37364 66 0
151198 192.168.19.1 192.168.23.2 37366 66 0
151199 192.168.19.1 192.168.23.2 37368 66 0

ip.flags.df Packets Bytes Tx Packets Tx Bytes Rx Packets \


0 0 8 432 4 216 4
1 0 10 540 5 270 5
2 0 12 648 6 324 6
3 0 10 540 5 270 5
4 0 6 324 3 162 3
… … … … … … …
151195 1 10 1146 6 560 4
151196 1 10 1151 6 560 4
151197 1 10 1144 6 560 4
151198 1 10 1175 6 560 4
151199 1 10 1146 6 560 4

Rx Bytes Label
0 216 DDoS-PSH-ACK
1 270 DDoS-PSH-ACK
2 324 DDoS-PSH-ACK
3 270 DDoS-PSH-ACK
4 162 DDoS-PSH-ACK
… … …
151195 586 Benign
151196 591 Benign
151197 584 Benign
151198 615 Benign
151199 586 Benign

[151200 rows x 13 columns]

0.2 Preparing the Data


[11]: # Collapse the two attack types into a single 'DDoS' class
ddos_new['Label_new'] = ddos_new['Label'].apply(lambda x: 'Benign' if x == 'Benign' else 'DDoS')

ddos_new.drop(columns=['Label'], inplace=True)
ddos_new.rename(columns={'Label_new': 'Label'}, inplace=True)
ddos_new

[11]: ip.src ip.dst tcp.srcport frame.len tcp.flags.push \


0 192.168.1.1 192.168.23.2 2412 54 1
1 192.168.1.1 192.168.23.2 2413 54 1
2 192.168.1.1 192.168.23.2 2414 54 1

3 192.168.1.1 192.168.23.2 2415 54 1
4 192.168.1.1 192.168.23.2 2416 54 1
… … … … … …
151195 192.168.19.1 192.168.23.2 37360 66 0
151196 192.168.19.1 192.168.23.2 37362 66 0
151197 192.168.19.1 192.168.23.2 37364 66 0
151198 192.168.19.1 192.168.23.2 37366 66 0
151199 192.168.19.1 192.168.23.2 37368 66 0

ip.flags.df Packets Bytes Tx Packets Tx Bytes Rx Packets \


0 0 8 432 4 216 4
1 0 10 540 5 270 5
2 0 12 648 6 324 6
3 0 10 540 5 270 5
4 0 6 324 3 162 3
… … … … … … …
151195 1 10 1146 6 560 4
151196 1 10 1151 6 560 4
151197 1 10 1144 6 560 4
151198 1 10 1175 6 560 4
151199 1 10 1146 6 560 4

Rx Bytes Label
0 216 DDoS
1 270 DDoS
2 324 DDoS
3 270 DDoS
4 162 DDoS
… … …
151195 586 Benign
151196 591 Benign
151197 584 Benign
151198 615 Benign
151199 586 Benign

[151200 rows x 13 columns]

[12]: y = ddos_new['Label']
y

[12]: 0 DDoS
1 DDoS
2 DDoS
3 DDoS
4 DDoS

151195 Benign

151196 Benign
151197 Benign
151198 Benign
151199 Benign
Name: Label, Length: 151200, dtype: object

[13]: label_encoder = LabelEncoder()


y = label_encoder.fit_transform(y)

[14]: y

[14]: array([1, 1, 1, …, 0, 0, 0])
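
The integer codes follow the sorted order of the class names, which is why Benign becomes 0 and DDoS becomes 1. A small sketch on an invented label list that makes the mapping explicit:

```python
from sklearn.preprocessing import LabelEncoder

# Invented labels for illustration.
labels = ["DDoS", "DDoS", "Benign", "DDoS", "Benign"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, so index 0 is "Benign" and index 1 is "DDoS".
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the integer codes.
decoded = encoder.inverse_transform(encoded).tolist()
print(decoded)
```

Keeping the fitted encoder around is useful later for turning predictions back into readable class names.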

There are many IP addresses, so we encode them with one-hot encoding. IP addresses have no ordinal relationship, so label encoding would introduce a spurious ordering and could bias the model.
[15]: X = ddos_new.drop(columns=['Label']).copy()

categorical_columns = ['ip.src', 'ip.dst']  # categorical columns for one-hot encoding

# Create a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        # `sparse` was renamed to `sparse_output` in scikit-learn 1.2
        ('cat', OneHotEncoder(sparse_output=False, handle_unknown='ignore'), categorical_columns)
    ],
    remainder='passthrough'
)

pipeline = Pipeline(steps=[('preprocessor', preprocessor)])  # Create a pipeline

X_encoded = pipeline.fit_transform(X)  # Fit and transform

# Get the column names after encoding
encoded_column_names = pipeline.named_steps['preprocessor'].named_transformers_['cat'].get_feature_names_out(categorical_columns)

# Passthrough columns keep their original order; X.columns.difference() would sort them and mislabel the data
passthrough_columns = [c for c in X.columns if c not in categorical_columns]
column_names = list(encoded_column_names) + passthrough_columns

X = pd.DataFrame(X_encoded, columns=column_names)

[16]: X

[16]: ip.src_192.168.1.1 ip.src_192.168.10.1 ip.src_192.168.11.1 \


0 1.0 0.0 0.0
1 1.0 0.0 0.0
2 1.0 0.0 0.0
3 1.0 0.0 0.0
4 1.0 0.0 0.0
… … … …
151195 0.0 0.0 0.0
151196 0.0 0.0 0.0
151197 0.0 0.0 0.0
151198 0.0 0.0 0.0
151199 0.0 0.0 0.0

ip.src_192.168.13.1 ip.src_192.168.14.1 ip.src_192.168.16.1 \


0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
… … … …
151195 0.0 0.0 0.0
151196 0.0 0.0 0.0
151197 0.0 0.0 0.0
151198 0.0 0.0 0.0
151199 0.0 0.0 0.0

ip.src_192.168.17.1 ip.src_192.168.19.1 ip.src_192.168.2.1 \


0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
… … … …
151195 0.0 1.0 0.0
151196 0.0 1.0 0.0
151197 0.0 1.0 0.0
151198 0.0 1.0 0.0
151199 0.0 1.0 0.0

ip.src_192.168.20.1 … tcp.srcport frame.len tcp.flags.push ip.flags.df \
0 0.0 … 2412.0 54.0 1.0 0.0
1 0.0 … 2413.0 54.0 1.0 0.0
2 0.0 … 2414.0 54.0 1.0 0.0
3 0.0 … 2415.0 54.0 1.0 0.0
4 0.0 … 2416.0 54.0 1.0 0.0
… … … … … … …
151195 0.0 … 37360.0 66.0 0.0 1.0
151196 0.0 … 37362.0 66.0 0.0 1.0
151197 0.0 … 37364.0 66.0 0.0 1.0
151198 0.0 … 37366.0 66.0 0.0 1.0
151199 0.0 … 37368.0 66.0 0.0 1.0

Packets Bytes Tx Packets Tx Bytes Rx Packets \


0 8.0 432.0 4.0 216.0 4.0
1 10.0 540.0 5.0 270.0 5.0
2 12.0 648.0 6.0 324.0 6.0
3 10.0 540.0 5.0 270.0 5.0
4 6.0 324.0 3.0 162.0 3.0
… … … … … …
151195 10.0 1146.0 6.0 560.0 4.0
151196 10.0 1151.0 6.0 560.0 4.0
151197 10.0 1144.0 6.0 560.0 4.0
151198 10.0 1175.0 6.0 560.0 4.0
151199 10.0 1146.0 6.0 560.0 4.0

Rx Bytes
0 216.0
1 270.0
2 324.0
3 270.0
4 162.0
… …
151195 586.0
151196 591.0
151197 584.0
151198 615.0
151199 586.0

[151200 rows x 25 columns]
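
When column names are rebuilt after a `ColumnTransformer` with `remainder='passthrough'`, the passthrough columns come out in their original order, whereas `Index.difference` returns its labels sorted. A small sketch of the difference, on an invented toy frame (values and the reduced column set are assumptions for illustration):

```python
import pandas as pd

# Toy frame with invented values; column order mirrors the start of X.
toy = pd.DataFrame({
    "ip.src": ["192.168.1.1", "192.168.19.1"],
    "tcp.srcport": [2412, 37360],
    "frame.len": [54, 66],
    "Packets": [8, 10],
})
categorical_columns = ["ip.src"]

# Index.difference returns a sorted result, not the original column order.
sorted_names = list(toy.columns.difference(categorical_columns))

# A list comprehension keeps the order in which passthrough columns are emitted.
ordered_names = [c for c in toy.columns if c not in categorical_columns]

print(sorted_names)
print(ordered_names)
```

A quick check on the real frame is to compare a known column, such as `tcp.srcport`, before and after encoding.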

[17]: # Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

[18]: X_train

[18]: ip.src_192.168.1.1 ip.src_192.168.10.1 ip.src_192.168.11.1 \
39462 0.0 0.0 0.0
86399 0.0 0.0 0.0
46424 0.0 0.0 0.0
123679 0.0 0.0 0.0
23643 0.0 0.0 0.0
… … … …
119879 0.0 0.0 0.0
103694 0.0 0.0 1.0
131932 0.0 0.0 0.0
146867 0.0 0.0 0.0
121958 0.0 0.0 0.0

ip.src_192.168.13.1 ip.src_192.168.14.1 ip.src_192.168.16.1 \


39462 0.0 0.0 0.0
86399 0.0 0.0 0.0
46424 0.0 0.0 0.0
123679 0.0 0.0 0.0
23643 0.0 1.0 0.0
… … … …
119879 0.0 0.0 0.0
103694 0.0 0.0 0.0
131932 0.0 0.0 1.0
146867 0.0 0.0 0.0
121958 0.0 0.0 0.0

ip.src_192.168.17.1 ip.src_192.168.19.1 ip.src_192.168.2.1 \


39462 0.0 0.0 1.0
86399 0.0 0.0 0.0
46424 0.0 0.0 0.0
123679 0.0 0.0 0.0
23643 0.0 0.0 0.0
… … … …
119879 0.0 0.0 0.0
103694 0.0 0.0 0.0
131932 0.0 0.0 0.0
146867 0.0 1.0 0.0
121958 0.0 0.0 0.0

ip.src_192.168.20.1 … tcp.srcport frame.len tcp.flags.push ip.flags.df \


39462 0.0 … 34562.0 223.0 1.0 1.0
86399 0.0 … 13220.0 54.0 0.0 0.0
46424 0.0 … 40372.0 223.0 1.0 1.0
123679 0.0 … 54482.0 66.0 0.0 1.0
23643 0.0 … 5706.0 54.0 1.0 0.0
… … … … … … …
119879 0.0 … 46882.0 66.0 0.0 1.0
103694 0.0 … 6153.0 54.0 0.0 0.0
131932 0.0 … 46358.0 66.0 0.0 1.0
146867 0.0 … 56936.0 66.0 0.0 1.0
121958 0.0 … 51040.0 66.0 0.0 1.0

Packets Bytes Tx Packets Tx Bytes Rx Packets \


39462 10.0 1229.0 6.0 561.0 4.0
86399 10.0 540.0 5.0 270.0 5.0
46424 10.0 1229.0 6.0 561.0 4.0
123679 10.0 1170.0 6.0 560.0 4.0
23643 8.0 432.0 4.0 216.0 4.0
… … … … … …
119879 10.0 1151.0 6.0 560.0 4.0
103694 10.0 540.0 5.0 270.0 5.0
131932 10.0 1229.0 6.0 561.0 4.0
146867 10.0 1151.0 6.0 560.0 4.0
121958 10.0 1175.0 6.0 560.0 4.0

Rx Bytes
39462 668.0
86399 270.0
46424 668.0
123679 610.0
23643 216.0
… …
119879 591.0
103694 270.0
131932 668.0
146867 591.0
121958 615.0

[120960 rows x 25 columns]

[19]: y_train

[19]: array([0, 1, 0, …, 0, 0, 0])

[20]: class_counts = np.bincount(y_train)


print(f'Count for class 0 (Benign): {class_counts[0]}')
print(f'Count for class 1 (DDoS): {class_counts[1]}')

Count for class 0 (Benign): 60431

Count for class 1 (DDoS): 60529
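
The random split leaves the two classes almost, but not exactly, balanced. If an exact ratio is wanted, `train_test_split` accepts a `stratify` argument. A minimal sketch on synthetic data (the arrays below are invented; this is an option, not what the notebook does):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, exactly half in each class.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 50 + [1] * 50)

# stratify=y_toy keeps the class ratio identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print(train_counts, test_counts)
```

With a dataset this large and balanced the effect is small, so the unstratified split used here is a reasonable choice.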

0.3 Building the Model

0.4 Decision Tree


[43]: decision_tree_model = DecisionTreeClassifier()
decision_tree_model.fit(X_train, y_train)
y_pred_decision_tree = decision_tree_model.predict(X_test)
accuracy_decision_tree = accuracy_score(y_test, y_pred_decision_tree)
print(f"Decision Tree Accuracy: {accuracy_decision_tree * 100:.2f}%")

Decision Tree Accuracy: 100.00%

[44]: cm = confusion_matrix(y_test, y_pred_decision_tree)
class_labels = ["Benign", "DDoS"]
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", cbar=False,
            xticklabels=class_labels, yticklabels=class_labels)

plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
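
Beyond the heatmap, the four cells of a binary confusion matrix can be unpacked to compute precision, recall and the false positive rate. A short sketch on invented predictions (0 = Benign, 1 = DDoS, matching the encoding used above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Invented labels and predictions: 0 = Benign, 1 = DDoS.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes,
# so ravel() yields tn, fp, fn, tp for the binary case.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)            # flagged flows that really are attacks
recall = tp / (tp + fn)               # attacks that were caught
false_positive_rate = fp / (fp + tn)  # benign flows wrongly flagged

print(precision, recall, false_positive_rate)
```

For an intrusion detector the false positive rate is often as important as accuracy, since each false alarm means blocked or investigated benign traffic.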

0.5 Random Forest


[45]: rf_model = RandomForestClassifier()
rf_model.fit(X_train, y_train)

y_pred = rf_model.predict(X_test)  # predict

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Plot precision-recall curve
fig, ax = plt.subplots(figsize=(8, 8))
precision, recall, _ = precision_recall_curve(y_test, rf_model.predict_proba(X_test)[:, 1])
area = auc(recall, precision)
plt.plot(recall, precision, label=f'Precision-Recall curve (area = {area:.2f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='best')
plt.show()

# Plot the F1 score against recall
fig, ax = plt.subplots(figsize=(8, 8))
f1 = 2 * (precision * recall) / (precision + recall)
plt.plot(recall, f1, label='F1 Score')
plt.xlabel('Recall')
plt.ylabel('F1 Score')
plt.title('F1 Score Curve')
plt.legend(loc='best')
plt.show()

Accuracy: 100.00%
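
The precision and recall arrays can also be used to pick the decision threshold with the highest F1 score. A minimal sketch on invented scores; the zero-division guard and the variable names are assumptions added for illustration:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Invented labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Guard against 0/0 where precision and recall are both zero.
denom = precision + recall
f1 = np.divide(2 * precision * recall, denom,
               out=np.zeros_like(denom), where=denom > 0)

# The final (precision=1, recall=0) point has no threshold, so drop it.
best = int(np.argmax(f1[:-1]))
best_threshold = float(thresholds[best])
best_f1 = float(f1[best])

print(best_threshold, best_f1)
```

On the notebook's data the curve is essentially flat at 1.0, so this matters more for harder datasets.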

0.6 XGBoost

[47]: xgb_model = XGBClassifier()
xgb_model.fit(X_train, y_train)
y_pred = xgb_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 100.00%

[34]: plot_importance(xgb_model)
plt.show()

[49]: cm = confusion_matrix(y_test, y_pred)
class_labels = ["Benign", "DDoS"]
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", cbar=False,
            xticklabels=class_labels, yticklabels=class_labels)

plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

[51]: y_prob = xgb_model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()
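
The area computed from the curve points with `auc(fpr, tpr)` should agree with scikit-learn's `roc_auc_score`, which takes the labels and scores directly. A small sketch on invented scores:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Invented labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Area under the ROC curve via the trapezoidal rule on the curve points.
fpr, tpr, thresholds = roc_curve(y_true, scores)
area_from_curve = auc(fpr, tpr)

# The same quantity computed directly from labels and scores.
area_direct = roc_auc_score(y_true, scores)

print(area_from_curve, area_direct)
```

Here one benign sample is scored above one attack sample, so three of the four benign/attack pairs are ranked correctly and the area is 0.75.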
