ML Classifier To Detect Network Attacks in IoT


NetSim IoT

ML-Driven Classifiers for Attack Detection in RPL-Based IoT Networks

29 Aug 2024

Applicable Release: NetSim v14.1 or higher


Applicable Version(s): NetSim Standard
Project download link: https://github.com/NetSim-TETCOS/ML-Classifier-to-detect-network-attack-in-IoT-v14.1/archive/refs/heads/main.zip
The URL contains the exported NetSim scenarios for the examples used in this document and the Python scripts used to run the classifiers.
Outline
• Introduction to RPL protocol
• Objective function and Link quality
• Rank Calculations in NetSim
• Rank attack in RPL using NetSim
• Attack scenarios with malicious nodes - Training data
• Attack scenarios with 2, 4, 5, 6, 8, 10, 12 and 14 malicious nodes
• Data processing
• Feature visualization
• Attack scenarios with malicious nodes - Test data
• Attack scenarios with 3, 7, 9, 11, 13, and 15 malicious nodes
• Data processing
• Feature visualization
• Classification
• Detection of malicious nodes using ML based classifiers
• Confusion Matrix: Accuracy, Precision, F1 Score, Recall
• Comparison between different classifiers: Logistic Regression, Naïve Bayes, KNN, Support Vector Machine
1
Introduction to RPL Protocol
• RPL: Routing Protocol for Low-Power and Lossy Networks.
• Purpose: Designed for IPv6-based routing in Low-Power and Lossy
Networks (LLNs)
• Key concept: Constructs a Directed Acyclic Graph (DAG) rooted at the sink
• Goal: Minimize the cost of reaching the sink from any node based on the
Objective Function (OF)
• Key Terminology:
• DAG (Directed Acyclic Graph): A directed graph without cycles
• DAG root: Node with no outgoing edges
• DODAG ID: Unique IPv6 ID assigned to the root
• Rank: Defines a node's position relative to the DODAG root
• RPL implementation in NetSim is based on RFC 6550.

Control messages in RPL

RFC Reference: https://www.rfc-editor.org/rfc/rfc6550


2
Objective function and Link quality
• Objective Function (OF): Determines route prioritization
• NetSim implementation: OF prioritizes routes with the best link quality
• Link quality depends on:
• Received power
• Receiver sensitivity of nodes
• Link Quality Calculation in NetSim:
• In each direction, calculate $1 - \frac{p}{rs}$, where
• $p$ is the received power (dBm)
• $rs$ is the receiver sensitivity (dBm)
• Denote these as the transmit link quality $TL_q$ and the receive link quality $RL_q$.
• Final link quality: $L_q = \frac{TL_q + RL_q}{2}$
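A minimal Python sketch of this calculation (our illustration, not the NetSim implementation; the received-power value in the example comes from the worked calculation later in this document):

def one_way_link_quality(received_power_dbm, receiver_sensitivity_dbm):
    # One-way link quality: 1 - p/rs, with both values in dBm.
    return 1 - (received_power_dbm / receiver_sensitivity_dbm)

def link_quality(tx_lq, rx_lq):
    # Final link quality is the average of the transmit and receive link qualities.
    return (tx_lq + rx_lq) / 2

one_way = one_way_link_quality(-81.86, -85.0)   # ~0.0369; rounded to 0.036 in the worked example
print(link_quality(one_way, one_way))           # same value when both directions are equal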

3
Rank calculation in NetSim
• Rank: Scalar representation of node location within DODAG
• Purpose:
• Measure distance from root
• Avoid and detect loops
• Root node always has Rank 1 (also the border router in IoT)
• The rank calculation is based on the objective function defined.
• Rank Increase Formula:

$RI = (MaxIncrement - MinIncrement) \times (1 - L_q)^2 + MinIncrement$
$Rank = RI + Rank(Parent)$

Where:
• $RI$ is the Rank Increase
• MaxIncrement = 16
• MinIncrement = 1, as per RFC 6550
• $L_q$ is the Link quality

DODAG with Node Ranks in an IoT Network
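A minimal Python sketch of the rank computation above (illustrative only, not NetSim source code; the Floor operation used in the worked examples on the following slides is included):

import math

MAX_INCREMENT = 16
MIN_INCREMENT = 1

def rank_increase(link_quality):
    # RI = Floor((MaxIncrement - MinIncrement) * (1 - Lq)^2 + MinIncrement)
    return math.floor((MAX_INCREMENT - MIN_INCREMENT) * (1 - link_quality) ** 2 + MIN_INCREMENT)

def rank(link_quality, parent_rank):
    # Rank of a node = rank increase over the link to its parent + the parent's rank.
    return rank_increase(link_quality) + parent_rank

print(rank(0.036, 1))   # 15, matching the S2 example on the next slides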
4
Example calculation for better understanding
• Rank of the root node: 1.

• The parent node of S2 is the root node.

• The received power at S2 can be calculated using the following formula:

$P_r\,(dBm) = P_t + G_t + G_r + 20\log_{10}\left(\frac{\lambda}{4\pi d_0}\right) + 10 \times \eta \times \log_{10}\left(\frac{d_0}{d}\right)$

$P_r\,(dBm) = 1 + 0 + 0 + 20\log_{10}\left(\frac{0.125}{4 \times 3.14 \times 8}\right) + 10 \times 3 \times \log_{10}\left(\frac{8}{48.55}\right) = -81.86$

where $P_t = 1\,mW$ is the transmit power, $d = 48.55\,m$ is the distance between S2 and the root node, $d_0 = 8$, $G_t = 0$, $G_r = 0$, $\eta = 3$, and $\lambda = c/f = 0.125\,m$ with $f = 2400\,MHz$.

• One-way link quality: $L_q = 1 - \frac{P_r}{rs} = 1 - \frac{-81.86}{-85} = 1 - 0.963 = 0.036$

The network topology in IoT using RPL Protocol. Pathloss Model: Log Distance, Pathloss Exponent = 3, Transmit power = 1 mW, Receiver Sensitivity = -85 dBm

5
Rank calculations in NetSim

$L_q = \frac{TL_q + RL_q}{2} = \frac{0.036 + 0.036}{2} = 0.036$

$RankIncrease = \mathrm{Floor}\left((MaxIncrement - MinIncrement) \times (1 - L_q)^2 + MinIncrement\right)$

where $L_q = 0.036$, MaxIncrement = 16, and MinIncrement = 1

$RankIncrease = \mathrm{Floor}\left((16 - 1) \times (1 - 0.036)^2 + 1\right) = \mathrm{Floor}(15 \times 0.929 + 1) = \mathrm{Floor}(14.939) = 14$

$Rank = RankIncrease + Rank(Parent) = 14 + 1 = 15$

• The rank of S2 is 15. We next calculate the rank of S1.

• The received power at S1 can be calculated using the following formula:

$P_r\,(dBm) = 1 + 0 + 0 + 20\log_{10}\left(\frac{0.125}{4 \times 3.14 \times 8}\right) + 10 \times 3 \times \log_{10}\left(\frac{8}{27.85}\right) = -73.35$

where $d = 27.85\,m$ is the distance between S1 and its parent S2, $d_0 = 8$, $G_t = G_r = 0$, $\eta = 3$, $\lambda = c/f = 0.125\,m$, and $f = 2400\,MHz$

$Link\ quality = 1 - \frac{-73.35}{-85} = 1 - 0.862 = 0.137$

6
Rank calculations in NetSim

$L_q = \frac{TL_q + RL_q}{2} = \frac{0.137 + 0.137}{2} = 0.137$

$RankIncrease = \mathrm{Floor}\left((16 - 1) \times (1 - 0.137)^2 + 1\right) = \mathrm{Floor}(15 \times 0.744 + 1) = \mathrm{Floor}(12.16) = 12$

• In this case, the parent of S1 is S2, and the rank of S2 is 15.

$Rank = RankIncrease + Rank(Parent) = 12 + 15 = 27$

• The Rank of S1 is 27.

• Similarly, the ranks of the other nodes are calculated. The rank of a node in NetSim can be observed through the DODAG Visualizer.

• We see that the Ranks of S4, S3, and S5 are 15, 28, and 27 respectively.

DODAG visualizer showing information about rank and parent relationships
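The two worked examples can be verified with a short script that uses only the rank-increase formula above (illustrative, not NetSim code):

import math

def rank_increase(lq, max_inc=16, min_inc=1):
    # Floor((MaxIncrement - MinIncrement) * (1 - Lq)^2 + MinIncrement)
    return math.floor((max_inc - min_inc) * (1 - lq) ** 2 + min_inc)

rank_s2 = rank_increase(0.036) + 1         # parent is the root (rank 1):  14 + 1 = 15
rank_s1 = rank_increase(0.137) + rank_s2   # parent is S2 (rank 15):       12 + 15 = 27
print(rank_s2, rank_s1)                    # 15 27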

7
Rank attack in RPL using NetSim
• Normal RPL process:
• Transmitter broadcasts DIO during DODAG formation
• Receiver updates parent list, sibling list, and rank
• Receiver sends DAO message with route information
• Malicious node behavior:
• Receives DIO but doesn't update its rank
• Advertises a fake (lower) rank
• Other nodes update their rank based on this fake information
• Attack impact:
• Nodes choose the malicious node as preferred parent due to its lower rank
• Malicious node drops packets instead of forwarding
• Result: Zero network throughput

All nodes in the network choose their parent based on link quality
Nodes 3, 4, and 5 choose parent as node 1 due to its lower rank

Rehman et al., “Rank Attack using Objective Function in RPL for Low Power and Lossy Networks,” 2016, IEEE International Conference on
Industrial Informatics and Computer Systems (CIICS).
8
Rank attack in RPL using NetSim
• Consider the scenario shown. The root node (LOWPAN Gateway) has rank
1. It sends DIO messages to Sensor 5 and Sensor 7, which are within its
range.
• Both Sensor 5 and Sensor 7 recognize the DODAG ID of the root node.
They identify the root node as their parent.
• After this, Sensor 5 and Sensor 7 transmit DAO messages to the root node.
These DAO messages help to propagate destination information upward
along the DODAG. Sensor 5 then updates its rank and broadcasts DIO
messages.
• However, Sensor 7 is a malicious node. It also updates its rank but advertises a fake, lower rank after receiving the DIO message from the root node.
• Sensors 6 and 4 receive DIO messages from both Sensor 5 and Sensor 7. Due to Sensor 7's falsely advertised lower rank, Sensors 6 and 4 choose Sensor 7 as their preferred parent.
• After selecting Sensor 7 as their parent, Sensors 6 and 4 send DAO messages and data packets to Sensor 7. But instead of forwarding the data packets, Sensor 7 drops them.

The network topology in IoT using RPL Protocol, Pathloss Model: Log Distance, Pathloss Exponent: 2

9
Rank attack in RPL using NetSim
• The results can be observed in the Results window, showing that the network has zero throughput.
• Users can also observe in the packet trace that after Sensor 7 receives packets, it does not forward them, resulting in no data packet transmission from Sensor 7.
• Additionally, users can generate the DODAG visualizer using the Python and MATLAB utilities. In the DODAG, it can be observed that Sensor 6 and Sensor 4 have chosen Sensor 7 as their parent.

The packet trace shows that packets from Sensor-4 and Sensor-6 are received by Sensor-7, but Sensor-7 is not transmitting packets.
Throughput for the two applications is zero because the malicious sensor is collecting all the packets.
The DODAG visualizer shows that Sensor-6 and Sensor-4 are choosing Sensor-7 as a parent node.

10
Training

11
Attack scenarios - Training data generation
• Created 8 scenarios with varying node counts (6 to 39)

• Malicious node count: 2, 4, 5, 6, 8, 10, 12, and 14

• Simulations run with 3 random seeds for each scenario

• Enabled packet trace for all scenarios

• Used a Python script to calculate the number of DAO, DIO, and data packets received by each sensor from the packet trace.

• Feature Extraction
o DAO Sent
o DAO Received
o DIO Sent
o DIO Received
o Data Packets Received

The network topology in IoT using RPL Protocol with 2 malicious nodes
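The per-sensor counting step can be sketched in Python as below. This is an illustrative outline rather than the Feature-Count-CSV script mentioned in the appendix; the packet-trace file name, the column names (RECEIVER_ID, TRANSMITTER_ID, CONTROL_PACKET_TYPE/APP_NAME, PACKET_TYPE), and the label values are assumptions that must be matched to the actual NetSim packet trace headers:

import pandas as pd

trace = pd.read_csv("packet_trace.csv")   # exported NetSim packet trace (placeholder name)

def count(rows, column, value):
    # Number of rows in `rows` whose `column` equals `value`.
    return int((rows[column] == value).sum())

records = []
for sensor, rx in trace.groupby("RECEIVER_ID"):
    tx = trace[trace["TRANSMITTER_ID"] == sensor]
    records.append({
        "Sensor": sensor,
        "DAO_Sent": count(tx, "CONTROL_PACKET_TYPE/APP_NAME", "DAO"),
        "DAO_Received": count(rx, "CONTROL_PACKET_TYPE/APP_NAME", "DAO"),
        "DIO_Sent": count(tx, "CONTROL_PACKET_TYPE/APP_NAME", "DIO"),
        "DIO_Received": count(rx, "CONTROL_PACKET_TYPE/APP_NAME", "DIO"),
        "Data_Packets_Received": count(rx, "PACKET_TYPE", "Sensing"),   # data-packet label is an assumption
    })

pd.DataFrame(records).to_csv("Sensor_Message_Count.csv", index=False)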

12
Data processing and Feature Visualization

• Data extraction from packet trace to Excel using Python script
• Total dataset: 534 sensors, 5 features each
• Feature normalization process:
• Calculate max value for each feature across all sensors
• Divide each sensor's value by the max to get 0-1 range
• Manual labeling: 1 for non-malicious, 0 for malicious

We label the sensors based on the features
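A minimal sketch of this normalization and labeling step (illustrative only; the file name, column names, and the list of malicious node IDs are placeholders):

import pandas as pd

data = pd.read_csv("Sensor_Message_Count.csv")
feature_cols = ["DAO_Sent", "DAO_Received", "DIO_Sent", "DIO_Received", "Data_Packets_Received"]

# Divide each feature by its maximum across all sensors so every value lies in the 0-1 range.
data[feature_cols] = data[feature_cols] / data[feature_cols].max()

# Manual labeling: 1 for non-malicious, 0 for malicious (IDs below are placeholders).
malicious_ids = [7, 12]
data["Label"] = (~data["Sensor"].isin(malicious_ids)).astype(int)

data.to_csv("Normalized_Data.csv", index=False)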

13
Feature visualization: 2 malicious nodes; 3 random seeds

14
Feature visualization: 4 malicious nodes; 3 random seeds

15
Feature visualization: 5 malicious nodes; 3 random seeds

16
Feature visualization: 6 malicious nodes; 3 random seeds

17
Feature visualization: 8 malicious nodes; 3 random seeds

18
Feature visualization: 10 malicious nodes; 3 random seeds

19
Feature visualization: 12 malicious nodes; 3 random seeds

20
Classifier Training

The feature data was used to train the following classifiers (a minimal training sketch follows the list):
• K-Nearest Neighbor
• Naive Bayes
• Support Vector Machine
• Logistic Regression
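A minimal scikit-learn sketch of this training step, assuming the normalized training data holds the five feature columns and a Label column as described earlier (file and column names are placeholders):

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("Normalized_Training_Data.csv")
feature_cols = ["DAO_Sent", "DAO_Received", "DIO_Sent", "DIO_Received", "Data_Packets_Received"]
X_train, y_train = train[feature_cols], train["Label"]   # 1 = non-malicious, 0 = malicious

classifiers = {
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(),
    "Logistic Regression": LogisticRegression(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, "training accuracy:", clf.score(X_train, y_train))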

21
Inference

22
Inference and Test Scenarios

• Created 6 new scenarios with different node counts (9 to 42)


• Malicious node count: 3, 7, 9, 11, 13, and 15
• Simulations run with 3 random seeds for each scenario

The network topology in IoT using RPL Protocol with 3 malicious nodes
23
Feature visualization: 3 malicious nodes; 3 random seeds

24
Feature visualization: 7 malicious nodes; 3 random seeds

25
Feature visualization: 9 malicious nodes; 3 random seeds

26
Feature visualization: 11 malicious nodes; 3 random seeds

27
Feature visualization: 13 malicious nodes; 3 random seeds

28
Feature visualization: 14 malicious nodes; 3 random seeds

29
Feature visualization: 15 malicious nodes; 3 random seeds

30
Confusion matrix
• Confusion matrix summarizes the performance of a machine learning model on a set of test data.
• Displays the number of accurate and inaccurate instances based on the model’s predictions.
• Used to measure the performance of classification models
• Confusion matrix components:
• True Positive (TP): Predicted as positive, and it actually is positive.
• True Negative (TN): Predicted as negative, and it actually is negative.
• False Positive (FP): Predicted as positive, but it is actually negative.
• False Negative (FN): Predicted as negative, but it is actually positive
• Performance metrics:
• Accuracy: The overall correct predictions (TP + TN) divided by the total number of instances.
• Precision: The number of true positives divided by the total number of predicted positives (TP + FP).
• Recall: The number of true positives divided by the total number of actual positives (TP + FN).
• F1 Score: The harmonic mean of precision and recall, providing a balance between the two.
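Using the definitions above, the four metrics follow directly from the confusion-matrix counts. The example values below are the Naïve Bayes counts from the comparison table later in this document:

# Naïve Bayes counts from the comparison slide: TP = 323, TN = 174, FP = 0, FN = 1
TP, TN, FP, FN = 323, 174, 0, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN)                 # 0.9980
precision = TP / (TP + FP)                                  # 1.0000
recall    = TP / (TP + FN)                                  # 0.9969
f1_score  = 2 * precision * recall / (precision + recall)   # 0.9985

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1_score, 4))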

31
Confusion Matrix: Accuracy, Precision, F1 Score, Recall

Logistic Regression:
Metric      Value
Accuracy    0.9639
Precision   0.9474
Recall      1.0000
F1 Score    0.9730

Support Vector Machine:
Metric      Value
Accuracy    0.9980
Precision   0.9969
Recall      1.0000
F1 Score    0.9985

32
Confusion Matrix: Accuracy, Precision, F1 Score, Recall

Naïve Bayes:
Metric      Value
Accuracy    0.9980
Precision   1.0000
Recall      0.9969
F1 Score    0.9985

K-Nearest Neighbor:
Metric      Value
Accuracy    0.9799
Precision   0.9701
Recall      1.0000
F1 Score    0.9848

33
Comparison and Future Work
Classifier            True Positives   True Negatives   False Positives   False Negatives   Accuracy   Precision   Recall   F1 Score
Naïve Bayes           323              174              0                 1                 0.9980     1.0000      0.9969   0.9985
KNN                   324              164              10                0                 0.9799     0.9701      1.0000   0.9848
Logistic Regression   324              156              18                0                 0.9639     0.9474      1.0000   0.9730
SVM                   324              173              1                 0                 0.9980     0.9969      1.0000   0.9985

Key Observations
• High Precision (>94%): Low false positive rate; malicious classifications are likely correct.
• Near-Perfect Recall (≥99.69%): Classifiers rarely miss malicious nodes.
• Robust F1 Scores (>0.97): Well-balanced performance in identifying threats and avoiding false alarms.

Future Work
• Testing with larger networks
• Exploring other types of IoT network attacks

Choukri et al., “RPL rank attack detection using Deep Learning,” 2020 International Conference on Innovation and Intelligence for
Informatics.
34
Appendix: How-to-Guide

35
How to classify the data?
To generate an Excel file containing the 5 feature message counts for each sensor, follow these steps:
• Modify the file paths in the Python script according to your setup.
• Open the command prompt.
• Navigate to the folder containing the Python script.
• Run the Feature-Count-CSV script to process the packet trace file and generate the Excel file.
• You can place the Python script anywhere, as long as the file paths are correctly set to locate the necessary data files, including the test and training data scenarios that contain the packet trace files.
• After running the script, it will generate the Sensor_Message_Count.csv file in each folder that contains packet traces.
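For example, from the command prompt (the folder path is a placeholder and the .py filename is assumed from the script name above):

cd C:\path\to\scripts
python Feature-Count-CSV.py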

36
How to normalize the data?
To generate an Excel file containing the normalized 5 feature message counts for each sensor, follow these steps:
• Modify the file paths in the Python script according to your setup.
• Open the command prompt.
• Navigate to the folder containing the Python script.
• Run the Merged-Data-NormalizedData script to process the Sensor_Message_Counts file and generate the normalized Excel file.
• You can place the Python script anywhere, as long as the file paths are correctly set to locate the necessary data files, including the test and training data scenarios that contain the Sensor_Message_Counts.csv.
• After running the script, it will generate the merged and normalized data file in the folder that contains the Python script.

37
How to generate plots?
Download the Workspace and place it in your desired location.
• The Python scripts can be placed anywhere, as long as the paths are correctly set.
• Modify the file paths in the Python scripts according to your setup.
• Open the command prompt.
• Navigate to the folder containing the Python scripts.
• Run the DAO, DIO, and Packet Received scripts to process the packet trace files and generate the plots.
• Make sure the necessary packet trace files are accessible based on the paths defined in the scripts.
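A small matplotlib sketch of the kind of per-sensor plot these scripts produce (illustrative only; the bundled DAO, DIO, and Packet Received scripts may differ, and the file and column names are assumptions):

import pandas as pd
import matplotlib.pyplot as plt

counts = pd.read_csv("Sensor_Message_Count.csv")   # produced by the feature-count step

# Bar chart of DAO packets received per sensor; malicious nodes tend to stand out
# because they attract traffic from their children but forward nothing.
plt.bar(counts["Sensor"].astype(str), counts["DAO_Received"])
plt.xlabel("Sensor ID")
plt.ylabel("DAO packets received")
plt.title("DAO received per sensor")
plt.tight_layout()
plt.show()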

38
How to run the classifiers?
• After normalizing the data, name it 'Test-Data'.
• Run each classifier against the trained data by providing the file locations for both the Trained-Data and Test-Data.
• Place the Test-Data and Training-data in the same folder.
• Modify the path in the Python script of each classifier according to your setup.
• Open the command prompt and run the script as shown below.
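A minimal sketch of what such a classifier script does, assuming the Training-Data and Test-Data files contain the five normalized features plus a Label column (file, column, and output names are placeholders; the bundled scripts may differ):

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

feature_cols = ["DAO_Sent", "DAO_Received", "DIO_Sent", "DIO_Received", "Data_Packets_Received"]

train = pd.read_csv("Training-Data.csv")
test = pd.read_csv("Test-Data.csv")

clf = KNeighborsClassifier()
clf.fit(train[feature_cols], train["Label"])            # 1 = non-malicious, 0 = malicious

test["Predicted_Label"] = clf.predict(test[feature_cols])
test.to_csv("KNN-Predicted-Labels.csv", index=False)    # used later to build the confusion matrix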

The Python script generates six sets of predicted labels and uses this data to create confusion
matrices.
39
How to get the confusion matrix?
• After obtaining the predicted labels from the classifiers, use these predicted-label Excel files along with the real training data to run the Python script and generate the confusion matrix.
• Place the predicted-label Excel files and the real training data in the same folder.
• Modify the paths in the Confusion-Matrix Python script according to your setup.
• Open the command prompt and run the script as shown below.
• Each classifier's data generates a confusion matrix for that classifier.
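A hedged scikit-learn sketch of this step (file and column names are placeholders; the bundled Confusion-Matrix script may be organized differently):

import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

true_labels = pd.read_csv("True-Labels.csv")["Label"]                      # manually labeled ground truth
predicted = pd.read_csv("KNN-Predicted-Labels.csv")["Predicted_Label"]     # output of one classifier script

# With 1 = non-malicious and 0 = malicious, label 1 is treated as the positive class.
print(confusion_matrix(true_labels, predicted, labels=[1, 0]))
print("Accuracy:", accuracy_score(true_labels, predicted))
print("Precision:", precision_score(true_labels, predicted, pos_label=1))
print("Recall:", recall_score(true_labels, predicted, pos_label=1))
print("F1 Score:", f1_score(true_labels, predicted, pos_label=1))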

40
