ML Classifier To Detect Network Attacks in IoT
29 Aug 2024
Rank calculation in NetSim
• Rank: Scalar representation of node location within DODAG
• Purpose:
• Measure distance from root
• Avoid and detect loops
• The root node (which is also the border router in IoT) always has Rank 1
• The rank calculation is based on the defined objective function.
• Rank Increase Formula:
$RI = (MaxIncrement - MinIncrement) \times (1 - L_q) + MinIncrement$
$Rank = RI + Rank(Parent)$
• Received power (log-distance path loss model):
$P_r(\mathrm{dBm}) = P_t + G_t + G_r + 20\log_{10}\left(\frac{\lambda}{4\pi d_0}\right) + 10 \times \eta \times \log_{10}\left(\frac{d_0}{d}\right)$
$P_r(\mathrm{dBm}) = 1 + 0 + 0 + 20\log_{10}\left(\frac{0.125}{4\times 3.14\times 8}\right) + 10\times 3\times\log_{10}\left(\frac{8}{48.55}\right)$
where $d$ is the distance between the sensor and the root node, equal to 48.55 m, $d_0 = 8$ m, $G_t = 0$, $G_r = 0$, $\eta = 3$, $\lambda = c/f = 0.125$ m, $f = 2400$ MHz.
• One-way link quality: $L_q = 1 - \frac{P_r}{R_s} = 1 - \frac{-81.86}{-85} = 1 - 0.963 \approx 0.036$, where $R_s = -85$ dBm is the receiver sensitivity.
Rank calculations in NetSim
$P_r(\mathrm{dBm}) = 1 + 0 + 0 + 20\log_{10}\left(\frac{0.125}{4\times 3.14\times 8}\right) + 10\times 3\times\log_{10}\left(\frac{8}{27.85}\right) = -73.35$
where $d = 27.85$ m is the distance between S2 and the root node, $d_0 = 8$ m, $G_t = G_r = 0$, $\eta = 3$, $\lambda = c/f = 0.125$ m, $f = 2400$ MHz.
Link quality: $L_q = 1 - \frac{-73.35}{-85} = 1 - 0.863 = 0.137$
This example is reproduced in the short sketch below.
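The received power, link quality, and rank calculation above can be reproduced with a short Python sketch. This is a minimal illustration rather than NetSim's source code: the receiver sensitivity of −85 dBm follows the slides, while max_increment = 16 and min_increment = 1 are assumed placeholder values for the objective function; small numerical differences from the slides come from rounding (e.g., of π).

import math

def received_power_dbm(pt_dbm, gt, gr, d, d0=8.0, eta=3.0, freq_hz=2400e6):
    """Log-distance path loss: free-space reference at d0, exponent eta beyond it."""
    lam = 3e8 / freq_hz                                   # wavelength ~ 0.125 m at 2400 MHz
    return (pt_dbm + gt + gr
            + 20 * math.log10(lam / (4 * math.pi * d0))
            + 10 * eta * math.log10(d0 / d))

def link_quality(pr_dbm, rx_sensitivity_dbm=-85.0):
    """One-way link quality as defined on the slide: Lq = 1 - Pr / Rs."""
    return 1 - pr_dbm / rx_sensitivity_dbm

def rank(parent_rank, lq, max_increment=16, min_increment=1):
    """Rank = parent rank + rank increase; the increment bounds here are assumed values."""
    rank_increase = (max_increment - min_increment) * (1 - lq) + min_increment
    return parent_rank + rank_increase

pr = received_power_dbm(pt_dbm=1, gt=0, gr=0, d=27.85)    # S2 -> root example from the slide
lq = link_quality(pr)
print(f"Pr = {pr:.2f} dBm, Lq = {lq:.3f}, rank = {rank(1, lq):.1f}")
# -> Pr = -73.36 dBm, Lq = 0.137, rank = 14.9 (with the assumed increments)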
Rank calculations in NetSim
• Similarly, the rank for the other nodes is calculated. The rank of a node in NetSim can be observed through the DODAG Visualizer.
• We see that the ranks of S4, S3, and S5 are 15, 28, and 27 respectively.
[Figure: DODAG visualizer showing information about rank and parent relationships]
Rank attack in RPL using NetSim
• Normal RPL process:
  • Transmitter broadcasts DIO during DODAG formation
  • Receiver updates parent list, sibling list, and rank
  • Receiver sends DAO message with route information
• Malicious node behavior:
  • Receives DIO but doesn't update its rank
  • Advertises a fake (lower) rank
  • Other nodes update their rank based on this fake information
• Attack impact:
  • Nodes choose the malicious node as preferred parent due to its lower rank
  • Malicious node drops packets instead of forwarding
  • Result: Zero network throughput (illustrated in the sketch after the reference below)
[Figure: All nodes in the network choose their parent based on link quality; Nodes 3, 4, and 5 choose node 1 as their parent due to its lower rank]
Rehman et al., “Rank Attack using Objective Function in RPL for Low Power and Lossy Networks,” 2016 IEEE International Conference on Industrial Informatics and Computer Systems (CIICS).
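A minimal, purely illustrative sketch of the parent-selection effect described above is given below; the node names, ranks, and fake rank value are assumptions, not values taken from the NetSim model.

def advertised_rank(true_rank, malicious=False, fake_rank=2):
    # A malicious node keeps its real rank but advertises a fake, lower one in its DIO
    return fake_rank if malicious else true_rank

# DIOs heard by a downstream node: (neighbour, rank carried in the DIO)
heard_dios = [
    ("Sensor_5", advertised_rank(true_rank=15)),                  # honest node
    ("Sensor_7", advertised_rank(true_rank=15, malicious=True)),  # attacker lies about its rank
]

# RPL prefers the neighbour advertising the lowest rank as the parent
preferred_parent = min(heard_dios, key=lambda dio: dio[1])
print("Preferred parent:", preferred_parent[0])   # -> Sensor_7 (the malicious node)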
Rank attack in RPL using NetSim
• Consider the scenario shown. The root node (LOWPAN Gateway) has rank 1. It sends DIO messages to Sensor 5 and Sensor 7, which are within its range.
• Both Sensor 5 and Sensor 7 recognize the DODAG ID of the root node and identify the root node as their parent.
• After this, Sensor 5 and Sensor 7 transmit DAO messages to the root node. These DAO messages propagate destination information upward along the DODAG. Sensor 5 then updates its rank and broadcasts DIO messages.
• However, Sensor 7 is a malicious node. It also updates its rank after receiving the DIO message from the root node, but advertises a fake, lower rank.
• Sensors 6 and 4 receive DIO messages from both Sensor 5 and Sensor 7. Due to Sensor 7's falsely advertised lower rank, Sensors 6 and 4 choose Sensor 7 as their preferred parent.
• After selecting Sensor 7 as their parent, Sensors 6 and 4 send DAO messages and data packets to Sensor 7. But instead of forwarding the data packets, Sensor 7 drops them.
[Figure: The network topology in IoT using RPL Protocol; Pathloss Model: Log Distance, Pathloss Exponent: 2]
Rank attack in RPL using NetSim
• The results can be observed in the Results window, which shows that the network has zero throughput.
• Users can also observe in the packet trace that after Sensor 7 receives packets, it does not forward them, resulting in no data packet transmission from Sensor 7 (a short filtering sketch follows below).
• Additionally, users can generate the DODAG visualizer using the Python and MATLAB utilities. In the DODAG, it can be observed that Sensor 6 and Sensor 4 have chosen Sensor 7 as their parent.
[Figure: The packet trace shows that packets from Sensor-4 and Sensor-6 are received by Sensor-7, but Sensor-7 is not transmitting packets]
[Figure: Throughput for the two applications is zero because the malicious sensor is collecting all the packets]
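One way to check this behaviour outside NetSim is to filter the exported packet trace, for example with pandas. The sketch below is an assumption-laden outline: the file name, the node identifier SENSOR-7, and the column and packet-type labels may differ in your exported trace and should be adjusted.

import pandas as pd

trace = pd.read_csv("Packet_Trace.csv")          # exported packet trace (assumed file name)

# Data packets received vs. forwarded by the suspected malicious node
received  = trace[(trace["RECEIVER_ID"] == "SENSOR-7") & (trace["PACKET_TYPE"] == "Sensing")]
forwarded = trace[(trace["TRANSMITTER_ID"] == "SENSOR-7") & (trace["PACKET_TYPE"] == "Sensing")]

print("Data packets received by Sensor 7 :", len(received))
print("Data packets forwarded by Sensor 7:", len(forwarded))   # expected to be 0 under the attack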
Attack scenarios - Training data generation
• Created 8 scenarios with varying node counts (6 to 39)
• Feature Extraction
o DAO Sent
o DAO Received
o DIO Sent
o DIO Received
o Data Packets Received
[Figure: The network topology in IoT using RPL Protocol with 2 malicious nodes]
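A rough sketch of how these five per-sensor features could be counted from an exported packet trace is shown below. The column names, packet-type labels, and file names are assumptions, not the exact Feature-Count-CSV script used in this work.

import pandas as pd

trace = pd.read_csv("Packet_Trace.csv")          # assumed trace file name and columns

def count(node, column, packet_type):
    # Number of trace rows where this node appears in the given role for the given packet type
    return int(((trace[column] == node) & (trace["CONTROL_PACKET_TYPE/APP_NAME"] == packet_type)).sum())

sensors = sorted(set(trace["TRANSMITTER_ID"]) | set(trace["RECEIVER_ID"]))
features = pd.DataFrame([{
    "Sensor": node,
    "DAO_Sent": count(node, "TRANSMITTER_ID", "DAO"),
    "DAO_Received": count(node, "RECEIVER_ID", "DAO"),
    "DIO_Sent": count(node, "TRANSMITTER_ID", "DIO"),
    "DIO_Received": count(node, "RECEIVER_ID", "DIO"),
    "Data_Packets_Received": int(((trace["RECEIVER_ID"] == node) & (trace["PACKET_TYPE"] == "Sensing")).sum()),
} for node in sensors])

features.to_csv("Sensor_Message_Count.csv", index=False)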
Data processing and Feature Visualization
Feature visualization: 2 malicious nodes; 3 random seeds
Feature visualization: 4 malicious nodes; 3 random seeds
Feature visualization: 5 malicious nodes; 3 random seeds
Feature visualization: 6 malicious nodes; 3 random seeds
Feature visualization: 8 malicious nodes; 3 random seeds
Feature visualization: 10 malicious nodes; 3 random seeds
Feature visualization: 12 malicious nodes; 3 random seeds
Classifier Training
Inference
Inference and Test Scenarios
[Figure: The network topology in IoT using RPL Protocol includes 3 malicious nodes]
Feature visualization: 3 malicious nodes; 3 random seeds
Feature visualization: 7 malicious nodes; 3 random seeds
Feature visualization: 9 malicious nodes; 3 random seeds
Feature visualization: 11 malicious nodes; 3 random seeds
Feature visualization: 13 malicious nodes; 3 random seeds
Feature visualization: 14 malicious nodes; 3 random seeds
Feature visualization: 15 malicious nodes; 3 random seeds
Confusion matrix
• A confusion matrix summarizes the performance of a machine learning model on a set of test data.
• It displays the number of correct and incorrect instances based on the model’s predictions.
• Used to measure the performance of classification models
• Confusion matrix components:
• True Positive (TP): Predicted as positive, and it actually is positive.
• True Negative (TN): Predicted as negative, and it actually is negative.
• False Positive (FP): Predicted as positive, but it is actually negative.
• False Negative (FN): Predicted as negative, but it is actually positive.
• Performance metrics (computed in the short sketch after this list):
• Accuracy: The overall correct predictions (TP + TN) divided by the total number of instances.
• Precision: The number of true positives divided by the total number of predicted positives (TP + FP).
• Recall: The number of true positives divided by the total number of actual positives (TP + FN).
• F1 Score: The harmonic mean of precision and recall, providing a balance between the two.
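The short sketch below computes the four metrics from predicted and true labels with scikit-learn; the label vectors are made-up examples for illustration only.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = malicious, 0 = normal (illustrative labels only)
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / total
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall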
Confusion Matrix: Accuracy, Precision, F1 Score, Recall
Key Observations
• High Precision (>94%): Low false positive rate; malicious classifications are likely correct.
• Near-Perfect Recall (≥99.69%): Classifiers rarely miss malicious nodes.
• Robust F1 Scores (>0.97): Well-balanced performance in identifying threats and avoiding false alarms.
Future Work
• Testing with larger networks
• Exploring other types of IoT network attacks
Choukri et al., “RPL rank attack detection using Deep Learning,” 2020 International Conference on Innovation and Intelligence for Informatics.
Appendix: How-to-Guide
How to classify the data?
To generate a CSV file containing the five feature message counts for each sensor, follow these steps:
• Modify the file paths in the Python script according to your setup.
• Open the command prompt.
• Navigate to the folder containing the Python script.
• Run the Feature-Count-CSV script to process the packet trace file and generate the CSV file.
• You can place the Python script anywhere, as long as the file paths are correctly set to locate the necessary data files, including the test and training data scenarios that contain the packet trace files.
• After running the script, it will generate the Sensor_Message_Count.csv file in each folder that contains packet traces.
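A minimal sketch of such a script is shown below. The base path, folder layout, and the abbreviated per-sensor counting are assumptions about your setup, not the contents of the actual Feature-Count-CSV script.

import glob
import os
import pandas as pd

BASE = r"C:\NetSim-Workspace\Rank-Attack"        # assumed location of the scenario folders

for trace_path in glob.glob(os.path.join(BASE, "**", "Packet_Trace.csv"), recursive=True):
    trace = pd.read_csv(trace_path)
    # Per-sensor message counts, abbreviated here (see the counting sketch earlier for the five features)
    counts = (trace.groupby("TRANSMITTER_ID")["CONTROL_PACKET_TYPE/APP_NAME"]
                   .value_counts().unstack(fill_value=0))
    out_path = os.path.join(os.path.dirname(trace_path), "Sensor_Message_Count.csv")
    counts.to_csv(out_path)
    print("Wrote", out_path)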
How to normalize the data?
To generate a file containing the normalized values of the five feature message counts for each sensor, follow these steps:
• Modify the file paths in the Python script according to your setup.
• Open the command prompt.
• Navigate to the folder containing the Python script.
• Run the Merged-Data-NormalizedData script to process the Sensor_Message_Count.csv files and generate the normalized data file.
• You can place the Python script anywhere, as long as the file paths are correctly set to locate the necessary data files, including the test and training data scenarios that contain Sensor_Message_Count.csv.
• After running the script, it will generate the merged and normalized data file in the folder that contains the Python script.
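As an illustration, merging the per-scenario count files and applying min-max normalization to the five feature columns could look like the sketch below; the file and column names are assumptions, and the actual Merged-Data-NormalizedData script may normalize differently.

import glob
import pandas as pd

FEATURES = ["DAO_Sent", "DAO_Received", "DIO_Sent", "DIO_Received", "Data_Packets_Received"]

# Merge every per-scenario count file, then min-max normalize the feature columns
files = glob.glob("**/Sensor_Message_Count.csv", recursive=True)
merged = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
merged[FEATURES] = (merged[FEATURES] - merged[FEATURES].min()) / (merged[FEATURES].max() - merged[FEATURES].min())
merged.to_csv("Merged_Normalized_Data.csv", index=False)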
How to generate plots?
Download the Workspace and place it in your desired location.
• The Python scripts can be placed anywhere, as long as the paths are correctly set.
• Modify the file paths in the Python scripts according to your setup.
• Open the command prompt.
• Navigate to the folder containing the Python scripts.
• Run the DAO, DIO, and Packet Received scripts to process the packet trace files and generate the plots (a minimal plotting sketch follows these steps).
• Make sure the necessary packet trace files are accessible based on the paths defined in the scripts.
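A minimal matplotlib sketch for one such plot is given below, using the file and column names assumed in the earlier sketches.

import pandas as pd
import matplotlib.pyplot as plt

counts = pd.read_csv("Sensor_Message_Count.csv")            # per-sensor feature counts (assumed file)
ax = counts.plot(kind="bar", x="Sensor", y=["DIO_Sent", "DIO_Received"], figsize=(8, 4))
ax.set_ylabel("Message count")
ax.set_title("DIO messages per sensor (one scenario, one seed)")
plt.tight_layout()
plt.savefig("dio_counts.png")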
How to run the classifiers?
• After normalizing the data, name it 'Test-Data'.
• Run each classifier against the trained data by providing the file locations for both the Trained-Data and the Test-Data.
• Place the Test-Data and the Training-Data in the same folder.
• Modify the path in the Python script of each classifier according to your setup.
• Open the command prompt and run the script as shown below.
The Python scripts generate six sets of predicted labels, and this data is used to create the confusion matrices.
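A hedged sketch of such a classifier run with scikit-learn follows; the file names, the Label column, and the two example classifiers are placeholders (the actual workflow produces predictions from six classifiers).

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

FEATURES = ["DAO_Sent", "DAO_Received", "DIO_Sent", "DIO_Received", "Data_Packets_Received"]

train = pd.read_csv("Trained-Data.csv")    # normalized training data with a "Label" column (assumed)
test = pd.read_csv("Test-Data.csv")        # normalized test data

classifiers = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    clf.fit(train[FEATURES], train["Label"])
    predictions = pd.DataFrame({"Sensor": test["Sensor"], "Predicted_Label": clf.predict(test[FEATURES])})
    predictions.to_excel(f"{name}_Predicted_Labels.xlsx", index=False)   # writing .xlsx requires openpyxl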
How to get the confusion matrix?
• After obtaining the predicted labels from the classifiers, use these predicted-label Excel files along with the real training data to run the Python script and generate the confusion matrix.
• Place the predicted-label Excel files and the real training data in the same folder.
• Modify the paths in the Confusion-Matrix Python script according to your setup.
• Open the command prompt and run the script as shown below.
• Each classifier's data generates a confusion matrix for that classifier.
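A short sketch of generating one classifier's confusion matrix with scikit-learn is shown below; the file and column names are placeholders consistent with the earlier sketches.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = pd.read_csv("Test-Data.csv")["Label"]                                   # ground-truth labels (assumed file/column)
y_pred = pd.read_excel("RandomForest_Predicted_Labels.xlsx")["Predicted_Label"]  # one classifier's predictions

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
ConfusionMatrixDisplay(cm, display_labels=["Normal", "Malicious"]).plot()
plt.title("Confusion matrix: RandomForest")
plt.savefig("RandomForest_Confusion_Matrix.png")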