
Blockchain: Research and Applications 5 (2024) 100174

Contents lists available at ScienceDirect

Blockchain: Research and Applications


journal homepage: www.journals.elsevier.com/blockchain-research-and-applications

Research Article

Towards a lightweight security framework using blockchain and machine learning

Shereen Ismail a,*, Muhammad Nouman b, Diana W. Dawoud c, Hassan Reza a

a School of Electrical Engineering and Computer Science, University of North Dakota, ND 58202, USA
b Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
c College of Engineering and Information Technology, University of Dubai, Dubai, United Arab Emirates

ARTICLE INFO

Keywords: Blockchain; Machine learning; IoT; Security; Integration; Smart contracts

ABSTRACT

Cyber-attacks pose a significant challenge to the security of Internet of Things (IoT) sensor networks, necessitating the development of robust countermeasures tailored to their unique characteristics and limitations. Various prevention and detection techniques have been proposed to mitigate these attacks. In this paper, we propose an integrated security framework using blockchain and Machine Learning (ML) to protect IoT sensor networks. The framework consists of two modules: a blockchain prevention module and an ML detection module. The blockchain prevention module has two lightweight mechanisms: identity management and trust management. Identity management employs a lightweight Smart Contract (SC) to manage node registration and authentication, ensuring that unauthorized entities are prohibited from engaging in any tasks, while trust management uses a lightweight SC that is responsible for maintaining trust and credibility between sensor nodes throughout the network’s lifetime and tracking historical node behaviors. Consensus and transaction validation are achieved through a Verifiable Byzantine Fault Tolerance (VBFT) mechanism to ensure network reliability and integrity. The ML detection module utilizes the Light Gradient Boosting Machine (LightGBM) algorithm to classify malicious nodes and notify the blockchain network if it must make decisions to mitigate their impacts. We investigate the performance of several off-the-shelf ML algorithms, including Logistic Regression, Complement Naive Bayes, Nearest Centroid, and Stacking, using the WSN-DS dataset. LightGBM is selected following a detailed comparative analysis conducted using accuracy, precision, recall, F1-score, processing time, training time, prediction time, computational complexity, and Matthews Correlation Coefficient (MCC) evaluation metrics.

1. Introduction

Security is a crucial concern in the realm of the Internet of Things (IoT), and it has garnered significant attention in industry and academia [1]. A promising avenue for addressing security challenges in IoT systems is integrating blockchain and Machine Learning (ML) technologies, which can offer effective prevention and detection techniques against cyber-attacks; however, this approach is still in its nascent stages of development, and there is a lack of research focused specifically on its suitability for securing IoT sensor networks [2].

IoT sensor networks possess various challenging characteristics, including limited resources in terms of energy, bandwidth, and storage, as well as multi-hop relays, unsecured wireless communication channels, dynamic topologies, medium to large network scales, heterogeneous node fabrication, and unsecured routing protocols [3,4]. In the absence of robust security countermeasures, IoT sensor networks become fragile and susceptible to insider cyber-attacks, allowing malicious nodes to manipulate data, extract sensitive information, and put the network at risk for their own gain [5].

The nodes within the IoT network are openly accessible and often deployed in challenging environments to support various IoT applications; therefore, they are highly vulnerable to various cyber-attacks. Detecting and identifying malicious nodes is of paramount importance. Many conventional IoT security measures either centralize control or depend on third-party entities, leaving them susceptible to a single point of failure. The distributed nature of blockchain makes the network

* Corresponding author.
E-mail address: [email protected] (S. Ismail).

https://doi.org/10.1016/j.bcra.2023.100174
Received 19 July 2023; Received in revised form 12 October 2023; Accepted 27 November 2023
Available online 30 November 2023
2096-7209/© 2023 THE AUTHORS. Published by Elsevier B.V. on behalf of Zhejiang University Press. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
S. Ismail, M. Nouman, D.W. Dawoud et al. Blockchain: Research and Applications 5 (2024) 100174

Table 1
Mapping of the proposed framework modules with key remarks.

| Module | Contribution | Adopted method | Remarks |
|---|---|---|---|
| Blockchain prevention | Lightweight identity management Smart Contract (SC) | Identity management and secure authentication mechanism | Registration, authentication, and revocation phases |
| | Lightweight trust management SC | Trust evaluation mechanism | Trust score calculation for each of the nodes using a set of evaluation metrics |
| | Consensus | Proof-of-Work (PoW) vs. Verifiable Byzantine Fault Tolerance (VBFT) | VBFT achieves overall lower transaction cost than PoW |
| Machine learning detection | Light Gradient Boosting Machine (LightGBM) | LightGBM, Logistic Regression (LR), Complement Naive Bayes (CompNB), Nearest Centroid (NC), and Stacking performance analysis | LightGBM has the superiority in terms of accuracy, precision, recall, and F1-score while NC performs better in terms of processing time |

more robust and immune to a single point of failure [6], and data are safe and immutable; therefore, data can no longer be tampered with by any malicious party once added to the blockchain ledger [7].

The purpose of ML is to analyze the data generated by IoT devices and make classifications and predictions based on that data [8]. ML-based approaches for cyber-attack detection are effective in identifying and mitigating threats, while blockchain can be recognized as a trusted layer for IoT network participants. Combining blockchain and IoT devices with the ability to automatically record data and transfer it over a network ensures that unauthorized entities are prohibited from engaging in any tasks and can maintain trust and credibility between sensor nodes throughout the network’s lifetime.

ML and blockchain integration has recently emerged as a promising security approach for safeguarding IoT sensor networks against malicious nodes. The decentralized nature of the blockchain contributes to the network’s resilience by eliminating a single point of failure [6]. Data become tamper-proof once added to the blockchain ledger, assuring that they cannot be manipulated by malicious parties [7,9]. ML applications focus on detecting and classifying malicious nodes in IoT sensor networks. Trained ML classifiers are deployed to analyze node behavior, enabling the network to respond appropriately and mitigate their impacts [10]. The response can take the form of generating alarms within the blockchain network, which can lead to isolating the attacker node or revoking its identity, preventing further transactions within the network.

Unlike the related literature discussed in Section 3, this study presents a lightweight, integrated security framework that combines the power of blockchain and ML technologies to strengthen IoT sensor network security from the time of network node deployment and throughout the network’s lifetime. This study has several contributions, including:

• Deploying a permissionless blockchain on the Base Station (BS) and Cluster Heads (CHs) to register and authenticate the Monitor Nodes (MNs) within their vicinity using their credentials. Identity information is stored on the public blockchain network after authentication. VBFT, the consensus mechanism of the proposed framework, is one of the preferable low-complexity protocols developed for distributed systems with connected unreliable wireless nodes [11].
• Proposing a lightweight blockchain prevention module that consists of two mechanisms: identity management and trust management. Identity management employs a lightweight Smart Contract (SC) that is responsible for verifying and registering nodes at network node deployment, while trust management uses a lightweight SC that maintains the nodes’ trust and credibility throughout the network’s lifetime and helps track their historical behaviors. The trust management SC calculates a Trust Score (TS) value for each node that indicates whether the node is normal or misbehaving. The ML detection module is triggered to perform malicious detection once the trust management SC determines that a node is misbehaving.
• Proposing an ML detection module that will be deployed on the BS and CHs to classify malicious nodes and notify the blockchain network. A malicious node’s registration is revoked from the blockchain network using the identity management SC once the node is identified and classified. We compared the performance of several off-the-shelf supervised ML algorithms for cyber-attack detection, including Light Gradient Boosting Machine (LightGBM), Logistic Regression (LR), Complement Naive Bayes (CompNB), Nearest Centroid (NC), and Stacking, to implement the ML malicious detection module and select the classifier with the appropriate performance. The comparative analysis is conducted using the following evaluation metrics: accuracy, precision, recall, F1-score, processing time, training time, prediction time, computational complexity, and Matthews Correlation Coefficient (MCC). Table 1 summarizes the key contributions of the proposed framework and its key remarks.

The rest of this paper is organized as follows. Section 2 outlines some necessary preliminaries for integrating ML and blockchain to secure IoT sensor networks. Section 3 reviews the literature and identifies the recent work on ML and blockchain integrated solutions. Section 4 discusses the proposed integrated security framework in detail. Section 5 presents the system implementation and illustrates and discusses the results. Section 6 concludes this paper with key remarks.

2. Preliminary

Integrating ML and blockchain for IoT sensor network security is a promising approach to address vulnerabilities and possible threats associated with IoT devices and data; however, the potential benefits and challenges involved in combining these two technologies to enhance IoT sensor network security have not been widely investigated in existing literature, primarily because this represents a relatively novel research direction.

Blockchain offers a decentralized and trustless approach to managing transactions and data, eliminating the need for a third-party central authority. Blockchain records all committed transactions on a distributed ledger, making it particularly valuable for enhancing cryptocurrency system security. The blockchain architecture is well-suited for applications involving distributed transactions, decentralized computation, and management in a trustless environment [12].

Integrating blockchain with IoT sensor networks can mitigate security risks associated with data storage, resource access, routing, and identity authentication [13]. The blockchain’s Peer-To-Peer (P2P) distributed ledger, which supports scalability and faster settlement for coordinating and securing nodes, makes it a promising solution for securing data and authenticating identities in IoT networks, as discussed in Ref. [14].

Applying blockchain in IoT sensor networks presents challenges due to high storage and computational demands, resulting in increased delays and reduced network throughput. Blockchain often incurs costs


related to communication, memory usage, and power consumption, which can be at odds with the resource constraints of the sensor devices; however, leveraging blockchain can help reduce the costs associated with setting up and maintaining a centralized database, making efficient use of node idle states in terms of computational, storage, and bandwidth capabilities, ultimately lowering network calculations and storage costs.

On the other hand, ML algorithms provide efficient classification models to identify cyber-attacks, which enhance the network nodes’ ability to learn without being explicitly programmed. The models are used to make future predictions with new input data. ML algorithms are currently used in various IoT sensor network applications. One approach is using ML to design lightweight detection and mitigation systems to secure IoT sensor networks against cyber-attacks. ML models can be trained to recognize patterns associated with known cyber-attack vectors and detect unauthorized access attempts or malicious activities within the IoT network [15].

In this way, the sensor network can detect possible attacks and immediately take appropriate actions to mitigate the impact by triggering an alarm, determining the degree of the risk, or isolating the attacker node from the next round of network progress [16].

Integrating ML and blockchain can significantly enhance IoT sensor network security by providing data integrity, device identity, access control, and real-time attack detection [17,18]; however, it is necessary to develop lightweight security mechanisms that carefully consider the trade-offs among ML, blockchain, and the design factors and security requirements of IoT sensor networks, including device resource limitations, particularly in terms of power consumption and latency [19].

Our approach has two lines of defense that utilize blockchain and ML integration. The first line of defense is attack prevention using blockchain, while the second one is attack detection using ML. In the proposed framework, the first line of defense is represented through two lightweight mechanisms: 1) handling registration and authentication, so that a node that fails to prove its identity is prevented from joining the network, and 2) maintaining trust and credibility between sensor nodes by calculating a trust value for each node to select the trustworthy nodes as reliable data sources. The second line of defense is an ML detection module that is responsible for verifying and examining the incoming traffic for any malicious behavior, alerting the network to the presence of an attacker node.

3. Related work

Previous studies have explored integrating ML and blockchain technologies to enhance IoT-Wireless Sensor Networks (WSNs) security via various approaches. These approaches encompass secure routing, identity authentication, attack localization, malicious node detection, and trust management (Table 2).

Integrating blockchain and ML to enhance routing protocol security in IoT-WSNs has been discussed in several studies [20–23]. Yang et al. [20] proposed a framework that leverages a Proof of Authority (PoA)-blockchain network to securely record routing information using registration and token contracts. A reinforcement learning (RL) algorithm was applied to dynamically select the trusted routes. The results revealed an 81% reduction in average packet delay compared to existing techniques, attributed to the trusted queue length information provided in their framework. Revanesh and Sridhar [21] proposed trusted routing using blockchain and the Salp Swarm Optimization algorithm. A Deep Learning (DL)-Convolutional Neural Network (CNN) was implemented to manage the decision of routing link selection based on the trusted routing information obtained from the blockchain. The work of Ref. [22] also employed PoA-based blockchain and introduced a DL model using CNN to determine validators for the PoA-SC. By pre-selecting and limiting the number of validators, their PoA-DL consensus mechanism demonstrated lower latency and enhanced transaction processing capacity compared to conventional approaches. In Ref. [23], a novel routing protocol called the Generative Adversarial Network (GAN) and Blockchain-based Secured Routing Protocol (GBCRP) was proposed, which combines the Fully Distributed Generative Adversarial Networks (FDGAN)-DL method with an Intrusion Detection System (IDS) and blockchain. The integration of GAN, IDS, and blockchain in their protocol revealed an improvement in the overall security and efficiency of routing in WSNs.

Malicious node detection based on ML-blockchain integrated approaches has been explored in Refs. [24–27]. For instance, in Ref. [26], we proposed a blockchain-based identity management and secure authentication mechanism to be deployed on a hybrid blockchain architecture along with a Naive Bayes (NB) detection module. The objective was to mitigate potential insider Denial of Service (DoS) attacks targeting CH nodes; however, the work did not implement a mechanism to perform registration and authentication for the nodes before joining the network. Yang et al. [24] used the Isolation Forest as an anomaly detection model. This model was chosen for its computational efficiency and high detection performance, particularly when managing large volumes of high-dimensional data. The blockchain component ensured secure storage and updates for the global detection model by providing trusted blocks (isolation trees) for model formation. The reported results demonstrated that the proposed integrated blockchain-Isolation Forest IDS model achieved high detection accuracy for various attacks, while requiring lower communication and storage overhead compared to similar blockchain-based models; however, it only stores the detection model itself and not the detection results, which eliminates any record of node behavior. Sajid et al. [25] proposed a joint identity management and secure routing model. The authors examined ML techniques such as the Genetic Algorithm-based Support Vector Machine (GA-SVM) and the Genetic Algorithm-based Decision Tree (GA-DT) to detect malicious nodes, and the results showed that GA-SVM outperformed GA-DT in terms of detection accuracy. The node’s involvement in the routing process or its registration revocation from the blockchain network was determined based on the outcome of the GA-DT process. The security of the routing transactions was ensured using PoA consensus. Removing malicious nodes increased the packet delivery rate to 99.72%. This work [25] improved malicious node isolation; however, it only targeted routing process security. Nouman et al. [27] used the VBFT-blockchain network for node registration and authentication. The authors also proposed a Histogram Gradient Boost (HGB) classifier for detecting DoS attacks. Data associated with normal nodes were stored in an InterPlanetary File System (IPFS) to generate hashed chunks that could be stored in the blockchain ledger. Performance comparisons demonstrated high precision (at least 98%) achieved by HGB, surpassing its counterparts. The transaction costs of VBFT were lower than Proof-of-Work (PoW); however, the proposed model eliminated any records of previous node behaviors, similar to Ref. [24].

In addition to malicious node detection, trust evaluation based on blockchain-ML integration was proposed in Ref. [28], which introduced a Sybil attack detection scheme and a blockchain-based trust model. The trust model was able to identify the Sybil nodes by computing a trust value for each node using the Hidden Markov Model (HMM), and these trust values were added to the blockchain for legitimate node reference; however, this only mitigates Sybil attacks in underwater sensor networks.

In Ref. [18], Gebremariam et al. used the blockchain-ML integration for attack localization combined with a trust evaluation mechanism. The authors specifically proposed an attack localization and detection technique incorporated with cascade encryption and trust evaluation using blockchain and hybrid Federated Learning (FL) to secure large-scale IoT-WSNs [18]. As is known in the literature, DL techniques are more demanding in terms of computational complexity and processing power than ML. We have selected supervised ML classification for the attack detection module in this work for that reason. ML classification algorithms often have a simpler structure and require less data for training. Collecting a large, labeled dataset to train DL models can be


Table 2
Existing work on integrated blockchain-machine learning systems for wireless sensor networks security (N/S means not specified).

| Blockchain type | ML algorithm | Ref. | Security threat | Consensus | SC | Study highlights |
|---|---|---|---|---|---|---|
| Public | DL-CNN | [21] | Insider attacks | PoA | Registration contract and token contract | Trustworthy cluster-based routing using blockchain and DL-CNN for optimal routing nodes selection |
| | HMM | [28] | Sybil attacks | N/S | N/S | Trust management model for the detection of Sybil nodes using blockchain and HMM |
| | HGB | [27] | DoS attacks | VBFT vs. PoW | Registration and authentication contract | Authentication mechanism to mitigate DoS attacks using blockchain and HGB integrated with IPFS for data storage |
| Private | Isolation Forest | [24] | Insider attacks | IOTA tangle | N/S | Distributed anomaly detection using Isolation Forest algorithm and blockchain |
| | GA-SVM and GA-DT | [25] | Grayhole, mistreatment, and MITM attacks | PoA vs. PoW | Agreement SC | Nodes registration and authentication and routing data storage using blockchain and malicious node detection using GA-SVM and GA-DT |
| Consortium | RL | [20] | Blackhole attacks | PoA | Registration contract and token contract | Trusted routing using blockchain and RL |
| | CNN | [22] | Routing attacks (Blackhole) | PoA | Registration contract and token contract | Trusted routing using blockchain and CNN |
| Hybrid | Gaussian NB | [26] | Insider attacks | N/S | Registration and authentication contract | Identity management and secure authentication mechanism using blockchain and malicious node detection using Gaussian NB |
| | FL | [18] | Routing attacks | PoA vs. PoW | Registration and authentication contract | Attack localization and detection incorporated with cascade encryption and trust evaluation using blockchain and FL |
| N/S | GAN | [23] | Network layer attacks | N/S | N/S | Secure routing using blockchain and authentication and validation of routing procedures using GAN |

Note: SC: Smart Contract, DL-CNN: Deep Learning-Convolutional Neural Network, PoA: Proof-of-Authority, HMM: Hidden Markov Model, HGB: Histogram Gradient Boost, DoS: Denial of Service, PoW: Proof-of-Work, GA-SVM: Genetic Algorithm-based Support Vector Machine, GA-DT: Genetic Algorithm-based Decision Tree, MITM: Man-in-the-middle, RL: Reinforcement Learning, CNN: Convolutional Neural Network, Gaussian NB: Gaussian Naive Bayes, FL: Federated Learning, GAN: Generative Adversarial Network.

challenging in IoT environments. ML classifiers can often perform well with smaller datasets, making them more practical for many IoT use cases. This simplicity can be advantageous when designing solutions for IoT networks, making them less vulnerable to numerous attacks since it reduces the system’s complexity. We consider our results previously published in Ref. [29] as a motivation to use LightGBM for the ML detection module. Moreover, the mechanisms employed within the blockchain module are considered lightweight SCs in terms of opcodes and calculations, which helps reduce the gas cost of calling SC functions.

4. Proposed framework

The proposed framework uses a permissionless decentralized blockchain structure on a hierarchical cluster-based architecture to benefit from blockchain immutability and, at the same time, reduce its complexity, allowing it to be used in IoT networks.

The proposed framework comprises two modules: the blockchain prevention module and the ML detection module. The blockchain prevention module employs two mechanisms: identity management and trust management. Identity management uses an SC to manage node registration and verification, ensuring that unauthorized entities are prohibited from engaging in the blockchain network, while trust management has an SC that periodically computes a TS for each MN to evaluate its behavior throughout the network’s operation. Transaction validation is performed using the VBFT consensus algorithm.

The ML detection module identifies and classifies any malicious node using an efficient ML model, so that the blockchain network is notified to take appropriate actions and isolate this node. Extensive performance comparisons are conducted to select the appropriate ML algorithm to classify the detected malicious nodes. Fig. 1 depicts the proposed framework that deploys both modules in a permissionless blockchain network consisting of multiple clusters through the BS and CH nodes.

4.1. Blockchain prevention module

Identity management and trust evaluation are two important means for preserving IoT sensor network security, ensuring that legitimate nodes can access network services or resources and maintain trustworthiness between them throughout the network’s lifetime. This module is responsible for preventing attacks within a blockchain-based IoT sensor network that can harm network services and stop legitimate traffic from accessing the network. The module achieves this by registering and authenticating nodes before they are granted permission to transact over the network. This process guarantees that only authorized nodes can participate in network activities. Each registered node is assigned a unique identity nameplate, referred to as an IDCard, which is generated by the identity management SC. The module then uses VBFT to validate transactions before they are added to the blockchain ledger, which ensures the integrity and reliability of recorded transactions.
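As a rough illustration of the register-then-transact rule described above, the following Python sketch models the identity management SC as an in-memory registry. It is not the authors’ contract code: the method names mirror the SC functions listed in Section 4.1.1 (RegisterNode, AuthNode, RevokeNode, InfoNode, TotalNode) and the IDCard fields follow the paper, while the `revoked` flag, the SHA-256 password digest, and the class name `IdentityRegistry` are assumptions made only for this example.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass
class IDCard:
    # Field names follow the IDCard described in the paper.
    SN_id: str
    SN_password: str       # stored as a SHA-256 hex digest (assumption of this sketch)
    CH_id: str
    BS_id: str
    SN_time: float         # registration timestamp
    revoked: bool = False  # hypothetical flag, not a field named in the paper


def _digest(password):
    """Hash a password so the plain text is never stored."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


class IdentityRegistry:
    """In-memory stand-in for the identity management SC (illustrative only)."""

    def __init__(self):
        self._cards = {}

    def register_node(self, sn_id, password, ch_id, bs_id):
        """RegisterNode(): issue an IDCard for a node that is not yet known."""
        if sn_id in self._cards:
            raise ValueError("node already registered: " + sn_id)
        card = IDCard(sn_id, _digest(password), ch_id, bs_id, time.time())
        self._cards[sn_id] = card
        return card

    def auth_node(self, sn_id, password):
        """AuthNode(): True only for registered, non-revoked nodes with valid credentials."""
        card = self._cards.get(sn_id)
        if card is None or card.revoked:
            return False
        return card.SN_password == _digest(password)

    def revoke_node(self, sn_id):
        """RevokeNode(): mark the node so that later authentication fails."""
        self._cards[sn_id].revoked = True

    def info_node(self, sn_id):
        """InfoNode(): return the stored IDCard."""
        return self._cards[sn_id]

    def total_node(self):
        """TotalNode(): count the nodes that are still registered."""
        return sum(1 for card in self._cards.values() if not card.revoked)
```

In this sketch a revoked node keeps its record, so its history stays queryable, but it can no longer authenticate; an on-chain contract would enforce the same check before accepting a transaction from the node.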


Fig. 1. Illustration of the proposed framework.
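The interaction between the two modules that Fig. 1 illustrates (periodic trust scoring, ML detection only for risky nodes, recording of the detection result, and identity revocation for nodes found malicious) could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the numeric threshold `TRUST_THRESHOLD`, the function name `assessment_round`, and the use of a plain list as the ledger are hypothetical, and the trust scores are taken as given because the TS formula is not part of this sketch. The class label "Normal" follows the WSN-DS classes named in the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical cut-off: the paper classifies nodes as trustworthy or risky,
# but no numeric threshold is assumed to come from it.
TRUST_THRESHOLD = 0.5


def assessment_round(
    trust_scores: Dict[str, float],
    classify: Callable[[str], str],
    registered: Set[str],
    ledger: List[Tuple[str, str]],
) -> List[str]:
    """Run one trust-assessment round and return the node IDs revoked in it."""
    revoked = []
    for node_id in sorted(trust_scores):
        if node_id not in registered:
            continue  # unregistered or already revoked nodes are ignored
        if trust_scores[node_id] >= TRUST_THRESHOLD:
            continue  # trustworthy: ML detection is not triggered
        label = classify(node_id)        # ML detection runs only for risky nodes
        ledger.append((node_id, label))  # the detection result is recorded
        if label != "Normal":
            registered.discard(node_id)  # identity revocation
            revoked.append(node_id)
    return revoked


if __name__ == "__main__":
    nodes = {"MN1", "MN2", "MN3"}
    ledger = []
    scores = {"MN1": 0.9, "MN2": 0.2, "MN3": 0.3}
    # Stand-in for a trained classifier: a fixed lookup table.
    fake_model = {"MN2": "Blackhole", "MN3": "Normal"}
    print(assessment_round(scores, fake_model.__getitem__, nodes, ledger))
    print(ledger)
```

Gating the classifier behind the trust check keeps the more expensive ML step off the common path, which matches the lightweight intent of the framework; a risky node that the classifier labels as normal stays registered but leaves a ledger entry for later rounds.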

The trust management SC establishes trustworthiness among the nodes by periodically calculating the TS value for each node. Each node should be classified into one of two levels of trust: trustworthy or risky. If a node is classified as risky, the ML detection module is triggered to identify and classify its malicious behavior. The result of the ML detection will be recorded on the blockchain. Subsequently, the blockchain network utilizes the identity management SC to revoke the identity of the identified malicious node, which ensures that the node will no longer participate in the network operation throughout the network’s lifetime and consequently mitigates the potential damage caused by its malicious actions.

4.1.1. Lightweight identity management mechanism

Centralized security and authentication authorities, such as identity providers or central access servers, have limitations in terms of single point of failure and scalability. Using a permissionless decentralized blockchain for identity management and secure authentication should avoid a single point of failure and support network scalability regardless of the number of managed identities [30].

In this work, lightweight identity management is employed to facilitate the registration and authentication of nodes and record their identities into the blockchain ledger. During the initial deployment, each node’s credentials are registered on the blockchain so that it can transact and communicate over the network. Each node is assigned a unique IDCard that should include SN_id, SN_password, CH_id, BS_id, and SN_time. The identity management SC encompasses several functions, including RegisterNode() for node registration, AuthNode() for node authentication, RevokeNode() for node revocation, InfoNode() for querying node information, and TotalNode() for querying the total number of registered and authenticated nodes.

4.1.2. Lightweight trust management mechanism

The trust management mechanism proposed in this study aims to ensure the selection of reliable data sources in a lightweight manner. The proposed trust management SC is responsible for maintaining node trust and detecting malicious nodes within the network by periodically evaluating the TS value for each sensor node using a set of assessment metrics that are determined during the network’s operations. These metrics include node status, Transmitted Signal Strength (TSS), Packet Sending Rate (PSR), Packet Forwarding Rate (PFR), Forwarding Delay (FD), and Energy Consumption Amount (ECA). Each node is equipped with a TrustCard that contains SN_id, Status, TS, TSS, PSR, PFR, FD, EnergyLevel, and ECA. Nodes are classified into two trust levels: trustworthy and risky. A trustworthy node is deemed safe for sending, receiving, and forwarding packets within the network, while a risky node should be recorded on the blockchain to be closely monitored in subsequent assessment rounds. We adopted the same trust mechanism discussed in Ref. [31], but with two trust levels.

4.1.3. Blockchain consensus

Data consistency can be assured by blockchain-based consensus on the data without involving a central authority or a third party [32]. Examples of consensus algorithms include PoA, PoW, Proof of Stake (PoS), and Raft. In this study, VBFT is adopted as the consensus mechanism for the proposed framework. VBFT enhances the traditional Byzantine Fault Tolerance (BFT) by introducing verifiable randomness in the selection of consensus peers for the next block. This randomness, achieved by applying a random function to the current block, fortifies the algorithm against malicious attacks. VBFT combines Verifiable Random Function (VRF), BFT, and PoS, making it a hybrid consensus algorithm. PoW is used as a benchmark scheme, where the nodes compete to solve a mathematical puzzle, which typically requires significant computational resources. The node that successfully solves the puzzle first is granted the authority to add the new block to the blockchain.

4.2. Machine learning detection module

The proposed framework deploys an ML detection model on both the BS and CHs in order to effectively identify and classify malicious nodes. We conduct a comprehensive performance comparison of various supervised ML algorithms to detect cyber-attacks in IoT sensor networks, specifically LightGBM, LR, CompNB, NC, and Stacking.

4.2.1. Dataset and data preprocessing

The ML models are trained using the specialized imbalanced dataset, WSN-DS, which contains samples of four insider DoS attacks: Blackhole, Grayhole, Flooding, and TDMA scheduling [33]. Data preprocessing involves cleaning, normalization, handling duplicates or missing data, feature encoding, dimensionality reduction, and labeling [34].

The RandomOverSampler technique is applied to balance the original dataset using the imbalanced-learn Python library. This technique


replicates minority class samples, which can introduce high computational requirements if the original dataset is relatively large but imbalanced; this is not the case with the WSN-DS dataset. The original dataset contains 374,661 samples, of which 340,066 are normal and the rest are attacks: 14,596 Blackhole, 10,049 Grayhole, 6,638 Flooding, and 3,312 TDMA. RandomOverSampler increases the frequency of these four classes, resulting in an equal number of samples for each attack type and for normal traffic (340,066 samples each), totaling 1,700,330 samples.
The dataset is divided into training and testing subsets at a 70:30 ratio using the train_test_split function from scikit-learn. Each algorithm is trained on multi-class classification using both the original imbalanced dataset and the balanced dataset (Table 3).

Table 3
Description of corresponding dataset.

Dataset               Class       Size      Proportion (%)   Total
Original imbalanced   Normal      340,066   90.77            374,661
                      Blackhole   14,596    3.90
                      Grayhole    10,049    2.68
                      Flooding    6,638     1.77
                      TDMA        3,312     0.88
Balanced              Normal      340,066   20               1,700,330
                      Blackhole   340,066   20
                      Grayhole    340,066   20
                      Flooding    340,066   20
                      TDMA        340,066   20

4.2.2. Machine learning algorithms
A wide range of ML algorithms are available in the literature; therefore, the performance of several off-the-shelf ML techniques is evaluated to efficiently select a suitable model for our proposed detection module. The performance comparison involves LightGBM, LR, CompNB, NC, and Stacking algorithms.

• LightGBM. LightGBM stands for light gradient-boosting machine, which is a modified algorithm of the computationally demanding Gradient Boosting Decision Tree (GBDT), and thus it is a distributed boosting ML framework [35]. The algorithm is called light because it can achieve the same accuracy as GBM but with less processing time. The algorithm grows trees leaf-wise, choosing each leaf for its potential to yield the largest decrease in loss, unlike classical algorithms that grow trees level-wise. It is also more efficient and less memory demanding than its counterparts since it relies on the optimized histogram-based decision tree learning algorithm. Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) [35] enable LightGBM to manage highly imbalanced datasets and give more consideration to minority classes. Generally, gradient tree boosting aims to minimize the following objective function:

$$L(\phi) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

$$\text{s.t.} \quad \Omega(f_k) = \gamma T_k + 0.5\,\lambda \lVert \mathbf{w}_k \rVert^2$$

where $l(y_i, \hat{y}_i)$ is a differentiable convex loss function and $\hat{y}_i$ is the final predicted output, which can be expressed by

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F.$$

Here, $K$ is the number of trees, $f_k$ is the $k$-th tree model, and $f_k(x_i)$ is the score of the $i$-th observation obtained from the $k$-th tree. $F$ is the space of trees. $\Omega(f_k)$ is a penalizing function on the complexity of the $k$-th tree $f_k$, which is related to the number of leaves $T_k$ and the weight of leaves $\mathbf{w}_k = (w_{k1}, w_{k2}, \cdots, w_{kT_k})$.

• LR. LR is a supervised technique used for the analysis of categorical data, rather than continuous data as in linear regression, so it is mainly used to predict category trends in a set of data. In LR, the prediction probability of a certain outcome can be expressed mathematically in its simplest form by Ref. [36]

$$P(Y \mid X = x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

where $\alpha$ is the $Y$-intercept, $\beta$ is the regression coefficient, and $e \approx 2.71828$ is the base of the natural logarithm. Note here that $Y$ is categorical, but $X$ can be either categorical or continuous, and the relationship between $X$ and $\log(Y)$ is linear, where

$$\log(Y) = \alpha + \beta X. \quad (1)$$

$\beta$ in Eq. (1) can take positive or negative values that specify the relationship between $X$ and $\log(Y)$. The former implies that larger/smaller values of $X$ are associated with larger/smaller values of $\log(Y)$, while the latter implies that smaller/larger values of $X$ are associated with larger/smaller values of $\log(Y)$.

• CompNB. CompNB belongs to the NB algorithm family, which is popular among ML classifiers. Unlike multinomial NB, CompNB can be efficiently deployed with imbalanced datasets. The term complement in CompNB refers to the selection mechanism in the CompNB algorithm, which relies on the probability of an item belonging to a specific class and not all the classes, in contrast to classical NB algorithms. In CompNB, the class is selected based on the maximum posterior probability that can be estimated by Ref. [37]

$$l(d) = \operatorname{argmax}_c \left[ \log p(\vec{\theta}_c) - \sum_i f_i \log \frac{N_{\tilde{c}i} + \alpha_i}{N_{\tilde{c}} + \alpha} \right] \quad (2)$$

where $f_i$ is the frequency of word $i$ in document $d$, $N_{\tilde{c}i}$ is the number of occurrences of word $i$ in all classes but $c$, and $N_{\tilde{c}}$ is the total number of word occurrences in all classes but $c$. $p(\vec{\theta}_c)$ refers to the class's prior estimate. Also, $\alpha = \sum_i \alpha_i$, where the $\alpha_i$ are imagined occurrences, so that the estimate is a smoothed version of the maximum likelihood estimate.

• NC. NC is another widely used ML technique that relies on the centroid, which is the mean of the class training samples, to classify an observation. This technique assigns a label to an observation based on its proximity to the mean of the corresponding class. The estimation of NC is commonly done using a straightforward approach, where the Euclidean distance is calculated by Ref. [38]

$$\delta_k = D(\vec{x}' - \vec{\mu}_k) \quad (3)$$

where $\delta_k$ is the distance between the new observation $\vec{x}'$ and the mean of the class set $k$. Then, the algorithm decides the class $\hat{g}'$ that is associated with the new observation by the following rule:

$$\hat{g}' = \operatorname{argmin}_k \delta_k. \quad (4)$$

• Stacking. Stacking is an ensemble ML technique that combines the results of multiple ML algorithms to enhance the detection accuracy and efficiency, hence the name. This technique usually operates at two levels, 0 and 1. Level 0 trains several base learners, while at level 1 the algorithm learns from the best estimates of the previous-level learners [34]. Stacking starts with evaluating the class distribution vector for the $j$-th classifier such that [39]

$$\Delta_j = [\delta_{1j} \;\; \delta_{2j} \;\; \cdots \;\; \delta_{cj}], \quad 1 \le j \le n,$$

where $c$ is the number of classes and

$$\sum_i \delta_{ij} = 1.$$


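Then, the $1 \times c$ class distribution vector is evaluated by

$$\delta' = \Phi \boldsymbol{\Delta} = [\delta_1' \;\; \delta_2' \;\; \cdots \;\; \delta_c']. \quad (5)$$

In Eq. (5), $\Phi$ is the weight distribution vector that includes the weight of each classifier, and can be expressed by

$$\Phi = [\theta_1 \;\; \theta_2 \;\; \cdots \;\; \theta_n]$$

where

$$0 \le \theta_j \le 1, \quad \sum_j \theta_j = 1$$

and $\boldsymbol{\Delta} = [\Delta_1' \;\; \Delta_2' \;\; \cdots \;\; \Delta_n']$ is the $n \times c$ class distribution matrix for the $n$ classifiers.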
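The weighted combination in Eq. (5) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions: the weights and class distributions below are hypothetical values chosen for the example, the function names are not from the paper, and a practical stacking implementation would typically learn the level-1 combination from data rather than use fixed weights.

```python
from typing import List


def combine_class_distributions(weights: List[float],
                                distributions: List[List[float]]) -> List[float]:
    """Compute delta' = Phi * Delta for n classifiers and c classes.

    weights:       Phi, one weight per classifier (each in [0, 1], summing to 1).
    distributions: Delta, one row per classifier; each row is that classifier's
                   class distribution over the c classes (summing to 1).
    """
    n = len(weights)
    if n == 0 or n != len(distributions):
        raise ValueError("need exactly one weight per classifier")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    c = len(distributions[0])
    for row in distributions:
        if len(row) != c or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must be a distribution over c classes")
    # Entry i of the result is the weighted sum of every classifier's
    # probability for class i.
    return [sum(weights[j] * distributions[j][i] for j in range(n))
            for i in range(c)]


def predict_class(combined: List[float]) -> int:
    """Return the index of the class with the largest combined probability."""
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical example: three base classifiers, three classes.
phi = [0.5, 0.3, 0.2]
delta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
combined = combine_class_distributions(phi, delta)
predicted = predict_class(combined)
```

Because each row of the matrix and the weight vector both sum to 1, the combined vector is itself a distribution over the classes.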
5. Experimental results and discussion

In this section, we discuss the proposed framework's experimental results. We first implement the blockchain prevention module over Ethereum, which is a permissionless blockchain deployed on BS and CH nodes. We design and evaluate the two proposed SCs using Remix IDE web integrated with Ethereum. The ML detection module is designed using the LightGBM algorithm, which is selected following an extensive comparative analysis with other algorithms, including LR, CompNB, NC, and Stacking. We use Google Colaboratory and Python programming to perform the performance comparison on the balanced and imbalanced WSN-DS datasets.

5.1. Blockchain prevention module

The proposed blockchain prevention module is implemented using Ethereum, which is a permissionless blockchain and distributed application platform commonly known for its virtual cryptocurrency, Ether or ETH. Ether is the token that powers Ethereum. The Remix IDE web, integrated with an Ethereum wallet created using a JavaScript injector called Metamask, is used to develop the SCs and to evaluate the performance of the consensus algorithms.
Two SCs are built in the proposed framework: identity management SC and trust management SC. In general, an SC can be defined as a program code that incorporates an automated legal agreement [7]. Vyper, Bamboo, Serpent, and Mutan programming languages have been used to develop SC code on various blockchain platforms; however, Solidity is the most popular object-oriented high-level programming language adopted for writing SCs. Implementing SCs within the blockchain makes them immutable and tamper-proof; therefore, a deployed contract cannot be changed or removed. keccak256 is the hash function used by Ethereum; it is built into Solidity and is used to generate the hash of a node's unique IDCard using the following formula:

SN_id=keccak256(PA)

where SN_id represents the hashed sensor identity and PA is the node's physical or MAC address. The keccak256 function reduces the cost when compared to other hashing algorithms.

5.1.1. Identity management smart contract

Algorithm 1 describes the identity management SC, which is written in Solidity and mainly consists of RegisterNode(), AuthNode(), and RevokeNode() as core payable functions and InfoNode() and TotalNode() as non-payable functions.

Algorithm 1: Identity management smart contract.
1: Initialize identity management SC structure.
2: Declare IDCard variables: SN_id, SN_password, SN_registered, CH_id, BS_id, SN_time.
3: Define a modifier onlyCH to restrict functions execution to the CH. If the sender is not a CH, revert the transaction.
4: Function RegisterNode()
   • Create a new SN record with the provided initials and set SN_registered to true.
   • Add the new SN to the SN array.
5: Function AuthNode()
   • Accept SN_id as the index.
   • Authenticate an SN based on provided credentials.
   • If successfully authenticated, return “Sensor Authenticated” and set SN_registered to true.
   • If not authenticated, return “Node revoked” and set SN_registered to false.
6: Function RevokeNode()
   • Accept SN_id as the index.
   • Check if SN_id is valid in the SN array.
   • Set SN_registered to false at the specified SN_id.
7: Function InfoNode()
   • Accept SN_id as the index.
   • If found, return SN_password, SN_registered status, CH_id, BS_id, and SN_time.
   • Otherwise, return “Node revoked” and set SN_registered to false.
8: Function TotalNode()
   • Return the total number of registered SNs.
   • Return the length of the SN array.

Each contract call invoking identity management SC generates a transaction that appears on the terminal and has the following fields: status, transaction hash, from, to, input, decoded input, decoded output, logs, etc. The following log depicts an example of the output for calling RegisterNode() (Fig. 2).

    from: 0x5B3...eddC4
    to: IdentityManagement.RegisterNode(uint256,string,uint256,uint256,uint256) 0xd91...39138
    value: 0 wei
    data: 0xe61...00000
    logs: 0
    hash: 0x3ac...9d2a9
    status true Transaction mined and execution succeed
    transaction hash: 0x3ac2...9d2a9
    block hash: 0x615...68649
    block number 2
    from 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4
    to IdentityManagement.RegisterNode(uint256,string,uint256,uint256,uint256)
    gas 236872 gas
    transaction cost 205975 gas
    execution cost 183799 gas
    input 0xe61...00000
    decoded input {
      "uint256 SN_id": "1258625",
      "string SN_password": "Sensor_123",
      "uint256 CH_id": "10",
      "uint256 BS_id": "1",
      "uint256 SN_time": "1"
    }

Fig. 2. RegisterNode() inputs from identity management smart contract.
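The registry logic listed in Algorithm 1 can be sketched off-chain as follows. This is a minimal Python model, not the authors' Solidity contract: the class and method names are hypothetical, the onlyCH modifier is modeled as a permission check on the caller, and two details are assumptions made for the sketch, namely that a revoked node is not re-admitted by AuthNode() and that TotalNode() counts only nodes whose SN_registered flag is true.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class IDCard:
    """Fields of the IDCard record declared in Algorithm 1."""
    sn_id: int
    sn_password: str
    ch_id: int
    bs_id: int
    sn_time: int
    sn_registered: bool = True


class IdentityRegistry:
    """In-memory stand-in for the identity management contract (illustrative)."""

    def __init__(self, cluster_heads: Iterable[int]) -> None:
        self._cluster_heads = set(cluster_heads)
        self._nodes: Dict[int, IDCard] = {}

    def _only_ch(self, sender: int) -> None:
        # Mirrors the onlyCH modifier: reject callers that are not cluster heads.
        if sender not in self._cluster_heads:
            raise PermissionError("caller is not a cluster head")

    def register_node(self, sender: int, sn_id: int, sn_password: str,
                      ch_id: int, bs_id: int, sn_time: int) -> None:
        self._only_ch(sender)
        self._nodes[sn_id] = IDCard(sn_id, sn_password, ch_id, bs_id, sn_time)

    def auth_node(self, sender: int, sn_id: int, sn_password: str) -> str:
        self._only_ch(sender)
        card = self._nodes.get(sn_id)
        # Assumption: a node must still be registered to authenticate.
        if (card is not None and card.sn_registered
                and card.sn_password == sn_password):
            return "Sensor Authenticated"
        if card is not None:
            card.sn_registered = False
        return "Node revoked"

    def revoke_node(self, sender: int, sn_id: int) -> None:
        self._only_ch(sender)
        if sn_id not in self._nodes:
            raise KeyError(f"unknown sensor node {sn_id}")
        self._nodes[sn_id].sn_registered = False

    def info_node(self, sn_id: int) -> IDCard:
        return self._nodes[sn_id]

    def total_node(self) -> int:
        # Assumption: only nodes that are currently registered are counted.
        return sum(card.sn_registered for card in self._nodes.values())


# Usage with the values shown in the RegisterNode() log (CH_id 10, BS_id 1).
registry = IdentityRegistry(cluster_heads={10})
registry.register_node(10, 1258625, "Sensor_123", ch_id=10, bs_id=1, sn_time=1)
first_auth = registry.auth_node(10, 1258625, "Sensor_123")
count_before = registry.total_node()
registry.revoke_node(10, 1258625)
second_auth = registry.auth_node(10, 1258625, "Sensor_123")
count_after = registry.total_node()
```

The sketch ignores gas, transactions, and on-chain storage; it only shows how the registration flag drives authentication and revocation.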


5.1.2. Trust management smart contract


Algorithm 2 describes the trust management SC, which is written in Solidity and mainly consists of AddNodeTrust(), CalculateNodeTrust(), and UpdateNodeTrust() as payable functions and QueryNodeTrust() as a non-payable function. Fig. 3 depicts an example of calling AddNodeTrust() and initializing the set of parameters, including node status, TSS, RSS, PSR, PFR, FD, EnergyLevel, and ECA. For simulation purposes, we use the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol as the routing protocol for IoT sensor networks. We assumed that the sensor's initial energy should exceed 3 J to be elected as a CH. We also assumed the same TS_Threshold proposed in Ref. [31] to determine if the node is normal or risky. The following log depicts an example of the output for calling AddNodeTrust() (Fig. 3).

    from: 0x5B3...eddC4
    to: Trust.AddNodeTrust(uint256,bool,uint256,uint256,uint256,uint256,uint256,uint256,uint256) 0xd91...39138
    value: 0 wei
    data: 0x113...00019
    logs: 0
    hash: 0xf36...43018
    status true Transaction mined and execution succeed
    transaction hash 0xf36...43018
    block hash 0xb5f...940b4
    block number 2
    from 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4
    to Trust.AddNodeTrust(uint256,bool,uint256,uint256,uint256,uint256,uint256,uint256,uint256) 0xd9145CCE52D386f254917e481eB44e9943F39138
    gas 309437 gas
    transaction cost 269075 gas
    execution cost 246727 gas
    input 0x113...00019
    decoded input {
      "uint256 _ID": "1258625",
      "bool _Status": true,
      "uint256 _TSS": "85",
      "uint256 _RSS": "65",
      "uint256 _PSR": "80",
      "uint256 _PFR": "58",
      "uint256 _FD": "62",
      "uint256 _EnergyLevel": "81",
      "uint256 _ECA": "25"
    }

Fig. 3. AddNodeTrust() inputs from trust management smart contract.
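The trust score computed by CalculateNodeTrust() is described as a weighted average of TSS, RSS, PSR, PFR, FD, and ECA, compared against TS_Threshold. A minimal sketch of that calculation is given below. The weights and the threshold value of 50 are hypothetical, since neither is stated in this section, and the sketch assumes every metric has already been normalized so that a higher value means more trustworthy behavior.

```python
from typing import Dict

# Metric names taken from the AddNodeTrust() parameters.
METRICS = ("TSS", "RSS", "PSR", "PFR", "FD", "ECA")


def calculate_trust_score(metrics: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Weighted average of the six assessment metrics."""
    total_weight = sum(weights[name] for name in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * metrics[name] for name in METRICS)
    return weighted_sum / total_weight


def classify_node(trust_score: float, ts_threshold: float) -> str:
    """Label a node 'risky' when its score falls below the threshold."""
    return "risky" if trust_score < ts_threshold else "normal"


# Metric values from the AddNodeTrust() log; equal weights are an assumption.
sample = {"TSS": 85, "RSS": 65, "PSR": 80, "PFR": 58, "FD": 62, "ECA": 25}
equal_weights = {name: 1.0 for name in METRICS}
HYPOTHETICAL_TS_THRESHOLD = 50.0

ts = calculate_trust_score(sample, equal_weights)
status = classify_node(ts, HYPOTHETICAL_TS_THRESHOLD)
```

With equal weights the score is simply the mean of the six values; unequal weights would let a deployment emphasize, for example, packet forwarding over signal strength.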

Algorithm 2: Trust management smart contract.
1: Initialize trust management SC structure.
2: Declare TrustCard variables: ID, Status, TSS, RSS, PSR, PFR, FD, EnergyLevel, ECA, TS.
3: Function AddNodeTrust()
   • Create a new Trust record with the provided initials, and set Status to normal and TS to 0.
   • If EnergyLevel is greater than 3 J, the node is identified as a CH.
4: Function CalculateNodeTrust()
   • Check if the caller is a CH.
   • Accept the parameters TSS, RSS, PSR, PFR, FD, and ECA.
   • Check if the node’s EnergyLevel is greater than 0.
   • Calculate the TS using weighted average of TSS, RSS, PSR, PFR, FD, and ECA.
   • Save the TS of the node.
5: Function UpdateNodeTrust()
   • Check if the caller is a CH.
   • Accept the parameters CH_id, SN_id, and the new TS.
   • Update the TS of the node.
   • Check if the TS is less than TS_Threshold, the Status is updated to risky.
6: Function QueryNodeTrust()
   • Accept the parameters CH_id and SN_id.
   • Return the TS of that specified node.

The time complexity for the two proposed SCs is represented by the time complexity of their functions, where no single function has more than one for loop; therefore, its complexity should be 𝑂(𝑛).

Fig. 4. PoW vs VBFT performance for identity management smart contract.

5.1.3. Blockchain consensus
A transaction that invokes a contract function typically consumes more resources than other transaction types, and that transaction cost can be evaluated once run by a miner. We test the proposed SCs using two consensus algorithms: PoW and VBFT. The average transaction costs for calling the payable functions of identity management SC and trust management SC are presented in Figs. 4 and 5, respectively. These values are more concisely expressed in Gwei. The evaluation of the identity management SC’s performance considers the payable functions RegisterNode(), AuthNode(), and RevokeNode(). The functions AddNodeTrust(), CalculateNodeTrust(), and UpdateNodeTrust() are evaluated for the trust management SC.
We observe that VBFT reduces the transaction cost of calling the three functions of the identity management SC (Fig. 4). Both RegisterNode() and AuthNode() functions use SN_id, SN_password, CH_id, BS_id, and SN_time attributes for each node to complete the registration, while the RevokeNode() function only requires the SN_id to revoke the node from the network. The trust management SC is similar, where VBFT also reduces the transaction cost of calling AddNodeTrust(), CalculateNodeTrust(), and UpdateNodeTrust() functions (Fig. 5). These results are due to the fact that PoW miners need to solve difficult puzzles to verify and confirm transactions. These difficult puzzles are time-consuming and require many resources to solve.


The first miner to solve the puzzles adds the next block to the chain and receives a payment. Miners must invest in expensive equipment and power to increase their chances of mining a block, which leads to higher transaction costs. In contrast, VBFT uses a simpler process that does not involve math or mining; it relies on voting and random selection. Nodes agree to vote, and a verified random function selects a leader who suggests the next block.

Fig. 5. PoW vs VBFT performance for trust management smart contract.

5.2. Machine learning detection module

We used Google Colaboratory and Python programming on the balanced and imbalanced datasets to assess the performance of the ML algorithms under consideration. The classical evaluation metrics are used, including accuracy, precision, recall, F1-score, and MCC, and can be mathematically expressed by:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2\,(P_d \times \text{Precision})}{P_d + \text{Precision}}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

Here, confusion matrix values $TP$, $TN$, $FP$, and $FN$ are true positive, true negative, false positive, and false negative, respectively. Accuracy is the ratio of correctly detected normal or attack observations to the total number of correctly or incorrectly detected observations. Precision estimates the total number of correctly detected attacks compared to the total number of correctly and incorrectly detected attacks. Recall, better known as the probability of detection $P_d$, evaluates the number of correctly detected attacks compared to the total number of actual attacks. F1-score is the harmonic mean of precision and recall; because it accounts for both FN and FP, it is suited to evaluating classifiers on noisy or imbalanced data. MCC takes into account all four values in the confusion matrix. MCC is generally considered a balanced measure, which we use even for imbalanced datasets where the classes are of very different sizes.
Physical measures are also used to provide a complete evaluation of the proposed model, including processing time, memory usage, training time, and prediction time [29]. The processing time refers to the model's training and testing time. Memory usage is defined as the model storage size at the time it is ready to run. The training time is the time the model takes to train itself, while the prediction time represents the time the model needs to estimate if a sample is normal or risky [2].
LightGBM, LR, CompNB, NC, and Stacking classifiers achieved an accuracy of 99.67%/99.56%, 97.01%/89.65%, 82.60%/74.17%, 83.79%/73.67%, and 87.60%/96.05% for the imbalanced/balanced WSN-DS dataset (Fig. 6(a)). LightGBM accuracy is clearly superior to that of its counterparts. Similarly, LightGBM also outperforms the rest of the classifiers in terms of precision (Fig. 6(b)), with 98%/100%, 84%/92%, 64%/77%, 47%/75%, and 71%/96% for the imbalanced/balanced WSN-DS dataset. Interestingly, the LightGBM recall results were also the highest among the five classifiers (Fig. 6(c)). Recall results were 98%/100%, 85%/90%, 47%/74%, 74%/74%, and 87%/96% for LightGBM, LR, CompNB, NC, and Stacking classifiers, respectively. LightGBM once again outperformed the other classifiers in terms of the F1-score (Fig. 6(d)); the results were 98%/100%, 84%/89%, 38%/72%, 53%/72%, and 73%/96% for LightGBM, LR, CompNB, NC, and Stacking classifiers.
As Fig. 7 depicts, the experimental results for the physical measures indicate that Stacking requires the highest processing time of 118.62 s and 523.8 s for imbalanced and balanced datasets, respectively. NC required the least processing time for both datasets. On the other hand, LightGBM and LR required a similar amount of processing time for the imbalanced dataset, but in the case of the balanced dataset, LR required less processing time than LightGBM. Regarding the model's size in memory, the memory requirements remain constant for both imbalanced and balanced datasets and are 33, 27, 29, 27, and 33 bytes for LightGBM, LR, CompNB, NC, and Stacking, respectively. LR and NC require slightly less memory storage than LightGBM and CompNB.
Table 4 compares the performance of the ML algorithms for different training dataset sizes. The splitting ratios used are 60:40, 70:30, and 80:20. The aim is to investigate the performance of small (60%), medium (70%), and high (80%) training samples. The experimental results listed in Table 4 illustrate the superiority of LightGBM for all training dataset sizes in terms of accuracy, precision, recall, and F1-score. All the algorithms exhibited an increasing trend in processing time and training time when more data were available for training, while they exhibited a decreasing trend in terms of prediction time. NC exhibited a minimum processing time, training time, and prediction time for the different training dataset sizes.
Table 5 presents the computational complexity of the ML algorithms in terms of training and prediction. LR, CompNB, and NC are relatively lightweight algorithms for both training and prediction, with linear or near-linear complexity with respect to the number of samples (n) and number of features (d); however, the choice of algorithm should also consider other factors, such as the model's performance, available computational resources, and the nature of the dataset. LR, CompNB, and NC are often faster and more memory-efficient for small to medium-sized datasets. For complex and high-dimensional datasets, LightGBM can provide excellent predictive performance with manageable computational overhead [29]. Stacking can be computationally more demanding due to its combination of multiple models, and its complexity increases with the number of base learners [34]. Overall, based on the comprehensive analysis of the selected models, LightGBM outperforms other models for detecting DoS attacks in terms of accuracy, precision, recall, F1-score, and MCC metrics, while NC achieves the best processing time, training time, and prediction time. Accordingly, we select LightGBM as the most appropriate ML model for the detection module; however, from another perspective, the most efficient model is the one with the best results in terms of physical measures and computational complexity.

6. Conclusion

Combining blockchain and ML technologies in an integrated solution for securing IoT sensor networks is a new approach that has been the focus of recent studies. IoT sensor networks are vulnerable to insider cyber-attacks, where attackers can initiate malicious actions, such as compromising other nodes, tampering with data, and dropping or sending duplicated packets.


Fig. 6. Performance metrics achieved by LightGBM, LR, CompNB, NC, and Stacking classifiers for balanced and original imbalanced WSN-DS dataset.

This work proposes an integrated security framework to protect IoT sensor networks against insider cyber-attacks using blockchain and ML. A permissionless blockchain network is deployed on BS and CH nodes. Each MN should belong to one cluster and be registered with its current CH. The proposed framework consists of two distinct modules: blockchain prevention and ML detection. The blockchain prevention module has two key mechanisms: identity management and trust management. Each mechanism is associated with an SC that is implemented using Solidity and Remix IDE integrated with an Ethereum wallet using Metamask. The identity management mechanism is responsible for verifying and registering the nodes on the blockchain network, while the trust management mechanism evaluates the trustworthiness of each sensor node and helps track the historical behavior of the nodes throughout the network's lifetime. VBFT is the consensus mechanism used to validate the transactions and reduce the transaction costs compared to PoW. The ML detection module identifies and classifies the detected malicious nodes. LightGBM is selected following a detailed comparative analysis conducted using accuracy, precision, recall, F1-score, processing time, training time, prediction time, computational complexity, and MCC evaluation metrics.

CRediT authorship contribution statement

Shereen Ismail: Investigation, Methodology, Writing – original draft. Muhammad Nouman: Investigation, Methodology, Writing – original draft. Diana W. Dawoud: Investigation, Methodology, Writing – original draft. Hassan Reza: Supervision, Writing – review & editing.


Table 4
Evaluation metrics comparison for LightGBM, LR, CompNB, NC, Stacking for different training ratios.

Training ratio   ML classifier   Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)   Processing time (s)   Training time (s)   Prediction time (s)
60%              LightGBM        99.54          100             100          100            115.17                67.707              13.659
                 LR              89.58          91              90           89             65.33                 45.545              0.102
                 CompNB          74.16          77              74           72             6.14                  4.809               0.077
                 NC              73.71          75              74           72             1.52                  1.293               0.078
                 Stacking        96.28          96              96           96             445.146               268.106             13.502
70%              LightGBM        99.56          100             100          100            133.8                 68.5                10.53
                 LR              89.65          92              90           89             74.97                 51.63               0.076
                 CompNB          74.17          77              74           72             6.19                  7.07                0.06
                 NC              73.67          75              74           72             1.70                  1.565               0.058
                 Stacking        96.05          96              96           96             523.8                 314.041             12.197
80%              LightGBM        99.6           100             100          100            137.99                72.92               7.075
                 LR              89.58          91              90           89             82.83                 59.629              0.051
                 CompNB          74.02          77              74           72             7.724                 8.734               0.039
                 NC              73.6           75              74           72             1.889                 1.751               0.037
                 Stacking        95.88          96              96           96             560.20                350.313             7.995
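
The accuracy, precision, recall, and F1-score columns of Table 4, together with MCC, all derive from confusion-matrix counts. As a minimal sketch of those formulas, the function below computes the five metrics for a single binary confusion matrix. The counts in the example are hypothetical and are not taken from the WSN-DS experiments; for the five-class problem studied here, per-class values would additionally need to be averaged, and the averaging scheme is not specified in the text.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1-score, and MCC from counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical imbalanced example: 100 attacks among 1,000 observations.
scores = classification_metrics(tp=90, tn=880, fp=20, fn=10)
```

In this example accuracy is 0.97 while MCC is noticeably lower, which illustrates why MCC is reported alongside accuracy when the classes differ greatly in size.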

Fig. 7. Processing time for LightGBM, LR, CompNB, NC, and Stacking classifiers for imbalanced and balanced WSN-DS dataset.

Table 5
Computational complexity for LightGBM, LR, CompNB, NC, and Stacking.

ML algorithm   Training complexity   Prediction complexity
LightGBM       𝑂(𝑛𝑑 log 𝑛)           𝑂(𝑚𝑡 log 𝑚)
LR             𝑂(𝑛𝑑)                 𝑂(𝑚𝑑)
CompNB         𝑂(𝑛𝑑)                 𝑂(𝑚𝑑)
NC             𝑂(𝑘𝑑)                 𝑂(𝑚𝑑)
Stacking       𝑂(𝑛𝑑(𝑡 + 3))          𝑂(𝑚𝑑(𝑡 + 3))

Note: n is the number of samples available for training, t is the number of boosting rounds for LightGBM, m is the number of samples to be predicted, d is the number of features, and k is the number of classes for NC.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] B.K. Mohanta, D. Jena, U. Satapathy, et al., Survey on IoT security: challenges and solution using machine learning, artificial intelligence and blockchain technology, Int. Things. 11 (2020) 100227, https://doi.org/10.1016/j.iot.2020.100227.
[2] S. Ismail, D.W. Dawoud, H. Reza, Securing wireless sensor networks using machine learning and blockchain: a review, Future Internet. 15 (6) (2023), https://doi.org/10.3390/fi15060200.
[3] J. Marchang, G. Ibbotson, P. Wheway, Will blockchain technology become a reality in sensor networks?, in: 2019 Wireless Days (WD), IEEE, 2019, pp. 1–4, https://doi.org/10.1109/WD.2019.8734268.
[4] M. Burhanuddin, A.A.-J. Mohammed, R. Ismail, et al., A review on security challenges and features in wireless sensor networks: Iot perspective, J. Telecommun. Electron. Comput. Eng. 10 (1-7) (2018) 17–21, https://jtec.utem.edu.my/jtec/article/view/3589.
[5] M. Sharma, A. Tandon, S. Narayan, et al., Classification and analysis of security attacks in WSNs and IEEE 802.15.4 standards: a survey, in: 2017 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA) (Fall), IEEE, 2017, pp. 1–5, https://doi.org/10.1109/ICACCAF.2017.8344727.
[6] R. Agrawal, P. Verma, R. Sonanis, et al., Continuous security in IoT using blockchain, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 6423–6427, https://doi.org/10.1109/ICASSP.2018.8462513.
[7] L.D. Xu, Y. Lu, L. Li, Embedding blockchain technology into IoT for security: a survey, IEEE Int. Things J. 8 (13) (2021) 10452–10473, https://doi.org/10.1109/JIOT.2021.3060508.
[8] C. Zhang, C. Hu, T. Wu, et al., Achieving efficient and privacy-preserving neural network training and prediction in cloud environments, IEEE Trans. Dependable Secure Comput. 20 (5) (2022) 4245–4257, https://doi.org/10.1109/TDSC.2022.3208706.
[9] C. Zhang, M. Zhao, L. Zhu, et al., Fruit: a blockchain-based efficient and privacy-preserving quality-aware incentive scheme, IEEE J. Sel. Areas Commun. 40 (12) (2022) 3343–3357, https://doi.org/10.1109/JSAC.2022.3213341.
[10] S. Ismail, H. Reza, Evaluation of Naïve Bayesian algorithms for cyber-attacks detection in wireless sensor networks, in: 2022 IEEE World AI IoT Congress (AIIoT), IEEE, 2022, pp. 283–289, https://doi.org/10.1109/AIIoT54504.2022.9817298.
[11] H. Xu, L. Zhang, Y. Liu, et al., Raft based wireless blockchain networks in the presence of malicious jamming, IEEE Wirel. Commun. Lett. 9 (6) (2020) 817–821, https://doi.org/10.1109/LWC.2020.2971469.
[12] R. Xu, Y. Chen, E. Blasch, et al., BlendCAC: a blockchain-enabled decentralized capability-based access control for IoTs, in: 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), IEEE, 2018, pp. 1027–1034, https://doi.org/10.1109/Cybermatics_2018.2018.00191.
[13] A.A. Khalil, J. Franco, I. Parvez, et al., A literature review on blockchain-enabled security and operation of cyber-physical systems, in: 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), IEEE, 2022, pp. 1774–1779, https://doi.org/10.1109/COMPSAC54236.2022.00282.
[14] Z. Cui, F. Xue, S. Zhang, et al., A hybrid blockChain-based identity authentication scheme for multi-WSN, IEEE Trans. Serv. Comput. 13 (2) (2020) 241–251, https://doi.org/10.1109/TSC.2020.2964537.
[15] S. Pundir, M. Wazid, D.P. Singh, et al., Intrusion detection protocols in wireless sensor networks integrated to Internet of things deployment: survey and future challenges, IEEE Access 8 (2019) 3343–3363, https://doi.org/10.1109/ACCESS.2019.2962829.
[16] M. Dener, S. Al, A. Orman, STLGBM-DDS: an efficient data balanced DoS detection system for wireless sensor networks on big data environment, IEEE Access 10 (2022) 92931–92945, https://doi.org/10.1109/ACCESS.2022.3202807.
[17] N.P. Sable, V.U. Rathod, Rethinking blockchain and machine learning for resource-constrained WSN, in: A. Neustein, P.N. Mahalle, P. Joshi, et al. (Eds.), AI, IoT, Big Data and Cloud Computing for Industry 4.0, in: Signals and Communication Technology, Springer, Cham, 2023, pp. 303–318, https://doi.org/10.1007/978-3-031-29713-7_17.
[18] G.G. Gebremariam, J. Panda, S. Indu, et al., Blockchain-based secure localization against malicious nodes in IoT-based wireless sensor networks using federated learning, Wirel. Commun. Mob. Comput. 2023 (2023), https://doi.org/10.1155/2023/8068038.
[19] M. Mamdouh, A.I. Awad, A.A. Khalaf, et al., Authentication and identity management of IoHT devices: achievements, challenges, and future directions, Comput. Secur. 111 (2021) 102491, https://doi.org/10.1016/j.cose.2021.102491.


[20] J. Yang, S. He, Y. Xu, et al., A trusted routing scheme using blockchain and reinforcement learning for wireless sensor networks, Sensors (Switzerland) 19 (4) (2019), https://doi.org/10.3390/s19040970.
[21] M. Revanesh, V. Sridhar, A trusted distributed routing scheme for wireless sensor networks using blockchain and meta-heuristics-based deep learning technique, Trans. Emerg. Telecommun. Technol. 32 (9) (2021) e4259, https://doi.org/10.1002/ett.4259.
[22] I.A. Abd El-Moghith, S.M. Darwish, Towards designing a trusted routing scheme in wireless sensor networks: a new deep blockchain approach, IEEE Access 9 (2021) 103822–103834, https://doi.org/10.1109/ACCESS.2021.3098933.
[23] S. Rajasoundaran, S. Kumar, M. Selvi, et al., Machine learning based volatile block chain construction for secure routing in decentralized military sensor networks, Wirel. Netw. 27 (7) (2021) 4513–4534, https://doi.org/10.1007/s11276-021-02748-2.
[24] X. Yang, Y. Chen, X. Qian, et al., BCEAD: a blockchain-empowered ensemble anomaly detection for wireless sensor network via isolation forest, Secur. Commun. Netw. 2021 (2021), https://doi.org/10.1155/2021/9430132.
[25] M.B.E. Sajid, S. Ullah, N. Javaid, et al., Exploiting machine learning to detect malicious nodes in intelligent sensor-based systems using blockchain, Wirel. Commun. Mob. Comput. 2022 (2022), https://doi.org/10.1155/2022/7386049.
[26] S. Ismail, D. Dawoud, H. Reza, Towards a lightweight identity management and secure authentication for IoT using blockchain, in: 2022 IEEE World AI IoT Congress (AIIoT), IEEE, 2022, pp. 77–83, https://doi.org/10.1109/AIIoT54504.2022.9817349.
[27] M. Nouman, U. Qasim, H. Nasir, et al., Malicious node detection using machine learning and distributed data storage using blockchain in WSNs, IEEE Access 11 (2023), https://doi.org/10.1109/ACCESS.2023.3236983.
[28] M.M. Arifeen, A. Al Mamun, T. Ahmed, et al., A blockchain-based scheme for Sybil attack detection in underwater wireless sensor networks, in: M.S. Kaiser, A. Bandyopadhyay, M. Mahmud, et al. (Eds.), Proceedings of International Conference on Trends in Computational and Cognitive Engineering, Springer Singapore, Singapore, 2021, pp. 467–476, https://doi.org/10.1007/978-981-33-4673-4_37.
[29] S. Ismail, T.T. Khoei, R. Marsh, et al., A comparative study of machine learning models for cyber-attacks detection in wireless sensor networks, in: 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE, 2021, pp. 313–318, https://doi.org/10.1109/UEMCON53757.2021.9666581.
[30] K. Gilani, E. Bertin, J. Hatin, et al., A survey on blockchain-based identity management and decentralized privacy for personal data, in: 2020 2nd Conference on Blockchain Research & Applications for Innovative Networks and Services (BRAINS), IEEE, 2020, pp. 97–101, https://doi.org/10.1109/BRAINS49436.2020.9223312.
[31] S. Ismail, D.W. Dawoud, T. Al-Zyoud, et al., Towards blockchain-based adaptive trust management in wireless sensor networks, in: 2023 IEEE International Conference on Electro Information Technology (eIT), IEEE, 2023, pp. 163–168, https://doi.org/10.1109/eIT57321.2023.10187278.
[32] X. Fu, H. Wang, P. Shi, A survey of blockchain consensus algorithms: mechanism, design and applications, Sci. China Inf. Sci. 64 (2021) 1–15, https://doi.org/10.1007/s11432-019-2790-1.
[33] I. Almomani, B. Al-Kasasbeh, M. Al-Akhras, WSN-DS: a dataset for intrusion detection systems in wireless sensor networks, J. Sens. 2016 (2016), https://doi.org/10.1155/2016/4731953.
[34] S. Ismail, Z. El Mrabet, H. Reza, An ensemble-based machine learning approach for cyber-attacks detection in wireless sensor networks, Appl. Sci. 13 (1) (2023), https://doi.org/10.3390/app13010030.
[35] G. Ke, Q. Meng, T. Finley, et al., LightGBM: a highly efficient gradient boosting decision tree, Adv. Neural Inf. Process. Syst. 30 (2017).
[36] C.-Y.J. Peng, K.L. Lee, G.M. Ingersoll, An introduction to logistic regression analysis and reporting, J. Educ. Res. 96 (1) (2002) 3–14, https://doi.org/10.1080/00220670209598786.
[37] J.D.M. Rennie, L. Shih, J. Teevan, et al., Tackling the poor assumptions of naive Bayes text classifiers, in: Proceedings of the Twentieth International Conference on International Conference on Machine Learning, Ser. ICML’03, AAAI Press, 2003, pp. 616–623.
[38] M. Thulasidas, Nearest Centroid: a bridge between statistics and machine learning, in: IEEE International Conference on Teaching, Assessment, and Learning for Engineering, IEEE, 2020, pp. 9–16, https://doi.org/10.1109/TALE48869.2020.9368396.
[39] R. Sikora, O.H. Al-laymoun, A modified stacking ensemble machine learning algorithm using genetic algorithms, J. Int. Technol. Inf. Manag. 23 (1) (2014), https://doi.org/10.58729/1941-6679.1061.
