Malicious Node Detection Using Machine Learning and Distributed Data Storage Using Blockchain in WSNs

The document presents a model for detecting malicious nodes in Wireless Sensor Networks (WSNs) using machine learning and blockchain technology. It employs a Histogram Gradient Boost classifier for node classification and utilizes the Interplanetary File System for secure data storage, achieving better performance compared to existing classifiers. The proposed model also implements Verifiable Byzantine Fault Tolerance for consensus, demonstrating improved efficiency in malicious node detection and data security.

Received 22 December 2022, accepted 8 January 2023, date of publication 16 January 2023, date of current version 20 January 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3236983

Malicious Node Detection Using Machine Learning and Distributed Data Storage Using Blockchain in WSNs
MUHAMMAD NOUMAN 1, UMAR QASIM 2, HINA NASIR 3,4,
ABDULLAH ALMASOUD 5 (Member, IEEE), MUHAMMAD IMRAN 6 (Member, IEEE),
AND NADEEM JAVAID 1 (Senior Member, IEEE)
1 Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
2 Department of Computer Science, University of Engineering and Technology Lahore (New Campus), Lahore 54000, Pakistan
3 School of Electronic & Electrical Engineering, Institute of Robotics, Autonomous Systems and Sensing, University of Leeds, LS2 9JT Leeds, U.K.
4 Department of Computer Science, Air University, Islamabad 44000, Pakistan
5 Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
6 Institute of Innovation, Science and Sustainability, Federation University, Brisbane, QLD 4000, Australia

Corresponding authors: Abdullah Almasoud ([email protected]) and Nadeem Javaid ([email protected])

This work was supported by the Deputyship for Research and Innovation, Ministry of Education, Saudi Arabia, under
Project IF-PSAU-2021/01/19156.

ABSTRACT In the proposed work, blockchain is implemented on the Base Stations (BSs) and Cluster
Heads (CHs) to register the nodes using their credentials and to tackle various security issues. Moreover,
a Machine Learning (ML) classifier, termed Histogram Gradient Boost (HGB), is employed on the BSs
to classify the nodes as malicious or legitimate. If a node is found to be malicious, its registration
is revoked from the network, whereas if a node is found to be legitimate, its data is stored in an
Interplanetary File System (IPFS). IPFS stores the data in the form of chunks and generates a hash for the
data, which is then stored in the blockchain. In addition, Verifiable Byzantine Fault Tolerance (VBFT) is used
instead of Proof of Work (PoW) to perform consensus and validate transactions. Extensive simulations
are performed using the Wireless Sensor Network (WSN) dataset, referred to as WSN-DS. The proposed model
is evaluated on both the original dataset and the balanced dataset. Furthermore, HGB is compared with other
existing classifiers, namely Adaptive Boost (AdaBoost), Gradient Boost (GB), Linear Discriminant Analysis (LDA),
Extreme Gradient Boost (XGB) and Ridge, using performance metrics such as accuracy, precision,
recall, micro-F1 score and macro-F1 score. The performance evaluation shows that HGB outperforms
GB, AdaBoost, LDA, XGB and Ridge by 2-4%, 8-10%, 12-14%, 3-5% and 14-16%, respectively. Moreover,
the results with the balanced dataset are better than those with the original dataset. Also, VBFT performs 20-30%
better than PoW. Overall, the proposed model performs efficiently in terms of malicious node detection and
secure data storage.
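
The off-chain storage idea summarized in the abstract (data split into chunks, a hash derived from the content, and only that hash recorded on the blockchain) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in: `OffChainStore` is an in-memory dictionary rather than an IPFS node, `ledger` is a plain list rather than a blockchain, the chunk size is arbitrary, and the identifier is a bare SHA-256 hex digest rather than the encoded content identifier that IPFS actually produces.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; deliberately tiny for illustration (assumption)


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_id(chunks: List[bytes]) -> str:
    """Hash every chunk, then hash the concatenated chunk digests."""
    chunk_digests = [hashlib.sha256(chunk).digest() for chunk in chunks]
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


class OffChainStore:
    """Hypothetical in-memory stand-in for a content-addressed store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, List[bytes]] = {}

    def put(self, data: bytes) -> str:
        chunks = split_into_chunks(data)
        cid = content_id(chunks)
        self._blobs[cid] = chunks
        return cid

    def get(self, cid: str) -> bytes:
        chunks = self._blobs[cid]
        # Recompute the identifier so that altered content is rejected.
        if content_id(chunks) != cid:
            raise ValueError("stored content does not match its identifier")
        return b"".join(chunks)


ledger: List[str] = []  # stand-in for the on-chain record of hashes

store = OffChainStore()
reading = b"node=17;temp=23.5"  # made-up sensor reading
cid = store.put(reading)        # bulk data stays off-chain
ledger.append(cid)              # only the fixed-size hash goes on-chain

assert store.get(ledger[0]) == reading
```

The point of the design is that the ledger entry has a fixed size regardless of how large the sensor payload is, while any change to the stored chunks changes the recomputed identifier and is therefore detectable.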

INDEX TERMS Blockchain, histogram gradient boost, IPFS, malicious node detection, VBFT, WSN.
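
Because the abstract reports both micro-F1 and macro-F1 scores on an original and a balanced dataset, a small worked example of how the two averages differ may be useful. The sketch below uses their standard definitions; the labels and predictions are invented for illustration and are not taken from WSN-DS.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Invented, imbalanced example: 8 legitimate nodes, 2 malicious nodes,
# with one malicious node misclassified as legitimate.
y_true = ["normal"] * 8 + ["malicious"] * 2
y_pred = ["normal"] * 8 + ["normal", "malicious"]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the micro average is dominated by the majority class, while the macro average is pulled down by the poorly detected minority class, which is one reason both are usually reported when classes are imbalanced.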

I. INTRODUCTION
A Wireless Sensor Network (WSN), comprising thousands of nodes, is widely used in several applications like supply chain management, military surveillance, environmental monitoring, etc., [1]. Sensor Nodes (SNs) are used to monitor and gather environmental data. Besides, in crowd sensing networks, SNs send massive amounts of the collected data to the nearby nodes and Cluster Heads (CHs). This process decreases the cost of different types of equipment and conventional methods for data collection. However, some nodes do not participate in crowd sensing networks due to privacy issues.

The associate editor coordinating the review of this manuscript and approving it for publication was Nitin Nitin.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
6106 VOLUME 11, 2023
M. Nouman et al.: Malicious Node Detection Using ML and Distributed Data Storage Using Blockchain in WSNs

Moreover, in the absence of a security mechanism, WSNs become vulnerable to malicious nodes that modify the data for their own interest. Furthermore, the SNs are resource constrained and do not utilize their resources efficiently. In addition, traditional methods are unable to detect malicious nodes. Whenever an attack is performed by a malicious node, the network is compromised, and malicious nodes perform malicious activities that affect the entire network. To prevent the nodes from acting maliciously, many authors propose authentication schemes that allow only the authentic nodes to join the network [2]. However, the existing authentication schemes depend upon centralized entities, which are vulnerable to cyber-attacks.

In WSNs, SNs are either randomly or statically deployed depending upon network topology. SNs gather environmental data and transfer it to their destination. However, some SNs do not store the location information because their topology changes frequently, and the usage of a large number of sensing nodes may cause network information congestion. To solve this issue, a WSN is split into sub-networks that CHs manage. CHs get data from SNs and send it to Base Stations (BSs) [3]. Moreover, SNs are resource constrained in terms of low storage and computational power. Also, SNs are prone to different types of attacks and are easily compromised by malicious nodes. Many researchers propose different techniques to avoid malicious attacks and detect malicious nodes [4], [5]. However, detection of the malicious node in WSNs depends on a third party, which can easily be compromised. Therefore, blockchain is introduced to overcome the problems associated with centralization and the involvement of third parties [6], [7], [8], [9], [10].

WSN nodes produce vast amounts of data and store them on a centralized system. However, security breaches and failures might destabilize the WSNs. Therefore, a Peer-to-Peer (P2P) network is proposed to overcome centralization issues related to data storage [11]. In a P2P network, nodes directly transfer the data from the source to the destination without the assistance of a third party. With the rapid expansion of WSN nodes, P2P architecture faces security and privacy challenges. Therefore, blockchain technology is introduced to address the security issues of WSNs through a distributed, decentralized, and immutable ledger [12]. Once data is added to the blockchain, it cannot be tampered with by any malicious party due to the distributed nature of the blockchain. Furthermore, the idea of integrating WSN and blockchain has attracted much attention from the public. However, blockchain consumes a lot of computational resources, whereas SNs have limited resources. Also, when incorporating the new blockchain design into the WSNs, some other issues may arise. Besides, the Proof of Work (PoW) consensus mechanism is widely used in blockchain, as it effectively reduces the number of malicious nodes and verifies the transactions. However, the PoW consensus mechanism requires a large amount of computational power to confirm a transaction and add it to the block in the blockchain [13].

Moreover, most of the researchers propose the Interplanetary File System (IPFS) for data storage, which was introduced by Juan Benet [14]. IPFS shares many of the same characteristics as blockchain. It is a P2P, decentralized, and distributed file storage system. Besides, IPFS nodes are the machines that execute the IPFS software to store and retrieve files from the IPFS network. IPFS nodes use content addressing to store and retrieve the files. All IPFS nodes store the files in the form of chunks, similar to a BitTorrent network. There is no effect on the network if one node fails. Furthermore, it uses two types of data structures to distribute the file. One is the Distributed Hash Table (DHT), and the second is the Merkle Directed Acyclic Graph (DAG). When nodes send a file to the IPFS for storage, the SHA256 algorithm is executed on the file, and the hash value for each stored file is generated. The hash value is called a content identifier, which is used to retrieve the stored files from the IPFS.

A. MOTIVATION AND CONTRIBUTIONS
The motivation of this work comes from the fact that in WSNs, nodes are randomly deployed. This random deployment leads to various issues like loss of data, security risk, etc. In WSNs, data is collected from the surrounding environment. WSNs are easily accessible, and any node can join them. As a result, malicious nodes enter the network and perform malicious activities that affect the entire network. The authors in [15] propose a centralized authentication mechanism that registers the nodes and protects confidential node identification from an unauthorized node in WSNs. However, the centralized system causes the issue of a single point of failure. Moreover, SNs have resource constraints and do not efficiently detect malicious behavior in the network. Also, malicious nodes can easily damage and compromise the WSNs [16]. Furthermore, malicious nodes collect false data and deliver it to destination nodes where blockchain is deployed to store the data [17]. However, storing huge volumes of data in a blockchain is very expensive. In addition, blockchain uses the PoW consensus mechanism for block generation, which consumes a huge amount of computational power [13]. Further motivation can be taken from [18]. The results are provided in Section IV in the manuscript.

The proposed model’s key contributions include the following:
• in a WSN, a blockchain based decentralized authentication mechanism is used to prevent disclosure of node identities to external nodes,
• for data storage in a WSN, IPFS is deployed and integrated with blockchain technology. The cost of storing data in the blockchain is minimized by storing the data on IPFS. The data is stored in chunks in IPFS, and the hashes that are created are recorded in the blockchain,
• the proposed blockchain based network uses the Verifiable Byzantine Fault Tolerance (VBFT) consensus mechanism [19], which reduces the blockchain
transaction cost and increases the throughput as compared to the existing consensus mechanisms like PoW, and
• the comparative analysis of the proposed classifier, i.e., Histogram Gradient Boost (HGB), with Adaptive Boost (AdaBoost), Gradient Boost (GB), Extreme Gradient Boost (XGB), Linear Discriminant Analysis (LDA), and ridge classifiers is performed. The analysis is done on the basis of numerous performance metrics, including accuracy, precision, recall, micro F1-score, and macro F1-score.

The remainder of the manuscript is organized as follows. The related work is presented in Section II, while the problem statement and proposed system model are given in Section III and Section IV, respectively. Section V presents the outcomes of the simulations performed to verify the accuracy of the proposed model. Section VI provides the feasibility of the proposed model. In Section VII, the conclusion of the manuscript is provided.

II. RELATED WORK
In WSNs, SNs share information and communicate with each other. WSNs are easily accessible, and any node can join them. Malicious nodes acquire legitimate node identities, which makes it easy for them to become part of the network. The authors propose a lightweight blockchain IoT authentication scheme in [20]. This scheme ensures integrity and non-repudiation in the network. Whenever IoT nodes communicate with one another, they must first authenticate each other, which is done using a lightweight blockchain. In [21], the authors develop a hybrid blockchain model for IoT nodes to prevent malicious or fake data packets from spreading throughout the network. Public and private blockchains make up a hybrid blockchain. Between CHs and BSs, the public blockchain is implemented, while the private blockchain is implemented between CHs and SNs. SNs are authenticated on CHs using a smart contract, and CHs are authenticated on BSs. In [22], a blockchain and reinforcement learning based model is proposed for efficient and secure routing in WSNs. The reinforcement learning algorithm selects the best possible routing path. It avoids the malicious routing links that might send data through compromised nodes, while blockchain is used for node authentication and managing all routing information. In [23], blockchain based key management is presented to tackle the issue of certificate-less key management. The blockchain performs node authentication, registration, and joining or quitting of nodes. In addition, it provides the mechanism for the detection of compromised nodes. In [24], a data structure based on blockchain is used to hold nodes’ authentication and trust information. Blockchain authentication consists of three aspects: public keys, block mining, and mutual influence, while the blockchain trust model consists of two aspects: knowledge based trust and trust evaluation. In [25], blockchain is used to overcome IoT issues. IoT devices register themselves on the blockchain. If the IoT device is successfully registered and authenticated, the activity is performed according to its capability. Similarly, users need to be authenticated in the blockchain network to be able to control and manage IoT devices. It restricts the malicious nodes from becoming a part of the network and stores all evidence on a blockchain. In [26], the modified version of the station-to-station (STS) protocol is presented. It first authenticates the user and then establishes a secret exchange session key that ensures user anonymity inside a group.

In [29], a blockchain based data structure model is used for malicious node detection. WSN nodes have limited memory and computational power, and are unable to detect malicious nodes. Whenever an attack is performed on a node, it is compromised by a malicious node. In [30], the authors propose a three-layered SDN architecture that monitors and analyzes the traffic in the IoT environment. Another pertinent point is that a blockchain is used for decentralized attack detection. As a result, fog computing and mobile edge computing provide attack detection, reducing the number of attacks that occur at the edge layer. In [31], a secure and privacy-preserving model is proposed for the smart city. Three modules make up the proposed model. The first module is trustworthiness, where the authors use the blockchain among the IoT devices to maintain trust. The second module is two-level privacy, where enhanced PoW is used in blockchain to achieve confidentiality and prevent the poisoning attack. The third module is the intrusion detection system, which is used for malicious node detection. The XGB classifier is utilized in the process of identifying malicious nodes. In [32], the authors propose a secure privacy-preserving framework. The presented model has two major components: two-level privacy and an intrusion detection mechanism. Blockchain is utilized in two-level privacy to securely transmit data among IoT nodes. The two-level privacy uses principal component analysis (PCA) to transform data into a new form to protect it against inference attacks. The authors use gradient boost anomaly detection (GBAD) for the intrusion detection system based on the light gradient boost model (LGBM). GBAD is deployed in a smart city and can proficiently classify normal and malicious observations. In [33], a blockchain based automatic ML (AutoML) model is proposed for customer services to overcome the third parties’ challenges. IoT devices are used to collect data, and blockchain is used for secure data exchange in an open environment. Furthermore, AutoML is designed to process data and reduce expert costs. In [34], the authors propose an ensemble learning technique that uses multiple ML techniques to classify data. The final classification report is obtained based on all classifiers’ votes.

In [35], secure routing with multi-layered IoT architecture is proposed, where light blockchain and cloud are used. Light blockchain is used for security and privacy, while the cloud is used for data storage. In [36], two different kinds of blockchain are used in a WSN: one for storing data and the other for managing how users can access data. A verifiable