RL IoT

The document presents a reinforcement learning (RL)-based routing approach for cognitive radio-enabled Internet of Things (CR-IoT) communications aimed at improving data rates and minimizing routing delays. The proposed RL-IoT model integrates channel selection with routing decisions to enhance throughput and reduce packet collisions, outperforming existing routing mechanisms in simulations. The study highlights the challenges of dynamic spectrum allocation and the need for efficient routing protocols in CR-IoT environments.


IEEE INTERNET OF THINGS JOURNAL, VOL. 10, NO. 2, 15 JANUARY 2023

RL-IoT: Reinforcement Learning-Based Routing Approach for Cognitive Radio-Enabled IoT Communications
Tauqeer Safdar Malik, Kaleem Razzaq Malik, Ayesha Afzal, Muhammad Ibrar, Lei Wang, Member, IEEE, Houbing Song, Senior Member, IEEE, and Nadir Shah

Abstract—Internet of Things (IoT) devices are widely used in various smart applications and are being equipped with cognitive radio (CR) capabilities for dynamic spectrum allocation. Our objectives in this work are to achieve higher data rates and minimize end-to-end routing delays in CR-enabled IoT communication in order to maximize throughput. We propose a reinforcement learning (RL)-based routing approach in the cognitive radio network (CRN)-based IoT environment. The idea is to add the channel selection decision capability to the network layer in order to minimize packet collisions as well as end-to-end delay (EED). We perform a comprehensive performance evaluation of the proposed RL-IoT routing mechanism by simulating the cognitive radio-enabled Internet of Things (CR-IoT) communication environment in the cognitive radio cognitive network (CRCN) simulator and comparing the network performance achieved by our proposed mechanism with that of the recent AODV-based routing mechanism for IoT (AODV-IoT), ELD-CRN, and SpEED-IoT routing approaches. Our evaluation results show that the RL-IoT model performs better than existing approaches in terms of average data rate, throughput, packet collision, and EED.

Index Terms—Channel selection, cognitive radio (CR)-based Internet of Things (IoT), cognitive radio networks (CRNs), dynamic spectrum access (DSA), IoT, reinforcement learning (RL).

Manuscript received 3 November 2021; revised 4 June 2022; accepted 21 September 2022. Date of publication 29 September 2022; date of current version 6 January 2023. The work of Lei Wang was supported in part by the National Nature Science Foundation of China under Grant 62027826 and Grant 61902052; in part by the Science and Technology Major Industrial Project of Liaoning Province under Grant 2020JH1/10100013; in part by the Dalian Science and Technology Innovation Fund under Grant 2019J11CY004 and Grant 2020JJ26GX037; and in part by the Fundamental Research Funds for the Central Universities under Grant DUT20ZD210 and Grant DUT20TD107. (Corresponding authors: Kaleem Razzaq Malik; Lei Wang.)
Tauqeer Safdar Malik, Kaleem Razzaq Malik, and Ayesha Afzal are with the Department of Computer Science, Air University, Multan 60000, Pakistan (e-mail: [email protected]; [email protected]; [email protected]).
Muhammad Ibrar and Lei Wang are with the School of Software, Dalian University of Science and Technology, Dalian 116024, China (e-mail: [email protected]; [email protected]).
Houbing Song is with the Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD 21250 USA (e-mail: [email protected]).
Nadir Shah is with the Department of Computer Science, COMSATS University Islamabad (Wah Campus), Rawalpindi 47010, Pakistan (e-mail: [email protected]).
Digital Object Identifier 10.1109/JIOT.2022.3210703
2327-4662 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on December 20,2023 at 09:08:18 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

WITH the advances in information and communication technologies, the Internet of Things (IoT) has emerged as one of the key enabling paradigms for smart applications in healthcare, smart cities, and smart homes [1]. IoT is a network of physical devices having embedded sensors and software to enable device-to-device communication [2]. The continuous evolution of IoT in fields such as machine learning, embedded systems, wireless networks, sensor networks, automation, and others brings forth several concerns related to the Quality-of-Service (QoS), security, and privacy. Over 30 billion networked devices are expected by the year 2023 as compared to 2 billion devices in the year 2008 [3]. While this is a remarkable increase, it also poses a critical problem of frequency spectrum scarcity. A relatively new but widely recognized and adopted alternative approach known as cognitive radio network (CRN) has proved to be helpful in realizing intelligent next-generation networks for improved spectrum allocation and utilization for the IoT devices [4]. The main idea behind cognitive radio-enabled Internet of Things (CR-IoT) devices is to exploit the spectrum holes that are not currently in use by the primary devices and dynamically perform spectrum allocation for the secondary cognitive IoT devices in order for them to opportunistically avail these free holes for their transmission [3], [4].

The dynamic spectrum access (DSA) environment for CR-IoT is depicted in Fig. 1 where licensed primary user devices called PUDs (e.g., 5G user devices) as well as various unlicensed secondary user devices called SUDs (cognitive nodes, e.g., IoT devices equipped with CR capabilities) co-exist [5]. PUDs in the CR-IoT environment are authorized to use licensed bands and are connected with each other through one or more base stations forming the primary network. SUDs include the unlicensed IoT devices which follow an overlay spectrum sharing model, i.e., using their cognitive abilities, they can sense and opportunistically access free channels which are not utilized by the PUDs at a given instant of time [6], [7]. In an infrastructure-less secondary network, SUDs communicate directly with each other without a central base station and employ cooperation schemes to share channel information for spectrum access purposes.

In the environment under consideration, the dynamic network topology of the CR-IoT, and frequent changes in
spectrum availability is the key challenge that affects the connectivity and overall transmission rate and throughput of SUDs [2]. An SUD can only utilize vacant channels currently not utilized by any PUD for its transmission and always respects the priority of PUDs over channel utilization. Whenever a PUD returns to its channel to continue its transmissions, the SUD has to vacate that channel and select another available channel in the spectrum. The channel selection decisions involve spectrum sensing and selection, which are made at the MAC layer in the current CR-IoT network architecture [8]. In contrast, the routing decisions for the SUD's end-to-end transmission are made in the network layer. Hence, if a channel switching event occurs during the transmission of an SUD, then the switching delay for the channel selection decision can cause degradation in the SUD's QoS. While higher average data rate and throughput are essential to meet the QoS requirements in CR-IoT communication, channel switching events are very frequent in the CR-IoT environment due to dynamic spectrum allocation and drastically affect the average data rate and throughput in end-to-end routing. In this work, our main contributions are listed as follows.
1) We propose a novel reinforcement learning (RL)-based routing approach for CR-IoT communications that integrates the channel selection decisions with routing decisions in the network layer for improved average data rate and throughput.
2) We perform a comprehensive performance evaluation of the proposed routing mechanism by simulating the CR-IoT communication environment in the cognitive radio cognitive network (CRCN) simulator and compare the network performance achieved by our proposed mechanism with that of the recent AODV-based routing mechanism for IoT (AODV-IoT) [9], ELD-CRN routing [10], and SpEED-IoT routing [11] approaches.

The remainder of this article is organized as follows. In Section II, we present a review of related approaches. Section III provides our detailed approach for RL-based routing for CR-IoT. Section IV provides the performance evaluation of our proposed approach in comparison with existing approaches. Section V finally concludes this article and provides future work directions.

Fig. 1. DSA environment for CR-IoT.

II. LITERATURE REVIEW

The use of standard routing protocols in CRNs without taking into account the issues inherent in a dynamic spectrum environment may result in increased chances of packet loss and decline in data rate and throughput [12]. As spectrum availability restricts the CRN environment, route maintenance can help resolve issues related to best route selection, repeated route-induced delays, cross-layer communication/coordination, deafness in control messaging, etc. [13]. Furthermore, it is important to address the issues inherent in wireless radio networks, including radio channel shortages, interference issues, limited battery life of devices, device mobility, and issues related to channel characteristics, such as the signal-to-noise ratio and error rates [14]. The CR-enabled IoT environment requires addressing additional challenges as IoT applications typically have higher QoS requirements. This requires developing routing protocols for the CR-IoT environment addressing issues related to throughput, delay, packet loss, and data rate which arise due to device and spectrum mobility, and uncertainty of PUD activity [13], [15]. Below we discuss some of the related works on routing approaches in such an environment.

Zhou et al. [16] proposed the JRCA method for routing and channel assignment with the objective of minimizing delay in multichannel multiflow CRNs. JRCA relies on delay prediction for transmission and media access for detecting channel interference among different PUDs and SUDs. SUD channels are determined and changed based on the mobility and appearance patterns of PUDs. JRCA identifies routes with minimum delay and performs channel assignment.

Mansoor et al. [17] proposed the RARE protocol for CRNs for the MAC layer. RARE partitions the network into clusters comprising free common channels to support smooth transitioning of devices across channels. RARE considers delay as a routing metric and relies on cross-layer information exchange between the network and MAC layers for selection of stable paths while ensuring efficient data delivery from source to destination devices.

Akter and Mansoor [18] proposed a reactive routing protocol for CR-enabled vehicular ad hoc networks. The protocol selects a stable transmission path using a Next-Hop Determination Factor that takes into account the dynamic behavior of the network, i.e., mobility of the vehicles and changes in spectrum availability. The protocol is spectrum-aware in the sense that it also considers the number of shared channels and their quality.

Kannan and Jeetha [19] proposed the neighbor node discovery routing scheme which calculates the various types of delays, such as channel switching and MAC layer back-off delays. This research assumes the sensing operation at the MAC layer for searching a new available channel and PUD activities. It is devised on top of ad hoc on-demand distance vector (AODV) routing protocol-based routing mechanism.

TABLE I
COMPARISON OF RELATED WORKS USING LEARNING-BASED METHODS

Route request (RREQ) and route reply (RREP) messages are
used without any modifications. In the case of channel switching due to the unexpected arrival of a PUD, it needs to update the routing table with the help of the MAC layer sensing operation, which limits the data rate as per application requirement. In another work, Wang et al. [20] used routing metrics,
such as channel switching delay and the probability of chan-
nel availability on the basis of exponential distribution of
ON/OFF duration to increase data rate and minimize end-to-
end delay (EED) by fixing the static PUD’s activities. Hence,
the probability of channel availability is calculated without
user interference which can increase packet loss due to the
unexpected arrival of PUD. Iqbal et al. [21] suggested a resource allocation mechanism for critical links in IoT communication; however, the issue of routing, especially for users with different parameters, is not considered.
Majeed et al. [22] presented an energy-aware deployment
for IoT-enabled cellular networks. Qureshi and Aldeen [23]
highlighted in their survey some new challenges for various
applications based on IoT communications for future genera-
tion networks. The authors identified the proper QoS solutions
for varying routing parameters especially for environments
evolving w.r.t. channel and network topology.
Learning-based approaches have drawn significant attention from researchers to address a variety of issues in communication networks [24], [25], [26], [27], [28]. Mao et al. [24] presented a solution for routing in software-defined networks on the basis of convolutional neural networks (CNNs) for the periodical learning process of network experiences. The proposed approach incurs high costs in terms of computation and storage and hence is not feasible for IoT devices. Another routing
solution employing multiagent deep reinforcement learning
(MADRL) and real-time Markov decision process (RTMDP)
is proposed in [29] to control the network congestion and
network resources for mobile networks. This proposed routing
protocol is designed for mobile networks in which the par-
ticipating nodes do not communicate with each other. It does
not apply to environments with frequently evolving topologies
such as CRN-based environment due to the retraining require-
ment of MADRL. In an earlier work, Mao et al. [30] discussed
the concerns in nonsupervised deep learning-based solution
for software-defined networks, such as network resource allo-
cation, centralized routing, and traffic control on different
layers.

Yang et al. [31] proposed a routing protocol employing global optimization to address QoS issues in CR-enabled advanced metering infrastructure networks. Specifically, Yang et al. employed the ant colony optimization algorithm for optimal routing. The proposed algorithm supports PUD protection while satisfying utility needs of the SUDs. Du et al. [25] addressed the EED and power efficiency problems jointly in CRNs by proposing a cross-layer routing protocol using quasi-cooperative multiagent learning.

In CR-IoT routing, the time-varying availability of the channel due to the unexpected arrival of a PUD decreases the average data rate because of increased packet collisions and hence results in a decrease in overall throughput during routing. The unexpected arrival of the PUD causes contention on its channels between SUDs seeking to utilize the available channel. This problem becomes worse when one route between two SUDs can be affected by different primary users (PUDs). Some solutions have been presented to manage the various resources of CR-IoT communication, as shown in Table I, but there remain issues, especially in routing and security, where machine learning techniques can be used for further improvement. Hence, the channel selection decisions need to be included in the routing parameters to decide the best end-to-end route during the whole transmission of IoT communication. These decisions are directly related to quantitative QoS parameters such as packet collision, which must be minimized during the formation of the end-to-end routing for CRN-based IoT. Therefore, in the CRN-based IoT the routing needs to search the available
channels which can be selected for its transmission during end-to-end routing to minimize packet collision and maximize the average data rate as well as the overall throughput. In the real world, efficient spectrum utilization through the use of DSA is very important in formulating the throughput of the economic network model, especially in the case of IoT communication. Currently, allowing the IEEE 802.22 standard to utilize DSA for cellular networks and TV broadcast networks without degrading the QoS of those networks has made it challenging to improve the CR-IoT routing performance. Therefore, an RL-based routing protocol is very important for making better routing decisions that raise the average data rate and improve the overall throughput of the CR-IoT network.

III. METHODOLOGY

Fig. 2 shows our proposed RL-based routing approach for CR-IoT devices. The proposed approach is built on the existing CR-enabled protocol stack with channel selection decision capability added to the network layer for end-to-end routing decisions. We employ RL-based approaches to minimize channel switching for SUDs (cognitive-enabled IoT devices) caused by unexpected arrival of PUDs.

Fig. 2. Proposed RL-based routing mechanism for CR-IoT.

Essentially, our efficient routing approach is supported by including the channel state information (channel ID, channel transmission rate, next hop, and all available routes from source to destination for each link) in the routing table. The routing table contains the multiple path options over the available channels for the entire route. The choice of multiple paths and channels is maintained in the routing table. If a PUD arrives in its channel, the SUD has to switch from the current channel to another free channel based on the prerecorded history of routing choices maintained by the learning module.

The network layer of each SUD is equipped with a learning agent. The learning and decision blocks in the learning agent are the core components in our proposed routing mechanism that are responsible for efficiently managing the unexpected arrival of a PUD on a channel occupied by the SUD. Essentially, the learning block observes the user activities on different channels and maintains a history of channel selection in prior routing decisions. The learned knowledge is then utilized by the decision block for the channel selection decision when a new RREQ is received or there is a need to alter an existing route. The decision to switch to any available channel or to change an existing route cannot be taken just by consulting the regular routing table information, which does not include any channel history during an SUD's end-to-end transmission. The QoS parameters of a channel, such as error control information, prior channel history, etc., along with the traffic type of the application, are considered in the channel selection decision with the objective of improving the overall QoS in terms of average data rate and the throughput of end-to-end transmission. The algorithms for route and channel selection in RREQ and RREP during routing in the network layer are shown as Algorithms 1 and 2. We present a cross-layer approach in which channel selection is incorporated as a routing parameter at the time of routing in the network layer. The proposed routing mechanism does not merely offer channel selection using the RL technique; rather, it incorporates channel selection into the routing decisions through the modified RREQ and RREP messages presented in Algorithms 1 and 2. The channel information should be exchanged through RREQ, RREP, and route error (RERR) packets among SUDs as well as PUDs in the presence of DSA and spectrum mobility. The cooperative approach at the MAC layer is used to pass the spectrum-sensing information to the network layer to detect PUDs' activities. Moreover, SUDs should be able to accomplish channel selection from the spectrum mobility provided by the CR environment without causing excessive overhead for route formation. One advantage of this strategy is the handling of routing loops through the route maintenance process for handling PUDs' activities. Route maintenance due to user mobility and wireless propagation is handled with the traditional RERR message. The overall routing flow diagram elaborating the routing process is presented in Fig. 3.

Fig. 3. Flow diagram covering routing framework for Algorithms 1 and 2.
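The routing-table extension and the learning and decision blocks described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, field names, and the simple "count of uninterrupted uses" history are assumptions made for illustration only, standing in for the RL-based learning block.

```python
from dataclasses import dataclass, field


@dataclass
class RouteEntry:
    """One routing-table row carrying channel state (field names are illustrative)."""
    destination: str
    next_hop: str
    channel_id: int
    tx_rate: float  # channel transmission rate, units left unspecified


@dataclass
class LearningAgent:
    """Hypothetical learning and decision blocks at the network layer of an SUD."""
    # Learning block: per-channel count of past uses that no PUD interrupted.
    history: dict = field(default_factory=dict)

    def record(self, channel_id, uninterrupted):
        # Store the outcome of using a channel in the channel history.
        self.history.setdefault(channel_id, 0)
        if uninterrupted:
            self.history[channel_id] += 1

    def choose_channel(self, free_channels):
        # Decision block: prefer the free channel with the best recorded history.
        if not free_channels:
            return None
        return max(free_channels, key=lambda ch: self.history.get(ch, 0))


def on_pud_arrival(entry, agent, free_channels):
    """Move a route off a channel reclaimed by a PUD; return False if none is free."""
    candidates = [ch for ch in free_channels if ch != entry.channel_id]
    new_channel = agent.choose_channel(candidates)
    if new_channel is None:
        return False
    entry.channel_id = new_channel
    return True
```

In this sketch the switch decision uses channel history rather than the plain routing table alone, which mirrors the point made above that the regular routing table carries no channel history.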

Algorithm 1: Route and Channel Selection in RREQ
Function Route_Channel_Selection_RREQ()
    Received RREQ by CRN-IoT node N from CRN-IoT node M through Channel = [i]
    if Channel i is free from PUD then
        Channel selection using exploitation and exploration learning
        if first RREQ for node N then
            create a route using channel i and broadcast RREQ to SUDs for free channels from PUD
        else
            if extra RREQ but on different channel then
                create a route from that channel
            else
                if new RREQ then
                    update a route through channel i
    else
        # channel i is not free from PUD
        Route selection through best available channel using reinforcement learning (LAC)
        if node N receives multiple routes from channel i then
            if N == destination and first hop node of RREQ != stored first hop node in routing table and Y != next hop node in routing table and HOP_RREQmin HOP then
                create a route from channel i
            else
                discard the RREQ
    if N has a route for destination then
        send RREQ to M
    else
        discards the RREQ

Algorithm 2: Route and Channel Selection in RREP
Function Rout_Channel_Selection_RREP()
    CRN-IoT node N receives RREP from node M through channel i
    if Channel i is free from PU then
        Add channel in available channel list in through exploitation and exploration learning
        if first RREP for N then
            create a route using channel i and send RREP to SUDs those exist in that route
        else
            if the additional RREP from M but on different channel then
                create a route and send RREP from that channel
            else
                if it is the new RREP then
                    update route from channel i
    else
        # channel i is not free from PUD
        Multiple routes are selected from the list of available channels
        if N receives RREP from multiple routes then
            if N == source node and first hop node of RREP != stored first hop node in routing table and M != next hop node in routing table and HOP_RREPmin HOP then
                create a route from channel i
            N discards the RREP

A. Preliminaries and Mathematical Notation

This section presents the formal modeling used to observe the effectiveness of RL algorithms for channel selection during routing. We adopt the mathematical model of a noncooperative game, which avoids centralized channel management [33]. The noncooperative game is defined as $\tau = \{T, \{S_i\}_{i \in T}\}$ where $T$ is the set of SUDs, and $S_i = \{s_1, s_2, \ldots, s_C\}$ is the set of strategies for user $i \in T$ with $C$ vacant channels. The SUDs coexist with PUDs over the same network and can access only one single vacant channel from a PUD at a time. This work focuses on the channel transmission rate ($R_{tr}$) parameter for the channel selection purpose in the SUD. According to the noncooperative game rule, every user $i$ selects the strategy for transmission on the basis of a utility function $U_i : S_i \to T$ against its opponents $S_{-i}$. Every user participating in this game can select and update its strategy profile at any point of time as $S = [s_1, s_2, \ldots, s_T]$ but must follow the rule of the game. The strategy profile is created for every SUD in the selection of its channel for the transmission so that it can be saved in the learning block. The successful strategy of any user is selected by searching the Nash equilibrium point (NEP) for its transmission. It happens only if the following (1) is satisfied [33]:

$$U_i(S) \ge U_i(s_a, s_{-a}) \quad \forall i \in T,\ s_a \in S_i \tag{1}$$

where $s_a$ is the strategy of user $i$ for action $a$ and $s_{-a}$ is the strategy of user $i$ according to the action of its opponent. Once the strategy ($S$) of every user (SUD) is chosen based on (1), then no user can benefit by changing their strategies while the other players keep theirs unchanged. As previously mentioned, three RL algorithms are applied for selecting the best available channel for transmission such that the packet collision can be minimized in case of unexpected arrival of PUDs. These RL algorithms include No-External Regret learning, Q-learning, and Learning Automata. The following sections describe these learning algorithms with respect to the channel selection decision.

B. No-External Regret Learning-Based Channel Selection

No-External Regret Learning means when the channel selection decision for routing is updated by the exploitation of

Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on December 20,2023 at 09:08:18 UTC from IEEE Xplore. Restrictions apply.
MALIK et al.: RL-IoT: RL-BASED ROUTING APPROACH FOR COGNITIVE RADIO-ENABLED IoT COMMUNICATIONS 1841

previous channel selection decisions. In this learning tech- converge to the NEP. Secondary users (SUDs) maintain a Q-
nique, every SUD can reserve/save the channel availability table, and the values of the table are updated using (3) based
information for a specific time period and calculate future on the actions selected for rewards or punishments [35]. In the
selection of channel using (1) based on past channel utiliza- current study, the action for channel selection is based on the
tion [34]. Regrets are observed in this learning from past -greedy exploration. This mechanism selects a random action
channel utilization experiences of bad channel selection for with probability  and the best action on the basis of the high-
routing. Therefore, this technique is used to minimize the est Q-value at the moment with probability 1 − . This can be
channel switching regrets and to reduce routing table size for seen as defining a probability vector over the action set of the
any channel selection during routing [34] agent for each state. For example, if x = {x1 , x2 , . . . , xj } is the
set of actions for one of these vectors, then the probability xi
(1 + α)Ui (sa )
t
of playing action i is given by [36]
pi t+1
(sa ) = (2)
s´a ∈Si (1 + α)Ui (s´a ) 
t

(1 − ) + No. of actions in Set , if Q of i is the highest
xi = 
t j t  No. of actions in set , otherwise.
where in (2), Uit (Sa ) = j=1 Ui (Sa ) and Ui (S a ) =
t j  (4)
j=1 Ui (S a ) are used for the complete time span of t;
(t+1)
pi (sa )) is used for the probability allocated to strategy sa The Q-learning algorithm chooses a channel on the basis
at time period t + 1, while α > 0. Here, α indicates the learn- of -greedy exploration which has a maximum value of Q in
ing rate and determines to what extent the newly acquired the Q-table maintained through Q-learning. Users can start
information will override the old information. In practice, the exploration using very a low value of Q and updated
mostly a constant learning rate is used as α = 0.1. after each successful packet transmission using (4) [36]. The
In the case of the selected channel ID is greater than the Q-learning model for the implementation of the multiagent is
available vacant channels, the probability of selecting that shown below where two agents are composed with two actions
channel is calculated using (2) and updated the routing table. each and within a single state [36]
Therefore, it is categorized as exploitation learning through  
No-External Regret Learning. In the case of unavailability of a a12 b11 b12
A = 11 B=
available channels from the existing channel list, the explo- a21 a22 b21 b22
ration learning through Q-learning is used which is discussed
in the next section. where A shows the rewards for the first agent and B shows
the rewards for the second agent. For this multiagent model,
the Q-learning update rule can be simplified as follows [36]:
C. Q-Learning-Based Channel Selection

Q-learning is a popular exploration learning algorithm based on the model-free value-iteration technique, with a computational requirement that empowers the SUDs to learn the mapping of environment states into actions for the maximum numerical reward. It is formally described by the tuple (S, A, T, R), where S denotes a discrete set of environment states; A denotes a set of actions; T denotes a state transition function of ON and OFF as S → [0, 1]; and R is a reward function S → ℝ. The user gets a reward through the learning agent from the environment, which indicates its state s, and selects an action a for channel selection in the case of routing decisions. Once the action is performed, it changes the state of the environment and generates a reinforcement signal r. The quality of the decision depends on this signal, which is used to maintain the corresponding Q(s, a) rewards. The Q-rewards/values are updated as follows [35]:

$$Q(s,a) = Q(s,a) + \alpha\left[r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \tag{3}$$

where 0 < α < 1 is the learning rate and 0 < γ < 1 is the discount rate.

The Q(s, a) values are estimations of the Q*(s, a) values, which represent the sum of the immediate reward obtained by taking action a at state s and the total expected future rewards on the basis of its previous success or failure of channel selection. By updating Q(s, a) values, the agent eventually makes it

$$Q_{ai} = Q_{ai} + \alpha\left[r_{ai} - Q_{ai}\right] \tag{5}$$

where $Q_{ai}$ represents the Q-value of agent a for action i, $r_{ai}$ is the reward that agent a receives for executing action i, and α is the learning rate.

The Q-value of the user $s_i$ for an action a is initialized to 0 so that exploration searches for the channel with the maximum reward. The average reward value is calculated for every channel, and the channel reward of every user is compared against that of its opponents $s_{bj}$ for an action b at times t and t − 1. The channel reward is calculated through the RL algorithms and compared against its opponents through ε-greedy exploration so that the selected channel does not belong to the same spectrum. The reward value for the action is assigned and updated through the Learning Automata, which is discussed in the next section.

D. Learning Automata-Based Channel Selection

In this algorithm, users select a channel a ∈ C based on the reward value of an action saved in the learning block. The action probability table is updated as in [37]

$$q_{t+1}(s,a) = \begin{cases} q_t(s,a) + \alpha\,\tilde{U}(s_a,s_b)\,[1 - q_t(s,a)], & \text{where } a = b \\ q_t(s,a) - \alpha\,\tilde{U}(s_a,s_b)\,q_t(s,a), & \text{where } a \neq b \end{cases} \tag{6}$$
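The update rules in (5) and (6) can be illustrated with a short sketch. It is not the authors' implementation: the function names, the reading of the case a = b as "channel a is the one just chosen", the utility already normalized into [0, 1], and the fixed per-channel rewards are all assumptions made for illustration.

```python
import random


def q_update(q, reward, alpha):
    """Stateless Q-value update in the form of (5): Q <- Q + alpha * (r - Q)."""
    return q + alpha * (reward - q)


def epsilon_greedy(q_values, epsilon, rng):
    """Explore a random channel with probability epsilon, else exploit the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda c: q_values[c])


def automata_update(probs, chosen, utility, alpha):
    """Action-probability update in the form of (6).

    Assumption: the chosen channel takes the "a = b" branch and every other
    channel takes the "a != b" branch; utility is already scaled into [0, 1].
    """
    return [
        p + alpha * utility * (1.0 - p) if c == chosen else p - alpha * utility * p
        for c, p in enumerate(probs)
    ]


def run_demo(steps=200, alpha=0.1, epsilon=0.1, seed=0):
    """Toy run with three channels and hypothetical fixed rewards."""
    rng = random.Random(seed)
    rewards = [0.2, 0.8, 0.5]          # hypothetical normalized reward per channel
    q_values = [0.0] * len(rewards)    # Q-values start at 0, as in the text
    probs = [1.0 / len(rewards)] * len(rewards)
    for _ in range(steps):
        channel = epsilon_greedy(q_values, epsilon, rng)
        reward = rewards[channel]
        q_values[channel] = q_update(q_values[channel], reward, alpha)
        probs = automata_update(probs, channel, reward, alpha)
    return q_values, probs


if __name__ == "__main__":
    q_values, probs = run_demo()
    print("Q-values:", [round(q, 3) for q in q_values])
    print("action probabilities:", [round(p, 3) for p in probs])
```

With this reading of (6), the probabilities keep summing to 1 after every update, because the mass added to the chosen channel equals the mass removed from the others.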


In (6), $q_{t+1}(s,a)$ and $q_t(s,a)$ represent the action probability of the user for state s at time (t + 1) and at time t, respectively, for choosing an action a from the available state s using the normalized utility function as follows [37]:

$$\tilde{U}(s_a,s_b) = \frac{U(s_a,s_b)}{\max_{a\in C} E(U(s_a,s_b))} \tag{7}$$

where $s_a$ and $s_b$ represent the available states of actions a and b for two users, and $\max_{a\in C} E(U(s_a,s_b))$ indicates the maximum average reward of agent a depending on the probability of agent b achieving the action using the learning mechanism.

E. Convergence Point

This section proves the convergence point at which all of the learning techniques reach a balance. The NEP can be defined through the function shown in the following [38]:

$$P = \sum_{i=1}^{C}\sum_{j=1}^{\theta\{C_i\}} U_j(\theta\{C_i\}) \tag{8}$$

where $\theta\{C_i\}$ denotes the cardinality of channel $C_i$, that is, the number of SUDs in a channel i ∈ C. The SUD can make a channel switch if (9) holds, as defined by [37]

$$\Delta P = \Delta U_i[\theta\{C_i\}] = U_i[\theta\{C_{j+1}\}] - U[\theta\{C_k\}] \quad \text{if } U_i[\theta\{C_{j+1}\}] > U[\theta\{C_k\}] \tag{9}$$

$$\begin{aligned}
\Delta P &= \sum_{i=1}^{\theta\{C_{j+1}\}} U_i[\theta\{C_{j+1}\}] + \sum_{i=1}^{\theta\{C_{k+1}\}} U_i[\theta\{C_{k+1}\}] \\
&\quad - \left( \sum_{i=j}^{\theta\{C_j\}} U_i[\theta\{C_j\}] + \sum_{i=1}^{\theta\{C_k\}} U_i[\theta\{C_k\}] \right) \\
&= U_i[\theta\{C_{j+1}\}] - U[\theta\{C_k\}] \\
&= \Delta U_i[\theta\{C_i\}].
\end{aligned} \tag{10}$$

Equation (10) shows that P is an exact potential function.

Now, for No-External Regret learning, ΔU = 0 happens at the NEP and can be written as follows [37]:

$$\Delta P = \begin{cases} U_i^{t}(s_a) = U_i^{(t+1)}(s_a) \\ U_i^{(t+1)}(s_a) = U_i^{t}(s_a). \end{cases} \tag{11}$$

Equations (10) and (11) show that only once the network achieves the NEP will users have link stability and further channel switching events reduce, in the case of No-External Regret learning, as follows [38]:

$$p_i^{(t+1)}(s_a) = p_i^{(t+2)}(s_a). \tag{12}$$

Also, for Learning Automata, (6) and (11) indicate that after the NEP is achieved, no further channel switching happens as [37]

$$q_{t+1}(s,a) = q_t(s,a). \tag{13}$$

Now, for Q-learning from (5), the differential equation of Q-values from the Q-table is [37]

$$\frac{dQ}{dt} = Q_{t+1}(s,a) - Q_t(s,a) = \alpha\,[E(U(s_a,s_b)) - Q_t(s,a)]. \tag{14}$$

Integrating this differential equation gives the Q-values at time t as [37]

$$Q_t = K e^{-\alpha t} + E[U(s_a,s_b)] \tag{15}$$

where K is the integration constant, and the Q-values at time t → ∞ can be written as follows [37]:

$$\lim_{t\to\infty} Q_t = E[U(s_a,s_b)]. \tag{16}$$

Equations (14)–(16) show that all three learning algorithms converge at time t for any user i ∈ T and can be written as follows [37]:

$$U_i^{t} = \sum_{b=a,\,b\neq a}^{C} U_i^{t}(s_a,s_b)\,p^{t}(s_a) + U_i^{t}(s_b,s_a)\,p^{t}(s_b). \tag{17}$$

Hence, all SUDs can converge to a pure NEP after the convergence of the learning event. In spectrum mobility, multiple SUD pairs can make agreements simultaneously on different channels.

IV. RESULTS AND DISCUSSION

The proposed routing is executed with the help of the CRCN simulator, an add-on of the network simulator (NS-2) [39]. We have compared the network performance achieved by our proposed RL-based routing for CRN-based IoT communications with the recent AODV-IoT [9], ELD-CRN [10], and SpEED-IoT [11] routing protocols. SpEED-IoT routing essentially selects a route that ensures the connectivity and reachability of IoT devices with data rate optimization of the assigned routes in a mesh network-based IoT network. This means that the route encounters only one type of user, without any effect of the unexpected arrival of PUDs, and, therefore, the data rate is optimized on the basis of multichannel routing for device-to-device communication in an IoT mesh network. The results are also compared with the AODV-IoT and ELD-CRN routing protocols, which are designed for CR-IoT environments. ELD-CRN is a recent RL-based routing protocol which also addresses energy efficiency during the routing process in CRN. Hence, the proposed routing protocol can also be compared against the latest energy constraints in future. Moreover, ELD-CRN routing conserves the limited battery resources of CRN-based IoT devices and supports reliable packet delivery while incurring lower packet transfer latency and being energy efficient. This routing mechanism is limited to location-based routing and operates over a single wireless channel using a channel access mechanism that follows the IEEE 802.11 distributed coordination function.

A. Average Data Rate Maximization

The average data rate achieved by our RL-based routing approach for CRN-IoT communications (RL-IoT routing) is compared for the different network scenarios in Figs. 4–6. It is observed that when the number of SUDs is low, the average data rate for all the routing protocols is also low, while for higher values performance increases, reaching almost 90% of delivered packets for the increasing standard deviation (sd) of


MAR in the RL-IoT routing. The MAR for the activity of the PU indicates the availability of the PU on the channel for channel utilization, channel availability information, channel transmission rate, and channel transmission time. The real working of the PU on the spectrum is not observed due to licensing restrictions. Therefore, the PUs are distributed with a fixed allocation on a spectrum in a stochastic environment of the CRN within the mean arrival rate of [0, 1]. The Poisson process is a very important model in queuing theory which can be used when the packets originate from a large population of independent users.

Fig. 4. Average data rate of SUD at PUD’s sd of MAR = 0.0 (user/ms).

It can be seen in Fig. 4 that initially the RL-IoT routing has a better average data rate than the other three routing protocols, which have almost the same performance at the beginning for 0.0 sd of PUD’s MAR. This is due to the smaller number of PUDs at the start of the simulations: as soon as the number of PUDs increases, the data rate changes for the AODV-IoT, SpEED-IoT, and ELD-CRN routing protocols. At the lowest MAR of PUDs, each SUD is affected by the activity of the PUDs and, hence, the user is often isolated due to the unavailability of free channels. Therefore, the packets delivered are mainly those sent when most PUDs are inactive and those that are directed to destinations very close to the sources.

Fig. 5. Average data rate of SUD at PUD’s sd of MAR = 0.4 (user/ms).

On the other hand, when the number of PUDs increases, the routing choices also increase and, thus, the RL-IoT routing is able to build routes unaffected by the activity of PUDs for most of the flows, as shown in Fig. 5. The RL-IoT and ELD-CRN routing mechanisms have increased data rates as time elapses. However, the RL-IoT routing protocol has a very fast convergence time compared to the ELD-CRN routing protocol due to exploitation learning combined with exploration learning. Hence, the average data rate increases for the 0.4 and 0.8 sd of PUD’s MAR, owing to the larger number of channel choices for routing, as shown in Figs. 5 and 6. On the contrary, the other routing protocols are not capable of this technique and, due to the increase in PUD interference, their average data rate declines. This enables RL-IoT routing to improve the average data rate by 69% in comparison with the AODV-IoT routing protocol and by nearly 39% and 43% in comparison with the ELD-CRN and SpEED-IoT routing protocols, respectively. These results are averaged over the three scenarios of sd of MAR (low, medium, and high) and were obtained by applying an AWK script to process the NS-2 trace file [40].
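The Poisson arrival model mentioned in this subsection can be illustrated with a short sketch. It is not taken from the CRCN/NS-2 setup used for the results: the function name, the use of exponentially distributed inter-arrival gaps, and the example rates and duration are assumptions made for illustration.

```python
import random


def poisson_arrival_times(rate_per_ms, duration_ms, rng):
    """Arrival instants (in ms) of a Poisson process on [0, duration_ms).

    Inter-arrival gaps of a Poisson process are exponentially distributed
    with mean 1 / rate, so the instants are built by summing such gaps.
    """
    if rate_per_ms <= 0:
        return []  # a rate of 0.0 means no PU activity at all
    times = []
    t = rng.expovariate(rate_per_ms)
    while t < duration_ms:
        times.append(t)
        t += rng.expovariate(rate_per_ms)
    return times


if __name__ == "__main__":
    rng = random.Random(1)
    for rate in (0.0, 0.25, 1.0):  # hypothetical rates inside the [0, 1] interval
        arrivals = poisson_arrival_times(rate, 10_000.0, rng)
        print(f"rate={rate} user/ms -> {len(arrivals)} arrivals in 10 s")
```

Over a long window the number of generated arrivals should be close to the rate multiplied by the duration, which gives a quick sanity check on any chosen rate.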

Fig. 6. Average data rate of SUD at PUD’s sd of MAR = 0.8 (user/ms).

B. Packet Collision

Every channel c ∈ C is configured according to the Poisson distribution for the mean arrival rate to monitor the activity of PUD. The sd value is calculated on the basis of the Box–Muller transform from an interval of mean arrival rate (MAR = [0, 1]) for the stochastic environment of the CR-IoT network. The activity of PUD is predicted using the sd, distributed as low, medium, and high for the values {0.0, 0.4, 0.8}, respectively. Initially, all the routing protocols achieve a similar probability of PUD–SUD packet collisions across a CRN-based IoT communication (see Fig. 7). No activity is detected on the channel when the MAR of PUD is 0.0; hence, owing to this absence of PUD activity, the graph shows similar trends for all protocols. The effect of the PUD activity can be observed for MAR = 0.4 (user/ms), in which the user activity plays a major role in channel selection and there is a need to minimize packet collisions between the different types of devices. This can be seen from the results presented in Fig. 8, which shows the reduction of packet collisions. The reduction in the number of packet collisions compared with AODV-IoT routing is up to 30%. On the other hand, it is 19% as compared to the SpEED-IoT routing


and this is due to the best path selected in it. The proposed routing outperforms the other routing solutions in reducing the number of packet collisions in the case of high PUD activity on a channel. Generally, the RL-IoT routing achieves better performance in minimizing packet collisions, as shown in Figs. 7–9.

Fig. 7. Packet collision between SUDs–PUDs at PUD’s sd of MAR = 0.0 (user/ms).

Fig. 8. Packet collision between SUDs–PUDs at PUD’s sd of MAR = 0.4 (user/ms).

Fig. 9. Packet collision between SUDs–PUDs at PUD’s sd of MAR = 0.8 (user/ms).

C. End-to-End Delay

The EED is minimized for the CR-based IoT network so that the QoS can be improved for overall network performance. At the start, there is a significant increase in the EED of the SUD due to the increase in activity of the PUD (see Fig. 10). This is because the learning process takes time before the learning techniques reach the NEP on which the channel decisions are maintained. The effect of the availability of the PUD at different levels is shown in Figs. 11 and 12 for MAR = 0.4 and 0.8, respectively. The proposed routing shows good performance for increasing activity levels compared with the other routing solutions. This is because the proposed RL-IoT routing selects the routes according to the available channel list at the network layer. The EED of the SUD is much lower for MAR = 0.8 than for MAR = 0.4 (user/ms) because the increase in user activity increases the routing choices of different channels for RL-IoT routing. The proposed routing reduces the EED by up to 89% for MAR = 0.8 (user/ms) as compared to the other routing solutions.

Fig. 10. EED of SUD for PUD’s sd of MAR = 0.0 (user/ms).

Fig. 11. EED of SUD for PUD’s sd of MAR = 0.4 (user/ms).

V. CONCLUSION AND FUTURE WORK

In this work, we have proposed a novel RL-based routing approach for the highly dynamic CR-IoT communication environment. The proposed approach integrates channel selection decisions in routing at the network layer to improve average


data rate and throughput. We have evaluated the performance of our proposed approach through simulations by simulating the CR-IoT communication environment in the CRCN simulator and comparing the network performance achieved by our proposed mechanism with that of the recent AODV-IoT, ELD-CRN, and SpEED-IoT routing approaches. A key concern in this spectrum-sharing environment is security and privacy of primary as well as secondary IoT users. In future, we intend to address the spectrum privacy and security issues through encryption and privacy preservation approaches.

Fig. 12. EED of SUD for PUD’s sd of MAR = 0.8 (user/ms).

REFERENCES

[1] P. Schulz et al., “Latency critical IoT applications in 5G: Perspective on the design of radio interface and network architecture,” IEEE Commun. Mag., vol. 55, no. 2, pp. 70–78, Feb. 2017.
[2] H. B. Salameh, S. Otoum, M. Aloqaily, R. Derbas, I. Al Ridhawi, and Y. Jararweh, “Intelligent jamming-aware routing in multi-hop IoT-based opportunistic cognitive radio networks,” Ad Hoc Netw., vol. 98, Mar. 2020, Art. no. 102035.
[3] F. Hu, B. Chen, and K. Zhu, “Full spectrum sharing in cognitive radio networks toward 5G: A survey,” IEEE Access, vol. 6, pp. 15754–15776, 2018.
[4] H. Yu and Y. B. Zikria, “Cognitive radio networks for Internet of Things and wireless sensor networks,” Sensors, vol. 20, no. 18, p. 5288, 2020.
[5] M. Huang, A. Liu, N. N. Xiong, T. Wang, and A. V. Vasilakos, “An effective service-oriented networking management architecture for 5G-enabled Internet of Things,” Comput. Netw., vol. 173, May 2020, Art. no. 107208.
[6] Y. Wang, Z. Ye, P. Wan, and J. Zhao, “A survey of dynamic spectrum allocation based on reinforcement learning algorithms in cognitive radio networks,” Artif. Intell. Rev., vol. 51, no. 3, pp. 493–506, 2019.
[7] J. Zou, H. Xiong, D. Wang, and C. W. Chen, “Optimal power allocation for hybrid overlay/underlay spectrum sharing in multiband cognitive radio networks,” IEEE Trans. Veh. Technol., vol. 62, no. 4, pp. 1827–1837, May 2013.
[8] G. Kaur, P. Chanak, and M. Bhattacharya, “Energy efficient intelligent routing scheme for IoT-enabled WSNs,” IEEE Internet Things J., vol. 8, no. 14, pp. 11440–11449, Jul. 2021.
[9] S. Anamalamudi, A. R. Sangi, M. Alkatheiri, and A. M. Ahmed, “AODV routing protocol for cognitive radio access based Internet of Things (IoT),” Future Gener. Comput. Syst., vol. 83, pp. 228–238, Jun. 2018.
[10] R. A. A. Diab, A. Abdrabou, and N. Bastaki, “An efficient routing protocol for cognitive radio networks of energy-limited devices,” Telecommun. Syst., vol. 73, no. 4, pp. 577–594, 2020.
[11] S. Debroy, P. Samanta, A. Bashir, and M. Chatterjee, “SpEED-IoT: Spectrum aware energy efficient routing for device-to-device IoT communication,” Future Gener. Comput. Syst., vol. 93, pp. 833–848, Apr. 2019.
[12] M. M. Aslam, L. Du, X. Zhang, Y. Chen, Z. Ahmed, and B. Qureshi, “Sixth generation (6G) cognitive radio network (CRN) application, requirements, security issues, and key challenges,” Wireless Commun. Mobile Comput., vol. 2021, Sep. 2021, Art. no. 1331428.
[13] R. A. Diab, N. Bastaki, and A. Abdrabou, “A survey on routing protocols for delay and energy-constrained cognitive radio networks,” IEEE Access, vol. 8, pp. 198779–198800, 2020.
[14] M. Youssef, M. Ibrahim, M. Abdelatif, L. Chen, and A. V. Vasilakos, “Routing metrics of cognitive radio networks: A survey,” IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 92–109, 1st Quart., 2014.
[15] D. Tarek, A. Benslimane, M. Darwish, and A. M. Kotb, “Survey on spectrum sharing/allocation for cognitive radio networks Internet of Things,” Egyptian Inform. J., vol. 21, no. 4, pp. 231–239, 2020.
[16] T. Zhou et al., “Joint routing and channel assignment for delay minimization in multi-channel multi-flow mobile cognitive ad hoc networks,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2015, pp. 1–6.
[17] N. Mansoor, A. M. Islam, M. Zareei, and C. Vargas-Rosales, “RARE: A spectrum aware cross-layer MAC protocol for cognitive radio ad-hoc networks,” IEEE Access, vol. 6, pp. 22210–22227, 2018.
[18] S. Akter and N. Mansoor, “A spectrum aware mobility pattern based routing protocol for CR-VANETs,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2020, pp. 1–6.
[19] M. Kannan and B. R. Jeetha, “Neighbor node discovery mechanism based delay aware routing protocol (DARP—NND) for cognitive radio ad hoc networks,” in Proc. IEEE Int. Conf. Adv. Comput. Appl. (ICACA), 2016, pp. 94–98.
[20] Y. Wang, G. Zheng, H. Ma, Y. Li, and J. Li, “A joint channel selection and routing protocol for cognitive radio network,” Wireless Commun. Mobile Comput., vol. 2018, pp. 1–7, Mar. 2018.
[21] S. Iqbal, A. H. Abdullah, K. N. Qureshi, and J. Lloret, “Soft-GORA: Soft constrained globally optimal resource allocation for critical links in IoT backhaul communication,” IEEE Access, vol. 6, pp. 614–624, 2017.
[22] S. Majeed, A. Sohail, K. N. Qureshi, A. Kumar, S. Iqbal, and J. Lloret, “Unmanned aerial vehicles optimal airtime estimation for energy aware deployment in IoT-enabled fifth generation cellular networks,” EURASIP J. Wireless Commun. Netw., vol. 2020, no. 1, pp. 1–14, 2020.
[23] K. N. Qureshi and Y. A. A. S. Aldeen, “New trends in Internet of Things, applications, challenges, and solutions,” Telkomnika, vol. 16, no. 3, pp. 1114–1119, 2018.
[24] B. Mao, F. Tang, Z. M. Fadlullah, and N. Kato, “An intelligent route computation approach based on real-time deep learning strategy for software defined communication systems,” IEEE Trans. Emerg. Topics Comput., vol. 9, no. 3, pp. 1554–1565, Jul.–Sep. 2021.
[25] Y. Du, C. Chen, P. Ma, and L. Xue, “A cross-layer routing protocol based on quasi-cooperative multi-agent learning for multi-hop cognitive radio networks,” Sensors, vol. 19, no. 1, p. 151, 2019.
[26] T. Stephan, F. Al-Turjman, B. Balusamy, and K. S. Joseph, “Energy and spectrum aware unequal clustering with deep learning based primary user classification in cognitive radio sensor networks,” Int. J. Mach. Learn. Cybern., vol. 12, no. 11, pp. 3261–3294, 2021.
[27] M. U. Younus, M. K. Khan, and A. R. Bhatti, “Improving the software defined wireless sensor networks routing performance using reinforcement learning,” IEEE Internet Things J., vol. 9, no. 5, pp. 3495–3508, Mar. 2022.
[28] X. Wang et al., “QoS and privacy-aware routing for 5G-enabled Industrial Internet of Things: A federated reinforcement learning approach,” IEEE Trans. Ind. Informat., vol. 18, no. 6, pp. 4189–4197, Jun. 2022.
[29] B. He, J. Wang, Q. Qi, H. Sun, and J. Liao, “RTHop: Real-time hop-by-hop mobile network routing by decentralized learning with semantic attention,” IEEE Trans. Mobile Comput., early access, Aug. 19, 2021, doi: 10.1109/TMC.2021.3105963.
[30] B. Mao et al., “A novel non-supervised deep-learning-based network traffic control method for software defined wireless networks,” IEEE Wireless Commun., vol. 25, no. 4, pp. 74–81, Aug. 2018.
[31] Z. Yang, S. Ping, A. Aijaz, and A.-H. Aghvami, “A global optimization-based routing protocol for cognitive-radio-enabled smart grid AMI networks,” IEEE Syst. J., vol. 12, no. 1, pp. 1015–1023, Mar. 2018.
[32] K. N. Qureshi, A. Naveed, Y. Kashif, and G. Jeon, “Internet of Things for education: A smart and secure system for schools monitoring and alerting,” Comput. Electr. Eng., vol. 93, Jul. 2021, Art. no. 107275.


[33] B. Pourpeighambar, M. Dehghan, and M. Sabaei, “Non-cooperative reinforcement learning based routing in cognitive radio networks,” Comput. Commun., vol. 106, pp. 11–23, Jul. 2017.
[34] W. Krichene, B. Drighes, and A. Bayen, “On the convergence of no-regret learning in selfish routing,” in Proc. Int. Conf. Mach. Learn., 2014, pp. 163–171.
[35] K.-L. A. Yau, P. Komisarczuk, and P. D. Teal, “Reinforcement learning for context awareness and intelligence in wireless networks: Review, new features and open issues,” J. Netw. Comput. Appl., vol. 35, no. 1, pp. 253–267, 2012.
[36] A. Popescu, “Cognitive radio networks: Elements and architectures,” Ph.D. dissertation, Dept. Commun. Syst., Blekinge Inst. Technol., Karlskrona, Sweden, 2014.
[37] Y. Xu, J. Wang, Q. Wu, A. Anpalagan, and Y.-D. Yao, “Opportunistic spectrum access in cognitive radio networks: Global optimization using local interaction games,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 2, pp. 180–194, Apr. 2012.
[38] M. Felegyhazi, J.-P. Hubaux, and L. Buttyan, “Nash equilibria of packet forwarding strategies in wireless ad hoc networks,” IEEE Trans. Mobile Comput., vol. 5, no. 5, pp. 463–476, May 2006.
[39] T. Issariyakul and E. Hossain, “Introduction to network simulator 2 (NS2),” in Introduction to Network Simulator NS2. Boston, MA, USA: Springer, 2009, pp. 1–18. [Online]. Available: https://doi.org/10.1007/978-0-387-71760-9_2
[40] L. Wood. “Awk script to get end-to-end delays from NS2 trace files.” 2011. Accessed: Oct. 6, 2022. [Online]. Available: http://personal.ee.surrey.ac.uk/Personal/L.Wood/ns/packet-delay-script/lloyd-wood-ns-packet-delay-awk-script.pdf

Tauqeer Safdar Malik received the B.S. degree in computer science from Bahauddin Zakariya University, Multan, Pakistan, in 2004, and the M.S. degree in computer science from COMSATS University Islamabad (Wah Campus), Rawalpindi, Pakistan, in 2007, and the Ph.D. degree from Universiti Teknologi PETRONAS, Seri Iskandar, Malaysia, in 2018.
He is currently an Assistant Professor with the Department of Computer Science, Air University (Multan Campus), Islamabad, Pakistan. His research interests include wireless networks, cognitive radio ad hoc networks, Internet of Things, 5G and 6G, IPv6, artificial intelligence, machine learning, routing, and security issues in wireless networks.

Kaleem Razzaq Malik received the M.Sc. degree in computer science from the National University of Computer and Emerging Sciences, Lahore, Pakistan, in 2008, and the Ph.D. degree in computer science from the University of Engineering and Technology Lahore, Lahore, in 2018.
He has been serving as an Associate Professor with Air University, Multan, Pakistan, since March 2018. Earlier, he served with the COMSATS Institute of Information Technology, Sahiwal, Pakistan; Government College University Faisalabad, Faisalabad, Pakistan; and the Virtual University of Pakistan, Lahore. He has published various articles in top national and international journals and a book chapter. His research interests include big data, cloud computing, data sciences, and semantic Web.
Dr. Malik has performed reviews for many top ranking international journals published by Springer, IEEE, and Elsevier, and was a member of technical program committees of international conferences.

Ayesha Afzal received the B.S. degree in computer science from Bahauddin Zakariya University, Multan, Pakistan, in 2004, and the M.S. and Ph.D. degrees in computer science from Lahore University of Management Sciences, Lahore, Pakistan, in 2007 and 2018, respectively.
She is currently serving as an Assistant Professor with Air University, Multan. Her research interests are in the areas of services computing, business process management, distributed workflow management, and cloud computing.
Dr. Afzal is on the editorial board of the International Journal on Digital Libraries.

Muhammad Ibrar received the B.S. degree in telecommunication and networking from COMSATS University Islamabad (Abbottabad Campus), Abbottabad, Pakistan, in 2010, the M.S. degree in telecommunication and networking from Bahria University, Islamabad, Pakistan, in 2014, and the Ph.D. degree from the School of Software, Dalian University of Technology, Dalian, China, in March 2021.
He is a Postdoctoral Researcher with the School of Software, Dalian University of Technology. His research interests include software-defined networking, fog computing, wireless ad hoc, and sensor networks.

Lei Wang (Member, IEEE) received the B.S., M.S., and Ph.D. degrees from Tianjin University, Tianjin, China, in 1995, 1998, and 2001, respectively.
He is currently a Full Professor with the School of Software, Dalian University of Technology, Dalian, China. He was a member of Technical Staff with Bell Labs Research, Shanghai, China, from 2001 to 2004, a Senior Researcher with Samsung, Seoul, South Korea, from 2004 to 2006, a Research Scientist with Seoul National University, Seoul, from 2006 to 2007, and a Research Associate with Washington State University, Vancouver, WA, USA, from 2007 to 2008. He serves as a Research Fellow with the Key Lab of Ubiquitous Network and Service Software of Liaoning Province, Dalian, and the Center of Underwater Robot, Peng Cheng Laboratory, Shenzhen, China. He has published more than 160 papers, and the papers have more than 2900 citations. His research interests involve wireless ad hoc networks, sensor networks, social networks, and network security.
Prof. Wang is a member of ACM and a Senior Member of the China Computer Federation.


Houbing Song (Senior Member, IEEE) received the Ph.D. degree in electrical engineering from the University of Virginia, Charlottesville, VA, USA, in August 2012.
He is currently a Tenured Associate Professor of AI and the Director of the Security and Optimization for Networked Globe Laboratory (SONG Lab, www.SONGLab.us), University of Maryland, Baltimore County, Baltimore, MD, USA. He was a Tenured Associate Professor of Electrical Engineering and Computer Science with Embry–Riddle Aeronautical University, Daytona Beach, FL, USA. SONG Lab graduates work in a variety of companies and universities. Those seeking academic positions have been hired as Tenure-Track Assistant Professors at U.S. universities, such as Auburn University, Auburn, AL, USA; Bowling Green State University, Bowling Green, OH, USA; and the University of Tennessee, Knoxville, TN, USA. His research has been featured by popular news media outlets, including IEEE GlobalSpec’s Engineering360, Association for Uncrewed Vehicle Systems International, Security Magazine, CXOTech Magazine, Fox News, U.S. News & World Report, The Washington Times, New Atlas, Battle Space, and Defense Daily. His research has been sponsored by federal agencies (including National Science Foundation, U.S. Department of Transportation, Federal Aviation Administration, Air Force Office of Scientific Research, U.S. Department of Defense, and Air Force Research Laboratory) and industry. He has edited eight books, including Aviation Cybersecurity: Foundations, principles, and applications (Scitech Publishing, 2022), Smart Transportation: AI Enabled Mobility and Autonomous Driving (CRC Press, 2021), Big Data Analytics for Cyber-Physical Systems: Machine Learning for the Internet of Things (Elsevier, 2019), Smart Cities: Foundations, Principles, and Applications (Hoboken, NJ, USA: Wiley, 2017), Security and Privacy in Cyber-Physical Systems: Foundations, Principles, and Applications (Chichester, U.K.: Wiley-IEEE Press, 2017), Cyber-Physical Systems: Foundations, Principles and Applications (Boston, MA: Academic Press, 2016), and Industrial Internet of Things: Cybermanufacturing Systems (Cham, Switzerland: Springer, 2016). He has authored more than 100 articles and is the inventor of two patents (U.S. and WO). His research interests include cyber–physical systems/Internet of Things, cybersecurity and privacy, AI/machine learning/big data analytics, edge computing, unmanned aircraft systems, connected vehicle, smart and connected health, and wireless communications and networking.
Dr. Song was a recipient of the Best Paper Award from the 12th IEEE International Conference on Cyber, Physical, and Social Computing in 2019, the Best Paper Award from the 2nd IEEE International Conference on Industrial Internet 2019, the Best Paper Award from the 19th Integrated Communication, Navigation and Surveillance technologies Conference 2019, the Best Paper Award from the 6th IEEE International Conference on Cloud and Big Data Computing 2020, the Best Paper Award from the 15th International Conference on Wireless Algorithms, Systems, and Applications 2020, the Best Paper Award from the 40th Digital Avionics Systems Conference 2021, the Best Paper Award from 2021 IEEE Global Communications Conference, and the Best Paper Award from 2022 IEEE International Conference on Computer Communications. He is a Highly Cited Researcher identified by Clarivate in 2021 and a Top 1000 Computer Scientist identified by Research.com. He has been serving as an Associate Technical Editor for IEEE Communications Magazine since 2017, an Associate Editor for IEEE INTERNET OF THINGS JOURNAL since 2020, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS since 2021, and IEEE JOURNAL ON MINIATURIZATION FOR AIR AND SPACE SYSTEMS since 2020, and a Guest Editor for IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE INTERNET OF THINGS JOURNAL, IEEE NETWORK, IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, IEEE SENSORS JOURNAL, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, and IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS. He is a Senior Member of ACM and an ACM Distinguished Speaker.

Nadir Shah received the B.Sc. and M.Sc. degrees in computer science from Peshawar University, Peshawar, Pakistan, in 2002 and 2005, respectively, the M.S. degree in computer science from International Islamic University, Islamabad, Pakistan, in 2007, and the Ph.D. degree from Sino-German Joint Software Institute, Beihang University, Beijing, China, in 2011.
He is currently an Associate Professor with COMSATS University Islamabad (Wah Campus), Rawalpindi, Pakistan. His current research interests include computer networks, distributed systems, and network security.
Dr. Shah is serving on the editorial board of International Journal of Communication Systems (Wiley), IEEE SOFTWARIZATIONS, AHWSN, and Malaysian Journal of Computer Science. He has been serving as a Reviewer for several journals/conferences, including the ICC, the INFOCOM, the WCNC, Computer Networks (Elsevier), the IEEE COMMUNICATIONS LETTERS, the IEEE Communications Magazine, the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, and The Computer Journal.
