Future Generation Computer Systems: Anil Carie Mingchu Li Chang Liu Prakasha Reddy Waseef Jamal
Article info

Article history:
Received 8 August 2017
Received in revised form 20 October 2017
Accepted 7 November 2017
Available online xxxx

Keywords:
Software defined cognitive radio network
MAC protocol
Directional antennas
Software defined radio
Q-learning
Power control

Abstract

In this paper, we investigate the Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control in cognitive radio (CR) systems. In CR systems, nodes can switch opportunistically to heterogeneous non-overlapping channels, which offers higher achievable throughput. However, the random channel selection policy in existing CR-MAC protocols suffers from problems such as delay, packet collisions, and degraded quality of service. The proposed channel selection scheme, which is quite different from the traditional scheme, is adopted by nodes to achieve context awareness and intelligence for adaptive channel selection. The nodes select a channel based on the results learned from interactions with the other nodes and channels. The directional transmission power control scheme allows the nodes to reuse the channels subject to interference constraints. The simulation results show that nodes using the proposed algorithm can adaptively select channels and the optimal transmission power, which helps to achieve high throughput and minimized power consumption.
© 2017 Elsevier B.V. All rights reserved.
2 A. Carie et al. / Future Generation Computer Systems ( ) –
channel impairments. However, this is not suitable for the primary users' QoS if secondary users transmit with arbitrarily high power. Thus it is natural that power control should rely on the interference levels. In this paper, we propose "Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control", where a learning algorithm is used for channel selection and a new directional transmission power control (DTPC) scheme enhances the throughput and energy efficiency of the directional hybrid CR-MAC protocol. The main contributions of our work are twofold. First, using the channel selection algorithm, we try to select the best channel based on the SU's observation of the PU's traffic and of channel characteristics such as throughput achieved and packets lost. Second, we investigate the problem of channel reuse in directional communication, where the CR user has power control capability. That is, the CR user can transmit at any transmit power in the allowable power range so as to achieve maximum concurrent transmissions. The rest of the paper is organized as follows. In Section 2, we present related work. In Section 3, we present the system model. An overview of the Q-learning algorithm is presented in Section 4. The proposed Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control scheme is presented in Section 5. Simulation results for different network topologies are shown in Section 6 to establish the substantial throughput and energy gains that can be attained under the investigated scheme. Finally, we present our conclusions and future work.

2. Related work

Game theory based CR
In CR networks, game theory (GT) has recently been the most popular method for attaining context-awareness and intelligence. In GT, SUs interact to maximize their individual objectives, such as delay or throughput. However, game theory has several limitations, which are addressed by the reinforcement learning (RL) approach. Firstly, GT based CR requires a complete set of information to compute the Nash equilibrium; hence it is more suitable for centralized CR networks [15,16].
Secondly, GT assumes a single type of objective function throughout the CR network, and hence a homogeneous learning mechanism in all the SUs. Thirdly, SUs might converge to a sub-optimal action due to mis-coordination even when an optimal action exists. Although GT has been successfully applied in CR networks [17–26], the RL approach is a good alternative which addresses the above issues associated with GT. For instance, RL supports a heterogeneous learning mechanism in each agent, because each agent can represent distinctive performance metrics as local rewards.

Omni directional Power control
SUs vary their transmit power depending on the interference at the primary receiver and the maximum secondary transmit power constraint [27]. The concurrent transmission region is maximized in [6] using optimal power control. The number of concurrent transmissions is maximized in [7] using dynamic spectrum sharing. Power control with the objectives of maximizing sum-rate, achieving rate fairness, and minimizing power consumption is studied in [28–32]. The necessary and sufficient condition for the feasible region using a power controlled MAC consisting of only two transmission links is derived in [33]. A channel hopping sequence is used to allocate the control channel to one-hop neighbor nodes [25]. The basic drawback of sequential CCC based CR-MAC is longer channel rendezvous delays [26–31]. Channel rendezvous becomes more challenging as the availability of PU channels increases.

Directional power control
For reusing spectrum in the macro cell, the underlying microcell uses antenna beamforming and power allocation schemes to maximize the multiuser sum rate [34]. The capacity and power consumption of a wireless network using directional antennas are studied in [35]. In [36], the achievable throughput of a mobile ad hoc network with directional antennas is addressed. Directional Medium Access Control is studied in [37]; it suffers from deafness and hidden terminal problems. Power control with a directional antenna over a packet radio network was first considered in [38]. Power control built on D-MAC with directional RTS, omni-directional CTS and optimal power for the data packet is studied in [39], which reports increased network capacity and reduced power consumption. An optimization problem for selecting the range of channels for transmission with the control channel and an aggregated data channel was employed in statistical channel allocation MAC (SCA-MAC) [40], which outperforms the random scheme. DSA-MAC [41], DCRMAC [42], HC-MAC [43], DDMAC [44] and SMA [45] are protocols that are similar to the IEEE 802.11 DCF standard for reserving a channel.

3. System model

In this study, users with CR capabilities, referred to as secondary users, can communicate with other CR nodes by utilizing the primary network's available spectrum spatially and/or temporally. When the secondary network does not have enough resources, CR nodes form an ad-hoc network without a central controller or dedicated control channels. Due to the highly dynamic and heterogeneous networking environment, a dedicated control channel is not pre-defined for exchanging control messages. We assume that the nodes are close enough to consider an interference-limited spectrum sharing scenario in which a CRAHN operates. The system model is composed of M licensed channels, which are accessed opportunistically by K CR nodes (each acting as both transmitter and receiver), and D primary users. The primary transmitters, primary receivers, and the mobile CR devices are distributed randomly within the coverage area. Similar to [46–48], a two-state continuous-time Markov process is used to model the traffic of each channel: the channel is either occupied by the PU (busy state) or not occupied by the PU (idle state). These two states are referred to as ON and OFF, respectively. Channel availability depends not only on each SU transmitter and its corresponding SU receiver, but also on the time-varying activities of the PUs. We consider the situation in which several SUs may compete for the same channel, and one SU may have more than one channel to select from.

Antenna model
To predict the received power, as in [49,50], we consider a general power propagation model

$$P_r = \frac{P_t \, C \, G_{rt} \, G_{tr}}{d_{tr}^{\alpha}}$$

where $P_t$ is the transmit power, $d_{tr}$ is the distance between transmitter and receiver, $\alpha$ is the path-loss exponent, $G_{tr}$ and $G_{rt}$ are the directive gains of the transmitting and receiving antennas toward the direction of the receiving and transmitting antennas, respectively, and $C$ is a constant determined by other factors such as antenna heights and wavelength.

Time slot structure
The system model has a slotted transmission structure, as shown in Fig. 3 and described as follows. Each secondary user executes the following stages synchronously during each time slot.

- CHANNEL SENSING: SUs sense the PU channels to detect the activity of the PUs.
- PCL-EXCHANGE: After sensing, SUs broadcast their Primary user free Channel List (PCL) to their neighbors. After receiving PCL information from neighbors, SUs update their PCL table.
- CHANNEL RESERVATION: This stage is divided into N slots, and every slot is divided into two sub-slots: sub-slot S1 for a node to send RTS directionally, and sub-slot S2 for the destination node to reply with CTS or DNAV.
- DATA TRANSFER: SUs which successfully reserved a channel in the CHANNEL RESERVATION phase start data transmission.
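The propagation model of the antenna model subsection, P_r = P_t C G_rt G_tr / d_tr^α, can be evaluated numerically as in the minimal sketch below. The function names `received_power` and `to_dbm` are hypothetical, and the example values for C, the antenna gains, and α are assumptions chosen only for illustration (the 10 mW transmit power matches Table 1); none of them are taken from the authors' simulator.

```python
import math


def received_power(pt, g_tr, g_rt, d_tr, alpha, c=1.0):
    """Received power P_r = P_t * C * G_rt * G_tr / d_tr**alpha.

    All quantities are in linear units (e.g. mW for power); the constant c
    is an assumed placeholder for antenna-height and wavelength effects.
    """
    if d_tr <= 0:
        raise ValueError("distance must be positive")
    return pt * c * g_rt * g_tr / d_tr ** alpha


def to_dbm(p_mw):
    """Convert a power in milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


# Illustrative (assumed) values: 10 mW transmit power, path-loss exponent 2.
omni = received_power(pt=10.0, g_tr=1.0, g_rt=1.0, d_tr=100.0, alpha=2.0)
directional = received_power(pt=10.0, g_tr=4.0, g_rt=4.0, d_tr=100.0, alpha=2.0)
print(to_dbm(omni), to_dbm(directional))
```

With both gains raised from 1 to 4, the received power grows by a factor of 16 at the same distance, which is the margin a directional power control scheme can trade for lower transmit power.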
4. Overview of Q-learning algorithm

We use Q-Learning [12], which is a recent form of reinforcement learning algorithm that does not need a model of its environment and works by estimating the values of state–action pairs. The estimate of the future reward value in the Q-learning algorithm is given by Q(S, A) when an agent takes a particular action A in a particular state S. Learning intervals are denoted by $t \in T = \{1, 2, \ldots\}$, a constant interval by $t_D$, actions by $a \in A$, and rewards by $r_{t+1}(a_t)$. Every agent records the learnt values from the environment in a Q-table with $|A|$ entries, one for each of its possible actions. The local reward for an action is reflected in its Q-value; hence agents change their actions when there is a change in Q-value. At each interval t, agent i chooses an action $a_t^i$ and receives the local reward $r_{t+1}^i(a_t^i)$ at time t+1. Agent i updates the Q-value of $a_t^i$ at time t+1 as follows:

$$Q_{t+1}^i(a_t^i) \leftarrow (1 - \alpha)\, Q_t^i(a_t^i) + \alpha\, r_{t+1}^i(a_t^i) \qquad (1)$$

where $0 \le \alpha \le 1$ is the learning rate. The value of $\alpha$ decides the dependence on the reward: a higher value of $\alpha$ gives more importance to the local reward than to past knowledge.

Agents search for the action that maximizes the value function $V^{\pi}$, as shown below:

$$V^{\pi} = \max_{a \in A} Q_t^i(a). \qquad (2)$$

By exploring the environment, the agents build a table of Q-values for each environment state and each possible action. Exploitation chooses the best known action, or the greedy action, at all times for performance enhancement. Exploration chooses the other, non-optimal actions once in a while to improve the estimates of all the Q-values in order to discover better actions. The learning rate and the discount factor are important parameters of the Q-learning algorithm. The learning rate parameter limits how quickly learning can occur. The discount factor controls the value placed on future rewards.

5. Hybrid directional CR-MAC based on Q-learning with directional power control

5.1. Channel selection using Q-learning

We present our directional antenna based hybrid CCC based CR-MAC with dynamic channel selection in this section. Each SU node is equipped with one transceiver, which is used for both control and data. 902 MHz is used by nodes to exchange their free channel lists, which are used to find a common control channel (CCC) among nodes. Nodes decide on a PU free (available) data channel in the licensed band over the CCC. Two-way handshaking is performed by nodes to transmit control and data information. An illustration of the cognitive MAC protocol is shown in Fig. 4. The channel switching decision is made at the beginning of each time slot and depends on the channel state information.
In Directional Hybrid (DH) CR-MAC, the DCS applies QL to select a channel. Each agent divides time into fixed intervals of length t and keeps track of the number of packets transmitted successfully, ND. At the beginning of each interval t, every node updates its Q-value using (1), chooses the channel with the highest Q-value, and broadcasts it to its neighbors along with the PCL. After receiving the broadcast information, nodes update their PCL table and calculate the population state of each channel, i.e. the number of users selecting a given channel. Nodes select the channel with the least population state for the transmission. Every node maintains a Q-table that consists of Q-values in the range of 0 to 1. We use a dynamic Q-table: the size of a node's Q-table is determined by the number of available channels. The Q-table and the learning task are distributed among the different nodes (states). Therefore, when choosing the next channel, we let the agent act greedily, taking, in each situation, the action with the highest Q-value. If the node can transmit data successfully using the channel, then the reward R is increased by the number of packets transmitted; otherwise R is 0. The discount factor is an important parameter of the Q-Learning algorithm. We use a variable discount factor, which is determined by the PU activity rate, the number of SUs competing for the channel, and the bandwidth.

    Set α ∈ [0, 1]    // learning rate
    For initial time slot
        Select random channel
        Broadcast PCL with chosen channel IDs at the top of the list to other users.
        Receive the information of other users' channel selection.
        Calculate population state of each channel (count number of users selecting a given channel).
        Choose the channel with least count.
        Sense and contend for the chosen channel and
        transmit data packets if successfully grabbing the channel.
        If (receive ACK for the DATA packet sent)
            Then ND = ND + 1
        End if
    End of initial time slot.
    For remaining time slots
        r_{t+1}(a_t) = (ND/TD) + population state
        Q_{t+1}(a_t) = (1 − α) Q_t(a_t) + α r_{t+1}(a_t)
        R = uniform(0, 1)    {generate random number}
        If R <= ε then
            a_temp = uniform(1, k)
        Else
            a_temp = argmax_{a ∈ A} Q(a)
            If |Q_{t+1}(a_temp) − Q_{t+1}(a_t)| <= β then
                a_{t+1} = a_t
            Else
                a_{t+1} = a_temp
            End if
        End if
        Return a_{t+1}
        Broadcast PCL with chosen channel IDs at the top of the list to other users.
        Receive the information of other users' channel selection.
        Calculate population state of each channel (count number of users selecting a given channel).
        Choose the channel with least count.
        Sense and contend for the chosen channel and
        transmit data packets if successfully grabbing the channel.
    End of time slot

5.2. Optimal directional power control

In this section, the proposed optimal directional power control algorithm for channel spatial reuse is presented. We first study the feasibility of channel reuse with the proposed optimal power control algorithm. Fig. 2 illustrates a classical spectrum access or spectrum sharing scenario with D randomly distributed primary users (PU in Fig. 2) and K secondary users (SU in Fig. 1). In this scenario, we assume that each of the PUs is equipped with an omni-directional antenna, while each of the SUs is equipped with multiple antennas, which enables beamforming. In this case, the small cell consists of PU broadcasting channels and SU beamforming channels. In addition, considering the mobility of both the primary users and secondary users, we assume that both PUs and SUs follow a homogeneous Poisson point process (HPPP). Let {N(A)} denote the number of users in the area "A", such as the cell in Fig. 2. If {N(A)} follows an HPPP with intensity λ > 0, that is, N(A) ∼ Poisson(λ|A|), then the probability of N(A) = k can be expressed as:

$$P(N(A) = k) = \frac{e^{-\lambda |A|} (\lambda |A|)^{k}}{k!}. \qquad (3)$$
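As a small worked illustration of the HPPP user-count distribution in Eq. (3), the sketch below evaluates P(N(A) = k) = e^{−λ|A|}(λ|A|)^k / k! directly. The function name `hppp_count_pmf` and the numeric values of the intensity and area are assumptions made for the example only.

```python
import math


def hppp_count_pmf(k, lam, area):
    """Probability that an HPPP of intensity lam places exactly k users
    in a region of size area, i.e. N(A) ~ Poisson(lam * area)."""
    if lam <= 0 or area < 0:
        raise ValueError("lam must be positive and area non-negative")
    if k < 0:
        return 0.0
    mean = lam * area
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Assumed example: intensity 2 users per unit area over a region of size 1.5,
# so the expected number of users is 3.
probs = [hppp_count_pmf(k, lam=2.0, area=1.5) for k in range(40)]
print(probs[0], sum(probs))
```

The probabilities over all k sum to one, and the expected number of users in the region is λ|A|.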
Fig. 4. Node communication using hybrid control channel using learning and power
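The learning step of the channel selection procedure in Section 5.1 can be sketched roughly as follows: the Q-value update of Eq. (1), an ε-greedy choice with a switching threshold β, and the least-populated-channel rule. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, channels are indexed from 0 rather than 1, the reward is passed in by the caller, and the exploration branch is assumed to return the randomly drawn channel directly.

```python
import random


def update_q(q, action, reward, alpha):
    """Eq. (1): Q(a) <- (1 - alpha) * Q(a) + alpha * r, applied in place."""
    q[action] = (1.0 - alpha) * q[action] + alpha * reward
    return q[action]


def select_channel(q, current, epsilon, beta, rng=random):
    """Epsilon-greedy channel choice with switching threshold beta.

    With probability epsilon a random channel is explored (assumption: the
    explored channel is used directly). Otherwise the best-valued channel is
    chosen, but the node stays on its current channel when the Q-value gain
    is at most beta.
    """
    if rng.random() <= epsilon:
        return rng.randrange(len(q))
    best = max(range(len(q)), key=lambda a: q[a])
    if abs(q[best] - q[current]) <= beta:
        return current
    return best


def least_populated(selections, num_channels):
    """Population state: count users per channel and return the channel
    with the fewest users (lowest index wins ties)."""
    counts = [0] * num_channels
    for ch in selections:
        counts[ch] += 1
    return min(range(num_channels), key=lambda c: counts[c])


# Assumed example values for three channels.
q_table = [0.2, 0.5, 0.9]
update_q(q_table, action=0, reward=1.0, alpha=0.5)
next_channel = select_channel(q_table, current=0, epsilon=0.1, beta=0.05)
print(q_table, next_channel, least_populated([0, 0, 2], 3))
```

The threshold β keeps a node from switching channels for a negligible gain in Q-value, which avoids needless channel switching.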
where n is a zero-mean independent complex Gaussian noise with unit variance; $h \in \mathbb{C}^{M \times 1}$ is the channel vector between the

for i = 1, 2, . . . , m, where $\gamma_s$ is the channel gain of the SU link. Therefore, by solving the above optimization, we can finally obtain
Table 1
Simulation parameters.

Parameter name                        Description
Topology                              1000 × 1000 flat grid
No. of CR nodes                       100
No. of PU channels                    10 (8 MHz) channels
No. of PU transmitters                10
Unlicensed channel                    ISM 902 MHz
PU active probability                 10, 15 and 20 ms
Mobility model                        Random waypoint
Input CR transmit power               10 mW
Receiver threshold                    −95 dBm
Carrier sense threshold               −115 dBm
CR Tx range                           200 m (licensed channel)
PU Tx range                           500 m (licensed channel)
Data rate                             2 Mbps
Antenna type (channel reservation)    Directional (4-element)
Antenna type (data transmission)      Directional (4-element)
Beamwidth                             90°
Interface queue length                50
Simulation time                       100 s
Traffic type                          CBR/UDP
Packet size                           512 and 1024 bytes
6. Simulation results
