A Survey On Machine-Learning Techniques in Cognitive Radios
A Survey On Machine-Learning Techniques in Cognitive Radios
Abstract—In this survey paper, we characterize the learning interweave paradigms for spectrum co-existence by secondary
problem in cognitive radios (CRs) and state the importance of CRs in licensed spectrum bands [10].
artificial intelligence in achieving real cognitive communications
systems. We review various learning problems that have been To perform its cognitive tasks, a CR should be aware of its
studied in the context of CRs classifying them under two main RF environment. It should sense its surrounding environment
categories: Decision-making and feature classification. Decision- and identify all types of RF activities. Thus, spectrum sensing
making is responsible for determining policies and decision was identified as a major ingredient in CRs [6]. Many sensing
rules for CRs while feature classification permits identifying and
techniques have been proposed over the last decade [15], [39],
classifying different observation models. The learning algorithms
encountered are categorized as either supervised or unsupervised [40], based on matched filter, energy detection, cyclostationary
algorithms. We describe in detail several challenging learning detection, wavelet detection and covariance detection [30],
issues that arise in cognitive radio networks (CRNs), in particular [41]–[46]. In addition, cooperative spectrum sensing was
in non-Markovian environments and decentralized networks, and proposed as a means of improving the sensing accuracy by
present possible solution methods to address them. We discuss
addressing the hidden terminal problems inherent in wireless
similarities and differences among the presented algorithms and
identify the conditions under which each of the techniques may networks in [15], [33], [34], [42], [47]–[49]. In recent years,
be applied. cooperative CRs have also been considered in literature as in
[50]–[53]. Recent surveys on CRs can be found in [41], [54],
Index Terms—Artificial intelligence, cognitive radio, decision-
making, feature classification, machine learning, supervised [55]. A survey on the spectrum sensing techniques for CRs
learning, unsupervised learning, . can be found in [39]. Several surveys on the DSA techniques
and the medium access control (MAC) layer operations for
the CRs are provided in [56]–[60].
I. I NTRODUCTION
In addition to being aware of its environment, and in order
(SDRs). In this case, several parameters and policies need to be and multi-agent systems can be unsatisfactory [88]–[91]. Other
adjusted simultaneously (e.g. transmit power, coding scheme, types of learning mechanisms such as evolutionary learning
modulation scheme, sensing algorithm, communication pro- [89], [92], learning by imitation, learning by instruction [93]
tocol, sensing policy, etc.) and no simple formula may be and policy-gradient methods [90], [91] have been shown to
able to determine these setup parameters simultaneously. This outperform RL on certain problems under such conditions.
is due to the complex interactions among these factors and For example, the policy-gradient approach has been shown to
their impact on the RF environment. Thus, learning methods be more efficient in partially observable environments since it
can be applied to allow efficient adaption of the CRs to searches directly for optimal policies in the policy space, as
their environment, yet without the complete knowledge of we shall discuss later in this paper [90], [91].
the dependence among these parameters [74]. For example, Similarly, learning in multi-agent environments has been
in [71], [75], threshold-learning algorithms were proposed to considered in recent years, especially when designing learning
allow CRs to reconfigure their spectrum sensing processes policies for CRNs. For example, [94] compared a cognitive
under uncertainty conditions. network to a human society that exhibits both individual
The problem becomes even more complicated with hetero- and group behaviors, and a strategic learning framework for
geneous CRNs. In this case, a CR not only has to adapt to cognitive networks was proposed in [95]. An evolutionary
the RF environment, but also it has to coordinate its actions game framework was proposed in [96] to achieve adaptive
with respect to the other radios in the network. With only learning in cognitive users during their strategic interactions.
a limited amount of information exchange among nodes, a By taking into consideration the distributed nature of CRNs
CR needs to estimate the behavior of other nodes in order and the interactions among the CRs, optimal learning methods
to select its proper actions. For example, in the context of can be obtained based on cooperative schemes, which helps
DSA, CRs try to access idle primary channels while limiting avoid the selfish behaviors of individual nodes in a CRN.
collisions with both licensed and other secondary cognitive One of the main challenges of learning in distributed CRNs
users [38]. In addition, if the CRs are operating in unknown is the problem of action coordination [88]. To ensure optimal
RF environments [5], conventional solutions to the decision behavior, centralized policies may be applied to generate
process (i.e. Dynamic Programming in the case of Markov optimal joint actions for the whole network. However, central-
Decision Processes (MDPs) [76]) may not be feasible since ized schemes are not always feasible in distributed networks.
they require complete knowledge of the system. On the other Hence, the aim of cognitive nodes in distributed networks is to
hand, by applying special learning algorithms such as the apply decentralized policies that ensure near-optimal behavior
reinforcement learning (RL) [38], [74], [77], it is possible to while reducing the communication overhead among nodes. For
arrive at the optimal solution to the MDP, without knowing example, a decentralized technique that was proposed in [3],
the transition probabilities of the Markov model. Therefore, [97] was based on the concept of docitive networks, from the
given the reconfigurability requirements and the need for Latin word docere (to teach), which establishes knowledge
autonomous operation in unknown and heterogeneous RF transfer (i.e. teaching) over the wireless medium [3]. The
environment, CRs may use learning algorithms as a tool objective of docitive networks is to reduce the cognitive
for adaptation to the environment and to coordinate with complexity, speed up the learning rate and generate better and
peer radio devices. Moreover, incorporation of low-complexity more reliable decisions [3]. In a docitive network, radios teach
learning algorithms can lead to reduced system complexities each others by interchanging knowledge such that each node
in CRs. attempts to learn from a more intelligent node. The radios are
A look at the recent literature on CRs reveals that both not only supposed to teach end-results, but rather elements of
supervised and unsupervised learning techniques have been the methods of getting there [3]. For example, in a docitive
proposed for various learning tasks. The authors in [65], [78], network, new upcoming radios can acquire certain policies
[79] have considered supervised learning based on neural from existing radios in the network. Of course, there will
networks and support vector machines (SVMs) for CR ap- be communication overhead during the knowledge transfer
plications. On the other hand, unsupervised learning, such as process. However, as it is demonstrated in [3], [97], this
RL, has been considered in [80], [81] for DSS applications. overhead is compensated by the policy improvement achieved
The distributed Q-learning algorithm has been shown to be due to cooperative docitive behavior.
effective in a particular CR application in [77]. For example, in
[82], CRs used the Q-learning to improve detection and clas-
sification performance of primary signals. Other applications A. Purpose of this paper
of RL to CRs can be found, for example, in [14], [83]–[85]. This paper discusses the role of learning in CRs and
Recent work in [86] introduces novel approaches to improve emphasizes how crucial the autonomous learning ability in
the efficiency of RL by adopting a weight-driven exploration. realizing a real CR device. We present a survey of the state-of-
Unsupervised Bayesian non-parametric learning based on the the-art achievements in applying machine learning techniques
Dirichlet process was proposed in [13] and was used for signal to CRs.
classification in [72]. A robust signal classification algorithm It is perhaps helpful to emphasize how this paper is different
was also proposed in [87], based on unsupervised learning. from other related survey papers. The most relevant is the
Although the RL algorithms (such as Q-learning) may pro- survey of artificial intelligence for CRs provided in [98] which
vide a suitable framework for autonomous unsupervised learn- reviews several CR implementations that used the following
ing, their performance in partially observable, non-Markovian artificial intelligence techniques: artificial neural networks
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1138 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
Information Knowledge
Perception Learning Reasoning
Intelligent Design
Learning
Paradigms in
CR’s
Unsupervised Supervised
Learning Learning
Bayesian Non-
Reinforcement Artificial Neural Support Vector
Parametric Game Theory
Learning Networks Machine
Approaches
B. Unique characteristics of cognitive radio learning prob- To sum up, the three main characteristics that need to be
lems considered when designing efficient learning algorithms for
Although the term cognitive radio has been interpreted CRs are:
differently in various research communities [5], perhaps the 1) Learning in partially observable environments.
most widely accepted definition is as a radio that can sense and 2) Multi-agent learning in distributed CRNs.
adapt to its environment [2], [5], [6], [69]. The term cognitive 3) Autonomous learning in unknown RF environments.
implies awareness, perception, reasoning and judgement. As A CR design that embeds the above capabilities will be able
we already pointed out earlier, in order for a CR to derive to operate efficiently and optimally in any RF environment.
reasoning and judgement from perception, it must possess
the ability for learning [99]. Learning implies that the current
actions should be based on past and current observations of C. Types of learning paradigms: Supervised versus unsuper-
the environment [100]. Thus, history plays a major role in the vised learning
learning process of CRs.
Several learning problems are specific to CR applications Learning can be either supervised or unsupervised, as
due to the nature of the CRs and their operating RF environ- depicted in Fig. 3. Unsupervised learning may particularly be
ments. First, due to noisy observations and sensing errors, CRs suitable for CRs operating in alien RF environments [5]. In
can only obtain partial observations of their state variables. this case, autonomous unsupervised learning algorithms permit
The learning problem is thus equivalent to a learning process exploring the environment characteristics and self-adapting
in a partially observable environment and must be addressed actions accordingly without having any prior knowledge [5],
accordingly. [71]. However, if the CR has prior information about the envi-
Second, CRs in CRNs try to learn and optimize their ronment, it might exploit this knowledge by using supervised
behaviors simultaneously. Hence, the problem is naturally a learning techniques. For example, if certain signal waveform
multi-agent learning process. Furthermore, the desired learn- characteristics are known to the CR prior to its operation,
ing policy may be based on either cooperative or non- training algorithms may help CRs to better detect signals with
cooperative schemes and each CR might have either full or those characteristics.
partial knowledge of the actions of the other cognitive users in In [93], the two categories of supervised and unsupervised
the network. In the case of partial observability, a CR might learning are identified as learning by instruction and learn-
apply special learning algorithms to estimate the actions of ing by reinforcement, respectively. A third learning regime
the other nodes in the network before selecting its appropriate is defined as the learning by imitation in which an agent
actions, as in, for example, [88]. learns by observing the actions of similar agents [93]. In
Finally, autonomous learning methods are desired in order [93], it was shown that the performance of a learning agent
to enable CRs to learn on its own in an unknown RF (learner) is influenced by its learning regime and its operating
environment. In contrast to licensed wireless users, a truly CR environment. Thus, to learn efficiently, a CR must adopt the
may be expected to operate in any available spectrum band, at best learning regime for a given learning problem, whether it
any time and in any location [5]. Thus, a CR may not have any is learning by imitation, by reinforcement or by instruction
prior knowledge of the operating RF environment such as the [93]. Of course, some learning regimes may not be applicable
noise or interference levels, noise distribution or user traffics. under certain circumstances. For example, in the absence of an
Instead, it should possess autonomous learning algorithms that instructor, the CR may not be able to learn by instruction and
may reveal the underlying nature of the environment and its may have to resort to learning by reinforcement or imitation.
components. This makes the unsupervised learning a perfect An effective CR architecture is the one that can switch among
candidate for such learning problems in CR applications, as different learning regimes depending on its requirements, the
we shall point out throughout this survey paper. available information and the environment characteristics.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1140 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
CR Learning
Problems
Decision-
Classification making
Policy- Decision-
making rules
Supervised
(Data Unsupervised
Labelling)
Single-agent/ Multi-agent/ Parameter
centralized decentralized optimization
Fig. 4. Typical problems in cognitive radio and their corresponding learning algorithms.
1
N
In this survey, we discuss several learning algorithms that R(g) = Remp (g) = L(yi , g(xi )) , (1)
N i=1
can be used by CRs to achieve different goals. In order to
obtain a better insight on the functions and similarities among where L : Y × Y → R+ is a loss function. Hence,
the presented algorithms, we identify two main problem cate- ANN algorithms find the function g that best fits the data.
gories and show the learning algorithms under each category. However, if the function space G includes too many candidates
The hierarchical organization of the learning algorithms and or the training set is not sufficiently large (i.e. small N ),
their dependence is illustrated in Fig. 4. empirical risk minimization may lead to high variance and
Referring to Fig. 4, we identify two main CR problems (or poor generalization, which is known as overfitting. In order to
tasks) as: prevent overfitting, structural risk minimization can be used,
which incorporates a regularization penalty to the optimization
1) Decision-making. process [101]. This can be done by minimizing the following
2) Feature classification. risk function:
These problems are general in a sense that they cover a wide R(g) = Remp (g) + λC(g) , (2)
range of CR tasks. For example, classification problems arise
in spectrum sensing while decision-making problems arise in where λ controls the bias/variance tradeoff and C is a penalty
determining the spectrum sensing policy, power control or function [101].
adaptive modulation. In contrast with the supervised approaches, unsupervised
classification algorithms do not require labeled training data
The learning algorithms that are presented in this paper and can be classified as being either parametric or non-
can be classified under the above two tasks, and can be parametric. Unsupervised parametric classifiers include the K-
applied under specific conditions, as illustrated in Fig. 4. For means and Gaussian mixture model (GMM) algorithms and
example, the classification algorithms can be split into two require prior knowledge of the number of classes (or clusters).
different categories: Supervised and unsupervised. Supervised On the other hand, non-parametric unsupervised classifiers do
algorithms require training with labeled data and include, not require prior knowledge of the number of clusters and
among others, the ANN and SVM algorithms. The ANN can estimate this quantity from the observed data itself, for
algorithm is based on empirical risk minimization and does example using methods based on the Dirichlet process mixture
require prior knowledge of the observed process distribution, model (DPMM) [72], [104], [105].
as opposed to structural models [101]–[103]. However, SVM Decision-making is another major task that has been widely
algorithms, which are based on structural risk minimization, investigated in CR applications [17], [24]–[26], [35], [38],
have shown superior performance, in particular for small [77], [106]–[110]. Decision-making problems can in turn be
training examples, since they avoid the problem of overfitting split to policy-making and decision rules. Policy-making prob-
[101], [103]. lems can be classified as either centralized or decentralized.
For instance, consider a set of training data denoted as In a policy-making problem, an agent determines its optimal
{(x1 , y1 ), · · · , (xN , yN )} such that xi ∈ X, yi ∈ Y , ∀i ∈ set of actions over a certain time duration, thus defining
{1, · · · , N }. The objective of a supervised learning algorithm an optimal policy (or an optimal strategy in game theory
is to find a function g : X → Y that maximizes a certain terminology). In a centralized scenario with a Markov state,
score function [101]. In ANN, g is defined as the function RL algorithms can be used to obtain optimal solution to the
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1141
bility that the system is in state s at time epoch t + 1, Obviously, the value iteration algorithm requires explicit
when the decision-maker chooses action a ∈ A in state knowledge of the transition probability p(s |s, a). On the other
s ∈ S at time t. Note that, the subscript t might be hand, an RL algorithm, referred to as the Q-learning, was
dropped from pt (s |s, a) if the system is stationary. proposed by Watkins in 1989 [117] to solve the MDP problem
• A real-valued function rtMDP (s, a) defined for state s ∈ without knowledge of the transition probabilities and has
S and action a ∈ A to denote the value at time t of been recently applied to CRs [38], [77], [82], [118]. The Q-
the reward received in period t [76]. Note that, in RL learning algorithm is one of the important temporal difference
literature, the reward function is usually defined as the (TD) methods [74], [117]. It has been shown to converge
delayed reward rt+1 (s, a) that is obtained at time epoch to the optimal policy when applied to single agent MDP
t + 1 after taking action a in state s at time t [74]. models (i.e. centralized control) in [117] and [74]. However, it
can also generate satisfactory near-optimal solutions even for
At each time epoch t, the agent observes the current state
decentralized partially observable MDPs (DEC-POMDPs), as
s and chooses an action a. An optimum policy maximizes
shown in [77]. The one-step Q-learning is defined as follows:
the total expected rewards, which is usually discounted by
a discount factor γ ∈ [0, 1) in case of an infinite time Q(st , at ) ← (1 − α)Q(st , at ) +
horizon. Thus, the objective is to find the optimal policy π
+ α rt+1 (st , at ) + γ max Q(st+1 , a) . (8)
that maximizes the expected discounted return [74]: a
∞
The learned action-value function, Q in (8), directly approx-
R(t) = γ k rt+k+1 (st+k , at+k ) , (3) imates the optimal action-value function Q∗ [74]. However, it
k=0 is required that all state-action pairs need to be continuously
where st and at are, respectively, the state and action at time updated in order to guarantee correct convergence to Q∗ .
t ∈ Z. This can be achieved by applying an ε-greedy policy that
The optimal solution of an MDP can be obtained by using ensures that all state-action pairs are updated with a non-
several methods such as the value iteration algorithm based zero probability, thus leading to an optimal policy [74]. If
on dynamic programming [76]1. Given a certain policy π, the the system is in state s ∈ S, the ε-greedy policy selects action
value of state s ∈ S is defined as the expected discounted a∗ (s) such that:
return if the system starts in state s and follows policy π arg maxa∈A Q(s, a) , with Pr = 1 − ε
thereafter [74], [76]. This value function can be expressed as a∗ (s) = ,
∼ U (A) , with Pr = ε
[74]: (9)
∞
where U (A) is the discrete uniform probability distribution
V (s) = Eπ
π
γ rt+k+1 (st+k , at+k )|st = s , (4)
k
over the set of actions A.
k=0 In [77], the authors applied the Q-learning to achieve
where Eπ {.} denotes the expected value given that the agent interference control in a cognitive network. The problem setup
follows policy π. Similarly, the value of taking action a in state of [77] is illustrated in Fig. 6 in which multiple IEEE 802.22
s under a policy π is defined as the action-value function [74]: WRAN cells are deployed around a Digital TV (DTV) cell
∞ such that the aggregated interference caused by the secondary
Q (s, a) = Eπ
π
γ rt+k+1 (st+k , at+k )|st = s, at = a .
k networks to the DTV network is below a certain threshold. In
k=0
this scenario, the CR (agents) constitutes a distributed network
(5) and each radio tries to determine how much power it can
The value iteration algorithm finds an ε-optimal policy transmit so that the aggregated interference on the primary
assuming stationary rewards and transition probabilities (i.e. receivers does not exceed a certain threshold level.
rt (s, a) = r(s, a) and pt (s |s, a) = p(s |s, a)). The algorithm In this system, the secondary base stations form the learning
initializes a v 0 (s) for each s ∈ S arbitrarily and iteratively agents that are responsible for identifying the current envi-
updates v n (s) (where v n (s) is the estimated value of state s ronment state, selecting the action based on the Q-learning
after the n-th iteration) for each s ∈ S as follows [76]: methodology and executing it. The state of the i-th WRAN
⎧ ⎫
⎨ ⎬ network at time t consists of three components and is defined
v n+1 (s) = max r(s, a) + γ p(j|s, a)v n (j) . (6) as [77]:
a∈A ⎩ ⎭ sit = {Iti , dit , pit } , (10)
j∈S
The algorithm stops when v n+1 − v n < ε 1−γ where Iti is a binary indicator specifying whether the sec-
2γ and the ε-
optimal decision d (s) of each state s ∈ S is defined as: ondary network generates interference to the primary network
⎧ ⎫ above or below the specified threshold, dit denotes an estimate
⎨ ⎬ of the distance between the secondary user and the interference
d (s) = arg max r(s, a) + γ p(j|s, a)v n+1 (j) . (7) contour, and pit denotes the current power at which the
a∈A ⎩ ⎭
j∈S secondary user i is transmitting. In the case of full state
observability, the secondary user has complete knowledge of
1 There are other algorithms that can be applied to find the optimal policy of
the state of the environment. However, in a partially observable
an MDP such as policy iteration and linear programming methods. Interested environment, the agent i has only partial information of the
readers are referred to [76] for additional information regarding these methods. actual state and uses a belief vector to represent the probability
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1143
look for optimal policies in the policy space itself, without designed to obtain reasonable approximations of the gradient.
having to estimate the actual states of the systems [90], [91]. In Indeed, several approaches have been proposed to estimate the
particular, by adopting policy gradient algorithms, the policy gradient policy vector, mainly in robotics applications [119],
vector can be updated to reach an optimal solution (or a local [120]. Three different approaches have been considered in
optimum) in non-Markovian environments. [120] for policy gradient estimation:
The value-iteration approach has several other limitations as 1) Finite difference (FD) methods.
well: First, it is restricted to deterministic policies. Second, any 2) Vanilla policy gradient (VPG) methods.
small changes in the estimated value of an action can cause 3) Natural policy gradient (NG) methods.
that action to be, or not to be selected [90]. This would affect
Finite difference (FD) methods, originally used in stochastic
the optimality of the resulting policy since optimal actions
simulations literature, are among the oldest policy gradient
might be eliminated due to an underestimation of their value
approaches. The idea is based on changing the current policy
functions.
parameter θk by small perturbations δθi and computing δηi =
On the other hand, the gradient-policy approach has shown
η(θk + δθi ) − η(θk ). The policy gradient ∇η(θ) can be thus
promising results, for example, in robotics applications [119],
estimated as:
[120]. Compared to value-iteration methods, the gradient-
policy approach requires fewer parameters in the learning −1
gF D = ΔΘT ΔΘ ΔΘΔη , (15)
process and can be applied in model-free setups not requiring
prefect knowledge of the controlled system. where ΔΘ = [δθ1 , · · · , δθI ]T , Δη = [δη1 , · · · , δηI ]T and
The policy-search approach can be illustrated by the fol- I is the number of samples [119], [120]. Advantages of this
lowing overview of policy-gradient algorithms from [91]. We
approach is that it is straightforward to implement and does not
consider a class of stochastic policies that are parameterized introduce significant noise to the system during exploration.
by θ ∈ RK . By computing the gradient with respect to However, the gradient estimate can be very sensitive to per-
θ of the average reward, the policy could be improved by
turbations (i.e. δθi ) which may lead to bad results [120].
adjusting the parameters in the gradient direction. To be Instead of perturbing the parameter θk of a deterministic
concrete, assume r(X) to be a reward function that depends policy u = π(x) (with u being the action and x being
on a random variable X. Let q(θ, x) be the probability of the
the state), the VPG approach assumes a stochastic policy
event {X = x}. The gradient with respect to θ of the expected u ∼ π(u|x) and obtains an unbiased gradient estimate [120].
performance η(θ) = E{r(X)} can be expressed as:
However, in using the VPG method, the variance of the gradi-
∇q(θ, x) ent estimate depends on the squared average magnitude of the
∇η(θ) = E r(X) . (12)
q(θ, x) reward, which can be very large. In addition, the convergence
of the VPG to the optimal solution can be very slow, even
An unbiased estimate of the gradient can be obtained via
with an optimal baseline [120]. The NG approach which leads
simulation by generating N independent identically distributed
to fast policy gradient algorithms can alleviate this problem.
(i.i.d.) random variables X1 , · · · , XN that are distributed
Natural gradient approaches use the Fisher information F (θ)
according to q(θ, x). The unbiased estimate of ∇η(θ) is thus
to characterize the information about the policy parameters
expressed as:
θ that is contained in the observed path τ [120]. A path (or
1 a trajectory) τ = [x0:H , u0:H ] is defined as the sequence of
N
ˆ ∇q(θ, Xi )
∇η(θ) = r(Xi ) . (13) states and actions, where H denotes the horizon which can
N i=1 q(θ, Xi )
be infinite [119]. Thus, the Fisher information F (θ) can be
By the law of large numbers, ∇η(θ) ˆ → ∇η(θ) with expressed as:
probability one. Note that the quantity ∇q(θ,X i)
q(θ,Xi ) is referred F (θ) = E ∇θ log p(τ |θ)∇θ log p(τ |θ)T , (16)
to as the likelihood ratio or the score function. By having an
estimate of the reward gradient, the policy parameter θ ∈ RK where p(τ |θ) is the probability of trajectory τ , given certain
can be updated by following the gradient direction, such that: policy parameter θ. For a given policy change δθ, there is an
information loss of lθ (δθ) ≈ δθT F (θ)δθ, which can also be
θk+1 ← θk + αk ∇η(θ) , (14) seen as the change in path distribution p(τ |θ). By searching
for some step size αk > 0. for the policy change δθ that maximizes the expected return
Authors in [119], [120] identify two major steps when η(θ + δθ) for a constant information loss lθ (δθ) ≈ ε, the
performing policy gradient methods: algorithms searches for the highest return value on an ellipse
1) A policy evaluation step in which an estimate of the around the current parameter θ and then goes in the direction
gradient ∇η(θ) of the expected return η(θ) is obtained, of the highest values. More formally, the direction of the
given a certain policy πθ . steepest ascent on the ellipse around θ can be expressed as
2) A policy improvement step which updates the policy [120]:
parameter θ through steepest gradient ascent θk+1 = δθ = arg max δθT ∇θ η(θ) = F −1 (θ)∇θ η(θ) . (17)
θk + αk ∇η(θ). δθ s.t. lθ (δθ)=ε
Note that, estimating the gradient ∇η(θ) is not straight- This algorithm is further explained in [120] and can be easily
forward, especially in the absence of simulators that generate implemented based on the Natural Actor-Critic algorithms
the Xi ’s. To resolve this problem, special algorithms can be [120].
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1145
By comparing the above three approaches, the authors in A non-cooperative game can be classified as either a
[120] showed that NG and VPG methods are considerably complete or an incomplete information game. In a complete
faster and result in better performance, compared to FD. How- information game, each player can observe the information
ever, FD has the advantage of being simpler and applicable in of other players such as their payoffs and their strategies.
more general situations. On the other hand, in an incomplete information game, this
information is not available to other players. A game with
incomplete information can be modeled as a Bayesian game
C. Decentralized policy-making: Game Theory
in which the game outcomes can be estimated based on
Game theory [121] presents a suitable platform for mod- Bayesian analysis. A Bayesian Nash equilibrium is defined
eling rational behavior among CRs in CRNs. There is a rich for the Bayesian game, similar to the Nash equilibrium in the
literature on game theoretic techniques in CR, as can be found complete information game [115].
in [11], [122]–[132]. A survey on game theoretic approaches In addition, a game can also be classified as either static or
for multiple access wireless systems can be found in [115]. dynamic. In a static game, each player takes its actions without
Game theory [121] is a mathematical tool that attempts to knowledge of the strategies taken by the other players. This
implement the behavior of rational entities in an environment is denoted as a one-shot game which ends when actions of
of conflict. This branch of mathematics has primarily been all players are taken and payoffs are received. In a dynamic
popular in economics, and has later found its way into game, however, a player selects an action in the current stage
biology, political science, engineering and philosophy [115]. based on the knowledge of the actions taken by the other
In wireless communications, game theory has been applied players in the current or previous stages. A dynamic game is
to data communication networking, in particular, to model also called a sequential game since it consists of a sequence
and analyze routing and resource allocation in competitive of repeated static games. The common equilibrium solution
environments. in dynamic games is the subgame perfect Nash equilibrium
A game model consists of several rational entities that which represents a Nash equilibrium of every subgame in the
are denoted as the players. Assuming a game model G = original game [115].
(N , (Ai )i∈N , (Ui )i∈N ), where N = {1, · · · , N } denotes the 2) Applications of Game Theory to Cognitive Radios:
set of N players and each player i ∈ N has a set Ai of avail- Several types of games have been adapted to model different
able actions and a utility function Ui . Let A = A1 × · · · × AN situations in CRNs [98]. For example, supermodular games
be the set of strategy profiles of all players. In general, the (the games having the following important and useful prop-
utility function of an individual player i ∈ N depends on erty: there exists at least one pure strategy Nash equilibrium)
the actions taken by all the players involved in the game and have been used for distributed power control in [133], [134]
is denoted as Ui (ai , a−i ), where ai ∈ Ai is an action (or and for rate adaptation in [135]. Repeated games were applied
strategy) of player i and a−i ∈ A−i is a strategy profile of for DSA by multiple secondary users that share the same
all players except player i. Each player selects its strategy in spectrum hole in [136]. In this context, repeated games are
order to maximize its utility function. A Nash equilibrium of a useful in building reputations and applying punishments in
game is defined as a point at which the utility function of each order to reinforce a certain desired outcome. The Stackelberg
player does not increase if the player deviates from that point, game model can be used as a model for implementing CR
given that all the other players’ actions are fixed. Formally, behavior in cooperative spectrum leasing where the primary
a strategy profile (a∗1 , · · · , a∗N ) ∈ A is a Nash equilibrium if users act as the game-leaders and secondary cognitive users
[112]: as the followers [50].
Auctions are one of the most popular methods used for
Ui (a∗i , a−i ) ≥ Ui (ai , a−i ), ∀i ∈ N , ∀ai ∈ Ai . (18) selling a variety of items, ranging from antiques to wireless
spectrum. In auction games the players are the buyers who
A key advantage of applying game theoretic solutions to must select the appropriate bidding strategy in order to max-
CR protocols is in reducing the complexity of adaptation algo- imize their perceived utility (i.e., the value of the acquired
rithms in large cognitive networks. While optimal centralized items minus the payment to the seller). The concept of auction
control can be computationally prohibitive in most CRNs, due games has successfully been applied to cooperative dynamic
to communication overhead and algorithm complexity, game spectrum leasing (DSL) in [37], [137], as well as to spectrum
theory presents a distributed platform to handle such situations allocation problems in [138]. The basics of the auction games
[98]. Another justification for applying game theoretic ap- and the open challenges of applying auction games to the field
proaches to CRs is the assumed cognition in the CR behavior, of spectrum management are discussed in [139].
which induces rationality among CRs, similar to the players Stochastic games (or Markov games) can be used to model
in a game. the greedy selfish behavior of CRs in a CRN, where CRs
1) Game Theoretic Approaches: There are two major game try to learn their best response and improve their strategies
theoretic approaches that can be used to model the behavior of over time [140]. In the context of CRs, stochastic games
nodes in a wireless medium: Cooperative and non-cooperative are dynamic, competitive games with probabilistic actions
games. In a non-cooperative game, the players make rational played by secondary spectrum users. The game is played
decisions considering only their individual payoff. In a co- in a sequence of stages. At the beginning of each stage,
operative game, however, players are grouped together and the game is in a certain state. The secondary users choose
establish an enforceable agreement in their group [115]. their actions, and each secondary user receives a reward that
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1146 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
depends on both its current state and its selected actions. The in [141], the communication overhead among the CR users
game then moves to the next stage having a new state with is reduced. Furthermore, the model in [141] provides an
a certain probability, which depends on the previous state alternative solution to opportunistic spectrum access schemes
as well as the actions selected by the secondary users. The proposed in [107], [108] that do not consider the interactions
process continues for a finite or infinite number of stages. among multiple secondary users in a partially observable MDP
The stochastic games are generalizations of repeated games (POMDP) framework [141].
that have only a single state. Thus, learning in a game theoretic framework can help CRs
to adapt to environment variations given a certain uncertainty
3) Learning in Game Theoretic Models: There are sev- about the other users’ strategies. Therefore, it provides a
eral learning algorithms that have been proposed to estimate potential solution for multi-agent learning problems under
unknown parameters in a game model (e.g. other players’ partial observability assumptions.
strategies, environment states, etc.). In particular, no-regret
learning allows initially uninformed players to acquire knowl-
edge about their environment state in a repeated game [111]. D. Decision rules under uncertainty: Threshold-learning
This algorithm does not require prior knowledge of the number
of players nor the strategies of other players. Instead, each A CR may be implemented on a mobile device that changes
player will learn a better strategy based on the rewards location over time and switches transmissions among several
obtained from playing each of its strategies [111]. channels. This mobility and multi-band/multi-channels oper-
ability may pose a major challenge for CRs in adapting to
The concept of regret is related to the benefit a player feels
their RF environments. A CR may encounter different noise or
after taking a particular action, compared to other possible
interference levels when switching between different bands or
actions. This can be computed as the average reward the
when moving from one place to another. Hence, the operating
player gets from a particular action, averaged over all other
parameters (e.g. test thresholds and sampling rate) of CRs need
possible actions that could be taken instead of that particular
to be adapted with respect to each particular situation. More-
action. Actions resulting in lower regret are updated with
over, CRs may be operating in unknown RF environments and
higher weights and are thus selected more frequently [111]. In
may not have perfect knowledge of the characteristics of the
general, no-regret learning algorithms help players to choose
other existing primary or secondary signals, requiring special
their policies when they do not know the other players’ ac-
learning algorithms to allow the CR to explore and adapt to
tions. Furthermore, no-regret learning can adapt to a dynamic
its surrounding environment. In this context, special types of
environment with little system overhead [111].
learning can be applied to directly learn the optimal values of
No-regret learning was applied in [111] to allow a CR to certain design and operation parameters.
update both its transmission power and frequencies simul-
Threshold learning presents a technique that permits such
taneously. In [113], it was used to detect malicious nodes
dynamic adaptation of operating parameters to satisfy the per-
in spectrum sensing whereas in [112] no-regret learning
formance requirements, while continuously learning from the
was used to achieve a correlated equilibrium in opportunis-
past experience. By assessing the effect of previous parameter
tic spectrum access for CRs. Assuming the game model
values on the system performance, the learning algorithm op-
G = (N , (Ai )i∈N , (Ui )i∈N ) defined above, in a correlated
timizes the parameters values to ensure a desired performance.
equilibrium, a strategy profile (a1 , · · · , aN ) ∈ A is chosen
For example, in considering energy detection, after measuring
randomly according to a certain probability distribution p
the energy levels at each frequency, a CR decides on the
[112]. A probability distribution p is a correlated strategy, if
occupancy of a certain frequency band by comparing the
and only if, for all i ∈ N , ai ∈ Ai , a−i ∈ A−i [112]:
measured energy levels to a certain threshold. The threshold
levels are usually designed based on Neyman-Pearson tests in
p(ai , a−i ) [Ui (ai , a−i ) − Ui (ai , a−i )] ≤ 0, ∀ai ∈ Ai . order to maximize the detection probability of primary signals,
a−i ∈A−i while satisfying a constraint on the false alarm. However, in
(19) such tests, the optimal threshold depends on the noise level.
Note that, every Nash equilibrium is a correlated equilibrium An erroneous estimation of the noise level might cause sub-
and Nash equilibria correspond to the special case where optimal behavior and violation of the operation constraints
p(ai , a−i ) is a product of each individual player’s probability (for example, exceeding a tolerable collision probability with
for different actions, i.e. the play of the different players is primary users). In this case, and in the absence of perfect
independent [112]. Compared to the non-cooperative Nash knowledge about the noise levels, threshold-learning algo-
equilibrium, the correlated equilibrium in [112] was shown rithms can be devised to learn the optimal threshold values.
to achieve better performance and fairness. Given each choice of a threshold, the resulting false alarm
Recently, [141] proposed a game-theoretic stochastic learn- rate determines how the test threshold should be regulated
ing solution for opportunistic spectrum access when the chan- to achieve a desired false alarm probability. An example of
nel availability statistics and the number of secondary users application of threshold learning can be found in [75] where
are unknown a priori. This model attempts to resolve non- a threshold learning algorithm was derived for optimizing
feasible opportunistic spectrum access solution which requires spectrum sensing in CRs. The resulting algorithm was shown
prior knowledge of the environment and the actions taken by to converge to the optimal threshold that satisfies a given false
the other users. By applying the stochastic learning solution alarm probability.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1147
finite dimensional Dirichlet distribution with parameters non-parametric Bayesian classification problems in which the
(α0 G0 (A1 ), · · · , α0 G0 (Ar )), where α0 > 0 [104]. We de- number of clusters is unknown a priori (i.e. allowing for
note: infinite number of clusters), with the infinite discrete support
(i.e. {φk }∞
k=1 being the set of clusters. However, due to the
(G(A1 ), · · · , G(Ar )) ∼ Dir(α0 G0 (A1 ), · · · , α0 G0 (Ar )) , infinite sum in G, it may not be practical to construct G
(20) directly by using this approach in many applications. An
where G ∼ DP (α0 , G0 ), denotes that the probability measure alternative approach to construct G is by using either the
G is drawn from the Dirichlet process DP (α0 , G0 ). In other Polya urn model [143] or the Chinese Restaurant Process
words, G is a random probability measure whose distribution (CRP) [144]. The CRP is a discrete-time stochastic process. A
is given by the Dirichlet process DP (α0 , G0 ) [104]. typical example of this process can be described by a Chinese
1) Construction of the Dirichlet process: Teh [104] de- restaurant with infinitely many tables and each table (cluster)
scribes several ways of constructing the Dirichlet process. A having infinite capacity. Each customer (feature point) that
first method is a direct approach that constructs the random arrives at the restaurant (RF spectrum) will choose a table
probability distribution G based on the stick-breaking method. with a probability proportional to the number of customers on
The stick-breaking construction of G can be summarized as that table. It may also choose a new table with a certain fixed
follows [104]: probability.
1) Generate independent i.i.d. sequences {πk }∞
k=1 and
A second approach to constructing a Dirichlet process
{φk }∞
k=1 such that
does not define G explicitly. Instead, it characterizes the
distribution of the drawings θ of G. Note that G is discrete
πk |α0 , G0 ∼ Beta(1, α0 ) with probability 1. For example, the Polya urn model [143]
, (21)
φk |α0 , G0 ∼ G0 does not construct G directly, but it characterizes the draws
where Beta(a, b) is the beta distribution whose prob- from G. Let θ1 , θ2 , · · · be i.i.d. random variables distributed
ability density function (pdf) is given by f (x, a, b) = according to G. These random variables are independent,
1
xa−1 (1−x)b−1
. given G. However, if G is integrated out, θ1 , θ2 , · · · are no
ua−1 (1−u)b−1 du
0
more conditionally independent and they can be characterized
2) Define πk = πk k−1 l=1 (1 − πl ). We can write π = as:
(π1 , π2 , · · · ) ∼ GEM (α0 ), where GEM stands for
K
mk α0
Griffiths, Engen and McCloskey [104]. The GEM (α) θi |{θj }i−1
j=1 , α0 , G0 ∼ δφ + G0 ,
process generates the vector π as described above, given i − 1 + α0 k i − 1 + α0
k=1
a parameter α0 in (21). (22)
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1148 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
N (25)
θi |G ∼ G . (23) where B(yi ) = A(yi ) + f (y ), h(θ |y ) =
l=1,l = i θ l i i i
α0
A(yi ) f θ i (y i )G0 (θ i ) and A(y) = α0 f θ (y)G 0 (θ)dθ.
2) Dirichlet Process Mixture Model: The Dirichlet process
In order to illustrate this clustering method, consider a
makes a perfect candidate for non-parametric classification
simple example summarizing the process. We assume a set
problems through the DPMM. The DPMM imposes a non-
of mixture components θ ∈ R. Also, we assume G0 (θ) to
parametric prior on the parameters of the mixture model [104].
be uniform over the range [θmin , θmax ]. Note that this is a
The DPMM can be defined as follows:
worst-case scenario assumption whenever there is no prior
⎧ knowledge of the distribution of θ, except its range. Let
⎨ G ∼ DP (α0 , G0 ) (y−θ)2
θi |G ∼ G , (24)
1
fθ (y) = √2πσ 2
e− 2σ2 .
⎩ Hence,
yi |θi ∼ f (θi )
α0 θmin − y θmax − y
where θi ’s denote the mixture components and the yi is drawn A(y) = Q −Q (26)
θmax − θmin σ σ
according to this mixture model with a density function f
given a certain mixture component θi . and
3) Data clustering based on the DPMM and the Gibbs (yi −θi )2
sampling: Consider a sequence of observations {yi }N i=1 and h(θi |yi ) =
1
B √2πσ 2
e− 2σ2 if θmin ≤ θi ≤ θmax ,
assume that these observations are drawn from a mixture 0 otherwise
model. If the number of mixture components is unknown, (27)
it is reasonable to assume a non-parametric model, such as where B = 1 . Initially, we set θi = yi
θ −yi θmax −yi
Q minσ −Q σ
the DPMM. Thus, the mixture components θi are drawn
for all i ∈ {1, · · · , N }. The algorithm is described in Algo-
from G ∼ DP (α0 , G0 ), where G can be expressed as
rithm 1.
G= ∞ k=1 πk δφk , φk ’s are the unique values of θi , and πk are
If the observation points yi ∈ Rk (with k > 1), the
their corresponding probabilities. Denote y = (y1 , · · · , yN ).
distribution of h(θi |yi ) may become too complicated to be
The problem is to estimate the mixture component θ̂i for used in the sampling process of θi ’s. In [116], if G0 (θ) is
each observation yi , for all i ∈ {1, · · · , N }. This can be constant in a large area around yi , h(θ|yi ) was shown to be
achieved by applying the Gibbs sampling method proposed approximated by the Gaussian distribution (assuming that the
in [116] which has been applied for various unsupervised observation pdf fθ (yi ) is Gaussian). Thus, assuming a large
clustering problems, such as speaker clustering problem in uniform prior distribution on θ, we may approximate h(θ|y)
[145]. The Gibbs sampling is a technique for generating by a Gaussian pdf so that (27) becomes:
random variables from a (marginal) distribution indirectly,
without having to calculate the density. As a result, by using te h(θi |yi ) = N (yi , Σ) , (28)
Gibbs sampling, one can avoid difficult calculations, replacing where Σ is the covariance matrix.
them instead with a sequence of easier calculations. Although In order to illustrate this approach in a multidimensional
the roots of the Gibbs sampling can be traced back to at least scenario, we may generate a Gaussian mixture model having
Metropolis et al. [146], the Gibbs sampling perhaps became 4 mixture components. The mixture components have different
more popular after the paper of Geman and Geman [147], who means in R2 and have an identity covariance matrix. We will
studied image-processing models. assume that the covariance matrix is known.
In the Gibbs sampling method proposed in [116], the We plot in Fig. 8 the results of the clustering algorithm
estimates θ̂i is sampled from the conditional distribution of θi , based on DPMM. Three of the clusters were almost perfectly
given all the other feature points and the observation vector identified, whereas the forth cluster was split into three parts.
y. By assuming that {yi }N i=1 are distributed according to the The main advantage of this technique is its ability for learning
DPMM in (24), the conditional distribution of θi was obtained the number of clusters from the data itself, without any prior
in [116] to be knowledge. As opposed to heuristic or supervised classifi-
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1149
DPMM classifcation with Gibbs sampling with σ= 1, α = 2 after 20000 iterations should be handled by the embedded flexibility offered by non-
0
30 parametric learning approaches.
The advantages of the Dirichlet process-based learning tech-
nique in [148] is that it does not rely on training data, making
Second coordinate of the feature vector
mum a posteriori (MAP) detection can be applied to a cluster status of the network affect its performance on different
center μc to estimate the wireless system that it belongs to. channels. In particular, an implementation of the proposed
However, the classification of feature points into clusters can Cognitive Controller for dynamic channel selection in IEEE
be done based on the CRP. 802.11 wireless networks was presented. Performance eval-
The classification of a feature point into a certain cluster is uation carried out on an IEEE 802.11 wireless network de-
made based on the Gibbs sampling applied to the CRP. The ployment demonstrated that the Cognitive Controller is able
algorithm fixes the cluster assignments of all other feature to effectively learn how the network performance is affected
points. Given that assignment, it generates a cluster index for by changes in the environment, and to perform dynamic
the current feature point. This sampling process is applied channel selection thereby providing significant throughput
to all the feature points separately until certain convergence enhancements.
criterion is satisfied. Other examples of the CRP-based feature In [153], an application of a Feedbackward ANN in con-
classification can be found in speaker clustering [145] and document clustering applications [149].

B. Supervised Classification Methods in Cognitive Radios

Unlike the unsupervised learning techniques discussed in the previous section, which may be used in alien environments without any prior knowledge, supervised learning techniques can generally be used in familiar/known environments with prior knowledge about the characteristics of the environment. In the following, we introduce some of the major supervised learning techniques that have been applied to classification tasks in CRs.

1) Artificial Neural Network: The ANN has been motivated by the recognition that the human brain computes in an entirely different way compared to conventional digital computers [150]. A neural network is defined to be "a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use" [150]. An ANN resembles the brain in two respects [150]: 1) knowledge is acquired by the network from its environment through a learning process, and 2) interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge. Since a certain target value (i.e. a label) is required during the training process, neural networks are considered supervised learning algorithms.

Some of the most beneficial properties and capabilities of ANNs include: 1) nonlinear fitting of underlying physical mechanisms, 2) the ability to adapt to minor changes in the surrounding environment, and 3) providing information about the confidence in the decision made. However, the disadvantages of ANNs are that they require training under many different environment conditions, and their training outcomes may depend crucially on the choice of initial parameters.

Various applications of ANNs to CRs can be found in recent literature [102], [151]–[155]. The authors in [151], for example, proposed the use of Multilayered Feedforward Neural Networks (MFNNs) as a technique to synthesize performance evaluation functions in CRs. The benefit of using MFNNs is that they provide a general-purpose black-box model of the performance as a function of the measurements collected by the CR; furthermore, this characterization can be obtained and updated by a CR at run-time, thus effectively achieving a certain level of learning capability. The authors in [151] also demonstrated in several IEEE 802.11 based environments how these modeling capabilities can be used for optimizing the configuration of a CR.

In [152], the authors proposed an ANN-based cognitive engine that learns how environmental measurements and the network status affect performance, and applied it to dynamic channel selection. In [153], an ANN model used in conjunction with cyclostationarity-based spectrum sensing was presented to perform spectrum sensing. The results showed that the proposed approach is able to detect signals at considerably low signal-to-noise ratio (SNR) values. In [102], the authors designed a channel status predictor using an MFNN model. The authors argued that their proposed MFNN-based prediction is superior to the HMM-based approaches, pointing out that the HMM-based approaches require a huge memory space to store a large number of past observations, with high computational complexity.

In [154], the authors proposed a methodology for spectrum prediction by modeling licensed-user features as a multivariate chaotic time series, which is then input to an ANN that predicts the evolution of the RF time series to decide if the unlicensed user can exploit the spectrum band. Experimental results showed a similar trend between predicted and observed values. This spectrum evolution prediction method exploits cyclostationary signal features to construct an RF multivariate time series that contains more information than a univariate time series, in contrast to most of the previously suggested modeling methodologies, which focused on univariate time series prediction [156].

To illustrate the operation of ANNs in CR contexts, we present the model proposed in [78] and describe the main steps in the implementation of ANNs. In particular, [78] considers a multilayer perceptron (MLP) neural network which maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, which is fully connected from one layer to the next [78]. Except for the input nodes, each node in the MLP is a neuron with a nonlinear activation function that computes a weighted sum of the previous layer's outputs (denoted as the activation). An example of one of the most popular activation functions used in ANNs is the sigmoid function:

f(a) = 1 / (1 + e^{-a}). (29)

The ANN proposed in [78] has an input layer, an output layer and multiple hidden layers. Note that having additional hidden layers improves the nonlinear performance of the ANN in terms of classifying linearly non-separable data. However, adding more hidden layers makes the network more complicated and may require longer training time.

In the following, we consider an MLP network and let y_j^l be the output of the j-th neuron in the l-th layer. Denote also by w_{ji}^l the weight between the j-th neuron in the l-th layer and the i-th neuron in the (l-1)-th layer. The output y_j^l
can be calculated from the outputs of the previous layer as:

y_j^l = f( Σ_i w_{ji}^l y_i^{l-1} ). (30)

The network is trained by adjusting the weights so as to minimize the mean squared error (MSE) over the K output neurons:

MSE = (1/K) Σ_{k=1}^{K} (t_k - o_k)^2, (31)

where t_k and o_k denote, respectively, the target and actual outputs of the k-th output neuron. The weight updates of the back-propagation algorithm are driven by the local error terms

δ_j^l = o_j (t_j - o_j)(1 - o_j), if l is the output layer;
δ_j^l = y_j^l (1 - y_j^l) Σ_k δ_k^{l+1} w_{kj}^{l+1}, if l is a hidden layer. (32)
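To make (29)-(32) concrete, the following minimal NumPy sketch (an illustration of ours, not code from [78]; the single hidden layer and the learning rate are assumptions) performs one forward pass and one back-propagation weight update:

import numpy as np

def sigmoid(a):
    # Sigmoid activation of (29): f(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, t, W1, W2, lr=0.1):
    # Forward pass, eq. (30): each neuron applies f to a weighted sum
    y1 = sigmoid(W1 @ x)   # hidden-layer outputs y_j^1
    o = sigmoid(W2 @ y1)   # output-layer outputs o_k
    # Local error terms of (32): output layer first, then hidden layer
    delta2 = o * (t - o) * (1.0 - o)
    delta1 = y1 * (1.0 - y1) * (W2.T @ delta2)
    # Delta-rule weight updates that reduce the MSE of (31)
    W2 += lr * np.outer(delta2, y1)
    W1 += lr * np.outer(delta1, x)
    return W1, W2, np.mean((t - o) ** 2)   # updated weights and eq. (31)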
The authors in [78] used the above-described MLP neural network to implement a learner in a cognitive engine. Assuming a WiMax configurable radio technology, the learner is able to choose a certain modulation mode according to the SNR, such that a certain bit-error rate (BER) will be achieved. Thus, the inputs of the neural network consist of the code rate and SNR values and the output is the resulting BER. By supplying training data to the neural network, the cognitive engine is trained to identify the BER that results from a certain choice of modulation, given a certain SNR level. By comparing the performance of neural networks of different scales, the simulation results in [78] showed that increasing the number of hidden layers reduces the speed of convergence but leads to a smaller MSE. However, more training data are required for a larger number of hidden layers. Thus, given a certain set of training data, a trade-off must be made between the speed of convergence and the convergence accuracy of the neural network.
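Reusing the train_step sketch above, such a learner could be trained on (SNR, code rate) -> BER examples roughly as follows; the toy BER formula and the layer sizes here are placeholders, not the training setup of [78]:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(8, 2))   # 8 hidden neurons, 2 inputs
W2 = rng.normal(scale=0.5, size=(1, 8))   # 1 output neuron (BER estimate)
for step in range(5000):
    snr, rate = rng.uniform(0.0, 20.0), rng.choice([0.5, 0.75])
    ber = np.exp(-0.5 * snr * rate)       # invented stand-in BER curve
    x, t = np.array([snr / 20.0, rate]), np.array([ber])
    W1, W2, mse = train_step(x, t, W1, W2)
print(mse)   # the MSE of (31) decreases over the course of training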
2) Support Vector Machine: The SVM, developed by Vapnik and others [157], has been used for many machine learning tasks such as pattern recognition and object classification. The SVM is characterized by the absence of local minima, the sparseness of the solution and the capacity control obtained by acting on the margin, or on other dimension-independent quantities such as the number of support vectors [157]. SVM-based techniques have achieved superior performance in a wide variety of real-world problems due to their generalization ability and robustness against noise and outliers [158].

The basic idea of SVMs is to map the input vectors into a high-dimensional feature space in which they become linearly separable. This mapping from the input vector space to the feature space is a non-linear mapping achieved by using kernel functions. Depending on the application, different types of kernel functions can be used. A common choice for classification problems is the Gaussian kernel, which is a polynomial kernel of infinite degree. In performing classification, a hyperplane which allows for the largest generalization in this high-dimensional space is found. This is the so-called maximal margin classifier [159]. Note that the margin is defined as the distance from a separating hyperplane to the closest data points. As shown in Fig. 9, there could be many possible separating hyperplanes between the two classes of data, but only one of them allows for the maximum margin. The corresponding closest data points are named support vectors, and the hyperplane allowing for the maximum margin is called an optimal separating hyperplane. The interested reader is referred to [79], [160], [161] for insightful discussions on SVMs.

An SVM-based classifier was described in [161] for signal classification in CRs. The classifier in [161] assumed a training set {(x_i, y_i)}_{i=1}^{l} with x ∈ R^N and y ∈ {-1, 1}. The objective is to find a hyperplane:

w^T ϕ(x) + b = 0, (33)

where ϕ can be a non-linear function that maps x into a higher-dimensional Hilbert space [160], w is a weight vector and b is a scalar parameter. In general, it is not possible to obtain an expression for the mapping function ϕ. However, this function can be characterized by a kernel function K(x_i, x_j) and, as it fortunately turns out, the kernel function is sufficient to optimize the parameters w and b in (33) [160].

The hyperplane in (33) is assumed to separate the data into two classes such that the distance between the closest points of each class to the hyperplane is maximized. This can be achieved by minimizing the norm ||w||^2 [160]. In order to solve the optimization problem, the slack variables {ξ_i, i = 1, · · · , l} are introduced and the optimization problem can be formulated as [161]:

min_{w,b,ξ_i} (1/2) w^T w + C Σ_{i=1}^{l} ξ_i (34)
s.t. y_i (w^T ϕ(x_i) + b) ≥ 1 - ξ_i, ∀i = 1, · · · , l (35)
ξ_i ≥ 0, ∀i = 1, · · · , l (36)
where C is the penalty parameter that controls the training error.

The Lagrangian of the above optimization problem can be written as:

L = (1/2) ||w||^2 + C Σ_{i=1}^{l} ξ_i - Σ_{i=1}^{l} β_i ξ_i - Σ_{i=1}^{l} α_i [ y_i (w^T ϕ(x_i) + b) - 1 + ξ_i ],

where α_i, β_i ≥ 0 are the Lagrange multipliers. By computing the derivatives with respect to w, b and ξ_i, the dual representation of the optimization problem can be expressed as [161]:

max_{α_1,···,α_l} Σ_{i=1}^{l} α_i - (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j K(x_i, x_j)
s.t. 0 ≤ α_i ≤ C, ∀i = 1, · · · , l
Σ_{i=1}^{l} y_i α_i = 0

where K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j) is the kernel function. In this case, the decision function (i.e. the learning machine [160]) is computed as:

f(x) = sgn( Σ_{i=1}^{l} α_i y_i K(x_i, x) + b ). (37)
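Given a solved dual (34)-(36), the decision rule (37) is straightforward to evaluate; the short sketch below does so with a Gaussian (RBF) kernel, where the support vectors, multipliers α_i, labels y_i and bias b are assumed to be supplied by the training stage:

import numpy as np

def rbf_kernel(xi, x, gamma=1.0):
    # Gaussian kernel: K(x_i, x) = exp(-gamma * ||x_i - x||^2)
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def svm_decision(x, support_vecs, alphas, labels, b):
    # Eq. (37): f(x) = sgn( sum_i alpha_i y_i K(x_i, x) + b )
    s = sum(a * y * rbf_kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vecs))
    return np.sign(s + b)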
Other applications of SVMs to CR can be found in current literature, including [65], [79], [103], [158], [161]–[167]. Most of these applications of the SVM in the CR context, however, have been for performing signal classification.

In [164], for example, a MAC protocol classification scheme was proposed to classify contention-based and control-based MAC protocols in an unknown primary network based on SVMs. To perform the classification in an unknown primary network, the mean and variance of the received power are chosen as two features for the SVM. The SVM is embedded in a CR terminal of the secondary network. A TDMA and a slotted Aloha network were set up as the primary networks. Simulation results showed that the TDMA and slotted Aloha MAC protocols could be effectively classified by the CR terminal and that the correct classification rate was proportional to the transmission rate of the primary networks, where the transmission rate of a primary network is defined as the new packet generating/arriving probability in each time slot. The reason for the increase in the correct classification rate when the transmission rate increases is the following: for the slotted Aloha network, a higher transmission rate brings a higher collision probability, and thus a higher instantaneous received power captured by a CR terminal; for the TDMA network, however, there is no relation between the transmission rate and the instantaneous captured received power. Therefore, when the transmission rates of both primary networks increase, it becomes easier for a CR terminal to differentiate TDMA from slotted Aloha.
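A minimal sketch of such a two-feature SVM classifier using scikit-learn (our choice of library; the synthetic feature distributions are illustrative assumptions, not the simulation setup of [164]):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic (mean, variance) received-power features for the two MAC types
aloha = rng.normal([1.0, 0.8], 0.2, size=(200, 2))   # slotted Aloha: label +1
tdma = rng.normal([1.0, 0.3], 0.2, size=(200, 2))    # TDMA: label -1
X = np.vstack([aloha, tdma])
y = np.hstack([np.ones(200), -np.ones(200)])
clf = SVC(kernel="rbf", C=1.0).fit(X, y)   # internally solves (34)-(36)
print(clf.predict([[1.0, 0.7]]))           # classify a new observation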
SVM classifiers are not restricted to the binary setting of the previous example; they can also easily be used as multi-class classifiers by treating a K-class classification problem as K two-class problems. For example, in [165] the authors presented a study of multi-class signal classification based on automatic modulation classification (AMC) through SVMs. A simulated model of an SVM signal classifier was implemented and trained to recognize seven distinct modulation schemes: five digital (BPSK, QPSK, GMSK, 16-QAM and 64-QAM) and two analog (FM and AM). The signals were generated using realistic carrier frequency, sampling frequency and symbol rate values, and realistic raised-cosine and Gaussian pulse-shaping filters. The results showed that the implemented classifier can correctly classify signals with high probabilities.
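The K two-class decomposition just described can be expressed directly with scikit-learn's one-vs-rest wrapper; in the sketch below only the number of classes mirrors [165], while the random features are stand-ins:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(700, 8))        # stand-in feature vectors
y = np.repeat(np.arange(7), 100)     # seven modulation classes
# Train seven two-class SVMs, one per class against the rest
clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(clf.predict(X[:3]))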
V. CENTRALIZED AND DECENTRALIZED LEARNING IN COGNITIVE RADIO

Since noise uncertainties, shadowing, and multi-path fading effects limit the performance of spectrum sensing, when the received primary SNR is too low there exists an SNR wall, below which reliable spectrum detection is impossible in some cases [168], [169]. If secondary users cannot detect the primary transmitter while the primary receiver is within the secondary users' transmission range, a hidden terminal problem occurs [170], [171], and the primary user's transmission will be interfered with. By taking advantage of the diversity offered by multiple independent fading channels (multiuser diversity), cooperative spectrum sensing improves the reliability of spectrum sensing and the utilization of idle spectrum [25], [26], as opposed to non-cooperative spectrum sensing.

In centralized cooperative spectrum sensing [25], [26], a central controller collects local observations from multiple secondary users, decides the spectrum occupancy by using decision fusion rules, and informs the secondary users which channels to access. In distributed cooperative spectrum sensing [55], [172], on the other hand, secondary users within a CRN exchange their local sensing results among themselves without requiring a backbone or centralized infrastructure. In the non-cooperative decentralized sensing framework, by contrast, no communications are assumed among the secondary users [173].
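As a concrete example of decision fusion, the sketch below implements two textbook fusion rules (OR and majority voting) over the binary local decisions reported by the secondary users; these generic rules are illustrative and not tied to a specific reference above:

import numpy as np

def fuse(local_decisions, rule="majority"):
    # local_decisions: one binary detection result per secondary user
    d = np.asarray(local_decisions)
    if rule == "or":         # occupied if any user detects the primary
        return int(d.any())
    if rule == "majority":   # occupied if more than half the users detect it
        return int(d.sum() > d.size / 2)
    raise ValueError("unknown fusion rule")

print(fuse([1, 0, 1, 1, 0]))         # -> 1: channel declared occupied
print(fuse([0, 0, 1, 0, 0], "or"))   # -> 1: the OR rule is more conservative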
In [174], the authors showed how various centralized and decentralized spectrum access markets (where CRs can compete over time for dynamically available transmission opportunities) can be designed based on a stochastic game framework (discussed above in Section III-C) and solved using a learning algorithm. Their proposed learning algorithm was designed to learn the following information in the stochastic game: the state transition model, the states and policies of the other secondary users, and the network resource state. The proposed learning algorithm was similar to Q-learning. However, the main difference compared to Q-learning was that it explicitly considered the impact of the other secondary users' actions through the state classifications and transition probability approximation. The computational complexity and performance were also discussed in [174].

In [37] the authors proposed and analyzed both a centralized and a decentralized decision-making architecture with RL for the secondary CRN. In this work, a new way to encourage primary users to lease their spectrum was proposed: the secondary users place bids indicating how much power they are willing to spend for relaying the primary signals to their destinations. In this formulation, the primary users achieve power savings due to asymmetric cooperation.
Fig. 10. A comparison among the learning algorithms that are presented in this survey. [Figure: a table comparing the surveyed learning algorithms across their applications (spectrum sensing, signal classification and feature detection, power and rate allocation, system reconfiguration, parameter adaptation and MAC protocols), with the pros and cons of each.]
In the centralized architecture, a secondary system decision center (SSDC) selects a bid for each primary channel based on an optimal channel assignment for the secondary users. In the decentralized CRN architecture, an auction game-based protocol was proposed in which each secondary user independently places bids for each primary channel and the receivers of each primary link pick the bid that will lead to the most power savings. A simple and robust distributed RL mechanism was developed to allow the users to revise their bids and to increase their subsequent rewards. The performance results given in [37] showed the significant impact of RL in both improving spectrum utilization and meeting individual secondary user performance requirements.

In [12], the authors considered DSA among CRs from an adaptive, game-theoretic learning perspective, in which CRs compete for channels temporarily vacated by licensed primary users in order to satisfy their own demands while minimizing interference. For both slowly varying primary user activity and slowly varying statistics of fast primary user activity, the authors applied an adaptive regret-based learning procedure which tracks the set of correlated equilibria of the game, treated as a distributed stochastic approximation. The proposed approach was decentralized in terms of both radio awareness and activity; radios estimate spectral conditions based on their own experience, and adapt by choosing the spectral allocations which yield them the greatest utility. Iterated over time, this process converges so that each radio's performance is an optimal response to the others' activity. This apparently selfish scheme was also used to deliver system-wide performance by a judicious choice of utility function. This procedure was shown to perform well compared to other similar adaptive algorithms. The results of the estimation of channel contention for a simple carrier sense multiple access (CSMA) channel sharing scheme were also presented.

In [175], the authors proposed an auction framework for CRNs to allow secondary users to share the available spectrum of licensed primary users fairly and efficiently, subject to the interference temperature constraint at each primary user. The competition among secondary users was studied by formulating a non-cooperative multiple-primary-user multiple-secondary-user auction game. The resulting equilibrium was found by solving a non-continuous two-dimensional optimization problem. A distributed algorithm was also developed in which each secondary user updates its strategy based on local information to converge to the equilibrium. The proposed auction framework was then extended to the more challenging scenario with free spectrum bands. An algorithm was developed based on no-regret learning to reach a correlated equilibrium of the auction game. The proposed algorithm, which can be implemented distributedly based on local observations, is especially suited to decentralized adaptive learning environments. The authors demonstrated the effectiveness of the proposed auction framework in achieving high efficiency and fairness in spectrum allocation through numerical examples.
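The no-regret learning used in [12] and [175] can be illustrated with the generic regret-matching rule of Hart and Mas-Colell, sketched below for a two-action channel-choice toy problem; the payoff table is an invented stand-in, not the utility of either paper:

import numpy as np

rng = np.random.default_rng(3)
n_actions, T = 2, 5000
regret = np.zeros(n_actions)
payoff = lambda a, other: 1.0 if a != other else 0.2  # toy collision payoff

for t in range(T):
    # Play each action with probability proportional to its positive regret
    pos = np.maximum(regret, 0.0)
    probs = pos / pos.sum() if pos.sum() > 0 else np.full(n_actions, 0.5)
    a = rng.choice(n_actions, p=probs)
    other = rng.integers(n_actions)   # the other user's (random) channel choice
    u = payoff(a, other)
    # Accumulate the regret for not having played each alternative action
    regret += np.array([payoff(k, other) for k in range(n_actions)]) - u

print(probs)   # the empirical play converges toward a correlated equilibrium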
In general, there is always a trade-off between centralized and decentralized control in radio networks. This is also true for CRNs. While centralized schemes ensure efficient management of the spectrum resources, they often suffer from signaling and processing overhead. On the other hand, a decentralized scheme can reduce the complexity of the decision-making in cognitive networks. However, radios
that act according to a decentralized scheme may adopt a selfish behavior and try to maximize their own utilities at the expense of the sum-utility of the network (the social welfare), leading to overall network inefficiency. This problem can become particularly severe when considering heterogeneous networks in which different nodes belong to different types of systems and have different, usually conflicting, objectives. To resolve this problem, [176] proposes a hybrid approach for heterogeneous CRNs where the wireless users are assisted in their decisions by the network, which broadcasts aggregated information to the users. In some states of the system, the network manager imposes its decisions on the users in the network. In other states, the mobile nodes may take autonomous actions in response to the information sent by the network center. As a result, the model in [176] avoids having a completely decentralized network, due to the possible inefficiency of such non-cooperative networks. Nevertheless, a large part of the decision-making is still delegated to the mobile nodes to reduce the processing overhead at the central node.

In the problem formulation of [176], the authors consider a wireless network composed of S systems that are managed by the same operator. The set of all serving systems is denoted by S = {1, · · · , S}. Since the throughput of each serving system drops as a function of the distance between the mobile and the base station, the throughput of a mobile changes within a given cell. To capture this variation, each cell is split into N circles of radius d_n (n ∈ N = {1, · · · , N}). Each circle area is assumed to have the same radio characteristics. In this case, all mobiles that are located within circle n ∈ N and are served by system s ∈ S achieve the same throughput. The network state matrix is denoted by M ∈ F, where F = N^{N×S}. The (n, s)-th element M_{ns} of the matrix M denotes the number of users with radio condition n ∈ N which are served by system s ∈ S in the circle. The network is fully characterized by its state M, but this information is not available to the mobile nodes when the radio resource management (RRM) is decentralized. In this case, by using the radio enabler proposed in IEEE 1900.4, the network reconfiguration manager (NRM) broadcasts to the terminal reconfiguration manager (TRM) an aggregated load information that takes values in some finite set L = {1, · · · , L}, indicating whether the load state at the mobile terminals is low, medium or high. The mapping f : M → L specifies a macro-state f(M) for each network micro-state M. This state encoding reduces the signaling overhead, while satisfying the requirement of the IEEE 1900.4 standard which states that "the network manager side shall periodically update the terminal side with context information" [177]. Given the load information l = f(M) and the radio condition n ∈ N, the mobile makes its decision P_{n,l} ∈ S, specifying which system it will connect to, and the user's decision vector is denoted by P_l = [P_{1,l}, · · · , P_{N,l}].
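The state aggregation in [176] can be pictured with a small sketch; the load measure (total number of served users) and the thresholds below are illustrative assumptions of ours, not the exact mapping f of [176]:

import numpy as np

def f(M, thresholds=(10, 25)):
    # Map the micro-state M (an N x S matrix of user counts) to an
    # aggregated macro-state l in L = {0: low, 1: medium, 2: high}
    load = int(M.sum())
    if load <= thresholds[0]:
        return 0
    if load <= thresholds[1]:
        return 1
    return 2

M = np.array([[3, 1], [4, 2], [5, 0]])   # N = 3 radio conditions, S = 2 systems
l = f(M)                                  # broadcast macro-state
P = np.array([[0, 1, 1],                  # P[n, l]: system chosen by a mobile
              [0, 0, 1],                  # with radio condition n under load l
              [1, 1, 1]])
print("macro-state:", l, "-> mobile in circle 1 connects to system", P[1, l])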
The authors in [176] find the association policies by following three different approaches:
1) Global optimum approach.
2) Nash equilibrium approach.
3) Stackelberg game approach.
The global optimum approach finds the policy that maximizes the global utility of the network. However, since it is not realistic to assume that individual users will seek the global optimum, another policy (corresponding to the Nash equilibrium) was obtained such that it maximizes the users' individual utilities. Finally, a Stackelberg game formulation was developed for the operator to control the equilibrium of its wireless users. This leads to maximizing the operator's utility by sending appropriate load information l ∈ L to the distributed radios.

The authors of [176] analyzed the network performance under these three different association policies. They demonstrated, by means of the Stackelberg formulation, how the operator can optimize its global utility by sending appropriate information about the network state, while users maximize their individual utilities. The resulting hybrid architecture achieved a good trade-off between the global network performance and the signaling overhead, making it a viable alternative to be considered when designing CRNs.

VI. CONCLUSION

In this survey paper, we have characterized the learning problems in CRs and stated the importance of machine learning in developing real CRs. We have presented the state-of-the-art learning methods that have been applied to CRs, classifying them under supervised and unsupervised learning. A discussion of some of the most important, and commonly used, learning algorithms was provided along with their advantages and disadvantages. We also showed some of the challenging learning problems encountered in CRs and presented possible solution methods to address them.

REFERENCES

[1] J. Mitola III and G. Q. Maguire, Jr., "Cognitive radio: making software radios more personal," IEEE Pers. Commun., vol. 6, no. 4, pp. 13–18, Aug. 1999.
[2] J. Mitola, "Cognitive radio: An integrated agent architecture for software defined radio," Ph.D. dissertation, Royal Institute of Technology (KTH), Stockholm, Sweden, 2000.
[3] L. Giupponi, A. Galindo-Serrano, P. Blasco, and M. Dohler, "Docitive networks: an emerging paradigm for dynamic spectrum management," IEEE Wireless Commun., vol. 17, no. 4, pp. 47–54, Aug. 2010.
[4] T. Costlow, "Cognitive radios will adapt to users," IEEE Intell. Syst., vol. 18, no. 3, p. 7, May-June 2003.
[5] S. K. Jayaweera and C. G. Christodoulou, "Radiobots: Architecture, algorithms and realtime reconfigurable antenna designs for autonomous, self-learning future cognitive radios," University of New Mexico, Technical Report EECE-TR-11-0001, Mar. 2011. [Online]. Available: http://repository.unm.edu/handle/1928/12306
[6] S. Haykin, "Cognitive radio: brain-empowered wireless communications," IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005.
[7] FCC, "Report of the spectrum efficiency working group," FCC Spectrum Policy Task Force, Tech. Rep., Nov. 2002.
[8] ——, "ET docket no. 03-322 notice of proposed rulemaking and order," Tech. Rep., Dec. 2003.
[9] N. Devroye, M. Vu, and V. Tarokh, "Cognitive radio networks," IEEE Signal Processing Mag., vol. 25, pp. 12–23, Nov. 2008.
[10] A. Goldsmith, S. A. Jafar, I. Maric, and S. Srinivasa, "Breaking spectrum gridlock with cognitive radios: An information theoretic perspective," Proc. IEEE, vol. 97, no. 5, pp. 894–914, May 2009.
[11] V. Krishnamurthy, "Decentralized spectrum access amongst cognitive radios - An interacting multivariate global game-theoretic approach," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3999–4013, Oct. 2009.
[12] M. Maskery, V. Krishnamurthy, and Q. Zhao, "Decentralized dynamic spectrum access for cognitive radios: cooperative design of a non-cooperative game," IEEE Trans. Commun., vol. 57, no. 2, pp. 459–469, Feb. 2009.
[13] Z. Han, R. Zheng, and H. Poor, "Repeated auctions with Bayesian non-parametric learning for spectrum access in cognitive radio networks," IEEE Trans. Wireless Commun., vol. 10, no. 3, pp. 890–900, Mar. 2011.
[14] J. Lunden, V. Koivunen, S. Kulkarni, and H. Poor, "Reinforcement learning based distributed multiagent sensing policy for cognitive radio networks," in IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '11), Aachen, Germany, May 2011, pp. 642–646.
[15] K. Ben Letaief and W. Zhang, "Cooperative communications for cognitive radio networks," Proc. IEEE, vol. 97, no. 5, pp. 878–893, May 2009.
[16] Q. Zhao and B. M. Sadler, "A survey of dynamic spectrum access," IEEE Signal Processing Mag., vol. 24, no. 3, pp. 79–89, May 2007.
[17] S. K. Jayaweera and T. Li, "Dynamic spectrum leasing in cognitive radio networks via primary-secondary user power control games," IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 3300–3310, July 2009.
[18] S. K. Jayaweera, G. Vazquez-Vilar, and C. Mosquera, "Dynamic spectrum leasing: A new paradigm for spectrum sharing in cognitive radio networks," IEEE Trans. Veh. Technol., vol. 59, no. 5, pp. 2328–2339, May 2010.
[19] G. Zhao, J. Ma, Y. Li, T. Wu, Y. H. Kwon, A. Soong, and C. Yang, "Spatial spectrum holes for cognitive radio with directional transmission," in IEEE Global Telecommunications Conference (GLOBECOM '08), Nov. 2008, pp. 1–5.
[20] A. Ghasemi and E. Sousa, "Spectrum sensing in cognitive radio networks: requirements, challenges and design trade-offs," IEEE Commun. Mag., vol. 46, no. 4, pp. 32–39, Apr. 2008.
[21] B. Farhang-Boroujeny, "Filter bank spectrum sensing for cognitive radios," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1801–1811, May 2008.
[22] B. Farhang-Boroujeny and R. Kempter, "Multicarrier communication techniques for spectrum sensing and communication in cognitive radios," IEEE Commun. Mag., vol. 46, no. 4, pp. 80–85, Apr. 2008.
[23] C. R. C. da Silva, C. Brian, and K. Kyouwoong, "Distributed spectrum sensing for cognitive radio systems," in Information Theory and Applications Workshop, Feb. 2007, pp. 120–123.
[24] Y. Li, S. Jayaweera, M. Bkassiny, and K. Avery, "Optimal myopic sensing and dynamic spectrum access in cognitive radio networks with low-complexity implementations," IEEE Trans. Wireless Commun., vol. 11, no. 7, pp. 2412–2423, July 2012.
[25] ——, "Optimal myopic sensing and dynamic spectrum access in centralized secondary cognitive radio networks with low-complexity implementations," in IEEE 73rd Vehicular Technology Conference (VTC-Spring '11), May 2011, pp. 1–5.
[26] M. Bkassiny, S. K. Jayaweera, Y. Li, and K. A. Avery, "Optimal and low-complexity algorithms for dynamic spectrum access in centralized cognitive radio networks with fading channels," in IEEE Vehicular Technology Conference (VTC-Spring '11), Budapest, Hungary, May 2011.
[27] C. Cordeiro, M. Ghosh, D. Cavalcanti, and K. Challapali, "Spectrum sensing for dynamic spectrum access of TV bands," in 2nd International Conference on Cognitive Radio Oriented Wireless Networks and Communications (CrownCom '07), Aug. 2007, pp. 225–233.
[28] H. Chen, W. Gao, and D. G. Daut, "Signature based spectrum sensing algorithms for IEEE 802.22 WRAN," in IEEE International Conference on Communications (ICC '07), June 2007, pp. 6487–6492.
[29] Y. Zeng and Y. Liang, "Maximum-minimum eigenvalue detection for cognitive radio," in 18th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '07), Sep. 2007, pp. 1–5.
[30] ——, "Covariance based signal detections for cognitive radio," in 2nd IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '07), Apr. 2007, pp. 202–207.
[31] X. Zhou, Y. Li, Y. H. Kwon, and A. Soong, "Detection timing and channel selection for periodic spectrum sensing in cognitive radio," in IEEE Global Telecommunications Conference (GLOBECOM '08), Nov. 2008, pp. 1–5.
[32] Z. Tian and G. B. Giannakis, "A wavelet approach to wideband spectrum sensing for cognitive radios," in 1st International Conference on Cognitive Radio Oriented Wireless Networks and Communications, June 2006, pp. 1–5.
[33] G. Ganesan and Y. Li, "Cooperative spectrum sensing in cognitive radio, part I: Two user networks," IEEE Trans. Wireless Commun., vol. 6, no. 6, pp. 2204–2213, June 2007.
[34] ——, "Cooperative spectrum sensing in cognitive radio, part II: Multiuser networks," IEEE Trans. Wireless Commun., vol. 6, no. 6, pp. 2214–2222, June 2007.
[35] Y. Chen, Q. Zhao, and A. Swami, "Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2053–2071, May 2008.
[36] S. Huang, X. Liu, and Z. Ding, "Opportunistic spectrum access in cognitive radio networks," in 27th Conference on Computer Communications (IEEE INFOCOM '08), Phoenix, AZ, Apr. 2008, pp. 1427–1435.
[37] S. Jayaweera, M. Bkassiny, and K. Avery, "Asymmetric cooperative communications based spectrum leasing via auctions in cognitive radio networks," IEEE Trans. Wireless Commun., vol. 10, no. 8, pp. 2716–2724, Aug. 2011.
[38] M. Bkassiny, S. K. Jayaweera, and K. A. Avery, "Distributed reinforcement learning based MAC protocols for autonomous cognitive secondary users," in 20th Annual Wireless and Optical Communications Conference (WOCC '11), Newark, NJ, Apr. 2011, pp. 1–6.
[39] T. Yucek and H. Arslan, "A survey of spectrum sensing algorithms for cognitive radio applications," IEEE Commun. Surveys Tutorials, vol. 11, no. 1, pp. 116–130, First Quarter 2009.
[40] S. Haykin, D. Thomson, and J. Reed, "Spectrum sensing for cognitive radio," Proc. IEEE, vol. 97, no. 5, pp. 849–877, May 2009.
[41] J. Ma, G. Y. Li, and B. H. Juang, "Signal processing in cognitive radio," Proc. IEEE, vol. 97, no. 5, pp. 805–823, May 2009.
[42] W. Zhang, R. Mallik, and K. Letaief, "Optimization of cooperative spectrum sensing with energy detection in cognitive radio networks," IEEE Trans. Wireless Commun., vol. 8, no. 12, pp. 5761–5766, Dec. 2009.
[43] Y. M. Kim, G. Zheng, S. H. Sohn, and J. M. Kim, "An alternative energy detection using sliding window for cognitive radio system," in 10th International Conference on Advanced Communication Technology (ICACT '08), vol. 1, Gangwon-Do, South Korea, Feb. 2008, pp. 481–485.
[44] J. Lunden, V. Koivunen, A. Huttunen, and H. Poor, "Collaborative cyclostationary spectrum sensing for cognitive radio systems," IEEE Trans. Signal Process., vol. 57, no. 11, pp. 4182–4195, Nov. 2009.
[45] A. Dandawate and G. Giannakis, "Statistical tests for presence of cyclostationarity," IEEE Trans. Signal Process., vol. 42, no. 9, pp. 2355–2369, Sep. 1994.
[46] B. Deepa, A. Iyer, and C. Murthy, "Cyclostationary-based architectures for spectrum sensing in IEEE 802.22 WRAN," in IEEE Global Telecommunications Conference (GLOBECOM '10), Miami, FL, Dec. 2010, pp. 1–5.
[47] M. Gandetto and C. Regazzoni, "Spectrum sensing: A distributed approach for cognitive terminals," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 546–557, Apr. 2007.
[48] J. Unnikrishnan and V. Veeravalli, "Cooperative sensing for primary detection in cognitive radio," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 18–27, Feb. 2008.
[49] T. Cui, F. Gao, and A. Nallanathan, "Optimization of cooperative spectrum sensing in cognitive radio," IEEE Trans. Veh. Technol., vol. 60, no. 4, pp. 1578–1589, May 2011.
[50] O. Simeone, I. Stanojev, S. Savazzi, Y. Bar-Ness, U. Spagnolini, and R. Pickholtz, "Spectrum leasing to cooperating secondary ad hoc networks," IEEE J. Sel. Areas Commun., vol. 26, pp. 203–213, Jan. 2008.
[51] Q. Zhang, J. Jia, and J. Zhang, "Cooperative relay to improve diversity in cognitive radio networks," IEEE Commun. Mag., vol. 47, no. 2, pp. 111–117, Feb. 2009.
[52] Y. Han, A. Pandharipande, and S. Ting, "Cooperative decode-and-forward relaying for secondary spectrum access," IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 4945–4950, Oct. 2009.
[53] L. Li, X. Zhou, H. Xu, G. Li, D. Wang, and A. Soong, "Simplified relay selection and power allocation in cooperative cognitive radio systems," IEEE Trans. Wireless Commun., vol. 10, no. 1, pp. 33–36, Jan. 2011.
[54] E. Hossain and V. K. Bhargava, Cognitive Wireless Communication Networks. Springer, 2007.
[55] B. Wang and K. J. R. Liu, "Advances in cognitive radio networks: A survey," IEEE J. Sel. Topics Signal Process., vol. 5, no. 1, pp. 5–23, Feb. 2011.
[56] I. Akyildiz, W.-Y. Lee, M. Vuran, and S. Mohanty, "A survey on spectrum management in cognitive radio networks," IEEE Commun. Mag., vol. 46, no. 4, pp. 40–48, Apr. 2008.
[57] K. Shin, H. Kim, A. Min, and A. Kumar, "Cognitive radios for dynamic spectrum access: from concept to reality," IEEE Wireless Commun., vol. 17, no. 6, pp. 64–74, Dec. 2010.
[58] A. De Domenico, E. Strinati, and M.-G. Di Benedetto, "A survey on MAC strategies for cognitive radio networks," IEEE Commun. Surveys Tutorials, vol. 14, no. 1, pp. 21–44, First Quarter 2012.
[59] A. Mody, M. Sherman, R. Martinez, R. Reddy, and T. Kiernan, "Survey of IEEE standards supporting cognitive radio and dynamic spectrum access," in IEEE Military Communications Conference (MILCOM '08), Nov. 2008, pp. 1–7.
[60] Q. Zhao and A. Swami, "A survey of dynamic spectrum access: Signal processing and networking perspectives," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 4, Apr. 2007, pp. IV-1349–IV-1352.
[61] J. Mitola, "Cognitive radio architecture evolution," Proc. IEEE, vol. 97, no. 4, pp. 626–641, Apr. 2009.
[62] S. Jayaweera, Y. Li, M. Bkassiny, C. Christodoulou, and K. Avery, "Radiobots: The autonomous, self-learning future cognitive radios," in International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS '11), Chiangmai, Thailand, Dec. 2011, pp. 1–5.
[63] A. El-Saleh, M. Ismail, M. Ali, and J. Ng, "Development of a cognitive radio decision engine using multi-objective hybrid genetic algorithm," in IEEE 9th Malaysia International Conference on Communications (MICC '09), Dec. 2009, pp. 343–347.
[64] L. Morales-Tirado, J. Suris-Pietri, and J. Reed, "A hybrid cognitive engine for improving coverage in 3G wireless networks," in IEEE International Conference on Communications Workshops (ICC Workshops '09), June 2009, pp. 1–5.
[65] Y. Huang, H. Jiang, H. Hu, and Y. Yao, "Design of learning engine based on support vector machine in cognitive radio," in International Conference on Computational Intelligence and Software Engineering (CiSE '09), Wuhan, China, Dec. 2009, pp. 1–4.
[66] Y. Huang, J. Wang, and H. Jiang, "Modeling of learning inference and decision-making engine in cognitive radio," in Second International Conference on Networks Security, Wireless Communications and Trusted Computing (NSWCTC), vol. 2, Apr. 2010, pp. 258–261.
[67] Y. Yang, H. Jiang, and J. Ma, "Design of optimal engine for cognitive radio parameters based on the DUGA," in 3rd International Conference on Information Sciences and Interaction Sciences (ICIS '10), June 2010, pp. 694–698.
[68] H. Volos and R. Buehrer, "Cognitive engine design for link adaptation: An application to multi-antenna systems," IEEE Trans. Wireless Commun., vol. 9, no. 9, pp. 2902–2913, Sep. 2010.
[69] C. Clancy, J. Hecker, E. Stuntebeck, and T. O'Shea, "Applications of machine learning to cognitive radio networks," IEEE Wireless Commun., vol. 14, no. 4, pp. 47–52, Aug. 2007.
[70] A. N. Mody, S. R. Blatt, N. B. Thammakhoune, T. P. McElwain, J. D. Niedzwiecki, D. G. Mills, M. J. Sherman, and C. S. Myers, "Machine learning based cognitive communications in white as well as the gray space," in IEEE Military Communications Conference (MILCOM '07), Orlando, FL, Oct. 2007, pp. 1–7.
[71] M. Bkassiny, S. K. Jayaweera, Y. Li, and K. A. Avery, "Wideband spectrum sensing and non-parametric signal classification for autonomous self-learning cognitive radios," IEEE Trans. Wireless Commun., vol. 11, no. 7, pp. 2596–2605, July 2012.
[72] ——, "Blind cyclostationary feature detection based spectrum sensing for autonomous self-learning cognitive radios," in IEEE International Conference on Communications (ICC '12), Ottawa, Canada, June 2012.
[73] X. Gao, B. Jiang, X. You, Z. Pan, Y. Xue, and E. Schulz, "Efficient channel estimation for MIMO single-carrier block transmission with dual cyclic timeslot structure," IEEE Trans. Commun., vol. 55, no. 11, pp. 2210–2223, Nov. 2007.
[74] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[75] S. Gong, W. Liu, W. Yuan, W. Cheng, and S. Wang, "Threshold-learning in local spectrum sensing of cognitive radio," in IEEE 69th Vehicular Technology Conference (VTC Spring '09), Barcelona, Spain, Apr. 2009, pp. 1–6.
[76] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley and Sons, 1994.
[77] A. Galindo-Serrano and L. Giupponi, "Distributed Q-learning for aggregated interference control in cognitive radio networks," IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1823–1834, May 2010.
[78] X. Dong, Y. Li, C. Wu, and Y. Cai, "A learner based on neural network for cognitive radio," in 12th IEEE International Conference on Communication Technology (ICCT '10), Nanjing, China, Nov. 2010, pp. 893–896.
[79] M. M. Ramon, T. Atwood, S. Barbin, and C. G. Christodoulou, "Signal classification with an SVM-FFT approach for feature extraction in cognitive radio," in SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference (IMOC '09), Belem, Brazil, Nov. 2009, pp. 286–289.
[80] B. Hamdaoui, P. Venkatraman, and M. Guizani, "Opportunistic exploitation of bandwidth resources through reinforcement learning," in IEEE Global Telecommunications Conference (GLOBECOM '09), Honolulu, HI, Dec. 2009, pp. 1–6.
[81] K.-L. A. Yau, P. Komisarczuk, and P. D. Teal, "Applications of reinforcement learning to cognitive radio networks," in IEEE International Conference on Communications Workshops (ICC '10), Cape Town, South Africa, May 2010, pp. 1–6.
[82] Y. Reddy, "Detecting primary signals for efficient utilization of spectrum using Q-learning," in Fifth International Conference on Information Technology: New Generations (ITNG '08), Las Vegas, NV, Apr. 2008, pp. 360–365.
[83] M. Li, Y. Xu, and J. Hu, "A Q-learning based sensing task selection scheme for cognitive radio networks," in International Conference on Wireless Communications and Signal Processing (WCSP '09), Nanjing, China, Nov. 2009, pp. 1–5.
[84] Y. Yao and Z. Feng, "Centralized channel and power allocation for cognitive radio networks: A Q-learning solution," in Future Network and Mobile Summit, Florence, Italy, June 2010, pp. 1–8.
[85] P. Venkatraman, B. Hamdaoui, and M. Guizani, "Opportunistic bandwidth sharing through reinforcement learning," IEEE Trans. Veh. Technol., vol. 59, no. 6, pp. 3148–3153, July 2010.
[86] T. Jiang, D. Grace, and P. Mitchell, "Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing," IET Communications, vol. 5, no. 10, pp. 1309–1317, Jan. 2011.
[87] T. Clancy, A. Khawar, and T. Newman, "Robust signal classification using unsupervised learning," IEEE Trans. Wireless Commun., vol. 10, no. 4, pp. 1289–1299, Apr. 2011.
[88] C. Claus and C. Boutilier, "The dynamics of reinforcement learning in cooperative multiagent systems," in Proc. Fifteenth National Conference on Artificial Intelligence, Madison, WI, Jul. 1998, pp. 746–752.
[89] G. D. Croon, M. F. V. Dartel, and E. O. Postma, "Evolutionary learning outperforms reinforcement learning on non-Markovian tasks," in 8th European Conference on Artificial Life Workshop on Memory and Learning Mechanisms in Autonomous Robots, Canterbury, Kent, UK, 2005.
[90] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. 12th Conference on Advances in Neural Information Processing Systems (NIPS '99). Denver, CO: MIT Press, 2001, pp. 1057–1063.
[91] J. Baxter and P. L. Bartlett, "Infinite-horizon policy-gradient estimation," Journal of Artificial Intelligence Research, vol. 15, pp. 319–350, 2001.
[92] D. E. Moriarty, A. C. Schultz, and J. J. Grefenstette, "Evolutionary algorithms for reinforcement learning," J. Artificial Intelligence Research, vol. 11, pp. 241–276, 1999.
[93] F. Dandurand and T. Shultz, "Connectionist models of reinforcement, imitation, and instruction in learning to solve complex problems," IEEE Trans. Autonomous Mental Development, vol. 1, no. 2, pp. 110–121, Aug. 2009.
[94] Y. Xing and R. Chandramouli, "Human behavior inspired cognitive radio network design," IEEE Commun. Mag., vol. 46, no. 12, pp. 122–127, Dec. 2008.
[95] M. van der Schaar and F. Fu, "Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications," Proc. IEEE, vol. 97, no. 4, pp. 720–740, Apr. 2009.
[96] B. Wang, K. J. R. Liu, and T. Clancy, "Evolutionary cooperative spectrum sensing game: how to collaborate?" IEEE Trans. Commun., vol. 58, no. 3, pp. 890–900, Mar. 2010.
[97] A. Galindo-Serrano, L. Giupponi, P. Blasco, and M. Dohler, "Learning from experts in cognitive radio networks: The docitive paradigm," in Proc. Fifth International Conference on Cognitive Radio Oriented Wireless Networks Communications (CROWNCOM '10), Cannes, France, June 2010, pp. 1–6.
[98] A. He, K. K. Bae, T. Newman, J. Gaeddert, K. Kim, R. Menon, L. Morales-Tirado, J. Neel, Y. Zhao, J. Reed, and W. Tranter, "A survey of artificial intelligence for cognitive radios," IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1578–1592, May 2010.
[99] R. S. Michalski, "Learning and cognition," in World Conference on the Fundamentals of Artificial Intelligence (WOCFAI '95), Paris, France, July 1995, pp. 507–510.
[100] J. Burbank, A. Hammons, and S. Jones, "A common lexicon and design issues surrounding cognitive radio networks operating in the presence of jamming," in IEEE Military Communications Conference (MILCOM '08), San Diego, CA, Nov. 2008, pp. 1–7.
[101] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[102] V. Tumuluru, P. Wang, and D. Niyato, "A neural network based spectrum prediction scheme for cognitive radio," in IEEE International Conference on Communications (ICC '10), May 2010, pp. 1–5.
[103] H. Hu, J. Song, and Y. Wang, "Signal classification based on spectral correlation analysis and SVM in cognitive radio," in 22nd International Conference on Advanced Information Networking and Applications (AINA '08), Mar. 2008, pp. 883–887.
[104] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes," J. American Statistical Association, vol. 101, no. 476, pp. 1566–1581, Dec. 2006.
[105] M. Bkassiny, S. K. Jayaweera, and Y. Li, "Multidimensional Dirichlet process-based non-parametric signal classification for autonomous self-learning cognitive radios," IEEE Trans. Wireless Commun., May 2012, [in review].
[106] J. Unnikrishnan and V. V. Veeravalli, "Algorithms for dynamic spectrum access with learning for cognitive radio," IEEE Trans. Signal Process., vol. 58, no. 2, pp. 750–760, Feb. 2010.
[107] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.
[108] Q. Zhao, L. Tong, and A. Swami, "Decentralized cognitive MAC for dynamic spectrum access," in First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '05), Nov. 2005, pp. 224–232.
[109] S. K. Jayaweera and C. Mosquera, "A dynamic spectrum leasing (DSL) framework for spectrum sharing in cognitive radio networks," in 43rd Annual Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2009.
[110] K. Hakim, S. Jayaweera, G. El-Howayek, and C. Mosquera, "Efficient dynamic spectrum sharing in cognitive radio networks: Centralized dynamic spectrum leasing (C-DSL)," IEEE Trans. Wireless Commun., vol. 9, no. 9, pp. 2956–2967, Sep. 2010.
[111] B. Latifa, Z. Gao, and S. Liu, "No-regret learning for simultaneous power control and channel allocation in cognitive radio networks," in Computing, Communications and Applications Conference (ComComAp '12), Hong Kong, China, Jan. 2012, pp. 267–271.
[112] Z. Han, C. Pandana, and K. Liu, "Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning," in IEEE Wireless Communications and Networking Conference (WCNC '07), Hong Kong, China, Mar. 2007, pp. 11–15.
[113] Q. Zhu, Z. Han, and T. Basar, "No-regret learning in collaborative spectrum sensing with malicious nodes," in IEEE International Conference on Communications (ICC '10), Cape Town, South Africa, May 2010, pp. 1–6.
[114] D. Pados, P. Papantoni-Kazakos, D. Kazakos, and A. Koyiantis, "On-line threshold learning for Neyman-Pearson distributed detection," IEEE Trans. Syst. Man Cybern., vol. 24, no. 10, pp. 1519–1531, Oct. 1994.
[115] K. Akkarajitsakul, E. Hossain, D. Niyato, and D. I. Kim, "Game theoretic approaches for multiple access in wireless networks: A survey," IEEE Commun. Surveys Tutorials, vol. 13, no. 3, pp. 372–395, Third Quarter 2011.
[116] M. D. Escobar, "Estimating normal means with a Dirichlet process prior," J. American Statistical Association, vol. 89, no. 425, pp. 268–277, Mar. 1994. [Online]. Available: http://www.jstor.org/stable/2291223
[117] C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, University of Cambridge, United Kingdom, 1989.
[118] H. Li, "Multi-agent Q-learning of channel selection in multi-user cognitive radio systems: A two by two case," in IEEE International Conference on Systems, Man and Cybernetics (SMC '09), San Antonio, TX, Oct. 2009, pp. 1893–1898.
[119] J. Peters and S. Schaal, "Policy gradient methods for robotics," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, Oct. 2006, pp. 2219–2225.
[120] M. Riedmiller, J. Peters, and S. Schaal, "Evaluation of policy gradient methods and variants on the cart-pole benchmark," in IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL '07), Honolulu, HI, Apr. 2007, pp. 254–261.
[121] D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991.
[122] P. Zhou, W. Yuan, W. Liu, and W. Cheng, "Joint power and rate control in cognitive radio networks: A game-theoretical approach," in Proc. IEEE International Conference on Communications (ICC '08), May 2008, pp. 3296–3301.
[123] A. R. Fattahi, F. Fu, M. van der Schaar, and F. Paganini, "Mechanism-based resource allocation for multimedia transmission over spectrum agile wireless networks," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 601–612, Apr. 2007.
[124] O. Ileri, D. Samardzija, and N. B. Mandayam, "Demand responsive pricing and competitive spectrum allocation via a spectrum server," in First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '05), Nov. 2005, pp. 194–202.
[125] Y. Zhao, S. Mao, J. Neel, and J. Reed, "Performance evaluation of cognitive radios: Metrics, utility functions, and methodology," Proc. IEEE, vol. 97, no. 4, pp. 642–659, Apr. 2009.
[126] J. Neel, R. M. Buehrer, J. H. Reed, and R. P. Gilles, "Game theoretic analysis of a network of cognitive radios," in 45th Midwest Symposium on Circuits and Systems, vol. 3, Aug. 2002, pp. III-409–III-412.
[127] M. R. Musku and P. Cotae, "Cognitive radio: Time domain spectrum allocation using game theory," in IEEE International Conference on System of Systems Engineering (SoSE '07), Apr. 2007, pp. 1–6.
[128] W. Wang, Y. Cui, T. Peng, and W. Wang, "Noncooperative power control game with exponential pricing for cognitive radio network," in IEEE 65th Vehicular Technology Conference (VTC Spring '07), Apr. 2007, pp. 3125–3129.
[129] J. Li, D. Chen, W. Li, and J. Ma, "Multiuser power and channel allocation algorithm in cognitive radio," in International Conference on Parallel Processing (ICPP '07), Sep. 2007, pp. 72–72.
[130] Z. Ji and K. J. R. Liu, "Cognitive radios for dynamic spectrum access - dynamic spectrum sharing: A game theoretical overview," IEEE Commun. Mag., vol. 45, no. 5, pp. 88–94, May 2007.
[131] N. Nie and C. Comaniciu, "Adaptive channel allocation spectrum etiquette for cognitive radio networks," in 1st IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '05), Nov. 2005, pp. 269–278.
[132] R. G. Wendorf and H. Blum, "A channel-change game for multiple interfering cognitive wireless networks," in IEEE Military Communications Conference (MILCOM '06), Oct. 2006, pp. 1–7.
[133] J. Li, D. Chen, W. Li, and J. Ma, "Multiuser power and channel allocation algorithm in cognitive radio," in International Conference on Parallel Processing (ICPP '07), XiAn, China, Sep. 2007, p. 72.
[134] X. Zhang and J. Zhao, "Power control based on the asynchronous distributed pricing algorithm in cognitive radios," in IEEE Youth Conference on Information Computing and Telecommunications (YC-ICT '10), Beijing, China, Nov. 2010, pp. 69–72.
[135] L. Pillutla and V. Krishnamurthy, "Game theoretic rate adaptation for spectrum-overlay cognitive radio networks," in IEEE Global Telecommunications Conference (GLOBECOM '08), New Orleans, LA, Dec. 2008, pp. 1–5.
[136] H. Li, Y. Liu, and D. Zhang, "Dynamic spectrum access for cognitive radio systems with repeated games," in IEEE International Conference on Wireless Communications, Networking and Information Security (WCNIS '10), Beijing, China, June 2010, pp. 59–62.
[137] S. K. Jayaweera and M. Bkassiny, "Learning to thrive in a leasing market: an auctioning framework for distributed dynamic spectrum leasing (D-DSL)," in IEEE Wireless Communications and Networking Conference (WCNC '11), Cancun, Mexico, Mar. 2011.
[138] L. Chen, S. Iellamo, M. Coupechoux, and P. Godlewski, "An auction framework for spectrum allocation with interference constraint in cognitive radio networks," in IEEE INFOCOM '10, San Diego, CA, Mar. 2010, pp. 1–9.
[139] G. Iosifidis and I. Koutsopoulos, "Challenges in auction theory driven spectrum management," IEEE Commun. Mag., vol. 49, no. 8, pp. 128–135, Aug. 2011.
[140] F. Fu and M. van der Schaar, "Stochastic game formulation for cognitive radio networks," in 3rd IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '08), Chicago, IL, Oct. 2008, pp. 1–5.
[141] Y. Xu, J. Wang, Q. Wu, A. Anpalagan, and Y.-D. Yao, "Opportunistic spectrum access in unknown dynamic environment: A game-theoretic stochastic learning solution," IEEE Trans. Wireless Commun., vol. 11, no. 4, pp. 1380–1391, Apr. 2012.
[142] T. Ferguson, "A Bayesian analysis of some nonparametric problems," The Annals of Statistics, vol. 1, pp. 209–230, 1973.
[143] D. Blackwell and J. MacQueen, "Ferguson distributions via Polya urn schemes," The Annals of Statistics, vol. 1, pp. 353–355, 1973.
[144] M. Jordan. (2005) Dirichlet processes, Chinese restaurant processes and all that. [Online]. Available: http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps
[145] N. Tawara, S. Watanabe, T. Ogawa, and T. Kobayashi, "Speaker clustering based on utterance-oriented Dirichlet process mixture model," in 12th Annual Conference of the International Speech Communication Association (ISCA '11), Florence, Italy, Aug. 2011, pp. 2905–2908.
[146] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," The Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953. [Online]. Available: http://dx.doi.org/10.1063/1.1699114
[147] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984.
[148] N. Shetty, S. Pollin, and P. Pawelczak, "Identifying spectrum usage by unknown systems using experiments in machine learning," in IEEE Wireless Communications and Networking Conference (WCNC '09), Budapest, Hungary, Apr. 2009, pp. 1–6.
[149] G. Yu, R. Huang, and Z. Wang, "Document clustering via Dirichlet process mixture model with feature selection," in Proc. 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), New York, NY, USA: ACM, 2010, pp. 763–772. [Online]. Available: http://doi.acm.org/10.1145/1835804.1835901
[150] S. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice Hall, Jul. 1999.
[151] N. Baldo and M. Zorzi, "Learning and adaptation in cognitive radios using neural networks," in 5th IEEE Consumer Communications and Networking Conference (CCNC '08), Jan. 2008, pp. 998–1003.
[152] N. Baldo, B. Tamma, B. Manojt, R. Rao, and M. Zorzi, "A neural network based cognitive controller for dynamic channel selection," in IEEE International Conference on Communications (ICC '09), June 2009, pp. 1–5.
[153] Y.-J. Tang, Q.-Y. Zhang, and W. Lin, "Artificial neural network based spectrum sensing method for cognitive radio," in 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM '10), Sep. 2010, pp. 1–4.
[154] M. I. Taj and M. Akil, "Cognitive radio spectrum evolution prediction using artificial neural networks based multivariate time series modeling," in 11th European Wireless Conference 2011 - Sustainable Wireless Technologies (European Wireless), Apr. 2011, pp. 1–6.
[155] J. Popoola and R. van Olst, "A novel modulation-sensing method," IEEE Veh. Technol. Mag., vol. 6, no. 3, pp. 60–69, Sep. 2011.
[156] M. Han, J. Xi, S. Xu, and F.-L. Yin, "Prediction of chaotic time series based on the recurrent predictor neural network," IEEE Trans. Signal Process., vol. 52, no. 12, pp. 3409–3416, Dec. 2004.
[157] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[158] T. Atwood, "RF channel characterization for cognitive radio using support vector machines," Ph.D. dissertation, University of New Mexico, Nov. 2009.
[159] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. Fifth Annual Workshop on Computational Learning Theory (COLT '92), New York, NY, USA: ACM, 1992, pp. 144–152. [Online]. Available: http://doi.acm.org/10.1145/130385.130401
[160] M. Martinez-Ramon and C. G. Christodoulou, Support Vector Machines for Antenna Array Processing and Electromagnetics, 1st ed., C. A. Balanis, Ed. USA: Morgan and Claypool Publishers, 2006.
[161] H. Hu, J. Song, and Y. Wang, "Signal classification based on spectral correlation analysis and SVM in cognitive radio," in 22nd International Conference on Advanced Information Networking and Applications (AINA '08), Okinawa, Japan, Mar. 2008, pp. 883–887.
[162] G. Xu and Y. Lu, "Channel and modulation selection based on support vector machines for cognitive radio," in International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '06), Sep. 2006, pp. 1–4.
[163] L. Hai-Yuan and J.-C. Sun, "A modulation type recognition method using wavelet support vector machines," in 2nd International Congress on Image and Signal Processing (CISP '09), Oct. 2009, pp. 1–4.
[164] Z. Yang, Y.-D. Yao, S. Chen, H. He, and D. Zheng, "MAC protocol classification in a cognitive radio network," in 19th Annual Wireless and Optical Communications Conference (WOCC '10), May 2010, pp. 1–5.
[165] M. Petrova, P. Mähönen, and A. Osuna, "Multi-class classification of analog and digital signals in cognitive radios using support vector machines," in 7th International Symposium on Wireless Communication Systems (ISWCS '10), Sep. 2010, pp. 986–990.
[166] D. Zhang and X. Zhai, "SVM-based spectrum sensing in cognitive radio," in 7th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '11), Sep. 2011, pp. 1–4.
[167] T. D. Atwood, M. Martinez-Ramon, and C. G. Christodoulou, "Robust support vector machine spectrum estimation in cognitive radio," in Proc. 2009 IEEE International Symposium on Antennas and Propagation and USNC/URSI National Radio Science Meeting, 2009.
[168] Z. Sun, G. Bradford, and J. Laneman, "Sequence detection algorithms for PHY-layer sensing in dynamic spectrum access networks," IEEE J. Sel. Topics Signal Process., vol. 5, no. 1, pp. 97–109, Feb. 2011.
[169] D. Cabric, "Addressing feasibility of cognitive radios," IEEE Signal Processing Mag., vol. 25, no. 6, pp. 85–93, Nov. 2008.
[170] Z. Han, R. Fan, and H. Jiang, "Replacement of spectrum sensing in cognitive radio," IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 2819–2826, June 2009.
[171] S. Jha, U. Phuyal, M. Rashid, and V. Bhargava, "Design of OMC-MAC: An opportunistic multi-channel MAC with QoS provisioning for distributed cognitive radio networks," IEEE Trans. Wireless Commun., vol. 10, no. 10, pp. 3414–3425, Oct. 2011.
[172] B. Wang, K. Liu, and T. Clancy, "Evolutionary game framework for behavior dynamics in cooperative spectrum sensing," in IEEE Global Telecommunications Conference (GLOBECOM '08), Dec. 2008, pp. 1–5.
[173] E. C. Y. Peh, Y.-C. Liang, Y. L. Guan, and Y. Zeng, "Power control in cognitive radios under cooperative and non-cooperative spectrum sensing," IEEE Trans. Wireless Commun., vol. 10, no. 12, pp. 4238–4248, Dec. 2011.
[174] M. van der Schaar and F. Fu, "Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications," Proc. IEEE, vol. 97, no. 4, pp. 720–740, Apr. 2009.
[175] L. Chen, S. Iellamo, M. Coupechoux, and P. Godlewski, "An auction framework for spectrum allocation with interference constraint in cognitive radio networks," in Proc. IEEE INFOCOM '10, Mar. 2010, pp. 1–9.
[176] M. Haddad, S. Elayoubi, E. Altman, and Z. Altman, "A hybrid approach for radio resource management in heterogeneous cognitive networks," IEEE J. Sel. Areas Commun., vol. 29, no. 4, pp. 831–842, Apr. 2011.
[177] S. Buljore, M. Muck, P. Martigne, P. Houze, H. Harada, K. Ishizu, O. Holland, A. Mihailovic, K. A. Tsagkaris, O. Sallent, G. Clemo, M. Sooriyabandara, V. Ivanov, K. Nolte, and M. Stamatelatos, "Introduction to IEEE P1900.4 activities," IEICE Trans. Commun., vol. E91-B, no. 1, 2008.

Mario Bkassiny (S'06) received the B.E. degree in Electrical Engineering with High Distinction and the M.S. degree in Computer Engineering from the Lebanese American University, Lebanon, in 2008 and 2009, respectively. He is currently working towards his Ph.D. degree in Electrical Engineering at the Communication and Information Sciences Laboratory (CISL), Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA. His current research interests are in cognitive radios, distributed learning and reasoning, cognitive and cooperative communications, machine learning and dynamic spectrum leasing (DSL).

Yang Li received the B.E. degree in Electrical Engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 2005, and the M.S. degree in Electrical Engineering from the New Mexico Institute of Mining and Technology, Socorro, New Mexico, USA, in 2009. He is currently working towards his Ph.D. degree in Electrical Engineering at the Communication and Information Sciences Laboratory (CISL), Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA. His current research interests are in cognitive radios, spectrum sensing, cooperative communications, and dynamic spectrum access (DSA).
Sudharman K. Jayaweera (S’00, M’04, SM’09) was born in Matara, Sri Lanka. He received the B.E. degree in Electrical and Electronic Engineering with First Class Honors from the University of Melbourne, Australia, in 1997 and the M.A. and Ph.D. degrees in Electrical Engineering from Princeton University in 2001 and 2003, respectively. He is currently an Associate Professor in Electrical Engineering at the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM. Dr. Jayaweera held an Air Force Summer Faculty Fellowship at the Air Force Research Laboratory, Space Vehicles Directorate (AFRL/RVSV), during the summers of 2009–2011. He is currently an associate editor of IEEE Transactions on Vehicular Technology and the EURASIP Journal on Advances in Signal Processing. He has also served as a member of the Technical Program Committees of numerous IEEE conferences, including ICC (2010–2012), Globecom (2006, 2008, 2009, 2011), WCNC (2011, 2012) and PIMRC (2011, 2012). His current research interests include cooperative and cognitive communications, information theory of networked-control systems, control and optimization in smart grids, machine learning techniques for cognitive radios, statistical signal processing and wireless sensor networks.