A Survey On Machine-Learning Techniques in Cognitive Radios
A Survey On Machine-Learning Techniques in Cognitive Radios
Abstract—In this survey paper, we characterize the learning interweave paradigms for spectrum co-existence by secondary
problem in cognitive radios (CRs) and state the importance of CRs in licensed spectrum bands [10].
artificial intelligence in achieving real cognitive communications
systems. We review various learning problems that have been To perform its cognitive tasks, a CR should be aware of its
studied in the context of CRs classifying them under two main RF environment. It should sense its surrounding environment
categories: Decision-making and feature classification. Decision- and identify all types of RF activities. Thus, spectrum sensing
making is responsible for determining policies and decision was identified as a major ingredient in CRs [6]. Many sensing
rules for CRs while feature classification permits identifying and
techniques have been proposed over the last decade [15], [39],
classifying different observation models. The learning algorithms
encountered are categorized as either supervised or unsupervised [40], based on matched filter, energy detection, cyclostationary
algorithms. We describe in detail several challenging learning detection, wavelet detection and covariance detection [30],
issues that arise in cognitive radio networks (CRNs), in particular [41]–[46]. In addition, cooperative spectrum sensing was
in non-Markovian environments and decentralized networks, and proposed as a means of improving the sensing accuracy by
present possible solution methods to address them. We discuss
addressing the hidden terminal problems inherent in wireless
similarities and differences among the presented algorithms and
identify the conditions under which each of the techniques may networks in [15], [33], [34], [42], [47]–[49]. In recent years,
be applied. cooperative CRs have also been considered in literature as in
[50]–[53]. Recent surveys on CRs can be found in [41], [54],
Index Terms—Artificial intelligence, cognitive radio, decision-
making, feature classification, machine learning, supervised [55]. A survey on the spectrum sensing techniques for CRs
learning, unsupervised learning, . can be found in [39]. Several surveys on the DSA techniques
and the medium access control (MAC) layer operations for
the CRs are provided in [56]–[60].
I. I NTRODUCTION
In addition to being aware of its environment, and in order
(SDRs). In this case, several parameters and policies need to be and multi-agent systems can be unsatisfactory [88]–[91]. Other
adjusted simultaneously (e.g. transmit power, coding scheme, types of learning mechanisms such as evolutionary learning
modulation scheme, sensing algorithm, communication pro- [89], [92], learning by imitation, learning by instruction [93]
tocol, sensing policy, etc.) and no simple formula may be and policy-gradient methods [90], [91] have been shown to
able to determine these setup parameters simultaneously. This outperform RL on certain problems under such conditions.
is due to the complex interactions among these factors and For example, the policy-gradient approach has been shown to
their impact on the RF environment. Thus, learning methods be more efficient in partially observable environments since it
can be applied to allow efficient adaption of the CRs to searches directly for optimal policies in the policy space, as
their environment, yet without the complete knowledge of we shall discuss later in this paper [90], [91].
the dependence among these parameters [74]. For example, Similarly, learning in multi-agent environments has been
in [71], [75], threshold-learning algorithms were proposed to considered in recent years, especially when designing learning
allow CRs to reconfigure their spectrum sensing processes policies for CRNs. For example, [94] compared a cognitive
under uncertainty conditions. network to a human society that exhibits both individual
The problem becomes even more complicated with hetero- and group behaviors, and a strategic learning framework for
geneous CRNs. In this case, a CR not only has to adapt to cognitive networks was proposed in [95]. An evolutionary
the RF environment, but also it has to coordinate its actions game framework was proposed in [96] to achieve adaptive
with respect to the other radios in the network. With only learning in cognitive users during their strategic interactions.
a limited amount of information exchange among nodes, a By taking into consideration the distributed nature of CRNs
CR needs to estimate the behavior of other nodes in order and the interactions among the CRs, optimal learning methods
to select its proper actions. For example, in the context of can be obtained based on cooperative schemes, which helps
DSA, CRs try to access idle primary channels while limiting avoid the selfish behaviors of individual nodes in a CRN.
collisions with both licensed and other secondary cognitive One of the main challenges of learning in distributed CRNs
users [38]. In addition, if the CRs are operating in unknown is the problem of action coordination [88]. To ensure optimal
RF environments [5], conventional solutions to the decision behavior, centralized policies may be applied to generate
process (i.e. Dynamic Programming in the case of Markov optimal joint actions for the whole network. However, central-
Decision Processes (MDPs) [76]) may not be feasible since ized schemes are not always feasible in distributed networks.
they require complete knowledge of the system. On the other Hence, the aim of cognitive nodes in distributed networks is to
hand, by applying special learning algorithms such as the apply decentralized policies that ensure near-optimal behavior
reinforcement learning (RL) [38], [74], [77], it is possible to while reducing the communication overhead among nodes. For
arrive at the optimal solution to the MDP, without knowing example, a decentralized technique that was proposed in [3],
the transition probabilities of the Markov model. Therefore, [97] was based on the concept of docitive networks, from the
given the reconfigurability requirements and the need for Latin word docere (to teach), which establishes knowledge
autonomous operation in unknown and heterogeneous RF transfer (i.e. teaching) over the wireless medium [3]. The
environment, CRs may use learning algorithms as a tool objective of docitive networks is to reduce the cognitive
for adaptation to the environment and to coordinate with complexity, speed up the learning rate and generate better and
peer radio devices. Moreover, incorporation of low-complexity more reliable decisions [3]. In a docitive network, radios teach
learning algorithms can lead to reduced system complexities each others by interchanging knowledge such that each node
in CRs. attempts to learn from a more intelligent node. The radios are
A look at the recent literature on CRs reveals that both not only supposed to teach end-results, but rather elements of
supervised and unsupervised learning techniques have been the methods of getting there [3]. For example, in a docitive
proposed for various learning tasks. The authors in [65], [78], network, new upcoming radios can acquire certain policies
[79] have considered supervised learning based on neural from existing radios in the network. Of course, there will
networks and support vector machines (SVMs) for CR ap- be communication overhead during the knowledge transfer
plications. On the other hand, unsupervised learning, such as process. However, as it is demonstrated in [3], [97], this
RL, has been considered in [80], [81] for DSS applications. overhead is compensated by the policy improvement achieved
The distributed Q-learning algorithm has been shown to be due to cooperative docitive behavior.
effective in a particular CR application in [77]. For example, in
[82], CRs used the Q-learning to improve detection and clas-
sification performance of primary signals. Other applications A. Purpose of this paper
of RL to CRs can be found, for example, in [14], [83]–[85]. This paper discusses the role of learning in CRs and
Recent work in [86] introduces novel approaches to improve emphasizes how crucial the autonomous learning ability in
the efficiency of RL by adopting a weight-driven exploration. realizing a real CR device. We present a survey of the state-of-
Unsupervised Bayesian non-parametric learning based on the the-art achievements in applying machine learning techniques
Dirichlet process was proposed in [13] and was used for signal to CRs.
classification in [72]. A robust signal classification algorithm It is perhaps helpful to emphasize how this paper is different
was also proposed in [87], based on unsupervised learning. from other related survey papers. The most relevant is the
Although the RL algorithms (such as Q-learning) may pro- survey of artificial intelligence for CRs provided in [98] which
vide a suitable framework for autonomous unsupervised learn- reviews several CR implementations that used the following
ing, their performance in partially observable, non-Markovian artificial intelligence techniques: artificial neural networks
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1138 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
Information Knowledge
Perception Learning Reasoning
Intelligent Design
Learning
Paradigms in
CR’s
Unsupervised Supervised
Learning Learning
Bayesian Non-
Reinforcement Artificial Neural Support Vector
Parametric Game Theory
Learning Networks Machine
Approaches
B. Unique characteristics of cognitive radio learning prob- To sum up, the three main characteristics that need to be
lems considered when designing efficient learning algorithms for
Although the term cognitive radio has been interpreted CRs are:
differently in various research communities [5], perhaps the 1) Learning in partially observable environments.
most widely accepted definition is as a radio that can sense and 2) Multi-agent learning in distributed CRNs.
adapt to its environment [2], [5], [6], [69]. The term cognitive 3) Autonomous learning in unknown RF environments.
implies awareness, perception, reasoning and judgement. As A CR design that embeds the above capabilities will be able
we already pointed out earlier, in order for a CR to derive to operate efficiently and optimally in any RF environment.
reasoning and judgement from perception, it must possess
the ability for learning [99]. Learning implies that the current
actions should be based on past and current observations of C. Types of learning paradigms: Supervised versus unsuper-
the environment [100]. Thus, history plays a major role in the vised learning
learning process of CRs.
Several learning problems are specific to CR applications Learning can be either supervised or unsupervised, as
due to the nature of the CRs and their operating RF environ- depicted in Fig. 3. Unsupervised learning may particularly be
ments. First, due to noisy observations and sensing errors, CRs suitable for CRs operating in alien RF environments [5]. In
can only obtain partial observations of their state variables. this case, autonomous unsupervised learning algorithms permit
The learning problem is thus equivalent to a learning process exploring the environment characteristics and self-adapting
in a partially observable environment and must be addressed actions accordingly without having any prior knowledge [5],
accordingly. [71]. However, if the CR has prior information about the envi-
Second, CRs in CRNs try to learn and optimize their ronment, it might exploit this knowledge by using supervised
behaviors simultaneously. Hence, the problem is naturally a learning techniques. For example, if certain signal waveform
multi-agent learning process. Furthermore, the desired learn- characteristics are known to the CR prior to its operation,
ing policy may be based on either cooperative or non- training algorithms may help CRs to better detect signals with
cooperative schemes and each CR might have either full or those characteristics.
partial knowledge of the actions of the other cognitive users in In [93], the two categories of supervised and unsupervised
the network. In the case of partial observability, a CR might learning are identified as learning by instruction and learn-
apply special learning algorithms to estimate the actions of ing by reinforcement, respectively. A third learning regime
the other nodes in the network before selecting its appropriate is defined as the learning by imitation in which an agent
actions, as in, for example, [88]. learns by observing the actions of similar agents [93]. In
Finally, autonomous learning methods are desired in order [93], it was shown that the performance of a learning agent
to enable CRs to learn on its own in an unknown RF (learner) is influenced by its learning regime and its operating
environment. In contrast to licensed wireless users, a truly CR environment. Thus, to learn efficiently, a CR must adopt the
may be expected to operate in any available spectrum band, at best learning regime for a given learning problem, whether it
any time and in any location [5]. Thus, a CR may not have any is learning by imitation, by reinforcement or by instruction
prior knowledge of the operating RF environment such as the [93]. Of course, some learning regimes may not be applicable
noise or interference levels, noise distribution or user traffics. under certain circumstances. For example, in the absence of an
Instead, it should possess autonomous learning algorithms that instructor, the CR may not be able to learn by instruction and
may reveal the underlying nature of the environment and its may have to resort to learning by reinforcement or imitation.
components. This makes the unsupervised learning a perfect An effective CR architecture is the one that can switch among
candidate for such learning problems in CR applications, as different learning regimes depending on its requirements, the
we shall point out throughout this survey paper. available information and the environment characteristics.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1140 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
CR Learning
Problems
Decision-
Classification making
Policy- Decision-
making rules
Supervised
(Data Unsupervised
Labelling)
Single-agent/ Multi-agent/ Parameter
centralized decentralized optimization
Fig. 4. Typical problems in cognitive radio and their corresponding learning algorithms.
1
N
In this survey, we discuss several learning algorithms that R(g) = Remp (g) = L(yi , g(xi )) , (1)
N i=1
can be used by CRs to achieve different goals. In order to
obtain a better insight on the functions and similarities among where L : Y × Y → R+ is a loss function. Hence,
the presented algorithms, we identify two main problem cate- ANN algorithms find the function g that best fits the data.
gories and show the learning algorithms under each category. However, if the function space G includes too many candidates
The hierarchical organization of the learning algorithms and or the training set is not sufficiently large (i.e. small N ),
their dependence is illustrated in Fig. 4. empirical risk minimization may lead to high variance and
Referring to Fig. 4, we identify two main CR problems (or poor generalization, which is known as overfitting. In order to
tasks) as: prevent overfitting, structural risk minimization can be used,
which incorporates a regularization penalty to the optimization
1) Decision-making. process [101]. This can be done by minimizing the following
2) Feature classification. risk function:
These problems are general in a sense that they cover a wide R(g) = Remp (g) + λC(g) , (2)
range of CR tasks. For example, classification problems arise
in spectrum sensing while decision-making problems arise in where λ controls the bias/variance tradeoff and C is a penalty
determining the spectrum sensing policy, power control or function [101].
adaptive modulation. In contrast with the supervised approaches, unsupervised
classification algorithms do not require labeled training data
The learning algorithms that are presented in this paper and can be classified as being either parametric or non-
can be classified under the above two tasks, and can be parametric. Unsupervised parametric classifiers include the K-
applied under specific conditions, as illustrated in Fig. 4. For means and Gaussian mixture model (GMM) algorithms and
example, the classification algorithms can be split into two require prior knowledge of the number of classes (or clusters).
different categories: Supervised and unsupervised. Supervised On the other hand, non-parametric unsupervised classifiers do
algorithms require training with labeled data and include, not require prior knowledge of the number of clusters and
among others, the ANN and SVM algorithms. The ANN can estimate this quantity from the observed data itself, for
algorithm is based on empirical risk minimization and does example using methods based on the Dirichlet process mixture
require prior knowledge of the observed process distribution, model (DPMM) [72], [104], [105].
as opposed to structural models [101]–[103]. However, SVM Decision-making is another major task that has been widely
algorithms, which are based on structural risk minimization, investigated in CR applications [17], [24]–[26], [35], [38],
have shown superior performance, in particular for small [77], [106]–[110]. Decision-making problems can in turn be
training examples, since they avoid the problem of overfitting split to policy-making and decision rules. Policy-making prob-
[101], [103]. lems can be classified as either centralized or decentralized.
For instance, consider a set of training data denoted as In a policy-making problem, an agent determines its optimal
{(x1 , y1 ), · · · , (xN , yN )} such that xi ∈ X, yi ∈ Y , ∀i ∈ set of actions over a certain time duration, thus defining
{1, · · · , N }. The objective of a supervised learning algorithm an optimal policy (or an optimal strategy in game theory
is to find a function g : X → Y that maximizes a certain terminology). In a centralized scenario with a Markov state,
score function [101]. In ANN, g is defined as the function RL algorithms can be used to obtain optimal solution to the
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1141
bility that the system is in state s at time epoch t + 1, Obviously, the value iteration algorithm requires explicit
when the decision-maker chooses action a ∈ A in state knowledge of the transition probability p(s |s, a). On the other
s ∈ S at time t. Note that, the subscript t might be hand, an RL algorithm, referred to as the Q-learning, was
dropped from pt (s |s, a) if the system is stationary. proposed by Watkins in 1989 [117] to solve the MDP problem
• A real-valued function rtMDP (s, a) defined for state s ∈ without knowledge of the transition probabilities and has
S and action a ∈ A to denote the value at time t of been recently applied to CRs [38], [77], [82], [118]. The Q-
the reward received in period t [76]. Note that, in RL learning algorithm is one of the important temporal difference
literature, the reward function is usually defined as the (TD) methods [74], [117]. It has been shown to converge
delayed reward rt+1 (s, a) that is obtained at time epoch to the optimal policy when applied to single agent MDP
t + 1 after taking action a in state s at time t [74]. models (i.e. centralized control) in [117] and [74]. However, it
can also generate satisfactory near-optimal solutions even for
At each time epoch t, the agent observes the current state
decentralized partially observable MDPs (DEC-POMDPs), as
s and chooses an action a. An optimum policy maximizes
shown in [77]. The one-step Q-learning is defined as follows:
the total expected rewards, which is usually discounted by
a discount factor γ ∈ [0, 1) in case of an infinite time Q(st , at ) ← (1 − α)Q(st , at ) +
horizon. Thus, the objective is to find the optimal policy π
+ α rt+1 (st , at ) + γ max Q(st+1 , a) . (8)
that maximizes the expected discounted return [74]: a
∞
The learned action-value function, Q in (8), directly approx-
R(t) = γ k rt+k+1 (st+k , at+k ) , (3) imates the optimal action-value function Q∗ [74]. However, it
k=0 is required that all state-action pairs need to be continuously
where st and at are, respectively, the state and action at time updated in order to guarantee correct convergence to Q∗ .
t ∈ Z. This can be achieved by applying an ε-greedy policy that
The optimal solution of an MDP can be obtained by using ensures that all state-action pairs are updated with a non-
several methods such as the value iteration algorithm based zero probability, thus leading to an optimal policy [74]. If
on dynamic programming [76]1. Given a certain policy π, the the system is in state s ∈ S, the ε-greedy policy selects action
value of state s ∈ S is defined as the expected discounted a∗ (s) such that:
return if the system starts in state s and follows policy π arg maxa∈A Q(s, a) , with Pr = 1 − ε
thereafter [74], [76]. This value function can be expressed as a∗ (s) = ,
∼ U (A) , with Pr = ε
[74]: (9)
∞
where U (A) is the discrete uniform probability distribution
V (s) = Eπ
π
γ rt+k+1 (st+k , at+k )|st = s , (4)
k
over the set of actions A.
k=0 In [77], the authors applied the Q-learning to achieve
where Eπ {.} denotes the expected value given that the agent interference control in a cognitive network. The problem setup
follows policy π. Similarly, the value of taking action a in state of [77] is illustrated in Fig. 6 in which multiple IEEE 802.22
s under a policy π is defined as the action-value function [74]: WRAN cells are deployed around a Digital TV (DTV) cell
∞ such that the aggregated interference caused by the secondary
Q (s, a) = Eπ
π
γ rt+k+1 (st+k , at+k )|st = s, at = a .
k networks to the DTV network is below a certain threshold. In
k=0
this scenario, the CR (agents) constitutes a distributed network
(5) and each radio tries to determine how much power it can
The value iteration algorithm finds an ε-optimal policy transmit so that the aggregated interference on the primary
assuming stationary rewards and transition probabilities (i.e. receivers does not exceed a certain threshold level.
rt (s, a) = r(s, a) and pt (s |s, a) = p(s |s, a)). The algorithm In this system, the secondary base stations form the learning
initializes a v 0 (s) for each s ∈ S arbitrarily and iteratively agents that are responsible for identifying the current envi-
updates v n (s) (where v n (s) is the estimated value of state s ronment state, selecting the action based on the Q-learning
after the n-th iteration) for each s ∈ S as follows [76]: methodology and executing it. The state of the i-th WRAN
⎧ ⎫
⎨ ⎬ network at time t consists of three components and is defined
v n+1 (s) = max r(s, a) + γ p(j|s, a)v n (j) . (6) as [77]:
a∈A ⎩ ⎭ sit = {Iti , dit , pit } , (10)
j∈S
The algorithm stops when v n+1 − v n < ε 1−γ where Iti is a binary indicator specifying whether the sec-
2γ and the ε-
optimal decision d (s) of each state s ∈ S is defined as: ondary network generates interference to the primary network
⎧ ⎫ above or below the specified threshold, dit denotes an estimate
⎨ ⎬ of the distance between the secondary user and the interference
d (s) = arg max r(s, a) + γ p(j|s, a)v n+1 (j) . (7) contour, and pit denotes the current power at which the
a∈A ⎩ ⎭
j∈S secondary user i is transmitting. In the case of full state
observability, the secondary user has complete knowledge of
1 There are other algorithms that can be applied to find the optimal policy of
the state of the environment. However, in a partially observable
an MDP such as policy iteration and linear programming methods. Interested environment, the agent i has only partial information of the
readers are referred to [76] for additional information regarding these methods. actual state and uses a belief vector to represent the probability
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1143
look for optimal policies in the policy space itself, without designed to obtain reasonable approximations of the gradient.
having to estimate the actual states of the systems [90], [91]. In Indeed, several approaches have been proposed to estimate the
particular, by adopting policy gradient algorithms, the policy gradient policy vector, mainly in robotics applications [119],
vector can be updated to reach an optimal solution (or a local [120]. Three different approaches have been considered in
optimum) in non-Markovian environments. [120] for policy gradient estimation:
The value-iteration approach has several other limitations as 1) Finite difference (FD) methods.
well: First, it is restricted to deterministic policies. Second, any 2) Vanilla policy gradient (VPG) methods.
small changes in the estimated value of an action can cause 3) Natural policy gradient (NG) methods.
that action to be, or not to be selected [90]. This would affect
Finite difference (FD) methods, originally used in stochastic
the optimality of the resulting policy since optimal actions
simulations literature, are among the oldest policy gradient
might be eliminated due to an underestimation of their value
approaches. The idea is based on changing the current policy
functions.
parameter θk by small perturbations δθi and computing δηi =
On the other hand, the gradient-policy approach has shown
η(θk + δθi ) − η(θk ). The policy gradient ∇η(θ) can be thus
promising results, for example, in robotics applications [119],
estimated as:
[120]. Compared to value-iteration methods, the gradient-
policy approach requires fewer parameters in the learning −1
gF D = ΔΘT ΔΘ ΔΘΔη , (15)
process and can be applied in model-free setups not requiring
prefect knowledge of the controlled system. where ΔΘ = [δθ1 , · · · , δθI ]T , Δη = [δη1 , · · · , δηI ]T and
The policy-search approach can be illustrated by the fol- I is the number of samples [119], [120]. Advantages of this
lowing overview of policy-gradient algorithms from [91]. We
approach is that it is straightforward to implement and does not
consider a class of stochastic policies that are parameterized introduce significant noise to the system during exploration.
by θ ∈ RK . By computing the gradient with respect to However, the gradient estimate can be very sensitive to per-
θ of the average reward, the policy could be improved by
turbations (i.e. δθi ) which may lead to bad results [120].
adjusting the parameters in the gradient direction. To be Instead of perturbing the parameter θk of a deterministic
concrete, assume r(X) to be a reward function that depends policy u = π(x) (with u being the action and x being
on a random variable X. Let q(θ, x) be the probability of the
the state), the VPG approach assumes a stochastic policy
event {X = x}. The gradient with respect to θ of the expected u ∼ π(u|x) and obtains an unbiased gradient estimate [120].
performance η(θ) = E{r(X)} can be expressed as:
However, in using the VPG method, the variance of the gradi-
∇q(θ, x) ent estimate depends on the squared average magnitude of the
∇η(θ) = E r(X) . (12)
q(θ, x) reward, which can be very large. In addition, the convergence
of the VPG to the optimal solution can be very slow, even
An unbiased estimate of the gradient can be obtained via
with an optimal baseline [120]. The NG approach which leads
simulation by generating N independent identically distributed
to fast policy gradient algorithms can alleviate this problem.
(i.i.d.) random variables X1 , · · · , XN that are distributed
Natural gradient approaches use the Fisher information F (θ)
according to q(θ, x). The unbiased estimate of ∇η(θ) is thus
to characterize the information about the policy parameters
expressed as:
θ that is contained in the observed path τ [120]. A path (or
1 a trajectory) τ = [x0:H , u0:H ] is defined as the sequence of
N
ˆ ∇q(θ, Xi )
∇η(θ) = r(Xi ) . (13) states and actions, where H denotes the horizon which can
N i=1 q(θ, Xi )
be infinite [119]. Thus, the Fisher information F (θ) can be
By the law of large numbers, ∇η(θ) ˆ → ∇η(θ) with expressed as:
probability one. Note that the quantity ∇q(θ,X i)
q(θ,Xi ) is referred F (θ) = E ∇θ log p(τ |θ)∇θ log p(τ |θ)T , (16)
to as the likelihood ratio or the score function. By having an
estimate of the reward gradient, the policy parameter θ ∈ RK where p(τ |θ) is the probability of trajectory τ , given certain
can be updated by following the gradient direction, such that: policy parameter θ. For a given policy change δθ, there is an
information loss of lθ (δθ) ≈ δθT F (θ)δθ, which can also be
θk+1 ← θk + αk ∇η(θ) , (14) seen as the change in path distribution p(τ |θ). By searching
for some step size αk > 0. for the policy change δθ that maximizes the expected return
Authors in [119], [120] identify two major steps when η(θ + δθ) for a constant information loss lθ (δθ) ≈ ε, the
performing policy gradient methods: algorithms searches for the highest return value on an ellipse
1) A policy evaluation step in which an estimate of the around the current parameter θ and then goes in the direction
gradient ∇η(θ) of the expected return η(θ) is obtained, of the highest values. More formally, the direction of the
given a certain policy πθ . steepest ascent on the ellipse around θ can be expressed as
2) A policy improvement step which updates the policy [120]:
parameter θ through steepest gradient ascent θk+1 = δθ = arg max δθT ∇θ η(θ) = F −1 (θ)∇θ η(θ) . (17)
θk + αk ∇η(θ). δθ s.t. lθ (δθ)=ε
Note that, estimating the gradient ∇η(θ) is not straight- This algorithm is further explained in [120] and can be easily
forward, especially in the absence of simulators that generate implemented based on the Natural Actor-Critic algorithms
the Xi ’s. To resolve this problem, special algorithms can be [120].
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1145
By comparing the above three approaches, the authors in A non-cooperative game can be classified as either a
[120] showed that NG and VPG methods are considerably complete or an incomplete information game. In a complete
faster and result in better performance, compared to FD. How- information game, each player can observe the information
ever, FD has the advantage of being simpler and applicable in of other players such as their payoffs and their strategies.
more general situations. On the other hand, in an incomplete information game, this
information is not available to other players. A game with
incomplete information can be modeled as a Bayesian game
C. Decentralized policy-making: Game Theory
in which the game outcomes can be estimated based on
Game theory [121] presents a suitable platform for mod- Bayesian analysis. A Bayesian Nash equilibrium is defined
eling rational behavior among CRs in CRNs. There is a rich for the Bayesian game, similar to the Nash equilibrium in the
literature on game theoretic techniques in CR, as can be found complete information game [115].
in [11], [122]–[132]. A survey on game theoretic approaches In addition, a game can also be classified as either static or
for multiple access wireless systems can be found in [115]. dynamic. In a static game, each player takes its actions without
Game theory [121] is a mathematical tool that attempts to knowledge of the strategies taken by the other players. This
implement the behavior of rational entities in an environment is denoted as a one-shot game which ends when actions of
of conflict. This branch of mathematics has primarily been all players are taken and payoffs are received. In a dynamic
popular in economics, and has later found its way into game, however, a player selects an action in the current stage
biology, political science, engineering and philosophy [115]. based on the knowledge of the actions taken by the other
In wireless communications, game theory has been applied players in the current or previous stages. A dynamic game is
to data communication networking, in particular, to model also called a sequential game since it consists of a sequence
and analyze routing and resource allocation in competitive of repeated static games. The common equilibrium solution
environments. in dynamic games is the subgame perfect Nash equilibrium
A game model consists of several rational entities that which represents a Nash equilibrium of every subgame in the
are denoted as the players. Assuming a game model G = original game [115].
(N , (Ai )i∈N , (Ui )i∈N ), where N = {1, · · · , N } denotes the 2) Applications of Game Theory to Cognitive Radios:
set of N players and each player i ∈ N has a set Ai of avail- Several types of games have been adapted to model different
able actions and a utility function Ui . Let A = A1 × · · · × AN situations in CRNs [98]. For example, supermodular games
be the set of strategy profiles of all players. In general, the (the games having the following important and useful prop-
utility function of an individual player i ∈ N depends on erty: there exists at least one pure strategy Nash equilibrium)
the actions taken by all the players involved in the game and have been used for distributed power control in [133], [134]
is denoted as Ui (ai , a−i ), where ai ∈ Ai is an action (or and for rate adaptation in [135]. Repeated games were applied
strategy) of player i and a−i ∈ A−i is a strategy profile of for DSA by multiple secondary users that share the same
all players except player i. Each player selects its strategy in spectrum hole in [136]. In this context, repeated games are
order to maximize its utility function. A Nash equilibrium of a useful in building reputations and applying punishments in
game is defined as a point at which the utility function of each order to reinforce a certain desired outcome. The Stackelberg
player does not increase if the player deviates from that point, game model can be used as a model for implementing CR
given that all the other players’ actions are fixed. Formally, behavior in cooperative spectrum leasing where the primary
a strategy profile (a∗1 , · · · , a∗N ) ∈ A is a Nash equilibrium if users act as the game-leaders and secondary cognitive users
[112]: as the followers [50].
Auctions are one of the most popular methods used for
Ui (a∗i , a−i ) ≥ Ui (ai , a−i ), ∀i ∈ N , ∀ai ∈ Ai . (18) selling a variety of items, ranging from antiques to wireless
spectrum. In auction games the players are the buyers who
A key advantage of applying game theoretic solutions to must select the appropriate bidding strategy in order to max-
CR protocols is in reducing the complexity of adaptation algo- imize their perceived utility (i.e., the value of the acquired
rithms in large cognitive networks. While optimal centralized items minus the payment to the seller). The concept of auction
control can be computationally prohibitive in most CRNs, due games has successfully been applied to cooperative dynamic
to communication overhead and algorithm complexity, game spectrum leasing (DSL) in [37], [137], as well as to spectrum
theory presents a distributed platform to handle such situations allocation problems in [138]. The basics of the auction games
[98]. Another justification for applying game theoretic ap- and the open challenges of applying auction games to the field
proaches to CRs is the assumed cognition in the CR behavior, of spectrum management are discussed in [139].
which induces rationality among CRs, similar to the players Stochastic games (or Markov games) can be used to model
in a game. the greedy selfish behavior of CRs in a CRN, where CRs
1) Game Theoretic Approaches: There are two major game try to learn their best response and improve their strategies
theoretic approaches that can be used to model the behavior of over time [140]. In the context of CRs, stochastic games
nodes in a wireless medium: Cooperative and non-cooperative are dynamic, competitive games with probabilistic actions
games. In a non-cooperative game, the players make rational played by secondary spectrum users. The game is played
decisions considering only their individual payoff. In a co- in a sequence of stages. At the beginning of each stage,
operative game, however, players are grouped together and the game is in a certain state. The secondary users choose
establish an enforceable agreement in their group [115]. their actions, and each secondary user receives a reward that
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1146 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
depends on both its current state and its selected actions. The in [141], the communication overhead among the CR users
game then moves to the next stage having a new state with is reduced. Furthermore, the model in [141] provides an
a certain probability, which depends on the previous state alternative solution to opportunistic spectrum access schemes
as well as the actions selected by the secondary users. The proposed in [107], [108] that do not consider the interactions
process continues for a finite or infinite number of stages. among multiple secondary users in a partially observable MDP
The stochastic games are generalizations of repeated games (POMDP) framework [141].
that have only a single state. Thus, learning in a game theoretic framework can help CRs
to adapt to environment variations given a certain uncertainty
3) Learning in Game Theoretic Models: There are sev- about the other users’ strategies. Therefore, it provides a
eral learning algorithms that have been proposed to estimate potential solution for multi-agent learning problems under
unknown parameters in a game model (e.g. other players’ partial observability assumptions.
strategies, environment states, etc.). In particular, no-regret
learning allows initially uninformed players to acquire knowl-
edge about their environment state in a repeated game [111]. D. Decision rules under uncertainty: Threshold-learning
This algorithm does not require prior knowledge of the number
of players nor the strategies of other players. Instead, each A CR may be implemented on a mobile device that changes
player will learn a better strategy based on the rewards location over time and switches transmissions among several
obtained from playing each of its strategies [111]. channels. This mobility and multi-band/multi-channels oper-
ability may pose a major challenge for CRs in adapting to
The concept of regret is related to the benefit a player feels
their RF environments. A CR may encounter different noise or
after taking a particular action, compared to other possible
interference levels when switching between different bands or
actions. This can be computed as the average reward the
when moving from one place to another. Hence, the operating
player gets from a particular action, averaged over all other
parameters (e.g. test thresholds and sampling rate) of CRs need
possible actions that could be taken instead of that particular
to be adapted with respect to each particular situation. More-
action. Actions resulting in lower regret are updated with
over, CRs may be operating in unknown RF environments and
higher weights and are thus selected more frequently [111]. In
may not have perfect knowledge of the characteristics of the
general, no-regret learning algorithms help players to choose
other existing primary or secondary signals, requiring special
their policies when they do not know the other players’ ac-
learning algorithms to allow the CR to explore and adapt to
tions. Furthermore, no-regret learning can adapt to a dynamic
its surrounding environment. In this context, special types of
environment with little system overhead [111].
learning can be applied to directly learn the optimal values of
No-regret learning was applied in [111] to allow a CR to certain design and operation parameters.
update both its transmission power and frequencies simul-
Threshold learning presents a technique that permits such
taneously. In [113], it was used to detect malicious nodes
dynamic adaptation of operating parameters to satisfy the per-
in spectrum sensing whereas in [112] no-regret learning
formance requirements, while continuously learning from the
was used to achieve a correlated equilibrium in opportunis-
past experience. By assessing the effect of previous parameter
tic spectrum access for CRs. Assuming the game model
values on the system performance, the learning algorithm op-
G = (N , (Ai )i∈N , (Ui )i∈N ) defined above, in a correlated
timizes the parameters values to ensure a desired performance.
equilibrium, a strategy profile (a1 , · · · , aN ) ∈ A is chosen
For example, in considering energy detection, after measuring
randomly according to a certain probability distribution p
the energy levels at each frequency, a CR decides on the
[112]. A probability distribution p is a correlated strategy, if
occupancy of a certain frequency band by comparing the
and only if, for all i ∈ N , ai ∈ Ai , a−i ∈ A−i [112]:
measured energy levels to a certain threshold. The threshold
levels are usually designed based on Neyman-Pearson tests in
p(ai , a−i ) [Ui (ai , a−i ) − Ui (ai , a−i )] ≤ 0, ∀ai ∈ Ai . order to maximize the detection probability of primary signals,
a−i ∈A−i while satisfying a constraint on the false alarm. However, in
(19) such tests, the optimal threshold depends on the noise level.
Note that, every Nash equilibrium is a correlated equilibrium An erroneous estimation of the noise level might cause sub-
and Nash equilibria correspond to the special case where optimal behavior and violation of the operation constraints
p(ai , a−i ) is a product of each individual player’s probability (for example, exceeding a tolerable collision probability with
for different actions, i.e. the play of the different players is primary users). In this case, and in the absence of perfect
independent [112]. Compared to the non-cooperative Nash knowledge about the noise levels, threshold-learning algo-
equilibrium, the correlated equilibrium in [112] was shown rithms can be devised to learn the optimal threshold values.
to achieve better performance and fairness. Given each choice of a threshold, the resulting false alarm
Recently, [141] proposed a game-theoretic stochastic learn- rate determines how the test threshold should be regulated
ing solution for opportunistic spectrum access when the chan- to achieve a desired false alarm probability. An example of
nel availability statistics and the number of secondary users application of threshold learning can be found in [75] where
are unknown a priori. This model attempts to resolve non- a threshold learning algorithm was derived for optimizing
feasible opportunistic spectrum access solution which requires spectrum sensing in CRs. The resulting algorithm was shown
prior knowledge of the environment and the actions taken by to converge to the optimal threshold that satisfies a given false
the other users. By applying the stochastic learning solution alarm probability.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1147
finite dimensional Dirichlet distribution with parameters non-parametric Bayesian classification problems in which the
(α0 G0 (A1 ), · · · , α0 G0 (Ar )), where α0 > 0 [104]. We de- number of clusters is unknown a priori (i.e. allowing for
note: infinite number of clusters), with the infinite discrete support
(i.e. {φk }∞
k=1 being the set of clusters. However, due to the
(G(A1 ), · · · , G(Ar )) ∼ Dir(α0 G0 (A1 ), · · · , α0 G0 (Ar )) , infinite sum in G, it may not be practical to construct G
(20) directly by using this approach in many applications. An
where G ∼ DP (α0 , G0 ), denotes that the probability measure alternative approach to construct G is by using either the
G is drawn from the Dirichlet process DP (α0 , G0 ). In other Polya urn model [143] or the Chinese Restaurant Process
words, G is a random probability measure whose distribution (CRP) [144]. The CRP is a discrete-time stochastic process. A
is given by the Dirichlet process DP (α0 , G0 ) [104]. typical example of this process can be described by a Chinese
1) Construction of the Dirichlet process: Teh [104] de- restaurant with infinitely many tables and each table (cluster)
scribes several ways of constructing the Dirichlet process. A having infinite capacity. Each customer (feature point) that
first method is a direct approach that constructs the random arrives at the restaurant (RF spectrum) will choose a table
probability distribution G based on the stick-breaking method. with a probability proportional to the number of customers on
The stick-breaking construction of G can be summarized as that table. It may also choose a new table with a certain fixed
follows [104]: probability.
1) Generate independent i.i.d. sequences {πk }∞
k=1 and
A second approach to constructing a Dirichlet process
{φk }∞
k=1 such that
does not define G explicitly. Instead, it characterizes the
distribution of the drawings θ of G. Note that G is discrete
πk |α0 , G0 ∼ Beta(1, α0 ) with probability 1. For example, the Polya urn model [143]
, (21)
φk |α0 , G0 ∼ G0 does not construct G directly, but it characterizes the draws
where Beta(a, b) is the beta distribution whose prob- from G. Let θ1 , θ2 , · · · be i.i.d. random variables distributed
ability density function (pdf) is given by f (x, a, b) = according to G. These random variables are independent,
1
xa−1 (1−x)b−1
. given G. However, if G is integrated out, θ1 , θ2 , · · · are no
ua−1 (1−u)b−1 du
0
more conditionally independent and they can be characterized
2) Define πk = πk k−1 l=1 (1 − πl ). We can write π = as:
(π1 , π2 , · · · ) ∼ GEM (α0 ), where GEM stands for
K
mk α0
Griffiths, Engen and McCloskey [104]. The GEM (α) θi |{θj }i−1
j=1 , α0 , G0 ∼ δφ + G0 ,
process generates the vector π as described above, given i − 1 + α0 k i − 1 + α0
k=1
a parameter α0 in (21). (22)
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
1148 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 15, NO. 3, THIRD QUARTER 2013
N (25)
θi |G ∼ G . (23) where B(yi ) = A(yi ) + f (y ), h(θ |y ) =
l=1,l = i θ l i i i
α0
A(yi ) f θ i (y i )G0 (θ i ) and A(y) = α0 f θ (y)G 0 (θ)dθ.
2) Dirichlet Process Mixture Model: The Dirichlet process
In order to illustrate this clustering method, consider a
makes a perfect candidate for non-parametric classification
simple example summarizing the process. We assume a set
problems through the DPMM. The DPMM imposes a non-
of mixture components θ ∈ R. Also, we assume G0 (θ) to
parametric prior on the parameters of the mixture model [104].
be uniform over the range [θmin , θmax ]. Note that this is a
The DPMM can be defined as follows:
worst-case scenario assumption whenever there is no prior
⎧ knowledge of the distribution of θ, except its range. Let
⎨ G ∼ DP (α0 , G0 ) (y−θ)2
θi |G ∼ G , (24)
1
fθ (y) = √2πσ 2
e− 2σ2 .
⎩ Hence,
yi |θi ∼ f (θi )
α0 θmin − y θmax − y
where θi ’s denote the mixture components and the yi is drawn A(y) = Q −Q (26)
θmax − θmin σ σ
according to this mixture model with a density function f
given a certain mixture component θi . and
3) Data clustering based on the DPMM and the Gibbs (yi −θi )2
sampling: Consider a sequence of observations {yi }N i=1 and h(θi |yi ) =
1
B √2πσ 2
e− 2σ2 if θmin ≤ θi ≤ θmax ,
assume that these observations are drawn from a mixture 0 otherwise
model. If the number of mixture components is unknown, (27)
it is reasonable to assume a non-parametric model, such as where B = 1 . Initially, we set θi = yi
θ −yi θmax −yi
Q minσ −Q σ
the DPMM. Thus, the mixture components θi are drawn
for all i ∈ {1, · · · , N }. The algorithm is described in Algo-
from G ∼ DP (α0 , G0 ), where G can be expressed as
rithm 1.
G= ∞ k=1 πk δφk , φk ’s are the unique values of θi , and πk are
If the observation points yi ∈ Rk (with k > 1), the
their corresponding probabilities. Denote y = (y1 , · · · , yN ).
distribution of h(θi |yi ) may become too complicated to be
The problem is to estimate the mixture component θ̂i for used in the sampling process of θi ’s. In [116], if G0 (θ) is
each observation yi , for all i ∈ {1, · · · , N }. This can be constant in a large area around yi , h(θ|yi ) was shown to be
achieved by applying the Gibbs sampling method proposed approximated by the Gaussian distribution (assuming that the
in [116] which has been applied for various unsupervised observation pdf fθ (yi ) is Gaussian). Thus, assuming a large
clustering problems, such as speaker clustering problem in uniform prior distribution on θ, we may approximate h(θ|y)
[145]. The Gibbs sampling is a technique for generating by a Gaussian pdf so that (27) becomes:
random variables from a (marginal) distribution indirectly,
without having to calculate the density. As a result, by using te h(θi |yi ) = N (yi , Σ) , (28)
Gibbs sampling, one can avoid difficult calculations, replacing where Σ is the covariance matrix.
them instead with a sequence of easier calculations. Although In order to illustrate this approach in a multidimensional
the roots of the Gibbs sampling can be traced back to at least scenario, we may generate a Gaussian mixture model having
Metropolis et al. [146], the Gibbs sampling perhaps became 4 mixture components. The mixture components have different
more popular after the paper of Geman and Geman [147], who means in R2 and have an identity covariance matrix. We will
studied image-processing models. assume that the covariance matrix is known.
In the Gibbs sampling method proposed in [116], the We plot in Fig. 8 the results of the clustering algorithm
estimates θ̂i is sampled from the conditional distribution of θi , based on DPMM. Three of the clusters were almost perfectly
given all the other feature points and the observation vector identified, whereas the forth cluster was split into three parts.
y. By assuming that {yi }N i=1 are distributed according to the The main advantage of this technique is its ability for learning
DPMM in (24), the conditional distribution of θi was obtained the number of clusters from the data itself, without any prior
in [116] to be knowledge. As opposed to heuristic or supervised classifi-
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY DELHI. Downloaded on October 07,2023 at 07:08:49 UTC from IEEE Xplore. Restrictions apply.
BKASSINY et al.: A SURVEY ON MACHINE-LEARNING TECHNIQUES IN COGNITIVE RADIOS 1149
DPMM classifcation with Gibbs sampling with σ= 1, α = 2 after 20000 iterations should be handled by the embedded flexibility offered by non-
0
30 parametric learning approaches.
The advantages of the Dirichlet process-based learning tech-
nique in [148] is that it does not rely on training data, making
Second coordinate of the feature vector
mum a posteriori (MAP) detection can be applied to a cluster status of the network affect its performance on different
center μc to estimate the wireless system that it belongs to. channels. In particular, an implementation of the proposed
However, the classification of feature points into clusters can Cognitive Controller for dynamic channel selection in IEEE
be done based on the CRP. 802.11 wireless networks was presented. Performance eval-
The classification of a feature point into a certain cluster is uation carried out on an IEEE 802.11 wireless network de-
made based on the Gibbs sampling applied to the CRP. The ployment demonstrated that the Cognitive Controller is able
algorithm fixes the cluster assignments of all other feature to effectively learn how the network performance is affected
points. Given that assignment, it generates a cluster index for by changes in the environment, and to perform dynamic
the current feature point. This sampling process is applied channel selection thereby providing significant throughput
to all the feature points separately until certain convergence enhancements.
criterion is satisfied. Other examples of the CRP-based feature In [153], an application of a Feedbackward ANN in con-
classification can be found in speaker clustering [145] and document clustering applications [149].

B. Supervised Classification Methods in Cognitive Radios

Unlike the unsupervised learning techniques discussed in the previous section, which may be used in alien environments without any prior knowledge, supervised learning techniques can generally be used in familiar/known environments with prior knowledge about the characteristics of the environment. In the following, we introduce some of the major supervised learning techniques that have been applied to classification tasks in CRs.

1) Artificial Neural Network: The ANN has been motivated by the recognition that the human brain computes in an entirely different way compared to conventional digital computers [150]. A neural network is defined to be "a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use" [150]. An ANN resembles the brain in two respects [150]: 1) knowledge is acquired by the network from its environment through a learning process, and 2) interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge. Since a certain target value (i.e. a label) is required during the training process, neural networks are considered supervised learning algorithms.

Some of the most beneficial properties and capabilities of ANNs include: 1) nonlinear fitting of underlying physical mechanisms, 2) the ability to adapt to minor changes in the surrounding environment, and 3) providing information about the confidence in the decision made. However, the disadvantages of ANNs are that they require training under many different environment conditions, and their training outcomes may depend crucially on the choice of initial parameters.

Various applications of ANNs to CRs can be found in recent literature [102], [151]–[155]. The authors in [151], for example, proposed the use of Multilayered Feedforward Neural Networks (MFNNs) as a technique to synthesize performance evaluation functions in CRs. The benefit of using MFNNs is that they provide a general-purpose black-box model of the performance as a function of the measurements collected by the CR; furthermore, this characterization can be obtained and updated by a CR at run-time, thus effectively achieving a certain level of learning capability. The authors in [151] also demonstrated in several IEEE 802.11 based environments how these modeling capabilities can be used for optimizing the configuration of a CR.

In [152], the authors proposed an ANN-based cognitive engine that learns how environmental measurements and the network status affect performance, and applied it to dynamic channel selection. In [153], an ANN model used in conjunction with cyclostationarity-based spectrum sensing was presented to perform spectrum sensing. The results showed that the proposed approach is able to detect signals at considerably low signal-to-noise ratio (SNR) values. In [102], the authors designed a channel status predictor using an MFNN model. The authors argued that their proposed MFNN-based prediction is superior to the HMM-based approaches, pointing out that the HMM-based approaches require a huge memory space to store a large number of past observations, with high computational complexity.

In [154], the authors proposed a methodology for spectrum prediction by modeling licensed-user features as a multivariate chaotic time series, which is then input to an ANN that predicts the evolution of the RF time series to decide if the unlicensed user can exploit the spectrum band. Experimental results showed a similar trend between predicted and observed values. This spectrum evolution prediction method exploits cyclostationary signal features to construct an RF multivariate time series that contains more information than a univariate time series, in contrast to most of the previously suggested modeling methodologies, which focused on univariate time series prediction [156].

To illustrate the operation of ANNs in CR contexts, we present the model proposed in [78] and describe the main steps in the implementation of ANNs. In particular, [78] considers a multilayer perceptron (MLP) neural network which maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, which is fully connected from one layer to the next [78]. Except for the input nodes, each node in the MLP is a neuron with a nonlinear activation function that computes a weighted sum of the previous layer's outputs (denoted as the activation). An example of one of the most popular activation functions used in ANNs is the sigmoid function:

f(a) = 1 / (1 + e^{-a}). (29)

The ANN proposed in [78] has an input layer, an output layer and multiple hidden layers. Note that having additional hidden layers improves the nonlinear performance of the ANN in terms of classifying linearly non-separable data. However, adding more hidden layers makes the network more complicated and may require longer training time.

In the following, we consider an MLP network and let y_j^l be the output of the j-th neuron in the l-th layer. Denote also by w_{ji}^l the weight between the j-th neuron in the l-th layer and the i-th neuron in the (l-1)-th layer. The output y_j^l
can be calculated from the outputs of the previous layer as:

y_j^l = f( Σ_i w_{ji}^l y_i^{l-1} ). (30)

The network is trained by adjusting the weights so as to minimize the mean squared error (MSE) over the K output neurons:

MSE = (1/K) Σ_{k=1}^{K} (t_k - o_k)^2, (31)

where t_k and o_k denote, respectively, the target and actual outputs of the k-th output neuron. The weight updates of the back-propagation algorithm are driven by the local error terms

δ_j^l = o_j (t_j - o_j)(1 - o_j), if l is the output layer;
δ_j^l = y_j^l (1 - y_j^l) Σ_k δ_k^{l+1} w_{kj}^{l+1}, if l is a hidden layer. (32)
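To make (29)-(32) concrete, the following minimal NumPy sketch (an illustration of ours, not code from [78]; the single hidden layer and the learning rate are assumptions) performs one forward pass and one back-propagation weight update:

import numpy as np

def sigmoid(a):
    # Sigmoid activation of (29): f(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, t, W1, W2, lr=0.1):
    # Forward pass, eq. (30): each neuron applies f to a weighted sum
    y1 = sigmoid(W1 @ x)   # hidden-layer outputs y_j^1
    o = sigmoid(W2 @ y1)   # output-layer outputs o_k
    # Local error terms of (32): output layer first, then hidden layer
    delta2 = o * (t - o) * (1.0 - o)
    delta1 = y1 * (1.0 - y1) * (W2.T @ delta2)
    # Delta-rule weight updates that reduce the MSE of (31)
    W2 += lr * np.outer(delta2, y1)
    W1 += lr * np.outer(delta1, x)
    return W1, W2, np.mean((t - o) ** 2)   # updated weights and eq. (31)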
The authors in [78] used the above-described MLP neural network to implement a learner in a cognitive engine. Assuming a WiMax configurable radio technology, the learner is able to choose a certain modulation mode according to the SNR, such that a certain bit-error rate (BER) will be achieved. Thus, the inputs of the neural network consist of the code rate and SNR values and the output is the resulting BER. By supplying training data to the neural network, the cognitive engine is trained to identify the BER that results from a certain choice of modulation, given a certain SNR level. By comparing the performance of neural networks of different scales, the simulation results in [78] showed that increasing the number of hidden layers reduces the speed of convergence but leads to a smaller MSE. However, more training data are required for a larger number of hidden layers. Thus, given a certain set of training data, a trade-off must be made between the speed of convergence and the convergence accuracy of the neural network.
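Reusing the train_step sketch above, such a learner could be trained on (SNR, code rate) -> BER examples roughly as follows; the toy BER formula and the layer sizes here are placeholders, not the training setup of [78]:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(8, 2))   # 8 hidden neurons, 2 inputs
W2 = rng.normal(scale=0.5, size=(1, 8))   # 1 output neuron (BER estimate)
for step in range(5000):
    snr, rate = rng.uniform(0.0, 20.0), rng.choice([0.5, 0.75])
    ber = np.exp(-0.5 * snr * rate)       # invented stand-in BER curve
    x, t = np.array([snr / 20.0, rate]), np.array([ber])
    W1, W2, mse = train_step(x, t, W1, W2)
print(mse)   # the MSE of (31) decreases over the course of training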
2) Support Vector Machine: The SVM, developed by Vapnik and others [157], has been used for many machine learning tasks such as pattern recognition and object classification. The SVM is characterized by the absence of local minima, the sparseness of the solution and the capacity control obtained by acting on the margin, or on other dimension-independent quantities such as the number of support vectors [157]. SVM-based techniques have achieved superior performance in a wide variety of real-world problems due to their generalization ability and robustness against noise and outliers [158].

The basic idea of SVMs is to map the input vectors into a high-dimensional feature space in which they become linearly separable. This mapping from the input vector space to the feature space is a non-linear mapping achieved by using kernel functions. Depending on the application, different types of kernel functions can be used. A common choice for classification problems is the Gaussian kernel, which is a polynomial kernel of infinite degree. In performing classification, a hyperplane which allows for the largest generalization in this high-dimensional space is found. This is the so-called maximal margin classifier [159]. Note that the margin is defined as the distance from a separating hyperplane to the closest data points. As shown in Fig. 9, there could be many possible separating hyperplanes between the two classes of data, but only one of them allows for the maximum margin. The corresponding closest data points are named support vectors, and the hyperplane allowing for the maximum margin is called an optimal separating hyperplane. The interested reader is referred to [79], [160], [161] for insightful discussions on SVMs.

An SVM-based classifier was described in [161] for signal classification in CRs. The classifier in [161] assumed a training set {(x_i, y_i)}_{i=1}^{l} with x ∈ R^N and y ∈ {-1, 1}. The objective is to find a hyperplane:

w^T ϕ(x) + b = 0, (33)

where ϕ can be a non-linear function that maps x into a higher-dimensional Hilbert space [160], w is a weight vector and b is a scalar parameter. In general, it is not possible to obtain an expression for the mapping function ϕ. However, this function can be characterized by a kernel function K(x_i, x_j) and, as it fortunately turns out, the kernel function is sufficient to optimize the parameters w and b in (33) [160].

The hyperplane in (33) is assumed to separate the data into two classes such that the distance between the closest points of each class to the hyperplane is maximized. This can be achieved by minimizing the norm ||w||^2 [160]. In order to solve the optimization problem, the slack variables {ξ_i, i = 1, · · · , l} are introduced and the optimization problem can be formulated as [161]:

min_{w,b,ξ_i} (1/2) w^T w + C Σ_{i=1}^{l} ξ_i (34)
s.t. y_i (w^T ϕ(x_i) + b) ≥ 1 - ξ_i, ∀i = 1, · · · , l (35)
ξ_i ≥ 0, ∀i = 1, · · · , l (36)
where C is the penalty parameter that controls the training error.

The Lagrangian of the above optimization problem can be written as:

L = (1/2) ||w||^2 + C Σ_{i=1}^{l} ξ_i - Σ_{i=1}^{l} β_i ξ_i - Σ_{i=1}^{l} α_i [ y_i (w^T ϕ(x_i) + b) - 1 + ξ_i ],

where α_i, β_i ≥ 0 are the Lagrange multipliers. By computing the derivatives with respect to w, b and ξ_i, the dual representation of the optimization problem can be expressed as [161]:

max_{α_1,···,α_l} Σ_{i=1}^{l} α_i - (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j K(x_i, x_j)
s.t. 0 ≤ α_i ≤ C, ∀i = 1, · · · , l
Σ_{i=1}^{l} y_i α_i = 0

where K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j) is the kernel function. In this case, the decision function (i.e. the learning machine [160]) is computed as:

f(x) = sgn( Σ_{i=1}^{l} α_i y_i K(x_i, x) + b ). (37)
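Given a solved dual (34)-(36), the decision rule (37) is straightforward to evaluate; the short sketch below does so with a Gaussian (RBF) kernel, where the support vectors, multipliers α_i, labels y_i and bias b are assumed to be supplied by the training stage:

import numpy as np

def rbf_kernel(xi, x, gamma=1.0):
    # Gaussian kernel: K(x_i, x) = exp(-gamma * ||x_i - x||^2)
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def svm_decision(x, support_vecs, alphas, labels, b):
    # Eq. (37): f(x) = sgn( sum_i alpha_i y_i K(x_i, x) + b )
    s = sum(a * y * rbf_kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vecs))
    return np.sign(s + b)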
Other applications of SVMs to CR can be found in current literature, including [65], [79], [103], [158], [161]–[167]. Most of these applications of the SVM in the CR context, however, have been for performing signal classification.

In [164], for example, a MAC protocol classification scheme was proposed to classify contention-based and control-based MAC protocols in an unknown primary network based on SVMs. To perform the classification in an unknown primary network, the mean and variance of the received power are chosen as two features for the SVM. The SVM is embedded in a CR terminal of the secondary network. A TDMA and a slotted Aloha network were set up as the primary networks. Simulation results showed that the TDMA and slotted Aloha MAC protocols could be effectively classified by the CR terminal and that the correct classification rate was proportional to the transmission rate of the primary networks, where the transmission rate of a primary network is defined as the new packet generating/arriving probability in each time slot. The reason for the increase in the correct classification rate when the transmission rate increases is the following: for the slotted Aloha network, a higher transmission rate brings a higher collision probability, and thus a higher instantaneous received power captured by a CR terminal; for the TDMA network, however, there is no relation between the transmission rate and the instantaneous captured received power. Therefore, when the transmission rates of both primary networks increase, it becomes easier for a CR terminal to differentiate TDMA from slotted Aloha.
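A minimal sketch of such a two-feature SVM classifier using scikit-learn (our choice of library; the synthetic feature distributions are illustrative assumptions, not the simulation setup of [164]):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic (mean, variance) received-power features for the two MAC types
aloha = rng.normal([1.0, 0.8], 0.2, size=(200, 2))   # slotted Aloha: label +1
tdma = rng.normal([1.0, 0.3], 0.2, size=(200, 2))    # TDMA: label -1
X = np.vstack([aloha, tdma])
y = np.hstack([np.ones(200), -np.ones(200)])
clf = SVC(kernel="rbf", C=1.0).fit(X, y)   # internally solves (34)-(36)
print(clf.predict([[1.0, 0.7]]))           # classify a new observation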
SVM classifiers are not restricted to the binary setting of the previous example; they can also easily be used as multi-class classifiers by treating a K-class classification problem as K two-class problems. For example, in [165] the authors presented a study of multi-class signal classification based on automatic modulation classification (AMC) through SVMs. A simulated model of an SVM signal classifier was implemented and trained to recognize seven distinct modulation schemes: five digital (BPSK, QPSK, GMSK, 16-QAM and 64-QAM) and two analog (FM and AM). The signals were generated using realistic carrier frequency, sampling frequency and symbol rate values, and realistic raised-cosine and Gaussian pulse-shaping filters. The results showed that the implemented classifier can correctly classify signals with high probabilities.
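The K two-class decomposition just described can be expressed directly with scikit-learn's one-vs-rest wrapper; in the sketch below only the number of classes mirrors [165], while the random features are stand-ins:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(700, 8))        # stand-in feature vectors
y = np.repeat(np.arange(7), 100)     # seven modulation classes
# Train seven two-class SVMs, one per class against the rest
clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(clf.predict(X[:3]))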
V. CENTRALIZED AND DECENTRALIZED LEARNING IN COGNITIVE RADIO

Since noise uncertainties, shadowing, and multi-path fading effects limit the performance of spectrum sensing, when the received primary SNR is too low there exists an SNR wall, below which reliable spectrum detection is impossible in some cases [168], [169]. If secondary users cannot detect the primary transmitter while the primary receiver is within the secondary users' transmission range, a hidden terminal problem occurs [170], [171], and the primary user's transmission will be interfered with. By taking advantage of the diversity offered by multiple independent fading channels (multiuser diversity), cooperative spectrum sensing improves the reliability of spectrum sensing and the utilization of idle spectrum [25], [26], as opposed to non-cooperative spectrum sensing.

In centralized cooperative spectrum sensing [25], [26], a central controller collects local observations from multiple secondary users, decides the spectrum occupancy by using decision fusion rules, and informs the secondary users which channels to access. In distributed cooperative spectrum sensing [55], [172], on the other hand, secondary users within a CRN exchange their local sensing results among themselves without requiring a backbone or centralized infrastructure. In the non-cooperative decentralized sensing framework, by contrast, no communications are assumed among the secondary users [173].
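As a concrete example of decision fusion, the sketch below implements two textbook fusion rules (OR and majority voting) over the binary local decisions reported by the secondary users; these generic rules are illustrative and not tied to a specific reference above:

import numpy as np

def fuse(local_decisions, rule="majority"):
    # local_decisions: one binary detection result per secondary user
    d = np.asarray(local_decisions)
    if rule == "or":         # occupied if any user detects the primary
        return int(d.any())
    if rule == "majority":   # occupied if more than half the users detect it
        return int(d.sum() > d.size / 2)
    raise ValueError("unknown fusion rule")

print(fuse([1, 0, 1, 1, 0]))         # -> 1: channel declared occupied
print(fuse([0, 0, 1, 0, 0], "or"))   # -> 1: the OR rule is more conservative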
In [174], the authors showed how various centralized and decentralized spectrum access markets (where CRs can compete over time for dynamically available transmission opportunities) can be designed based on a stochastic game framework (discussed above in Section III-C) and solved using a learning algorithm. Their proposed learning algorithm was designed to learn the following information in the stochastic game: the state transition model, the states and policies of the other secondary users, and the network resource state. The proposed learning algorithm was similar to Q-learning. However, the main difference compared to Q-learning was that it explicitly considered the impact of the other secondary users' actions through the state classifications and transition probability approximation. The computational complexity and performance were also discussed in [174].

In [37] the authors proposed and analyzed both a centralized and a decentralized decision-making architecture with RL for the secondary CRN. In this work, a new way to encourage primary users to lease their spectrum was proposed: the secondary users place bids indicating how much power they are willing to spend for relaying the primary signals to their destinations. In this formulation, the primary users achieve power savings due to asymmetric cooperation.
Fig. 10. A comparison among the learning algorithms that are presented in this survey. [Figure: a table comparing the surveyed learning algorithms across their applications (spectrum sensing, signal classification and feature detection, power and rate allocation, system reconfiguration, parameter adaptation and MAC protocols), with the pros and cons of each.]
In the centralized architecture, a secondary system decision center (SSDC) selects a bid for each primary channel based on an optimal channel assignment for the secondary users. In the decentralized CRN architecture, an auction game-based protocol was proposed in which each secondary user independently places bids for each primary channel and the receivers of each primary link pick the bid that will lead to the most power savings. A simple and robust distributed RL mechanism was developed to allow the users to revise their bids and to increase their subsequent rewards. The performance results given in [37] showed the significant impact of RL in both improving spectrum utilization and meeting individual secondary user performance requirements.

In [12], the authors considered DSA among CRs from an adaptive, game-theoretic learning perspective, in which CRs compete for channels temporarily vacated by licensed primary users in order to satisfy their own demands while minimizing interference. For both slowly varying primary user activity and slowly varying statistics of fast primary user activity, the authors applied an adaptive regret-based learning procedure which tracks the set of correlated equilibria of the game, treated as a distributed stochastic approximation. The proposed approach was decentralized in terms of both radio awareness and activity; radios estimate spectral conditions based on their own experience, and adapt by choosing the spectral allocations which yield them the greatest utility. Iterated over time, this process converges so that each radio's performance is an optimal response to the others' activity. This apparently selfish scheme was also used to deliver system-wide performance by a judicious choice of utility function. This procedure was shown to perform well compared to other similar adaptive algorithms. The results of the estimation of channel contention for a simple carrier sense multiple access (CSMA) channel sharing scheme were also presented.

In [175], the authors proposed an auction framework for CRNs to allow secondary users to share the available spectrum of licensed primary users fairly and efficiently, subject to the interference temperature constraint at each primary user. The competition among secondary users was studied by formulating a non-cooperative multiple-primary-user multiple-secondary-user auction game. The resulting equilibrium was found by solving a non-continuous two-dimensional optimization problem. A distributed algorithm was also developed in which each secondary user updates its strategy based on local information to converge to the equilibrium. The proposed auction framework was then extended to the more challenging scenario with free spectrum bands. An algorithm was developed based on no-regret learning to reach a correlated equilibrium of the auction game. The proposed algorithm, which can be implemented distributedly based on local observations, is especially suited to decentralized adaptive learning environments. The authors demonstrated the effectiveness of the proposed auction framework in achieving high efficiency and fairness in spectrum allocation through numerical examples.
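The no-regret learning used in [12] and [175] can be illustrated with the generic regret-matching rule of Hart and Mas-Colell, sketched below for a two-action channel-choice toy problem; the payoff table is an invented stand-in, not the utility of either paper:

import numpy as np

rng = np.random.default_rng(3)
n_actions, T = 2, 5000
regret = np.zeros(n_actions)
payoff = lambda a, other: 1.0 if a != other else 0.2  # toy collision payoff

for t in range(T):
    # Play each action with probability proportional to its positive regret
    pos = np.maximum(regret, 0.0)
    probs = pos / pos.sum() if pos.sum() > 0 else np.full(n_actions, 0.5)
    a = rng.choice(n_actions, p=probs)
    other = rng.integers(n_actions)   # the other user's (random) channel choice
    u = payoff(a, other)
    # Accumulate the regret for not having played each alternative action
    regret += np.array([payoff(k, other) for k in range(n_actions)]) - u

print(probs)   # the empirical play converges toward a correlated equilibrium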
In general, there is always a trade-off between centralized and decentralized control in radio networks. This is also true for CRNs. While centralized schemes ensure efficient management of the spectrum resources, they often suffer from signaling and processing overhead. On the other hand, a decentralized scheme can reduce the complexity of the decision-making in cognitive networks. However, radios
that act according to a decentralized scheme may adopt a selfish behavior and try to maximize their own utilities at the expense of the sum-utility of the network (the social welfare), leading to overall network inefficiency. This problem can become particularly severe when considering heterogeneous networks in which different nodes belong to different types of systems and have different, usually conflicting, objectives. To resolve this problem, [176] proposes a hybrid approach for heterogeneous CRNs where the wireless users are assisted in their decisions by the network, which broadcasts aggregated information to the users. In some states of the system, the network manager imposes its decisions on the users in the network. In other states, the mobile nodes may take autonomous actions in response to the information sent by the network center. As a result, the model in [176] avoids having a completely decentralized network, due to the possible inefficiency of such non-cooperative networks. Nevertheless, a large part of the decision-making is still delegated to the mobile nodes to reduce the processing overhead at the central node.

In the problem formulation of [176], the authors consider a wireless network composed of S systems that are managed by the same operator. The set of all serving systems is denoted by S = {1, · · · , S}. Since the throughput of each serving system drops as a function of the distance between the mobile and the base station, the throughput of a mobile changes within a given cell. To capture this variation, each cell is split into N circles of radius d_n (n ∈ N = {1, · · · , N}). Each circle area is assumed to have the same radio characteristics. In this case, all mobiles that are located within circle n ∈ N and are served by system s ∈ S achieve the same throughput. The network state matrix is denoted by M ∈ F, where F = N^{N×S}. The (n, s)-th element M_{ns} of the matrix M denotes the number of users with radio condition n ∈ N which are served by system s ∈ S in the circle. The network is fully characterized by its state M, but this information is not available to the mobile nodes when the radio resource management (RRM) is decentralized. In this case, by using the radio enabler proposed in IEEE 1900.4, the network reconfiguration manager (NRM) broadcasts to the terminal reconfiguration manager (TRM) an aggregated load information that takes values in some finite set L = {1, · · · , L}, indicating whether the load state at the mobile terminals is low, medium or high. The mapping f : M → L specifies a macro-state f(M) for each network micro-state M. This state encoding reduces the signaling overhead, while satisfying the requirement of the IEEE 1900.4 standard which states that "the network manager side shall periodically update the terminal side with context information" [177]. Given the load information l = f(M) and the radio condition n ∈ N, the mobile makes its decision P_{n,l} ∈ S, specifying which system it will connect to, and the user's decision vector is denoted by P_l = [P_{1,l}, · · · , P_{N,l}].
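The state aggregation in [176] can be pictured with a small sketch; the load measure (total number of served users) and the thresholds below are illustrative assumptions of ours, not the exact mapping f of [176]:

import numpy as np

def f(M, thresholds=(10, 25)):
    # Map the micro-state M (an N x S matrix of user counts) to an
    # aggregated macro-state l in L = {0: low, 1: medium, 2: high}
    load = int(M.sum())
    if load <= thresholds[0]:
        return 0
    if load <= thresholds[1]:
        return 1
    return 2

M = np.array([[3, 1], [4, 2], [5, 0]])   # N = 3 radio conditions, S = 2 systems
l = f(M)                                  # broadcast macro-state
P = np.array([[0, 1, 1],                  # P[n, l]: system chosen by a mobile
              [0, 0, 1],                  # with radio condition n under load l
              [1, 1, 1]])
print("macro-state:", l, "-> mobile in circle 1 connects to system", P[1, l])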
The authors in [176] find the association policies by following three different approaches:
1) Global optimum approach.
2) Nash equilibrium approach.
3) Stackelberg game approach.
The global optimum approach finds the policy that maximizes the global utility of the network. However, since it is not realistic to assume that individual users will seek the global optimum, another policy (corresponding to the Nash equilibrium) was obtained such that it maximizes the users' individual utilities. Finally, a Stackelberg game formulation was developed for the operator to control the equilibrium of its wireless users. This leads to maximizing the operator's utility by sending appropriate load information l ∈ L to the distributed radios.

The authors of [176] analyzed the network performance under these three different association policies. They demonstrated, by means of the Stackelberg formulation, how the operator can optimize its global utility by sending appropriate information about the network state, while users maximize their individual utilities. The resulting hybrid architecture achieved a good trade-off between the global network performance and the signaling overhead, making it a viable alternative to be considered when designing CRNs.

VI. CONCLUSION

In this survey paper, we have characterized the learning problems in CRs and stated the importance of machine learning in developing real CRs. We have presented the state-of-the-art learning methods that have been applied to CRs, classifying them under supervised and unsupervised learning. A discussion of some of the most important, and commonly used, learning algorithms was provided along with their advantages and disadvantages. We also showed some of the challenging learning problems encountered in CRs and presented possible solution methods to address them.

REFERENCES

[1] J. Mitola III and G. Q. Maguire, Jr., "Cognitive radio: making software radios more personal," IEEE Pers. Commun., vol. 6, no. 4, pp. 13–18, Aug. 1999.
[2] J. Mitola, "Cognitive radio: An integrated agent architecture for software defined radio," Ph.D. dissertation, Royal Institute of Technology (KTH), Stockholm, Sweden, 2000.
[3] L. Giupponi, A. Galindo-Serrano, P. Blasco, and M. Dohler, "Docitive networks: an emerging paradigm for dynamic spectrum management," IEEE Wireless Commun., vol. 17, no. 4, pp. 47–54, Aug. 2010.
[4] T. Costlow, "Cognitive radios will adapt to users," IEEE Intell. Syst., vol. 18, no. 3, p. 7, May-June 2003.
[5] S. K. Jayaweera and C. G. Christodoulou, "Radiobots: Architecture, algorithms and realtime reconfigurable antenna designs for autonomous, self-learning future cognitive radios," University of New Mexico, Technical Report EECE-TR-11-0001, Mar. 2011. [Online]. Available: http://repository.unm.edu/handle/1928/12306
[6] S. Haykin, "Cognitive radio: brain-empowered wireless communications," IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005.
[7] FCC, "Report of the spectrum efficiency working group," FCC Spectrum Policy Task Force, Tech. Rep., Nov. 2002.
[8] ——, "ET docket no. 03-322 notice of proposed rulemaking and order," Tech. Rep., Dec. 2003.
[9] N. Devroye, M. Vu, and V. Tarokh, "Cognitive radio networks," IEEE Signal Processing Mag., vol. 25, pp. 12–23, Nov. 2008.
[10] A. Goldsmith, S. A. Jafar, I. Maric, and S. Srinivasa, "Breaking spectrum gridlock with cognitive radios: An information theoretic perspective," Proc. IEEE, vol. 97, no. 5, pp. 894–914, May 2009.
[11] V. Krishnamurthy, "Decentralized spectrum access amongst cognitive radios - An interacting multivariate global game-theoretic approach," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3999–4013, Oct. 2009.
[12] M. Maskery, V. Krishnamurthy, and Q. Zhao, "Decentralized dynamic spectrum access for cognitive radios: cooperative design of a non-cooperative game," IEEE Trans. Commun., vol. 57, no. 2, pp. 459–469, Feb. 2009.
[13] Z. Han, R. Zheng, and H. Poor, "Repeated auctions with Bayesian non-parametric learning for spectrum access in cognitive radio networks," IEEE Trans. Wireless Commun., vol. 10, no. 3, pp. 890–900, Mar. 2011.
[14] J. Lunden, V. Koivunen, S. Kulkarni, and H. Poor, "Reinforcement learning based distributed multiagent sensing policy for cognitive radio networks," in IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '11), Aachen, Germany, May 2011, pp. 642–646.
[15] K. Ben Letaief and W. Zhang, "Cooperative communications for cognitive radio networks," Proc. IEEE, vol. 97, no. 5, pp. 878–893, May 2009.
[16] Q. Zhao and B. M. Sadler, "A survey of dynamic spectrum access," IEEE Signal Processing Mag., vol. 24, no. 3, pp. 79–89, May 2007.
[17] S. K. Jayaweera and T. Li, "Dynamic spectrum leasing in cognitive radio networks via primary-secondary user power control games," IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 3300–3310, July 2009.
[18] S. K. Jayaweera, G. Vazquez-Vilar, and C. Mosquera, "Dynamic spectrum leasing: A new paradigm for spectrum sharing in cognitive radio networks," IEEE Trans. Veh. Technol., vol. 59, no. 5, pp. 2328–2339, May 2010.
[19] G. Zhao, J. Ma, Y. Li, T. Wu, Y. H. Kwon, A. Soong, and C. Yang, "Spatial spectrum holes for cognitive radio with directional transmission," in IEEE Global Telecommunications Conference (GLOBECOM '08), Nov. 2008, pp. 1–5.
[20] A. Ghasemi and E. Sousa, "Spectrum sensing in cognitive radio networks: requirements, challenges and design trade-offs," IEEE Commun. Mag., vol. 46, no. 4, pp. 32–39, Apr. 2008.
[21] B. Farhang-Boroujeny, "Filter bank spectrum sensing for cognitive radios," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1801–1811, May 2008.
[22] B. Farhang-Boroujeny and R. Kempter, "Multicarrier communication techniques for spectrum sensing and communication in cognitive radios," IEEE Commun. Mag., vol. 46, no. 4, pp. 80–85, Apr. 2008.
[23] C. R. C. da Silva, C. Brian, and K. Kyouwoong, "Distributed spectrum sensing for cognitive radio systems," in Information Theory and Applications Workshop, Feb. 2007, pp. 120–123.
[24] Y. Li, S. Jayaweera, M. Bkassiny, and K. Avery, "Optimal myopic sensing and dynamic spectrum access in cognitive radio networks with low-complexity implementations," IEEE Trans. Wireless Commun., vol. 11, no. 7, pp. 2412–2423, July 2012.
[25] ——, "Optimal myopic sensing and dynamic spectrum access in centralized secondary cognitive radio networks with low-complexity implementations," in IEEE 73rd Vehicular Technology Conference (VTC-Spring '11), May 2011, pp. 1–5.
[26] M. Bkassiny, S. K. Jayaweera, Y. Li, and K. A. Avery, "Optimal and low-complexity algorithms for dynamic spectrum access in centralized cognitive radio networks with fading channels," in IEEE Vehicular Technology Conference (VTC-Spring '11), Budapest, Hungary, May 2011.
[27] C. Cordeiro, M. Ghosh, D. Cavalcanti, and K. Challapali, "Spectrum sensing for dynamic spectrum access of TV bands," in 2nd International Conference on Cognitive Radio Oriented Wireless Networks and Communications (CrownCom '07), Aug. 2007, pp. 225–233.
[28] H. Chen, W. Gao, and D. G. Daut, "Signature based spectrum sensing algorithms for IEEE 802.22 WRAN," in IEEE International Conference on Communications (ICC '07), June 2007, pp. 6487–6492.
[29] Y. Zeng and Y. Liang, "Maximum-minimum eigenvalue detection for cognitive radio," in 18th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '07), Sep. 2007, pp. 1–5.
[30] ——, "Covariance based signal detections for cognitive radio," in 2nd IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '07), Apr. 2007, pp. 202–207.
[31] X. Zhou, Y. Li, Y. H. Kwon, and A. Soong, "Detection timing and channel selection for periodic spectrum sensing in cognitive radio," in IEEE Global Telecommunications Conference (GLOBECOM '08), Nov. 2008, pp. 1–5.
[32] Z. Tian and G. B. Giannakis, "A wavelet approach to wideband spectrum sensing for cognitive radios," in 1st International Conference on Cognitive Radio Oriented Wireless Networks and Communications, June 2006, pp. 1–5.
[33] G. Ganesan and Y. Li, "Cooperative spectrum sensing in cognitive radio, part I: Two user networks," IEEE Trans. Wireless Commun., vol. 6, no. 6, pp. 2204–2213, June 2007.
[34] ——, "Cooperative spectrum sensing in cognitive radio, part II: Multiuser networks," IEEE Trans. Wireless Commun., vol. 6, no. 6, pp. 2214–2222, June 2007.
[35] Y. Chen, Q. Zhao, and A. Swami, "Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2053–2071, May 2008.
[36] S. Huang, X. Liu, and Z. Ding, "Opportunistic spectrum access in cognitive radio networks," in 27th Conference on Computer Communications (IEEE INFOCOM '08), Phoenix, AZ, Apr. 2008, pp. 1427–1435.
[37] S. Jayaweera, M. Bkassiny, and K. Avery, "Asymmetric cooperative communications based spectrum leasing via auctions in cognitive radio networks," IEEE Trans. Wireless Commun., vol. 10, no. 8, pp. 2716–2724, Aug. 2011.
[38] M. Bkassiny, S. K. Jayaweera, and K. A. Avery, "Distributed reinforcement learning based MAC protocols for autonomous cognitive secondary users," in 20th Annual Wireless and Optical Communications Conference (WOCC '11), Newark, NJ, Apr. 2011, pp. 1–6.
[39] T. Yucek and H. Arslan, "A survey of spectrum sensing algorithms for cognitive radio applications," IEEE Commun. Surveys Tutorials, vol. 11, no. 1, pp. 116–130, First Quarter 2009.
[40] S. Haykin, D. Thomson, and J. Reed, "Spectrum sensing for cognitive radio," Proc. IEEE, vol. 97, no. 5, pp. 849–877, May 2009.
[41] J. Ma, G. Y. Li, and B. H. Juang, "Signal processing in cognitive radio," Proc. IEEE, vol. 97, no. 5, pp. 805–823, May 2009.
[42] W. Zhang, R. Mallik, and K. Letaief, "Optimization of cooperative spectrum sensing with energy detection in cognitive radio networks," IEEE Trans. Wireless Commun., vol. 8, no. 12, pp. 5761–5766, Dec. 2009.
[43] Y. M. Kim, G. Zheng, S. H. Sohn, and J. M. Kim, "An alternative energy detection using sliding window for cognitive radio system," in 10th International Conference on Advanced Communication Technology (ICACT '08), vol. 1, Gangwon-Do, South Korea, Feb. 2008, pp. 481–485.
[44] J. Lunden, V. Koivunen, A. Huttunen, and H. Poor, "Collaborative cyclostationary spectrum sensing for cognitive radio systems," IEEE Trans. Signal Process., vol. 57, no. 11, pp. 4182–4195, Nov. 2009.
[45] A. Dandawate and G. Giannakis, "Statistical tests for presence of cyclostationarity," IEEE Trans. Signal Process., vol. 42, no. 9, pp. 2355–2369, Sep. 1994.
[46] B. Deepa, A. Iyer, and C. Murthy, "Cyclostationary-based architectures for spectrum sensing in IEEE 802.22 WRAN," in IEEE Global Telecommunications Conference (GLOBECOM '10), Miami, FL, Dec. 2010, pp. 1–5.
[47] M. Gandetto and C. Regazzoni, "Spectrum sensing: A distributed approach for cognitive terminals," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 546–557, Apr. 2007.
[48] J. Unnikrishnan and V. Veeravalli, "Cooperative sensing for primary detection in cognitive radio," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 18–27, Feb. 2008.
[49] T. Cui, F. Gao, and A. Nallanathan, "Optimization of cooperative spectrum sensing in cognitive radio," IEEE Trans. Veh. Technol., vol. 60, no. 4, pp. 1578–1589, May 2011.
[50] O. Simeone, I. Stanojev, S. Savazzi, Y. Bar-Ness, U. Spagnolini, and R. Pickholtz, "Spectrum leasing to cooperating secondary ad hoc networks," IEEE J. Sel. Areas Commun., vol. 26, pp. 203–213, Jan. 2008.
[51] Q. Zhang, J. Jia, and J. Zhang, "Cooperative relay to improve diversity in cognitive radio networks," IEEE Commun. Mag., vol. 47, no. 2, pp. 111–117, Feb. 2009.
[52] Y. Han, A. Pandharipande, and S. Ting, "Cooperative decode-and-forward relaying for secondary spectrum access," IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 4945–4950, Oct. 2009.
[53] L. Li, X. Zhou, H. Xu, G. Li, D. Wang, and A. Soong, "Simplified relay selection and power allocation in cooperative cognitive radio systems," IEEE Trans. Wireless Commun., vol. 10, no. 1, pp. 33–36, Jan. 2011.
[54] E. Hossain and V. K. Bhargava, Cognitive Wireless Communication Networks. Springer, 2007.
[55] B. Wang and K. J. R. Liu, "Advances in cognitive radio networks: A survey," IEEE J. Sel. Topics Signal Process., vol. 5, no. 1, pp. 5–23, Feb. 2011.
[56] I. Akyildiz, W.-Y. Lee, M. Vuran, and S. Mohanty, "A survey on spectrum management in cognitive radio networks," IEEE Commun. Mag., vol. 46, no. 4, pp. 40–48, Apr. 2008.
[57] K. Shin, H. Kim, A. Min, and A. Kumar, "Cognitive radios for dynamic spectrum access: from concept to reality," IEEE Wireless Commun., vol. 17, no. 6, pp. 64–74, Dec. 2010.
[58] A. De Domenico, E. Strinati, and M.-G. Di Benedetto, "A survey on MAC strategies for cognitive radio networks," IEEE Commun. Surveys Tutorials, vol. 14, no. 1, pp. 21–44, First Quarter 2012.
[59] A. Mody, M. Sherman, R. Martinez, R. Reddy, and T. Kiernan, "Survey of IEEE standards supporting cognitive radio and dynamic spectrum access," in IEEE Military Communications Conference (MILCOM '08), Nov. 2008, pp. 1–7.
[60] Q. Zhao and A. Swami, "A survey of dynamic spectrum access: Signal processing and networking perspectives," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 4, Apr. 2007, pp. IV-1349–IV-1352.
[61] J. Mitola, "Cognitive radio architecture evolution," Proc. IEEE, vol. 97, no. 4, pp. 626–641, Apr. 2009.
[62] S. Jayaweera, Y. Li, M. Bkassiny, C. Christodoulou, and K. Avery, "Radiobots: The autonomous, self-learning future cognitive radios," in International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS '11), Chiangmai, Thailand, Dec. 2011, pp. 1–5.
[63] A. El-Saleh, M. Ismail, M. Ali, and J. Ng, "Development of a cognitive radio decision engine using multi-objective hybrid genetic algorithm," in IEEE 9th Malaysia International Conference on Communications (MICC '09), Dec. 2009, pp. 343–347.
[64] L. Morales-Tirado, J. Suris-Pietri, and J. Reed, "A hybrid cognitive engine for improving coverage in 3G wireless networks," in IEEE International Conference on Communications Workshops (ICC Workshops '09), June 2009, pp. 1–5.
[65] Y. Huang, H. Jiang, H. Hu, and Y. Yao, "Design of learning engine based on support vector machine in cognitive radio," in International Conference on Computational Intelligence and Software Engineering (CiSE '09), Wuhan, China, Dec. 2009, pp. 1–4.
[66] Y. Huang, J. Wang, and H. Jiang, "Modeling of learning inference and decision-making engine in cognitive radio," in Second International Conference on Networks Security, Wireless Communications and Trusted Computing (NSWCTC), vol. 2, Apr. 2010, pp. 258–261.
[67] Y. Yang, H. Jiang, and J. Ma, "Design of optimal engine for cognitive radio parameters based on the DUGA," in 3rd International Conference on Information Sciences and Interaction Sciences (ICIS '10), June 2010, pp. 694–698.
[68] H. Volos and R. Buehrer, "Cognitive engine design for link adaptation: An application to multi-antenna systems," IEEE Trans. Wireless Commun., vol. 9, no. 9, pp. 2902–2913, Sep. 2010.
[69] C. Clancy, J. Hecker, E. Stuntebeck, and T. O'Shea, "Applications of machine learning to cognitive radio networks," IEEE Wireless Commun., vol. 14, no. 4, pp. 47–52, Aug. 2007.
[70] A. N. Mody, S. R. Blatt, N. B. Thammakhoune, T. P. McElwain, J. D. Niedzwiecki, D. G. Mills, M. J. Sherman, and C. S. Myers, "Machine learning based cognitive communications in white as well as the gray space," in IEEE Military Communications Conference (MILCOM '07), Orlando, FL, Oct. 2007, pp. 1–7.
[71] M. Bkassiny, S. K. Jayaweera, Y. Li, and K. A. Avery, "Wideband spectrum sensing and non-parametric signal classification for autonomous self-learning cognitive radios," IEEE Trans. Wireless Commun., vol. 11, no. 7, pp. 2596–2605, July 2012.
[72] ——, "Blind cyclostationary feature detection based spectrum sensing for autonomous self-learning cognitive radios," in IEEE International Conference on Communications (ICC '12), Ottawa, Canada, June 2012.
[73] X. Gao, B. Jiang, X. You, Z. Pan, Y. Xue, and E. Schulz, "Efficient channel estimation for MIMO single-carrier block transmission with dual cyclic timeslot structure," IEEE Trans. Commun., vol. 55, no. 11, pp. 2210–2223, Nov. 2007.
[74] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[75] S. Gong, W. Liu, W. Yuan, W. Cheng, and S. Wang, "Threshold-learning in local spectrum sensing of cognitive radio," in IEEE 69th Vehicular Technology Conference (VTC Spring '09), Barcelona, Spain, Apr. 2009, pp. 1–6.
[76] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley and Sons, 1994.
[77] A. Galindo-Serrano and L. Giupponi, "Distributed Q-learning for aggregated interference control in cognitive radio networks," IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1823–1834, May 2010.
[78] X. Dong, Y. Li, C. Wu, and Y. Cai, "A learner based on neural network for cognitive radio," in 12th IEEE International Conference on Communication Technology (ICCT '10), Nanjing, China, Nov. 2010, pp. 893–896.
[79] M. M. Ramon, T. Atwood, S. Barbin, and C. G. Christodoulou, "Signal classification with an SVM-FFT approach for feature extraction in cognitive radio," in SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference (IMOC '09), Belem, Brazil, Nov. 2009, pp. 286–289.
[80] B. Hamdaoui, P. Venkatraman, and M. Guizani, "Opportunistic exploitation of bandwidth resources through reinforcement learning," in IEEE Global Telecommunications Conference (GLOBECOM '09), Honolulu, HI, Dec. 2009, pp. 1–6.
[81] K.-L. A. Yau, P. Komisarczuk, and P. D. Teal, "Applications of reinforcement learning to cognitive radio networks," in IEEE International Conference on Communications Workshops (ICC '10), Cape Town, South Africa, May 2010, pp. 1–6.
[82] Y. Reddy, "Detecting primary signals for efficient utilization of spectrum using Q-learning," in Fifth International Conference on Information Technology: New Generations (ITNG '08), Las Vegas, NV, Apr. 2008, pp. 360–365.
[83] M. Li, Y. Xu, and J. Hu, "A Q-learning based sensing task selection scheme for cognitive radio networks," in International Conference on Wireless Communications and Signal Processing (WCSP '09), Nanjing, China, Nov. 2009, pp. 1–5.
[84] Y. Yao and Z. Feng, "Centralized channel and power allocation for cognitive radio networks: A Q-learning solution," in Future Network and Mobile Summit, Florence, Italy, June 2010, pp. 1–8.
[85] P. Venkatraman, B. Hamdaoui, and M. Guizani, "Opportunistic bandwidth sharing through reinforcement learning," IEEE Trans. Veh. Technol., vol. 59, no. 6, pp. 3148–3153, July 2010.
[86] T. Jiang, D. Grace, and P. Mitchell, "Efficient exploration in reinforcement learning-based cognitive radio spectrum sharing," IET Communications, vol. 5, no. 10, pp. 1309–1317, Jan. 2011.
[87] T. Clancy, A. Khawar, and T. Newman, "Robust signal classification using unsupervised learning," IEEE Trans. Wireless Commun., vol. 10, no. 4, pp. 1289–1299, Apr. 2011.
[88] C. Claus and C. Boutilier, "The dynamics of reinforcement learning in cooperative multiagent systems," in Proc. Fifteenth National Conference on Artificial Intelligence, Madison, WI, Jul. 1998, pp. 746–752.
[89] G. D. Croon, M. F. V. Dartel, and E. O. Postma, "Evolutionary learning outperforms reinforcement learning on non-Markovian tasks," in 8th European Conference on Artificial Life Workshop on Memory and Learning Mechanisms in Autonomous Robots, Canterbury, Kent, UK, 2005.
[90] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. 12th Conference on Advances in Neural Information Processing Systems (NIPS '99). Denver, CO: MIT Press, 2001, pp. 1057–1063.
[91] J. Baxter and P. L. Bartlett, "Infinite-horizon policy-gradient estimation," Journal of Artificial Intelligence Research, vol. 15, pp. 319–350, 2001.
[92] D. E. Moriarty, A. C. Schultz, and J. J. Grefenstette, "Evolutionary algorithms for reinforcement learning," J. Artificial Intelligence Research, vol. 11, pp. 241–276, 1999.
[93] F. Dandurand and T. Shultz, "Connectionist models of reinforcement, imitation, and instruction in learning to solve complex problems," IEEE Trans. Autonomous Mental Development, vol. 1, no. 2, pp. 110–121, Aug. 2009.
[94] Y. Xing and R. Chandramouli, "Human behavior inspired cognitive radio network design," IEEE Commun. Mag., vol. 46, no. 12, pp. 122–127, Dec. 2008.
[95] M. van der Schaar and F. Fu, "Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications," Proc. IEEE, vol. 97, no. 4, pp. 720–740, Apr. 2009.
[96] B. Wang, K. J. R. Liu, and T. Clancy, "Evolutionary cooperative spectrum sensing game: how to collaborate?" IEEE Trans. Commun., vol. 58, no. 3, pp. 890–900, Mar. 2010.
[97] A. Galindo-Serrano, L. Giupponi, P. Blasco, and M. Dohler, "Learning from experts in cognitive radio networks: The docitive paradigm," in Proc. Fifth International Conference on Cognitive Radio Oriented Wireless Networks Communications (CROWNCOM '10), Cannes, France, June 2010, pp. 1–6.
[98] A. He, K. K. Bae, T. Newman, J. Gaeddert, K. Kim, R. Menon, L. Morales-Tirado, J. Neel, Y. Zhao, J. Reed, and W. Tranter, "A survey of artificial intelligence for cognitive radios," IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1578–1592, May 2010.
[99] R. S. Michalski, "Learning and cognition," in World Conference on the Fundamentals of Artificial Intelligence (WOCFAI '95), Paris, France, July 1995, pp. 507–510.
[100] J. Burbank, A. Hammons, and S. Jones, "A common lexicon and design issues surrounding cognitive radio networks operating in the presence of jamming," in IEEE Military Communications Conference (MILCOM '08), San Diego, CA, Nov. 2008, pp. 1–7.
[101] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[102] V. Tumuluru, P. Wang, and D. Niyato, "A neural network based spectrum prediction scheme for cognitive radio," in IEEE International Conference on Communications (ICC '10), May 2010, pp. 1–5.
[103] H. Hu, J. Song, and Y. Wang, "Signal classification based on spectral correlation analysis and SVM in cognitive radio," in 22nd International Conference on Advanced Information Networking and Applications (AINA '08), Mar. 2008, pp. 883–887.
[104] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes," J. American Statistical Association, vol. 101, no. 476, pp. 1566–1581, Dec. 2006.
[105] M. Bkassiny, S. K. Jayaweera, and Y. Li, "Multidimensional Dirichlet process-based non-parametric signal classification for autonomous self-learning cognitive radios," IEEE Trans. Wireless Commun., May 2012, [in review].
[106] J. Unnikrishnan and V. V. Veeravalli, "Algorithms for dynamic spectrum access with learning for cognitive radio," IEEE Trans. Signal Process., vol. 58, no. 2, pp. 750–760, Feb. 2010.
[107] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.
[108] Q. Zhao, L. Tong, and A. Swami, "Decentralized cognitive MAC for dynamic spectrum access," in First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '05), Nov. 2005, pp. 224–232.
[109] S. K. Jayaweera and C. Mosquera, "A dynamic spectrum leasing (DSL) framework for spectrum sharing in cognitive radio networks," in 43rd Annual Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2009.
[110] K. Hakim, S. Jayaweera, G. El-Howayek, and C. Mosquera, "Efficient dynamic spectrum sharing in cognitive radio networks: Centralized dynamic spectrum leasing (C-DSL)," IEEE Trans. Wireless Commun., vol. 9, no. 9, pp. 2956–2967, Sep. 2010.
[111] B. Latifa, Z. Gao, and S. Liu, "No-regret learning for simultaneous power control and channel allocation in cognitive radio networks," in Computing, Communications and Applications Conference (ComComAp '12), Hong Kong, China, Jan. 2012, pp. 267–271.
[112] Z. Han, C. Pandana, and K. Liu, "Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning," in IEEE Wireless Communications and Networking Conference (WCNC '07), Hong Kong, China, Mar. 2007, pp. 11–15.
[113] Q. Zhu, Z. Han, and T. Basar, "No-regret learning in collaborative spectrum sensing with malicious nodes," in IEEE International Conference on Communications (ICC '10), Cape Town, South Africa, May 2010, pp. 1–6.
[114] D. Pados, P. Papantoni-Kazakos, D. Kazakos, and A. Koyiantis, "On-line threshold learning for Neyman-Pearson distributed detection," IEEE Trans. Syst. Man Cybern., vol. 24, no. 10, pp. 1519–1531, Oct. 1994.
[115] K. Akkarajitsakul, E. Hossain, D. Niyato, and D. I. Kim, "Game theoretic approaches for multiple access in wireless networks: A survey," IEEE Commun. Surveys Tutorials, vol. 13, no. 3, pp. 372–395, Third Quarter 2011.
[116] M. D. Escobar, "Estimating normal means with a Dirichlet process prior," J. American Statistical Association, vol. 89, no. 425, pp. 268–277, Mar. 1994. [Online]. Available: http://www.jstor.org/stable/2291223
[117] C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, University of Cambridge, United Kingdom, 1989.
[118] H. Li, "Multi-agent Q-learning of channel selection in multi-user cognitive radio systems: A two by two case," in IEEE International Conference on Systems, Man and Cybernetics (SMC '09), San Antonio, TX, Oct. 2009, pp. 1893–1898.
[119] J. Peters and S. Schaal, "Policy gradient methods for robotics," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, Oct. 2006, pp. 2219–2225.
[120] M. Riedmiller, J. Peters, and S. Schaal, "Evaluation of policy gradient methods and variants on the cart-pole benchmark," in IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL '07), Honolulu, HI, Apr. 2007, pp. 254–261.
[121] D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991.
[122] P. Zhou, W. Yuan, W. Liu, and W. Cheng, "Joint power and rate control in cognitive radio networks: A game-theoretical approach," in Proc. IEEE International Conference on Communications (ICC '08), May 2008, pp. 3296–3301.
[123] A. R. Fattahi, F. Fu, M. van der Schaar, and F. Paganini, "Mechanism-based resource allocation for multimedia transmission over spectrum agile wireless networks," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 601–612, Apr. 2007.
[124] O. Ileri, D. Samardzija, and N. B. Mandayam, "Demand responsive pricing and competitive spectrum allocation via a spectrum server," in First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '05), Nov. 2005, pp. 194–202.
[125] Y. Zhao, S. Mao, J. Neel, and J. Reed, "Performance evaluation of cognitive radios: Metrics, utility functions, and methodology," Proc. IEEE, vol. 97, no. 4, pp. 642–659, Apr. 2009.
[126] J. Neel, R. M. Buehrer, J. H. Reed, and R. P. Gilles, "Game theoretic analysis of a network of cognitive radios," in 45th Midwest Symposium on Circuits and Systems, vol. 3, Aug. 2002, pp. III-409–III-412.
[127] M. R. Musku and P. Cotae, "Cognitive radio: Time domain spectrum allocation using game theory," in IEEE International Conference on System of Systems Engineering (SoSE '07), Apr. 2007, pp. 1–6.
[128] W. Wang, Y. Cui, T. Peng, and W. Wang, "Noncooperative power control game with exponential pricing for cognitive radio network," in IEEE 65th Vehicular Technology Conference (VTC Spring '07), Apr. 2007, pp. 3125–3129.
[129] J. Li, D. Chen, W. Li, and J. Ma, "Multiuser power and channel allocation algorithm in cognitive radio," in International Conference on Parallel Processing (ICPP '07), Sep. 2007, pp. 72–72.
[130] Z. Ji and K. J. R. Liu, "Cognitive radios for dynamic spectrum access - dynamic spectrum sharing: A game theoretical overview," IEEE Commun. Mag., vol. 45, no. 5, pp. 88–94, May 2007.
[131] N. Nie and C. Comaniciu, "Adaptive channel allocation spectrum etiquette for cognitive radio networks," in 1st IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '05), Nov. 2005, pp. 269–278.
[132] R. G. Wendorf and H. Blum, "A channel-change game for multiple interfering cognitive wireless networks," in IEEE Military Communications Conference (MILCOM '06), Oct. 2006, pp. 1–7.
[133] J. Li, D. Chen, W. Li, and J. Ma, "Multiuser power and channel allocation algorithm in cognitive radio," in International Conference on Parallel Processing (ICPP '07), XiAn, China, Sep. 2007, p. 72.
[134] X. Zhang and J. Zhao, "Power control based on the asynchronous distributed pricing algorithm in cognitive radios," in IEEE Youth Conference on Information Computing and Telecommunications (YC-ICT '10), Beijing, China, Nov. 2010, pp. 69–72.
[135] L. Pillutla and V. Krishnamurthy, "Game theoretic rate adaptation for spectrum-overlay cognitive radio networks," in IEEE Global Telecommunications Conference (GLOBECOM '08), New Orleans, LA, Dec. 2008, pp. 1–5.
[136] H. Li, Y. Liu, and D. Zhang, "Dynamic spectrum access for cognitive radio systems with repeated games," in IEEE International Conference on Wireless Communications, Networking and Information Security (WCNIS '10), Beijing, China, June 2010, pp. 59–62.
[137] S. K. Jayaweera and M. Bkassiny, "Learning to thrive in a leasing market: an auctioning framework for distributed dynamic spectrum leasing (D-DSL)," in IEEE Wireless Communications and Networking Conference (WCNC '11), Cancun, Mexico, Mar. 2011.
[138] L. Chen, S. Iellamo, M. Coupechoux, and P. Godlewski, "An auction framework for spectrum allocation with interference constraint in cognitive radio networks," in IEEE INFOCOM '10, San Diego, CA, Mar. 2010, pp. 1–9.
[139] G. Iosifidis and I. Koutsopoulos, "Challenges in auction theory driven spectrum management," IEEE Commun. Mag., vol. 49, no. 8, pp. 128–135, Aug. 2011.
[140] F. Fu and M. van der Schaar, "Stochastic game formulation for cognitive radio networks," in 3rd IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN '08), Chicago, IL, Oct. 2008, pp. 1–5.
[141] Y. Xu, J. Wang, Q. Wu, A. Anpalagan, and Y.-D. Yao, "Opportunistic spectrum access in unknown dynamic environment: A game-theoretic stochastic learning solution," IEEE Trans. Wireless Commun., vol. 11, no. 4, pp. 1380–1391, Apr. 2012.
[142] T. Ferguson, "A Bayesian analysis of some nonparametric problems," The Annals of Statistics, vol. 1, pp. 209–230, 1973.
[143] D. Blackwell and J. MacQueen, "Ferguson distributions via Polya urn schemes," The Annals of Statistics, vol. 1, pp. 353–355, 1973.
[144] M. Jordan. (2005) Dirichlet processes, Chinese restaurant processes and all that. [Online]. Available: http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps
[145] N. Tawara, S. Watanabe, T. Ogawa, and T. Kobayashi, "Speaker clustering based on utterance-oriented Dirichlet process mixture model," in 12th Annual Conference of the International Speech Communication Association (ISCA '11), Florence, Italy, Aug. 2011, pp. 2905–2908.
[146] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," The Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953. [Online]. Available: http://dx.doi.org/10.1063/1.1699114
[147] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984.
[148] N. Shetty, S. Pollin, and P. Pawelczak, "Identifying spectrum usage by unknown systems using experiments in machine learning," in IEEE Wireless Communications and Networking Conference (WCNC '09), Budapest, Hungary, Apr. 2009, pp. 1–6.
[149] G. Yu, R. Huang, and Z. Wang, "Document clustering via Dirichlet process mixture model with feature selection," in Proc. 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), New York, NY, USA: ACM, 2010, pp. 763–772. [Online]. Available: http://doi.acm.org/10.1145/1835804.1835901
[150] S. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice Hall, Jul. 1999.
[151] N. Baldo and M. Zorzi, "Learning and adaptation in cognitive radios using neural networks," in 5th IEEE Consumer Communications and Networking Conference (CCNC '08), Jan. 2008, pp. 998–1003.
[152] N. Baldo, B. Tamma, B. Manojt, R. Rao, and M. Zorzi, "A neural network based cognitive controller for dynamic channel selection," in IEEE International Conference on Communications (ICC '09), June 2009, pp. 1–5.
[153] Y.-J. Tang, Q.-Y. Zhang, and W. Lin, "Artificial neural network based spectrum sensing method for cognitive radio," in 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM '10), Sep. 2010, pp. 1–4.
[154] M. I. Taj and M. Akil, "Cognitive radio spectrum evolution prediction using artificial neural networks based multivariate time series modeling," in 11th European Wireless Conference 2011 - Sustainable Wireless Technologies (European Wireless), Apr. 2011, pp. 1–6.
[155] J. Popoola and R. van Olst, "A novel modulation-sensing method," IEEE Veh. Technol. Mag., vol. 6, no. 3, pp. 60–69, Sep. 2011.
[156] M. Han, J. Xi, S. Xu, and F.-L. Yin, "Prediction of chaotic time series based on the recurrent predictor neural network," IEEE Trans. Signal Process., vol. 52, no. 12, pp. 3409–3416, Dec. 2004.
[157] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[158] T. Atwood, "RF channel characterization for cognitive radio using support vector machines," Ph.D. dissertation, University of New Mexico, Nov. 2009.
[159] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. Fifth Annual Workshop on Computational Learning Theory (COLT '92), New York, NY, USA: ACM, 1992, pp. 144–152. [Online]. Available: http://doi.acm.org/10.1145/130385.130401
[160] M. Martinez-Ramon and C. G. Christodoulou, Support Vector Machines for Antenna Array Processing and Electromagnetics, 1st ed., C. A. Balanis, Ed. USA: Morgan and Claypool Publishers, 2006.
[161] H. Hu, J. Song, and Y. Wang, "Signal classification based on spectral correlation analysis and SVM in cognitive radio," in 22nd International Conference on Advanced Information Networking and Applications (AINA '08), Okinawa, Japan, Mar. 2008, pp. 883–887.
[162] G. Xu and Y. Lu, "Channel and modulation selection based on support vector machines for cognitive radio," in International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '06), Sep. 2006, pp. 1–4.
[163] L. Hai-Yuan and J.-C. Sun, "A modulation type recognition method using wavelet support vector machines," in 2nd International Congress on Image and Signal Processing (CISP '09), Oct. 2009, pp. 1–4.
[164] Z. Yang, Y.-D. Yao, S. Chen, H. He, and D. Zheng, "MAC protocol classification in a cognitive radio network," in 19th Annual Wireless and Optical Communications Conference (WOCC '10), May 2010, pp. 1–5.
[165] M. Petrova, P. Mähönen, and A. Osuna, "Multi-class classification of analog and digital signals in cognitive radios using support vector machines," in 7th International Symposium on Wireless Communication Systems (ISWCS '10), Sep. 2010, pp. 986–990.
[166] D. Zhang and X. Zhai, "SVM-based spectrum sensing in cognitive radio," in 7th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '11), Sep. 2011, pp. 1–4.
[167] T. D. Atwood, M. Martinez-Ramon, and C. G. Christodoulou, "Robust support vector machine spectrum estimation in cognitive radio," in Proc. 2009 IEEE International Symposium on Antennas and Propagation and USNC/URSI National Radio Science Meeting, 2009.
[168] Z. Sun, G. Bradford, and J. Laneman, "Sequence detection algorithms for PHY-layer sensing in dynamic spectrum access networks," IEEE J. Sel. Topics Signal Process., vol. 5, no. 1, pp. 97–109, Feb. 2011.
[169] D. Cabric, "Addressing feasibility of cognitive radios," IEEE Signal Processing Mag., vol. 25, no. 6, pp. 85–93, Nov. 2008.
[170] Z. Han, R. Fan, and H. Jiang, "Replacement of spectrum sensing in cognitive radio," IEEE Trans. Wireless Commun., vol. 8, no. 6, pp. 2819–2826, June 2009.
[171] S. Jha, U. Phuyal, M. Rashid, and V. Bhargava, "Design of OMC-MAC: An opportunistic multi-channel MAC with QoS provisioning for distributed cognitive radio networks," IEEE Trans. Wireless Commun., vol. 10, no. 10, pp. 3414–3425, Oct. 2011.
[172] B. Wang, K. Liu, and T. Clancy, "Evolutionary game framework for behavior dynamics in cooperative spectrum sensing," in IEEE Global Telecommunications Conference (GLOBECOM '08), Dec. 2008, pp. 1–5.
[173] E. C. Y. Peh, Y.-C. Liang, Y. L. Guan, and Y. Zeng, "Power control in cognitive radios under cooperative and non-cooperative spectrum sensing," IEEE Trans. Wireless Commun., vol. 10, no. 12, pp. 4238–4248, Dec. 2011.
[174] M. van der Schaar and F. Fu, "Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications," Proc. IEEE, vol. 97, no. 4, pp. 720–740, Apr. 2009.
[175] L. Chen, S. Iellamo, M. Coupechoux, and P. Godlewski, "An auction framework for spectrum allocation with interference constraint in cognitive radio networks," in Proc. IEEE INFOCOM '10, Mar. 2010, pp. 1–9.
[176] M. Haddad, S. Elayoubi, E. Altman, and Z. Altman, "A hybrid approach for radio resource management in heterogeneous cognitive networks," IEEE J. Sel. Areas Commun., vol. 29, no. 4, pp. 831–842, Apr. 2011.
[177] S. Buljore, M. Muck, P. Martigne, P. Houze, H. Harada, K. Ishizu, O. Holland, A. Mihailovic, K. A. Tsagkaris, O. Sallent, G. Clemo, M. Sooriyabandara, V. Ivanov, K. Nolte, and M. Stamatelatos, "Introduction to IEEE P1900.4 activities," IEICE Trans. Commun., vol. E91-B, no. 1, 2008.

Mario Bkassiny (S'06) received the B.E. degree in Electrical Engineering with High Distinction and the M.S. degree in Computer Engineering from the Lebanese American University, Lebanon, in 2008 and 2009, respectively. He is currently working towards his Ph.D. degree in Electrical Engineering at the Communication and Information Sciences Laboratory (CISL), Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA. His current research interests are in cognitive radios, distributed learning and reasoning, cognitive and cooperative communications, machine learning and dynamic spectrum leasing (DSL).

Yang Li received the B.E. degree in Electrical Engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 2005, and the M.S. degree in Electrical Engineering from the New Mexico Institute of Mining and Technology, Socorro, New Mexico, USA, in 2009. He is currently working towards his Ph.D. degree in Electrical Engineering at the Communication and Information Sciences Laboratory (CISL), Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA. His current research interests are in cognitive radios, spectrum sensing, cooperative communications, and dynamic spectrum access (DSA).
Sudharman K. Jayaweera (S’00, M’04, SM’09) was born in Matara, Sri Lanka. He received the B.E. degree in Electrical and Electronic Engineering with First Class Honors from the University of Melbourne, Australia, in 1997 and the M.A. and Ph.D. degrees in Electrical Engineering from Princeton University in 2001 and 2003, respectively. He is currently an Associate Professor in Electrical Engineering at the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM. Dr. Jayaweera held an Air Force Summer Faculty Fellowship at the Air Force Research Laboratory, Space Vehicles Directorate (AFRL/RVSV), during the summers of 2009–2011. He is currently an associate editor of IEEE Transactions on Vehicular Technology and the EURASIP Journal on Advances in Signal Processing. He has also served as a member of the Technical Program Committees of numerous IEEE conferences, including ICC (2010–2012), Globecom (2006, 2008, 2009, 2011), WCNC (2011, 2012) and PIMRC (2011, 2012). His current research interests include cooperative and cognitive communications, information theory of networked-control systems, control and optimization in smart grids, machine learning techniques for cognitive radios, statistical signal processing and wireless sensor networks.